How to ID a record

R

Ray

Hi,
I am wondering about proper or accepted practise for ID'ing records in a relational database.
My question is this...

If I have a table with a unique feature should I still use an ID column?
Example 1: (Note: SSN = Social Security Number - assumed unique to every person)
My database has a table called Employees, it has fields called "id" "fname" "lname" "SSN" - do I need the "id" field or will the "SSN" field do for ID'ing the record?

Example 2:
My database has a table called Cars, it has fields called "id" "make" "model" "registration" - do I need the "id" field or will the "registration" field do for ID'ing the record?

Now is it sensible to refer to the two tables in a master table like...
Example 3:
My database has a table called CarLog, it has fields called "id" "employee_SSN" "car_registration" "date"
And again do I need the "id" field or should I index off the date+SSN+registration?


I am thinking that in these circumstances can I delete the ID field each time to make the database smaller? In a lot of cases I can find uniqueness in the record without an ID field, is this what I should be looking to do each time?

Any comments appreciated.

Ray,
Dublin, Ireland.

--


---------------------------------------------------------------------
"Are you still wasting your time with spam?...
There is a solution!"

Protected by GIANT Company's Spam Inspector
The most powerful anti-spam software available.
http://mail.spaminspector.com
 
K

Ken Snell [MVP]

Your questions go to the heart of a "natural" primary keys and "surrogate"
primary keys.

Natural keys are composed of one or more fields in the table that contain
data and that, in combination if more than one, uniquely identify that
record.

Surrogate keys are usually autonumber or other ID type fields.

Which you use depends upon your preferences and what works best for you and
who "mentored" you! I personally use both in my databases, depending upon
how many fields it takes to compose the "natural" key and what I'm doing
with the key field(s).

Google search on these newsgroups will show numerous threads that have
debated this issue. I don't intend to start another one here......
< g >

--

Ken Snell
<MS ACCESS MVP>


Hi,
I am wondering about proper or accepted practise for ID'ing records in a
relational database.
My question is this...

If I have a table with a unique feature should I still use an ID column?
Example 1: (Note: SSN = Social Security Number - assumed unique to every
person)
My database has a table called Employees, it has fields called "id" "fname"
"lname" "SSN" - do I need the "id" field or will the "SSN" field do for
ID'ing the record?

Example 2:
My database has a table called Cars, it has fields called "id" "make"
"model" "registration" - do I need the "id" field or will the "registration"
field do for ID'ing the record?

Now is it sensible to refer to the two tables in a master table like...
Example 3:
My database has a table called CarLog, it has fields called "id"
"employee_SSN" "car_registration" "date"
And again do I need the "id" field or should I index off the
date+SSN+registration?


I am thinking that in these circumstances can I delete the ID field each
time to make the database smaller? In a lot of cases I can find uniqueness
in the record without an ID field, is this what I should be looking to do
each time?

Any comments appreciated.

Ray,
Dublin, Ireland.

--


---------------------------------------------------------------------
"Are you still wasting your time with spam?...
There is a solution!"

Protected by GIANT Company's Spam Inspector
The most powerful anti-spam software available.
http://mail.spaminspector.com
 
J

Jeff Boyce

Ken's response points to the two sides of your issue. I'll raise another...

Are you certain that every Employee has an SSN?

Are you certain that there are no duplicate SSNs? (same SSN assigned to
more than one person)

Are you certain that every SSN you use is actually assigned to the Employee
with whom you have it associated?

If you can't answer "Yes, without a question, and now and for ever more" to
all three of these questions, don't use SSN as a primary key.
 
J

John Nurick

Hi Ray,

1: Think of it this way:

Is it conceivable that this database will ever have to handle a person
without a SSN (a visitor from abroad, perhaps)?
Given the nature and functions of the database, is it a big deal if
someone provides a false or incorrect SSN?
If a mistake is made when entering a SSN, would it be hard to correct
when discovered months or years later?

If the answer to any of these is yes, it's probably better not to use
SSN as the primary key.

2: Again, it's a matter of the practicalities of the real-world domain
that your database is modelling and the functions it must serve. In many
jurisdictions registration numbers can be re-used or transferred from
vehicle to vehicle. It's also possible for two vehicles of the same make
and model to have the same registration number but in different
jurisdictions: does the database need to handle cars from other
countries or states? And James Bond isn't the only person to have used
false license plates.

3: if the CarLog table stores which employee had which car on which day,
then the fields it needs are EmployeeID (or SSN if that's what you
decide to use as the primary key for the employees table), CarID (or
registration if that's what you decide to use), and the date, all three
with a unique index. You don't need a separate "ID" field.

By the way, don't call fields things like "Date", "Time", "Name": these
are also the names of common properties and/or functions, and the
resulting ambiguity can cause problems.

Hi,
I am wondering about proper or accepted practise for ID'ing records in a relational database.
My question is this...

If I have a table with a unique feature should I still use an ID column?
Example 1: (Note: SSN = Social Security Number - assumed unique to every person)
My database has a table called Employees, it has fields called "id" "fname" "lname" "SSN"
- do I need the "id" field or will the "SSN" field do for ID'ing the
record?
Example 2:
My database has a table called Cars, it has fields called "id" "make" "model"
"registration" - do I need the "id" field or will the "registration"
field do for ID'ing the record?
 
R

Ray

Yes a search on Google shows that this is a great debate.
It's my first time approaching the subject from a moment of realization
which grew from staring at the Relationships window for hours on end trying
to normalise a database. Now that I've read a few dozen pages on the
subject I'm split on what to do... be a data miser and loose the surrogate
for the sake of saving disk space... or take the surrogate as a concrete way
of identifcation.

Well at least I learned 1 thing... I never knew it was called a "surrogate"
key!

Thanks.

Ray
:)
 
R

Ray

Thank you for your comments John, Jeff and Ken.

I'm voting for the surrogate and I think this seals it for me in no
particular order...

1. The surrogate is a concrete form of ID'ing records.
I don't have to stress about the maybe's... will a vehicle being re-regged,
or trying to alter a primary key later and all the other probabilities.

2. The surrogate is tried and tested everytime.
I don't have to consider every field or field groups as a candidate for
uniqueness and then go through all the what if's?

3. Coding convention.
If I see employess.id or vehicle.id I know what it is everytime without
consideration or second guessing.

4. It's simple to use!


Maybe coding convention is my biggest pull out of these 4.

I'm glad I raised the question and I'm grateful as always for you guys
giving me a help :)

Ray.
 
T

Tim Ferguson

PMFJI but..
1. The surrogate is a concrete form of ID'ing records.
I don't have to stress about the maybe's... will a vehicle being
re-regged, or trying to alter a primary key later and all the other
probabilities.

.... but surrogate keys do not prevent you from creating genuine duplicate
records. By not setting a key on a vehicle registration, for example, you
can have two autonumbers attached to the same car, and then you'll have a
lot of work to merge all the related records.
2. The surrogate is tried and tested everytime.
I don't have to consider every field or field groups as a candidate
for uniqueness and then go through all the what if's?

It has been tried and found wanting previously. There have been bugs in the
implementation of autonumbers before now, and there is no guarantee that
the Access team won't reintroduce others in future versions.

And you _do_ have to consider all the what-ifs anyway because it's part of
the process of systems analysis. You just cannot crash into database design
without knowing _exactly_ what values are going to be unique and what is
going to have to be able to change. You get to learn about this as entity
life history, domain validation, and so on. No short cuts over that, I'm
afraid!
3. Coding convention.
If I see employess.id or vehicle.id I know what it is everytime
without consideration or second guessing.

There's something that comes before Coding Convention, and that is
Documentation! If you are planning to program a database using no clues
other than what a field name looks like it ought to mean, you're in for a
big wake-up.
4. It's simple to use!

Oh no it isn't. Consider the following table:-

PartLocation
============
*ID ' PK autonumber
CatalogNumber ' FK references Parts
Shelf ' FK references Shelves
NumberInStock

and the following one:-

PartLocation
============
*CatalogNumber ' FK references Parts
*Shelf
*Warehouse
*Region
*HoldingCompany
NumberInStock

FK (Shelf, Warehouse, Region, HoldingCompany) references Shelves


Say you want to find out how many Green Widgets are being held by comany
ABYZ. In the first example, you'd need a four-table join; in the second one
a simple SELECT with no joins at all. My grandmother could write the SQL
for that one. So the Shelves table has a four-field PK -- so what? The db
engine is there to look after that so I don't have to.

My point, however, is not that surrogate keys are bad. It's that there is
no single rule that covers all situations. DB design is all about semantics
and about contexts. In some situations, a vehicle reg mark will be more
than adequate as a key; in others it would be hopelessly unstable.
Sometimes an address is just an attribute; but in some places it has to be
an entity on its own.

All I ask is that you don't rush off into the rest of the century thinking
that all tables must always have a AN pseudokey every time, without
thinking about it further. Usually the decision is obvious as soon as you
understand the thing you're modelling: occasionally it's a more balanced
choice. But it's never a blind one.

All the best


Tim F
 
F

Fred Boer

Dear Tim:

I'm curious... what kind of rates does your Grandmother charge for Access
work? I wouldn't mind having these things explained to me by a kindly,
patient granny... And would she happen to bake cookies? That would be nice..
cookies and Access... <g>

Fred Boer
 
L

Lynn Trapp

I'm voting for the surrogate and I think this seals it for me in no
particular order...

Ray,
Before you make your vote final, let me weigh in on the same side as Tim
Ferguson, with a few additional thoughts.
1. The surrogate is a concrete form of ID'ing records.
I don't have to stress about the maybe's... will a vehicle being re-regged,
or trying to alter a primary key later and all the other probabilities.

In reality a surrogate key doesn't really identify a record except so far as
the database engine is concerned. All will agree that a surrogate key is a
meaningless value and, while each value of the surrogate key is unique in a
table, it doesn't identify anything about the rest of the record in a real
world. As Tim mentioned, you can enter multiple duplicate records if a
surrogate key is the only thing you are relying on. Certainly, a surrogate
key is a good choice for creating relationships between tables, but it is a
very poor choice for eliminating redundant data, which is one of the main
things that a primary key is supposed to do.

2. The surrogate is tried and tested everytime.
I don't have to consider every field or field groups as a candidate for
uniqueness and then go through all the what if's?

Tim mentioned some of the implementation flaws with surrogate keys, but let
me re-emphasize some other issues. If you use a single field surrogate key
as your sole primary key and do not place a unique index on other fields in
the table, then you are bound to have duplicates. So, even if you do use a
surrogate, you still need to be concerned with a natural candidate key for
uniqueness.

3. Coding convention.
If I see employess.id or vehicle.id I know what it is everytime without
consideration or second guessing.

Actually, there are no guarantees that the names of fields will be unique,
but referencing a single field in code is one reason for using a surrogate
key, in some instances.

Finally, as with Tim I think surrogate keys are a valuable tool
(particularly for relating tables), but they should be used only when
necessary and should not be considered as a fool proof tool for database
design.
 
T

Tim Ferguson

I wouldn't mind having these things explained to me by a kindly,
patient granny...

Must be thinking about someone else's granny... <g>

All the best


Tim F
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top