FAQ? factors influencing choice of data type for primary key

C

Chris

If this topic is addressed in a FAQ or other online
resource, I'd be glad to have a pointer to it. I've
tried to find it addressed in several books and by
searching these newsgroups...

What factors, if any - performance? db size? other? -
are important in choosing the data type for a primary
key? Since one always has the option of using, for
example, an Autonumber as a primary key, are there
times when that choice is prefered even though there
is some other unique attribute in the data itself?

For example, a simple single-attribute table of allowed
categories (to be used as a lookup table for assigning
records from another table to different categories)
could use the category name itself as a primary key -
all category names will be unique. But does it make
sense to use this text field as a foreign key in
another table, or is it better to create an additional
autonumber field in the category table?

I think what I'm really wondering about is the
implementation of primary/foreign keys: if there is
only a single text attribute in a table, used as a
primary key, what gets stored in the other table where
it is a foreign key? Is it the text itself, which
could be quite sizable? Or does Access use internal
record numbers or the field indexes as the value for
the foreign key? I would expect performance and/or
storage size penalties if the actual primary key
value (text, in this example) is stored in the other
table as the foreign key.

I have this feeling I'm missing some fundamental
concepts here, and appreciate any help in putting
me on the path towards clearer understanding. Thanks
in advance,

Chris
 
A

Anne Troy

You're perfectly right, Chris. Using a text field that can be large is
worse.

I ALWAYS use an autonumber field in every table I make, and it becomes the
primary key. Of course, sometimes you need to select several fields as the
primary key.

Anyway, I'm no Access techno-geek, so I hope some others will also answer.

<-*-><-*-><-*-><-*-><-*-><-*-><-*-><-*->
Hope this helps!
Anne Troy (better known as Dreamboat)
Author: Dreamboat on Word
Email: Dreamboat*at*Piersontech.com
Web: www.TheOfficeExperts.com
<-*-><-*-><-*-><-*-><-*-><-*-><-*-><-*->
 
C

Chris

Ann, thanks for your quick reply. Since posting, I've
tried to test (y)our hunch that the actual key value
(even if a long string of text) is stored when used as
a foreign key in another table; while recognizing there
could be other reasons for this, I found that a .mdb
file containing 2 related tables is significantly larger
(and by about the right magnitude) when the primary key
is a 100-char text string vs. an Autonumber.

Between your response and my experiment, I'm inclined to
think that, at least for this type of example (with long-
ish text strings), your strategy is the right one - add
an autonumber field to use as the primary key, despite
having a unique text attribute that could in principle
be used as the primary key.

Still, I hope someone will provide a more definitive
and illuminating response to my query - one well-
explained concept will go a lot farther for all of us
than a bunch of hunches and anecdotal results.
 
B

Brendan Reynolds

This tends to be one of those 'religious war' issues with some people, who
have very strong views for or against the use of AutoNumbers.

In the past, I tended to use AutoNumbers in all tables, for consistency.
Knowing that the primary key of all my tables was a single long integer
field made it easier to write re-usable code that could be used with any
table.

Lately, however, I'm more likely to use a natural key if it is a) simple (a
single field) and b) rarely changes. The advantage is that queries require
far fewer joins. For example, in a current project, a table 'tblAccident' is
related to four other tables, tblAccidentLevel, tblAccidentLocation,
tblAccidentStatus, and tblAccidentType. Using AutoNumbers, queries look
something like this ...

SELECT tblAccidentWithAutoNumbers.AccidentNumber,
tblAccidentWithAutoNumbers.AccidentDate, tblAccidentLocation.LocationCode,
tblAccidentType.TypeCode, tblAccidentLevel.LevelCode,
tblAccidentStatus.StatusCode, tblAccidentWithAutoNumbers.AccidentCost,
tblAccidentWithAutoNumbers.AccidentNote
FROM tblAccidentStatus INNER JOIN (tblAccidentLevel INNER JOIN
(tblAccidentType INNER JOIN (tblAccidentLocation INNER JOIN
tblAccidentWithAutoNumbers ON tblAccidentLocation.LocationNumber =
tblAccidentWithAutoNumbers.LocationNumber) ON tblAccidentType.TypeNumber =
tblAccidentWithAutoNumbers.TypeNumber) ON tblAccidentLevel.LevelNumber =
tblAccidentWithAutoNumbers.LevelNumber) ON tblAccidentStatus.StatusNumber =
tblAccidentWithAutoNumbers.StatusNumber
WITH OWNERACCESS OPTION;

Using natural keys, the equivalent query can be reduced to ...

SELECT tblAccident.*
FROM tblAccident
WITH OWNERACCESS OPTION;

The primary key of tblAccident itself is still an AutoNumber, however, as
the natural key would have to include at least two fields (AccidentDate and
LocationCode) and possibly more.

In short, in my opinion both AutoNumbers and natural primary keys are useful
tools when used appropriately. What exactly constitutes appropriate use
under specific circumstances is a judgement call that becomes easier with
experience.
 
C

Chris

Brendan,

Thanks - you've pointed out another significant
aspect - the number/complexity of joins - in addition
to the performance and size issues I raised (and I
imagine the issue you raise has its own performance
implications). I'm definitely not interested in
partisan wars, just in knowing the range of factors
that figure into making a reasonable decision, on
a case-by-case if necessary. Perhaps the issues of
performance and size that I raised are not very
significant, even if real, at the scale of medium
or small systems like the one I'm building? So maybe
your considerations are more germane... Still, I'd
like to know as much as possible about the topic -
seems like a basic understanding that one should have.
Hope you or someone else might weigh in with comments
on performance, size, or additional considerations.

Chris
-----Original Message-----
This tends to be one of those 'religious war' issues with some people, who
have very strong views for or against the use of AutoNumbers.

In the past, I tended to use AutoNumbers in all tables, for consistency.
Knowing that the primary key of all my tables was a single long integer
field made it easier to write re-usable code that could be used with any
table.

Lately, however, I'm more likely to use a natural key if it is a) simple (a
single field) and b) rarely changes. The advantage is that queries require
far fewer joins. For example, in a current project, a table 'tblAccident' is
related to four other tables, tblAccidentLevel, tblAccidentLocation,
tblAccidentStatus, and tblAccidentType. Using AutoNumbers, queries look
something like this ...

SELECT tblAccidentWithAutoNumbers.AccidentNumber,
tblAccidentWithAutoNumbers.AccidentDate, tblAccidentLocation.LocationCode,
tblAccidentType.TypeCode, tblAccidentLevel.LevelCode,
tblAccidentStatus.StatusCode, tblAccidentWithAutoNumbers.AccidentCost,
tblAccidentWithAutoNumbers.AccidentNote
FROM tblAccidentStatus INNER JOIN (tblAccidentLevel INNER JOIN
(tblAccidentType INNER JOIN (tblAccidentLocation INNER JOIN
tblAccidentWithAutoNumbers ON
tblAccidentLocation.LocationNumber =
tblAccidentWithAutoNumbers.LocationNumber) ON tblAccidentType.TypeNumber =
tblAccidentWithAutoNumbers.TypeNumber) ON tblAccidentLevel.LevelNumber =
tblAccidentWithAutoNumbers.LevelNumber) ON
tblAccidentStatus.StatusNumber =
 
J

John Vinson

What factors, if any - performance? db size? other? -
are important in choosing the data type for a primary
key? Since one always has the option of using, for
example, an Autonumber as a primary key, are there
times when that choice is prefered even though there
is some other unique attribute in the data itself?

This can be a contentious topic - flamewars have resulted from this
question, though lately we've seen mostly polite disagreement rather
than attacks! said:
For example, a simple single-attribute table of allowed
categories (to be used as a lookup table for assigning
records from another table to different categories)
could use the category name itself as a primary key -
all category names will be unique. But does it make
sense to use this text field as a foreign key in
another table, or is it better to create an additional
autonumber field in the category table?

In that case I'd use the textfield... with some caveats. See below.
I think what I'm really wondering about is the
implementation of primary/foreign keys: if there is
only a single text attribute in a table, used as a
primary key, what gets stored in the other table where
it is a foreign key? Is it the text itself, which
could be quite sizable?

It is the text itself. There's nothing special about a field that's
being used as a foreign key - it's *just a field*, like any other
field; it's a FK in terms of how it's used, not in terms of the nature
of the field.
Or does Access use internal
record numbers or the field indexes as the value for
the foreign key?

No. There are no (usable) internal record numbers.
I would expect performance and/or
storage size penalties if the actual primary key
value (text, in this example) is stored in the other
table as the foreign key.

That's the main argument against them... but what you win on the
swings, you lose on the roundabouts. If you have the actual value in
the table, then only that table needs to be included in a Query; you
don't need to bring the lookup table or any indexes into memory in
order to get the value, so search performance can be better. The
downside is that the database will be bigger. Access doesn't store
trailing blanks, so the allocated size of the field isn't really
relevant; but if your categories average 16 bytes, in a million-record
table you'll use 16 MByte of foreign key values, vs. 4 MByte for
4-byte Long Integers.

In practice, if the Primary Key is *unique* - there can never be
duplicates; *stable* - it will rarely or never change; and (ideally)
short - I'd use a natural key by preference over a surrogate key such
as an autonumber. In many cases (tables of People for instance) there
is no obvious stable, short, unique key, though; and sometimes the
"natural" key might consist of five or six fields. Some folks (Tom
Ellison prominent amongst them) prefer to use such natural keys almost
exclusively and claim that there is little or no performance hit;
others disagree.
 
P

Pieter Wijnen

it is advisable to generate a Unique Index on the *REAL* primary key though
....
the biggest problem with autonumbers is importing data to master/slave
tables + datawarehouse
but you do get a more efficient database in normal use (which generally
makes up... )

Pieter
 
T

Tim Ferguson

Lately, however, I'm more likely to use a natural key if it is a)
simple (a single field) and b) rarely changes.

Close but I would comment:

a) is not necessary: take the classical "junction table" in many-to-many
relationships. A PK make up of the two FKs is typical. A further example is
the one in any text book on db design where fifth normal form is discussed,
and again the three FKs make up a typical PK.

b) make that "never changes"...


B Wishes


Tim F
 
D

david epsom dot com dot au

I tend towards (artificial) autonumber fields because over time even
'stable' natural keys tend to drift: I'm thinking of a database I have where
the natural keys were originally Act Section (law changed) Type Code (linked
database changed) Centre Code (linked database changed) Status Code
(business process changed) Result Code (still the same) and Gender (changed
from M/F to 0/1/2).

In another database where we used natural keys we had to change from 4
characters to 6 characters, because the users could see the key value: we
also had problems because they wanted to change values we had used as
primary key (we could not easily implement cascading update because of the
file structure of the database).

I use my own autonumber code instead of autonumbers because I support a
number of applications where autonumbers cause support problems.

These arguments for using artificial primary keys come mostly from the
'programming' side of the business, where we think about issues like data
hiding and code coherence. As I understand it, the arguments for using
natural primary keys come mostly from the 'database' side of the business,
where we think about internal design efficiency and design logic. In a
theoretical relational database, the problem of interface drift is less of a
problem because a true relational database is infinitely adaptable: the
relationships are just stored in the database.

(david)
 
B

Brendan Reynolds

Some years ago, I developed an app that was part of a loyalty-scheme
change-over, from paper tokens to swipe cards. The paper tokens were to be
posted in pre-printed envelopes, and the equivalent number of points to be
assigned to that person's swipe card.

I was assured that tokens would only be accepted in the pre-printed
envelopes, and that each envelope would have a unique envelope number
printed on it. I considered using that envelope number as a primary key,
but, for reasons that I don't remember now, decided against it. I added a
unique index on the envelope number, but used an AutoNumber primary key.

Because of printing errors, we soon found that there were envelopes with
duplicate envelope numbers, and envelopes with no envelope number at all.
And so many people sent in tokens in ordinary, non-pre-printed envelopes,
that the client company soon abandoned the 'pre-printed envelopes only'
policy.

I had to replace the lookup-by-envelope-number combo box with a multi-field
search form, and write some code to detect and warn about possible
duplicates, again based on comparisons of multiple fields, but other than
removing the unique index on the envelope number, I did not need to make any
changes to the structure of the database.

The experience taught me to be very sceptical when a client assures me that
a particular value will always be known, and always be unique!
 
T

Terry Kreft

You could have shortened that to

"be very sceptical when a client assures"

Or
"be very sceptical when a client"

Or even
"be very sceptical"

<g>
 
B

Brendan Reynolds

LOL! :)
--
Brendan Reynolds (MVP)

Terry Kreft said:
You could have shortened that to

"be very sceptical when a client assures"

Or
"be very sceptical when a client"

Or even
"be very sceptical"
<snip>
 
C

Chris

Thanks, John, for the additional overviews and details
last week (I've been out of town for 5 days). That's
the kind of info I was looking for to help me make the
best design decisions, especially the specifics about
Access implementation as opposed to theoretical
considerations.

Considering that this is, apparently, a recurring
question about a significant (and contentious) issue,
it seems it ought to be in a FAQ somewhere... is there
a FAQ associated with these forums to save seekers
and responders from covering the same ground over
and over again? I found it difficult to zero in
on useful messages on this topic by just searching the
forum archives.

Many thanks,

Chris
 
T

TC

An FAQ would be a good idea, IMO. The same questions are asked repeatedly in
the microsoft.public.access newsgroups.

CDMA has an FAQ, but some have argued that it is more of a netiquette guide,
than a list of frequently asked questions.

TC
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top