David said:
I'd be interested in hearing those two cases.
Since posting, I'm not sure I remember the second, but the first is
in the replication system tables, each replica has a GUID ReplicaID.
That's real data.
Any system that needs globally unique identity keys is going to be
one where a GUID is a proper PK. I naturally gravitate to
replication as the scenario where this is the case, because it's
something I deal with on a daily basis, but you could be maintaining
more than one version of a database without replication and, needing
to merge the data, use a GUID as a PK.
The GUID is created by the computer and is not part of the actual
data - it is an artificial key.
You seem to think this is some kind of indictment. It's only members
of the "natural keys at all costs" cult who think that surrogate
keys are a problem.
If you go into the real world the GUID will
not be part of the data that identifies the entity. I.e., where
is the GUID on an automobile? Where is it on a new-born baby?
Where is it on your house?
In an OLAP system you may very well need to identify multiple
versions of the same data record, so you're actually record
information about the state of the *database* at a certain time, not
about the entities represented in the data tables. Thus, using a
database construct makes perfect sense, because the purpose of the
record in the OLAP data store is *not* to represent the real-world
entity, but to represent the state of a particular version of the
database.
The GUID is like an AutoNumber it is convenient to use it in the
tables rather than the "real" keys. Though I'd not use GUIDs I
admit I do use AutoNumbers as convenient JOINing columns, but I
always have good Primary Keys that uniquely identify each entity -
otherwise the tables would be crap (like many I've had the
misfortune of fixing [orphan records; multiple duplicates, missing
data, etc.] - good job security for me, but bad for the clients).
No one is claiming that one should not have unique indexes on
natural keys that have no Nulls (and can thus *have* a meaninful
unique index). This is the straw man argument made repeatedly by
those devoted to the natural key religion, and it's just a way of
saying "I think all of you who disagree with me are so stupid you
don't understand basic database design." As a matter of fact, I
think we understand it better than those who make that strawman
argument, since we recognize the practical impossibility in so many
circumstances of being able to use a natural key (without stuffing
it with useless default values that then have to be filtered out for
display purposes).
But, please do explain to me exactly what natural columns in the
data would allow you to represent multiple database states in an
OLAP data store. And if you think that date/time fields are going to
do it, then I suggest that you haven't really thought through the
problem very carefully.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
The reason most "natural key" proponents push their agenda so strongly
is the possibility that the data tables will be separated and child
table data would not be able to be re-joined to parent tables based on
surrogate keys. The way DBs are constructed today I believe this
probability is very low, that's why I do use "surrogate" keys. I do
create "real-world" (whatever that means) DBs, and it's easier to use
surrogate keys rather than carry over multi-column PKs into related
(child) tables. My tables do have a unique index for each entity - call
it the PK or a Unique index - same thing different names.
replication system tables, each replica has a GUID ReplicaID.
That's real data.
It's maintenance data, not real-world data. I'm not stating that the
data should only have data items that are only found in the real world.
The DB is a representation of the real world data AND can also have DB
maintenance data items - like LastUpdate, UpdateUserID, GUID, etc. But,
the maintenance data items should not be used as keys to the data, but,
rather as keys to the up-keep of the DB, or for DB status information
purposes. In replication, that's the role of the GUID, it separates
two, or more, rows (records) that have the same PK (or Unique idx), the
only difference being the GUID.
One thing that bugs me about Microsoft Help articles - they recommend
using the GUID and AutoNumbers as PKs; not because it is correct, but
'cuz it is easiest. After all Access was/is marketed as an easy way to
create a DB - why go to the bother of learning all that DB theory/design
stuff! Just jump right in. Dawg save me!
DB status over time: The first thing that came to mind - DateTime data
types identify entities over time - that's the Time part of DateTime.
Is there some other dimension that can do that? Since GUIDs are random
and supposedly globally unique: Two GUIDs from different systems of
0xabcdefg12345 and 0x12345abcde - which one came first? - even if the
GUIDs are generated in order each system will have a different order
'cuz they ARE different (read about phase shifting - a good description
is given in Minimalist music theory [really]).
If you talking about changes to the DB in short periods of time - I
don't know of any data warehouse that can do an complete update in
milliseconds (the smallest fraction of time most DateTime data types
use) - since the OLAP data is usually a summary of real-world data, or
of time intervals that are greater than milliseconds (orders, sales -
i.e., human induced actions). Scientific DBs could have less-than
millisecond intervals, but that data would probably be stored in an OLTP
db not an OLAP db. And the machines that store the data would have
clocks & data types that could measure time intervals less than a
millisecond. They would still be DateTime data types that indicate the
change (status) to the DB over Time - there's that pesky Time word
again.
Non-scientific DB status can be easily maintained by just making
snapshots of the data at set times - backups, really. Easier to do now
a-days w/ the low cost of data stores.
But, please do explain to me ...
Why the raised hackles, Dave? We're just having a difference of
opinion. I don't beat up posters 'cuz they use AutoNumbers or GUIDs as
PKs (unique idx). I just recommend that they don't do that. I've even
been known to provide a solution to their problem using their designs -
even though its esthetically grating.
I like Joe Celko (a true
believer in the natural keys debate) but I don't religiously follow his
dogma.
Regards,
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **
-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv
iQA/AwUBScMWdIechKqOuFEgEQK1rACgr7w9nP7IqudKkgdgdWaKUA0xlZ8AnAyv
l9SJ9FjEEHVGOVA/3A28yt6P
=MxZq
-----END PGP SIGNATURE-----