Query with tables joined by GUID

M

mr tom

Hi,

I've got an access database which is essentially a front end for a data
warehouse.

Many of the tables in the warehouse are indexed by GUIDs, which in SQL
Server 2000 are Binary(16).

In SQL, these show up as format "0xnnnnnnnnnnnnnnnn"

In Access they show up as a string of vaguely oriental characters.

Display isn't an issue, but I need to do a query, where two tables are
joined by guid, and the query isn't working...

Our SQL chap says he can't represent the guid as text or anything like that
(I'd innocently hoped that "0xnnnnnnnnnnnnnnnn" could be turned into a text
string, but it seems not)

Now I understand Access (to an extent...) but I have no knowledge of SQL
etc, so please be gentle.

What if anything can I do?

Thanks in advance, whatever the reply.
 
M

MGFoster

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Use different columns to join to the SQL Server tables. If the SQL'r DB
is truly a data warehouse most of the columns in the "fact" table should
be indexed. Try JOINing on different columns with more usable data
types.

The representation (display) of the GUID shouldn't be important. It's
the actual stored value that is used in the JOIN and that value is a 16
byte number in both SQL server & Access. Supposedly, they are the same
data type. The problem may be that your GUIDs don't match the SQL'r
GUIDs. That's why I recommend using different columns. If the GUID
column is the only unique identifier in the table, then that table, by
definition, is not a valid representation of the data. The data has to
be uniquely identified by "outside references," which are real-world
objects or identifiers; like automobile VIN numbers or Social Security
Numbers or Postal Codes and street numbers, etc.
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBScGISIechKqOuFEgEQK2BwCfZmQhdJS6APHR4bWjLv61uJLpeL4AoMGH
+17E20BqV72A2kmAFcKywU41
=RF+1
-----END PGP SIGNATURE-----
 
M

mr-tom

Thanks for your reply.

What we've got here is a warehouse which is pulling data from a mainframe
system.

The two fact tables in question would be linked many to many and the only
link is unfortunately via guid, but an external one.

Essentially, the data structure of the mainframe itself is pants and the
guids themselves come from the mainframe.

From a design perspective this leaves few options for the data warehouse
(and therefore the access front end).

So it is an external source, but unfortunately one that's still guid.

Interesting though that the join by guid should work. Seems like I need to
do some more testing...
 
M

mr-tom

Thanks for your reply.

What we've got here is a warehouse which is pulling data from a mainframe
system.

The two fact tables in question would be linked many to many and the only
link is unfortunately via guid, but an external one.

Essentially, the data structure of the mainframe itself is pants and the
guids themselves come from the mainframe.

From a design perspective this leaves few options for the data warehouse
(and therefore the access front end).

So it is an external source, but unfortunately one that's still guid.

Interesting though that the join by guid should work. Seems like I need to
do some more testing...
 
D

David W. Fenton

I've got an access database which is essentially a front end for a
data warehouse.

Many of the tables in the warehouse are indexed by GUIDs, which in
SQL Server 2000 are Binary(16).

In SQL, these show up as format "0xnnnnnnnnnnnnnnnn"

In Access they show up as a string of vaguely oriental characters.

Display isn't an issue, but I need to do a query, where two tables
are joined by guid, and the query isn't working...

Our SQL chap says he can't represent the guid as text or anything
like that (I'd innocently hoped that "0xnnnnnnnnnnnnnnnn" could be
turned into a text string, but it seems not)

Now I understand Access (to an extent...) but I have no knowledge
of SQL etc, so please be gentle.

What if anything can I do?

I'm not sure if you can resolve the issue or not, but have a look at
this:

Replication and GUIDs, the Good, the Bad, and the Ugly
http://trigeminal.com/usenet/usenet011.asp?1033

The key takeaway from that is converting the things explicitly using
GUIDFromString() and StringFromGUID() (though that's more a code
issue than a SQL issue).

You might also try an implicit join, using the WHERE clause, instead
of the explicit join. In working with the replication system tables
I have to do that all the time in order to join tables on GUID
fields, and those are native Jet tables.
 
D

David W. Fenton

If the GUID
column is the only unique identifier in the table, then that
table, by definition, is not a valid representation of the data.
The data has to be uniquely identified by "outside references,"
which are real-world objects or identifiers; like automobile VIN
numbers or Social Security Numbers or Postal Codes and street
numbers, etc.

That's crazy theoretical cant, not a real-world judgment. There are
plenty of reasons for a GUID to be the real PK and the only actual
identifying data. Off the top of my head I can think of at least two
different cases where there's no alternative.
 
M

MGFoster

David said:
That's crazy theoretical cant, not a real-world judgment. There are
plenty of reasons for a GUID to be the real PK and the only actual
identifying data. Off the top of my head I can think of at least two
different cases where there's no alternative.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'd be interested in hearing those two cases.

The GUID is created by the computer and is not part of the actual data -
it is an artificial key. If you go into the real world the GUID will
not be part of the data that identifies the entity. I.e., where is the
GUID on an automobile? Where is it on a new-born baby? Where is it on
your house?

The GUID is like an AutoNumber it is convenient to use it in the tables
rather than the "real" keys. Though I'd not use GUIDs I admit I do use
AutoNumbers as convenient JOINing columns, but I always have good
Primary Keys that uniquely identify each entity - otherwise the tables
would be crap (like many I've had the misfortune of fixing [orphan
records; multiple duplicates, missing data, etc.] - good job security
for me, but bad for the clients).

Regards,
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBScLbb4echKqOuFEgEQJzAwCg7eB3zxzMjOaad9O22x9polSdAHAAnAzX
N+/TJQ8ECip1W3z17nyR54BR
=cqPi
-----END PGP SIGNATURE-----
 
D

David W. Fenton

I'd be interested in hearing those two cases.

Since posting, I'm not sure I remember the second, but the first is
in the replication system tables, each replica has a GUID ReplicaID.
That's real data.

Any system that needs globally unique identity keys is going to be
one where a GUID is a proper PK. I naturally gravitate to
replication as the scenario where this is the case, because it's
something I deal with on a daily basis, but you could be maintaining
more than one version of a database without replication and, needing
to merge the data, use a GUID as a PK.
The GUID is created by the computer and is not part of the actual
data - it is an artificial key.

You seem to think this is some kind of indictment. It's only members
of the "natural keys at all costs" cult who think that surrogate
keys are a problem.
If you go into the real world the GUID will
not be part of the data that identifies the entity. I.e., where
is the GUID on an automobile? Where is it on a new-born baby?
Where is it on your house?

In an OLAP system you may very well need to identify multiple
versions of the same data record, so you're actually record
information about the state of the *database* at a certain time, not
about the entities represented in the data tables. Thus, using a
database construct makes perfect sense, because the purpose of the
record in the OLAP data store is *not* to represent the real-world
entity, but to represent the state of a particular version of the
database.
The GUID is like an AutoNumber it is convenient to use it in the
tables rather than the "real" keys. Though I'd not use GUIDs I
admit I do use AutoNumbers as convenient JOINing columns, but I
always have good Primary Keys that uniquely identify each entity -
otherwise the tables would be crap (like many I've had the
misfortune of fixing [orphan records; multiple duplicates, missing
data, etc.] - good job security for me, but bad for the clients).

No one is claiming that one should not have unique indexes on
natural keys that have no Nulls (and can thus *have* a meaninful
unique index). This is the straw man argument made repeatedly by
those devoted to the natural key religion, and it's just a way of
saying "I think all of you who disagree with me are so stupid you
don't understand basic database design." As a matter of fact, I
think we understand it better than those who make that strawman
argument, since we recognize the practical impossibility in so many
circumstances of being able to use a natural key (without stuffing
it with useless default values that then have to be filtered out for
display purposes).

But, please do explain to me exactly what natural columns in the
data would allow you to represent multiple database states in an
OLAP data store. And if you think that date/time fields are going to
do it, then I suggest that you haven't really thought through the
problem very carefully.
 
M

MGFoster

David said:
I'd be interested in hearing those two cases.

Since posting, I'm not sure I remember the second, but the first is
in the replication system tables, each replica has a GUID ReplicaID.
That's real data.

Any system that needs globally unique identity keys is going to be
one where a GUID is a proper PK. I naturally gravitate to
replication as the scenario where this is the case, because it's
something I deal with on a daily basis, but you could be maintaining
more than one version of a database without replication and, needing
to merge the data, use a GUID as a PK.
The GUID is created by the computer and is not part of the actual
data - it is an artificial key.

You seem to think this is some kind of indictment. It's only members
of the "natural keys at all costs" cult who think that surrogate
keys are a problem.
If you go into the real world the GUID will
not be part of the data that identifies the entity. I.e., where
is the GUID on an automobile? Where is it on a new-born baby?
Where is it on your house?

In an OLAP system you may very well need to identify multiple
versions of the same data record, so you're actually record
information about the state of the *database* at a certain time, not
about the entities represented in the data tables. Thus, using a
database construct makes perfect sense, because the purpose of the
record in the OLAP data store is *not* to represent the real-world
entity, but to represent the state of a particular version of the
database.
The GUID is like an AutoNumber it is convenient to use it in the
tables rather than the "real" keys. Though I'd not use GUIDs I
admit I do use AutoNumbers as convenient JOINing columns, but I
always have good Primary Keys that uniquely identify each entity -
otherwise the tables would be crap (like many I've had the
misfortune of fixing [orphan records; multiple duplicates, missing
data, etc.] - good job security for me, but bad for the clients).

No one is claiming that one should not have unique indexes on
natural keys that have no Nulls (and can thus *have* a meaninful
unique index). This is the straw man argument made repeatedly by
those devoted to the natural key religion, and it's just a way of
saying "I think all of you who disagree with me are so stupid you
don't understand basic database design." As a matter of fact, I
think we understand it better than those who make that strawman
argument, since we recognize the practical impossibility in so many
circumstances of being able to use a natural key (without stuffing
it with useless default values that then have to be filtered out for
display purposes).

But, please do explain to me exactly what natural columns in the
data would allow you to represent multiple database states in an
OLAP data store. And if you think that date/time fields are going to
do it, then I suggest that you haven't really thought through the
problem very carefully.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The reason most "natural key" proponents push their agenda so strongly
is the possibility that the data tables will be separated and child
table data would not be able to be re-joined to parent tables based on
surrogate keys. The way DBs are constructed today I believe this
probability is very low, that's why I do use "surrogate" keys. I do
create "real-world" (whatever that means) DBs, and it's easier to use
surrogate keys rather than carry over multi-column PKs into related
(child) tables. My tables do have a unique index for each entity - call
it the PK or a Unique index - same thing different names.
replication system tables, each replica has a GUID ReplicaID.
That's real data.

It's maintenance data, not real-world data. I'm not stating that the
data should only have data items that are only found in the real world.
The DB is a representation of the real world data AND can also have DB
maintenance data items - like LastUpdate, UpdateUserID, GUID, etc. But,
the maintenance data items should not be used as keys to the data, but,
rather as keys to the up-keep of the DB, or for DB status information
purposes. In replication, that's the role of the GUID, it separates
two, or more, rows (records) that have the same PK (or Unique idx), the
only difference being the GUID.

One thing that bugs me about Microsoft Help articles - they recommend
using the GUID and AutoNumbers as PKs; not because it is correct, but
'cuz it is easiest. After all Access was/is marketed as an easy way to
create a DB - why go to the bother of learning all that DB theory/design
stuff! Just jump right in. Dawg save me!

DB status over time: The first thing that came to mind - DateTime data
types identify entities over time - that's the Time part of DateTime.
Is there some other dimension that can do that? Since GUIDs are random
and supposedly globally unique: Two GUIDs from different systems of
0xabcdefg12345 and 0x12345abcde - which one came first? - even if the
GUIDs are generated in order each system will have a different order
'cuz they ARE different (read about phase shifting - a good description
is given in Minimalist music theory [really]).

If you talking about changes to the DB in short periods of time - I
don't know of any data warehouse that can do an complete update in
milliseconds (the smallest fraction of time most DateTime data types
use) - since the OLAP data is usually a summary of real-world data, or
of time intervals that are greater than milliseconds (orders, sales -
i.e., human induced actions). Scientific DBs could have less-than
millisecond intervals, but that data would probably be stored in an OLTP
db not an OLAP db. And the machines that store the data would have
clocks & data types that could measure time intervals less than a
millisecond. They would still be DateTime data types that indicate the
change (status) to the DB over Time - there's that pesky Time word
again.

Non-scientific DB status can be easily maintained by just making
snapshots of the data at set times - backups, really. Easier to do now
a-days w/ the low cost of data stores.
But, please do explain to me ...

Why the raised hackles, Dave? We're just having a difference of
opinion. I don't beat up posters 'cuz they use AutoNumbers or GUIDs as
PKs (unique idx). I just recommend that they don't do that. I've even
been known to provide a solution to their problem using their designs -
even though its esthetically grating. :) I like Joe Celko (a true
believer in the natural keys debate) but I don't religiously follow his
dogma.

Regards,
--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBScMWdIechKqOuFEgEQK1rACgr7w9nP7IqudKkgdgdWaKUA0xlZ8AnAyv
l9SJ9FjEEHVGOVA/3A28yt6P
=MxZq
-----END PGP SIGNATURE-----
 
M

mr-tom

Thanks David,

This looks extremely useful and I'll be going through it with our SQL chap.

Cheers,

Tom.
 
D

David W. Fenton

The reason most "natural key" proponents push their agenda so
strongly is the possibility that the data tables will be separated
and child table data would not be able to be re-joined to parent
tables based on surrogate keys. The way DBs are constructed today
I believe this probability is very low, that's why I do use
"surrogate" keys. I do create "real-world" (whatever that means)
DBs, and it's easier to use surrogate keys rather than carry over
multi-column PKs into related (child) tables. My tables do have a
unique index for each entity - call it the PK or a Unique index -
same thing different names.

Er, what? "call it the PK or a Unizue index -- same thing" is
gibberish. You can have a unique index on something other than the
PK, and when using surrogate keys, it may be *necessary* to have a
unique index on the natural key (though very often the fact that you
need a surrogate key is caused precisely by the fact that you
*can't* have a unique index on the natural key). So, I can't see how
you can think that the two are the same thing -- they aren't.
It's maintenance data, not real-world data.

No, it's not. The ReplicaID is stored in each replica. A replica can
have the same UNC name and not be the same replica. There is no
other data in MSysReplicas that can identify the unique replica.
I'm not stating that the
data should only have data items that are only found in the real
world.

Replicas don't exist in the "real world" in the narrow sense you
appear to be using. They are a database construct and a GUID is the
only possible identifier that could be used for them, because they
*must* be guaranteed to be unique in all cases.
The DB is a representation of the real world data AND can also
have DB maintenance data items - like LastUpdate, UpdateUserID,
GUID, etc. But, the maintenance data items should not be used as
keys to the data, but, rather as keys to the up-keep of the DB, or
for DB status information purposes. In replication, that's the
role of the GUID, it separates two, or more, rows (records) that
have the same PK (or Unique idx), the only difference being the
GUID.

This shows that you don't understand replication. The attitude you
outline above is exactly why people so often screw up Jet
replication, because they confuse their wrong-headed view of the
identity of a replica with the real-world identity of the replica.
One thing that bugs me about Microsoft Help articles - they
recommend using the GUID and AutoNumbers as PKs; not because it is
correct, but 'cuz it is easiest. After all Access was/is marketed
as an easy way to create a DB - why go to the bother of learning
all that DB theory/design stuff! Just jump right in. Dawg save
me!

GUIDs in Access are not easy to use and I would recommend against
using them in all cases, in fact.
DB status over time: The first thing that came to mind - DateTime
data types identify entities over time - that's the Time part of
DateTime. Is there some other dimension that can do that? Since
GUIDs are random and supposedly globally unique: Two GUIDs from
different systems of 0xabcdefg12345 and 0x12345abcde - which one
came first?

For OLAP data, that may or may not be important. It depends on how
it's used.

And that's the whole point.

You can't make blanket statements about whether a GUID is a good PK
or not without knowing the requirements of the database and
application in which it's going to be used.

- even if the
GUIDs are generated in order each system will have a different
order 'cuz they ARE different (read about phase shifting - a good
description is given in Minimalist music theory [really]).

And that only matters IF IT MATTERS. If it *does* matter, then it's
a wrong design. If it doesn't matter, then, obviously, it's NOT A
PROBLEM.
If you talking about changes to the DB in short periods of time -
I don't know of any data warehouse that can do an complete update
in milliseconds (the smallest fraction of time most DateTime data
types use) - since the OLAP data is usually a summary of
real-world data, or of time intervals that are greater than
milliseconds (orders, sales - i.e., human induced actions).
Scientific DBs could have less-than millisecond intervals, but
that data would probably be stored in an OLTP db not an OLAP db.
And the machines that store the data would have clocks & data
types that could measure time intervals less than a millisecond.
They would still be DateTime data types that indicate the change
(status) to the DB over Time - there's that pesky Time word again.

When you start talking about actual applications, you make sense.
When you speak in the abstract, you spout gibberish.
Non-scientific DB status can be easily maintained by just making
snapshots of the data at set times - backups, really. Easier to
do now a-days w/ the low cost of data stores.

And how do you then combine all those snapshots into a single
database?
Why the raised hackles, Dave?

My name is not "Dave," asshole.
We're just having a difference of
opinion. I don't beat up posters 'cuz they use AutoNumbers or
GUIDs as PKs (unique idx). I just recommend that they don't do
that.

And when you provide specifics about a particular application, I'm
sure that you're giving good advice.

But in this thread, you've made abstract pronouncements about what
is and isn't correct, and you've been WRONG.
I've even
been known to provide a solution to their problem using their
designs - even though its esthetically grating. :) I like Joe
Celko (a true believer in the natural keys debate) but I don't
religiously follow his dogma.

I HATE Celko. I think he's a major asshole.

Hmm...
 
M

MGFoster

David W. Fenton wrote:
My name is not "Dave," asshole.
I HATE Celko. I think he's a major asshole.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

....OKay, David. Fixated opinions noted. This conversation is over.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)
** Respond only to this newsgroup. I DO NOT respond to emails **

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBScRhpoechKqOuFEgEQLGJQCgrSdsbtzKeHHADvQsXak9q6uAQ+8AoK9J
pwT68zg/EWZFOurH+Fw5XZRT
=irAG
-----END PGP SIGNATURE-----
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top