"Blissfully Ignorant" <
[email protected]>
wrote in message
What would violate the integrity? What are the rules?
:
<major db discussion>
My apologies of you already know any and/or all of this.
Part of the trouble about understanding relationships could come
from the fact that Relation, Relations, and Relationship(s) have
multiple meanings in regards to, "Relational Databases."
One meaning, is that the underlying theory behind Relational
Databases is a branch of math called Relational Algebra (invented by
Dr. Codd). This branch of math works on "sets" of data. A "set" in
Relational Algebra is called a "relation" (don't ask me why . . .).
Then, some people in various companies (IBM, Oracle, etc.) took this
branch of math and made it into the precursors of the Relational
Databases apps we have today (they grabbed "relational" off of
Relational Algebra and stuck it in front of "database", and voila,
we have "Relational Databases").
Another meaning is that "Relation" is the technical name of what
Relational Databases call "Tables". (And just to be complete, the
technical name of "Row" is "Tuple".)
Now, in order to run our databases, we have something that goes by
the technical term, Referential Integrity. This states that if we
start with table called Items, that has a Primary Key of UPC, and we
have table called Pricing, that has a primary key of UPC and
PricingDate. Whatever happens, we don't want to have a row in the
Pricing table that has an UPC value that doesn't appear in the Items
table. Why? Because we'd have prices in Pricing for items no one
could in the company could identify (there being nothing in Items to
identify the item). Relational Databases use internal management,
usually indexes, to *enforce* a "rule" that automatically prevents
users from deleting any row in Items . . . *when there is still any
row* . . . in Pricing with the UPC about to be deleted from Items.
This process, this "enforcement" of the "rules", is called
Referential Integrity.
Now, in MS Access, you open up the "Relationships" window, and make
some table sub-windows appear on it. Then you drag fields back and
forth between the table sub-windows and establish nice little lines
that go from table to table, and MS Access calls these lines
"Relationships" (and a lot of other people do, too). In this case,
an MS Access "Relationship" is the method whereby "Referential
Integrity" is set up. It should be noted that MS Access uses the
word "relationship" in completely different ways than other database
products, and differently from Data Modeling (more on that in a
second), but is so pervasive on the PC desktop world, that
terminology creep has taken place, causing the MS Access version of
"relationship" to be thought of by many as "what real relationships
are in databases".
To further confuse the issue (yes, there's more), Data Modeling, the
process where ideas about things (entities) are organized into
understandable formats, like big charts covered by boxes with lines
running between them (rather remarkably like the boxes and lines in
the MS Access Relationships window), also uses the concept of
"relationships". Except in Data Modeling, a "relationship" is an
actual box on the chart.
Data Models themselves are built to help database designers create
good databases. When all the aspects of the chart (there can be
many versions and detail levels) are completed, and I'm skipping
over a lot of stuff here for simplicity, the chart is turned over to
the database designers, and they go to work.
The boxes on a Data Modeling chart represent "entities", or the
people, places, things, or knowledge that are being described on the
chart. A "relationship", then, is a form of knowledge because it
*tells* us something. Therefore, it gets a box on the chart. And
here's another one, the lines on a Data Model chart? They represent
something called Cardinality. Cardinality is one to one, one to
many, many to many, recursive (Bill of Materials), etc. To yet
further confuse things, the word "relationship" *frequently*
replaces "Cardinality" even when experts are discussing the
situation. It's important, because a table of a concrete entity
(like a Book) can have a many to many "cardinality" with a
"relationship", an abstract piece of knowledge. I told you it would
be confusing. If a Book can be checked out of library by many
people, the book does not have a one to many cardinality with
people, there is, rather, a relationship between books and people,
and books have a cardinality with that relationship, and people have
a cardinality with that relationship (we'll call it a CheckedOut
relationship).
Concrete Entity: Books
Concrete Entity: People
Abstract Entity: CheckedOut
Both Books and People can appear in Checked-Out over an over again,
as the same person checks out and reads the same book over and over
again. The check out date will be the attribute ("CheckedOutDate")
of CheckedOut that helps define each row. The primary key would be,
BookID, PeopleID, and CheckedOutDate. The "relationship" between
books and people is that people can check the books out of the
library. The two cardinalities from the two concrete entities to
the one abstract relationship are the one-to-many (etc.) part of it.
Yes, cardinality is *rarely* discussed by MS Access users. But it
is still important to realize that it exists. It is further
necessary to realize that what MS Access "calls" a "relationship" is
really called something else by a large fraction of Data Modelers
and Database Designers. It's important because you have to realize
that a real-world entity, a "bit of knowledge", may be a
"relationship" between those entities and that "relationship" may be
described in the database by a Table. And in MS Access, the
cardinality between the tables is called a "relationship", as well.
Don't ask me why the MS Access designers decided to rob an already
massively overused term and apply it the way they did, but they did,
and knowing about it can be quite helpful.
Now, jumping out of the Data Modeling picture, let's go a little
further into Database Design. There are three main rules of good
table design for Relational Databases. They are Called First Normal
Form, Second Normal Form, and Third Normal Form (often abbreviated
1NF, 2NF, and 3NF). There's also BCNF, 4NF, and 5NF (and even a
proposed 6NF). These rules, when followed, among other things, make
establishing RI easier (they make the whole DB Design process
easier, most of the time). Look up these rules on the internet (you
may already have, I am sliding away from the original topic) for a
set of rule that, if followed, will allow you to build databases
that will help you store the data in a way best suited for this kind
of database. This is important because unless the data is stored
correctly, it can become very difficult to get the answers you want.
</major db discussion>
---------------------------
Database Normalization:
Basics:
About.com
http://databases.about.com/od/specificproducts/a/normalization.htm
Intermediate:
MySQL's website:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
Advanced:
Wikipedia:
http://en.wikipedia.org/wiki/Database_normalization
Very Advanced:
University of Texas:
(I quite like this whole site, since it has a handy menu on the
right
describing many important aspects of database normalization and
modeling.)
http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html
Sincerely,
Chris O.