Robin:
One thing you should try to gain some understanding of is the concept of
'functional dependency' as it is this which underlies the 'rules' of
normalization. A column in a table is functionally dependent on another
column if its value is determined by the value of the first column. If an
employee has an EmployeeID of 42 and a first name John and a last name Brown
the the FirstName and LastName columns are functionally dependent on the
EmployeeID column as every time EmployeeID 42 is encountered it refers to
John Smith.
Essentially, as far as the first 3 normal forms are concerned, non-key
columns must be dependent solely on the whole of a table's key. I recall
John Vinson summing this up one as :
"The key, the whole key and nothing but the key, so help me Codd"
Lets say John Smith lives in San Franciso, CA, so we add City and State
columns to the Employees table to record this. Now the City column (assuming
unrealistically for the moment that we don't record any more detailed
location for each employee) is functionally dependent solely on the
EmployeeID. You might think its also functionally dependent on the
FirstName/LastName column combination too, but that's not the case as there
could be another employee John Smith who's in New York. So City is
functionally dependent on the whole of the key, and so far the table is
correctly normalized. When we come to State, however, it is functionally
dependent on EmployeeID, BUT its also functionally dependent on City as San
Francisco determines that the State must be California (there may be another
San Francisco for all I know, but if so we'll ignore that for the sake of
this example and assume all city names are unique). So State is not
functionally dependent solely on the key (EmployeeID) but also on the non-key
column City. It is said to be transitively functionally dependent as
EmployeeID determines City which in turn determines State. So this breaks
the rule and the table is not correctly normalized to third normal form.
The solution to such 'redundancy' is to decompose the table. Remove the
State field from Employees and create a Cities table with columns City and
State. Create a States column with column State (possibly with two columns
if you want to store the full name as well as the abbreviation for each
State). Enforce referential integrity in the relationships between Employees
and Cities and between Cities and States. This ensures a non-existent city
can't be entered into Employees or a non-existent State into Cities. In
reality, as city names can be duplicated you'd have a numeric CityID in
Employees referencing a unique numeric CityID primary key of Cities.
I hope the above has given some idea of the basics of how functional
dependencies between columns in a table are used to determine whether a table
meets the normalization rules. When it comes to higher normal forms the
picture becomes a little more complex, particularly when it comes to fifth
normal form which involves Cross Join Functional Dependencies over more than
one table. If you concentrate on getting a good grasp of the principles of
the first 3 forms, however, and test your table definitions against these
principles you'll be on pretty safe ground for the vast majority of cases.
Finally let me give a strong endorsement to what Tina said about spending
time on the design of the tables and the relationships between them. Sit
down with a good supply of pencils and paper and draw the design out
diagrammatically, posing questions as to its being a valid model of that part
of the real world with which it is concerned. Only when you are happy with
the logical model should you start creating the tables themselves. If you get
the logical model right the interface will fall into place easily; get it
wrong and you'll end up jumping through hoops for ever more.
Ken Sheridan
Stafford, England