Database Normalization

P

polisci grad

I am currently working on developing a database, in Access 2003, to be used
by mulitple users (albeit in a small environment) who will not be adding or
changing data. I have little experience in Access, although I have read
through many of the tutorials, training sessions, and i have a big book on
the stuff too. However, I am concerned about the way i have normalized my
tables.

this database is to consist of a large amount of disparate and unrelated
data. it is to serve an organization involved in the transportation
industry, but the range of valuable data is huge. for instance, some of my
variables include km of road, # of road collisions, # of vehicle sales, % of
industry share, and so on.

As such, I have approached the problem by constructing a large main table
for all raw data (named data) and small tables to define coded values (like 1
for safety). My fields are Generated ID, Theme (ranging from 1-7), Region
(from 1-20), Industry (1-6), Year (1990-2006), Unit (abbv. title for
variable), and value_#, value_$, value_%, explanation, source, and notes. as
such, each number gets its own record, meaning a simple table in excel
becomes many lines of similar looking data.

now, since one 'unit' or variable may contain value_# (numeric data) for
various regions, years, industries, the amount of repeated data is large,
begging the question of proper normalization. while i have managed to
construct some basic queries, i am worried i may have limited further
advancement with this construct. can anybody offer suggestions based on my
short description of the problem?
 
R

Roger Carlson

If no one is entering data, how does the data get in there? I'm not asking
to be smart. If the database is used only for reporting purposes and it
gets entered in batch from an external source, then much of the need to
normalize to 3NF (third normal form) disappears. It may be sufficient to
leave the data in 1NF.

Much of the rationale for removing redundant data (which is what
normalization principly does) is to preserve data integrity when entering or
changing data. If this does not happen (and I mean never) then you can
leave a large flat file. Such files are often easier to report from. This
is essentially what a data warehouse is.

--
--Roger Carlson
MS Access MVP
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/wa.exe?SUBED1=ACCESS-L
 
P

polisci grad

Well, presumably, I continue to enter it until I quit! But youre saying I
could even do away with the coding in order to simplify my tables? Will that
still enable me to run all types of queries? Is there even a need to use
Access?

Eventually, my goal is to set a database up that users can access, search
for particular data, usually in a time-series, by either a keyword search or
narrowing of criteria, and then present it in a graph or potentially run
statistical analysis with the data. This would require choosing independent
and dependent variables which could be cross tabulated, with and without
aggregate values and other mathematical stuff.

By the way, thanks for the quick reply!
 
R

Roger Carlson

My comments were of a general nature. You haven't given enough detail for
me to be more specific. They stem from an appplication I have which is fed
by an external data source and no one anywhere enters data. It is strictly
for reporting. Therefore, I have very large, flat files that are only
normalized to first normal form. (Some tables have 200+ fields.) The ONLY
reason I can do this is because no one enters data. It is fed periodically
in a batch process.

However, if ANYONE (including you) is entering this data my hand, you must
normalize your database to at least 3NF. You cannot trust the validity of
your data if it is not. Given the additional information below, I would
certainly keep your coding and lookup tables and continue to research the
Normalization process.

Without knowing A LOT more about your business rules, it's impossible for me
to give concrete advice. I'm not particularly fond of creating table
designs for people in a forum like this because there are a lot of questions
that can be easily missed in this context.

I suggest you look at the following resources:
"Database Design for Mere Mortals" by Michael Hernandez.
"Access Database Design and Programming" by Steve Roman (O'Reilly)

I've used both of these books in my college courses. They take a different
approach to Normalization, but they end up at the same point. On my
website, there are some tutorials that use the Hernandez process that might
be helpful AFTER you read the book. You can find them here:
http://www.rogersaccesslibrary.com/TutorialsDesign.html

Other handy references:
Library of Free Data Models
Here are some useful 'Kick-Start' Data Models
Also, check out these sites:
http://www.datamodel.org/NormalizationRules.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;q209534
http://support.microsoft.com/default.aspx?scid=kb;EN-US;164172
http://support.microsoft.com/default.aspx?scid=KB;en-us;q234208


--
--Roger Carlson
MS Access MVP
Access Database Samples: www.rogersaccesslibrary.com
Want answers to your Access questions in your Email?
Free subscription:
http://peach.ease.lsoft.com/scripts/wa.exe?SUBED1=ACCESS-L
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top