multiple data types in normalized database

L

LAF

If one has a normalized database, then different types of measurement data
will not be in different fields, but rather in a structure with
measurement.name and measurement.value as the field names. A brief example
will illustrate the potential problem.

tblMeasurement.lookup: measurement.id, measurement.name, measurement.units,
etc.

tblMeasurements: measurement.name,measurement.value, plus fields linking to
other tables.

The problem is that the field measurement.value can only be of one data type
(text, numeric, etc.). What if some of the measurement.values are text, some
are numeric, and some are logical? If this is the case, it appears that text
is the lowest common denominator, and access functions exist that can
transform text to numeric or logical in queries. Alternatively, it would be
easier to have separate measurement tables for the measurement.names
associated with different data types? What is the best way to keep normality
but have the correct data types?
 
A

Allen Browne

If text, are we talking discrete values to choose from (e.g. Low, Medium,
High), or free-form text?

If discrete values, could you use a lookup table with a Number primary key,
a MeasurementID foreign key (what this value applies to), and a
MeasurementDescription (the actual text)? That would permit you to store the
entires as a Number, which would probably be more efficient.

If that is not suitable, take a look at how Duane Hookom suggests handling
this kind of data in his At Your Surey database:
http://www.rogersaccesslibrary.com/Otherdownload.asp?SampleName='At Your Survey 2000'
The sample database illustrates how to store answers to survery questions
which can be a mix of quantative and qualitative responses.
 
M

Mike Sherrill

If one has a normalized database, then different types of measurement data
will not be in different fields, but rather in a structure with
measurement.name and measurement.value as the field names.

Really? What principle of normalization did you apply to come up with
that?
What if some of the measurement.values are text, some
are numeric, and some are logical?

In relational terms, does this translate to "What if the values come
from different domains?"
 
L

LAF

A concrete example may be useful since Mike Sherrill has identified a
problem. The database in question involves bird banding. There are events
based on site, date, and time; bird ID's; and associated junction table. For
each event for each bird, a number of things are identified, measured, and
sampled. These might be different domains within a particular bird-event.
Things identified would be sex, age, breeding condition, molt, disease,
ectoparasites. These things identified would include text and logical
fields. Things measured could all be numeric. Things sampled, like blood
and feathers, could be all text.

My initial problem would be solved through 3 tables instead of one. These
tables would have one-to-one relationships with each other through the
EventID and BirdID combination primary key. Is this the best solution?
 
T

Tony Vrolyk

Forgive me for interjecting but I am doing so partially for my own learning
experience.

Is there such a thing as over-normalization? What you are describine sounds
akin to havng a Customers table witha CustomerId field and then a
Customers_Info table where you store all the contact info. You might have a
fields for Name and Value and look like this

Customer_Info.Name Customer_Info.Value
FirstName John
LastNAme Doe
Address 123 Some Street
City New York
State NY

I think eveyone who tracks customers has a single customer table whith
defined fields for FirstName, LastName, Address, City, etc etc

So with your table you might have something like...(forgive my obvious
idiocy when it comes to ornathology)
measurement.name measurement.value
Sex Male
Age 2
Molt Yes
Disease None

Sounds like you are depending on data in one field to defind the data type
in another. This would seem overly complicated to query. Since I assume each
Event can collect data from multiple birds this is how I would handle it:

Tables: Birds, Events and Birds_Events.

Birds would contian fields for the stuff that is always true about the bird:
BirdID, Breed, Gender, ApproxDOB
Events would contain basic data on the Event: EventID, EventDate, EventTime,
EventSite
Birds_Events would link the two and include all the measured values: BirdID,
EventID, Molt, Breeding Condition, Disease, Blood, Feathers.

Each bird can be measured at multiple Events. Each Event can be used to
measure multiple Birds. This allows you to define each field for its most
appropriate data type and makes querying much easie in my opinion. If this
isn't right than I have been doing it all wrong.

Thanks for entertaining my rant,
Tony V
 
M

Mike Sherrill

There are events
based on site, date, and time; bird ID's; and associated junction table. For
each event for each bird, a number of things are identified, measured, and
sampled. These might be different domains within a particular bird-event.

I find it hard to believe that the banders talk of bird-events, but
I'm not the expert. Are bird ID's just the band numbers?
Things identified would be sex, age, breeding condition, molt, disease,
ectoparasites.

I'm pretty sure a single bird could be diagnosed with more than one
disease and more than one kind of ectoparasite during a bird-event.
And I suppose a diagnosis of disease could be done weeks later, based
on blood work. I'm just guessing about molt, though, because I don't
know what the banders say about molt.
These things identified would include text and logical
fields. Things measured could all be numeric. Things sampled, like blood
and feathers, could be all text.

My initial problem would be solved through 3 tables instead of one. These
tables would have one-to-one relationships with each other through the
EventID and BirdID combination primary key. Is this the best solution?

I don't know. If you're suggesting tables like [Things_identified],
[Things_measured], and [Things_sampled], I'd say not. Post some
representative sample data.
 
M

Mike Sherrill

Is there such a thing as over-normalization?

No. There are a lot of ways to go wrong in database design, but
"over-normalization" doesn't have a generally accepted meaning.

[snip]
Sounds like you are depending on data in one field to defind the data type
in another. This would seem overly complicated to query.

It would be hard. Integrity constraints might be impossible to
declare. I'm more concerned with constraints.
Birds would contian fields for the stuff that is always true about the bird:
BirdID, Breed, Gender, ApproxDOB

Wait . . .
Events would contain basic data on the Event: EventID, EventDate, EventTime,
EventSite

In Access, you shouldn't separate date and time into two columns.
Birds_Events would link the two and include all the measured values: BirdID,
EventID, Molt, Breeding Condition, Disease, Blood, Feathers.

Species, gender, and age are all "measured" values. You should
probably assess them each time the bird is in hand. If you find a
bird's sex has changed from "M" to "F", you know you have a data
error. If you just overwrite "M" with "F", you still have a data
error, but you'll probably never discover it.

An additional problem is that valid values for age might be
species-specific. I know that, in field observations, "fourth winter"
is an identifiable plumage for some gulls, but not for, say, Carolina
Chickadees. I don't know how the OP measures age, and I don't have
any idea how to do it myself (other than by plumage).
 
T

Tony Vrolyk

I was really questioning the table structure. If one were to take the
original post to it's extreme, no table would ever have more then two fields
besides the record ID. Over-normalization seemed an appropriate sounding
term for the situation even if it has not generally accepted meaning -

In your points you mostly address issues like field types and data locatoin.
Maybe Bird tracking is a difficult one to use as an example.


hey, maybe I just coined a new term - okay maybe not.
It would be hard. Integrity constraints might be impossible to
declare. I'm more concerned with constraints.

Not sure what you mean here. Could you clarify?
In Access, you shouldn't separate date and time into two columns.

True but it was just an example. I was just using the posters example of
date and time and I wasn't thinking about that rule at the time.
Species, gender, and age are all "measured" values. You should
probably assess them each time the bird is in hand. If you find a
bird's sex has changed from "M" to "F", you know you have a data
error. If you just overwrite "M" with "F", you still have a data
error, but you'll probably never discover it.

An additional problem is that valid values for age might be
species-specific. I know that, in field observations, "fourth winter"
is an identifiable plumage for some gulls, but not for, say, Carolina
Chickadees. I don't know how the OP measures age, and I don't have
any idea how to do it myself (other than by plumage).

Again this may be true but it is specific to this application. I am guessing
the original poster can make judgments about where the data is tracked I was
just using examples to explain in general terms how I wouls structure the
tables. Gender never does change, it may be mis-measured. The point was
this: Reatlively static data in the Birds Table, Event specific data in the
Events Table and measured data in the Birds_Events table.

Instead of creating tables with numerous fields for different data the
posted wanted two fields - one to name the value and another to store the
value itself. This is what I was bringing into question.

Again I am just posting for discussion and learning. I am no expert on the
subject - just a self-taught access hack - if you call using these groups
and a dozen Access support sites being self-taught

Tony V
 
M

Mike Sherrill

[snippage]
I was really questioning the table structure. If one were to take the
original post to it's extreme, no table would ever have more then two fields
besides the record ID. Over-normalization seemed an appropriate sounding
term for the situation even if it has not generally accepted meaning -

"Normalization" has a pretty precise meaning in database design.
There's such a thing as 1NF, 2NF, and DKNF. There's no such thing as
"over NF". :)

The problem here is in hiding metadata--the name of an
attribute--within data.
Not sure what you mean here. Could you clarify?

Valid values for sex are "male", "female", and "I don't know" (I
guess). Let's say that valid values for age are simply "juvenile" and
"adult". In order to keep the data right, you'd have to build a
validation rule or a CHECK constraint to cover each case, along the
lines of "If column1 is 'sex' then column2 is either 'male' or
'female', but if column1 is 'age' then column2 is either 'juvenile' or
'adult', but if column1 is 'ectoparasite' then column2 is ..." It
might be impossible to express that as a validation rule (Access has a
limit on the length) or as a CHECK constraint (don't know the limits,
but I'm sure there are some).

OTOH, if the one *column* is "sex", then you can enforce integrity
constraints with a simple validation rule, CHECK constraint, or
foreign key constraint. Ditto for the other columns.
Again this may be true but it is specific to this application. I am guessing
the original poster can make judgments about where the data is tracked

Well, we both read the original post. Are you really that confident
in the OP's judgment? I'm not. (And that's an observation, not a
criticism. Not many of us got this stuff right the first time. I
know I certainly didn't.)
Instead of creating tables with numerous fields for different data the
posted wanted two fields - one to name the value and another to store the
value itself. This is what I was bringing into question.

And you were right to do that. It's *highly* questionable.
 
T

Tony Vrolyk

Thank for all your explanations. The idea of using the validation rules or
CEHCK constraints had not occured to me. I hav eonly used them a few times.
Also the whole subject of normalization had been a bit theoritical to me. I
think I understand the basics and in msot cases my dbs have been normalized
to a large degree but I am sure I have broken some rules in places. I wasn't
aware of the specific normalization types 1NF, 2NF and so on. I did a search
and found this article -
http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/1/ -
Assuming it is accuratly written than I am at least designing my dbs to
satisfy 3NF which I was happy to find out.

Thanks
Tony

Mike Sherrill said:
[snippage]
I was really questioning the table structure. If one were to take the
original post to it's extreme, no table would ever have more then two
fields
besides the record ID. Over-normalization seemed an appropriate sounding
term for the situation even if it has not generally accepted meaning -

"Normalization" has a pretty precise meaning in database design.
There's such a thing as 1NF, 2NF, and DKNF. There's no such thing as
"over NF". :)

The problem here is in hiding metadata--the name of an
attribute--within data.
Not sure what you mean here. Could you clarify?

Valid values for sex are "male", "female", and "I don't know" (I
guess). Let's say that valid values for age are simply "juvenile" and
"adult". In order to keep the data right, you'd have to build a
validation rule or a CHECK constraint to cover each case, along the
lines of "If column1 is 'sex' then column2 is either 'male' or
'female', but if column1 is 'age' then column2 is either 'juvenile' or
'adult', but if column1 is 'ectoparasite' then column2 is ..." It
might be impossible to express that as a validation rule (Access has a
limit on the length) or as a CHECK constraint (don't know the limits,
but I'm sure there are some).

OTOH, if the one *column* is "sex", then you can enforce integrity
constraints with a simple validation rule, CHECK constraint, or
foreign key constraint. Ditto for the other columns.
Again this may be true but it is specific to this application. I am
guessing
the original poster can make judgments about where the data is tracked

Well, we both read the original post. Are you really that confident
in the OP's judgment? I'm not. (And that's an observation, not a
criticism. Not many of us got this stuff right the first time. I
know I certainly didn't.)
Instead of creating tables with numerous fields for different data the
posted wanted two fields - one to name the value and another to store the
value itself. This is what I was bringing into question.

And you were right to do that. It's *highly* questionable.
 
M

Mike Sherrill

I
think I understand the basics and in msot cases my dbs have been normalized
to a large degree

A nitpick--normalization applies to tables, not to databases.
I wasn't
aware of the specific normalization types 1NF, 2NF and so on. I did a search
and found this article -
http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/1/ -
Assuming it is accuratly written than I am at least designing my dbs to
satisfy 3NF which I was happy to find out.

"The Three Normal Forms" is a dead giveaway. There are at least
seven, and I think Chris Date has recently written about another, but
I could be wrong about that.

There's a *lot* of misinformation about normalization and database
design on the web. I don't think you can do better than Date's
_Introduction to Database Systems_, and it's easy to do a lot worse.
You can probably find a used copy pretty cheap, but get the seventh
edition, at the very least.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top