Dana:
In addition to the points Allen has made, you also have some
redundancy in your
new table by including both City and State columns. Knowing the city
means you know the state, so the latter is redundant in this table.
This is not merely a case of inefficiency; more importantly it leaves
the door open to inconsistent data as the same city could be entered
in different sates in separate rows.
What you should have is a numeric CityID column I the main addresses
table. City names can be duplicated, so you need to be able to
distinguish them. You should then have a Cities table with columns
CityID, City and StateID, and a States table with columns StateID and
State. The primary key CityID and StateID columns in Cities and
States can conveniently be autonumbers, but the corresponding foreign
key CityID and StateID in the main address table and Cities must be
straightforward long integer number data types.
Of course the same reasoning in fact means that CityID itself is
redundant in the main table as strictly its determined by Street (or
rather by a numeric StreetID as street names are also duplicated).
Most people would probably not normalize a table to that degree
however.
Area and zip codes are, I think, a different matter. Certainly here
in the UK the former and the first (outgoing) part of the latter do
not determine City as they can span city and even county boundaries (a
full post code is very specific on the other hand and its common here
for databases to identity a complete address solely by the post code
and house number/name). I imagine that it would be possible to create
tables to model the many-to-many relationships between area codes and
cities, and between zip codes and cities, but I don't know if this is
often done.
Another point to consider is whether you might need to record somebody
by State but not by City. This unlikely I'd have thought, but it
illustrates a principle which applies when creating such things as an
international database where each country might not have a regional
structure (e.g. Monaco, Liechtenstein etc) so the 'region' link
between Cities and Counties is broken. What's done is to include
rows in cities with a value such as 'N/A', one row per country. In
your case there'd be one 'N/A' or 'Unknown' row per state in Cities.
This enables the chin between the main addresses table and the States
table to be maintained even if the city is unknown, but the state is.
You might find the following helpful. It’s a set of queries and
explanatory notes taken from a demo file I produced some time ago to
show how imported data can be decomposed into normalized tables. The
demo uses a simple international contacts lists so there are
countries, regions and cities involved in the addresses. Also each
contact can have more than one employer. You'll see that the basic
principle is first to populate the 'referenced' tables (those on the
one side of a one-to-many relationship) and then the 'referencing'
tables (those on the may side). The latter can include tables which
model many-to-many relationships (sometimes called 'junction' tables
or similar).
1. The query below inserts rows into the Countries table with unique
values of the Country column imported from Excel into the table
MasterTable. This is necessary before rows can be inserted into the
Regions table as that table references the Countries table in a many-
to-one relationship.
INSERT INTO Countries (Country)
SELECT DISTINCT Country
FROM MasterTable;
2. Having inserted rows into the Countries table rows can now be
inserted into the Regions table with the query below. This joins the
MasterTable to the newly filled Countries table on the Country columns
and inserts unique values from the Region column of the MasterTable
and the CountryID column of the Countries table into the Regions
table.
INSERT INTO Regions (Region, CountryID)
SELECT DISTINCT Region, CountryID
FROM MasterTable INNER JOIN Countries
ON MasterTable.Country=Countries.Country;
3. Having inserted rows into the Regions table rows can now be
inserted into the Cities table with the query below. This joins the
MasterTable to the newly filled Regions table on the Region columns.
The Countries table is joined to the MasterTable on the Country
columns and to the Regions table on the CountryID columns, thus taking
account of any regions of the same name in different countries. The
query inserts unique values from the City column of the MasterTable
and the RegionID column of the Regions table into the Cities table.
INSERT INTO Cities (City, RegionID)
SELECT DISTINCT MasterTable.City, Regions.RegionID
FROM Countries INNER JOIN (MasterTable INNER JOIN Regions
ON MasterTable.Region=Regions.Region)
ON (MasterTable.Country=Countries.Country)
AND (Countries.CountryID=Regions.CountryID);
4. The previous queries inserted rows into the Countries, Regions and
Cities tables. Following the insertion of data into the last of these,
Cities, it is now possible to insert rows into the Contacts table as
this only needs to reference the Cities table, the relevant Region and
Country being referenced via the relationships between these three
tables. The query below does this by joining the MasterTable to both
the Cities table, on the City columns, and to the Regions table, on
the Region columns. The Cities table is also joined to Regions on
RegionID and the Countries table is joined to the MasterTable on
Country and the Regions table on Country ID. This is to take account
of the possibility of two cities having the same name, but being in
different regions, which themselves could theoretically have the
same name but be in different countries, so that the correct CityID
value is inserted into Contacts.
5. For simplicity it is assumed that contacts at the same address
have unique names. This might not always be the case, particularly
with commercial premises (the developer of this demo once worked with
two Maggie Taylors in the same building!). In such cases, however,
there is likely to be some distinguishing value such as Job Title or
Department which could be used.
INSERT INTO Contacts ( FirstName, LastName, Address, CityID )
SELECT DISTINCT MasterTable.FirstName, MasterTable.LastName,
MasterTable.Address, Cities.CityID
FROM Countries INNER JOIN ((MasterTable INNER JOIN Cities ON
MasterTable.City = Cities.City)
INNER JOIN Regions ON (Regions.RegionID = Cities.RegionID)
AND (MasterTable.Region = Regions.Region))
ON (Countries.CountryID = Regions.CountryID)
AND (Countries.Country = MasterTable.Country);
6. The query below inserts rows into the Employers table with unique
values of the Employer column imported from Excel into the table
MasterTable. This is necessary before rows can be inserted into the
ContactEmployers table as that table references the Employers table.
INSERT INTO Employers (Employer)
SELECT DISTINCT Employer
FROM MasterTable;
7. Having inserted rows into the Contacts and Employers table it is
now possible to insert rows into the ContactEmployers table which
models the many-to-many relationship between Contacts and Employers.
The query below does this by joining the MasterTable to Contacts on
the Address, LastName and Firstname columns and to the Employers table
on the Employer columns. The ContactID values from Contacts and
EmployerID values from Employers are inserted into the two columns of
ContactEmployers.
INSERT INTO ContactEmployers (ContactID, EmployerID)
SELECT Contacts.ContactID, Employers.EmployerID
FROM (Contacts INNER JOIN MasterTable
ON (Contacts.Address=MasterTable.Address)
AND (Contacts.LastName=MasterTable.LastName)
AND (Contacts.FirstName=MasterTable.FirstName))
INNER JOIN Employers ON MasterTable.Employer=Employers.Employer;
The above does assume that the imported data is completely consistent,
which might not be the case. If you'd like a copy of the demo itself,
which is made up of an Excel file from which the data is imported and
the Access file which decomposes it using the above queries mail me
at:
kenwsheridan<at>yahoo<dot>co<dot>uk
Ken Sheridan
Stafford, England
I have been out of the game for some time. I am importing a text file into a
database that then needs to be fixed up. I need all the records whether they
have addresses or not. Some will have P.O. Boxes instead of street
addresses. Others will have only phone numbers. I want to fix up the data
by combining fields, moving the pobox data to a new field, and finally
creating a new table with the fixed up data.
This message includes current table fields, desired table fields and tests
that need to be performed. At the very end I've listed as far as I got with
the SQL. Please help!
Freshly imported table has fields:
LastName
FirstName
Number (just the address number)
Direction (like S for South, N for North, etc.)
Street (actual street name)
Address2 (contains apartment number, suite number, P.O. Box, etc.)
City
State
Zip
AreaCode
Phone
The new table has the following fields:
Name (firstname + lastname)
Address (number + direction (only if null) + street OR P.O. Box # moved here)
Address2 (contains apartment number, suite number, - do not show P.O. Box
here)
City
State
Zip
Phone (areacode + phone)
SELECT [FirstName] & " " & [LastName] AS Name, [Number] & " " & [Direction]
& " " & [Street] AS Address
FROM TestImports
WHERE [Direction] Is Not Null
Forever Indebted,
...Dana