Merge Records From Multiple Sources

S

Steve

I have a people table of about 25 fields. The table is initially created each
year from 5 different sources. The records from each source are appended to the
people table. Any person may be in one or multiple sources. For those in
multiple sources, the data for a person may not be complete in any source. For
example, a person is in three sources; source#2 has the SSN and source#3 has the
membership#. For those people who are in multiple sources, I need to merge the
records of the person into one record that contains all the data for that person
that is available in all the sources that contain that person. From the previous
example, I need to add the membership# from source#3 to the record with the SSN#
in source#2.

I'm looking for some suggestions on a startegy for merging records.

Thanks!

Steve
 
J

John Vinson

I have a people table of about 25 fields. The table is initially created each
year from 5 different sources. The records from each source are appended to the
people table. Any person may be in one or multiple sources. For those in
multiple sources, the data for a person may not be complete in any source. For
example, a person is in three sources; source#2 has the SSN and source#3 has the
membership#. For those people who are in multiple sources, I need to merge the
records of the person into one record that contains all the data for that person
that is available in all the sources that contain that person. From the previous
example, I need to add the membership# from source#3 to the record with the SSN#
in source#2.

I'm looking for some suggestions on a startegy for merging records.

Update queries on joined tables will do it. Create a Query joining the
two tables; put a criterion of IS NULL on the SSN in the table you
want to update with the SSN; and update to

[othertable].[SSN]

or similarly for the other fields.

The problem may be - this presumes that you have some field that you
can count on to uniquely identify a person's record in all five
tables! Do you have such a field? I would *not* recommend using names
to do this; you could have two people who happen to have the same name
(which Steve Jones do you mean, the janitor or the CFO?); or you could
have a name in one table as "Steve Jones" and in the other as "Stephen
Jones".
 
S

Steve

John,

Thanks for responding!

My example over-simplified the problem. Suppose a person is in 4 sources. The
record I want is:
FName, LName, SSN, MembershipNum, SchoolDistrict, Position, City, State, Zip

The available data in the sources are:
#1: FName, LName, SSN, City, State, Zip
#2: FName, LName, MembershipNum, City, State, Zip
#3: FName, LName, SchoolDistrict, City, State, Zip
#4: FName, LName, Position, City, State, Zip

Steve


John Vinson said:
I have a people table of about 25 fields. The table is initially created each
year from 5 different sources. The records from each source are appended to the
people table. Any person may be in one or multiple sources. For those in
multiple sources, the data for a person may not be complete in any source. For
example, a person is in three sources; source#2 has the SSN and source#3 has the
membership#. For those people who are in multiple sources, I need to merge the
records of the person into one record that contains all the data for that person
that is available in all the sources that contain that person. From the previous
example, I need to add the membership# from source#3 to the record with the SSN#
in source#2.

I'm looking for some suggestions on a startegy for merging records.

Update queries on joined tables will do it. Create a Query joining the
two tables; put a criterion of IS NULL on the SSN in the table you
want to update with the SSN; and update to

[othertable].[SSN]

or similarly for the other fields.

The problem may be - this presumes that you have some field that you
can count on to uniquely identify a person's record in all five
tables! Do you have such a field? I would *not* recommend using names
to do this; you could have two people who happen to have the same name
(which Steve Jones do you mean, the janitor or the CFO?); or you could
have a name in one table as "Steve Jones" and in the other as "Stephen
Jones".
 
J

John Vinson

John,

Thanks for responding!

My example over-simplified the problem. Suppose a person is in 4 sources. The
record I want is:
FName, LName, SSN, MembershipNum, SchoolDistrict, Position, City, State, Zip

Ok, let's call this MasterTable... what's its Primary Key?
MembershipNum? If it has none... it's not a table, it's a random
collection of data which cannot be updated.
The available data in the sources are:
#1: FName, LName, SSN, City, State, Zip
#2: FName, LName, MembershipNum, City, State, Zip
#3: FName, LName, SchoolDistrict, City, State, Zip
#4: FName, LName, Position, City, State, Zip

Ok. Some painful questions here:

Table1 has a record for Fred Smith, SSN 445-44-4444, in Poughkeepsie;
and also a record for Fred Smith, SSN 454-44-4444, also in
Poughkeepsie.

Are they the same person? Is either one the same person as the Fred
Smith, SSN unknown, in Schenectady in the master table? If so, which
one? Or maybe it's the Frederick Smythe, SSN unknown, in Poughkeepsie

You MUST - NO OPTION!!! - have some way of uniquely identifying which
record is which. Names *are not such a method*. They're not stable,
they're not unique, and they're not reliable.

ASSUMING... and it's a heck of a big assumption - that the names are
reliable in this case, what do you want to do if Table1 has one
city/state/zip and Table2 a different City/State/Zip, and MasterTable
hase these blank? Maybe the person moved; which address should Access
use?

ASSUMING... even bigger assumption... that every record has FName,
LName, City, State and ZIP in all five tables, reliably and
accurately, you can construct five update queries. Create a unique
Index on these five fields in MasterTable. Create a Query joining
MasterTable to Table1 on the five fields and update MasterTable.SSN to

[Table1].[SSN]

Run the query by clicking the ! icon.

If you'll be doing this kind of exercise in the future, you may want
to look into a technique called "Replication". It's not for the faint
of heart, but it's designed so that you can make five replicas of a
database on five laptops, have people indpendently update all five,
and synchronize the data back into one table. See
http://www.trigeminal.com and search for Michka's Replication FAQ
there, it's the definitive word on the subject.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top