COMBINING 2 DATA TABLES FROM 2 DIFFERENT SOURCES

C

Coastal Studies

I spend a lot of time working with property data from different counties.
1. There are usually multiple sales of the same property (same parcel ID)
that I want to combine.
2. The MLS has additional info for the same address but does not contain the
parcel ID. I want to combine the info from the MLS to the info from the
property appraiser's office. Both sets of data have the following similarity:
a. the same site address
b. similar sale date (sometimes listed 3-7 days different)
c. similar sale amount (sometimes a few dollars to a few thousand off)

I update this info monthly and use it to make charts and comparisons. How
can I make it easier on myself by using Access?
 
V

Vincent Johns

Coastal said:
I spend a lot of time working with property data from different counties.
1. There are usually multiple sales of the same property (same parcel ID)
that I want to combine.
2. The MLS has additional info for the same address but does not contain the
parcel ID. I want to combine the info from the MLS to the info from the
property appraiser's office. Both sets of data have the following similarity:
a. the same site address
b. similar sale date (sometimes listed 3-7 days different)
c. similar sale amount (sometimes a few dollars to a few thousand off)

I update this info monthly and use it to make charts and comparisons. How
can I make it easier on myself by using Access?


I do something similar with respect to matching my personal
check-register records with my bank's statement records, which I store
in separate Tables. I display, in Query Datasheet View, the unmatched
items from both Tables, sorted similarly. Sometimes I sort by date,
sometimes by ascending amount of money, sometimes by descending amount
of money; I could sort by payee's name, but usually I don't do that. I
may have 50 or 60 unmatched items when I begin.

The payees' names almost never match exactly. The amount of money
usually matches, but not always (finding these is a major purpose of
balancing the checkbook). The date almost never matches, as it usually
takes 2-4 days for an item to clear the bank.

I make the bank's dataset be read only ("Snapshot" Query), but I allow
editing of my check-register dataset.

With both datasheets displayed, I select what I think are matching
records, one in each datasheet. I then run a Macro that displays a Form
showing in detail the contents of each record, with (almost) matching
fields displayed side by side. Command buttons on my Form allow me
either to accept the tentative match or to cancel the matching request.

If I accept the match, the program checks to be sure that the date
cleared is no earlier than the date issued, and that the amounts of
money are equal. If not, it displays a suitable message and does not
mark the records as matched, allowing me to edit the check-register
record. If the request is valid, the program sets a field in one of the
Tables identifying the matching record in the other Table. It then
refreshes both datasheets, showing only the records that remain unmatched.

After a while, only truly unmatched records remain, reflecting either
recent items that I have written that have not yet cleared the bank, or
items such as interest payments or service charges that the bank has
applied that I had not listed in my check register.

Although the process may sound a bit tedious as I have described it
here, most of the work that I did was in writing the code. Actually
using it is pretty easy, and it takes only a few minutes to link dozens
of not-too-closely matching records, a process which once was truly
tedious and time consuming (before I wrote the Access system).

Your MLS and appraser's lists can probably be compared and linked using
a system similar to mine. You might want to use slightly different
rules, perhaps, but the principle is similar. At the least, I've found
that minimizing all the windows in the database except for the two
datasheets gives me a good view of the records to be matched, and then
using a large Form that displays the fields side by side for a tentative
match lets me look at the matching records in detail to be sure I'm
correct, and you might want to do that, too.

If you want further details about how I implemented this, I'll be happy
to supply them, but I'm afraid my database wouldn't be directly usable
for what you're doing (it's oriented toward financial accounts). It's
also not suitable for public use, as I'd need to include a large Help
file, which I've never written.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
E

Ed Warren

I work as a data consultant for the local country. You pose an interesting
issue.
How do you access the county information, through a web interface or do you
have direct read access to their data files?

If you can get to some type of legal discription of the property in each
case you have a bit more to go on.
Range, township, section, quarter section, etc. and/or lot number 'xx'
subdivison 'yy'
Fields leading to a possible match are owner (name, address, telephone
number, legal description, lot size)

In our county 'parcel' information resides in: Assessor data (parcel id,
legal description, Owner's Address, land valuation, structure valuation,
etc), Treasurer data( parcel id, taxes assessed, taxes paid, taxpayer info
(not necessarily owner), GIS (parcel id, drawring references, pictures),
Sheriff Office: 911 call center:: (Occupant Name (not necessarily the
owner), telephone, street address)

So the short answer is:

For any automated solution you will need to write an procedure for matching
the parcels
that procedure WILL produce the following:
Matches that are valid
Matches that should be invalid
Non-Matches that should be matches.
The problem is: No amount of logic will solve the problem of identifying
which group a particular match really belongs to, so one can only state the
probabilty of a valid match or non-match, e.g. 90% probable that a match
will be valid and/or 15% chance a non-match is really a match. Then you, the
consumer of the 'information', need to assess the risk of making an error in
the matching.

Lots of luck

Ed Warren.
 
C

Coastal Studies

Ed,
Most of the property appraiser info for the counties I work with is
available through searches on their websites. The problems I deal with are:
1. County by county, the available data online varies.
2. The searching capabilites is sometimes extremely limited and...
3. It is always difficult to get the most recent data

My solution has been to call the PA's office, have them email me the
complete data file (usually in Excel) and then to make it conform to my needs.

I have become adept in Excel, but I have not learned VB. I know exactly what
I need the data to do, I just can't make it do it without lengthy
manipulation in Excel.

The information I seek, whether from the MLS (Multiple Listing Service) or
the PA is the following:

Sales Date
Sales Amt
Site Address
Parcel ID
Section, Range & Township (to verify that the parcels are located in the
precise area that I am studying, ie, aqdjacent to the coastline)
Sq Footage
*Frontage, Depth and Width (only available on the MLS-this is important when
looking at vacant lots and this is one of the main reasons I need to combine
the MLS data with the PA data.)
Unit number(-another piece of info left out on PA data, and only in MLS)

In Excel, I can sort by Site Address, Sales Date, then by Sales Price with
both lists (MLS & PA) and then I have to manually go down about 4000 or more
records and match them up. As I mentioned, the sales date and sales price
vary sometimes and are not exact, but there is enough info to match them up
manually in Excel.

Any suggestions? I do not have this imported into Access yet, and I have
someone locally who will build the forms, I just need to give him a better
idea of what I need and I'm desperate to get this done by next week! Thanks
for any help.

Suzette
 
C

Coastal Studies

Vincent,

Sorry about the multiple posts, I didn't know the "ground rules!" I'm here
to learn though, so thanks.

Regarding your post, I am comparing sometimes up to 12000+ records. It takes
me hours to do this in Excel and I am actually considered an Excel whiz...
not by your group I'm sure, but, I do not know how to write macros or to
query in Excel and I want to go to Access. My issue is that this is a a very
large, new client. I simply need the results of all my work so that I can
analyze the numbers and present them with some market demographics. It takes
80% of my time just to get the basic summary numbers. (PLEASE READ MY REPLY
POST TO ED)

If you can give me info to pass on to the guy who will be setting this up in
Access, I am going to be learning as I go. (I'm pretty quick, but in a week,
I'm not going to have time to learn and perform the rest of the study!)

Thanks again.
Suzette
 
E

Ed Warren

Actually you are a lot further along than you think you are.

Write your 'matching rules' into Psuedo code. Sort of like the following.

For County X
1. if Field1 matches MLS field2 then mark as a match" Strength = 10
2. For records not matched parse out Field2 into Sale Date Strength = 9
if Sale Date matches within 5 days and field6 contains any three
characers that match field 4 of MLS then mark as a match.
rule Strength = 8
rule
rule

For County Y
rule
rule
rule

Then you will be well on your way to being able to write your queries that
will create the matches automatically for you in Access.

In Access you will end up with a table something like

Match IDCounty ParcelID MlsListingID Strength of Match
1 X 9999111 12234 10
2 X 981245 13567 3

With this table you will be able to pull out anything you need from the
County data and tie it to the related MLS rows. Then sort by the strength
of your match. You may want a form to compare the automatic matches row by
row and/or create matches for the rows that were not matched.

Access is much better suited to do this that excel so you're in luck.

As you have stated you have someone locally to help, the 'code' above is a
way to 'talk' to someone who understands databases.

Ed Warren.
 
V

Vincent Johns

I mostly agree with Ed Warren -- if you can come up with rules like
what he suggested, that would be helpful.

Where I might differ just slightly is regarding his "much better
suited" comment. Yes, Access can do a great job of keeping track of
your information. But ...
- you have only a week
- you're already familiar with Excel
- with some care (make frequent backup copies!!!), Excel (as well as
Access) can help you organize your data

For example, what you might be able to do (and since you have lots of
records, this could involve trimming down your worksheets) is to display
your two lists side by side, sorted according to Ed's "strength of
match" values. You could add a column to each list that you can use for
recording matching records. For example, you could enter an "x" in that
column in each list to indicate "this is one of the matching records".
Then you could run a macro to replace each "x" with the record number of
the matching record in the other list, and then re-filter both lists to
hide those that you've already matched.

With sufficient experience, you might be able to get good results via
an automatic matching system, but I wouldn't want to predict that I'd
finish something like that in a week... maybe a couple of months, and it
still wouldn't be perfect. I don't want to sound discouraging, but
these records were created by people who didn't care that the contents
would match the other list, and you have no control over that process.
But you can take some of the drudgery out of it by displaying likely
matches close to each other on your screen and minimizing the physical
actions needed to record each match.

Access provides more safeguards than Excel against accidentally
changing stuff you want to leave untouched. (Excel does allow you to
protect part of your worksheet, and that may help.) If you use Excel,
be careful not to sort one column while leaving its neighbor unsorted --
that basically converts your whole worksheet to garbage, and it's a
"feature" that is not part of Access. So, if you have time, by all
means try to use Access. But if not, you may be able to do a lot of
what you want, at least for right now, in Excel, and convert to Access
as you have time.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 
E

Ed Warren

I agree with Vincent, and in the very short time allowed I probably would
go for the quick and dirty answer.
I would would import two tables into Access, build three forms (one for each
table and one for the key values of each table) to show the fields of
interest and hide those that are not of interest. I would use the Filter by
form and Sort functionality (tools on the access tool bar) to show the
records of current interest, then cut and paste to copy the key field of one
to the key field of the other (third form).

and/or

Keep everything in Excel and do what you've been doing.

Please keep in mind, you expert knowledge, gained over a period of time, is
a valuable asset.

Ed Warren.
 
C

Coastal Studies

Vincent,

Thanks. I actually had already done what you suggested about a week ago. I
added 2 columns, got down to only 4000 records. First I numbered all the
parcel ID records,then I sorted by site address, then date, then amt. Then I
marked the MLS records with the same corresponding number as it's parcel ID
counterpart (whcich was in the adjacent column), then I resorted by number.
That is where I was until I found out that this PA does not use the range,
section, township as the first 6 numbers in the ID. T had to call the PA's
office and they resent me the entire file AGAIN, with the additional columns
for each record with the Section, range & township.

NOW, I have to go back, replace the first set of data with the new data,
BEING SURE NOT TO LOSE THE WORK I'VE ALREADY DONE! I plan on sorting out just
the Parcel ID records (I usually hilight in a bright green or yellow so that
when sorted, the old list stands out) and then sorting by Parcel ID and at
that point, I'm not sure if I'll have to manually combine the upper & lower
row info that is missing (I'll have the same process with the MLS info
combining) or IF YOU HAVE ANOTHER SUGGESTION?

I do this stuff in hours, not months. I have gotten pretty good at it and
it usually takes about 20-30 hours to complete the entire study. I AM HOPING
TO CUT THAT IN 1/2 OR 1/3!!!

Suggestions?
 
C

Coastal Studies

PLEASE SEE MY LAST RESPONSE TO VINCENT.

Ed Warren said:
I agree with Vincent, and in the very short time allowed I probably would
go for the quick and dirty answer.
I would would import two tables into Access, build three forms (one for each
table and one for the key values of each table) to show the fields of
interest and hide those that are not of interest. I would use the Filter by
form and Sort functionality (tools on the access tool bar) to show the
records of current interest, then cut and paste to copy the key field of one
to the key field of the other (third form).

and/or

Keep everything in Excel and do what you've been doing.

Please keep in mind, you expert knowledge, gained over a period of time, is
a valuable asset.

Ed Warren.
 
E

Ed Warren

Sounds like where you want to be is to have a table something like the
following

Table:: MatchedRecords

CountyKey (parcelID) ==> MLSKey() in matching cases I also add a
'strength of match', then make a decision to look at any strength <= x
and/or remove/ignore any <= y

Once you get them matched you can pull whatever you want for either/both
tables for further analysis.

Again the issue is how to automate the knowledge you have aquired into an
algorithm that a database can understand.
Then build a form(s) to display the automated matches for human review in a
quick and efficient manner.

I expect you will have to have a different set of rules for each county.

Ed Warren.
 
V

Vincent Johns

Ooh! So a lot of your work is already done (for now)!

What I'm about to say may result from my limited knowledge of Excel, but
I'm not sure, in Excel, how to automatically link records from different
lists and update a linking field to show that they match. This is
trivially easy in Access, so maybe this is a good time to just import
your Excel lists (the old ones, that you've already mostly matched) into
Access Tables, and import the new, augmented list into another Access
Table. You'll then have information in these Tables that exactly
matches, for all but 400 records, and the last 400 you can handle
manually in a few hours, I expect.

(By "manually", I mean that you can display apparently related records
and mark those that you can identify as probably matching by setting
values in fields in the Forms displaying those records, as Ed Warren and
I suggested earlier.)

You might want to add a "Notes" field (Text or Memo data type) to one or
more of your imported Tables to record special difficulties that you
might encounter with a record, that might occasion a phone call to
resolve. Then when you have time you can track down the 20 or 30
records that just seemed to have nothing at all even close to matching.

Since you've gone through the (what I called manual) process before, you
know what's involved in the matching. If you'd like to post here some
(5 or 6) example sets of records and identify which ones match and maybe
why they do, if it's not obvious, we might be able to suggest ways to
get Access to do more of the work for you.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top