Comparing Columns of Text

P

potter.justin

Hi,

Being a relatively new commer to VBA, I am wondering if there is a
good way to go about comparing columns of text in Excel. Basically I
have a list of company names from one database and I am comparing it
to a list of company names from another database. My problem is that
the entry in one database will be slightly different from another.
For an example one of the lists might have: Frank's Welding Company,
but the other will have something like: Frank's Welding Company, LLC.
As a result the Vlookup command will not return very many results. Is
there an easier way to go about this than say checking letter by
letter while ignoring specifics like llc, the and others?

Furthermore to compound my problem, the list I am comparing it against
is some 4,000 entries long which means it will take forever to run
through the list entry by entry. I have played around a little bit
with Find, but I am not fimiliar enough with the command to use it
properly. In otherwords can I tell it to Find "Fr" at the start of an
entry and therfore skip to that section in the list?

Any help will be greatly appreciated.

Justin
 
J

Joel

I would fix the database. The simpliest way of finding the entries that
don't match is to put both lists in the same column and sort the list. Then
remove duplicate entries which will be on adjacent rows. Hopfully there will
be only a few mis-matches.
 
P

potter.justin

I would fix the database. The simpliest way of finding the entries that
don't match is to put both lists in the same column and sort the list. Then
remove duplicate entries which will be on adjacent rows. Hopfully there will
be only a few mis-matches.









- Show quoted text -

That does not help me as the smaller list is a different list
everytime, with both databases constantly being updated and changed

Justin
 
J

Joel

I did a search for comparing column and found one of my old postings.

http://www.microsoft.com/office/com...5298&catlist=&dglist=&ptlist=&exp=&sloc=en-us

Your problem has been addressed a number of times. There really isn't any
good algorithm for best matching two sets of strings. It a problem
scientists have addressed for a very long time.

To minimize the problem, the best thing to do is to try to keep common names
the same in both lists a periodically check to make sure there aren't the
same name in both lists but speeled differently.

Now what should you do? I don't know the answer. Some people like to come
up with a list of the just the people who aren't in both lists. Then
highlight these names in both lists. Others like to put the lists
alphabetically side by side leavig gap when the two lists don't match.

I don't know how many differences you have between the two lists and how big
each list is to determine the best method. I also don't know your
requirements. Do you want to merge the lists or keep them seperate? Do you
want to sort the list by names or some other criteria.

I also don't know what your plan is when you find names that a spelled
differrently in each list.

I have worked with list for almost thirty years where this problem keeps on
reoccuring. I work more with number where people have entered wrong numbers
and had to find these bad entries. It is painful to keep multiple databases
consistant when humans are entering tthe data incorrectly.

Come up with a plan, and I will assit as necessary. Learning how to use the
find command is the right thing to do. You can check each list against the
other and highlight the ones that don't match or do match. But you still wil
have to manually check the list. With 4000 entries, this takes time.
 
P

potter.justin

I did a search for comparing column and found one of my old postings.

http://www.microsoft.com/office/community/en-us/default.mspx?query=co...

Your problem has been addressed a number of times. There really isn't any
good algorithm for best matching two sets of strings. It a problem
scientists have addressed for a very long time.

To minimize the problem, the best thing to do is to try to keep common names
the same in both lists a periodically check to make sure there aren't the
same name in both lists but speeled differently.

Now what should you do? I don't know the answer. Some people like to come
up with a list of the just the people who aren't in both lists. Then
highlight these names in both lists. Others like to put the lists
alphabetically side by side leavig gap when the two lists don't match.

I don't know how many differences you have between the two lists and how big
each list is to determine the best method. I also don't know your
requirements. Do you want to merge the lists or keep them seperate? Do you
want to sort the list by names or some other criteria.

I also don't know what your plan is when you find names that a spelled
differrently in each list.

I have worked with list for almost thirty years where this problem keeps on
reoccuring. I work more with number where people have entered wrong numbers
and had to find these bad entries. It is painful to keep multiple databases
consistant when humans are entering tthe data incorrectly.

Come up with a plan, and I will assit as necessary. Learning how to use the
find command is the right thing to do. You can check each list against the
other and highlight the ones that don't match or do match. But you still wil
have to manually check the list. With 4000 entries, this takes time.






- Show quoted text -

I figured this might be the case, but I was hoping there was an easier
way. What I have is a list (bidder list ~100 entries) created by an
online database that I then need to check with an insurance list. The
insurance list being the list of 4,000 entries. So I am taking a
company name from the bidder list and looking at the insurance listing
to see what kind of insurance the company has, do they meet our
insurance requirements? do they need additional insurance? and so on.
I might be able to do the sort idea, and just tag those records from
the bidder list as such. I will play with it some this weekend. Just
out of curiousity what is the syntax for find/is there a good API for
Excel VBA?

Justin
 
J

Joel

You are only comparing 100 against 4000. Check the 100 againstt the 4000 and
hightlight in two colors which one is found and which is not found. Then
manually check the ones not found to see if there in the list under some
other spelling.

Fancy way would be to break names into words and check each word. If you
had "Microsoft Inc." and Microsoft and match would be found.

Take Micosoft INC and put each word in a different column. Check each word
and highlight. Then manually check the colors. You would only havve to look
at a list of 100 items which really isn't bad.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top