Company Name and Normalization

Carlos Alvarez

I maintain a database that collects thousands of
organization names submitted by individuals. When looking
for matching organization names, the challenge is the many
variations of a name, which a query usually fails to match.
Examples include "IBM" vs. "International Business
Machines", "UConn" vs. "University of Connecticut", and
"FDA" vs. "Food and Drug Administration".

A normalized table with a primary key would be ideal; a
user could then pick the organization they are affiliated
with from a combo box of organization names.
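One way to picture that layout is a table of canonical organizations keyed by an ID, plus a separate alias table in which many spellings point at the same ID. The C sketch below is a minimal illustration under stated assumptions: the names `Organization`, `Alias`, `lookup_org_id`, and `canonical_name` are hypothetical, the integer `org_id` stands in for whatever primary key is chosen (a Federal Tax ID would play the same role), and the hard-coded rows only echo the examples above. Matching here is exact apart from letter case.

```c
#include <ctype.h>
#include <stddef.h>

/* One row per real organization; org_id plays the role of the primary key.
   (Hypothetical layout: in a database these would be two separate tables.) */
typedef struct {
    int org_id;
    const char *canonical_name;
} Organization;

/* Many alias rows may point at the same org_id. */
typedef struct {
    const char *alias;
    int org_id;
} Alias;

static const Organization orgs[] = {
    {1, "International Business Machines"},
    {2, "University of Connecticut"},
    {3, "Food and Drug Administration"},
};

static const Alias aliases[] = {
    {"IBM", 1},  {"International Business Machines", 1},
    {"UConn", 2}, {"University of Connecticut", 2},
    {"FDA", 3},  {"Food and Drug Administration", 3},
};

/* Case-insensitive string equality using only the standard C library. */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns the org_id for a submitted name, or -1 if no alias matches. */
int lookup_org_id(const char *submitted) {
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (equals_ignore_case(aliases[i].alias, submitted))
            return aliases[i].org_id;
    }
    return -1;
}

/* Returns the canonical name for an org_id, or NULL if the ID is unknown. */
const char *canonical_name(int org_id) {
    for (size_t i = 0; i < sizeof orgs / sizeof orgs[0]; i++) {
        if (orgs[i].org_id == org_id)
            return orgs[i].canonical_name;
    }
    return NULL;
}
```

With these rows, `lookup_org_id("ibm")` returns 1 and `lookup_org_id("Acme")` returns -1. The combo box would list only the `canonical_name` values, while the alias table absorbs the variants people type.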

I know I can't be the first one to be dealing with this
issue, so I am looking to draw on experience from a pro here.

I have thought about looking for a subscription service
that would provide the organization name, address (city and
state would suffice), and Federal Tax ID Number (which would
serve as the primary key), with monthly updates, but I
really don't know whether that type of subscription exists.

Any advice on this?

Thanks,

Carlos

Jay Vinton

There are companies that can help with this. It's not cheap, but if your database is big enough, it may make sense. They will remove duplicates, fix up and verify addresses, add ZIP+4 codes, etc. Search for "database cleaning" or similar.

It may also make sense to roll your own. If your area of concern is limited, you can brainstorm all the likely ways to spell, misspell, or abbreviate common items (e.g., FDA, F.D.A., F D A) and run a nightly batch job that makes the corrections. You'll probably want to write something like this in C.
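A minimal sketch of that approach in C, assuming a hand-maintained list of brainstormed variants. It first reduces each submitted name to a key of upper-case letters and digits, so "FDA", "F.D.A." and "F D A" all collapse to "FDA", then looks the key up in the list. The functions `make_key`, `correct_name`, and `correct_all` and the correction entries are hypothetical illustrations, not part of any existing tool; names with no matching variant are left untouched.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Builds a comparison key from a name: keeps only letters and digits
   (ASCII in the default "C" locale) and upper-cases them, so "F.D.A.",
   "F D A" and "fda" all become "FDA". out must hold strlen(in) + 1 bytes. */
void make_key(const char *in, char *out) {
    while (*in) {
        unsigned char c = (unsigned char)*in++;
        if (isalnum(c))
            *out++ = (char)toupper(c);
    }
    *out = '\0';
}

/* Hypothetical hand-maintained correction list: each brainstormed variant,
   stored in key form, maps to the preferred spelling. */
typedef struct {
    const char *variant_key;
    const char *canonical;
} Correction;

static const Correction corrections[] = {
    {"FDA", "Food and Drug Administration"},
    {"FOODANDDRUGADMINISTRATION", "Food and Drug Administration"},
    {"IBM", "International Business Machines"},
    {"INTERNATIONALBUSINESSMACHINES", "International Business Machines"},
    {"UCONN", "University of Connecticut"},
    {"UNIVERSITYOFCONNECTICUT", "University of Connecticut"},
};

/* Returns the preferred spelling for one submitted name, or the name itself
   when no variant matches, so unknown organizations pass through unchanged. */
const char *correct_name(const char *submitted) {
    char key[256];
    if (strlen(submitted) >= sizeof key)
        return submitted; /* longer than this sketch's fixed buffer */
    make_key(submitted, key);
    for (size_t i = 0; i < sizeof corrections / sizeof corrections[0]; i++) {
        if (strcmp(corrections[i].variant_key, key) == 0)
            return corrections[i].canonical;
    }
    return submitted;
}

/* Batch pass, e.g. for a nightly job: replaces each name in the array with
   its preferred spelling and returns how many values actually changed. */
size_t correct_all(const char **names, size_t count) {
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        const char *fixed = correct_name(names[i]);
        if (strcmp(fixed, names[i]) != 0) {
            names[i] = fixed;
            changed++;
        }
    }
    return changed;
}
```

For example, running `correct_all` over `{"F.D.A.", "f d a", "IBM", "Acme Corp"}` rewrites the first three to their preferred spellings and reports 3 changes. Stripping punctuation and spacing only handles formatting variants; true abbreviations such as "UConn" still need their own entry in the list.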