Fuzzy Search Against List Of Company Names?

PeteCresswell

I've got a client that does bond and equity trading on behalf of
various "funds".

Some of these funds are owned by groups that do not care to invest in
certain companies.

Each of those groups supplies an explicit list of companies that they
do not want to invest in.

The traders don't have to make judgement calls. All they have to do
is check to see if a company is on the list before buying into it.

But if there are a lot of lists and/or some lists are very long - and
not always alphabetically sequenced - it becomes a problem.

In addition, the name of the company that the trader wants to buy
might not be spelled/rendered quite the same as it might be on
somebody's list.

What they want is a quick/easy way to check the lists.

Something like:
--------------------------------------------------------
- Trader specifies which group they're buying for.

- Trader enters the name - or some fragment thereof - of the
company they're thinking about buying into.

- The application presents a list of forbidden companies -
hopefully less than a dozen - based on some sort of fuzzy
matching against the list.

- The trader eyeballs that short list to see if the company they're
about to trade is on it.
---------------------------------------------------------
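
For illustration, here is a minimal sketch of the shortlist step above, written in Python rather than Access and using only the standard library's difflib. The function names, the sample company names, the per-group dictionary, and the 0.6 cutoff are all assumptions made up for the example. A real list would need its own tuning.

```python
import difflib

def normalize(name):
    """Lower-case a name and replace punctuation with spaces."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())

def shortlist(query, forbidden, limit=12, cutoff=0.6):
    """Return up to `limit` forbidden names that resemble `query`, best first."""
    q = normalize(query)
    scored = []
    for name in forbidden:
        n = normalize(name)
        if q and q in n:
            score = 1.0  # the typed fragment appears verbatim in the name
        else:
            # similarity in the range 0..1 from difflib's sequence matcher
            score = difflib.SequenceMatcher(None, q, n).ratio()
        if score >= cutoff:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

# Hypothetical data: one forbidden list per group.
forbidden_by_group = {
    "Group A": ["Bank of America Corp", "BankAmerica Inc.", "Acme Tobacco"],
}

for hit in shortlist("Bank America", forbidden_by_group["Group A"]):
    print(hit)
```

Lowering the cutoff returns more candidates, which fits a screen where the trader makes the final call by eye.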

Before I run off and develop this as an MS Access application, I
thought I'd ask around to see:
----------------------------------------
- If I'd be re-inventing the wheel

- What the best approach would be if/when I actually do it.
----------------------------------------

Seems like such a common situation that I'd be surprised if there
weren't a number of canned solutions out there.

Anybody have some thoughts on this?
 
John W. Vinson

In addition, the name of the company that the trader wants to buy
might not be spelled/rendered quite the same as it might be on
somebody's list.

Can't they provide the ticker or the CUSIP? That will at least be unique and
unambiguous. I'd hate to try to manage company names if the input is freeform
uncontrolled text input... there are just too many duplicate or near-duplicate
names!

John W. Vinson [MVP]
 
UpRider

Each of those groups supplies an explicit list of companies that they
do not want to invest in.

Tell the dimwits to use the Committee on Uniform Securities Identification
Procedures (CUSIP) number - that's what it's for.

UpRider
 
(PeteCresswell)

Per John W. Vinson:
I'd hate to try to manage company names if the input is freeform
uncontrolled text input.

You and me..... ;-)

But that's what we've got.

Ticker was the first thing that came to my mind - and they've got
that.... but for some reason it doesn't work for them. I leaned
on one of their team a little bit, but didn't get anywhere.

I think I'll put ticker and CUSIP on the search screen anyhow -
as drop down lists.

Right now, it looks like we'll just concoct some SQL on-the-fly
depending on what the user types into the freeform text search
box.

Bank America ==> LIKE *BANK* OR LIKE *AMERICA* OR LIKE
*BANKAMERICA*

"Bank Of America"? Haven't thought it through yet... but we'll
need to do something with more than two words..... maybe parse
them out and then concatenate them in every possible order.

The desired result is to give them more, rather than fewer
possible hits.
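
As a rough sketch of that idea, the Python below builds the LIKE criteria from whatever the user types: one pattern per word, plus the words run together in every order. The field name CompanyName, the function names, and the four-word cap on permutations are assumptions for illustration. The `*` wildcard is the one Access uses in its default query mode.

```python
from itertools import permutations

def like_patterns(text, max_words=4):
    """One *WORD* pattern per word, plus every concatenation order of the words."""
    # Keep only letters and digits so quotes or wildcard characters typed by
    # the user cannot end up inside the generated SQL.
    words = ["".join(ch for ch in w if ch.isalnum()) for w in text.upper().split()]
    words = [w for w in words if w]
    patterns = ["*" + w + "*" for w in words]
    # Permutations grow factorially, so skip them for long inputs.
    if 1 < len(words) <= max_words:
        for combo in permutations(words):
            patterns.append("*" + "".join(combo) + "*")
    return list(dict.fromkeys(patterns))  # drop duplicates, keep order

def where_clause(text, field="CompanyName"):
    """Join the patterns with OR so that any one of them produces a hit."""
    return " OR ".join('[%s] LIKE "%s"' % (field, p) for p in like_patterns(text))

print(where_clause("Bank America"))
```

Because the patterns are joined with OR, adding patterns can only widen the result set, which matches the goal of erring toward more hits.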


CUSIP, I've found to be dicey in prior applications - especially
with bonds where apparently the same bonds (i.e. the same issuer
name) have different CUSIPs depending on various properties.
 
bsmith59

I'm not sure if this will work based on the use case you've provided,
but hopefully so. I had a similar problem recently with two huge
lists of customer names I needed to match. I created a function that
essentially walks backward through each name in a given set character
by character and stops when it hits a single match in the other
customer name set. If it gets to zero matches, it backs up to the
previous result set (i.e., if it went from two matches to zero, it goes
back to the two matches to present to the user).

I built it for bulk processing, but I think the same logic could be
applied to an on-the-fly lookup as well. Performance might depend on
where your data sits.

If this sounds helpful, let me know and I'll give you a copy. I'd have
to take out my data and make sure it's relatively usable for you, so
I don't want to go to the trouble unless it sounds like it would work.
And if there are others on this list who would like it, let me know and
I'll just post it for free. It was fun to create, and it doesn't have to
be for company names, of course... any two sets of strings would work.

Cheers,
Brandon
http://accesspro.blogspot.com
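
The function itself was not posted, so what follows is only one possible reading of the description, sketched in Python: use one more character of the name at each step, narrow the candidate set, stop at a single match, and fall back to the previous set if the next character would leave nothing. The name narrow_matches, the from_end switch (which end of the name the walk starts from), and the sample names are all assumptions.

```python
def narrow_matches(name, candidates, from_end=False):
    """Narrow `candidates` one character of `name` at a time.

    Stops at a single match. If a step would leave zero matches,
    returns the set from the step before.
    """
    target = name.strip().lower()
    if not target:
        return []
    remaining = list(candidates)
    previous = []  # nothing to back up to until the first character matches
    for k in range(1, len(target) + 1):
        if from_end:
            piece = target[-k:]  # walk backward from the end of the name
            narrowed = [c for c in remaining if c.strip().lower().endswith(piece)]
        else:
            piece = target[:k]   # walk forward from the start of the name
            narrowed = [c for c in remaining if c.strip().lower().startswith(piece)]
        if not narrowed:
            return previous      # back up to the last non-empty result set
        if len(narrowed) == 1:
            return narrowed      # single match: stop here
        previous = remaining = narrowed
    return remaining

names = ["Bank of America", "Bank of Montreal", "Barclays", "Acme"]
print(narrow_matches("Bank of Amerika", names))
print(narrow_matches("Bank of Nova Scotia", names))
```

Stopping at the first single match can return a name that only shares its first few characters with the input, so the result is best treated as a candidate for a person to confirm.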
 
