Vague match/match partial information

F

frank

I have two tables and each of table has several fields. The first table
contains information about news stories (date, time, headline etc). The
second table contains NYSE companies (company name and stock symbol).

What I want to do is to match the 1st table's news headline with 2nd table's
company name. I created a query with creteria like:

[1999]![Content] Like [NYSE Company Names]![Field4]
and
[1999]![Content] = [TSX Company Names]![Field4]

Both criteria return me a very small portion of the actual matched cases
because it matchs the exactly words for both field. In reality, Company name
often contains Inc, Corp, and LLC but these words are often omitted in news
headlines.

The other alternative I tried is from a previous post in this msg board. I
create a query with both field, under the news headline, i set criteria as:
Like "*" & [NYSE Company]![Company Name] & "*"

and under the company name, I set criteria as:
Like "*" & [News]![Headlines] & "*"

This seemed to be work in other's case but does not particularly work out in
my case. I get very few matched cases than it is supposed to.

Anyone know how to do a partially match? For example, Headline field says:
IBM donates 100 Million to abc fund" and Company name says:"IBM corporation".
The criteria will allow me to match the two fields as long as they have
something in common. In this case, both field have "IBM" and they should
count as a match in my definition. I know I could do this by using wildcard
"like" and reset key words one by one but the problem is it is way too much
work given that i have half million of news headlines and 4000 companies in
each table.

thanks in advance.
 
J

John Vinson

Anyone know how to do a partially match? For example, Headline field says:
IBM donates 100 Million to abc fund" and Company name says:"IBM corporation".
The criteria will allow me to match the two fields as long as they have
something in common. In this case, both field have "IBM" and they should
count as a match in my definition. I know I could do this by using wildcard
"like" and reset key words one by one but the problem is it is way too much
work given that i have half million of news headlines and 4000 companies in
each table.

Computers are very literal minded. What you're asking for requires
intelligence, probably beyond the capability of the program.

What if the headline said "Big Blue donates..."? You or I would know
that means IBM... but there's no match.

What if the headline said "New Cribmaker enters children's furniture
business"? That contains IBM.....^^^

The text strings you cite do NOT contain one another. You could (with
some VBA code) parse the company name into words, and use a criterion
of

HEADLINE LIKE "*" & strKeyword & "*"

but searching for "IBM Corporation" would find all headlines
containing "Corporation"... i.e. many many false hits!

I think you'll need either a commercial Textbase program (which can do
things like keyword extraction, common-word elimination, Key Word In
Context (KWIC) indexing, etc. etc. - or some very sophisticated VBA
code. A simple Access query is not going to be able to solve your need
I fear!

John W. Vinson[MVP]
 
F

frank

thanks for your reply John. The news headline database that I used is a
strict PR news database. It is not a commentary news source (such as Forbes,
BusinessWeek) but a news service for company public relation purpose. I am
pretty sure each news headline will have at least part of company's name
(e.g. it is more like IBM than IBM Corp). I've tried your suggestion
"HEADLINE LIKE "*" & strKeyword & "*"" as well, it works but it only matchs
the headline with exact company name (ie, IBM Corp. instead of IBM). Do you
know anyway to solove it? I am working on the company name list to remove the
words such as Inc. or Corp. As for the other case, a headline contains key
words IBM like "New Cribmaker enters children's furniture business", I have
no way to filter that out but leave it as it is. After all, I am doing a
vague match and a 75% + accurancy will do.
 
J

John Vinson

but it only matchs the headline with exact company name (ie, IBM Corp. instead of IBM).

Exactly. That's what I was trying to say. Computers are very literal
minded. If you search for "IBM Corp." that's what will match.

My suggestion (conveyed very briefly and too vaguely) was that you
should write some unspecified and perhaps complex VBA code to read the
company name string and *extract* the keywords from it, and search for
them. That is, you'ld have a function which could parse the string
"IBM Corp." and return "IBM", but not return "Corp.".

How you would handle cases like "Hewlett Packard" = "HP", "General
Motors" = "GM" I don't know. You may want to concentrate on creating a
table of CompanyNames related one to many to the Companies table,
containing all the searchable names for the headlines. This will
almost certainly need to be done manually for the most part.

John W. Vinson[MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top