Finding Similar entries in one column in another

D

Duguid1

Hi,
I am relatively new to using Excel with vast amounts of data and was after
any suggestions which could help me complete a problem that I have..

I have two columns with descriptions in (approximately 10 words). I need to
find descriptions from one column (Sheet 1,column C) which are similar to the
other column (Sheet 2,Column A).For example,if I could display the number of
words which are similar to both columns I could then find the ones with the
largest number of similar words.

For example, the two columns could look like...

Sheet1,column C....
Description
300g Fish fillets

Sheet2,column A
Description
Fish fingers 300g

This would then give the result that there were 2 similar words.

Any help would be much appreciated as to how to carry this out...
 
R

Recursive Error

Hi,
I am relatively new to using Excel with vast amounts of data and was after
any suggestions which could help me complete a problem that I have..

I have two columns with descriptions in (approximately 10 words). I need to
find descriptions fromonecolumn(Sheet 1,columnC) which aresimilarto the
othercolumn(Sheet 2,ColumnA).For example,if I could display the  numberof
words which aresimilarto both columns I could then find the ones with the
largest number ofsimilarwords.

For example, the two columns could look like...

Sheet1,columnC....
Description
300g Fish fillets

Sheet2,columnA
Description
Fish fingers 300g

This would then give the result that there were 2similarwords.

Any help would be much appreciated as to how to carry this out...

Solution:

Tokenize the word lists [using custom VBA -- see footnote -- or a
series of FIND functions] so you have one word per column.

Now, each row in list one needs a search result for each row in list
2.

This will return the hit count of words in List1 found in each row of
List2 (array formula)

{=SUM(COUNTIF(OFFSET(List2,J$1-1,0),($B2:$G2)))}

List2 is a named range that points to /only the first row/ of the
tokenized list of words from the second list.

The OFFSET piece increments which row is actually scanned. J$1 etc.
contains the row offset. 1 means scan the first row of List2. 2 means
scan the second row if List2, etc.

$B2:$G2 is the tokenized word list for List1.

Don't forget it's an array formula! (Ctrl+Shift+Enter).

Footnote : I can post the VBA code module and a workbook that
demonstrates how the formulas work on my website later today if you
like. The VBA code is not strictly necessary, but it makes tokenizing
the words much easier.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top