Compare two strings

A

andy62

I am wondering if there is any trick for identifying how much of a match two
text strings are? In my case, the two strings would start the same, but then
differ somewhere in the middle or the end. If I could identify the character
position where the difference occurs that would help me - and then maybe
divide that number by the length in characters of the first string to get a
percentage.

TIA
 
D

David Billigmeier

This one will work... it's a long one. It will output the position of the
last 2 characters that match. It assumes the "first" string is in A1 and the
"second" is in A2. To finally get the percentage value divide the whole
thing by LEN(A1). Also, make sure to commit it with CTRL+SHIFT+ENTER:

=MAX(ROW(INDIRECT("1:"&MAX(LEN(A1),LEN(A2))))*(MID(A1,ROW(INDIRECT("1:"&MAX(LEN(A1),LEN(A2)))),1)=MID(A2,ROW(INDIRECT("1:"&MAX(LEN(A1),LEN(A2)))),1)))
 
P

Pete_UK

Here's a User-Defined function which will do the job:

Function test_string(first As String, second As String) As Integer
Dim first_len As Integer
Dim second_len As Integer
Dim i As Integer
test_string = 0
first_len = Len(first)
second_len = Len(second)
If (first_len = 0 Or second_len = 0) Then test_string = -1: Exit
Function
For i = 1 To Application.WorksheetFunction.Min(first_len, second_len)
If Mid(first, i, 1) = Mid(second, i, 1) Then
test_string = i
Else
Exit Function
End If
Next i
End Function

Just copy/paste it into a normal module.

Use it like the following:

=test_string(A1,B1)

where you have strings to be tested in A1 and B1. It will return the
number of consecutive matching characters as an integer, or -1 if one
of the cells is empty.

Hope this helps.

Pete
 
A

andy62

Hmmm . . . maybe I am doing something wrong, but this formula seems to find
the last 2 characters that match regardless of whether any of the preceeding
ones match. In one of the ones I tested, two text strings both had an "s" in
position 87, although none of the preceeding words or characters matched. I
need to determine the position where every preceeding character matched, but
the remaining ones do not. Sorry if I am doing something wrong, or if my
original explanation was unclear.
 
T

Toppers

David,
On my testing:

Incorrect match: doesn't pick up incorrect "a" in sentance

This is a very long sentence to test for match
This is a very long sentance to test for match

Correct match: finds missing "s" in sentence

This is a very long entence to test for match
This is a very long sentance to test for match

... and here too...

This is a very long entence to test for match
This is a very long sentence to test for match


and it outputs postion of last character that matches (20 in correct match
above but 46 in incorrect one).

Comparing ...

abcdfg

axcdef

returns an answer of 4 rather than 1.

Over to you!
 
D

David Billigmeier

Yep you are right, my formula should have included something to stop checking
after it finds an incorrect match. Although my definition of the formula was
correct: "It will output the position of the last 2 characters that match,"
it will only fulfill the subject's request if the 2 strings are complete
opposites after the first non-match. Sorry folks!

You should use Pete's UDF instead.
 
P

Pete_UK

Hi Toppers and David,

thanks for your support!

I was in the middle of posting my reply when I realised my (original)
UDF would work in the same way as you have now reported that David's
formula does - I added the:

Else
Exit Function
End If

at the end of the For .. Next loop to prevent this, just before
posting.

Let's hope the OP tries it for himself.

Pete
 
R

Ron Rosenfeld

I am wondering if there is any trick for identifying how much of a match two
text strings are? In my case, the two strings would start the same, but then
differ somewhere in the middle or the end. If I could identify the character
position where the difference occurs that would help me - and then maybe
divide that number by the length in characters of the first string to get a
percentage.

TIA

The following **array-entered** formula will return the number of the first
character that does not match.

To **array-enter**, after typing or pasting the formula, hold down
<ctrl><shift> while hitting <enter>. Excel will place braces {...} around the
formula:

=MATCH(FALSE,LEFT(A1,ROW(INDIRECT("1:"&LEN(A1))))=LEFT(A2,ROW(INDIRECT("1:"&LEN(A2)))),0)

To obtain the "percent" as you describe, merely divide that result by the
length of the first string:

=MATCH(FALSE,LEFT(A1,ROW(INDIRECT("1:"&LEN(A1))))=LEFT(A2,ROW(INDIRECT("1:"&LEN(A2)))),0)/LEN(A1)

The formulas are NOT case sensitive.

For a case-sensitive formula, use:

=MATCH(FALSE,EXACT(LEFT(A1,ROW(INDIRECT("1:"&LEN(A1)))),
LEFT(A2,ROW(INDIRECT("1:"&LEN(A2))))),0)

The formulas may give an NA result if A1 is shorter than A2.

There are a variety of ways of handling the problem depending on the results
you want:

=MATCH(FALSE,LEFT(A1,ROW(INDIRECT("1:"&MAX(LEN(A1:A2)))))=LEFT(A2,ROW(INDIRECT("1:"&LEN(A2)))),0)

Of course, when you divide that value by the length of the first string, you
will get a value slightly greater than 1, so if you choose to do this, you will
have to decide how you want to compute the percentage match.

Perhaps:

=MATCH(FALSE,LEFT(A1,ROW(INDIRECT("1:"&MAX(LEN(A1:A2)))))
=LEFT(A2,ROW(INDIRECT("1:"&LEN(A2)))),0)/MAX(LEN(A1:A2))

to divide by the longest of A1:A2.


--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top