Finding the nearest match without reusing results

G

goofy11

I'm hoping someone can provide me with some vba to make quick work of my
task. I have 2 tables (call them table1 and table2) that are setup like:

Store State Sales
1 MO 2,120
7 TX 3265
12 CA 4565
37 TX 3375

Both tables have these same fields, but each table has different store
numbers (no duplicates between the tables).

I want to add a new column to table1. For each store in table1, I want to
return the store number from table2 that is the closest match to itself based
on State and Sales. To elaborate further, I want to find a store that is in
the same state, and is closest in sales to itself. For the sales parameter,
it doesn't matter how high or low it has to go.....it just needs to return
the closest match.

The last condition, is that a result can only be used once. So if store 7
found that store 325 was the closest match, and then later, store 37 also
found that 325 was the closest match for it, the code would have to look for
the next best match, and so on. Table 1 would then look like:

Store State Sales Table2 match
1 MO 2,120 56
7 TX 3265 325
12 CA 4565 68
37 TX 3375 652

Any help is appreciated.
 
D

David Hilberg

Hi Goofy11,

Since no-one has posted VBA yet, would you like a working formula? As
you requested, it requires an exact match for State, then returns the
store number representing the closest unused match for sales (whether
higher or lower).


=INDEX($A$9:$A$12,MATCH(
MIN(
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)
*(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))
+MAX($C$2:$C$5,$C$9:$C$12)
*(($B2<>$B$9:$B$12)
+(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1)))
),
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)
*(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1)),
0))


It's on several lines to show that some expressions are similar or the same.
It is an "array formula" that must be entered with Ctrl+Shift+Enter, not
just Enter, or you'll get #VALUE!
If a match cannot be made it returns #N/A.

In my test data,
Table1 = A2:C4
Table2 = B9:C12
Formula was entered in E2:E4.
In E1 (just above the formula) there must be a header (such as "Best
Unused Match").

I hope this is useful!

- David
 
D

David Hilberg

Correction, formula should be longer:

=INDEX($A$9:$A$12,MATCH(
MIN(
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)*
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+
MAX($C$2:$C$5,$C$9:$C$12)*
(($B2<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1)))
),
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)*
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+
MAX($C$2:$C$5,$C$9:$C$12)*
(($B2<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1))),
0))

This takes into account that a store in Table1 might have *exactly* the
same sales as a store from the same state in Table2.
Still enter with Ctrl+Shift+Enter.
Formula has form:

=INDEX(Stores2, MATCH( MIN( RemainingSalesDeltasAcrossCurrentState ),
RemainingSalesDeltasAcrossCurrentState, 0 ))

- David
 
D

David Hilberg

Correction #2. Formula should be longer still:

=IFERROR( INDEX($A$9:$A$12, MATCH(
MIN(
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)*
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+
999999999*(($B2<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1)))
),
($B2=$B$9:$B$12)*ABS($C2-$C$9:$C$12)*
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+
111111111*(($B2<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1))),
0)),
"No Match")


Notes
- The IFERROR function is newly available in Excel 2007. I don't see a
way around using it -- the alternative is a formula that is anyway too
large for Excel 2003.
- The two main blocks of 4 lines are the same except for the numbers
999999999 and 111111111. (Use any two #'s, not equal, that surpass the
largest Sales figure.)
- When all State candidates have been exhausted, "No Match" is displayed.
- See previous posts for other notes.


This formula works best of all -- I believe it covers all relevant cases
-- and will be my last effort, unless you have any questions.

- David
 
G

goofy11

WHOA!!!! That is a beast of a formula! I thought I was fairly handy with
array formula's and I use Index/Match quite a bit, but this thing is a
monter. You kind of lose me on all the MMULT and TRANSPOSE stuff, but I
thought I'd be able to replicate it. Unfortunately when I dropped it into my
workbook, it is finding no results. I have table1 on Sheet1, and table2 on
Sheet2. I structured it so I'm using the same columns as you are below, but
I will add one point of clarification: when I proposed my question, the
example I showed had column B containing states. In actuality, column B will
either be populated with "yes" or "no". I would think it would work the same
either way, but since I'm experiencing difficulty, I thought I'd mention that
in case there's something I'm missing.

I am using Excel 2007 (thank goodness). Here is what the formula looks like:

=IFERROR(INDEX(Sheet2!$A$2:$A$3082,
MATCH(MIN(($B2=Sheet2!$B$2:$B$3082)*ABS($C2-Sheet2!$C$2:$C$3082)*(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+999999999*(($B2<>Sheet2!$B$2:$B$3082)+(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1)))),($B2=Sheet2!$B$2:$B$3082)*ABS(Sheet2!$C$2:$C$3082)*(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+111111111*(($B2<>Sheet2!$B$2:$B$3082)+(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1))),0)),"No Match")

Do you see anything that looks wrong?
 
G

goofy11

David,

Does this formula require that both tables (ranges) be on the same
worksheet? I'm asking because I rearranged my data so that they were on the
same worksheet, arranged how you had set up your test data and.........IT
WORKED!!! I don't understand how it works, but it does. Anyway, thanks a
bunch. I now bow to the array formula master. :)
 
D

David Hilberg

Goofy11 - I took your points in order, but skip to the end for the typo
in the function.
Unfortunately when I dropped it into my
workbook, it is finding no results.

By "no results" I assume it is returning "No Match", which signifies an
error--either a "legitimate" lack of matches, or something else. Or is
it returning nothing at all? If nothing at all, perhaps it is not done
calculating.

I have table1 on Sheet1, and table2 on
Sheet2.

....shouldn't be a problem...
when I proposed my question, the
example I showed had column B containing states. In actuality, column B will
either be populated with "yes" or "no". I would think it would work the same
either way,

It should work the same, as long as the "yes" and "no" text in both
tables shares the same [non-]capitalization, [non-]leading/trailing
spaces, etc.
I am using Excel 2007 (thank goodness).
Since I don't have Excel 2007, I am trusting to Microsoft's description
of the IFERROR function, and my own tests without it. My tests work as
far as they go, but I can't trap errors, so that #N/A is displayed the
first time there is no match, which would be OK except that the lower
results rely on the results above, and will themselves propagate #N/A
the rest of the way down. You can see this by stripping ISERROR away, so
you have =INDEX ... 0)) .
Here is what the formula looks like:

=IFERROR(INDEX(Sheet2!$A$2:$A$3082,
MATCH(MIN(($B2=Sheet2!$B$2:$B$3082)*ABS($C2-Sheet2!$C$2:$C$3082)*(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+999999999*(($B2<>Sheet2!$B$2:$B$3082)+(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1)))),($B2=Sheet2!$B$2:$B$3082)*ABS(Sheet2!$C$2:$C$3082)*(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)=COUNTA($E$1:$E1))+111111111*(($B2<>Sheet2!$B$2:$B$3082)+(MMULT(--(Sheet2!$A$2:$A$3082<>TRANSPOSE($E$1:$E1)),ROW($E$1:$E1)^0)<>COUNTA($E$1:$E1))),0)),"No Match")

Do you see anything that looks wrong?

Yes. The second instance of the ABS function should be the same as the
first.
ABS($C2-Sheet2!$C$2:$C$3082)
and not:
ABS(Sheet2!$C$2:$C$3082)

Other than that, it looks good to me. I assume it was successfully
array-entering in E2, and that just above it is the necessary title (E1
mustn't be blank). But with over 3000 rows, I don't believe all of your
data can be handled by my formula, even after we get the first instance
working. The further down the formula is copied, the more work it has to
do -- creating huger and huger working arrays, 3082 x E-range. I should
have asked the size of your data set! Your instincts were right -- VBA
is the way to go for a permanent solution.

Still, I would be interested to know if you get the first instance
working, and how far down you can copy it without prohibitive
calculation times!

- David
 
D

David Hilberg

Goofy11,

When I test across 2 worksheets, I have no problems. But perhaps in your
rearranging, the omission in the 2nd ABS function was corrected (see
previous post)?

I'll take your bow & redirect it to Harlan Grove et al, from whom half
of these techniques were poached!

BTW, how far down can the formula be copied before calculation time
becomes prohibitive?

- David
 
G

goofy11

Luckily, I only had 49 stores that I needed to copy this down (there were
over 3,000 stores where it looked for a match). Copying this down 50 rows
was fast, just a couple of seconds).

To satisfy your curiousity I ran a test last night to see what happens with
more rows. I broke my tables up into similar sizes (1,564 and 1,566 stores
respectively). After dropping the formula down 1,566 rows, it took exactly 1
hour to finish calculating. There ended up being 310 stores with no matches.
I ran this on my Dell Latitude D620 laptop (Duo Core processors, 2GB RAM).

Thanks again for the help. Very cool formula........still don't know how it
works. :)
 
D

David Hilberg

Oh, only a couple of seconds for 49 formula instances--good to know.
An hour to calculate 1500 is a bit more cumbersome. Thanks for running
the test!

You're very welcome for the formula. It was an interesting challenge.

I noticed a simplification for the 4 sections that evaluate previously-
unused/used stores. Instead of:
MMULT ... <> ... = COUNTA($E$1:$E4)
MMULT ... <> ... <> COUNTA($E$1:$E4)
MMULT ... <> ... = COUNTA($E$1:$E4)
MMULT ... <> ... <> COUNTA($E$1:$E4)

One may use:
MMULT ... = ... = 0
MMULT ... = ... > 0
MMULT ... = ... = 0
MMULT ... = ... > 0

........still don't know how it works. :)

I'll leave my notes on the formula, in case you want to dissect.

[I'll assume you know the uses of not-equal-to (<>), exponentiation
(^), and arrays { }. And I'll assume you know, or could look up, how
comparing an array with a value generates an array of Trues and
Falses, that you can convert those to numbers by multiplying by
numbers, or must sometimes convert to 1's and 0's by doubly-negating
(--), that multiplying (*) two horizontal (or vertical) arrays of the
same length results in a horizontal (or vertical) array; but two
arrays of different lengths and same orientation can be multiplied (*)
or matrix-multiplied (MMULT) if the orientation of one is transposed
(the result being a rectangular array); that in using MMULT, the row
entries of matrix1 are multiplied by the column entries of matrix2 and
summed to form a single entry in the resulting array.]


*Current test formula, omitting the IsError function*

=INDEX($A$9:$A$12,MATCH(
MIN(
($B5=$B$9:$B$12)*ABS($C5-$C$9:$C$12)*
(MMULT(--($A$9:$A$12=TRANSPOSE($E$1:$E4)),ROW($E$1:$E4)^0)=0)+
999999999*(($B5<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12=TRANSPOSE($E$1:$E4)),ROW($E$1:$E4)^0)>0))
),
($B5=$B$9:$B$12)*ABS($C5-$C$9:$C$12)*
(MMULT(--($A$9:$A$12=TRANSPOSE($E$1:$E4)),ROW($E$1:$E4)^0)=0)+
111111111*(($B5<>$B$9:$B$12)+
(MMULT(--($A$9:$A$12=TRANSPOSE($E$1:$E4)),ROW($E$1:$E4)^0)>0)),
0))


*Formula*

IsError (Index( StoresInTable2, Match( Min(UnusedStore SalesDiffs
across CurrentState), (UnusedStore SalesDiffs across CurrentState'),
exactmatch)), "No Match")

*Errors*

IsError returns "No Match" for no match or for general snafu. Could
use IsNA to trap #N/A caused by no match, but not hide other errors
(e.g. #VALUE! caused by not array-entering formula). IsNA would have
to be nested as If( IsNA(formula), "No Match", formula), duplicating
the long Index formula.

*Arrays*

UnusedStore SalesDiffs across CurrentState:
1D array of Diffs interspersed with 999999999's in the first mention
and 111111111's in the second, but start w/Diffs and 0's:

( IsGoodState * AbsoluteDiff * IsUnusedStore)
Any diff with wrong State or with used Store is implicitly zeroed out.
($B5=$B$9:$B$12) * ABS($C5-$C$9:$C$12) * (MMULT ... = ... = 0)

Finally, Add the below array to the above to fill in zeroed diffs with
huge numbers, so MIN won't grab a filler zero instead of a sales diff.

+ HugeNumber*( IsBadState + IsUsedStore ):
1D array of 0's and huge numbers. 0's where the diffs are above, huge
numbers where the 0's are above.
+ 999999999 * (($B5<>$B$9:$B$12)+(MMULT ... = ... > 0))
or
+ 111111111 * (($B5<>$B$9:$B$12)+(MMULT ... = ... > 0))

HugeNumber:
E.g., 999999999 the first time, something different the next, e.g.,
111111111.
Each must be larger than the absolute value of the maximum Sales
figure over both tables.
Within MATCH, different HugeNumber from 1st array (inside MIN) to 2nd
array, so when no match should be available, i.e., MIN is a
HugeNumber, no match is found among the different HugeNumbers of the
2nd array.

Note: MIN Diff could still legitimately be zero, if sales1-sales2=0
for unused Store2 in a legit State. Such 0's are not replaced by huge
numbers.


*Evaluating Store list in Table2 as Unused/Used according to Previous
Formula Results in Col E*

IsUNusedStore: 1D array of True and False:
Compares the entries of the 1D array resulting from MMULT collapse to
0.
MMULT_result = 0
MMULT((EachStore <> StoresUsed), (1's list)) = 0
(MMULT(--($A$9:$A$12=TRANSPOSE($E$1:$E4)),ROW($E$1:$E4)^0) = 0)
Similarly for the T/F list IsUSEDStores, MMULT_result > 0.

(EachStore <> StoresUsed): 2D array of True and False
($A$9:$A$12<>TRANSPOSE($E$1:$E4))
Compare a long vertical list of all table2 store numbers (in A) with a
short horizontal list (need transpose) of the store numbers already
used (E), to form a 2D rectangular matrix. Each row has Trues and
Falses according to whether each store matches any already used. Up to
1 True per row.

List of 1's to collapse the 2D matrix above into a 1D list:
ROW($E$1:$E4)^0
{1;2;3;4}^0 = {1;1;1;1}
Required by MMULT.
Nice trick seen in a Harlan Grove post.


Okay, that is probably both more and less than you wanted to know, if
you wanted any explanation at all. :)

- David



Luckily, I only had 49 stores that I needed to copy this down (there were
over 3,000 stores where it looked for a match). Copying this down 50 rows
was fast, just a couple of seconds).

To satisfy your curiousity I ran a test last night to see what happens with
more rows. I broke my tables up into similar sizes (1,564 and 1,566 stores
respectively). After dropping the formula down 1,566 rows, it took exactly 1
hour to finish calculating. There ended up being 310 stores with no matches.
I ran this on my Dell Latitude D620 laptop (Duo Core processors, 2GB RAM).

Thanks again for the help. Very cool formula........still don't know how it
works. :)

..
..
..

<snipped messages>

..
..
..
 
B

Bernd P

Hello,

A VBA solution:
Store State Sales Closest Store Sales of closest
2 MO 2120 =closest_store(B2,C2,Sheet3!B2:B20,Sheet3!C2:C20,Sheet3!
A2:A19) =closest_store(B2,C2,Sheet3!B2:B20,Sheet3!C2:C20,Sheet3!
A2:A19)
(enter into D2:E2 as array formula with CTLR + SHIFT + ENTER)

Function closest_store(sMyState As String, _
dMySales As Double, _
rAllStates As Range, _
rAllSales As Range, _
rAllStores) As Variant
'Returns store (from rAllStores) and sales
'(from rAllSales) of that store which
'is in same state as sMyState and which has
'least diff to dMySales.
Dim dMin As Double
Dim dCurrDiff As Double
Dim i As Long
Dim vR(1 To 2) As Variant

vR(1) = ">>> No state matches <<<"
vR(2) = 0#
dMin = 1E+300 'Nothing found so far
For i = 1 To rAllStates.Count
If sMyState = rAllStates(i) Then
dCurrDiff = Abs(dMySales - rAllSales(i))
If dMin > dCurrDiff Then
vR(1) = rAllStores(i)
vR(2) = rAllSales(i)
dMin = dCurrDiff
End If
End If
Next i
closest_store = vR
End Function

Regards,
Bernd
 
B

Bernd P

Hello,

With two helper columns you can get a comparibly fast worksheet
solution:
Store State Sales Min Diff Index Closest Store Sales of closest
2 MO 2120 =MIN(IF(Sheet3!$B$2:$B$20=B2,ABS(Sheet3!$C$2:$C$20-C2)))
=MATCH(B2&","&D2,Sheet3!$B$2:$B$20&","&ABS(Sheet3!$C$2:$C$20-C2),)
=INDEX(Sheet3!$A$2:$A$20,E2) =INDEX(Sheet3!$C$2:$C$20,E2)

MIN and MATCH formulas have to be array-entered.

Regards,
Bernd
 
D

David Hilberg

Hello Bernd,

Glad to see someone else offering solutions! I think your work is not
quite done, however, if you want to take into account the OP's goal of
not repeating a match. Not sure whether easier to implement in your
formulas below or your combined formula/VBA solution.

- David
 
B

Bernd P

Hello David,

Thanks for this hint.

My sub shown below needs 1 minute for 1560 lookups in a table with
1560 stores.

Regards,
Bernd

Sub closest_store(rMyStates As Range, _
rMySales As Range, _
rAllStates As Range, _
rAllSales As Range, _
rAllStores As Range, _
rOutputStores As Range, _
rOutputSales As Range)
'Fills store names in rOutputStores (and sales figures
'in rOutputSales) after looking
'up for each state in rMyStates corresponding state
'in rAllStates with sales from rAllSales with
'least diff to rMySales.
Dim dMin As Double
Dim dCurrDiff As Double
Dim i As Long, j As Long, k As Long
Dim vR(1 To 2) As Variant
Dim collStores As New Collection
Dim CalcModus As Long
Dim UpdateModus As Long

CalcModus = Application.Calculation
Application.Calculation = xlCalculationManual
UpdateModus = Application.ScreenUpdating
Application.ScreenUpdating = False
On Error Resume Next 'Necessary for collection lookup
rOutputStores.ClearContents
rOutputSales.ClearContents
For i = 1 To rMyStates.Count
If i Mod 100 = 0 Then
Application.StatusBar = "Looking for closest store " _
& i & " of " & rMyStates.Count
End If
vR(1) = ">>> No state matches <<<"
vR(2) = 0#
dMin = 1E+300 'Nothing found so far
For j = 1 To rAllStates.Count
If rMyStates(i) = rAllStates(j) Then
Err.Clear
k = collStores("X" & rAllStores(j))
If Err.Number <> 0 Then
dCurrDiff = Abs(rMySales(i) - rAllSales(j))
If dMin > dCurrDiff Then
vR(1) = rAllStores(j)
vR(2) = rAllSales(j)
dMin = dCurrDiff
End If
End If
End If
Next j
rOutputStores(i) = vR(1)
rOutputSales(i) = vR(2)
collStores.Add i, "X" & vR(1)
Next i
Application.StatusBar = False
Application.Calculation = CalcModus
Application.ScreenUpdating = UpdateModus
End Sub

Sub test()
Call closest_store( _
Sheets("Sheet2").Range("B2:B1568"), _
Sheets("Sheet2").Range("C2:C1568"), _
Sheets("Sheet3").Range("B2:B1569"), _
Sheets("Sheet3").Range("C2:C1569"), _
Sheets("Sheet3").Range("A2:A1569"), _
Sheets("Sheet2").Range("D2:D1568"), _
Sheets("Sheet2").Range("E2:E1568"))
End Sub
 
D

David Hilberg

Stealing a great idea from Bernd to shrink the formula:

=IFERROR( INDEX( $A$9:$A$12, MATCH(
MIN(
IF(($B2=$B$9:$B$12)*(0=MMULT(--($A$9:$A$12=TRANSPOSE($F$1:$F1)), ROW($F
$1:$F1)^0)), ABS($C2-$C$9:$C$12))
),
IF(($B2=$B$9:$B$12)*(0=MMULT(--($A$9:$A$12=TRANSPOSE($F$1:$F1)), ROW($F
$1:$F1)^0)), ABS($C2-$C$9:$C$12)),
0)),
"no match")

The above array-entered in F2. (Tested with table2 in A9:C12 as
before).

Bernd's great idea is using IF to generate a mixed array of Diffs and
FALSEs. This is better than Diffs and 0's. Since MIN ignores Boolean
values that are enclosed in an array (this was new to me), FALSE is
not automatically converted to zero. No huge-number substitution is
then required.

- David
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top