Count Unique Text Values

S

Scott Halper

I am trying to write a formula that will count the unique values in an
array.

I have been able to get this far...

Month Branch Item
Jan A 1
Jan B 2
Feb C 1
Feb C 2
Feb A 1

What I need is the unique occurances of the Branch for each different
Item.

This is what I have written so far:
=COUNT(1/FREQUENCY(IF((B1:B5=C1),MATCH(A1:A5,A1:A5,0)),ROW(1:5)))

I am just not sure if you can next another If Statement and/or how you
would do that with this type of Array.

Thanks for the help.
 
K

KL

try something like this:

if there are no empty cells in column B, then
=SUMPRODUCT(--(FREQUENCY(MATCH(B1:B5,B1:B5,0),MATCH(B1:B5,B1:B5,0)*(A1:A5="Jan")*(C1:C5=2))>0))

if there are empty cells in column B and they need to be counted, then
=SUMPRODUCT(--(FREQUENCY(MATCH(B1:B5&"",B1:B5&"",0),MATCH(B1:B5&"",B1:B5&"",0)*(A1:A5="Jan")*(C1:C5=2))>0))

if there are empty cells in column B and they do not need to be counted, then
=SUMPRODUCT(--(FREQUENCY(MATCH(B1:B5&"",B1:B5&"",0),MATCH(B1:B5&"",B1:B5&"",0)*(A1:A5="Jan")*(B1:B5<>"")*(C1:C5=2))>0))

Attention, the formula may slow down you worksheet's recalc as it is calculation-intensive

--
KL
[MVP - Microsoft Excel]
RU: http://www.mvps.ru/Program/Default.aspx
ES: http://mvp.support.microsoft.com/?LN=es-es
EN: http://mvp.support.microsoft.com/?LN=en-us
Profile: https://mvp.support.microsoft.com/profile=AB32F8EE-8ED3-4AB9-ADDA-9E6C73C09A36
 
D

Domenic

Assuming that A2:C6 contains the data, let E2 and E3 contain 1 and 2,
then try the following formula that needs to be confirmed with
CONTROL+SHIFT+ENTER...

F2, copied down:

=COUNT(1/FREQUENCY(IF($C$2:$C$6=E2,IF($B$2:$B$6<>"",MATCH("~"&$B$2:$B$6,$
B$2:$B$6&"",0))),ROW($B$2:$B$6)-ROW($B$2)+1))

Hope this helps!
 
B

Bernd

Hello,

I suggest to apply my UDF lfreq to your combined cells Item & Branch:

Introduce a helper column D (Item_Branch):
Enter into D2
=C1&" "&B1
and copy down.

Then select a sufficiently big area anywhere and array-enter
=lfreq(D2:D999)

The function lfreq you can get from http://www.sulprobil.com/html/listfreq.html.

Regards,
Bernd
 
S

Scott Halper

Everyone,

I appreciate all your responses, however let me try to explain it with
some more descriptive information.

Month Item Branch
Jan A 1
Jan B 2
Feb A 1
Feb A 2
Feb C 1

What I'm looking for is a formula that will tell me in the month of
February Item A was sold in 2 unique branches (with branches have
1,000 of choices). Thanks for you help again.

Scott
 
D

Domenic

Let E2 contain Jan, and F2 contain A, then try the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER...

=SUM(IF(FREQUENCY(IF($A$2:$A$6=E2,IF($B$2:$B$6=F2,IF($C$2:$C$6<>"",MATCH(
"~"&$C$2:$C$6,$C$2:$C$6&"",0)))),ROW($C$2:$C$6)-ROW($C$2)+1),1))

Hope this helps!
 
H

Harlan Grove

KL said:
try something like this:

if there are no empty cells in column B, then

=SUMPRODUCT(--(FREQUENCY(MATCH(B1:B5,B1:B5,0),
MATCH(B1:B5,B1:B5,0)*(A1:A5="Jan")*(C1:C5=2))>0))
....

Even using the OP's data,

Month Branch Item
Jan A 1
Jan B 2
Feb C 1
Feb C 2
Feb A 1

your wonderful formula returns 2 even though only the second row
matches the (A1:A5="Jan")*(C1:C5=2) criteria. Can you figure out why?
Did you bother to test? Rhetorical - obviously not.
 
H

Harlan Grove

Domenic said:
Let E2 contain Jan, and F2 contain A, then try the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER...

=SUM(IF(FREQUENCY(IF($A$2:$A$6=E2,IF($B$2:$B$6=F2,
IF($C$2:$C$6<>"",MATCH("~"&$C$2:$C$6,$C$2:$C$6&"",0)))),
ROW($C$2:$C$6)-ROW($C$2)+1),1))
....

There are shorter alternatives. Also an array formula,

=COUNT(1/(MATCH(""&$B$2:$B$6,IF(($A$2:$A$6=E2)*($C$2:$C$6=F2),
""&$B$2:$B$6,0),0)=ROW($B$2:$B$6)-MIN(ROW($B$2:$B$6))+1))
 
K

KL

...Even using the OP's data,...
your wonderful formula returns 2 even though only the second row
matches the (A1:A5="Jan")*(C1:C5=2) criteria.

Yup, you're right. Thanks for jumping in.
Can you figure out why?

Of course I can :) It is the extra element (the count of values superior of the max interval).
Did you bother to test? Rhetorical - obviously not.

I did, but with my own randomly generated data based on the OP's pattern. It so happened that in my sample the max value was
preceded by a 0 in the second argument something like this:

FREQUENCY({1;2;3;3;1},{0;2;0;3;0}) -> {0;3;0;2;0;0}

....so when it worked after a couple of random changes I wrongly concluded that the formula was good.

Did you bother to consider this possibility before firing out your "verdict"? Rhetorical - obviously not, but thanks anyway ;-)

--
KL
[MVP - Microsoft Excel]
RU: http://www.mvps.ru/Program/Default.aspx
ES: http://mvp.support.microsoft.com/?LN=es-es
EN: http://mvp.support.microsoft.com/?LN=en-us
Profile: https://mvp.support.microsoft.com/profile=AB32F8EE-8ED3-4AB9-ADDA-9E6C73C09A36
 
D

Domenic

Thanks Harlan! As usual, very interesting. However, I noticed that the
COUNT formula is very inefficient. After an informal test where I
increased the lower boundary for the range from Row 6 to Row 20000,
here's what I found...

COUNT formula -----> approximately 113 seconds to calculate

FREQUENCY formula -----> approximately 3 seconds to calculate

Although, I should mention that I'm using a Mac version of Excel. I
don't know if this makes a difference.
 
H

Harlan Grove

Domenic said:
Thanks Harlan! As usual, very interesting. However, I noticed
that the COUNT formula is very inefficient. After an informal
test where I increased the lower boundary for the range from Row 6
to Row 20000, here's what I found...

COUNT formula -----> approximately 113 seconds to calculate

FREQUENCY formula -----> approximately 3 seconds to calculate

Although, I should mention that I'm using a Mac version of Excel.
I don't know if this makes a difference.
....

It does seem to make a difference. 20 iterations of your formula under
Windows XP and Excel 2003 on my laptop takes about 5 seconds, while 20
iterations of my formula takes about 16 seconds. 3:1 rather than 30:1
difference.

The constraint isn't the COUNT function, it's the array expression in
MATCH's 2nd argument in my formula. Looks like FREQUENCY is best, but
could still wrap COUNT around it, so

=COUNT(1/FREQUENCY(IF(($A$2:$A$20000=E2)*($C$2:$C$20000=F2),
MATCH(""&$B$2:$B$20000,""&$B$2:$B$20000,0)),
ROW($B$2:$B$20000)-MIN(ROW($B$2:$B$20000))+1))

rather than

=SUM(IF(FREQUENCY(IF($A$2:$A$20000=E2,IF($C$2:$C$20000=F2,
IF($B$2:$B$20000<>"",MATCH("~"&$B$2:$B$20000,$B$2:$B$20000&"",
0)))),ROW($B$2:$B$20000)-ROW($B$2)+1),1))

For 100 iterations, yours takes 25 seconds while my COUNT-FREQUENCY
formula takes 21 seconds.
 
H

Harlan Grove

KL said:
Did you bother to consider this possibility before firing out your
"verdict"? . . .

Your formula failing with the OP's own sample data could be construed
as untested. I did take into account light testing with randomly
generated values leading to fortuitously correct results. Happens all
the time.
 
D

Domenic

Harlan Grove said:
It does seem to make a difference. 20 iterations of your formula under
Windows XP and Excel 2003 on my laptop takes about 5 seconds, while 20
iterations of my formula takes about 16 seconds. 3:1 rather than 30:1
difference.

Wow! There's a big difference between Windows and Mac versions of
Excel. Very disappointing...
The constraint isn't the COUNT function, it's the array expression in
MATCH's 2nd argument in my formula. Looks like FREQUENCY is best, but
could still wrap COUNT around it, so

=COUNT(1/FREQUENCY(IF(($A$2:$A$20000=E2)*($C$2:$C$20000=F2),
MATCH(""&$B$2:$B$20000,""&$B$2:$B$20000,0)),
ROW($B$2:$B$20000)-MIN(ROW($B$2:$B$20000))+1))

rather than

=SUM(IF(FREQUENCY(IF($A$2:$A$20000=E2,IF($C$2:$C$20000=F2,
IF($B$2:$B$20000<>"",MATCH("~"&$B$2:$B$20000,$B$2:$B$20000&"",
0)))),ROW($B$2:$B$20000)-ROW($B$2)+1),1))

For 100 iterations, yours takes 25 seconds while my COUNT-FREQUENCY
formula takes 21 seconds.

Here I'm surprised. I thought 1/Array would be more expensive.

Thanks Harlan! Much appreciated!
 
P

pshepard

Hi Scott,

Assuming that your data is in the range A1:C6...

If you concatenate Month and Branch in Column D, by copying the following
formula from this post into D2:

=A2&","&B2

copy the formula from D2, and paste into D3:D6

Enter copy the following array formula from this post into E1:

=IF(ROWS($1:1)<=SUM(1/COUNTIF(D$2:D$6,D$2:D$6)),INDEX(D$2:D$6,SMALL(IF(ROW(D$2:D$6)-MIN(ROW(D$2:D$6))+1=MATCH(D$2:D$6,D$2:D$6,0),ROW(D$2:D$6)-MIN(ROW(D$2:D$6))+1),ROWS($1:1))),"")

When you enter the formula into the cell, press Ctrl and Shift while
pressing Enter, and Excel will place curly brackets {} around the formula.
Copy the formula from E1 and paste into E2 through E4. Note that this formula
cannot have any blank cells in the D2:D6 range or it will produce a division
error.

Then copy the formula below from this post and paste into F1:

=COUNT(1/FREQUENCY(IF($D$2:$D$6=E1,IF($C$2:$C$6<>"",MATCH("~"&$C$2:$C$6,$C$2:$C$6&"",0))),ROW($C$2:$C$6)-ROW($C$2)+1))

When you enter the formula into the cell, press Ctrl and Shift while
pressing Enter, and Excel will place curly brackets {} around the formula.
Copy the formula from F1 and paste into F2 through F4.
 
P

pshepard

Hi Scott,

Domenic's post worked for me.

Here is a way to get a list of unique Item values, assuming the range for
Item is C2:C6. Copy the formula down a column of cells until you get a blank.

=IF(ROWS($1:1)<=SUM(1/COUNTIF(C$2:C$6,C$2:C$6)),INDEX(C$2:C$6,SMALL(IF(ROW(C$2:C$6)-MIN(ROW(C$2:C$6))+1=MATCH(C$2:C$6,C$2:C$6,0),ROW(C$2:C$6)-MIN(ROW(C$2:C$6))+1),ROWS($1:1))),"")
 
A

Aladin Akyurek

...

It does seem to make a difference. 20 iterations of your formula under
Windows XP and Excel 2003 on my laptop takes about 5 seconds, while 20
iterations of my formula takes about 16 seconds. 3:1 rather than 30:1
difference.

The constraint isn't the COUNT function, it's the array expression in
MATCH's 2nd argument in my formula. Looks like FREQUENCY is best, but
could still wrap COUNT around it, so

=COUNT(1/FREQUENCY(IF(($A$2:$A$20000=E2)*($C$2:$C$20000=F2),
MATCH(""&$B$2:$B$20000,""&$B$2:$B$20000,0)),
ROW($B$2:$B$20000)-MIN(ROW($B$2:$B$20000))+1))

rather than

=SUM(IF(FREQUENCY(IF($A$2:$A$20000=E2,IF($C$2:$C$20000=F2,
IF($B$2:$B$20000<>"",MATCH("~"&$B$2:$B$20000,$B$2:$B$20000&"",
0)))),ROW($B$2:$B$20000)-ROW($B$2)+1),1))

For 100 iterations, yours takes 25 seconds while my COUNT-FREQUENCY
formula takes 21 seconds.

Adding the ~ bit to the COUNT formula seems to make both formula
equally fast, as measured with FastExcel. BTW, I did not test the
effects of * vs IF and MIN.
 
D

Domenic

Adding the ~ bit to the COUNT formula seems to make both formula
equally fast, as measured with FastExcel.

Very interesting. I didn't think appending "" vs "~" would make much
difference, but apparently it does.
BTW, I did not test the
effects of * vs IF and MIN.

So I don't think we can draw any concrete conclusions, since we're not
comparing apples to apples.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top