Top 10 Text Values

K

KarenH

I have a column of text values in which I need to display the ten most
frequently occurring. I tried setting the AutoFilter and choosing "Top 10",
but it doesn't seem to be doing anything. I checked into some functions that
I thought might work, like Frequency and Rank, but those seem to be for
numbers only. Is there a way I can accomplish this? Thanks.
 
D

Domenic

Assuming that A2:A100 contains your text values, try the following which
will take into consideration ties for 10th place...

B2, copied down:

=IF((A2<>"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($A$2:$A$100,A2),"")

C2, copied down:

=IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2)-1,"")

D1: 10

This indicates that you want a Top 10 list. You can change this as
necessary. For example, if you want a Top 5 list, enter 5 instead.

E1:

=MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)),C2:C100))-D1

....confirmed with CONTROL+SHIFT+ENTER, not just ENTER.

F2, copied down:

=IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100,MATCH(ROW()-ROW($F$2)+
1,$C$2:$C$100,0)),"")

Hope this helps!
 
P

Peo Sjoblom

Here's a pretty easy way of doing it, assume the values are in A1:A100 with
a header in A1, sel;ect A1:A100, do data>filer>advanced filter, select copy
to another location and where you want the filtered range, select unique
records only and click OK. This will give you a distinct list with the text
values, assume you put them in H1, in the first adjacent cell (in my example
I2) put

=COUNTIF($A$2:$A$100,H2)

copy down as long as needed, now select both columns (H and I) and sort
descending by column I, the first 10 values in H will be the top 10


--

Regards,

Peo Sjoblom
 
K

KarenH

Excellent, that works great! Thanks!

Domenic said:
Assuming that A2:A100 contains your text values, try the following which
will take into consideration ties for 10th place...

B2, copied down:

=IF((A2<>"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($A$2:$A$100,A2),"")

C2, copied down:

=IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2)-1,"")

D1: 10

This indicates that you want a Top 10 list. You can change this as
necessary. For example, if you want a Top 5 list, enter 5 instead.

E1:

=MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)),C2:C100))-D1

....confirmed with CONTROL+SHIFT+ENTER, not just ENTER.

F2, copied down:

=IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100,MATCH(ROW()-ROW($F$2)+
1,$C$2:$C$100,0)),"")

Hope this helps!
 
K

KarenH

What does the CONTROL+SHIFT+ENTER do that ENTER doesnt? I notice it puts
little curly brackets around the formula -- and that if I just enter, the
values don't come out right -- but I'm wondering what exactly that
combination does. Thanks again!
 
D

Domenic

CONTROL+SHIFT+ENTER is used when dealing with array formulas. For a
detailed explanation, have a look in Excel's help menu under 'Array
Formula'.
 
H

Harlan Grove

Domenic wrote...
Assuming that A2:A100 contains your text values, try the following which
will take into consideration ties for 10th place...

B2, copied down:

=IF((A2<>"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($A$2:$A$100,A2),"")

C2, copied down:

=IF(N(B2),RANK(B2,$B$2:$B$100)+COUNTIF($B$2:B2,B2)-1,"")

D1: 10

This indicates that you want a Top 10 list. You can change this as
necessary. For example, if you want a Top 5 list, enter 5 instead.

E1:
=MAX(IF(B2:B100=INDEX(B2:B100,MATCH(D1,C2:C100,0)),C2:C100))-D1

...confirmed with CONTROL+SHIFT+ENTER, not just ENTER.

F2, copied down:
=IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$100,
MATCH(ROW()-ROW($F$2)+1,$C$2:$C$100,0)),"")
....

Ancillary cells aren't necessary.

B2 [array formula]:
=INDEX($A$2:$A$100,MODE(MATCH($A$2:$A$100,$A$2:$A$100,0)))

B3 [array formula]:
=INDEX($A$2:$A$100,MODE(IF(COUNTIF(B$2:B2,$A$2:$A$100)=0,
MATCH($A$2:$A$100,$A$2:$A$100,0))))

Fill B3 down into B4:B11.

Ancillary cells can make this more efficient, but only additional one
column suffices.

B2:
=COUNTIF(A2:A100,A2)+1

B3:
=IF(COUNTIF(A$2:A2,A3)=0,COUNTIF(A3:A$100,A3)
+ROWS(B3:B$100)/ROWS($A$2:$A$100))

Fill B3 down into B4:B100.

C2:
=INDEX($A$2:$A$100,MATCH(MAX($B$2:$B$100),$B$2:$B$100,0))

C3 [array formula]:
=INDEX($A$2:$A$100,MATCH(MAX(IF($B$2:$B$100<VLOOKUP(C2,$A$2:$B$100,2,0),
$B$2:$B$100)),$B$2:$B$100,0))

Fill C3 down into C4:C11.
 
D

Domenic

A couple of issues...

1) It doesn't take into consideration ties for 10th place.

2) Owing to the MODE function, #N/A is returned when values occurring
more than once have been exhausted and only values occurring once remain.

Nevertheless, very interesting Harlan!
 
H

Harlan Grove

Domenic wrote...
A couple of issues...

1) It doesn't take into consideration ties for 10th place.

Meaning all tied values should be shown?
2) Owing to the MODE function, #N/A is returned when values occurring
more than once have been exhausted and only values occurring once remain.
....

B1 [array formula]:
=INDEX($A$1:$A$500,MATCH(MAX(COUNTIF($A$1:$A$500,$A$1:$A$500)),
COUNTIF($A$1:$A$500,$A$1:$A$500),0))

B2 [array formula]:
=INDEX($A$1:$A$500,MATCH(MAX(IF(COUNTIF(B$1:B1,$A$1:$A$500)=0,
COUNTIF($A$1:$A$500,$A$1:$A$500))),COUNTIF($A$1:$A$500,
IF(COUNTIF(B$1:B1,$A$1:$A$500)=0,$A$1:$A$500)),0))

without including ties. If ties are an issue, then it's much better to
use two columns of formulas to produce the needed results.
 
D

Domenic

Harlan Grove said:
Domenic wrote...

Meaning all tied values should be shown?

Yes. I believe that any solution for a Top N list must show any and all
values tied for Nth place. Otherwise, which of the tied values should
be returned?

Personally, for a Top N list, I prefer the solution I offered. It takes
into consideration ties for Nth place, it doesn't use array formulas,
and is very efficient.

By the way, the solution I offered is courtesy of Aladin Akyurek.
 
H

Harlan Grove

Domenic wrote...
Yes. I believe that any solution for a Top N list must show any and all
values tied for Nth place. Otherwise, which of the tied values should
be returned?

Fair point.
Personally, for a Top N list, I prefer the solution I offered. It takes
into consideration ties for Nth place, it doesn't use array formulas,
and is very efficient.

By the way, the solution I offered is courtesy of Aladin Akyurek.

Source is irrelevant. Very little in these ngs is original.

OK, I hadn't considered ties, but your solution still uses one column
too many and gains no efficiency by doing so. Given the original data
in A2:A501,

B2:
=COUNTIF(A$2:A$501,A$2:A$501)+(ROWS(A2:A$501)-1)/ROWS(A$2:A$501)

B3:
=IF(ISNA(MATCH(A3,A$2:A2,0)),COUNTIF(A3:A$501,A3)
+(ROWS(A3:A$501)-1)/ROWS(A$2:A$501))

Fill B3 down into B4:B501. Compare my B3 formula to yours.

Yours (B3):
=IF((A3<>"")*(ISNA(MATCH(A3,$A$1:A2,0))),COUNTIF($A$2:$A$501,A3),"")

You're only including non-empty cells in col A. You're also matching
against a cell, A1, that isn't specified just so you can use the same
formula in B2 as in B3. There are arguments either way, but I tend to
avoid needing to leave some cells around my data ranges blank. Next, if
the ISNA call returns TRUE, then the current col A cell's value doesn't
appear in previous rows in col A, so there's NO POINT to including
those rows in the COUNTIF call unless your goal is an INEFFICIENT
solution. Again, the motivation seems to be formula uniformity, but if
partly and fully relative references are OK inside the MATCH call and
as the 2nd arg to COUNTIF, why not as the 1st arg to COUNTIF?

As for my (ROWS(x)-1)/ROWS(y) terms, they should be more or less
directly calculated from the references without having to access data.

Next, I'd enter the desired number of results in C1, then use a formula
in B1 to calculate the effective number of results.

B1:
=COUNTIF($B$2:$B$501,">="&INT(LARGE($B$2:$B$501,C1)))

This compares to your E1 formula

Yours (E1) [array formula]:
=MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)),C2:C501))-D1

The LARGE call may make this somewhat less efficient, but it's a
mystery why you subtract D1 in your E1 formula when all your col F
formulas add $D$1 and $E$1. Seems inefficient to me. Also note your use
of an array formula. One or two unavoidable array formulas OK, but many
not OK even if they can be filled into the needed range from a single
template formula?

My revised results would be in C2:C501.

C2:
=IF(B$1,INDEX(A$2:A$501,MATCH(MAX(B$2:B$501),B$2:B$501,0)),"")

C3 [array formula]:
=IF(ROWS(B$2:B2)<B$1,INDEX(A$2:A$501,
MATCH(MAX(IF(B$2:B$501<VLOOKUP(C2,A$2:B$501,2,0),B$2:B$501)),B$2:B$501,0)),
"")

Compare my C3 formula to your C3 and F3 formulas.

Yours (C3):
=IF(N(B2),RANK(B2,$B$2:$B$501)+COUNTIF($B$2:B2,B2)-1,"")

Yours (F3):
=IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$501,
MATCH(ROW()-ROW($F$2)+1,$C$2:$C$501,0)),"")

Your C3 COUNTIF call corresponds to my B3 (ROWS(x)-1)/ROWS(y) term.
It's a tie breaker. The difference is that your COUNTIF call has to
work with values in col B while my ratio or ROWS calls can be
calculated straight from the references without having to access range
data.

Your C3 RANK call correspond to my C3 MAX(IF(.<VLOOKUP(..),.)) term.
Mine is probably a bit slower in calculation speed, but it uses less
memory. You could tighten up your formulas a bit.

B2:
=IF((A2<>"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($A2:$A$501,A2),"")

E1 [array formula]:
=MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)),C2:C501))

F2:
=IF(ROWS(F$2:F2)<=$E$1,INDEX($A$2:$A$501,
MATCH(ROWS(F$2:F2),$C$2:$C$501,0)),"")
 
D

Domenic

Harlan Grove said:
OK, I hadn't considered ties, but your solution still uses one column
too many and gains no efficiency by doing so. Given the original data
in A2:A501,

B2:
=COUNTIF(A$2:A$501,A$2:A$501)+(ROWS(A2:A$501)-1)/ROWS(A$2:A$501)

I take it the following would suffice, right?

=COUNTIF(A2:A501,A2)+(ROWS(A2:A501)-1)/ROWS(A2:A501)
B3:
=IF(ISNA(MATCH(A3,A$2:A2,0)),COUNTIF(A3:A$501,A3)
+(ROWS(A3:A$501)-1)/ROWS(A$2:A$501))

Fill B3 down into B4:B501. Compare my B3 formula to yours.

Yours (B3):
=IF((A3<>"")*(ISNA(MATCH(A3,$A$1:A2,0))),COUNTIF($A$2:$A$501,A3),"")

You're only including non-empty cells in col A. You're also matching
against a cell, A1, that isn't specified just so you can use the same
formula in B2 as in B3. There are arguments either way, but I tend to
avoid needing to leave some cells around my data ranges blank.

Although, A1 doesn't necessarily need to be blank. It could contain a
header, which in all likelihood would be the case, and wouldn't affect
the result.
Next, if
the ISNA call returns TRUE, then the current col A cell's value doesn't
appear in previous rows in col A, so there's NO POINT to including
those rows in the COUNTIF call unless your goal is an INEFFICIENT
solution. Again, the motivation seems to be formula uniformity, but if
partly and fully relative references are OK inside the MATCH call and
as the 2nd arg to COUNTIF, why not as the 1st arg to COUNTIF?

Excellent point, Harlan!!!
As for my (ROWS(x)-1)/ROWS(y) terms, they should be more or less
directly calculated from the references without having to access data.

Next, I'd enter the desired number of results in C1, then use a formula
in B1 to calculate the effective number of results.

B1:
=COUNTIF($B$2:$B$501,">="&INT(LARGE($B$2:$B$501,C1)))

This compares to your E1 formula

Yours (E1) [array formula]:
=MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)),C2:C501))-D1

The LARGE call may make this somewhat less efficient, but it's a
mystery why you subtract D1 in your E1 formula when all your col F
formulas add $D$1 and $E$1. Seems inefficient to me.

Doing it this way enables a user to clearly see the number of tied
values that exist, indicated by E1. I suspect any difference in
efficiency is negligible. In the end, it probably comes down to a
matter of preference.
Also note your use
of an array formula. One or two unavoidable array formulas OK, but many
not OK even if they can be filled into the needed range from a single
template formula?

Here I'm somewhat confused. You seem to be arguing against your
solution. My solution only contains one array formula, whereas yours
contains array formulas in C3:C501.
My revised results would be in C2:C501.

C2:
=IF(B$1,INDEX(A$2:A$501,MATCH(MAX(B$2:B$501),B$2:B$501,0)),"")

C3 [array formula]:
=IF(ROWS(B$2:B2)<B$1,INDEX(A$2:A$501,
MATCH(MAX(IF(B$2:B$501<VLOOKUP(C2,A$2:B$501,2,0),B$2:B$501)),B$2:B$501,0)),
"")

Compare my C3 formula to your C3 and F3 formulas.

Yours (C3):
=IF(N(B2),RANK(B2,$B$2:$B$501)+COUNTIF($B$2:B2,B2)-1,"")

Yours (F3):
=IF(ROW()-ROW($F$2)+1<=$D$1+$E$1,INDEX($A$2:$A$501,
MATCH(ROW()-ROW($F$2)+1,$C$2:$C$501,0)),"")

Your C3 COUNTIF call corresponds to my B3 (ROWS(x)-1)/ROWS(y) term.
It's a tie breaker. The difference is that your COUNTIF call has to
work with values in col B while my ratio or ROWS calls can be
calculated straight from the references without having to access range
data.
Interesting!

Your C3 RANK call correspond to my C3 MAX(IF(.<VLOOKUP(..),.)) term.
Mine is probably a bit slower in calculation speed, but it uses less
memory. You could tighten up your formulas a bit.

B2:
=IF((A2<>"")*(ISNA(MATCH(A2,$A$1:A1,0))),COUNTIF($A2:$A$501,A2),"")

Most definitely!
E1 [array formula]:
=MAX(IF(B2:B501=INDEX(B2:B501,MATCH(D1,C2:C501,0)),C2:C501))

Here I'd probably stick with the original formula for the reason cited
earlier.
F2:
=IF(ROWS(F$2:F2)<=$E$1,INDEX($A$2:$A$501,
MATCH(ROWS(F$2:F2),$C$2:$C$501,0)),"")

Yeah, I usually favour...

ROWS(F$2:F2)

....over...

ROW()-ROW($F$2)+1
 
H

Harlan Grove

Domenic said:
I take it the following would suffice, right?

=COUNTIF(A2:A501,A2)+(ROWS(A2:A501)-1)/ROWS(A2:A501)
....

Oops. You're right.
Although, A1 doesn't necessarily need to be blank. It could contain a
header, which in all likelihood would be the case, and wouldn't affect
the result.
....

It would affect the result if it just happened to match any of the items
below it. Improbable, but the nastiest bugs arise from improbable
situations.
Here I'm somewhat confused. You seem to be arguing against your
solution. My solution only contains one array formula, whereas yours
contains array formulas in C3:C501.
....

I mentioned that because my interpretation of what you had written
previously implies that it was good to avoid using array formulas. Not
always. Array formulas sometimes make for more efficient solutions.

Your experience may differ from mine, but mine is that the more cells there
are, the more bugs can creep into spreadsheet models. There's an efficiency
trade-off. For this particular sort of task, if the data would change
infrequently, macros would be most efficient.
 
D

Domenic

Harlan Grove said:
I mentioned that because my interpretation of what you had written
previously implies that it was good to avoid using array formulas. Not
always. Array formulas sometimes make for more efficient solutions.

That's great! Thanks Harlan!
 
D

Domenic

Although, A1 doesn't necessarily need to be blank. It could contain a
header, which in all likelihood would be the case, and wouldn't affect
the result.
...

It would affect the result if it just happened to match any of the items
below it. Improbable, but the nastiest bugs arise from improbable
situations.[/QUOTE]

While I usually try to eliminate any risk, the risk here is so remote
that I don't see it as an issue.

Also, I carried out an informal test to compare the re-calculation speed
between your solution and mine. Here's what I did...

1) used separate workbooks for each solution

2) expanded the range to A2:A10000

3) copied the relevant formulas down to row 10,000

The results surprised me on two counts...

1) With regards to my solution alone, I was surprised with the time it
took to copy the formulas down to row 10,000 and the difference in
re-calculation speed (about 10 seconds).

2) With regards to the two solutions, I was surprised to see that there
didn't seem to be any difference in re-calculation speed. Both took
about 10 seconds.

Maybe the results would be different if the range was expanded to
A2:A65535 and formulas copied down as well.

Since there doesn't seem to be any real difference between the two
solutions, I think I'll stick with mine. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top