Counting unique items based on date

D

DKS

I have a sheet, with each row containing one record. Column A contains a
date, and column B contains a department. The contents of the column A
(date) are either a simple date "dd/mm/yyyy" format or a timestamp e.g.
"dd/mm/yyyy hh:mm:ss". Duplicates are allowed in Column A, Column B, Column
A + Column B.

I needed (ideally) a worksheet function that would allow me to obtain the
following information:

Based on a START DATE, END DATE and DEPT NAME return to me the number of
times I have the DEPT NAME occuring in my data for between the START DATE and
END DATE (both dates included) but counting multiple rows for the same date
as only 1 record. Thus if I have 27 rows for the same department with date
July 15, then it should give me only the value 1.

The time-part in the timestamp format can be ignored. It exists because of
input data coming from various sources and some sources store time and some
do not store time.

How could I do the above with a worksheet function?

Many thanks.
 
R

Ron Coderre

Try something like this:

With this structure in A1:A16
Date Dept
1/1/2007 A
1/1/2007 A
1/1/2007 A
1/1/2007 B
1/1/2007 C
1/1/2007 C
1/1/2007 C
1/15/2007 A
1/15/2007 B
1/15/2007 C
1/15/2007 C
2/1/2007 A
2/1/2007 A
2/1/2007 B
2/1/2007 B

And...
E1: (StartDate eg: 01/01/2007)
F1: (EndDate eg: 01/31/2007)
G1: (DeptName eg: A)

Then this formula returns the number of unique dates in that range where the
DeptName matches the name in G1:
=SUMPRODUCT(--ISNUMBER(MATCH(ROW(INDEX(A:A,E1):INDEX(A:A,F1)),(B2:B20=G1)*(A2:A20>=E1)*(A2:A20<=F1)*A2:A20,0)))

Using the above example, the formula returns: 2
Dept A appears at least once with 1/1/2007 and 1/15/2007

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP
 
D

DKS

Hi Don, Ron

Thanks for your suggestions. I tried them but they did not work. I forgot
to mention that the data may not always be sorted by date. I hope that does
not affect your logic?

To give you extra info on what went wrong: the formula of Don gave me an
extremely high number. For example: I had data for one calendar year but the
result of Don's formula was more than 366. This is not possible because
there are only 366 unique dates in a year. The formula of Ron returned zero
as a result.

Thanks.
 
R

Ron Coderre

I tried scrambling the dates and DeptNames and I still get correct (to me)
results.

Try this:
Play with my posted example and see if it gives correct results under those
conditions.
-If yes
....Try putting some of your data into the example and test again.
-If No, then the data is the problem and you'll need to post some sample
data so we can see what you're dealing with?

Either way, please update us on your progress.

***********
Regards,
Ron

XL2003, WinXP
 
R

Ron Coderre

Not me, Don (I don't have a preference)....but DKS wants unique dates. :)

(BTW......Congratulations on the MVP award)

***********
Regards,
Ron

XL2003, WinXP
 
D

Don Guillett

I was saying that yours was the way to go and "Gracias Senor" for the kind
words.
 
T

Teethless mama

Try this:

=SUM(IF(FREQUENCY(IF((Dept="A")*(TEXT(Date,"mmmyy")="Jan07"),INT(Date)),INT(Date))>0,1))
 
D

DKS

Ron,

Your code worked. I had made a mistake whilst typing the formula you gave.
Instead of typing A:A as you have mentioned, I inadvertently typed A2:A25000
to signify the full range. And apparently with that it did not work.

But now I have corrected the formula and it seems to be working like charm.
I still have to test with time-stamps but I believe it should work.

thanks for your help.
 
D

DKS

First test with timestamp included in the date, and the date being one of the
extremes being tested and the formula does not work.

For example: in your test-data hereunder if I put only one record for dept A
for date March 15; and if I test for Jan 1st till Marc 15th for dept A, I get
one number less than the actual number.

Just FYI.
 
R

Ron Coderre

Wait a minute......timestamp?!?
You may not think so, but that was an important detail to leave out.

Here's why:

Excel treats dates as whole numbers and times as decimal fractions of a day.
15-MAR-2007 is 39,156 to Excel
Noon on 15-MAR-2007 is 39156.5

We've been matching dates. So, when you enter something like 03/15/2007
12:00:00, that value is greater than the tested end point of 03/15/2007.

To compensate for time values, try this formula:
=SUMPRODUCT(--ISNUMBER(MATCH(ROW(INDEX(A:A,E1):INDEX(A:A,F1)),INDEX(INT((B2:B20=G1)*(A2:A20>=E1)*(INT(A2:A20)<=F1)*A2:A20),0),0)))


Does that help?
***********
Regards,
Ron

XL2003, WinXP
 
R

Ron Coderre

A somewhat shorter version implements the FREQUENCY function. My only
reluctance in using it is that it can be a bit confusing to figure out.

With the actual data in A2:B20, Col_A contains dates, Col_B contains
DeptNames,
and...
E1: (StartDate)
F1: (EndDate)
G1: (a dept name)

Here are both versions:
NON-array formula
=SUMPRODUCT(--ISNUMBER(MATCH(ROW(INDEX(A:A,E1):INDEX(A:A,F1)),INDEX(INT((B2:B20=G1)*(A2:A20>=E1)*(INT(A2:A20)<=F1)*A2:A20),0),0)))

ARRAY formula (committed with Ctrl+Shift+Enter, instead of Enter)
=SUMPRODUCT(--ISNUMBER(MATCH(ROW(INDEX(A:A,E1):INDEX(A:A,F1)),INT((B2:B20=G1)*(A2:A20>=E1)*(INT(A2:A20)<=F1)*A2:A20),0)))

NON-array formula, using FREQUENCY
=SUMPRODUCT(--((B2:B21=G1)*(FREQUENCY((B2:B20=G1)*(A2:A20>=E1)*(INT(A2:A20)<=F1)*INT(A2:A20),INT(A2:A20))>0)))

Note the first part: (B2:B21=G1) extends 1 row below the actual data area to
accommodate the way the FREQUENCY function constructs its bins.

Does that help?
***********
Regards,
Ron

XL2003, WinXP
 
O

orchid11652

Hi - o.k. - I can't help asking because your formula is so close to
something I need. See my pasted data below:

Event CodePlex Service ID Dates Scheduled*
EART309 SHO SH2 6/7/2006, 1/5/2007
EART309 SHO SHO 3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006,
6/7/2006

EART309 WOM WOM 12/9/2006
EART312 SHO SH2 6/7/2006, 1/6/2007
EART312 SHO SHO 3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006,
3/31/2006, 1/6/2007

EART312 WOM WOM 12/12/2006


Some of the data is wrapped so I'll explain the format. There are 4 fields -
an Event, Plex, Service ID and Date. Above is the exact excel export of the
data from an auxiliary system. The dates are populated into a single excel
field, however, I can parse them to populate a single date per field if it
makes the formula easier. What I need is a formula that totals all the
unique dates for each Event Code/Plex combination. The challenge is that
there can be multiple rows with the same Event Code and Plex, each which may
have the same date(s) which may only be counted once. In the above sample
data, the formula should return a quantity of 8 for the combination of Event
Code/Plex "EART309/SHO" and 8 for "EART312/SHO". I thought maybe an Access DB
would work better since I need to download the date data and create ITD
reports on a regular basis by Event. Any advice on if a formula or Access
would be better would be appreciated as well.

Thanks.

Orchid11652
 
R

Ron Coderre

I think I came up with something that works......

With
your posted sample data in A1:D7 (I'll break it up to avoid Text Wrap):
A1:C7 contains:
Event CodePlex Service ID
EART309 SHO SH2
EART309 SHO SHO
EART309 WOM WOM
EART312 SHO SH2
EART312 SHO SHO
EART312 WOM WOM

D1:D7 contains:
Dates Scheduled*
6/7/2006, 1/5/2007
3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006, 6/7/2006
12/9/2006
6/7/2006, 1/6/2007
3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006, 3/31/2006, 1/6/2007
12/12/2006

AND....
E1: (StartDate....of the range of dates to be searched, eg 3/1/2006)
F1: (EndDate....of the range of dates to be searched, eg 12/31/2007)

Then....try this:

G1: (an event, eg EART309)
H1: (a codeplex, eg SHO)

This ARRAY FORMULA returns the count of unique dates within the start/end
range for the Event/CodePlex combination in G1:H1
I1:
=SUM(--NOT(ISNA(MATCH("*"&TEXT(ROW(INDEX($A:$A,$E$1):INDEX($A:$A,$F$1)),"m/d/yyyy")&"*",IF($A$2:$A$7&$B$2:$B$7=G1&H1,$D$2:$D$7),0))))

Note: For array formulas, [Ctrl] [Shift] and press [Enter], instead of just
pressing [Enter].

With sample tests, these are the results:
EART309, SHO.........Returns 8
EART309, WOM........Returns 1
EART312, SHO.........Returns 8

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP
 
O

orchid11652

Thanks! I will try it and let you know.

Ron Coderre said:
I think I came up with something that works......

With
your posted sample data in A1:D7 (I'll break it up to avoid Text Wrap):
A1:C7 contains:
Event CodePlex Service ID
EART309 SHO SH2
EART309 SHO SHO
EART309 WOM WOM
EART312 SHO SH2
EART312 SHO SHO
EART312 WOM WOM

D1:D7 contains:
Dates Scheduled*
6/7/2006, 1/5/2007
3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006, 6/7/2006
12/9/2006
6/7/2006, 1/6/2007
3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006, 3/31/2006, 1/6/2007
12/12/2006

AND....
E1: (StartDate....of the range of dates to be searched, eg 3/1/2006)
F1: (EndDate....of the range of dates to be searched, eg 12/31/2007)

Then....try this:

G1: (an event, eg EART309)
H1: (a codeplex, eg SHO)

This ARRAY FORMULA returns the count of unique dates within the start/end
range for the Event/CodePlex combination in G1:H1
I1:
=SUM(--NOT(ISNA(MATCH("*"&TEXT(ROW(INDEX($A:$A,$E$1):INDEX($A:$A,$F$1)),"m/d/yyyy")&"*",IF($A$2:$A$7&$B$2:$B$7=G1&H1,$D$2:$D$7),0))))

Note: For array formulas, [Ctrl] [Shift] and press [Enter], instead of just
pressing [Enter].

With sample tests, these are the results:
EART309, SHO.........Returns 8
EART309, WOM........Returns 1
EART312, SHO.........Returns 8

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP


orchid11652 said:
Hi - o.k. - I can't help asking because your formula is so close to
something I need. See my pasted data below:

Event CodePlex Service ID Dates Scheduled*
EART309 SHO SH2 6/7/2006, 1/5/2007
EART309 SHO SHO 3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006,
6/7/2006

EART309 WOM WOM 12/9/2006
EART312 SHO SH2 6/7/2006, 1/6/2007
EART312 SHO SHO 3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006,
3/31/2006, 1/6/2007

EART312 WOM WOM 12/12/2006


Some of the data is wrapped so I'll explain the format. There are 4 fields -
an Event, Plex, Service ID and Date. Above is the exact excel export of the
data from an auxiliary system. The dates are populated into a single excel
field, however, I can parse them to populate a single date per field if it
makes the formula easier. What I need is a formula that totals all the
unique dates for each Event Code/Plex combination. The challenge is that
there can be multiple rows with the same Event Code and Plex, each which may
have the same date(s) which may only be counted once. In the above sample
data, the formula should return a quantity of 8 for the combination of Event
Code/Plex "EART309/SHO" and 8 for "EART312/SHO". I thought maybe an Access DB
would work better since I need to download the date data and create ITD
reports on a regular basis by Event. Any advice on if a formula or Access
would be better would be appreciated as well.

Thanks.

Orchid11652
 
O

orchid11652

Hi - I tried your example but I'm having trouble replicating the result for
'EART309,WOM' or the other combination with only a single date. It is giving
me a zero. If I populate an additional date, it gives the correct count of
2. I will work with it and see if I set up the format correctly. Thanks!
This was very helpful.

Ron Coderre said:
I think I came up with something that works......

With
your posted sample data in A1:D7 (I'll break it up to avoid Text Wrap):
A1:C7 contains:
Event CodePlex Service ID
EART309 SHO SH2
EART309 SHO SHO
EART309 WOM WOM
EART312 SHO SH2
EART312 SHO SHO
EART312 WOM WOM

D1:D7 contains:
Dates Scheduled*
6/7/2006, 1/5/2007
3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006, 6/7/2006
12/9/2006
6/7/2006, 1/6/2007
3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006, 3/31/2006, 1/6/2007
12/12/2006

AND....
E1: (StartDate....of the range of dates to be searched, eg 3/1/2006)
F1: (EndDate....of the range of dates to be searched, eg 12/31/2007)

Then....try this:

G1: (an event, eg EART309)
H1: (a codeplex, eg SHO)

This ARRAY FORMULA returns the count of unique dates within the start/end
range for the Event/CodePlex combination in G1:H1
I1:
=SUM(--NOT(ISNA(MATCH("*"&TEXT(ROW(INDEX($A:$A,$E$1):INDEX($A:$A,$F$1)),"m/d/yyyy")&"*",IF($A$2:$A$7&$B$2:$B$7=G1&H1,$D$2:$D$7),0))))

Note: For array formulas, [Ctrl] [Shift] and press [Enter], instead of just
pressing [Enter].

With sample tests, these are the results:
EART309, SHO.........Returns 8
EART309, WOM........Returns 1
EART312, SHO.........Returns 8

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP


orchid11652 said:
Hi - o.k. - I can't help asking because your formula is so close to
something I need. See my pasted data below:

Event CodePlex Service ID Dates Scheduled*
EART309 SHO SH2 6/7/2006, 1/5/2007
EART309 SHO SHO 3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006,
6/7/2006

EART309 WOM WOM 12/9/2006
EART312 SHO SH2 6/7/2006, 1/6/2007
EART312 SHO SHO 3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006,
3/31/2006, 1/6/2007

EART312 WOM WOM 12/12/2006


Some of the data is wrapped so I'll explain the format. There are 4 fields -
an Event, Plex, Service ID and Date. Above is the exact excel export of the
data from an auxiliary system. The dates are populated into a single excel
field, however, I can parse them to populate a single date per field if it
makes the formula easier. What I need is a formula that totals all the
unique dates for each Event Code/Plex combination. The challenge is that
there can be multiple rows with the same Event Code and Plex, each which may
have the same date(s) which may only be counted once. In the above sample
data, the formula should return a quantity of 8 for the combination of Event
Code/Plex "EART309/SHO" and 8 for "EART312/SHO". I thought maybe an Access DB
would work better since I need to download the date data and create ITD
reports on a regular basis by Event. Any advice on if a formula or Access
would be better would be appreciated as well.

Thanks.

Orchid11652
 
R

Ron Coderre

The formula I posted searches for TEXT values. Consequently, when the "Dates
Scheduled*" field only has one "date" in it, you must make sure the value is
TEXT, not an actual date.

You can do that by either:
•prepending an apostrophe in front of the date
(eg '03/01/2007 )
or
•Setting the numeric format of the cells to Text before entering data.
<format><cells><number tab>....Category: Text

Does that help?
***********
Regards,
Ron

XL2003, WinXP


orchid11652 said:
Hi - I tried your example but I'm having trouble replicating the result for
'EART309,WOM' or the other combination with only a single date. It is giving
me a zero. If I populate an additional date, it gives the correct count of
2. I will work with it and see if I set up the format correctly. Thanks!
This was very helpful.

Ron Coderre said:
I think I came up with something that works......

With
your posted sample data in A1:D7 (I'll break it up to avoid Text Wrap):
A1:C7 contains:
Event CodePlex Service ID
EART309 SHO SH2
EART309 SHO SHO
EART309 WOM WOM
EART312 SHO SH2
EART312 SHO SHO
EART312 WOM WOM

D1:D7 contains:
Dates Scheduled*
6/7/2006, 1/5/2007
3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006, 6/7/2006
12/9/2006
6/7/2006, 1/6/2007
3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006, 3/31/2006, 1/6/2007
12/12/2006

AND....
E1: (StartDate....of the range of dates to be searched, eg 3/1/2006)
F1: (EndDate....of the range of dates to be searched, eg 12/31/2007)

Then....try this:

G1: (an event, eg EART309)
H1: (a codeplex, eg SHO)

This ARRAY FORMULA returns the count of unique dates within the start/end
range for the Event/CodePlex combination in G1:H1
I1:
=SUM(--NOT(ISNA(MATCH("*"&TEXT(ROW(INDEX($A:$A,$E$1):INDEX($A:$A,$F$1)),"m/d/yyyy")&"*",IF($A$2:$A$7&$B$2:$B$7=G1&H1,$D$2:$D$7),0))))

Note: For array formulas, [Ctrl] [Shift] and press [Enter], instead of just
pressing [Enter].

With sample tests, these are the results:
EART309, SHO.........Returns 8
EART309, WOM........Returns 1
EART312, SHO.........Returns 8

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP


orchid11652 said:
Hi - o.k. - I can't help asking because your formula is so close to
something I need. See my pasted data below:

Event CodePlex Service ID Dates Scheduled*
EART309 SHO SH2 6/7/2006, 1/5/2007
EART309 SHO SHO 3/5/2006, 3/6/2006, 3/7/2006, 3/8/2006, 3/9/2006, 3/10/2006,
6/7/2006

EART309 WOM WOM 12/9/2006
EART312 SHO SH2 6/7/2006, 1/6/2007
EART312 SHO SHO 3/26/2006, 3/27/2006, 3/28/2006, 3/29/2006, 3/30/2006,
3/31/2006, 1/6/2007

EART312 WOM WOM 12/12/2006


Some of the data is wrapped so I'll explain the format. There are 4 fields -
an Event, Plex, Service ID and Date. Above is the exact excel export of the
data from an auxiliary system. The dates are populated into a single excel
field, however, I can parse them to populate a single date per field if it
makes the formula easier. What I need is a formula that totals all the
unique dates for each Event Code/Plex combination. The challenge is that
there can be multiple rows with the same Event Code and Plex, each which may
have the same date(s) which may only be counted once. In the above sample
data, the formula should return a quantity of 8 for the combination of Event
Code/Plex "EART309/SHO" and 8 for "EART312/SHO". I thought maybe an Access DB
would work better since I need to download the date data and create ITD
reports on a regular basis by Event. Any advice on if a formula or Access
would be better would be appreciated as well.

Thanks.

Orchid11652





(e-mail address removed)
Congrats, Don!

I has assumed you were already but didn't make it public.

--
Biff
Microsoft Excel MVP


I was saying that yours was the way to go and "Gracias Senor" for the kind
words.

--
Don Guillett
Microsoft MVP Excel
SalesAid Software
(e-mail address removed)
Not me, Don (I don't have a preference)....but DKS wants unique dates.
:)

(BTW......Congratulations on the MVP award)

***********
Regards,
Ron

XL2003, WinXP


:

I didn't see that you only wanted UNIQUE dates.

--
Don Guillett
Microsoft MVP Excel
SalesAid Software
(e-mail address removed)
I tried scrambling the dates and DeptNames and I still get correct (to
me)
results.

Try this:
Play with my posted example and see if it gives correct results under
those
conditions.
-If yes
...Try putting some of your data into the example and test again.
-If No, then the data is the problem and you'll need to post some
sample
data so we can see what you're dealing with?

Either way, please update us on your progress.

***********
Regards,
Ron

XL2003, WinXP


:

Hi Don, Ron

Thanks for your suggestions. I tried them but they did not work. I
forgot
to mention that the data may not always be sorted by date. I hope
that
does
not affect your logic?

To give you extra info on what went wrong: the formula of Don gave
me an
extremely high number. For example: I had data for one calendar
year but
the
result of Don's formula was more than 366. This is not possible
because
there are only 366 unique dates in a year. The formula of Ron
returned
zero
as a result.

Thanks.

:

Try something like this:

With this structure in A1:A16
Date Dept
1/1/2007 A
1/1/2007 A
1/1/2007 A
1/1/2007 B
1/1/2007 C
1/1/2007 C
1/1/2007 C
1/15/2007 A
1/15/2007 B
1/15/2007 C
1/15/2007 C
2/1/2007 A
2/1/2007 A
2/1/2007 B
2/1/2007 B

And...
E1: (StartDate eg: 01/01/2007)
F1: (EndDate eg: 01/31/2007)
G1: (DeptName eg: A)

Then this formula returns the number of unique dates in that range
where the
DeptName matches the name in G1:
=SUMPRODUCT(--ISNUMBER(MATCH(ROW(INDEX(A:A,E1):INDEX(A:A,F1)),(B2:B20=G1)*(A2:A20>=E1)*(A2:A20<=F1)*A2:A20,0)))

Using the above example, the formula returns: 2
Dept A appears at least once with 1/1/2007 and 1/15/2007

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP


:

I have a sheet, with each row containing one record. Column A
contains a
date, and column B contains a department. The contents of the
column
A
(date) are either a simple date "dd/mm/yyyy" format or a
timestamp
e.g.
"dd/mm/yyyy hh:mm:ss". Duplicates are allowed in Column A,
Column B,
Column
A + Column B.

I needed (ideally) a worksheet function that would allow me to
obtain
the
following information:

Based on a START DATE, END DATE and DEPT NAME return to me the
number
of
times I have the DEPT NAME occuring in my data for between the
START
DATE and
END DATE (both dates included) but counting multiple rows for
the
same date
as only 1 record. Thus if I have 27 rows for the same
department
with date
July 15, then it should give me only the value 1.

The time-part in the timestamp format can be ignored. It exists
because of
input data coming from various sources and some sources store
time
and some
do not store time.

How could I do the above with a worksheet function?

Many thanks.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top