Duplicate Difficulties

A

:: Amy ::

Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!
 
P

PJFry

Couple of questions:

How are your timestamps formated? Are they a traditional stamp, like
4/28/2009 10:00 PM, or are they something custom

When you that you need to delete entries in the 'same five minutes', is this
a static window of time? For example if the time is between 10:00:00 and
10:04:59, then are these the same five minutes? 10:05:00 to 10:09:59 would
be the next, etc.

Let me know.
 
S

smartin

:: Amy :: said:
Hello All,
I am working a list with over 40,000 rows...
I need to see if there is a way that can delete duplicate rows...
Now to complicate things... Have a list that has acct numbers and times...
but i want delete duplicates that are in the same 5 mins. Make sense???
Please Help!!!

Hello,

Consider this example in A1:C15

Acct Time Dups
J 17 0
E 1 1
K 3 0
B 18 0
K 19 1
J 7 0
E 6 2
G 4 0
F 9 0
C 6 0
E 11 2
K 16 1
E 16 1
B 12 0

Columns A and B are random data. The formula in C2 and filled down is:
=SUMPRODUCT((A2=$A$2:$A$15)*(ABS(B2-$B$2:$B$15)<=5))-1

A "Dups" value greater than 0 means the Acct has at least one other row
matching Acct with a Time within 5 units.

(I am using whole "Time" numbers to simplify the explanation. Since you
are probably working with a date/time-type field you will need to adjust
the comparison
<=5
to something like
<=(5/24/60)
in order to convert to Excel's date/time numbering.)

But before you sort on Dups and delete > 0 I would draw your attention
to the special case of Acct "E".

Acct "E" has times of 1,6,11 and 16; each is within 5 units of some
other. However, the total Time span of this Acct is 15. Do all these
count as duplicates?
 
A

:: Amy ::

ok... here is an example of the info that i am working w/
Unfortunately when I export this report to excel the time stamp is like a
text... :(
I can fix that however, we are looking at about 40,000 rows :(

Employee Account # Date Time
John Doe 83773011 3/20/2009 4:46PM
John Doe 83773011 3/20/2009 4:27PM
John Doe 97868507 3/20/2009 4:26PM
John Doe 97868507 3/20/2009 4:25PM
John Doe 2433904 3/20/2009 4:19PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 2433904 3/20/2009 4:18PM
John Doe 92396306 3/20/2009 4:15PM
John Doe 103451304 3/20/2009 4:08PM
John Doe 54459201 3/20/2009 3:56PM
John Doe 54459201 3/20/2009 3:52PM
John Doe 6101201 3/20/2009 3:39PM
John Doe 108010401 3/20/2009 3:24PM
John Doe 81673524 3/20/2009 3:15PM
Thank you sooooo much for your time... i appreciate any help that I can
get :)
 
S

smartin

No worries. Divide and conquer.

Your sample appears nicely in A1:D15.

I added in E2:
=VALUE(LEFT(D2,LEN(D2)-2))+0.5*(RIGHT(D2,2)="PM")
(Removes the "AM/PM" tail and converts to a number representing the time
of day. Format this column as Time to see for real.)

And in F2:
=C2+E2
(Adds the date to the time of day. Now we have a proper date/time value.)

Finally in G2:
=SUMPRODUCT((B2=$B$2:$B$15)*(ABS(F2-$F$2:$F$15)<=5/24/60))-1

Maybe one worry... the SUMPRODUCT will probably be very slow on 40k rows.
 
D

Dennis Tucker

Are the Account#'s and the Time values in separate columns?

A B C D
John Doe 83773011 3/20/2009 4:46PM

If they are, I would do the following
1. Add a column on the end(right side) that contains the formula
=if(B1=B2,"Duplicate Account") note: lets say this is in cell E1 (are the
account numbers duplicated?)
2. Add another column on the end(right side) that contains the formula
=if(E1="Duplicate Account",D1-D2) note: lets say this is in cell F1 (if the
account numbers duplicated, what is the time difference?)
3. Add another column on the end(right side) that contains the formula
=if(F1<0:05,"Less Than 5 Minutes") note: lets say this is in cell G1 (if
the account numbers duplicated, is the time difference less than 5 minutes?)

Then use a macro to look through the G column and start erasing the bad
line(duplicate records under 5 minutes).

Dennis
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top