VBA Code to delete duplicate and extraneous data

S

Sue

I'm a novice VBA user - any assistance is appreciated.
I am using Excel 2003 and have a worksheet with about 15K rows of data. I
need VBA code that will help to delete duplicate data and extraneous data
based on specific dates or blank cells.

My worksheet looks like this:

HRID UID LName FName MI Region StartDate AdjStartDate CurrentDate TermDate
0001 lj1 Jones Lily USA 5/19/1999 5/19/1999
0002 js2 Smith Jon E Can 5/19/1999 5/19/1999
7/1/2000
0002 js2 Smith Jon E Can 5/19/1999 5/19/1999
0004 bh3 Ham Bill G USA 6/1/2000 6/1/2000 9/1/2003
9/1/2003
0005 vv5 Vish V V Ind 7/1/2003 7/1/2003 8/31/2007
8/31/2007

I am looking for VBA code that will detect the duplicates and do the
following base on dates:
If the CurrentDate and TermDate are blank ignore
If the CurrentDate is blank and the TermDate is populated delete only if
there is another row for the user where the current and term dates are blank,
else delete
If both current and termdates are populated and the dates are prior to
1/1/2007 then delete the row.
Keep all rows with delete dates after 1/1/2007

Once I have this down to only 2007 data and open dates I can begin to
calculate duration time in years and quarters.

Thank you in advance for any assistance.
 
O

OssieMac

Hi Sue,

First and foremost. Make sure you have a backup of your data before
attempting to run the macros.

I have developed some code for you but instead of the code going ahead and
deleting the rows, I have written two macros; one to mark the rows to be
deleted and the second one to actually delete the rows. My reason for this is
that you need to do a reasonable sample check after the rows are marked for
deletion to ensure that the correct rows are being deleted.

My assumptions:-

1. That your data is in columns A to I and

2. Column A (headed HRID) does not have any blank cells before the bottom of
the data. If it does then I will have to modify the macro because I use this
column to identify the end of the data.

The delete reference and notes are written to columns J and K. If you have
got anything in these columns then I suggest that you insert 2 columns after
column I and they can be deleted after completion.

You run the first macro (Mark_for_Delete) and then check the rows to be
deleted. You will see that in column K will be written the row number of a
duplicate record (If any) for records where CurrentDate is blank and TermDate
is populated. (It only finds one duplicate and if more then they are ignored).

After checking, if you are happy that the deletes are correct then run the
second macro (Delete_Rows) which will do the deletions for you.

Records with CurrentDate and NO TermDate have no comment because you did not
specify anything for them.

Feel free to get back to me if you have any problems.


Sub Mark_for_Delete()
Dim c As Range 'Cell in column H
Dim strHRID As String
Dim foundHRID As Range
Dim strFirstAddr As String
Dim datePrior As Date

'NOTE: If altering the date between # # then enter date
'using the the alpha characters for the month and let
'Excel reset to numerics.
'Example Jan 1 2007 or 1 Jan 2007 is acceptable.

datePrior = #1/1/2007#

Sheets("Sheet1").Select
For Each c In Range("H:H")
'Next line tests for blank in column A
'and assumes end of data if it is blank
If c.Offset(0, -7) = "" Then Exit For

'Test if Current and Term are blank
If c.Value = "" And c.Offset(0, 1).Value = "" Then
c.Offset(0, 2).Value = "Ignore"
GoTo skipToNextc
End If

'Test if Current blank and Term populated
If c.Value = "" And c.Offset(0, 1).Value > 0 Then
'Save value and address of HRID
strHRID = c.Offset(0, -7).Value
strFirstAddr = c.Offset(0, -7).Address

'Find strHRID
Set foundHRID = Columns("A:A") _
.Find(What:=strHRID, _
After:=c.Offset(0, -7), _
LookIn:=xlFormulas, _
LookAt:=xlWhole, _
SearchOrder:=xlByColumns, _
SearchDirection:=xlNext, _
MatchCase:=False, _
SearchFormat:=False)

'Not necessary to test if found because
'must find at least one at the current c.row.

Do
'Test if not found at current row
If foundHRID.Address <> strFirstAddr Then
'Found but not at current c.row
'Test if Current and Term are blank at new location
If foundHRID.Offset(0, 7) = "" And foundHRID.Offset(0, 8) =
"" Then
'Both blank so write Delete to column J.
c.Offset(0, 2).Value = "Delete"
c.Offset(0, 2).Interior.ColorIndex = 3
'Write duplicate row number to column K
c.Offset(0, 3).Value = "Duplicate Row " & foundHRID.Row
Exit Do 'No additional find require
Else
'Not both blank so look for another record
Set foundHRID = Columns("A:A").FindNext(foundHRID)
End If
Else
'Only one record Current blank and Term populated
c.Offset(0, 2).Value = "No Match with Current & Term blank"
Exit Do
End If
Loop While Not foundHRID Is Nothing And foundHRID.Address <>
firstAddress
End If

'Test if current and term are populated
If c.Value > 0 And c.Offset(0, 1).Value > 0 Then
'test if current and term less than datePrior
If c.Value < datePrior And c.Offset(0, 1).Value < datePrior Then
c.Offset(0, 2).Value = "Delete"
c.Offset(0, 2).Interior.ColorIndex = 3
Else
c.Offset(0, 2).Value = "Recent"
End If
End If
skipToNextc:
Next c

End Sub

Sub Delete_Rows()

Dim c As Range 'Cell in column H

Sheets("Sheet1").Select
For Each c In Range("J:J")
'Next line tests for blank in column A
'and assumes end of data if it is blank
If c.Offset(0, -7) = "" Then Exit For

If c.Value = "Delete" Then
c.EntireRow.Delete
End If
Next c

End Sub


Regards,

OssieMac
 
O

OssieMac

Hi again Sue,

I don't know how savvy you are with VBA so this message is just in case you
are a novice and the following gets you into problems.

I notice that some of the lines in my code broke into 2 lines when posted.
When you copy them into the VBA editor they will appear red. Simply place the
cursor at the end of the first red line and press delete a few times until
the second red line comes up to join the end of the first (leave a space) and
move the cursor off the line and it should turn black.

Comments lines are preceded by a single quote and appear in green.

Regards,

OssieMac
 
S

Sue

OssieMac,

Thank you so much, this helped a great deal. I had to run the DeleteRow
macro a couple of times, but I can live with that since it only takes a
couple of minutes.

I did notice what you stated about the CurrentDate and No TermDate populated.

Here's what I see. In the worksheet now, I have duplicate rows for any
given HRID where the CurrentDate and the TermDate are both populated (with a
2007 date) and on the next row there is a CurrentDate (with a 2007 date) and
no termdate populated. I want to delete this row and keep only the row with
both the Current Date and the TermDate populated.

What would I add to the previous code you supplied.

Thank you again,
Sue
 
O

OssieMac

Hi again Sue,

As per my message yesterday, don't forget to keep a backup of your data.

Copy both macros in again. I have rewritten the delete one. Rows should be
deleted in the reverse order from the bottom not from the top otherwise there
are problems deleting adjacent rows because the row where the code is up to
gets deleted and the program looses its place.

I have put proper line breaks (Space and underscore) in the long lines of
code so that hopefully you should not have to edit them. (I usually do this
and I forgot yesterday.)

Sub Mark_for_Delete()
Dim c As Range 'Cell in column H
Dim strHRID As String
Dim foundHRID As Range
Dim strFirstAddr As String
Dim datePrior As Date

'NOTE: If altering the date between # # then enter date
'using the the alpha characters for the month and let
'Excel reset to numerics.
'Example Jan 1 2007 or 1 Jan 2007 is acceptable.

datePrior = #1/1/2007#

Sheets("Sheet1").Select
For Each c In Range("H:H")
'Next line tests for blank in column A
'and assumes end of data if it is blank
If c.Offset(0, -7) = "" Then Exit For

'Test if Current and Term are blank
If c.Value = "" And c.Offset(0, 1).Value = "" Then
c.Offset(0, 2).Value = "Ignore"
GoTo skipToNextc
Else
'Test for text column header
If c.Value <> "" And IsDate(c.Value) = False Then
GoTo skipToNextc
End If
End If


'Test if Current blank and Term populated
If c.Value = "" And c.Offset(0, 1).Value > 0 Then
'Save value and address of HRID
strHRID = c.Offset(0, -7).Value
strFirstAddr = c.Offset(0, -7).Address

'Find strHRID
Set foundHRID = Columns("A:A") _
.Find(What:=strHRID, _
After:=c.Offset(0, -7), _
LookIn:=xlFormulas, _
LookAt:=xlWhole, _
SearchOrder:=xlByColumns, _
SearchDirection:=xlNext, _
MatchCase:=False, _
SearchFormat:=False)

'Not necessary to test if found because
'must find at least one at the current c.row.

Do
'Test if not found at current row
If foundHRID.Address <> strFirstAddr Then
'Found but not at current c.row
'Test if Current and Term are blank
If foundHRID.Offset(0, 7) = "" And _
foundHRID.Offset(0, 8) = "" Then
'Both blank so write Delete to column J.
c.Offset(0, 2).Value = "Delete"
c.Offset(0, 2).Interior.ColorIndex = 3
'Write duplicate row number to column K
c.Offset(0, 3).Value = "Duplicate Row " & _
foundHRID.Row
Exit Do 'No additional find require
Else
'Not both blank so look for another record
Set foundHRID = Columns("A:A") _
.FindNext(foundHRID)
End If
Else
'Only one record Current blank and Term populated
Exit Do
End If
Loop While Not foundHRID Is Nothing And _
foundHRID.Address <> strFirstAddr
GoTo skipToNextc
End If

'Test if current and term are populated
If c.Value > 0 And c.Offset(0, 1).Value > 0 Then
'test if current and term less than datePrior
If c.Value < datePrior And _
c.Offset(0, 1).Value < datePrior Then
c.Offset(0, 2).Value = "Delete"
c.Offset(0, 2).Interior.ColorIndex = 3
GoTo skipToNextc
Else

'Save value HRID
strHRID = c.Offset(0, -7).Value
'Find strHRID
strFirstAddr = c.Offset(0, -7).Address

Set foundHRID = Columns("A:A") _
.Find(What:=strHRID, _
LookIn:=xlFormulas, _
LookAt:=xlWhole, _
SearchOrder:=xlByColumns, _
SearchDirection:=xlNext, _
MatchCase:=False, _
SearchFormat:=False)

Do
'Test if NOT found at current row
If foundHRID.Address <> strFirstAddr Then
'Found but NOT at current c.row
'Test for Current populated and Term blank
If foundHRID.Offset(0, 7) > 0 And _
foundHRID.Offset(0, 8) = "" Then
'Current populated Term blank true.Delete row
foundHRID.Offset(0, 9).Value = "Delete"
foundHRID.Offset(0, 9).Interior.ColorIndex = 3
'Write duplicate row number to column K
foundHRID.Offset(0, 10).Value = _
"Duplicate Row " & c.Row
End If
Set foundHRID = Columns("A:A").FindNext(foundHRID)
Else
'Columns("A:A").FindNext(foundHRID).Activate
'Set foundHRID = ActiveCell
Set foundHRID = Columns("A:A").FindNext(foundHRID)
End If
Loop While Not foundHRID Is Nothing And _
foundHRID.Address <> strFirstAddr
c.Offset(0, 2).Value = "Recent"
End If
End If
skipToNextc:
Next c
'Auto fir columns J and K
Columns("J:K").Columns.AutoFit

End Sub

Sub Delete_Rows()

Dim c As Range 'Cell in column H

Sheets("Sheet1").Select
For Each c In Range("J:J")
'Next line tests for blank in column A
'and assumes end of data if it is blank
If c.Offset(0, -7) = "" Then Exit For

If c.Value = "Delete" Then
c.EntireRow.Delete
End If
Next c

End Sub

Sub Delete_Rows_2()
Dim rngColJ As Range
Dim c As Long

'Note: looks in column J for deletes.
With Sheets("Sheet1")
Set rngColJ = Range(.Cells(2, "J"), .Cells(.Rows.Count, "J").End(xlUp))
End With

With rngColJ
'Step backwards from bottom
For c = .Rows.Count To 2 Step -1
If .Cells(c, 1) = "Delete" Then
.Cells(c, 1).EntireRow.Delete
End If
Next c
End With

End Sub



Regards,

OssieMac
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top