Comparing Rows

M

Monomeeth

Hello

I need two macros to do as follows:

ONE
I need a macro which will check every row in a worksheet and if it finds two
or more duplicate rows (i.e. every cell in the row is duplicated), it will
then proceed to delete all duplicate rows except the first duplicate row.

TWO
I need a macro which will check every row in a worksheet and if it finds two
or more rows with the exact same value in a particular column, it will then
proceed to delete all duplicate rows except the first duplicate row. I would
prefer the user to have the ability to specify the column they want
interrogated.

Any suggestions would be greatly appreciated!

Thanks,

Joe.
 
J

Joel

I have Few questions

1) Is it acceptable to sort the worksheet in the macro?

2) For macro ONE what is the last column?

3) Will the last column always be the same when you run the macro?

4) Does every Row have data filed to the last column.

What I propose for macro ONE is to sort every column of the worksheet
starting with the last column and moving towards the first column. Then
comparing adjacent rows for matches.
 
M

Monomeeth

Hi Joel

Thanks very much for your help. My answers are as follows:

1) Is it acceptable to sort the worksheet in the macro?

Yes.

2) For macro ONE what is the last column?

This will be different as I want to be able to run the macro on various
spreadsheets. Although I suppose you could set the last column as IV (using
Excel 2003) and therefore it wouldn't matter as some cells may be blank.

3) Will the last column always be the same when you run the macro?

No. See answer to 2 above.

4) Does every Row have data filed to the last column.

Not necessarily. Some cells could be blank.


At the risk of providing too much info, here is some context for you:

BACKGROUND
Basically I am trying to remove duplicate rows from worksheets which import
data from various online forms submitted by users. Unfortunately, sometimes
users submit the exact same data multiple times (for whatever reason) and
this causes problems when we need to do an analysis of the collated data. So
I want to be able to remove duplicate rows where every cell in the row is
exactly the same as every corresponding cell within another row, and some of
these cells could be blank.

As for Macro TWO, I need this for another reason altogether. Last column is
not constant and yes there may be blank cells within the row.

JUST A THOUGHT
I suppose you could have the one macro do both jobs by designing it so that
the user can specify a range to interrogate - the user could select a single
column or a group of columns - but I don't know whether that would complicate
things as I don't want the user to specify WHAT the macro is actually looking
for, only WHERE to look for duplicates.

EXAMPLE

I may have a worksheet with a data range of A1:AG25000. Some cells within
this range may be blank for whatever reason. Now, I may want to remove any
duplicate rows where data in columns A to AG are an exact match for the
corresponding cells.

That is, A123 may be exactly the same as A237 and A767, B123 may be exactly
the same as B237 and B767, and so on for every column in those rows,
regardless of whether the match happens to contain data or be blank - hence
why you could go to column IV to do your interrogation. In this scenario I
want to only be left with unique rows after the macro is finished, so only
row 123 would remain while rows 237 and 767 are deleted.

Now, I may have another spreadsheet where I want to achieve the same thing,
but only want to check a single column for duplicates. For instance, I may
have 25000 rows in a spreadsheet containing employee data, but there may only
be 18000 employees in total, so I should only have 18000 rows not 25000. So,
if Column E contained a unique identifier such as employee numbers, then I
would want the macro to only interrogate that column and remove rows based
only on the duplicated data in row E regardless of whether the other cells in
the row were duplicated. In this scenario I only want to keep the first row
conatining that unique identifier.

Hope this makes sense and that I haven't confused you!

:)

Once again, I appreciate your help and time with this.

Joe.
 
J

Joel

The code assumes there is a header row in both cases. I asked the questions
becasue I want to keep the code as simple as possible and to make it run
quicker. Becasue the columns lengths varied I had to check every row for the
last column.

To make the code run faster instead of deleting rows one at a time. I
placed an X in column IV for rows that needed to be deleted. Then filtered
the X's using Autofilter and removed the rows with the X's. See comments in
the code.

Sub RemoveDuplicatesAllColumns()

'clear columnn IV incase data ther is data from last run
Columns("IV").Delete

'find last row
'use column A to determine last row
LastRow = Range("A" & Rows.Count).End(xlUp).Row

'find last column
'check every row to determine last column
EndColumn = 0
For RowCount = 1 To LastRow
LastColumn = Cells(RowCount).End(xlUp).Column
If LastColumn > EndColumn Then
EndColumn = LastColumn
End If
Next RowCount


'sort cells three columns at a time
For ColCount = 1 To EndColumn Step 3
Rows("1:" & LastRow).Sort _
header:=xlYes, _
key1:=Cells(1, ColCount), _
order1:=xlAscending, _
key2:=Cells(1, ColCount + 1), _
order2:=xlAscending, _
key3:=Cells(1, ColCount + 1), _
order3:=xlAscending

Next ColCount

'compare rows putting a X in column IV where duplicates exist
RowCount = 2
Match = False 'use to indicate that last compared rows
'either matched or didn't match
Do While RowCount < LastRow
If Match = False Then
CompareRow = RowCount + 1
Else
CompareRow = CompareRow + 1
End If
'check if column lengths are equal if not skip
LastCol = Cells(RowCount, Columns.Count).End(xlToLeft).Column
LastCompareCol = Cells(CompareRow, Columns.Count).End(xlToLeft).Column
Match = True
If LastCol = LastCompareCol Then
For ColCount = 1 To LastCol
If Cells(RowCount, ColCount) <> _
Cells(CompareRow, ColCount) Then

Match = False
Exit For
End If
Next ColCount
If Match = True Then
Range("IV" & CompareRow) = "X"
End If
Else
Match = False
RowCount = RowCount + 1
End If
Loop


'remove rows with X's in column IV
Set c = Columns("IV").Find(what:="X", _
LookIn:=xlValues, lookat:=xlWhole)
'only filer if at least one X is found
If Not c Is Nothing Then
Columns("IV").AutoFilter Field:=1, Criteria1:="X"
Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
Columns("IV").Delete
End If

End Sub

Sub RemoveDuplicatesOneColumn()

'clear columnn IV incase data ther is data from last run
Columns("IV").Delete

'select column
Set Selected = Application.InputBox(Prompt:="Select one column", _
Title:="Select one column", Type:=8)

SelectCol = Selected.Column

'find last row
'use column A to determine last row
LastRow = Cells(Rows.Count, SelectCol).End(xlUp).Row


'sort by column selected
Rows("1:" & LastRow).Sort _
header:=xlYes, _
key1:=Cells(1, SelectCol), _
order1:=xlAscending

'compare rows putting a X in column IV where duplicates exist
RowCount = 2
Match = False 'use to indicate that last compared rows
'either matched or didn't match
Do While RowCount < LastRow
If Match = False Then
CompareRow = RowCount + 1
Else
CompareRow = CompareRow + 1
End If
If Cells(RowCount, SelectCol) = _
Cells(CompareRow, SelectCol) Then

Range("IV" & CompareRow) = "X"
Match = True
Else
Match = False
RowCount = RowCount + 1
End If
Loop

'remove rows with X's in column IV
Set c = Columns("IV").Find(what:="X", _
LookIn:=xlValues, lookat:=xlWhole)
'only filer if at least one X is found
If Not c Is Nothing Then
Columns("IV").AutoFilter Field:=1, Criteria1:="X"
Rows("2:" & LastRow).SpecialCells(xlCellTypeVisible).Delete
Columns("IV").Delete
End If

End Sub
 
M

Monomeeth

Hi Joel

Thank you very much for yoiur continued help. I really appreciate it. I also
like the way you have chosen to approach this.

I decided to test both your macros by doing a straight copy and paste as is.
I chose a worksheet which has data in the range A1:AL29420. The results were
as follows:

REMOVE DUPLICATES ALL COLUMNS
Macro seemed to run fine as it was obviously doing something. However, after
a few minutes I got a strange error message from Visual Basic. All it said
was "400". Nothing else, the only options I had was an OK button and a Help
button, but when I clicked on Help nothing seemed to happen.

REMOVE DUPLICATES ONE COLUMN
Macro started fine - I was able to select the column and get it running.
However, I waited for an hour and the Macro seemed to still be running. There
were no error messages, but Excel said it was not responding, while Visual
Basic Editor said it was still running.

Sorry, I can't give any further clues.

Now that I'm thinking about it, I will try these macros on much smaller
worksheets. I will post back shortly with the results.

Thanks Joel.

:)
 
M

Monomeeth

Hi Joel

Okay, retested the macro on a test worksheet I created. This worksheet had
data in the range A1:AC543. Oh, and yes, the first row was a header row.

Results are as follows:

REMOVE DUPLICATES ALL COLUMNS
Okay, same problem as before. It didn't take long to get the Visual Basic
error message "400". When I click on Help all I get is a blank white window.
I did notice, however, that the worksheet had been sorted - so the error
occurs sometime after the sort is done. However, the duplicate rows had not
beed deleted.

I did try to find out what the error 400 was about, but all I found was some
info relating to showing forms which are already visible - but in this case
we there is no form.

Anyway, hope this helps to narrow the proble down.

REMOVE DUPLICATES ONE COLUMN
Success! This worked. I deliberately designed the test worksheet to have
duplicate rows - both in terms of every column and in terms of a single
column. The macro went through fairly quickly and everything was A-OK.

Thanks for this Joel - I will just have to bear in mind it may not work on
larger worksheets - or take care to run it when the computer isn't doing
anything else in case it's a memory issue.

Now all I have to do is sort out the first Macro.

Thanks again for your help!

:)
 
J

Joel

I asked my original questions because I knew these programs could take a long
time to run. The sorting and putting the X'x in column IV probably increased
the speed of this program by an order of maginitude. Integer Variables
default to 16,536 when not declared. when you have a worksheet larger than
16,536 rows you must declare any variables that are used to count the rows as
long. try putting these statements at the top of the two macros

Dim LastRow as long
Dim RowCount as Long
Dim CompareRow as Long
 
J

Joel

One thing that might help speed up the code s to reduce the size of the
worksheet. Once a cell is used excel sometimes thinks there is data in the
cell. So when you do a .end(xlup) or .end(xltoleft) you get the wrong
results. to solve this probelm is you manually delete the rows and columns
after the used data this helps

to delte the unused rows
1) select A1
2) Press Shift-Cntl-Down Arrow to get to last row
3) Select the row number after the last row of data
4) Press again Shift-Cntl-down Arrow all the rows after your data will be
selected
5) Right click any row number that is selected and choose delete

Repeat the same for columns instead of down arrow use right arrow.
 
M

Monomeeth

Hi Joel

I'm afraid I've run into problems. I've been experimenting and trying to
sort it out myself, not wanting to trouble you anymore than necessary, but
I've hit a brick wall.

I tried to run the REMOVE DUPLICATES ONE COLUMN macro on a "real" worksheet
which had data in the range A1:AL4942, but was met with the Visual Basic
"400" error. I decided to reduce the worksheet down, since this macro did
work on a much smaller worksheet last week. I progressively reduced the
worksheet down from 4942 rows to 2000 rows to 1000 rows to 500 rows and
finally to 250 rows.

I ran the Macro each time with the same result (i.e. the "400" error). I
also tried deleting all unused rows and columns, but to no avail. Adding the
extra DIM statements at the top of the code made no difference either.

DEBUGGING

What I can tell you is that in every instance the first column was filtered,
with a custom filter looking for "X", but there is no "X" in the first
column. The Xs only appear in column IV as they should, so I checked the code
looking for something obvious, but couldn't see anything. I'm now wondering
whether the same problem is affecting both macros?

I also redid my test from last week when I got the macro to work on my dummy
workbook, but cannot replicate the success - so much for my testing. I swear
it worked TWICE last week, but now not at all - so I can't explain that as I
was certainly methodical in my approach.

Sorry to hassle you, but perhaps you might see this as a challenge?

I'm using Excel 2003, just in case that's an issue.

Regards,

Joe.
 
J

Joel

Which line is giving the Error 400?

Try removing the autofilter before running the macro.


If is still fails check the value of LastRow and let me know what LastRow is
being set to.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top