J
Jac Tremblay
Hi everyone
I have 40000 row of data and need to delete duplicate rows. Of course, I have checked the Excel programming workgroups and Chip Pearson's site. I found many interesting ideas but I need a bit more. I found out that when I use the advanced filter on the key row, it is a significant gain on time processing. In that particular case, what I do is load each unique value extracted in a dynamic array. Then I select the data column and, for each unique key, modify the first occurrence found by adding a string of 3 "#" in front of it. After that, I replace all the other occurrences by "---" (suggestion of CPearson). Finally, I replace all "###" with an empty string. I can then sort the rows, find out the number of rows containing "---" and delete them. That way of doing works faster when there are more than 1000 rows. But the one part where I loop to replace the first occurrence of each key is too slow. I would like to know if there is some faster way to do that job
The Sub procedure below is called by a main procedure (I have over 20 sheets to deal with). The parameters are the columns (2 or 3) that form the key (strCol1 = "1", strCol2 = "2" and strCol3 = "3"
Thanks in advance
Here is the code
' ******************************************************************
Sub DeleteDuplicateRows(ByVal strCol1, ByVal strCol2, ByVal strCol3
' Jac Tremblay 2004-05-2
Dim strFormula As Strin
Dim strRangeAddress As Strin
Dim strSearch() As Strin
Dim lngNbSearch As Lon
Const strConstDashes As String = "---
Const strConstNumbers As String = "###
Dim intColFilter As Intege
Dim intColFiltre As Strin
Dim lngI As Lon
Dim lngNbLines As Lon
' The number of data rows is found in column D
lngNbLinesBeginning = Range("D3").End(xlDown).Ro
If lngNbLinesBeginning <= 3 The
booSheetDone = Tru
Exit Su
End I
' The number of columns is found in row 2
intNbCol = Range("A2").End(xlToRight).Colum
Columns("D:E").Selec
Selection.Insert Shift:=xlToRigh
intNbCol = intNbCol +
' NbToAA is a fonction that returns the correspondin
' letter(s) of a column number
strNbCol = NbToAA(intNbCol
Range("D3").Selec
Selection.FormulaR1C1 = "1
Selection.AutoFill
Destination:=Range("D3" & lngNbLinesBeginning),
Type:=xlFillSerie
Range("E3:E" & lngNbLinesBeginning).Selec
Selection.NumberFormat = "General
strFormula = "=RC[" & strCol1 & "]&RC[" & strCol2 & "]
If strCol3 <> "" The
strFormula = strFormula & "&RC[" & strCol3 & "]
End I
' The formula concatenates the key columns
Selection.FormulaR1C1 = strFormul
Selection.Cop
Selection.PasteSpecial Paste:=xlPasteValues,
Operation:=xlNone, SkipBlanks:=False,
Transpose:=Fals
Application.CutCopyMode = Fals
' Sort the data on the key and the sequence number
Range("A3:" & strNbCol & lngNbLinesBeginning).Selec
Selection.Sort Key1:=Range("E3"), Key2:=Range("D3"),
Order1:=xlAscending, Header:=xlGuess, OrderCustom:=1,
MatchCase:=False, Orientation:=xlTopToBottom,
DataOption1:=xlSortNormal, DataOption2:=xlSortTextAsNumber
intColFilter = Range("F2").End(xlToRight).Column +
intColFiltre = NbToAA(intColFilter
' Use advanced filter on column E
Range("E2").Selec
ActiveCell.Value = "Key
Range(Selection, Selection.End(xlDown)).Selec
strRangeAddress = ActiveWindow.RangeSelection.Addres
Range(strRangeAddress).AdvancedFilter
Action:=xlFilterCopy,
CopyToRange:=Columns(intColFiltre & ":" & intColFiltre),
Unique:=Tru
' Load unique values in a dynamic array
Range(intColFiltre & "1").Selec
lngNbSearch =
ActiveCell.Offset(1, 0).Selec
Do While ActiveCell.Value <> "
lngNbSearch = lngNbSearch +
ReDim Preserve strSearch(lngNbSearch
strSearch(lngNbSearch) = ActiveCell.Valu
ActiveCell.Offset(1, 0).Selec
Loo
Columns(intColFiltre).Delet
' For each code found, replace the first occurrence with
' the string "###" concatenated to the cell value.
' ***** HERE IS THE PROBLEM - BEGINNING *****
Columns("E:E").Select
For lngI = 1 To lngNbSearch
Selection.Find(What:=strSearch(lngI), After:=ActiveCell, _
LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByColumns, _
SearchDirection:=xlNext, MatchCase:=False, _
SearchFormat:=False).Activate
ActiveCell.Replace What:=strSearch(lngI), _
Replacement:=strConstNumbers & strSearch(lngI), _
LookAt:=xlWhole, SearchOrder:=xlByColumns, MatchCase:=False, _
SearchFormat:=False, ReplaceFormat:=False
Next lngI
' ***** HERE IS THE PROBLEM - END *****
' Replace each other values by "---".
Columns("E:E").Select
For lngI = 1 To lngNbSearch
Selection.Replace What:=strSearch(lngI), _
Replacement:=strConstDashes, _
LookAt:=xlWhole, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Next lngI
' Replace the "###" with nothing.
Columns("E:E").Select
Selection.Replace What:=strConstNumbers, _
Replacement:="", _
LookAt:=xlPart, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
ReDim strSearch(0)
' Sort the data on column E.
Range("A3:" & strNbCol & lngNbLinesBeginning).Select
Selection.Sort Key1:=Range("E3"), _
Order1:=xlAscending, Header:=xlGuess, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
' Build a formula in cell D2 pour to get the number of "---".
Range("D2").Select
Selection.FormulaR1C1 = "=COUNTIF(R[1]C[1]:R[" & _
lngNbLinesBeginning & "]C[1],""---"")"
lngNbLines = ActiveCell.Value
' Select and delete the lines.
Rows("3:" & lngNbLines + 2).Select
Selection.Delete Shift:=xlUp
' Count the number of rows left.
lngNbLinesFin = Range("F3").End(xlDown).Row
' Sort the data on column D (sequence).
Range("A3:" & strNbCol & lngNbLinesFin).Select
Selection.Sort Key1:=Range("D3"), Order1:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom, DataOption1:=xlSortTextAsNumbers
' Delete temporary columns D et E.
Columns("D:E").Select
Selection.Delete Shift:=xlToLeft
End Sub
' *******************************************************************
Thanks again.
I have 40000 row of data and need to delete duplicate rows. Of course, I have checked the Excel programming workgroups and Chip Pearson's site. I found many interesting ideas but I need a bit more. I found out that when I use the advanced filter on the key row, it is a significant gain on time processing. In that particular case, what I do is load each unique value extracted in a dynamic array. Then I select the data column and, for each unique key, modify the first occurrence found by adding a string of 3 "#" in front of it. After that, I replace all the other occurrences by "---" (suggestion of CPearson). Finally, I replace all "###" with an empty string. I can then sort the rows, find out the number of rows containing "---" and delete them. That way of doing works faster when there are more than 1000 rows. But the one part where I loop to replace the first occurrence of each key is too slow. I would like to know if there is some faster way to do that job
The Sub procedure below is called by a main procedure (I have over 20 sheets to deal with). The parameters are the columns (2 or 3) that form the key (strCol1 = "1", strCol2 = "2" and strCol3 = "3"
Thanks in advance
Here is the code
' ******************************************************************
Sub DeleteDuplicateRows(ByVal strCol1, ByVal strCol2, ByVal strCol3
' Jac Tremblay 2004-05-2
Dim strFormula As Strin
Dim strRangeAddress As Strin
Dim strSearch() As Strin
Dim lngNbSearch As Lon
Const strConstDashes As String = "---
Const strConstNumbers As String = "###
Dim intColFilter As Intege
Dim intColFiltre As Strin
Dim lngI As Lon
Dim lngNbLines As Lon
' The number of data rows is found in column D
lngNbLinesBeginning = Range("D3").End(xlDown).Ro
If lngNbLinesBeginning <= 3 The
booSheetDone = Tru
Exit Su
End I
' The number of columns is found in row 2
intNbCol = Range("A2").End(xlToRight).Colum
Columns("D:E").Selec
Selection.Insert Shift:=xlToRigh
intNbCol = intNbCol +
' NbToAA is a fonction that returns the correspondin
' letter(s) of a column number
strNbCol = NbToAA(intNbCol
Range("D3").Selec
Selection.FormulaR1C1 = "1
Selection.AutoFill
Destination:=Range("D3" & lngNbLinesBeginning),
Type:=xlFillSerie
Range("E3:E" & lngNbLinesBeginning).Selec
Selection.NumberFormat = "General
strFormula = "=RC[" & strCol1 & "]&RC[" & strCol2 & "]
If strCol3 <> "" The
strFormula = strFormula & "&RC[" & strCol3 & "]
End I
' The formula concatenates the key columns
Selection.FormulaR1C1 = strFormul
Selection.Cop
Selection.PasteSpecial Paste:=xlPasteValues,
Operation:=xlNone, SkipBlanks:=False,
Transpose:=Fals
Application.CutCopyMode = Fals
' Sort the data on the key and the sequence number
Range("A3:" & strNbCol & lngNbLinesBeginning).Selec
Selection.Sort Key1:=Range("E3"), Key2:=Range("D3"),
Order1:=xlAscending, Header:=xlGuess, OrderCustom:=1,
MatchCase:=False, Orientation:=xlTopToBottom,
DataOption1:=xlSortNormal, DataOption2:=xlSortTextAsNumber
intColFilter = Range("F2").End(xlToRight).Column +
intColFiltre = NbToAA(intColFilter
' Use advanced filter on column E
Range("E2").Selec
ActiveCell.Value = "Key
Range(Selection, Selection.End(xlDown)).Selec
strRangeAddress = ActiveWindow.RangeSelection.Addres
Range(strRangeAddress).AdvancedFilter
Action:=xlFilterCopy,
CopyToRange:=Columns(intColFiltre & ":" & intColFiltre),
Unique:=Tru
' Load unique values in a dynamic array
Range(intColFiltre & "1").Selec
lngNbSearch =
ActiveCell.Offset(1, 0).Selec
Do While ActiveCell.Value <> "
lngNbSearch = lngNbSearch +
ReDim Preserve strSearch(lngNbSearch
strSearch(lngNbSearch) = ActiveCell.Valu
ActiveCell.Offset(1, 0).Selec
Loo
Columns(intColFiltre).Delet
' For each code found, replace the first occurrence with
' the string "###" concatenated to the cell value.
' ***** HERE IS THE PROBLEM - BEGINNING *****
Columns("E:E").Select
For lngI = 1 To lngNbSearch
Selection.Find(What:=strSearch(lngI), After:=ActiveCell, _
LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByColumns, _
SearchDirection:=xlNext, MatchCase:=False, _
SearchFormat:=False).Activate
ActiveCell.Replace What:=strSearch(lngI), _
Replacement:=strConstNumbers & strSearch(lngI), _
LookAt:=xlWhole, SearchOrder:=xlByColumns, MatchCase:=False, _
SearchFormat:=False, ReplaceFormat:=False
Next lngI
' ***** HERE IS THE PROBLEM - END *****
' Replace each other values by "---".
Columns("E:E").Select
For lngI = 1 To lngNbSearch
Selection.Replace What:=strSearch(lngI), _
Replacement:=strConstDashes, _
LookAt:=xlWhole, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Next lngI
' Replace the "###" with nothing.
Columns("E:E").Select
Selection.Replace What:=strConstNumbers, _
Replacement:="", _
LookAt:=xlPart, SearchOrder:=xlByRows, _
MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
ReDim strSearch(0)
' Sort the data on column E.
Range("A3:" & strNbCol & lngNbLinesBeginning).Select
Selection.Sort Key1:=Range("E3"), _
Order1:=xlAscending, Header:=xlGuess, OrderCustom:=1, _
MatchCase:=False, Orientation:=xlTopToBottom, _
DataOption1:=xlSortNormal
' Build a formula in cell D2 pour to get the number of "---".
Range("D2").Select
Selection.FormulaR1C1 = "=COUNTIF(R[1]C[1]:R[" & _
lngNbLinesBeginning & "]C[1],""---"")"
lngNbLines = ActiveCell.Value
' Select and delete the lines.
Rows("3:" & lngNbLines + 2).Select
Selection.Delete Shift:=xlUp
' Count the number of rows left.
lngNbLinesFin = Range("F3").End(xlDown).Row
' Sort the data on column D (sequence).
Range("A3:" & strNbCol & lngNbLinesFin).Select
Selection.Sort Key1:=Range("D3"), Order1:=xlAscending, _
Header:=xlGuess, OrderCustom:=1, MatchCase:=False, _
Orientation:=xlTopToBottom, DataOption1:=xlSortTextAsNumbers
' Delete temporary columns D et E.
Columns("D:E").Select
Selection.Delete Shift:=xlToLeft
End Sub
' *******************************************************************
Thanks again.