UDF to merge duplicate rows

Henrietta Horne · Jan 11, 2011

Since you guys were so helpful, I hate to wear out my welcome, but...

Now that I have the high-index data from my previous question, I need
to process the list of words to eliminate duplicates. The table below
shows some sample data. The first column is the worksheet row numbers.
The Word column (B) contains the words. The Score column (C) contains
a count of how often that words occurs is a large collection of text
(400 million words). The Index column (D) contains the result of the
GetHighIndex UDF from my previous post.

I need another UDF to walk down the list looking for duplicate words
(tee, a, in). When it finds a duplicate, it will merge the two rows
into one by adding the Scores and deleting the second row.

In the example below, rows 5 & 6 would be combined to form one row
with a Score of 3301 (199 + 3102).

Rows 62-65 would be combined into one row with a Score of 7,140,219
(808 + 9,711 + 279,364 + 6,850,336).

Only exact matches are combined. The two "balloon" rows would be
combined as would the two "ballooning" rows, but would not include the
"balloonist" row.

Can I impose on someone to get this started?

The parts I am not sure how to do inside a UDF are:

1. How to step through the rows and address the cells (relatively).

2. How to delete a row.

B C D
4 Word Score Index
5 tee 199 2
6 tee 3,102 2
7 a 298 3
8 a 9,996,626 3
9 at 1,730,609 3
10 eat 69,484 3

62 in 808 6
63 in 9,711 6
64 in 279,364 6
65 in 6,850,336 6

2054 dear 97 11
2055 dear 2,015 11
2056 dear 3,364 11
2057 dear 8,417 11

32159 balloon 536 20
32160 balloon 4,887 20
32161 ballooning 28 20
32162 ballooning 82 20
32163 balloonist 51 20

48196 turquoise 435 26
48197 turquoise 718 26

49270 ad-lib 29 27
49271 ad-lib 46 27
49272 ad-libbed 40 27
49273 ad-libbing 19 27

Clif McIrvin · Jan 11, 2011

Henrietta Horne said:
Since you guys were so helpful, I hate to wear out my welcome, but...

Can I impose on someone to get this started?

The parts I am not sure how to do inside a UDF are:

1. How to step through the rows and address the cells (relatively).

The working procedure I'm posting below might give you some ideas to
work with, even though I'm making no attempt to apply it to your

question said:
2. How to delete a row.

one possibility:
Range("A5").Select
Selection.EntireRow.Delete

(code the result of recording a macro while manually selecting a cell
and using the UI to delete the row)
========== begin code

Sub Copy2BreakTable()
' 1/07/11 cm remove auto-filter; use range for find
' 1/25/10 cm add auto-filter on Plant ID ... partially tested
' 08 29 09 cm disable screen updates during startup

Const newRow = 2 ' New Data row
Const lookRows = 300 ' Only search newest rows

Dim ErrNum As Variant
Dim newDate, thisDate, currentDate
Dim newPlant, newLotID
Dim x2, x$, x1$
Dim SearchBegin, SearchEnd, SearchDirection
Dim firstRow, lastRow, currentRow ' cylinder data range
'Dim bPlantFilter As Boolean

'' turn off slow screen updating during searching
' Application.ScreenUpdating = False

Dim screenUpdateState As Boolean
Dim statusBarState As Boolean
Dim calcState As XlCalculation
Dim eventsState As Boolean
Dim displayPageBreakState As Boolean

With Application
screenUpdateState = .ScreenUpdating
statusBarState = .DisplayStatusBar
calcState = .Calculation
eventsState = .EnableEvents
End With

'turn off some Excel functionality so your code runs faster
With Application
.ScreenUpdating = False
.DisplayStatusBar = False
.Calculation = xlCalculationManual
.EnableEvents = False
End With

Set c = Nothing ' insure exit test is valid

If newSub() Then
' newSub returns False if activesheet not valid cylData
' copies new cylinder measurements to row newRow;
' returns with BreakTable!Data as ActiveSheet

' Look in last lookRows rows for match in LotID
' Begin search at current row; look up or down depending
' on date comparison; begin at last row if current row
' is outside of lookRows range

' gather data of interest...
newPlant = Cells(newRow, BreakTableColumns.Plant).Value
newLotID = Cells(newRow, BreakTableColumns.LotID).Value
newDate = Cells(newRow, BreakTableColumns.SampleDate).Value
Set r = Range("cylData") ' all cylinder data
firstRow = r.Row
lastRow = r.Rows.Count + firstRow - 1 'xx' marker row
SearchEnd = lastRow
SearchBegin = lastRow - lookRows
If SearchBegin < firstRow Then
SearchBegin = firstRow
End If
currentRow = Selection.Row
If currentRow < SearchBegin Or _
currentRow >= SearchEnd Then
' search from the bottom up -
' new break data tends to be recent
currentRow = SearchEnd
End If

Select Case newPlant
Case "MCSS"
Case "LCP" ' special processing for LCP worksheet lotID
If Mid(newLotID, 7, 1) = "+" Then
newLotID = Left(newLotID, 7)
ElseIf Right(newLotID, 2) = "++" Then
newLotID = Left(newLotID, 6) & "++"
Else
newLotID = Left(newLotID, 6)
End If
Cells(newRow, BreakTableColumns.LotID).Value = newLotID
newLotID = "xxxx" ' force search on date
Case Else
If Right(newLotID, 1) = "+" Then
newLotID = Left(newLotID, Len(newLotID) - 1)
End If
End Select

' Prepare to search for match on LotID
' search 'up' or 'down' depending on new date

currentDate = Cells(currentRow, BreakTableColumns.SampleDate).Value
Select Case DateDiff("d", currentDate, newDate)
Case Is < 0 ' newDate is older
SearchDirection = xlPrevious
Case Is >= 0 ' newDate is newer
SearchDirection = xlNext
End Select

' define search range

Set r = Range(Cells(SearchBegin, BreakTableColumns.LotID), _
Cells(SearchEnd, BreakTableColumns.LotID))
Set c = Cells(currentRow, BreakTableColumns.LotID)

Set c = r.Find(What:=newLotID, After:=c, _
LookIn:=xlValues, LookAt:=xlPart, _
SearchOrder:=xlByRows, _
SearchDirection:=SearchDirection, _
MatchCase:=False, SearchFormat:=False)
If c Is Nothing Then 'lot not found! -- look for date
Application.Speech.Speak "Lot not found", -1, 0, -1
For x2 = SearchBegin To SearchEnd
thisDate = Cells(x2, BreakTableColumns.SampleDate).Value
If IsEmpty(thisDate) Then Exit For
Select Case DateDiff("d", thisDate, newDate)
Case Is < 0 ' thisDate is future
Exit For
Case 0 ' same date
If DatePart("h", thisDate) = 0 Then
Exit For ' No Sample time (MCSS) so stop here
End If ' else ignore time; continue search
End Select
Next x2 ' SearchBegin To SearchEnd
Set c = Cells(x2, BreakTableColumns.SampleDate)
End If ' c Is Nothing Then lot not found! -- look for date

With c ' scroll to selected row
.Show
.EntireRow.Select
End With

Select Case newPlant
Case "MCSS" 'Set MCSS Summary not printed flag
flagMCSS = True
With Selection.Cells(1, 5).Interior
.ColorIndex = 6
.Pattern = xlSolid
End With
With Cells(4, 5)
.Value = "MCSS Summary not printed"
With .Interior
.ColorIndex = 6
.Pattern = xlSolid
End With
End With
End Select

End If

'' restore screen updates
'put this at the end of your code
With Application
.Calculation = calcState ' cancels cut/copy mode
.EnableEvents = eventsState
.DisplayStatusBar = statusBarState
.ScreenUpdating = screenUpdateState
End With

If Not c Is Nothing Then
Rows(newRow).Copy
End If

Set c = Nothing
Set r = Nothing

End Sub

Don Guillett Excel MVP · Jan 11, 2011

Since you guys were so helpful, I hate to wear out my welcome, but...

Now that I have the high-index data from my previous question, I need
to process the list of words to eliminate duplicates. The table below
shows some sample data. The first column is the worksheet row numbers.
The Word column (B) contains the words. The Score column (C) contains
a count of how often that words occurs is a large collection of text
(400 million words). The Index column (D) contains the result of the
GetHighIndex UDF from my previous post.

I need another UDF to walk down the list looking for duplicate words
(tee, a, in). When it finds a duplicate, it will merge the two rows
into one by adding the Scores and deleting the second row.

In the example below, rows 5 & 6 would be combined to form one row
with a Score of 3301 (199 + 3102).

Rows 62-65 would be combined into one row with a Score of 7,140,219
(808 + 9,711 + 279,364 + 6,850,336).

Only exact matches are combined. The two "balloon" rows would be
combined as would the two "ballooning" rows, but would not include the
"balloonist" row.

Can I impose on someone to get this started?

The parts I am not sure how to do inside a UDF are:

1. How to step through the rows and address the cells (relatively).

2. How to delete a row.

B C D
4 Word Score Index
5 tee 199 2
6 tee 3,102 2
7 a 298 3
8 a 9,996,626 3
9 at 1,730,609 3
10 eat 69,484 3

62 in 808 6
63 in 9,711 6
64 in 279,364 6
65 in 6,850,336 6

2054 dear 97 11
2055 dear 2,015 11
2056 dear 3,364 11
2057 dear 8,417 11

32159 balloon 536 20
32160 balloon 4,887 20
32161 ballooning 28 20
32162 ballooning 82 20
32163 balloonist 51 20

48196 turquoise 435 26
48197 turquoise 718 26

49270 ad-lib 29 27
49271 ad-lib 46 27
49272 ad-libbed 40 27
49273 ad-libbing 19 27

Trying to copy your data for testing didn't work so what you need is a
looping macro from the bottom up to do this.

"If desired, send your file to dguillett @gmail.com I will only look
if:
1. You send a copy of this message on an inserted sheet
2. You give me the newsgroup and the subject line
3. You send a clear explanation of what you want
4. You send before/after examples and expected results."

Charabeuh · Jan 12, 2011

Hello,

Another way without VBA:

I assumed your data are in columns A,B,C
(e.g. Word,Score,Index <==> A1 to C10000)

Select an empty cell on your sheet ( e.g. cell F1)
Select in your menu 'Data' the command 'Consolidate'
Into the windows:
- select in the list function "Sum"
- select in the textbox "Référence" the area of your data to sum
(i.e. A1:B10000)
- click 'Add'
- then select the option 'Top row' and 'Left column')
- clic 'OK'

You will get a table with the sum of score.

Into cell F2, put the following formula to get your index:
=MAX(FIND(MID(F2,ROW(INDIRECT("1:" &
LEN(F2))),1),"etaoinsrhldcumgfpwybvkjxzq-"))
(this is an array formula, you should validate this formula with the three
keys CTRL+SHIFT+Enter instead of the single key Enter)

Then drag down this formula to the end of your consolidated data.

-----------------------------------------------------------------------------------------------------------------
"Henrietta Horne" a écrit dans le message de groupe de discussion :
(e-mail address removed)...

Since you guys were so helpful, I hate to wear out my welcome, but...

Now that I have the high-index data from my previous question, I need
to process the list of words to eliminate duplicates. The table below
shows some sample data. The first column is the worksheet row numbers.
The Word column (B) contains the words. The Score column (C) contains
a count of how often that words occurs is a large collection of text
(400 million words). The Index column (D) contains the result of the
GetHighIndex UDF from my previous post.

I need another UDF to walk down the list looking for duplicate words
(tee, a, in). When it finds a duplicate, it will merge the two rows
into one by adding the Scores and deleting the second row.

In the example below, rows 5 & 6 would be combined to form one row
with a Score of 3301 (199 + 3102).

Rows 62-65 would be combined into one row with a Score of 7,140,219
(808 + 9,711 + 279,364 + 6,850,336).

Only exact matches are combined. The two "balloon" rows would be
combined as would the two "ballooning" rows, but would not include the
"balloonist" row.

Can I impose on someone to get this started?

The parts I am not sure how to do inside a UDF are:

1. How to step through the rows and address the cells (relatively).

2. How to delete a row.

B C D
4 Word Score Index
5 tee 199 2
6 tee 3,102 2
7 a 298 3
8 a 9,996,626 3
9 at 1,730,609 3
10 eat 69,484 3

62 in 808 6
63 in 9,711 6
64 in 279,364 6
65 in 6,850,336 6

2054 dear 97 11
2055 dear 2,015 11
2056 dear 3,364 11
2057 dear 8,417 11

32159 balloon 536 20
32160 balloon 4,887 20
32161 ballooning 28 20
32162 ballooning 82 20
32163 balloonist 51 20

48196 turquoise 435 26
48197 turquoise 718 26

49270 ad-lib 29 27
49271 ad-lib 46 27
49272 ad-libbed 40 27
49273 ad-libbing 19 27

Henrietta Horne · Jan 13, 2011

The working procedure I'm posting below might give you some ideas to
work with, even though I'm making no attempt to apply it to your

one possibility:
Range("A5").Select
Selection.EntireRow.Delete

(code the result of recording a macro while manually selecting a cell
and using the UI to delete the row)

========== begin code

Sub Copy2BreakTable()
' 1/07/11 cm remove auto-filter; use range for find
' 1/25/10 cm add auto-filter on Plant ID ... partially tested
' 08 29 09 cm disable screen updates during startup

Const newRow = 2 ' New Data row
Const lookRows = 300 ' Only search newest rows

Dim ErrNum As Variant
Dim newDate, thisDate, currentDate
Dim newPlant, newLotID
Dim x2, x$, x1$
Dim SearchBegin, SearchEnd, SearchDirection
Dim firstRow, lastRow, currentRow ' cylinder data range
'Dim bPlantFilter As Boolean

'' turn off slow screen updating during searching
' Application.ScreenUpdating = False

Dim screenUpdateState As Boolean
Dim statusBarState As Boolean
Dim calcState As XlCalculation
Dim eventsState As Boolean
Dim displayPageBreakState As Boolean

With Application
screenUpdateState = .ScreenUpdating
statusBarState = .DisplayStatusBar
calcState = .Calculation
eventsState = .EnableEvents
End With

'turn off some Excel functionality so your code runs faster
With Application
.ScreenUpdating = False
.DisplayStatusBar = False
.Calculation = xlCalculationManual
.EnableEvents = False
End With

Set c = Nothing ' insure exit test is valid

If newSub() Then
' newSub returns False if activesheet not valid cylData
' copies new cylinder measurements to row newRow;
' returns with BreakTable!Data as ActiveSheet

' Look in last lookRows rows for match in LotID
' Begin search at current row; look up or down depending
' on date comparison; begin at last row if current row
' is outside of lookRows range

' gather data of interest...
newPlant = Cells(newRow, BreakTableColumns.Plant).Value
newLotID = Cells(newRow, BreakTableColumns.LotID).Value
newDate = Cells(newRow, BreakTableColumns.SampleDate).Value
Set r = Range("cylData") ' all cylinder data
firstRow = r.Row
lastRow = r.Rows.Count + firstRow - 1 'xx' marker row
SearchEnd = lastRow
SearchBegin = lastRow - lookRows
If SearchBegin < firstRow Then
SearchBegin = firstRow
End If
currentRow = Selection.Row
If currentRow < SearchBegin Or _
currentRow >= SearchEnd Then
' search from the bottom up -
' new break data tends to be recent
currentRow = SearchEnd
End If

Select Case newPlant
Case "MCSS"
Case "LCP" ' special processing for LCP worksheet lotID
If Mid(newLotID, 7, 1) = "+" Then
newLotID = Left(newLotID, 7)
ElseIf Right(newLotID, 2) = "++" Then
newLotID = Left(newLotID, 6) & "++"
Else
newLotID = Left(newLotID, 6)
End If
Cells(newRow, BreakTableColumns.LotID).Value = newLotID
newLotID = "xxxx" ' force search on date
Case Else
If Right(newLotID, 1) = "+" Then
newLotID = Left(newLotID, Len(newLotID) - 1)
End If
End Select

' Prepare to search for match on LotID
' search 'up' or 'down' depending on new date

currentDate = Cells(currentRow, BreakTableColumns.SampleDate).Value
Select Case DateDiff("d", currentDate, newDate)
Case Is < 0 ' newDate is older
SearchDirection = xlPrevious
Case Is >= 0 ' newDate is newer
SearchDirection = xlNext
End Select

' define search range

Set r = Range(Cells(SearchBegin, BreakTableColumns.LotID), _
Cells(SearchEnd, BreakTableColumns.LotID))
Set c = Cells(currentRow, BreakTableColumns.LotID)

Set c = r.Find(What:=newLotID, After:=c, _
LookIn:=xlValues, LookAt:=xlPart, _
SearchOrder:=xlByRows, _
SearchDirection:=SearchDirection, _
MatchCase:=False, SearchFormat:=False)
If c Is Nothing Then 'lot not found! -- look for date
Application.Speech.Speak "Lot not found", -1, 0, -1
For x2 = SearchBegin To SearchEnd
thisDate = Cells(x2, BreakTableColumns.SampleDate).Value
If IsEmpty(thisDate) Then Exit For
Select Case DateDiff("d", thisDate, newDate)
Case Is < 0 ' thisDate is future
Exit For
Case 0 ' same date
If DatePart("h", thisDate) = 0 Then
Exit For ' No Sample time (MCSS) so stop here
End If ' else ignore time; continue search
End Select
Next x2 ' SearchBegin To SearchEnd
Set c = Cells(x2, BreakTableColumns.SampleDate)
End If ' c Is Nothing Then lot not found! -- look for date

With c ' scroll to selected row
.Show
.EntireRow.Select
End With

Select Case newPlant
Case "MCSS" 'Set MCSS Summary not printed flag
flagMCSS = True
With Selection.Cells(1, 5).Interior
.ColorIndex = 6
.Pattern = xlSolid
End With
With Cells(4, 5)
.Value = "MCSS Summary not printed"
With .Interior
.ColorIndex = 6
.Pattern = xlSolid
End With
End With
End Select

End If

'' restore screen updates
'put this at the end of your code
With Application
.Calculation = calcState ' cancels cut/copy mode
.EnableEvents = eventsState
.DisplayStatusBar = statusBarState
.ScreenUpdating = screenUpdateState
End With

If Not c Is Nothing Then
Rows(newRow).Copy
End If

Set c = Nothing
Set r = Nothing

End Sub

Wow. I'll need a few days to digest that. Did you just type that up on
the fly or did you already have some similar code?

Clif McIrvin · Jan 13, 2011

Wow. I'll need a few days to digest that. Did you just type that up on
the fly or did you already have some similar code?

It happens that I'd just done some fine-tuning on that routine to speed
it up, so I just opened the module and did a copy / paste for you <g>.

Feel free to come back with questions ... I didn't really offer any
explanation. That code is taking new data that has been placed into row
2 and finding the correct place in the table for that new data. When
originally written, it was based on a recorded macro and used lots of
..Select and .Activate ... and as the table grew it got slower and
slower. Now, it finishes in sub-second times.

Oh ... that routine was one of my very first ventures into VBA, and what
it is now reflects a lot of what I've learned in the three years since.

Have fun with it!

Henrietta Horne · Jan 13, 2011

Trying to copy your data for testing didn't work so what you need is a
looping macro from the bottom up to do this.

"If desired, send your file to dguillett @gmail.com I will only look
if:
1. You send a copy of this message on an inserted sheet
2. You give me the newsgroup and the subject line
3. You send a clear explanation of what you want
4. You send before/after examples and expected results."

I'll try it myself first. I (when) I fail, I may send it over. Thanks.

Henrietta Horne · Jan 13, 2011

Hello,

Another way without VBA:

I assumed your data are in columns A,B,C
(e.g. Word,Score,Index <==> A1 to C10000)

Select an empty cell on your sheet ( e.g. cell F1)
Select in your menu 'Data' the command 'Consolidate'
Into the windows:
- select in the list function "Sum"
- select in the textbox "Référence" the area of your data to sum
(i.e. A1:B10000)
- click 'Add'
- then select the option 'Top row' and 'Left column')
- clic 'OK'

You will get a table with the sum of score.

Into cell F2, put the following formula to get your index:
=MAX(FIND(MID(F2,ROW(INDIRECT("1:" &
LEN(F2))),1),"etaoinsrhldcumgfpwybvkjxzq-"))
(this is an array formula, you should validate this formula with the three
keys CTRL+SHIFT+Enter instead of the single key Enter)

Then drag down this formula to the end of your consolidated data.

Thanks. I'll look into this solution, too.

Help!! find two number recurring numbers in row??	1	Apr 3, 2013
Help to improve UDF please	11	Nov 6, 2007
move data from columns to rows	3	Nov 18, 2009
How do I find duplicate rows, add quantity field & retain one reco	7	May 11, 2006
How to link option box	1	Oct 7, 2008
re : How to delete rows.	0	Mar 28, 2007
merging sheets to master without moving total lines	3	Sep 26, 2008
Averaging specific rows - HELP!	5	Feb 18, 2010

UDF to merge duplicate rows

Henrietta Horne

Clif McIrvin

Don Guillett Excel MVP

Charabeuh

Henrietta Horne

Clif McIrvin

Henrietta Horne

Henrietta Horne

Ask a Question

Similar Threads