Grouping According to Sets of Rows

B

Balthanon

I am in the process of comparing two Excel reports on users where the
data is (or should be) identical, but in one some of the user
information has been grouped into a single identifier. Unfortunately,
I basically need to take the other report, which has anywhere from 8 to
25 rows for a given user and compare it to this one with a consolidated
listing.

For instance, it would be something like this:

Consolidated Report:
User1 identifier1
User2 identifier2
User3 identifier1

Base Report:
User1 infoA
User1 infoB
User1 infoC
User1 infoD
User2 infoC
User2 infoD
User3 infoA
User3 infoB
User3 infoC
User3 infoD

What I basically need to do is consolidate the rows for any user that
has the same grouping of information in a single column. So User1 and
User2 would be reduced to a single row (or flagged with a certain
color, tagged in another cell, etc... anything to automate this a bit)
and I would then replace the "info" column with identifer1. I have
been playing with this off and on for about a day, but I haven't been
able to find an efficient means of consolidating this data so that it
is in a form where I can easily compare the two reports.

I would really like to avoid doing the 4500 or so rows that are in the
base report by hand, given that there are nearly 400 users. If there
are any solutions to this problem that would enable what I'm looking
for, I would really appreciate it.
 
J

JLatham

I think this can be done with VLOOKUP(), but need to make sure of what your
end result needs to look like.

Choice 1:
Consolidated Report
User1 identifier1 infoA infoB infoC infoD
User2 identifier2 infoC infoD
User3 identifier1 infoA infoB infoC infoD
(and should identifier for User3 be identifier1 or identifier3?)

Choice 2:
Base Report:
User1 Identifier1
User1 Identifier1
User1 Identifier1
User1 Identifier1
User2 Identifier2
User2 Identifier2
User3 Identifier3
User3 Identifier3
User3 Identifier3
User3 Identifier3

Which is the result you want. #2 is easier than #1 because of the varying
number of data items for each user in the Base Report.
 
B

Balthanon

#2 is fine for my purposes--in fact that's actually what I'm looking
for, since identifier 1 tells me that the user has infoA, infoB, infoC,
and infoD, it's just not stored that way in our systems. I can go
through and delete the duplicated rows manually if need be, though if
that could be done automatically it would be nice as well.

Also, User3 should be identifier1--the identifier is what I am actually
comparing between the reports.
 
J

JLatham

Ok, that is easiest. VLOOKUP() will work just great.

Set up a formula similar to this in a column next to the entries on the Base
Report (assuming at row 10 and "User#" entries are in column A, and that the
entries from the consolidated report start in column a and go down to row 8)

=VLOOKUP(A10,$A$1:$B$8,2)
and just drag it down the page as far as you need to go. Presumes data is
laid out like you showd on the Consolidated Report for the 1st 2 columns also.

To rapidly identify duplicates in the Base Report, if they are sorted
ascending using both column A and column B, then you can put this formula in
an empty cell on the SECOND row of that group:
=IF(AND(A11=A10,B11=B10),"DUPLICATE","")
That will identify cases where both column A and B in a row are duplicates
of contents of A and B in the row above them.

I'll work up some quick code to delete duplicate rows that can be run from a
macro for you and post again later.
 
J

JLatham

As promised, here's code that will delete duplicate rows where two entries in
two columns are duplicated between 2 adjacent rows. The list should be
sorted using both columns before calling the macro.

Sub RemoveDuplicateRows()
'Works from the bottom up on a sheet
'select the sheet to delete rows from
'before running this macro
'
'Change the column identifiers coming up
'to identify the columns to be examined
'for duplicate entries

'column expect "User#" in
Const FirstColumn = "A" ' change as needed

'column expect "Identifier#" in
Const SecdColumn = "C" ' change as needed

Dim FirstCellToCheck As String
Dim SecdCellToCheck As String
'find the end of the list in the first column
Range(FirstColumn & "65536").End(xlUp).Activate
'work up to 2nd row
Do While ActiveCell.Row > 1
FirstCellToCheck = FirstColumn & ActiveCell.Row
SecdCellToCheck = SecdColumn & ActiveCell.Row
If Range(FirstCellToCheck).Value =
Range(FirstCellToCheck).Offset(-1, 0).Value _
And Range(SecdCellToCheck).Value =
Range(SecdCellToCheck).Offset(-1, 0).Value Then
'the operating code line
Selection.EntireRow.Delete
End If
ActiveCell.Offset(-1, 0).Activate
Loop

End Sub

The line that begins with "If..." is one continuous line all the way to
"...Then"

Here's where you can get a workbook with these example formulas and the code
in them so you can use them or look at them closer in a working workbook.
The workbook does contain a macro, so if your Excel Macro Security setting is
at High, the delete dupe rows won't work. If it's at Medium, you'll get
alert that the code is in there, that's ok this time:
http://www.jlathamsite.com/uploads/ForBalthanon.xls
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top