Excel vba query - data integrity

R

Rachel Curran

Please can anybody help me with the following:

I have two excel spreadsheets, each sheet holds the same column
headings (25 headings)

I want to compare both sheets with a primary key (ID No:) and then
show the records that DO NOT match. I do not however want to do
vlookups, or if's statements because there will be approx 36000 rows
of data in each sheet.

From the records that DO NOT match I need to locate the information
that has changed. But I only want to bring in the changes that have
occurred, for example 1:

Spreadsheet1 - ID No:12345 DOB: 28/02/78 Grade: A Department: Finance
Hours: 37
Spreadsheet2 - ID No:12345 DOB: 28/02/78 Grade: B Department; HR
Hours:37

Result = ID No: 12345 DOB (blank)Grade: B Department: HR Hours:
(blank)

Example 2:

Spreadsheet1 - ID No:56789 DOB: 28/09/82 Grade: B Department: HR
Hours: 37
Spreadsheet2 - ID No:56789 DOB: 28/09/82 Grade: BS Department: HR
Hours: 25

Result = ID No: 56789 DOB (blank)Grade: BS Department:(blank)Hours: 25

Any help would be greatly appreciated, thankyou in advance.


Rachel
 
N

Nigel

Hi Rachel
Are you saying the the row on sheet1 is directly matched the same row on
sheet2? Or is the sequence different, or are their different records on
each sheet? Since you mention there is a key, I assume that the later is
true and you wish to test each row on sheet 1 with that on sheet 2 and vice
versa - unless one sheet is the master against which you wish to test the
other sheet entries.

I think the answers to the above would help point to a solution, but on the
face of it it will not be quick with 36k rows to cross check.

Cheers
Nigel
 
R

RB Smissaert

Either assign both sheets to arrays and loop through them or use SQL on both
sheets.
Both will be reasonably fast.

RBS
 
K

keepITcool

Rachel,

Have you tried an addin called Synkronizer?

I think it will do exactly what you want.(compare, filter & update
books, sheets & (excel) databases..

AND it will not just work for this particular situation, but you can use
it for all similar compare needs.

Just give it a go.. 30 day trial. http://www.synkronizer.com


keepITcool

< email : keepitcool chello nl (with @ and .) >
< homepage: http://members.chello.nl/keepitcool >
 
A

Al

It would be most efficient to export both sets of data to
Access. Then use queries to find the sets of data you are
looking for.
 
J

Jamie Collins

...
It would be most efficient to export both sets of data to
Access. Then use queries to find the sets of data you are
looking for.

Because Jet is the most efficient database engine available <g>? Or
because you would write horrible dynamic SQL such as:

SELECT
T2.ID,
IIF(T2.DOB=T1.DOB, '', T2.DOB) AS DOB,
IIF(T2.Grade=T1.Grade, '', T2.Grade) AS Grade,
IIF(T2.Department=T1.Department, '',
T2.Department) AS Department,
IIF(T2.Hours=T1.Hours, '', T2.Hours) AS Hours
FROM
SpreadSheet1 T1
INNER JOIN
SpreadSheet1 T2
ON T1.ID=T2.ID
;

Jamie.

--
 
J

Jamie Collins

To correct the typo (original query was valid but the self join is unwanted!):

SELECT
T2.ID,
IIF(T2.DOB=T1.DOB, '', T2.DOB) AS DOB,
IIF(T2.Grade=T1.Grade, '', T2.Grade) AS Grade,
IIF(T2.Department=T1.Department, '', T2.Department) AS Department,
IIF(T2.Hours=T1.Hours, '', T2.Hours) AS Hours
FROM
SpreadSheet1 T1
INNER JOIN
SpreadSheet2 T2
ON T1.ID=T2.ID
;

Jamie.

--
 
R

Rachel Curran

To correct the typo (original query was valid but the self join is unwanted!):

SELECT
T2.ID,
IIF(T2.DOB=T1.DOB, '', T2.DOB) AS DOB,
IIF(T2.Grade=T1.Grade, '', T2.Grade) AS Grade,
IIF(T2.Department=T1.Department, '', T2.Department) AS Department,
IIF(T2.Hours=T1.Hours, '', T2.Hours) AS Hours
FROM
SpreadSheet1 T1
INNER JOIN
SpreadSheet2 T2
ON T1.ID=T2.ID
;

Jamie.

--

hi Jamie,

Thankyou for helping me, i have tried what you suggested but
unfortunately i keep getting error messages with the syntax, could you
possibly look at my code and let me know whats wrong with it please?

He goes:

SELECT
[2ndJuly_Test].GCI,

IIF([2ndJuly_Test]. DATE_OF_BIRTH=[6thJuly_Test]. DATE_OF_BIRTH,
'"', [2ndJuly_Test]. DATE_OF_BIRTH) AS DATE_OF_BIRTH,

IIF([2ndJuly_Test].GRADE=[6thJuly_Test]. GRADE, "'',
[2ndJuly_Test].GRADE) AS GRADE,

IIF([2ndJuly_Test].DEPT_DESCR=[6thJuly_Test].DEPT_DESCR, '"',
[2ndJuly_Test].DEPT_DESCR) AS DEPT_DESCR,

IIF([2ndJuly_Test].BUSINESS_SUB_AREA=[6thJuly_Test].BUSINESS_SUB_AREA,
"'', [2ndJuly_Test].BUSINESS_SUB_AREA) AS BUSINESS_SUB_AREA

FROM
6thJuly_Test

INNER JOIN

2ndJuly_Test

ON [6thJuly_Test].GCI=[2ndJuly_Test].GCI
;

I really appreciate this, thankyou - also could you tell me what the
first line of code actually does and the last four lines please
(curiosity more than anything)

Thanks again

Rachel
 
J

Jamie Collins

Rachel Curran wrote ...
unfortunately i keep getting error messages with the syntax, could you
possibly look at my code and let me know whats wrong with it please?

Two points:

1) There should be no white space (e.g. space characters) between
either side of the dot that separates table name and column/field name
e.g.

MyTable.MyColumn

2) My intension was for the IIF function to return a null string,
represented by two single quotes and nothing in between i.e. Chr$(39)
& Chr$(39).

I prefer to alias the table names for brevity (I tend to write SQL
from scratch and it saves a bit of keying); you may be able to come up
with more meaningful names than T1 and T2 or prefer to use the full
table name.

Here's your query re-written with the above in mind (but untested):

SELECT
T2.GCI,

IIF(T2.DATE_OF_BIRTH=T1.DATE_OF_BIRTH,
'', T2.DATE_OF_BIRTH) AS DATE_OF_BIRTH,

IIF(T2.GRADE=T1. GRADE, '',
T2.GRADE) AS GRADE,

IIF(T2.DEPT_DESCR=T1.DEPT_DESCR, '',
T2.DEPT_DESCR) AS DEPT_DESCR,

IIF(T2.BUSINESS_SUB_AREA=T1.BUSINESS_SUB_AREA,
'', T2.BUSINESS_SUB_AREA) AS BUSINESS_SUB_AREA

FROM
6thJuly_Test T1

INNER JOIN

2ndJuly_Test T2

ON T1.GCI=T2.GCI;
could you tell me what the
first line of code actually does and the last four lines please
(curiosity more than anything)

You seem to want me to explain what a SELECT query with a JOIN is. No
offence, but I think that is beyond the scope of a newsgroup post!
It's good to be curious, though. If you have it, take a look in the MS
Access help files or Google it: there's a wealth of info out there.

HTH,
Jamie.

--
 
R

Rachel Curran

Rachel Curran wrote ...


Two points:

1) There should be no white space (e.g. space characters) between
either side of the dot that separates table name and column/field name
e.g.

MyTable.MyColumn

2) My intension was for the IIF function to return a null string,
represented by two single quotes and nothing in between i.e. Chr$(39)
& Chr$(39).

I prefer to alias the table names for brevity (I tend to write SQL
from scratch and it saves a bit of keying); you may be able to come up
with more meaningful names than T1 and T2 or prefer to use the full
table name.

Here's your query re-written with the above in mind (but untested):

SELECT
T2.GCI,

IIF(T2.DATE_OF_BIRTH=T1.DATE_OF_BIRTH,
'', T2.DATE_OF_BIRTH) AS DATE_OF_BIRTH,

IIF(T2.GRADE=T1. GRADE, '',
T2.GRADE) AS GRADE,

IIF(T2.DEPT_DESCR=T1.DEPT_DESCR, '',
T2.DEPT_DESCR) AS DEPT_DESCR,

IIF(T2.BUSINESS_SUB_AREA=T1.BUSINESS_SUB_AREA,
'', T2.BUSINESS_SUB_AREA) AS BUSINESS_SUB_AREA

FROM
6thJuly_Test T1

INNER JOIN

2ndJuly_Test T2

ON T1.GCI=T2.GCI;


You seem to want me to explain what a SELECT query with a JOIN is. No
offence, but I think that is beyond the scope of a newsgroup post!
It's good to be curious, though. If you have it, take a look in the MS
Access help files or Google it: there's a wealth of info out there.

HTH,
Jamie.

--

Thats a great help, thankyou - only one problem - this brings back all
the gci (primary key) even if no change occurred. So ther are the same
amount of rows in both the tables and query. ive tried "if not null"
as the criteria for the iif statements, but this still didnt work. So
majority of lines hold a gci with nothing populated in all other
columns. Is it possible to eliminate the rows with only the gci on.eg

Result

gci 12345 DOB(blank) NINO (blank) BUSINESS(blank)
gci 45678 DOB(blank) NINO (blank) BUSINESS(blank)
gci 00000 DOB(blank) NINO ab000000Z BUSINESS(blank)
etc

It does however give me all the changes that have occurred, which is
great - i suppose i could always export to excel and filter on non
blanks from that program.

Any ideas please?
 
J

Jamie Collins

Rachel Curran wrote ...
this brings back all
the gci (primary key) even if no change occurred. So ther are the same
amount of rows in both the tables and query.

Append a WHERE clause to your SELECT query e.g.

SELECT
<<column list>>
FROM
<<table list>>
WHERE
T1.DATE_OF_BIRTH <> T2.DATE_OF_BIRTH
OR T1.GRADE <> T2.GRADE
OR T1.DEPT_DESCR <> T2.DEPT_DESCR
OR T1.BUSINESS_SUB_AREA <> T2.BUSINESS_SUB_AREA;

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top