yet another remove duplicates question

D

Dave Helps

Hi

I've searched all around the net, but no "solution" seems to do what I'm
trying to achieve.

I have a database that imports data from a group of Excel spreadsheets. Due
to the manner in which these are generated, some duplicate information will
exist in the database after importing.

The table in question is called LOG, and has a unique primary key. The
remaining fields are EntryDate, EntryTime, EntryType and EntryMessage.

I would like to make a single query using SQL in Access that searches the
LOG table and removes all but the oldest record where EntryDate, EntryTime,
EntryType and EntryMessage are identical.

I figure there are few ways to do this:
1. Rename LOG to OLD_LOG, do a SELECT DISTINCT into LOG.
2. Use a GROUPBY and COUNT to return duplicates, and then delete those with
ENTRYID > MIN(ENTRYID).
3. Use a SELECT query to find duplicate records and save them into
DUPLICATES, then use a DELETE query WHERE LOG.* = DUPLICATES.*


The trouble is although I understand the principles of databases, I have
only a basic knowledge of SQL so I'm kindof winging it. Perhaps someone out
there would be kind enough to help?

Thanks

..Dave
 
M

Michel Walsh

Append the data into a table with an index not allowing duplicated field(s)
values. There will be errors, when you will append the data, but if the
error is only about 'duplicated values' were not appended to the table,
well, that is what you wanted.

More robust would be to use a GROUP BY query, GROUP on field(s) defining the
duplication, and aggregate all other fields with LAST:


SELECT f1, f2, LAST(f3), LAST(f4), LAST(f5)
FROM myTableWIthDups
GROUP BY f1, f2



will list you no-dup pair of values from any couple {f1, f2}



So, basically you don't delete anything, you SELECT without duplication,
with that method.


Hoping it may help,
Vanderghast, Access MVP
 
D

Dave Helps

Thanks for responding.

So there is a LAST command, is there also a FIRST?

The thing is I want the duplicate data removing, so how would I modify your
reponse so that it would delete the original table and replace it with the
results of the query you suggested?
 
M

Michel Walsh

Instead of deleting, you select, as suggested, and make a table out of it,
if necessary.

Yes, there is also a FIRST, but do not consider they MEAN earliest and
latest. They just refer to the first/last record seen by the database engine
WHILE executing the query. They do not correspond necessary to the first and
last record appended to the table. Their usefulness is that you can be sure
that all fields aggregated by LAST will come from the SAME record (among all
the records of the given GROUP). Same for FIRST.


Hoping it may help,
Vanderghast, Access MVP
 
D

Dale Fye

Dave, I am very leary of using First and Last. Is your EntryMessage a text
or a Memo field? If text, the following should work. Backup your table
before attempting this.

Create two queries, the first will identify the MIN(ID) of the duplicate
records, the second will delete all records that are not in the first query.

1. SELECT MIN(ID) as EarliestDup
FROM LOG
GROUP BY EntryDate, EntryTime, EntryType, EntryMessage

2. DELETE FROM LOG
WHERE DCOUNT("EarliestDup", "qry_EarliestDup", "EarliestDup = " &
Log.ID) = 0

Now that I think about it. When you are importing the data, why not import
it into a temporary table, then join that table to the LOG table on the 4
fields, and only import the records into the LOG table that don't have
matches? That would be alot simpler.

Dale
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top