Deleting Large mounts of duplicate data

2

2vayla

I have 26,000 names and 2,000 are duplicates. I needs to delete the names
with duplicate social security numbers. I know I would set up a delete query,
but I don't know what criteria to specify. Everytime I set up the delete
query, it will choose 5,000 names to delete opposed to the 2,000. How would I
narrow this down more to make sure I don't loose the rest of that data. Help!!
 
J

John Spencer

How do you know which record to keep and which record to delete? Do you have
a primary key in the table?

STEP 1: BACKUP your data before attempting the following.
STEP 2: BACKUP your data before attempting the following.

Without a backup you cannot restore the data if this does not work the way you
expect.

If it makes no difference to you which record you keep and which record you
delete then you could just
-- copy the table structure with no records
-- add a unique index on SSN
-- Append all the records in the original table to the new table
-- Replace the original table with the new table.

Of course that may not work because of the way you have the relationships set up.

Another way (assuming a primary key)
DELETE
FROM SSNTable
WHERE SSNTable.PrimaryKey NOT IN
(SELECT First(PrimaryKey)
FROM SSNTable
GROUP BY SSNField)

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
 
2

2vayla

Thank you John I will try this!

John said:
How do you know which record to keep and which record to delete? Do you have
a primary key in the table?

STEP 1: BACKUP your data before attempting the following.
STEP 2: BACKUP your data before attempting the following.

Without a backup you cannot restore the data if this does not work the way you
expect.

If it makes no difference to you which record you keep and which record you
delete then you could just
-- copy the table structure with no records
-- add a unique index on SSN
-- Append all the records in the original table to the new table
-- Replace the original table with the new table.

Of course that may not work because of the way you have the relationships set up.

Another way (assuming a primary key)
DELETE
FROM SSNTable
WHERE SSNTable.PrimaryKey NOT IN
(SELECT First(PrimaryKey)
FROM SSNTable
GROUP BY SSNField)

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top