How to de-duplicate according to certain criteria?

M

Maurinnew

Hi

I want to eliminate duplicates according to the following criterias:

1. If a member has at least 1 row with Date of Shot between 9/1/03-3/31/04
and Accept Calls = Yes and Asked Question = Yes, we'll keep that row.
2. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04 and Accept Calls = Yes, we'll keep that row.
3. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04, we'll keep that row.
3. Otherwise, if they have at least 1 row with Accept Calls = Yes and Asked
Question = Yes, we'll keep that row.
4. Otherwise, if they have at least 1 row with Enroll, we'll keep that row.
5. Otherwise, it doesn't matter which row we keep.
After that I run my query ( it looked like It run) but at last I got the
following Access messge
*******************************************************************************

At most one record can be returned by this subquery. (Error 3354)

***A subquery of this kind cannot return more than one record. Revise the
SELECT statement of the subquery to request only one record.
What I am doing wrong? There was no a syntax error message. I have a feeling
that it is an aggregate function CONTRADICTS BETWEEN CONDITION but do not
know how to fix it :-(

DELETE *
FROM makepart1 AS M1
WHERE EXISTS
(SELECT *
FROM makepart1 As M2
WHERE ( M2.[member id] = M1.[member id]
AND M2.[Date of Shot if Known] <> m1.[Date of Shot if Known]
AND M2.[Date of Shot if Known] NOT BETWEEN 9/1/2003 AND 3/31/2004
OR ( M2.[Date of Shot if Known] IS NULL
OR M2.[Asked Question]<>"yes"
OR M2.[Accepts Calls]<>"Yes")
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 AS M4
WHERE M4.[member id] = M1.[member id]
AND [Date of Shot if Known] IS NOT NULL AND [Date of Shot if Known]
BETWEEN 9/1/2003 AND 3/31/2004 AND [Accepts Calls] ="yes"
GROUP BY M4.[member id],M4.[Date of Shot if Known],M4.[Accepts Calls]
HAVING COUNT(*) > 1)));
 
D

david epsom dot com dot au

I see several subqueries there :~) Which one is Access
objecting to? This one looks suspicious:
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)

If it is grouping by M3.[Accepts Calls], can't it have one MAX()
for each M3.[Accepts Calls]?

(david)






Maurinnew said:
Hi

I want to eliminate duplicates according to the following criterias:

1. If a member has at least 1 row with Date of Shot between
9/1/03-3/31/04
and Accept Calls = Yes and Asked Question = Yes, we'll keep that row.
2. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04 and Accept Calls = Yes, we'll keep that row.
3. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04, we'll keep that row.
3. Otherwise, if they have at least 1 row with Accept Calls = Yes and
Asked
Question = Yes, we'll keep that row.
4. Otherwise, if they have at least 1 row with Enroll, we'll keep that
row.
5. Otherwise, it doesn't matter which row we keep.
After that I run my query ( it looked like It run) but at last I got the
following Access messge:
*******************************************************************************

At most one record can be returned by this subquery. (Error 3354)

***A subquery of this kind cannot return more than one record. Revise the
SELECT statement of the subquery to request only one record.
What I am doing wrong? There was no a syntax error message. I have a
feeling
that it is an aggregate function CONTRADICTS BETWEEN CONDITION but do not
know how to fix it :-(

DELETE *
FROM makepart1 AS M1
WHERE EXISTS
(SELECT *
FROM makepart1 As M2
WHERE ( M2.[member id] = M1.[member id]
AND M2.[Date of Shot if Known] <> m1.[Date of Shot if Known]
AND M2.[Date of Shot if Known] NOT BETWEEN 9/1/2003 AND 3/31/2004
OR ( M2.[Date of Shot if Known] IS NULL
OR M2.[Asked Question]<>"yes"
OR M2.[Accepts Calls]<>"Yes")
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 AS M4
WHERE M4.[member id] = M1.[member id]
AND [Date of Shot if Known] IS NOT NULL AND [Date of Shot if Known]
BETWEEN 9/1/2003 AND 3/31/2004 AND [Accepts Calls] ="yes"
GROUP BY M4.[member id],M4.[Date of Shot if Known],M4.[Accepts Calls]
HAVING COUNT(*) > 1)));
 
M

Maurinnew

David,

It looks like it is too cumbersome way to do it through SQL (IF IT IS
POSSIBLE AT ALL!). Although I am not familiar with VBA I attempted to create
a module with a repetion, checking conditions within the same [MEMBER ID].
This is my 1st code in VBA:-(

What do you think? Does it look reasonable in order to implement this task?
I doubt especially in the last condition. The point is that if above
conditions are not true ...then I still need any record (regardless the
criteria)


Sub deleteDupl()
Dim cmd As ADODB.Command
Dim rst As Recordset
Dim strSQL As String
Dim mID As String

Set cmd = New ADODB.Command
Set rst = New ADODB.Recordset

'fill a recordset with distinct member ID's
cmd.ActiveConnection = CurrentProject.Connection
strSQL = "SELECT DISTINCT [MEMBER ID] FROM makepart1"
cmd.CommandText = strSQL
Set rst = cmd.Execute
'delete cycle

if not rst.eof then rst.MoveFirst
dO wHILE nOT ('ORIG_')

mID =[Member id]

DO
DO CASE

case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004
AND [ACCEPT CALLS]="yes"
AND [ASKED QUESTION]="yes"

rst.MoveNext

[Date of Shot if Known] between 9/1/2003 AND 3/31/2004
AND [ACCEPT CALLS]="yes"

rst.MoveNext
case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004

AND [ASKED QUESTION]="yes"

rst.MoveNext
case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004

rst.MoveNext
CASE
[ACCEPT CALLS]="yes"
AND [ASKED QUESTION]="yes"

rst.MoveNext
CASE
[ACCEPT CALLS]="yes"

rst.MoveNext

CASE
[ASKED QUESTION]="yes"

rst.MoveNext
OTHERWISE

delete
END CASE

LOOP UNTIL [MEMBER ID]=mID
LOOP

Set rst = Nothing
Set cmd = Nothing

End Sub

david epsom dot com dot au said:
I see several subqueries there :~) Which one is Access
objecting to? This one looks suspicious:
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)

If it is grouping by M3.[Accepts Calls], can't it have one MAX()
for each M3.[Accepts Calls]?

(david)






Maurinnew said:
Hi

I want to eliminate duplicates according to the following criterias:

1. If a member has at least 1 row with Date of Shot between
9/1/03-3/31/04
and Accept Calls = Yes and Asked Question = Yes, we'll keep that row.
2. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04 and Accept Calls = Yes, we'll keep that row.
3. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04, we'll keep that row.
3. Otherwise, if they have at least 1 row with Accept Calls = Yes and
Asked
Question = Yes, we'll keep that row.
4. Otherwise, if they have at least 1 row with Enroll, we'll keep that
row.
5. Otherwise, it doesn't matter which row we keep.
After that I run my query ( it looked like It run) but at last I got the
following Access messge:
*******************************************************************************

At most one record can be returned by this subquery. (Error 3354)

***A subquery of this kind cannot return more than one record. Revise the
SELECT statement of the subquery to request only one record.
What I am doing wrong? There was no a syntax error message. I have a
feeling
that it is an aggregate function CONTRADICTS BETWEEN CONDITION but do not
know how to fix it :-(

DELETE *
FROM makepart1 AS M1
WHERE EXISTS
(SELECT *
FROM makepart1 As M2
WHERE ( M2.[member id] = M1.[member id]
AND M2.[Date of Shot if Known] <> m1.[Date of Shot if Known]
AND M2.[Date of Shot if Known] NOT BETWEEN 9/1/2003 AND 3/31/2004
OR ( M2.[Date of Shot if Known] IS NULL
OR M2.[Asked Question]<>"yes"
OR M2.[Accepts Calls]<>"Yes")
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 AS M4
WHERE M4.[member id] = M1.[member id]
AND [Date of Shot if Known] IS NOT NULL AND [Date of Shot if Known]
BETWEEN 9/1/2003 AND 3/31/2004 AND [Accepts Calls] ="yes"
GROUP BY M4.[member id],M4.[Date of Shot if Known],M4.[Accepts Calls]
HAVING COUNT(*) > 1)));
 
M

Maurinnew

Actually I just updated it but it doesn't work yet and generates a compile
error: syntax error.

However my concern is mostly logic according to my purpose...Can I dedup
this way without deleteng those who do not have dups?!

Sub deleteDupl()
Dim cmd As ADODB.Command
Dim rst As Recordset
Dim strSQL As String
Dim mID As String
Dim dShot As Date
Dim mAccept As String
Dim mAsked As String

Set cmd = New ADODB.Command
Set rst = New ADODB.Recordset

'fill a recordset with distinct member ID's
cmd.ActiveConnection = CurrentProject.Connection
strSQL = "SELECT DISTINCT [MEMBER ID] FROM makepart1"
cmd.CommandText = strSQL
Set rst = cmd.Execute
'delete cycle

If Not rst.EOF Then rst.MoveFirst
Do While Not (orig_)

mID = orig_.[MEMBER ID]
mAccept = orig_.[ACCEPTS CALLS]
mAsked = orig_.[Asked Question]
dShot = orig_.[date Of Shot If Known]


Do While orig_.[Member_id]=mID
Select Case mID
Case 1
dShot >= 09/01/03 AND dShot <= 3/31/2004 AND bAccept =
"y" And mAsked = "YES"

rst.MoveNext

Case 2

dShot between 9/1/2003 AND 3/31/2004 AND bAccept="y"

rst.MoveNext

Case 3
dShot between 9/1/2003 AND 3/31/2004 AND mAsked ="YES"

rst.MoveNext

Case 4
dShot between 9/1/2003 AND 3/31/2004

rst.MoveNext

Case 5
bAccept = "yes" And mAsked = "YES"

rst.MoveNext

Case 6

bAccept = "y"

rst.MoveNext

Case 7
mAsked = "YES"

rst.MoveNext

Case Else

rst.Delete

End Select

Loop
Loop


Set rst = Nothing
Set cmd = Nothing

End Sub





Maurinnew said:
David,

It looks like it is too cumbersome way to do it through SQL (IF IT IS
POSSIBLE AT ALL!). Although I am not familiar with VBA I attempted to create
a module with a repetion, checking conditions within the same [MEMBER ID].
This is my 1st code in VBA:-(

What do you think? Does it look reasonable in order to implement this task?
I doubt especially in the last condition. The point is that if above
conditions are not true ...then I still need any record (regardless the
criteria)


Sub deleteDupl()
Dim cmd As ADODB.Command
Dim rst As Recordset
Dim strSQL As String
Dim mID As String

Set cmd = New ADODB.Command
Set rst = New ADODB.Recordset

'fill a recordset with distinct member ID's
cmd.ActiveConnection = CurrentProject.Connection
strSQL = "SELECT DISTINCT [MEMBER ID] FROM makepart1"
cmd.CommandText = strSQL
Set rst = cmd.Execute
'delete cycle

if not rst.eof then rst.MoveFirst
dO wHILE nOT ('ORIG_')

mID =[Member id]

DO
DO CASE

case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004
AND [ACCEPT CALLS]="yes"
AND [ASKED QUESTION]="yes"

rst.MoveNext

[Date of Shot if Known] between 9/1/2003 AND 3/31/2004
AND [ACCEPT CALLS]="yes"

rst.MoveNext
case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004

AND [ASKED QUESTION]="yes"

rst.MoveNext
case
[Date of Shot if Known] between 9/1/2003 AND 3/31/2004

rst.MoveNext
CASE
[ACCEPT CALLS]="yes"
AND [ASKED QUESTION]="yes"

rst.MoveNext
CASE
[ACCEPT CALLS]="yes"

rst.MoveNext

CASE
[ASKED QUESTION]="yes"

rst.MoveNext
OTHERWISE

delete
END CASE

LOOP UNTIL [MEMBER ID]=mID
LOOP

Set rst = Nothing
Set cmd = Nothing

End Sub

david epsom dot com dot au said:
I see several subqueries there :~) Which one is Access
objecting to? This one looks suspicious:
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)

If it is grouping by M3.[Accepts Calls], can't it have one MAX()
for each M3.[Accepts Calls]?

(david)






Maurinnew said:
Hi

I want to eliminate duplicates according to the following criterias:

1. If a member has at least 1 row with Date of Shot between
9/1/03-3/31/04
and Accept Calls = Yes and Asked Question = Yes, we'll keep that row.
2. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04 and Accept Calls = Yes, we'll keep that row.
3. Otherwise, if they have at least 1 row with Date of Shot between
9/1/03-3/31/04, we'll keep that row.
3. Otherwise, if they have at least 1 row with Accept Calls = Yes and
Asked
Question = Yes, we'll keep that row.
4. Otherwise, if they have at least 1 row with Enroll, we'll keep that
row.
5. Otherwise, it doesn't matter which row we keep.
After that I run my query ( it looked like It run) but at last I got the
following Access messge:
*******************************************************************************

At most one record can be returned by this subquery. (Error 3354)

***A subquery of this kind cannot return more than one record. Revise the
SELECT statement of the subquery to request only one record.
What I am doing wrong? There was no a syntax error message. I have a
feeling
that it is an aggregate function CONTRADICTS BETWEEN CONDITION but do not
know how to fix it :-(

DELETE *
FROM makepart1 AS M1
WHERE EXISTS
(SELECT *
FROM makepart1 As M2
WHERE ( M2.[member id] = M1.[member id]
AND M2.[Date of Shot if Known] <> m1.[Date of Shot if Known]
AND M2.[Date of Shot if Known] NOT BETWEEN 9/1/2003 AND 3/31/2004
OR ( M2.[Date of Shot if Known] IS NULL
OR M2.[Asked Question]<>"yes"
OR M2.[Accepts Calls]<>"Yes")
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 as M3
WHERE M3.[MEMBER ID]=M1.[Member id]
GROUP BY M3.[member id], M3.[Date of Shot if Known] ,M3.[Accepts Calls]
HAVING COUNT(*) > 1)
AND [Date of Shot if Known] =
(SELECT MAX([Date of Shot if Known])
FROM makepart1 AS M4
WHERE M4.[member id] = M1.[member id]
AND [Date of Shot if Known] IS NOT NULL AND [Date of Shot if Known]
BETWEEN 9/1/2003 AND 3/31/2004 AND [Accepts Calls] ="yes"
GROUP BY M4.[member id],M4.[Date of Shot if Known],M4.[Accepts Calls]
HAVING COUNT(*) > 1)));
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top