Complicated Filter/Duplicate Question

H

H E Johnson

I'm building a log that needs to be able to spot duplicate IDs, but not block
them. They won't always be exactly the same, so I need it to flag "similar"
IDs by looking at the sets of numbers within the IDs and telling me that ID A
"matches" ID B. I would like it to be automatic. I would use the "Find
Duplicates" function, however, I can't specify what parts of my field I want
looked at. Can this be done? Flagging fields with "similar" parts?
 
J

John W. Vinson

On Tue, 12 May 2009 14:33:01 -0700, H E Johnson <H E
I'm building a log that needs to be able to spot duplicate IDs, but not block
them. They won't always be exactly the same, so I need it to flag "similar"
IDs by looking at the sets of numbers within the IDs and telling me that ID A
"matches" ID B. I would like it to be automatic. I would use the "Find
Duplicates" function, however, I can't specify what parts of my field I want
looked at. Can this be done? Flagging fields with "similar" parts?

Possibly, but unless you tell us your field's datatype and some examples of
what you consider "similar" it's impossible to give specific advice.
 
H

H E Johnson

The field will hold Identification. It can vary from just numbers, just
letters or a mix of both. The duplicates that I will be looking for will
probably have letters and numbers, such as ACBL 1452. But if I want to catch
it's duplicate, I have to be clever, because it can also be entered as
ACBL-1452 or ACBL 1452 B. And it's the same thing. So I was wanting it to
automatically (perhaps during the save, it can run a check; manually checking
would be time consuming.) look at the number portion of the ID. I hope that
helps.. and makes sense.
 
J

John W. Vinson

The field will hold Identification. It can vary from just numbers, just
letters or a mix of both. The duplicates that I will be looking for will
probably have letters and numbers, such as ACBL 1452. But if I want to catch
it's duplicate, I have to be clever, because it can also be entered as
ACBL-1452 or ACBL 1452 B. And it's the same thing. So I was wanting it to
automatically (perhaps during the save, it can run a check; manually checking
would be time consuming.) look at the number portion of the ID. I hope that
helps.. and makes sense.

Well, computers are of course very literal minded. Values are either the same
or different - "sort of the same" is a foreign concept! How about

AXYZ 1452
AXBC 1452

or

ABCL 14 52
ABCL 1 452
ABCL 1-452

Are they "the same"? or the same enough to force a manual check?

What you could do (at the cost of some inefficiency!) is have a little VBA
function that extracts just the numeric substring:

Public Function JustDigits(strIn As String) As String
Dim iPos As Integer
JustDigit = ""
For iPos = 1 to Len(strIn)
If IsNumeric(Mid(strIn, iPos, 1)) Then
JustDigit = JustDigit & Mid(strIn, iPos, 1)
End If
Loop
End Function

This will return "1452" for all the examples above, and you can compare the
value of the textbox to the value in the table. That's where the inefficiency
REALLY grabs, because if you have 12400 part numbers in the table you'll have
to parse every single one of them, unless you store the digit portion in a
separate indexed field.

It's clear that this numbering scheme was designed without computers in mind!
It's easy for a human to look at "ABCL 1452" and "ABCL-1452 B" and say "yeah,
those are the same" - but it's a lot more work for a poor dumb computer!
 
J

John W. Vinson

The field will hold Identification. It can vary from just numbers, just
letters or a mix of both. The duplicates that I will be looking for will
probably have letters and numbers, such as ACBL 1452. But if I want to catch
it's duplicate, I have to be clever, because it can also be entered as
ACBL-1452 or ACBL 1452 B. And it's the same thing. So I was wanting it to
automatically (perhaps during the save, it can run a check; manually checking
would be time consuming.) look at the number portion of the ID. I hope that
helps.. and makes sense.

Hrm. I posted this this morning but it seems to have gotten lost; here's a
copy...

Well, computers are of course very literal minded. Values are either the same
or different - "sort of the same" is a foreign concept! How about

AXYZ 1452
AXBC 1452

or

ABCL 14 52
ABCL 1 452
ABCL 1-452

Are they "the same"? or the same enough to force a manual check?

What you could do (at the cost of some inefficiency!) is have a little VBA
function that extracts just the numeric substring:

Public Function JustDigits(strIn As String) As String
Dim iPos As Integer
JustDigit = ""
For iPos = 1 to Len(strIn)
If IsNumeric(Mid(strIn, iPos, 1)) Then
JustDigit = JustDigit & Mid(strIn, iPos, 1)
End If
Loop
End Function

This will return "1452" for all the examples above, and you can compare the
value of the textbox to the value in the table. That's where the inefficiency
REALLY grabs, because if you have 12400 part numbers in the table you'll have
to parse every single one of them, unless you store the digit portion in a
separate indexed field.

It's clear that this numbering scheme was designed without computers in mind!
It's easy for a human to look at "ABCL 1452" and "ABCL-1452 B" and say "yeah,
those are the same" - but it's a lot more work for a poor dumb computer!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top