Grouping variable length values

  • Thread starter bluegrassstateworker
  • Start date
B

bluegrassstateworker

I have a field called contractnum which contains a variable-length
string from 8 -17
characters. My difficulty is that when I sort them, I would like to
group them based on on the section of digits just before the last.
The 6 entries below would be grouped as described. My problem is
differnet lengths in various entries.
The last digit in some entries may represent a change to the
original. For example,
123 and P123555 both represent original documents and the last digits
represent changes to those.

123 (=original number)
123 1 (a modification to the original 123)
123 2 (a second modification to the original 123)

These below would also be grouped together but are of a different
format but would still be sorted and grouped thusly:
P123555
P123555 1
P123555 2

Any ideas how to achieve this grouping would be appreciated.
 
M

Marshall Barton

bluegrassstateworker said:
I have a field called contractnum which contains a variable-length
string from 8 -17
characters. My difficulty is that when I sort them, I would like to
group them based on on the section of digits just before the last.
The 6 entries below would be grouped as described. My problem is
differnet lengths in various entries.
The last digit in some entries may represent a change to the
original. For example,
123 and P123555 both represent original documents and the last digits
represent changes to those.

123 (=original number)
123 1 (a modification to the original 123)
123 2 (a second modification to the original 123)

These below would also be grouped together but are of a different
format but would still be sorted and grouped thusly:
P123555
P123555 1
P123555 2


It seems like the presence of a space character is the key
to this problem. If the space can be used reliably to
delimit the modification, then you can group on an
expression like:

=IIf(Instr(contractnum, " ") > 0, Left(contractnum,
Instr(contractnum, " ") - 1), contractnum)
 
B

bluegrassstateworker

bluegrassstateworker said:
I have a field called contractnum which contains a variable-length
string from 8 -17
characters. My difficulty is that when I sort them, I would like to
group them based on on the section of digits just before the last.
The 6 entries below would be grouped as described. My problem is
differnet lengths in various entries.
The last digit in some entries may represent a change to the
original. For example,
123 and P123555 both represent original documents and the last digits
represent changes to those.
123 (=original number)
123 1 (a modification to the original 123)
123 2 (a second modification to the original 123)
These below would also be grouped together but are of a different
format but would still be sorted and grouped thusly:
P123555
P123555 1
P123555 2

It seems like the presence of a space character is the key
to this problem. If the space can be used reliably to
delimit the modification, then you can group on an
expression like:

=IIf(Instr(contractnum, " ") > 0, Left(contractnum,
Instr(contractnum, " ") - 1), contractnum)

--
Marsh
MVP [MS Access]- Hide quoted text -

- Show quoted text -

Thanks Marsh. This is a good avenue for me to pursue. There can be
multiple and varied spaces in contractnum but it is the one on the far
right that makes the difference.
 
M

Marshall Barton

bluegrassstateworker said:
Thanks Marsh. This is a good avenue for me to pursue. There can be
multiple and varied spaces in contractnum but it is the one on the far
right that makes the difference.


Better examples might have avoided this confusion.

You can find the last space using:

=IIf(InStrRev(contractnum, " ") > 0, Left(contractnum,
InStrRev(contractnum, " ") - 1), contractnum)

BUT if contract numbers can contain a space, how do you
propose to distinguish between:
a number with a space and no revision: X12345 6
and
a number without a space, but with a revision: X12345 6
???

It sounds like more examples of representative contract
numbers are needed to determine if this is even possible,
and. if it is possible, how to go about it.
 
B

bluegrassstateworker

Thanks Marsh. This is a good avenue for me to pursue. There can be
multiple and varied spaces in contractnum but it is the one on the far
right that makes the difference.

Better examples might have avoided this confusion.

You can find the last space using:

=IIf(InStrRev(contractnum, " ") > 0, Left(contractnum,
InStrRev(contractnum, " ") - 1), contractnum)

BUT if contract numbers can contain a space, how do you
propose to distinguish between:
a number with a space and no revision: X12345 6
and
a number without a space, but with a revision: X12345 6
???

It sounds like more examples of representative contract
numbers are needed to determine if this is even possible,
and. if it is possible, how to go about it.

--
Marsh
MVP [MS Access]- Hide quoted text -

- Show quoted text -

In looking at the data, there are more differences than I thought.
These contract numbers represent formats adopted in different years by
various groups. Below are some of the contract numbers (edited a
second time as noted by the number 2 at the end) that might be used:

Blank/null = an error. There should be data but there is not.
PO2 721 535353535 2
PON2 721 747474878 2
728 M-78665778 2
M-06234695244 2

My initial thought in the programming process was to sort them, then
compare the current contractnum value to the next contractnum value
with the last digit stripped off. If they are equal, then that is
what make them grouped together. I thought the instr() function
would work but the variety is making it more challenging.
 
B

bluegrassstateworker

In looking at the data, there are more differences than I thought.
These contract numbers represent formats adopted in different years by
various groups. Below are some of the contract numbers (edited a
second time as noted by the number 2 at the end) that might be used:

Blank/null = an error. There should be data but there is not.
PO2 721 535353535 2
PON2 721 747474878 2
728 M-78665778 2
M-06234695244 2

My initial thought in the programming process was to sort them, then
compare the current contractnum value to the next contractnum value
with the last digit stripped off. If they are equal, then that is
what make them grouped together. I thought the instr() function
would work but the variety is making it more challenging.- Hide quoted text -

- Show quoted text -

Below is my formula but something is amiss.
IIf(IsNull(Right([contractnum],
2)),Left([contractnum],Len([contractnum]-2)),[contractnum])

If I sort on the contract number, then all similar numbers will be
adjacent to each other. Since the last digit of an "edited contract"
will always have a space and a number then I can create a field in the
query of this report using the above formula. The idea is to strip
off the last two digits of an edited contract entry making the value
of this query field the same value of the "original contract" on which
I can have the report perform the grouping. I think I am on the right
path, something though is wrong with my formula. Any ideas? I will
need to tackle this in the morning with fresh coffee rather than the
end of the day.
 
M

Marshall Barton

bluegrassstateworker said:
In looking at the data, there are more differences than I thought.
These contract numbers represent formats adopted in different years by
various groups. Below are some of the contract numbers (edited a
second time as noted by the number 2 at the end) that might be used:

Blank/null = an error. There should be data but there is not.
PO2 721 535353535 2
PON2 721 747474878 2
728 M-78665778 2
M-06234695244 2

My initial thought in the programming process was to sort them, then
compare the current contractnum value to the next contractnum value
with the last digit stripped off. If they are equal, then that is
what make them grouped together. I thought the instr() function
would work but the variety is making it more challenging.


Take another look to see if **all** the numbers have a
single digit version part AND if **none** of the numbers
without revision have only a single digit after the space.

If both of those conditions holds true throughout the table,
then you can use:

=IIf(contractnum Like "* #", Left(contractnum,
Len(contractnum) - 2), contractnum)
 
B

bluegrassstateworker

bluegrassstateworkerwrote:




Take another look to see if **all** the numbers have a
single digit version part AND if **none** of the numbers
without revision have only a single digit after the space.

If both of those conditions holds true throughout the table,
then you can use:

=IIf(contractnum Like "* #", Left(contractnum,
Len(contractnum) - 2), contractnum)

After review, All numbers do NOT have a single digit at the end ....
they can go into the double digits. If I do a sort on all of the
contract numbers I <could> make some generalities perhaps in groupings
that would statistically work. What if I did a comparision of the
prior in part of a string length? For example:
PO2 721 0785432112
PO2 721 0785432112 2
PO2 721 0785432112 4
PO2 721 0785432112 5
PO 721 0785432135
PO 721 0785432135 2

If I consider a string of 6-10 elements (the more the better) of each
string in the comparision using the INSTR function and assign a
variable of 1 to that if the condition were true, and a 0 value if
false then that might work. In my favor is that it is rare that there
is a sequential difference of only one digit; others outside our
department are being issued unique numbers as well. If I assign a
value as such in a query field, then I can group on the change in that
field: every 0 value would be indicative of a new parent number and a
1 would be indicative of an edit. I would have a field with nothing
but 1s and 0s. Since all of the numbers are sorted, then that just
might achieve the effect I am desiring. What I will do is find the
minimum data length in the data set and the typical length of the
unedited contract number. Only the early imported numbers will be
short so I could increase accuracy when these historical numbers are
able to fall out of the system.

Would this approach be valid in its assumptions?
 
M

Marshall Barton

bluegrassstateworker said:
After review, All numbers do NOT have a single digit at the end ....
they can go into the double digits. If I do a sort on all of the
contract numbers I <could> make some generalities perhaps in groupings
that would statistically work. What if I did a comparision of the
prior in part of a string length? For example:
PO2 721 0785432112
PO2 721 0785432112 2
PO2 721 0785432112 4
PO2 721 0785432112 5
PO 721 0785432135
PO 721 0785432135 2

If I consider a string of 6-10 elements (the more the better) of each
string in the comparision using the INSTR function and assign a
variable of 1 to that if the condition were true, and a 0 value if
false then that might work. In my favor is that it is rare that there
is a sequential difference of only one digit; others outside our
department are being issued unique numbers as well. If I assign a
value as such in a query field, then I can group on the change in that
field: every 0 value would be indicative of a new parent number and a
1 would be indicative of an edit. I would have a field with nothing
but 1s and 0s. Since all of the numbers are sorted, then that just
might achieve the effect I am desiring. What I will do is find the
minimum data length in the data set and the typical length of the
unedited contract number. Only the early imported numbers will be
short so I could increase accuracy when these historical numbers are
able to fall out of the system.

Would this approach be valid in its assumptions?

That might be valid, but it would take a lot of complicated
code in a function to do it.

Here's a similar idea based on your idea, except it deals
with the entire number in one shot. It might(?) be too slow
to be practical though. If it is slow, make sure you have a
(unique?) index on the contract number field.

Give it a try and see if the Base field does what you want.

SELECT T.ContractNum,
Nz(X.ContractNum, T.ContractNum) As Base
FROM table AS T LEFT JOIN table AS X
ON T.ContractNum Like X.ContractNum & " #"
OR T.ContractNum Like X.ContractNum & " ##"

Note that you can not use the QBE with this kind of JOIN.
After you get the query straightened out in SQL view, save
it and never switch it back to design view.
 
B

bluegrassstateworker

bluegrassstateworkerwrote:
After review, All numbers do NOT have a single digit at the end ....
they can go into the double digits. If I do a sort on all of the
contract numbers I <could> make some generalities perhaps in groupings
that would statistically work. What if I did a comparision of the
prior in part of a string length? For example:
PO2 721 0785432112
PO2 721 0785432112 2
PO2 721 0785432112 4
PO2 721 0785432112 5
PO 721 0785432135
PO 721 0785432135 2
If I consider a string of 6-10 elements (the more the better) of each
string in the comparision using the INSTR function and assign a
variable of 1 to that if the condition were true, and a 0 value if
false then that might work. In my favor is that it is rare that there
is a sequential difference of only one digit; others outside our
department are being issued unique numbers as well. If I assign a
value as such in a query field, then I can group on the change in that
field: every 0 value would be indicative of a new parent number and a
1 would be indicative of an edit. I would have a field with nothing
but 1s and 0s. Since all of the numbers are sorted, then that just
might achieve the effect I am desiring. What I will do is find the
minimum data length in the data set and the typical length of the
unedited contract number. Only the early imported numbers will be
short so I could increase accuracy when these historical numbers are
able to fall out of the system.
Would this approach be valid in its assumptions?

That might be valid, but it would take a lot of complicated
code in a function to do it.

Here's a similar idea based on your idea, except it deals
with the entire number in one shot. It might(?) be too slow
to be practical though. If it is slow, make sure you have a
(unique?) index on the contract number field.

Give it a try and see if the Base field does what you want.

SELECT T.ContractNum,
Nz(X.ContractNum, T.ContractNum) As Base
FROM table AS T LEFT JOIN table AS X
ON T.ContractNum Like X.ContractNum & " #"
OR T.ContractNum Like X.ContractNum & " ##"

Note that you can not use the QBE with this kind of JOIN.
After you get the query straightened out in SQL view, save
it and never switch it back to design view.

--
Marsh
MVP [MS Access]- Hide quoted text -

- Show quoted text -

Everyone (and Marsh especially)
Here is what I did to create an acceptable result:

In the query, I created a field with the following syntax:
SPACES: InStrRev([contractnum]," "). This would allow me to analyze
any usable data trend.

I then surveyed the entries and found that all valid entries would be
greater than 16. Based on that assumption for a statistical majority
(thats the best I could hope for in this case), I created the
following additional calculated field in the query:

CORECONT: IIf(InStrRev([contractnum],"
")>16,Left([contractnum],Len([contractnum])-2),[contractnum])

This gave me the common contract number without any of the edits on
the end. I could do it this way since there was certainty there would
not be edits greater than 99. I used the INSTrRev function to obtain
the space at the end. Now, I can use the query containing both the
contract number and CORECONT fields to create a report grouping based
on CORECONT but display Contnum.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top