Help with a Regex Pattern

bob.eastman

I have data which is in this format:

J 123 K

The J is optional, could be upper or lower case, and could be a J or
an X or a Z.

It is always followed by a space, then the three digits, then a space.
Then any alphabetic character, upper or lower case. That one is not
optional; the character must be there.

I am having trouble with the pattern, could someone show me how to set
it up?

tia
bob


Const sPattern As String = "([JjXxZz]\s)?\d{1,3}(?=\ )(\D\s)"

Set oRegex = New RegExp
oRegex.Pattern = sPattern
oRegex.Global = True

'check all of the rows
With ActiveSheet
    S = .Cells(lR, iCWD).Value2
    If oRegex.Test(S) = True Then
        Set colmatches = oRegex.Execute(S)
        strData = colmatches(0)
        'convert the dewey to numeric if it does not
        'have a leading alpha
        If (bStringsT_IsLongInteger(strData) = True) Then
            .Cells(lR, iCO).Value2 = iStringsT_StringToIntegerNumber(strData)
        Else
            .Cells(lR, iCO).Value2 = colmatches(0)
        End If
    End If
End With
 
Otto Moehrbach

What do you mean you "are having trouble with the pattern"? What are you
trying to do? What do you want to see happen? What do you want Excel to do
for you? HTH Otto
 
Joel

I haven't tried it, but I thought it would be this:

Const sPattern As String = "([JjXxZz]?)d{1,3}(?=\ )[A-Za-z]"
 
Joel

Otto: It is obvious from the code that he is working with regular
expressions. The patterns he is working with go back to the development of
UNIX at Bell Labs (and probably earlier than that). The syntax that Bob is
using is a small pattern language for describing strings of words and
characters.

Bell Labs did lots of research on pattern recognition like this to develop
efficient methods for searching for names in a phone book. Computers were
very slow and memory was very expensive in the 1970s. Bell Labs was
trying to save money by finding efficient methods for storing their phone
books electronically and finding names quickly using computers, so operators
didn't have to look up people's names manually.

That was one of the main reasons UNIX was developed. Bell Labs had lots of
computer systems that couldn't talk to each other (phone book, billing,
switching equipment) and wanted to develop one computer language that could
be used by all their computers.

 
Ron Rosenfeld

([JjXxZz]\s)?\d{3}\s[A-Za-z]
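
For anyone who wants to try this in the VBA editor, here is a minimal sketch.
The Sub name TryPattern and the sample strings are made up for illustration,
and it assumes the VBScript RegExp object can be created through late binding
(CreateObject), so no library reference is needed. It runs the pattern from
the original post and the shorter pattern above against the same inputs.

```vba
' Illustrative sketch only: TryPattern and the sample strings are hypothetical.
Sub TryPattern()
    ' Pattern from the original post.
    Const sOriginal As String = "([JjXxZz]\s)?\d{1,3}(?=\ )(\D\s)"
    ' Shorter pattern suggested above.
    Const sSuggested As String = "([JjXxZz]\s)?\d{3}\s[A-Za-z]"

    Dim oOriginal As Object
    Dim oSuggested As Object
    Dim vSample As Variant

    ' Late binding: no reference to the regular expressions library is required.
    Set oOriginal = CreateObject("VBScript.RegExp")
    oOriginal.Pattern = sOriginal
    Set oSuggested = CreateObject("VBScript.RegExp")
    oSuggested.Pattern = sSuggested

    ' Columns printed: sample text, result with original, result with suggested.
    For Each vSample In Array("J 123 K", "x 456 b", "123 K", "J 12 K", "J 123")
        Debug.Print vSample, oOriginal.Test(CStr(vSample)), _
                    oSuggested.Test(CStr(vSample))
    Next vSample
End Sub
```

In the Immediate window the original pattern should report False for
"J 123 K": after the lookahead, \D consumes the space and \s then has to match
the letter, which it cannot. The shorter pattern should report True for the
first three samples and False for the last two.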





--ron
 
Joel

Ron: Shouldn't the '?' be optional? This would be the space after the first
optional character.

 
Ron Rosenfeld

Joel,

As I understood the OP's request, both the initial letter and the subsequent
space are optional.

The "?" indicates that everything in the preceding group is optional, since
that group is enclosed in parentheses.

So either

J 123 K

or

123 K

would be acceptable.
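
A rough way to see the optional group at work, assuming late binding to the
VBScript RegExp object (the Sub name ShowOptionalGroup is just an
illustration): the first submatch holds the letter plus space when they are
present, and comes back empty when they are not.

```vba
' Illustrative sketch only: ShowOptionalGroup is a hypothetical name.
Sub ShowOptionalGroup()
    Dim oRegex As Object
    Dim oMatches As Object
    Dim vSample As Variant

    Set oRegex = CreateObject("VBScript.RegExp")
    oRegex.Pattern = "([JjXxZz]\s)?\d{3}\s[A-Za-z]"

    For Each vSample In Array("J 123 K", "123 K")
        Set oMatches = oRegex.Execute(CStr(vSample))
        If oMatches.Count > 0 Then
            ' SubMatches(0) is the text captured by the parenthesized group.
            Debug.Print "match: [" & oMatches(0).Value & "]", _
                        "prefix: [" & oMatches(0).SubMatches(0) & "]"
        End If
    Next vSample
End Sub
```

Both samples should produce a match; only the first one should show anything
between the prefix brackets.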

--ron


 
Joel

I thought the ? represented any one character. I would think this would be
better:

([JjXxZz][ ]\s)\d{3}\s[ =][A-Za-z]


On 29 Apr 2007 06:55:01 -0700, (e-mail address removed) wrote:

I have data which is in this format:

J 123 K

 
Dana DeLouis

([JjXxZz]\s)?\d{3}\s[A-Za-z]

Just an idea. My thought would be to limit the match by including "^" and
"$":

"^([JjXxZz]\s)?\d{3}\s[A-Za-z]$"

What I'm thinking is that maybe something like
"A 123 k" might test True by ignoring the "A",
and
"ABC j 123 k" might test True by finding the pattern within a larger
string.
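
A small sketch to check that suspicion, assuming late binding to the VBScript
RegExp object (the Sub name CompareAnchors is made up for this example). It
tests the same strings with and without the anchors.

```vba
' Illustrative sketch only: CompareAnchors is a hypothetical name.
Sub CompareAnchors()
    Const sLoose As String = "([JjXxZz]\s)?\d{3}\s[A-Za-z]"
    Const sStrict As String = "^([JjXxZz]\s)?\d{3}\s[A-Za-z]$"

    Dim oLoose As Object
    Dim oStrict As Object
    Dim vSample As Variant

    Set oLoose = CreateObject("VBScript.RegExp")
    oLoose.Pattern = sLoose
    Set oStrict = CreateObject("VBScript.RegExp")
    oStrict.Pattern = sStrict

    ' Columns printed: sample text, unanchored result, anchored result.
    For Each vSample In Array("J 123 K", "A 123 k", "ABC j 123 k")
        Debug.Print vSample, oLoose.Test(CStr(vSample)), _
                    oStrict.Test(CStr(vSample))
    Next vSample
End Sub
```

The unanchored pattern should report True for all three, since it is free to
match a substring, while the anchored one should accept only "J 123 K".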

--
Dana DeLouis
Windows XP & Office 2007


 
Ron Rosenfeld


I guess one question which has not been answered is the reason for the regex.
Is it to extract any matching pattern from the string, or is it to ensure that
only the desired pattern is present?

If the latter, then your expression is appropriate.
--ron
 
Ron Rosenfeld

I thought the ? represented any one character.

That is incorrect. At least with the VBScript flavor of regular expressions, as
used in my expression, it indicates that the preceding item (here, the group
enclosed in parentheses) is optional. To be more precise, it matches the
preceding character or subexpression zero or one time, so it is equivalent to
{0,1}.

Placed directly after another quantifier, it can also make that quantifier
non-greedy.
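
Both uses of "?" can be seen in a few lines. This is only a sketch, assuming
late binding to the VBScript RegExp object; the Sub name QuestionMarkDemo and
the test strings are invented for the example.

```vba
' Illustrative sketch only: QuestionMarkDemo is a hypothetical name.
Sub QuestionMarkDemo()
    Dim oRegex As Object
    Dim oMatches As Object

    Set oRegex = CreateObject("VBScript.RegExp")

    ' "?" as a quantifier: the J may appear zero or one time.
    oRegex.Pattern = "^J?123$"
    Debug.Print oRegex.Test("123"), oRegex.Test("J123"), oRegex.Test("JJ123")

    ' The same rule written with {0,1}.
    oRegex.Pattern = "^J{0,1}123$"
    Debug.Print oRegex.Test("123"), oRegex.Test("J123"), oRegex.Test("JJ123")

    ' Greedy: \d+ takes as many digits as it can.
    oRegex.Pattern = "\d+"
    Set oMatches = oRegex.Execute("12345")
    Debug.Print oMatches(0).Value

    ' Non-greedy: \d+? stops as soon as it has one digit.
    oRegex.Pattern = "\d+?"
    Set oMatches = oRegex.Execute("12345")
    Debug.Print oMatches(0).Value
End Sub
```

The first two lines of output should be identical (True, True, False), and the
last two should show the whole number and then just its first digit.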
--ron
 
Ron Rosenfeld



Reading the OP's post again (and looking at his macro), it does seem as if he
wants to either match, or not match, the specific pattern. So your addition of
^ and $ would indeed be warranted.

And it's also possible to detect the format without regular expressions:

===============================
For Each c In rRng
    If c.Text Like "[JjXxZz] ### [A-Za-z]" Or _
       c.Text Like "### [A-Za-z]" Then
        Debug.Print c.Text, "TRUE"
    Else
        Debug.Print c.Text, "FALSE"
    End If
Next c
================================
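
The same test could be wrapped in a function so it can be called from a macro
or from a worksheet cell. This is a sketch under assumptions: the name
MatchesFormat is invented here, and it relies on the Like operator comparing
the whole string, so no equivalent of ^ and $ is needed.

```vba
' Illustrative sketch only: MatchesFormat is a hypothetical name.
' Like compares the entire string against the pattern, so extra leading
' or trailing text makes the comparison fail.
Function MatchesFormat(ByVal sText As String) As Boolean
    MatchesFormat = (sText Like "[JjXxZz] ### [A-Za-z]") Or _
                    (sText Like "### [A-Za-z]")
End Function
```

For example, MatchesFormat("J 123 K") and MatchesFormat("123 K") should both
return True, while MatchesFormat("ABC j 123 k") should return False.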
--ron
 
