Basic regular expression question

Robert Crandal

I'm still a newbie at all this regular expression stuff, so forgive
this newb question....

My input data strings have the following format:

"Item1 scissors"
"Item2 notebooks"
"I3 pens"
"itm4 keyboards"
......

So, each line is basically formatted like this:

[string of characters] [several whitespace(s)] [string of characters]

Using regular expressions, how can I store each pair of items in
separate variables??

For example, if I read in the first line above,
I would like my variable sNum to store the "Item1" string, and a
variable named sObj would store "scissors". I guess I'm really
trying to parse each pair of items and store them in variables
using regular expressions, but I don't fully understand how to
create my own regular expression pattern strings yet.

Thanks!
 
Ron Rosenfeld

What you show is two words separated by space(s).

Assuming that the words contain only letters, digits and possibly an underscore, and that there are only two words in each line, the regex is fairly simple:


^(\w+)\s+(\w+)

which means:

Assert position at the beginning of the string «^»

Match the regular expression below and capture its match into backreference number 1 «(\w+)»

   Match a single character that is a “word character” (letters, digits, and underscores) «\w»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Match the regular expression below and capture its match into backreference number 2 «(\w+)»

   Match a single character that is a “word character” (letters, digits, and underscores) «\w»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

A sample VBA macro that stores the item label in the first row of a two-dimensional array, and the object in the second row, might look like:

=================================
Option Explicit

Sub foo()
    Dim re As Object, mc As Object
    Const sPat As String = "(\w+)\s+(\w+)"
    Dim InputData(0 To 3) As String
    Dim i As Long
    Dim Results() As String   ' Results(0, i) = item label, Results(1, i) = object

    InputData(0) = "Item1 scissors"
    InputData(1) = "Item2 notebooks"
    InputData(2) = "I3 pens"
    InputData(3) = "itm4 keyboards"

    ' Late-bound VBScript regular expression object
    Set re = CreateObject("vbscript.regexp")
    re.Pattern = sPat
    re.Global = True

    For i = 0 To UBound(InputData)
        If re.test(InputData(i)) = True Then
            Set mc = re.Execute(InputData(i))
            ' ReDim Preserve can only resize the last dimension, so items go in columns
            ReDim Preserve Results(0 To 1, 0 To i)
            Results(0, i) = mc(0).submatches(0)   ' first word
            Results(1, i) = mc(0).submatches(1)   ' second word
        End If
    Next i

End Sub
==========================

Hope this helps.
 
Robert Crandal

Hello Ron! I just wanted to say thanks so much for your excellent help
again. That code is working great!

I have a new question now. I just realized that the third element can
actually contain multiple word elements. So, my data might actually look
like this:

"Item1 scissors"
"Item2 red notebooks"
"Item3 number #2 pencils"

So, the data format really is:

[single string of characters] [whitespace(s)] [any string of characters
and optional whitespace(s)]

So....

I plan to basically reuse the code you gave me previously, but I need to
modify the regular expression pattern so that the variable
mc(0).submatches(1) would get assigned strings like "scissors", or
"red notebooks", or "number #2 pencils"

How should I change the pattern string?

Thanks!
 
Rick Rothstein

I'm not sure if this will be helpful to you or not, as I think you are
attempting to learn how to program with Regular Expressions; however,
assuming those "whitespaces" you mentioned are simply normal spaces, you can
do what you have asked without using Regular Expressions... straight VB is
enough. Here is Ron's macro revised to work without Regular Expressions
and modified to handle the multiple-word data you just posted about...

Sub FooToo()
    Dim i As Long, InputData(0 To 3) As String
    Dim Parts() As String, Results() As String
    InputData(0) = "Item1 scissors"
    InputData(1) = "Item2 notebooks"
    InputData(2) = "Item2 red notebooks"
    InputData(3) = "Item3 number #2 pencils"
    For i = 0 To UBound(InputData)
        If InStr(InputData(i), " ") Then
            ' Trim collapses runs of spaces; the limit of 2 keeps the rest of the line in one piece
            Parts = Split(WorksheetFunction.Trim(InputData(i)), " ", 2)
            ReDim Preserve Results(0 To 1, 0 To i)
            Results(0, i) = Parts(0)
            Results(1, i) = Parts(1)
        End If
    Next i
End Sub

Rick Rothstein (MVP - Excel)
 
Ron Rosenfeld

If the "object" will be all on the same line:

"^(\w+)\s+(.+)"

However, in the VBScript regular expression engine that VBA uses here, the dot does not match a line break, so if the "object" might span a second line, then you should use:

"^(\w+)\s+([\s\S]+)"
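
A minimal sketch of how the second pattern might be used, assuming the same late-bound "vbscript.regexp" object as in the earlier macro. The names SplitItem and TestSplitItem are made up for this illustration and do not appear in the code posted above. Note that any trailing whitespace in the input would end up in sObj, since the second group takes everything after the separator.

```vba
Option Explicit

' Hypothetical helper (not from the earlier macro): splits one input line
' into its first word (sNum) and everything after the separating
' whitespace (sObj). Returns False if the line does not fit the format.
Function SplitItem(ByVal sLine As String, ByRef sNum As String, _
                   ByRef sObj As String) As Boolean
    Dim re As Object, mc As Object

    Set re = CreateObject("vbscript.regexp")
    ' [\s\S]+ rather than .+ so the object may also contain line breaks
    re.Pattern = "^(\w+)\s+([\s\S]+)"

    If re.Test(sLine) Then
        Set mc = re.Execute(sLine)
        sNum = mc(0).SubMatches(0)
        sObj = mc(0).SubMatches(1)
        SplitItem = True
    End If
End Function

Sub TestSplitItem()
    Dim sNum As String, sObj As String

    If SplitItem("Item3 number #2 pencils", sNum, sObj) Then
        Debug.Print sNum   ' Item3
        Debug.Print sObj   ' number #2 pencils
    End If
End Sub
```

Running TestSplitItem prints the two parts to the Immediate window; in the earlier macro the same effect should come from changing only the sPat constant.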

As an aid to writing and testing regular expressions, I would suggest a program titled RegexBuddy (www.regexbuddy.com).

And, as Rick is so fond of pointing out, you can do most anything using built-in VBA methods without using Regular Expressions, and they will often run more quickly if that is an issue. However, once you become fluent in Regular Expressions, it takes much less time to develop complex string manipulations using them than using VBA.

Of course, if speed is paramount, I suppose we should be writing in machine language <g>.
 
