Need Regular Expressions?

A

alr

Hi All,

I am trying to locate and move some sensitive data from some text fields. I
have isolated the records that have the problem with queries using the
following type of criteria:

Like “*acct[#] ###*â€
Like “*acct [#]###*â€

Like “*###-##-####*â€
Like “*##-#######*†etc.

Unfortunately this is mixed with needed data so the offending text above
must be removed from within the text fields and relocated. I am accustomed to
this kind of text manipulation with the InStr, Right, Left, and Mid commands
but this will not work with only patterns.

I understand that this can be done with Regular Expressions, which are
exposed through VB Script, neither of which I have ever used.

I would be most appreciative of a reference or description of how to access
VB Script and Regular Expressions for this use. Queries are probably easier
for this application but I am comfortable with VBA and will probably need to
use Regular Expressions with VBA shortly as well.

Thanks for any assistance,

alr
 
M

Maury Markowitz

alr said:
I understand that this can be done with Regular Expressions, which are
exposed through VB Script, neither of which I have ever used.

I ran into this problem myself recently. A quick scan through my References
window showed at least two different regex libraries, and using them was easy.

The hard part is the regex itself -- they are error prone and break with the
slightest provocation. I forget where I heard this, but "if you have a
problem and you decide to fix it with regex, then you have two problems".

Maury
 
J

John Nurick

Here are a couple of examples using the rgxReplace function that Kevin
recommended, pasted from the Immediate window:

Test = "now P Jones 123-45-6789 men to come to acct# 123456 the aid of J
Smith 123-45-6789 now is the time acct# 123456 all good men"

?rgxreplace(Test,"\d{3}-\d{2}-\d{4}","***-**-****")
now P Jones ***-**-**** men to come to acct# 123456 the aid of J Smith
***-**-**** now is the time acct# 123456 all good men

?rgxReplace(Test,"(acct#\s)\d{6}","$1******")
now P Jones 123-45-6789 men to come to acct# ****** the aid of J Smith
123-45-6789 now is the time acct# ****** all good men


Hi All,

I am trying to locate and move some sensitive data from some text fields. I
have isolated the records that have the problem with queries using the
following type of criteria:

Like “*acct[#] ###*”
Like “*acct [#]###*”

Like “*###-##-####*”
Like “*##-#######*” etc.

Unfortunately this is mixed with needed data so the offending text above
must be removed from within the text fields and relocated. I am accustomed to
this kind of text manipulation with the InStr, Right, Left, and Mid commands
but this will not work with only patterns.

I understand that this can be done with Regular Expressions, which are
exposed through VB Script, neither of which I have ever used.

I would be most appreciative of a reference or description of how to access
VB Script and Regular Expressions for this use. Queries are probably easier
for this application but I am comfortable with VBA and will probably need to
use Regular Expressions with VBA shortly as well.

Thanks for any assistance,

alr
 
A

alr

Thanks very much to all that responded!

I now have a working piece of code that can be modified to accomplish
several things that this project has grown to include. The references were
very helpful.

BUT Especially Maury!!!

The whole office got a good laugh out of your comment "...then you have two
problems."

Of course this was at my expense since I must maintain this code. Any clues
on how to keep it simple?

Thanks again,

alr
 
M

Maury Markowitz

alr said:
The whole office got a good laugh out of your comment "...then you have two
problems."
Heh.

Of course this was at my expense since I must maintain this code. Any clues
on how to keep it simple?

Exactly that: don't use it for anything complex. It's fine for pulling text
out of a file and stuff like that, but the more complex the pattern becomes
the larger the chance that some file in the next 10 days will break it.

IE: never, ever, use regex to try to process CSV files!

Maury
 
J

John Nurick

Of course this was at my expense since I must maintain this code. Any clues
on how to keep it simple?

If you don't let them change the rules you won't have to change the
code.

IME it's easier to maintain one regex than the tens or hundreds of lines
of code that would otherwise be needed to implement the same rules. (And
the nice people who design and build regex engines must feel the same
way or they'd spend their time doing something else.)

Pace Maury, it's not regexes that are error prone but the people who
write them (me included). It's all too easy to develop a regex that
seems to do the job, and then move on to something else without testing
it thoroughly. Later on, with unexpected input, it may give you
unexpected output - but that means it's wrong, not broken<g>.
 
A

alr

Thanks very much for your input. What tools would be a better choice for
doing complex work of this type on a regular basis in a production
environment. I now have time to think this through and choose the best way to
proceed.

Thanks again,

alr
 
A

alr

Thanks for your input. I got things working but now I am working on a more
permanent solution. I need to save the removed text to a new field and run
this on some large quantities of data in a production environment.

Is there a better choice of tools or some suggestions for making the regex
reliable? Obviously rigorous and thorough testing of any solution is
required. If regex is the best choice, is VBA to VbScript to Regex an
acceptable way to get there? Are there other options? Is there some
documentation available? The lack of a manual for these tools has become a
major irritant in my life or a job security program.

Thanks again,

alr
 
J

John Nurick

I'm perfectly happy with regexes in a production situation, and have
never heard of problems caused by using the VBScript regular expression
object in VBA code (my own testing has included running queries that
create and destroy a regex object millions of times in order to see if
there are memory or resource leaks).

It's also worth noting that the .Net framework includes a regex engine
(whose regexes are a superset of those available in VBScript).

Here are a couple of links to get started:
http://msdn.microsoft.com/library/e...troductiontoregularexpressions.asp?frame=true
http://msdn.microsoft.com/library/e...gularexpressionsobjectpropmeth.asp?frame=true

The book Mastering Regular Expressions by Jeffrey Friedl (O'Reilly) is
the nearest there is to a standard text; I don't think it specifically
discusses the VBScript regex engine, but most of it will work anyway.
Likewise the O'Reilly Regular Expression Pocket Reference.
 
A

alr

Hi John,

Thanks, this all seems very reasonable except for a couple of things, could
you please explain:

1) How one gets queries to recognize regex
2) Would it be useful and reasonable to incorporate .NET somewhow to get a
more complete version of Regex? I have had no need of .NET thus far.

Thanks for your efforts,

alr
 
J

John Nurick

Thanks, this all seems very reasonable except for a couple of things, could
you please explain:

1) How one gets queries to recognize regex

In Access, by using a calculated field which calls a custom VBA function
that uses a regular expression engine. Examples are the rgxReplace()
function you already have and the rgxValidate() function you can
download from www.mvps.org/access.

rgxValidate() is a general-purpose data validation function which can be
used as a criterion to restrict queries to returning records that match
a particular regex.
2) Would it be useful and reasonable to incorporate .NET somewhow to get a
more complete version of Regex? I have had no need of .NET thus far.

No. The VBScript regex engine works pretty well; the one in .Net really
only adds a few advanced features such as zero-width assertions and
Unicode blocks; and the thought of instantiating a .Net class in VBA
code called from an Access query makes me think of taking a nuclear
power station on a camping holiday!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top