Remove extra space within text

S

SherryScrapDog

I just had some really great help with some Excel files from this group where
the formatting was done by various people over the last 15-20 years. These
are names for genealogy, and I am loading many Excel files into Access so
people can search by name and soundex. I don't know if the other problem I
am dealing with can be handled programatically, but some of the people in the
past entered extra spaces between the first and middle names. I am going
thru the files manually now and removing the extra space. I have more files
to do and thought it was worth asking if a macro could do this. Here are
some examples:
John William (I change to John William by using delete key)
Doris Doe Smith (I change to Doris Doe Smith)
John W. ( change to John W.)
I found many posts on this site that talks about Trim, but could not find
anything that specifically addressed just removing extra space within text.
Thanks in advance if there is any help for me, Sherry
 
R

Rick Rothstein \(MVP - VB\)

Given that you indicated you used Ron Rosenfeld's routine in your earlier
thread, I modified that to add the additional functionality...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(txt, " ")
Next c
End Sub

Now a word of caution... this is my first regular expression construction in
some 15 years, so I am rusty. What I posted works, but I can't guarantee it
is the most efficient construction. So, check back here to see if Ron posts
a better constructed regular expression solution than this one.

Rick
 
R

Rick Rothstein \(MVP - VB\)

This might actually be better... it also removes all leading and trailing
spaces, if any...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(Trim(txt), " ")
Next c
End Sub

Rick


Rick Rothstein (MVP - VB) said:
Given that you indicated you used Ron Rosenfeld's routine in your earlier
thread, I modified that to add the additional functionality...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(txt, " ")
Next c
End Sub

Now a word of caution... this is my first regular expression construction
in some 15 years, so I am rusty. What I posted works, but I can't
guarantee it is the most efficient construction. So, check back here to
see if Ron posts a better constructed regular expression solution than
this one.

Rick


SherryScrapDog said:
I just had some really great help with some Excel files from this group
where
the formatting was done by various people over the last 15-20 years.
These
are names for genealogy, and I am loading many Excel files into Access so
people can search by name and soundex. I don't know if the other problem
I
am dealing with can be handled programatically, but some of the people in
the
past entered extra spaces between the first and middle names. I am going
thru the files manually now and removing the extra space. I have more
files
to do and thought it was worth asking if a macro could do this. Here are
some examples:
John William (I change to John William by using delete key)
Doris Doe Smith (I change to Doris Doe Smith)
John W. ( change to John W.)
I found many posts on this site that talks about Trim, but could not find
anything that specifically addressed just removing extra space within
text.
Thanks in advance if there is any help for me, Sherry
 
S

SherryScrapDog

Rick, Thanks tons! It works so well I wish I had asked this before. You
have saved me hours and hours of work. This is a volunteer project for me
and I just love working on it, but as you can probably imagine, it gets
boring going thru these files cell by cell and deleting spaces and adding
periods. My goal is to give the Society a database with data as clean as
possible. Thanks, thanks and thanks again, Sherry (I will check Ron's post
if he does respond.)

Rick Rothstein (MVP - VB) said:
Given that you indicated you used Ron Rosenfeld's routine in your earlier
thread, I modified that to add the additional functionality...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(txt, " ")
Next c
End Sub

Now a word of caution... this is my first regular expression construction in
some 15 years, so I am rusty. What I posted works, but I can't guarantee it
is the most efficient construction. So, check back here to see if Ron posts
a better constructed regular expression solution than this one.

Rick


SherryScrapDog said:
I just had some really great help with some Excel files from this group
where
the formatting was done by various people over the last 15-20 years.
These
are names for genealogy, and I am loading many Excel files into Access so
people can search by name and soundex. I don't know if the other problem
I
am dealing with can be handled programatically, but some of the people in
the
past entered extra spaces between the first and middle names. I am going
thru the files manually now and removing the extra space. I have more
files
to do and thought it was worth asking if a macro could do this. Here are
some examples:
John William (I change to John William by using delete key)
Doris Doe Smith (I change to Doris Doe Smith)
John W. ( change to John W.)
I found many posts on this site that talks about Trim, but could not find
anything that specifically addressed just removing extra space within
text.
Thanks in advance if there is any help for me, Sherry
 
P

Peter T

Worksheet & VB/VBA Trim functions are slightly different, inasmuch the
Worksheet function replaces any multiple spaces in the middle of text with a
single space.

Your sample text as posted threw me at first (when pasted directly into a
cell), but try this

sTmp = Replace(sOrig, Chr(160), " ")
sOut = Application.WorksheetFunction.Trim(sTmp)

where sOrig contains your text.

As a cell formula -
=TRIM(SUBSTITUTE(A1,CHAR(160)," "))

Regards,
Peter T
 
R

Ron Rosenfeld

This might actually be better... it also removes all leading and trailing
spaces, if any...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(Trim(txt), " ")
Next c
End Sub

A bit shorter:

Option Explicit
Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
c.Value = Application.WorksheetFunction.Trim _
(re.Replace(c.Text, "$1."))
Next c
End Sub



--ron
 
S

SherryScrapDog

Rick,
Thanks for this suggestion too. This is a problem I have in these files
too, and I have a query in the Access database where I load the files to take
the spaces out. I might as well do it in the Excel files though and
eliminate one of the queries I do to load the files! Many thanks! Sherry
 
R

Ron Rosenfeld

Now a word of caution... this is my first regular expression construction in
some 15 years, so I am rusty. What I posted works, but I can't guarantee it
is the most efficient construction. So, check back here to see if Ron posts
a better constructed regular expression solution than this one.

It works fine.

And it is certainly possible to devise one regular expression that will also
remove leading and trailing spaces. But I think it is simpler to just use the
TRIM "worksheet" function (not the VBA function) to remove leading, trailing
and doubled spaces in one swoop.


But, you could run a third regex with pattern: ^\s+|\s+$

to remove leading and trailing white space characters.

Or, if you only wanted to remove <space> and <tab> characters:

"^[ \t]+|[ \t]+$"

(note the <space> at the start of the character class.)


--ron
 
D

Dana DeLouis

Another idea might be to move the Pattern outside the loop since it's a
constant.

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object

Set re = CreateObject("VbScript.RegExp")
re.IgnoreCase = True
re.Global = True
re.Pattern = "\b([A-Z])\b(?!\.)"

With WorksheetFunction
For Each c In Selection.Cells
c.Value = .Trim(re.Replace(c.Text, "$1."))
Next c
End With
End Sub

--
Dana DeLouis



Ron Rosenfeld said:
This might actually be better... it also removes all leading and trailing
spaces, if any...

Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
txt = re.Replace(c.Text, "$1.")
re.Pattern = "\s{2,}"
c.Value = re.Replace(Trim(txt), " ")
Next c
End Sub

A bit shorter:

Option Explicit
Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
c.Value = Application.WorksheetFunction.Trim _
(re.Replace(c.Text, "$1."))
Next c
End Sub



--ron
 
S

SherryScrapDog

Hi Peter,
Thanks for this response! I can't believe how helpful everyone is in this
group and I appreciate it so much. I'm not exactly sure what you mean when
you say replace sOrig with my text and wonder if this would be the cell
column. However, the previous posts macro is working wonderful, and I'm not
sure what this would do different than what I have now. I can see why you
said my examples threw you because as I look at them now, they do not look
like they have the 2 spaces in them as I had typed them. Please let me know
if there is something this does that the other macro is not doing. And, I
assume this is a macro that you are giving me; let me know if it is something
else. This is my first attempt at doing things programmatically in Excel, so
I'm very ignorant and appreciate all the help I have received. Again,
Thanks! Sherry
 
R

Rick Rothstein \(MVP - VB\)

A bit shorter:
Option Explicit
Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
c.Value = Application.WorksheetFunction.Trim _
(re.Replace(c.Text, "$1."))
Next c
End Sub

I thought about that solution, but I was not sure what was more efficient in
the end... continually running out to the spreadsheet level to get one of
its functions within the loop or simply resetting the Pattern each time.

Can you, or anyone reading this thread, give me an idea of the "penalty"
incurred, if any, by a macro when it reaches out to the spreadsheet world in
order to execute one of its functions via the Application.WorksheetFunction
property?

Rick
 
P

Peter T

Hi Sherry,
I can't believe how helpful everyone is in this group

Just normal service !
I'm not exactly sure what you mean when
you say replace sOrig with my text and wonder if this would be
the cell column.

For use in the example as posted I meant first assign the text you want to
process to the variable sOrig. The variable would have been declared like
this
Dim sOrig as String
or it might be an argument in a function

purely for testing
copy "Doris Doe Smith" without the quotes into a cell, select the cell and
run this macro

Sub test1()
Dim s As String, sReturn As String

s = ActiveCell

sReturn = TrimSpaces(s)

MsgBox s & vbNewLine & sReturn

End Sub

Function TrimSpaces(sOrig As String) As String
Dim sTmp As String
Dim sOut As String

sTmp = Replace(sOrig, Chr(160), " ")
sOut = Application.WorksheetFunction.Trim(sTmp)
TrimSpaces = sOut

End Function

The above may not be efficient and not necessary to use the additional sTmp
variable. It depends on where your data (text to be processed) is coming
from and what you want to do with it. You might, for example, pass an array
to a function and loop each element. If it's a one off type of thing, copy
and paste all data to cells in a column, copy down the cell formula as
posted previously.

If you are sure the pseudo space chr(160) only crept into the examples in
your post, and never exists in your data, you'd only need the worksheet Trim
function.

If you want to filter or make further changes RegExp may will provide much
greater flexibility, otherwise go with the simplest and/or fastest method
you are sure will work for your needs.

Regards,
Peter T
 
S

SherryScrapDog

Thanks much Peter for clarification, and for teaching me more. I very much
enjoy leaning all I can! Sherry
 
P

Peter T

Can you, or anyone reading this thread, give me an idea of the "penalty"
incurred, if any, by a macro when it reaches out to the spreadsheet world in
order to execute one of its functions via the Application.WorksheetFunction
property?

Rick

Worksheet functions, whilst extremely efficient in cells, are slow when
called in VBA. It's often quicker to recreate the worksheet function.
Although in another part of this thread I suggested
Application.Worksheetfunction.Trim(
personally I wouldn't use it in in anything time sensitive, say as a UDF in
a large number of cells or as part of a long loop. Other ways to replace
multiple spaces with singles besides RegExp.

If using multiple worksheet functions in the same loop use 'wf' as in the
following

Sub foo()
Dim wf As WorksheetFunction
Set wf = Application.WorksheetFunction
s = wf.Trim(" a b c ")
MsgBox s
End Sub

Even setting a ref to WorksheetFunction object only speeds up a little.

Also I wouldn't use RegExp in a UDF and probably not in a function that was
called in a much larger loop. Although RegExp is extremely efficient,
creating the object is (relatively) slow. IOW, great in macro or function
that receives an array to process, but don't send each element to a separate
function that does CreateObject("vbscript.regexp"). I suppose could store
the object at module level.

Regards,
Peter T
 
R

Ron Rosenfeld

A bit shorter:

Option Explicit
Sub AddDot()
Dim c As Range
Dim txt As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
For Each c In Selection
re.Pattern = "\b([A-Z])\b(?!\.)"
c.Value = Application.WorksheetFunction.Trim _
(re.Replace(c.Text, "$1."))
Next c
End Sub

I thought about that solution, but I was not sure what was more efficient in
the end... continually running out to the spreadsheet level to get one of
its functions within the loop or simply resetting the Pattern each time.

Can you, or anyone reading this thread, give me an idea of the "penalty"
incurred, if any, by a macro when it reaches out to the spreadsheet world in
order to execute one of its functions via the Application.WorksheetFunction
property?

Rick

I'm not sure which is more efficient.

I agree with Dana about moving pattern outside the loop if you only are using a
single pattern. I was still thinking about perhaps using multiple patterns.

I suppose you could do something like:

=====================================
Sub AddDot()
Dim c As Range
Dim re As Object
Dim wsf As WorksheetFunction
Set wsf = Application.WorksheetFunction
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.Global = True
re.Pattern = "\b([A-Z])\b(?!\.)"
For Each c In Selection
c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1.")))
Next c
End Sub
===========================================

to speed it up a bit, or even set a reference (tools/references) to Microsoft
VBSCript 5.5 and then use:

=============================================
Option Explicit
Sub AddDot()
'requires reference set to Microsoft _
' VBScript Regular Expressions 5.5
Dim c As Range
Dim re As RegExp
Dim wsf As WorksheetFunction
Set wsf = Application.WorksheetFunction
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "\b([A-Z])\b(?!\.)"
For Each c In Selection
c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1.")))
Next c
End Sub
==========================================

And I don't know whether, with this structure, it makes sense to explicitly
release the objects at the end of the Sub.

===================================================
Sub AddDot()
'requires reference set to Microsoft _
' VBScript Regular Expressions 5.5
Dim c As Range
Dim re As RegExp
Dim wsf As WorksheetFunction
Set wsf = Application.WorksheetFunction
Set re = New RegExp
re.IgnoreCase = True
re.Global = True
re.Pattern = "\b([A-Z])\b(?!\.)"
For Each c In Selection
c.Value = wsf.Proper(wsf.Trim(re.Replace(c.Text, "$1.")))
Next c
Set re = Nothing
Set wsf = Nothing
End Sub
===================================================
--ron
 
R

Rick Rothstein \(MVP - VB\)

Can you, or anyone reading this thread, give me an idea of the
Worksheet functions, whilst extremely efficient in cells, are slow when
called in VBA.

That was my gut feeling. I figured if calling a macro from the spreadsheet
is slow (I've seen several references to that fact), then it must also be
slow going the other way. Hence, I opted for the pattern toggling code that
I posted.


Now that is interesting.
I wouldn't use it in in anything time sensitive, say as a UDF in
a large number of cells or as part of a long loop.

I would agree with that.

Other ways to replace multiple spaces with singles besides RegExp.

The way I this over in the compiled VB world (which works here in the Excel
VBA world as well) is like this...

Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop

Also I wouldn't use RegExp in a UDF and probably not in a
function that was called in a much larger loop. Although RegExp
is extremely efficient, creating the object is (relatively) slow.
IOW, great in macro or function that receives an array to process,
but don't send each element to a separate function that does
CreateObject("vbscript.regexp").

In the compiled VB world, our tendency is to avoid calling out to any kind
of scripting type object as this is very slow to do from a compiled
executable and, very often, businesses have scripting completely turned off
for security reasons. However, since VBA macros or UDFs are not compiled
executable, I have no feeling for the efficiency calling a scripting object
from within one.


Rick
 
R

Ron Rosenfeld

In the compiled VB world, our tendency is to avoid calling out to any kind
of scripting type object as this is very slow to do from a compiled
executable and, very often, businesses have scripting completely turned off
for security reasons. However, since VBA macros or UDFs are not compiled
executable, I have no feeling for the efficiency calling a scripting object
from within one.

One could certainly write a VBA routine that did not use any external calls to
do this. And it might be faster, but maybe not worth the work.


Another option, which uses Regular Expressions but should be faster than
VBScript since it is written in C++ and compiled as an xll add-in, would be to
download and install Laurent Longre's free morefunc.xll add-in from
http://xcell05.free.fr/english/index.html

and then use the following formula:

=TRIM(REGEX.SUBSTITUTE(A1,"(\b([A-Z])\b(?!\.))","[1].",,,FALSE))

or, if you wanted to use it as a Sub, then:

=====================================
Option Explicit
Sub AddDot()
Dim c As Range
Const sPat As String = "(^\s*)|(\s*$)|(\s{2,})|(\b([A-Z])\b(?!\.))"
Const sRes As String = "[1=,2=,3= ,4=[4].]"

For Each c In Selection
c.Value = Run([Regex.substitute], c.Text, sPat, sRes, , , False)
Next c
End Sub
=======================================

This add-in has a number of other useful functions. I had avoided recommending
it for a while since he had not updated it to be compatible with XL2007, but it
now is, for the most part.

The other problem with the add-in, which is related to a limitation in XLL, is
that you cannot pass strings >255 characters. But that should not be an issue
for this application.
--ron
 
P

Peter T

Hi Rick,
That was my gut feeling. I figured if calling a macro from the spreadsheet
is slow (I've seen several references to that fact), then it must also be
slow going the other way. Hence, I opted for the pattern toggling code that
I posted.

By 'macro' I assume you mean a UDF but not quite sure what you mean by "slow
going the other way".
Now that is interesting.


I would agree with that.



The way I this over in the compiled VB world (which works here in the Excel
VBA world as well) is like this...

Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop

That's what I had in mind. It might be worth comparing re-calc time with a
UDF like that vs Worksheetfunction.trim in a large number of cells
In the compiled VB world, our tendency is to avoid calling out to any kind
of scripting type object as this is very slow to do from a compiled
executable and, very often, businesses have scripting completely turned
off for security reasons. However, since VBA macros or UDFs are not
compiled executable, I have no feeling for the efficiency calling a
scripting object from within one.

Even VBA compiles albeit in not the same way as VB. Unlike VB6, a workbook
and code can be saved with code totally un-compiled, partially compiled or
fully compiled. Considerations for re-distribution are file size vs speed
the first time code is run in a session. Ie, the code compiles as it's run
(if it had been saved un-compiled) and remains compiled thereafter in the
current session (unless edited).

There was a long thread a while back testing the merits of string
manipulation with RegExp and looping through a byte array, with proponents
on either side with me somewhere in the middle. In the testing I did I found
RegExp to work well (ie fast) when processing a large array in one procedure
but very slow in a UDF or even if called multiple times in a normal VBA
function. The reason was due to setting the reference to the scripting
object multiple times, ie each time the function was called. In a UDF the
byte array method won hands down.

Other issues, such as "very often, businesses have scripting completely
turned off" would equally affect VBA.

Regards,
Pete T
 
R

Ron Rosenfeld

That was my gut feeling. I figured if calling a macro from the spreadsheet
is slow (I've seen several references to that fact), then it must also be
slow going the other way. Hence, I opted for the pattern toggling code that
I posted.



Now that is interesting.


I would agree with that.



The way I this over in the compiled VB world (which works here in the Excel
VBA world as well) is like this...

Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop



In the compiled VB world, our tendency is to avoid calling out to any kind
of scripting type object as this is very slow to do from a compiled
executable and, very often, businesses have scripting completely turned off
for security reasons. However, since VBA macros or UDFs are not compiled
executable, I have no feeling for the efficiency calling a scripting object
from within one.


Rick


OK, after reading about all these delays, I decided to do some timing tests and
I find that the differences in time related to the procedure calls are far
outweighed by the time involved in reading or writing to/from a cell.

Here is the first run just picking out if a word is present in a string and
writing the results to a cell:

========================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = "Now is the time for all"

Dim t As Long
t = Timer
test1
Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec"


End Sub

Sub test1()
Dim c As Range
For Each c In Range("a:a")
If InStr(1, c.Text, "time") > 0 Then
c.Offset(0, 1).Value = "time"
End If
Next c
End Sub
Sub test2()
Dim c As Range
Dim re As New RegExp
Dim mc As MatchCollection
re.Pattern = "time"
For Each c In Range("a:a")
Set mc = re.Execute(c.Text)
c.Offset(0, 3) = mc(0)
Next c
End Sub
Sub test3()
Dim c As Range
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "time"
For Each c In Range("a:a")
Set mc = re.Execute(c.Text)
c.Offset(0, 3) = mc(0)
Next c
End Sub
============================================
InStr test 15.53 sec
RegExp early binding Test 15.45 sec
RegExp late binding test 16.59 sec
==============================================

And again, just testing your double space removal routine:

================================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec"


End Sub

Sub test1()
Dim c As Range
Dim SomeText As String
For Each c In Range("a:a")
SomeText = c.Text
Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop
c.Offset(0, 1) = SomeText
Next c
End Sub
Sub test2()
Dim c As Range
Dim re As New RegExp
Dim mc As MatchCollection
re.Pattern = "\s{2,}"
re.Global = True
For Each c In Range("a:a")
c.Offset(0, 3) = re.Replace(c.Text, " ")
Next c
End Sub
Sub test3()
Dim c As Range
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s{2,}"
re.Global = True
For Each c In Range("a:a")
c.Offset(0, 3) = re.Replace(c.Text, " ")
Next c
End Sub
========================================================
InStr test 16.72 sec
RegExp early binding Test 15.95 sec
RegExp late binding test 16.47 sec
======================================

Removing the read/write part, setting a reference to Regular Expressions, and
doing 1,000,000 operations, some differences show up:
=============================================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec"

End Sub
Sub test1()
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
For i = 1 To 2 ^ 20
Do While InStr(sTxt, " ") > 0
sTxt = Replace(sTxt, " ", " ")
Loop
Next i
Debug.Print sTxt
End Sub
Sub test3()
Dim re As New RegExp
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
re.Global = True
re.Pattern = "\s{2,}"
For i = 1 To 2 ^ 20
sTxt = re.Replace(sTxt, " ")
Next i
Debug.Print sTxt
End Sub
==============================================================
Now is the time for all
Instr Test .25 sec
Now is the time for all
RegExp test 2.05 sec
=============================================================

I also ran a test which shot down my expectation that the morefunc.xll addin
would run faster:

========================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "morefunc test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec"

End Sub
Sub test1()
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
For i = 1 To 2 ^ 20
Do While InStr(sTxt, " ") > 0
sTxt = Replace(sTxt, " ", " ")
Loop
Next i
Debug.Print sTxt
End Sub
Sub test2()
Dim sTxt As String, sPat As String
Dim i As Long
sTxt = Range("a1").Text
sPat = "\s{2,}"

For i = 1 To 2 ^ 20
sTxt = Run([regex.substitute], sTxt, sPat, " ")
Next i
Debug.Print sTxt
End Sub
Sub test3()
Dim re As New RegExp
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
re.Global = True
re.Pattern = "\s{2,}"
For i = 1 To 2 ^ 20
sTxt = re.Replace(sTxt, " ")
Next i
Debug.Print sTxt
End Sub
====================================================
Now is the time for all
Instr Test .75 sec
Now is the time for all
morefunc test 173.48 sec
Now is the time for all
RegExp test 2.28 sec
===================================


So really, the question comes up as to whether or not the timing differences
make a difference in the usual VBA applications, or will the timing differences
be irrelevant given the timings of the rest of the routines.
--ron
 
R

Rick Rothstein \(MVP - VB\)

If I read your results correctly, it would seem for a macro executing wholly
within the VBA world, using the regular expression scripting engine is
slower than using native, built-in VBA functions; however, once your macro
interacts with the spreadsheet world, any coding time differences get buried
by the time required for VBA to communicate with the spreadsheet.

Rick


Ron Rosenfeld said:
That was my gut feeling. I figured if calling a macro from the spreadsheet
is slow (I've seen several references to that fact), then it must also be
slow going the other way. Hence, I opted for the pattern toggling code
that
I posted.



Now that is interesting.


I would agree with that.



The way I this over in the compiled VB world (which works here in the
Excel
VBA world as well) is like this...

Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop



In the compiled VB world, our tendency is to avoid calling out to any kind
of scripting type object as this is very slow to do from a compiled
executable and, very often, businesses have scripting completely turned
off
for security reasons. However, since VBA macros or UDFs are not compiled
executable, I have no feeling for the efficiency calling a scripting
object
from within one.


Rick


OK, after reading about all these delays, I decided to do some timing
tests and
I find that the differences in time related to the procedure calls are far
outweighed by the time involved in reading or writing to/from a cell.

Here is the first run just picking out if a word is present in a string
and
writing the results to a cell:

========================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = "Now is the time for all"

Dim t As Long
t = Timer
test1
Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & "
sec"

t = Timer
test3
Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec"


End Sub

Sub test1()
Dim c As Range
For Each c In Range("a:a")
If InStr(1, c.Text, "time") > 0 Then
c.Offset(0, 1).Value = "time"
End If
Next c
End Sub
Sub test2()
Dim c As Range
Dim re As New RegExp
Dim mc As MatchCollection
re.Pattern = "time"
For Each c In Range("a:a")
Set mc = re.Execute(c.Text)
c.Offset(0, 3) = mc(0)
Next c
End Sub
Sub test3()
Dim c As Range
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "time"
For Each c In Range("a:a")
Set mc = re.Execute(c.Text)
c.Offset(0, 3) = mc(0)
Next c
End Sub
============================================
InStr test 15.53 sec
RegExp early binding Test 15.45 sec
RegExp late binding test 16.59 sec
==============================================

And again, just testing your double space removal routine:

================================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "InStr test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "RegExp early binding Test", Format(Timer - t, "#.00") & "
sec"

t = Timer
test3
Debug.Print "RegExp late binding test", Format(Timer - t, "#.00") & " sec"


End Sub

Sub test1()
Dim c As Range
Dim SomeText As String
For Each c In Range("a:a")
SomeText = c.Text
Do While InStr(SomeText, " ") > 0
SomeText = Replace(SomeText, " ", " ")
Loop
c.Offset(0, 1) = SomeText
Next c
End Sub
Sub test2()
Dim c As Range
Dim re As New RegExp
Dim mc As MatchCollection
re.Pattern = "\s{2,}"
re.Global = True
For Each c In Range("a:a")
c.Offset(0, 3) = re.Replace(c.Text, " ")
Next c
End Sub
Sub test3()
Dim c As Range
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\s{2,}"
re.Global = True
For Each c In Range("a:a")
c.Offset(0, 3) = re.Replace(c.Text, " ")
Next c
End Sub
========================================================
InStr test 16.72 sec
RegExp early binding Test 15.95 sec
RegExp late binding test 16.47 sec
======================================

Removing the read/write part, setting a reference to Regular Expressions,
and
doing 1,000,000 operations, some differences show up:
=============================================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec"

End Sub
Sub test1()
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
For i = 1 To 2 ^ 20
Do While InStr(sTxt, " ") > 0
sTxt = Replace(sTxt, " ", " ")
Loop
Next i
Debug.Print sTxt
End Sub
Sub test3()
Dim re As New RegExp
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
re.Global = True
re.Pattern = "\s{2,}"
For i = 1 To 2 ^ 20
sTxt = re.Replace(sTxt, " ")
Next i
Debug.Print sTxt
End Sub
==============================================================
Now is the time for all
Instr Test .25 sec
Now is the time for all
RegExp test 2.05 sec
=============================================================

I also ran a test which shot down my expectation that the morefunc.xll
addin
would run faster:

========================================
Option Explicit
Sub foo()
Range("a:d").Clear
Range("a:a").Value = " Now is the time for all "

Dim t As Long
t = Timer
test1
Debug.Print "Instr Test", Format(Timer - t, "#.00") & " sec"

t = Timer
test2
Debug.Print "morefunc test", Format(Timer - t, "#.00") & " sec"

t = Timer
test3
Debug.Print "RegExp test", Format(Timer - t, "#.00") & " sec"

End Sub
Sub test1()
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
For i = 1 To 2 ^ 20
Do While InStr(sTxt, " ") > 0
sTxt = Replace(sTxt, " ", " ")
Loop
Next i
Debug.Print sTxt
End Sub
Sub test2()
Dim sTxt As String, sPat As String
Dim i As Long
sTxt = Range("a1").Text
sPat = "\s{2,}"

For i = 1 To 2 ^ 20
sTxt = Run([regex.substitute], sTxt, sPat, " ")
Next i
Debug.Print sTxt
End Sub
Sub test3()
Dim re As New RegExp
Dim sTxt As String
Dim i As Long
sTxt = Range("a1").Text
re.Global = True
re.Pattern = "\s{2,}"
For i = 1 To 2 ^ 20
sTxt = re.Replace(sTxt, " ")
Next i
Debug.Print sTxt
End Sub
====================================================
Now is the time for all
Instr Test .75 sec
Now is the time for all
morefunc test 173.48 sec
Now is the time for all
RegExp test 2.28 sec
===================================


So really, the question comes up as to whether or not the timing
differences
make a difference in the usual VBA applications, or will the timing
differences
be irrelevant given the timings of the rest of the routines.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top