Need to convert html to text

D

Dudely

I pull a string off a webpage. The string looks something like this "I
%20am%20a%20string".

I want to stuff it into an Excel cell, but without the %20. As we all
know, %20 is the representation for the space character. I looked
everywhere I could think of to find a function that would convert to a
plain text string, but couldn't find anything supplied by Excel. So,
I decided to build my own.

I walk through the string and copy character by character, until I
find a percent character, skip over the 3 chars making up the %20 and
repeat until done. Seems to work fine, until I try to copy it into
the cell. At that point what I have are a bunch of unprintiable
characters. That's when I realized I had forgotten to use the CHR
function. However, when I tried to use the CHR function, it gives me
a type mismatch. So, at this point I admit I don't know what I'm
doing and I need some help.

The calls to Strconv are essentially useless, but again I admit I
don't know what I'm doing. I use it as a quick way to fill "x".

Thank you in advance



Dim i As Long
Dim x() As Byte
x = StrConv(Subject, vbFromUnicode) ' Convert string.
ReDim tmpSubject(UBound(x))

For i = 0 To UBound(x) ' get rid of %20 string
If x(i) = 37 Then ' 37 is the percent sign
i = i + 2 ' "Next" command will add 1 more
Debug.Print " "
tmpSubject(i) = ChrB(20)
Else
Debug.Print ChrB(x(i))
tmpSubject(i) = ChrB(x(i)) <==== type mismatch
End If
Next
Subject = StrConv(tmpSubject, vbUnicode) ' A failed experiment.
 
J

Jan Hyde (VB MVP)

Dudely <[email protected]>'s wild thoughts were released on
Thu, 16 Oct 2008 20:04:15 -0700 (PDT) bearing the following
fruit:
I pull a string off a webpage. The string looks something like this "I
%20am%20a%20string".

I want to stuff it into an Excel cell, but without the %20. As we all
know, %20 is the representation for the space character. I looked
everywhere I could think of to find a function that would convert to a
plain text string, but couldn't find anything supplied by Excel. So,
I decided to build my own.

mystring = Replace(mystring, "%20", " ")

J
 
D

Dudely

Now that's what I'm talking about! Thank you!!!

For my own (and possibly others) edification, assuming the replace
function were not available, how would one go about getting the same
result? In other words, what's wrong with the code I wrote and how
can it be fixed?

Thank you in advance!
 
S

Steve Rindsberg

For my own (and possibly others) edification, assuming the replace
function were not available,

The Replace function is part of VB/VBA6. As long as you've got Office 2000 or
later installed, it's there.
how would one go about getting the same
result? In other words, what's wrong with the code I wrote and how
can it be fixed?

something like this instead? ...

Sub Test()
Debug.Print DeSpace20("I%20am%20a%20string")
End Sub

Function DeSpace20(sInput As String) As String

Dim x As Long
Dim sTemp As String

x = 1

While x <= Len(sInput)
If Mid$(sInput, x, 3) = "%20" Then
sTemp = sTemp & " "
x = x + 3
Else
sTemp = sTemp & Mid$(sInput, x, 1)
x = x + 1
End If
Wend

DeSpace20 = sTemp

End Function
 
G

GeoffG

In other words, what's wrong with the code
I wrote and how can it be fixed?

The following explains why your code doesn't work.
You can copy and paste the following subprocedure.
I've had to make some assumptions.

Private Sub RemoveChars()

Dim Subject As String
Dim tmpSubject() As String
Dim NewSubject1 As String
Dim NewSubject2 As String
Dim I As Long
Dim x() As Byte

Subject = "I%20am%20a%20string"

' Convert string from Unicode and
' store in a Byte array:
x = StrConv(Subject, vbFromUnicode)

' Redimensioning here will make the tmpSubject
' array larger than necessary, but it
' doesn't matter. (Remember, we'll be
' replacing three characters with one
' character, so the new array will be smaller):
ReDim tmpSubject(UBound(x))

' Loop through string to get rid of %20:
For I = LBound(x) To UBound(x)

' 37 is the percent sign:
If x(I) = 37 Then

' the "Next" command will add 1 more:
I = I + 2

Debug.Print " "

' The ChrB() function returns a single byte.
' However, the string array, tmpSubject(),
' needs a two-byte, Unicode character.
' Therefore, you must add ChrB(0)
' (for English) to get the right character;
' otherwise, you'll store nothing in
' tmpSubject. Also, you must use &H20 or 32
' as the space-character number.
tmpSubject(I) = ChrB(&H20) + ChrB(0)

' ALTERNATIVE TO ABOVE CODE LINE:
' The Chr() function returns a two-byte
' character. Therefore, you could use it
' instead; i.e:
' tmpSubject(I) = Chr(32)

' However, as you'll see later, you might
' just as well use a variable instead:
NewSubject2 = NewSubject2 & Chr(32)

Else

' Again, add ChrB(0) on both the
' following code lines or use the Chr()
' function:
Debug.Print ChrB(x(I)) + ChrB(0)
tmpSubject(I) = ChrB(x(I)) + ChrB(0)

' Alternative, using the Chr() function:
NewSubject2 = NewSubject2 & Chr(x(I))

End If
Next

' The next code line will cause a type mismatch
' error so I've remmed it out. This happens because
' tmpSubject is a string array, not a string and
' the StrConv() function needs a string. It's
' interesting to note that, if tmpSubject were a
' VBA string, as required by the StrConv()
' function, you wouldn't need to convert it
' to Unicode; it would already be Unicode. If you
' then convert it, you'd end up with extra
' space characters between each character.
' A full explanation of this issue is given in
' Steven Roman's book "Win32 API Programming"
' (page 83). The book contains an excellent
' explanation of how strings are handled in VBA
' and C.
'
' It will also now be evident that, as you can't
' use the StrConv() function here, you might just
' as well have used the string variable,
' NewSubject2, above and concatenated each
' character to the string. (But, of course,
' in reality, you're better off using the
' Replace() function, as you know.)

'Subject = StrConv(tmpSubject, vbUnicode)

' The string can be reconstructed as follows:
For I = LBound(tmpSubject) To UBound(tmpSubject)
NewSubject1 = NewSubject1 & tmpSubject(I)
Next
Debug.Print "'" & NewSubject1 & "'"

' See the extra characters inserted by the
' StrConv() function when a regular string
' is used:
Subject = StrConv(NewSubject1, vbUnicode)
Debug.Print "'" & Subject & "'"

' Alternative string without the fuss:
Debug.Print "'" & NewSubject2 & "'"

End Sub

Another book you may wish to dig into is:
Visual Basic Language Developer's Handbook
by Ken Getz and Mike Gilbert.

See also:
http://msdn.microsoft.com/en-us/library/aa210721(office.11).aspx

Regards
Geoff





Now that's what I'm talking about! Thank you!!!

For my own (and possibly others) edification, assuming the replace
function were not available, how would one go about getting the same
result? In other words, what's wrong with the code I wrote and how
can it be fixed?

Thank you in advance!
 
D

Dudely

Now there's an awesome answer! It'll take me some time to understand
it :)

Thank you most kindly, and to everyone else as well that participated!
 
G

GeoffG

tmpSubject(i) = ChrB(x(i)) <==== type mismatch

The above line would have caused a type mismatch error if you had declared:

Dim tmpSubject() as Long

i.e not as a string array.

Regards
Geoff



Now there's an awesome answer! It'll take me some time to understand
it :)

Thank you most kindly, and to everyone else as well that participated!
 
K

Karl E. Peterson

Dudely said:
I pull a string off a webpage. The string looks something like this "I
%20am%20a%20string".

Windows actually provides functions for escaping and unescaping text, such that it
may be safely passed as part of a URL. To unescape text like this, you can write a
little function such as (indented to highlight wordwrap):

Private Declare Function UrlUnescape Lib "Shlwapi.dll" Alias "UrlUnescapeA"
(ByVal pszURL As String, ByVal pszUnescaped As String, ByRef pcchUnescaped As Long,
ByVal dwFlags As Long) As Long

Public Function UnescapeURL(ByVal URL As String) As String
' Purpose: A thin wrapper for the UrlUnescape API function.
Dim EscTxt As String
Dim nLen As Long

' Create a maximum sized buffer.
nLen = Len(URL) * 3
EscTxt = Space$(nLen)

If UrlUnescape(URL, EscTxt, nLen, 0) = 0 Then
UnescapeURL = Left$(EscTxt, nLen)
End If
End Function

?unescapeurl("I%20am%20a%20string.")
I am a string.

This has the advantage of handling other commonly escaped characters, as well, in
addition to spaces:

?unescapeurl("I%20am%20a%20%22quoted%22%20string.")
I am a "quoted" string.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top