HTML files into Excel

A

alfred

Hi,

I have >1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks
 
J

Joel

HTML files are text files. You would read them like any textt files and then
apply any filtering you need. Here is a genral format of VBA code that reads
and writtes text data.

You can see the HTML text by opening the HTML with Notepad or in Internet
explorer sectect View Menu - Source. It is basically string manipulations
using Left(),MID,and Right() functions. I have used VBA to modify XML files.
 
T

Tom Ogilvy

Have you tried just opening one of the html files in Excel using File=>Open.

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
edit=>copy, go to a new sheet and do Edit=>Paste special and select values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.
 
J

Joel

Tom: With Hyper-text files some tagged items are formatting and some are
actual text. Alfred is looking for data which I believe is the text and not
the formating. The task would be to identify each tagged field and only
extractt the tags that pertain to the text strings.
 
T

Tom Ogilvy

If I agreed, I wouldn't have posted.

Opening the files in excel should do exactly what I feel the OP wants to do.
Nonetheless, it is offered for his consideration to accept or reject.

I suspect he is still waiting for your promise:
Here is a genral format of VBA code that reads and writtes text data.

I hope it wasn't the simple paragraph that followed. Based on that, I
would have expected a generalized html parser that strips off all tags.
(which is essentially what I offered).
 
J

Joel

forgot to paste sample code. Tom just notified me of my mistake.

Sub ConvertCSV()

Const Sourcefile = "c:\temp\Origin.csv"
Const Destfile = "c:\temp\Destination.csv"
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 3

Set OriginCSV = _
CreateObject("Scripting.FileSystemObject")
Set FOrigin = _
OriginCSV.GetFile(Sourcefile)
Set FSOrigin = _
FOrigin.OpenAsTextStream _
(ForReading)


Set DestinationCSV = _
CreateObject("Scripting.FileSystemObject")
DestinationCSV.CreateTextFile Destfile
Set DestinationCSV = DestinationCSV. _
GetFile(Destfile)
Set FSDestination = DestinationCSV. _
OpenAsTextStream _
(ForWriting)



Do While FSOrigin.ATENDOFSTREAM = False


InputString = FSOrigin.readline


Else
'Loop until no more characters in line
First = True
Do While Len(InputString) > 0

'enter your code here

Loop

End If
Loop

FSOrigin.Close
FSDestination.Close

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top