How to extract the text content from .ppt files

apondu

Hi,
I am a student working on an academic project that parses files such as .doc, .xls, .ppt and .pps. I want to know how to extract the text content from .ppt files. I am using C#.NET for my project. It would be very helpful if you could provide some code snippets too. Thank you in advance for the help.


Govardhana
 
Ishai Sagi

Well, if the client machine has PowerPoint installed, you can use the Office object model to start the PowerPoint application, open a .ppt file, loop over the slides, and get the shapes on every slide (then check what kind of shape each one is and whether it has text in it).

Or, simpler yet, use the object model to save the .ppt as HTML in a temp directory and then parse the HTML file...

For Office 2003, look at:
http://www.microsoft.com/downloads/...3a-ac14-4125-8ba0-d36d67e0f4ad&DisplayLang=en

For Office XP:
http://www.microsoft.com/downloads/...1e-3060-4f71-a6b4-01feba508e52&DisplayLang=en
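The save-as-HTML route mentioned above might look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: the class and method names (PptHtmlText, ExtractViaHtml, StripTags) and the temp file name export.htm are hypothetical, the regex tag stripper is a crude stand-in for a real HTML parser, and it assumes a PowerPoint version that still supports PpSaveAsFileType.ppSaveAsHTML (newer versions may not). It also assumes every .htm file PowerPoint writes under the output folder is worth reading, so some text may come out more than once.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using Microsoft.Office.Core;                          // Office object library (MsoTriState)
using PowerPoint = Microsoft.Office.Interop.PowerPoint; // PowerPoint COM object library

// Hypothetical helper class; names are illustrative only.
public static class PptHtmlText
{
    // Crude tag stripper: drops <script>/<style> blocks and all tags,
    // decodes entities and collapses whitespace. Not a real HTML parser.
    public static string StripTags(string html)
    {
        string s = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        s = Regex.Replace(s, "<[^>]+>", " ");
        s = System.Net.WebUtility.HtmlDecode(s);
        return Regex.Replace(s, @"\s+", " ").Trim();
    }

    // Saves the presentation as HTML into a fresh temp folder, then strips
    // tags from every .htm file found there (assumption: the export layout
    // puts the slide text in .htm files under that folder).
    public static string ExtractViaHtml(string pptPath)
    {
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);

        var app = new PowerPoint.Application();
        PowerPoint.Presentation pres = null;
        try
        {
            pres = app.Presentations.Open(pptPath,
                MsoTriState.msoTrue,   // read-only
                MsoTriState.msoFalse,  // open under its own name, not as an untitled copy
                MsoTriState.msoFalse); // no visible window
            // Assumes this PowerPoint version still supports HTML export.
            pres.SaveAs(Path.Combine(dir, "export.htm"),
                PowerPoint.PpSaveAsFileType.ppSaveAsHTML, MsoTriState.msoTrue);
        }
        finally
        {
            if (pres != null) pres.Close();
            app.Quit();
        }

        var sb = new StringBuilder();
        foreach (string file in Directory.GetFiles(dir, "*.htm", SearchOption.AllDirectories))
            sb.AppendLine(StripTags(File.ReadAllText(file)));
        return sb.ToString();
    }
}
```

Usage would be something like `string text = PptHtmlText.ExtractViaHtml(@"c:\mypresentation.ppt");`. The appeal of this route is that it needs no per-shape logic; the price is dealing with whatever HTML layout and encoding the export produces.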
 
apondu

Hi,

Thanks for taking an interest in the question I posted. You suggested looping through the slides, checking each shape, and extracting its text if it has any. Can you please give me some more information on this, and some code snippets showing how to loop through the slides and check the shapes? That would be very helpful. Waiting for your reply.

Thank You


Govardhan
 
Ishai Sagi

Here is some sample code in C#. I may have the Presentations.Open call wrong (I have never done that in C# and don't have time to debug it now), but everything else should work.
Note that you need to reference the Microsoft PowerPoint COM object library for this...

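// Requires references to the PowerPoint COM object library
// (Microsoft.Office.Interop.PowerPoint) and the Office library (Microsoft.Office.Core).
using System.Text;
using System.Windows.Forms;
using Microsoft.Office.Core;
using PowerPoint = Microsoft.Office.Interop.PowerPoint;

PowerPoint.Application appPpt = new PowerPoint.Application();

// Open(FileName, ReadOnly, Untitled, WithWindow) takes MsoTriState values, not bool
PowerPoint.Presentation objActivePresentation = appPpt.Presentations.Open(
    @"c:\mypresentation.ppt",
    MsoTriState.msoTrue,   // read-only
    MsoTriState.msoFalse,  // open under its own name, not as an untitled copy
    MsoTriState.msoFalse); // no visible window

StringBuilder sb = new StringBuilder();
foreach (PowerPoint.Slide objSlide in objActivePresentation.Slides)
{
    foreach (PowerPoint.Shape objShape in objSlide.Shapes)
    {
        sb.Append(objShape.TextFrame.TextRange.Text);
    }
}

// Close the file and shut down the PowerPoint instance we started
objActivePresentation.Close();
appPpt.Quit();

MessageBox.Show(sb.ToString());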
 
apondu

Hi,

Sorry for the trouble. Everything is working fine, but there's an error at the following statement:

sb.Append(objShape.TextFrame.TextRange.Text);

This statement is inside the innermost foreach loop. The error says:
"TextFrame (invalid request) this type of shape cannot have a TextRange"

Can you tell me what the problem is and what the solution is? I really want to thank you for your response.

Waiting for your reply.

Thank You


Govardhan
 
Ishai Sagi

Ah yes, I forgot about that: some shapes can't hold text (pictures, lines and so on). You should either check whether the shape has a text frame (Shape.HasTextFrame) before reading its TextRange, or wrap the call in a try/catch block to skip shapes with no text.
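A minimal sketch of the "check first" option, assuming the same PowerPoint and Office COM references as the loop above; the class and method names (PptText, ExtractText) are hypothetical. It tests Shape.HasTextFrame before touching TextFrame, and TextFrame.HasText to skip empty placeholders, so no exception is thrown for shapes that cannot hold text.

```csharp
using System.Text;
using Microsoft.Office.Core;                          // MsoTriState
using PowerPoint = Microsoft.Office.Interop.PowerPoint;

// Hypothetical helper class; names are illustrative only.
public static class PptText
{
    // Returns the text of every text-bearing shape, one shape per line.
    // Shapes without a text frame are skipped instead of throwing.
    public static string ExtractText(string path)
    {
        var app = new PowerPoint.Application();
        PowerPoint.Presentation pres = null;
        try
        {
            pres = app.Presentations.Open(path,
                MsoTriState.msoTrue,   // read-only
                MsoTriState.msoFalse,  // open under its own name
                MsoTriState.msoFalse); // no visible window

            var sb = new StringBuilder();
            foreach (PowerPoint.Slide slide in pres.Slides)
            {
                foreach (PowerPoint.Shape shape in slide.Shapes)
                {
                    // HasTextFrame is an MsoTriState, not a bool
                    if (shape.HasTextFrame != MsoTriState.msoTrue)
                        continue;
                    // A placeholder can have a text frame but no text yet
                    if (shape.TextFrame.HasText != MsoTriState.msoTrue)
                        continue;
                    sb.AppendLine(shape.TextFrame.TextRange.Text);
                }
            }
            return sb.ToString();
        }
        finally
        {
            // Always release the file and the PowerPoint instance
            if (pres != null) pres.Close();
            app.Quit();
        }
    }
}
```

Usage: `string text = PptText.ExtractText(@"c:\mypresentation.ppt");`. Checking the property avoids paying for a COM exception on every picture, which the try/catch option would incur. This sketch does not descend into grouped shapes (Shape.GroupItems) or table cells (Shape.HasTable), so text inside those would be missed.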
 
apondu

Hi,

Thanks a lot, it's working fine now. I also wanted to know whether I can ask you questions about Excel and Word; I have some, and I have already posted them to the Google Groups. Can you help me with these? Waiting for your reply.

Thank You


Govardhan
 
