Determine the filesize of an image in a document

Peter Karlström

Hi

In one of my projects I want to check all images in the active document and
determine the actual "filesize" of each image, i.e. its "physical" size.

The next step will be to compress images that are too large, in order to keep
the document file size to a minimum.

I have looked around in the InlineShape object without finding a suitable
approach or property/method.

Is this possible? Can anybody help me?

Thanks in advance
 
Ji Zhou [MSFT]

Hello Peter,

Thanks for using Microsoft Newsgroup Support Service, this is Ji Zhou
[MSFT] and I will be working on this issue with you.

First, I want to make sure I understand the issue correctly. We are trying
to get the physical size of each image in the active document and, if an
image is too large, compress it. If I have misunderstood your question,
please let me know.

I think the approach depends on which version of Word we are using:

1. If we are using the Word 2007 API and the .docx file format, the active
document exposes a property named Document.WordOpenXML
(http://msdn.microsoft.com/en-us/library/bb242889.aspx). It returns the
Open XML content of the active document. Unless the user has modified the
document at the XML level, the image parts for inline shapes are named
image1, image2, image3, and so on by default. The corresponding content in
the Open XML looks like:

<pkg:part pkg:name="/word/media/image1.gif" pkg:contentType="image/gif"
pkg:compression="store">
<pkg:binaryData>(binary content that represents the image)
</pkg:binaryData>
</pkg:part>

Thus, I think we can search Document.WordOpenXML for the string "<pkg:part
pkg:name=\"/word/media/image" + index.ToString() with the String.IndexOf()
method, and then find the positions of the next "<pkg:binaryData>" and
"</pkg:binaryData>" tags in the same way. The length of the text between
them reflects the physical size the image occupies in the document. The
data is Base64 text, so the image itself is roughly three quarters of that
length in bytes.

2. If we are using Word 2003 and the .doc file format, no API is exposed to
developers for getting an inline image's size. The only way I can think of
is to call the Document.SaveAs() method to save the active document in HTML
format. All images are then saved in the folder that accompanies the HTML
file, and we can iterate through those image files to find their actual
sizes.
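
As a rough illustration of the two options above, here is a minimal C# sketch. It is not tested against Word itself and rests on several assumptions: the class and method names (ImageSizeSketch, MediaPartSizes, ExportedImageSizes) are hypothetical, option 1 is handed the string returned by Document.WordOpenXML, option 2 is handed the path of the folder that the HTML export created, and the list of image extensions is only a guess at what Word writes. For option 1 it parses the XML rather than searching with IndexOf, and decodes the Base64 data to get a byte count.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ImageSizeSketch
{
    // Namespace used by the pkg: prefix in the flat package XML.
    static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Assumption: extensions Word may use for exported pictures.
    static readonly HashSet<string> ImageExtensions = new HashSet<string>(
        new[] { ".gif", ".jpg", ".jpeg", ".png", ".bmp", ".emf", ".wmf", ".emz", ".wmz" },
        StringComparer.OrdinalIgnoreCase);

    // Option 1: decoded byte count of every /word/media/ part in the package XML.
    public static Dictionary<string, int> MediaPartSizes(string flatPackageXml)
    {
        var sizes = new Dictionary<string, int>();
        foreach (XElement part in XDocument.Parse(flatPackageXml).Descendants(Pkg + "part"))
        {
            string name = (string)part.Attribute(Pkg + "name") ?? "";
            XElement data = part.Element(Pkg + "binaryData");
            if (data == null ||
                !name.StartsWith("/word/media/", StringComparison.OrdinalIgnoreCase))
            {
                continue; // not an embedded media part
            }
            // FromBase64String skips the line breaks inside the encoded data.
            sizes[name] = Convert.FromBase64String(data.Value).Length;
        }
        return sizes;
    }

    // Option 2: size on disk of each image file in the HTML export folder.
    public static Dictionary<string, long> ExportedImageSizes(string exportFolder)
    {
        return Directory.EnumerateFiles(exportFolder)
            .Where(path => ImageExtensions.Contains(Path.GetExtension(path)))
            .ToDictionary(path => Path.GetFileName(path),
                          path => new FileInfo(path).Length);
    }
}
```

Either method returns a name-to-size map, so the caller can decide which images exceed whatever threshold the project uses. Note that the sizes from option 2 describe the exported copies, which Word may have re-encoded during the HTML save.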


Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa948868.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://support.microsoft.com/select/default.aspx?target=assistance&ln=en-us.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Peter Karlström

Hi Ji Zhou

Thanks for your reply.

In this project we're dealing with Office 2003, so it's a pity the first
option won't work. The second suggestion will probably be a problem, since
this check will be done every time a document is opened.
Thanks anyway

Regards
--
Peter Karlström
Midrange AB
Sweden
 
Tony Jollans

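You can get a guide with something along these lines:

Dim s As InlineShape
For Each s In ActiveDocument.InlineShapes
    ' EnhMetaFileBits is a zero-based byte array, so its length is UBound + 1
    MsgBox "Guide size is " & (UBound(s.Range.EnhMetaFileBits) + 1)
Next s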

However, this does not really reflect size on disk, especially in 2007 -
which may compress images itself (even before zipping), or may hold more
than one copy, depending on how the image has been edited.
 
Peter Karlström

Hi Tony

I much appreciate your reply since, after my testing, it actually seems to
reflect the size in some way. I will see if this can be of any use in the
project.

Thanks a million

Best Regards
--
Peter Karlström
Midrange AB
Sweden


 
Ji Zhou [MSFT]

Hello all,

Thanks for Tony's comment. As to why the EnhMetaFileBits property only
reflects, rather than equals, the image file size, I think I can give some
explanation. According to the MSDN documentation
(http://msdn.microsoft.com/en-us/library/aa172538.aspx), EnhMetaFileBits
returns a Variant that represents a picture representation of how a
selection or range of text appears. Thus, I think this property just
captures a picture of the range. It may also be used internally by the
Range.CopyAsPicture() method. However, we cannot know which algorithm Word
uses to generate the picture; it may already omit some information in order
to compress it.

To get a more accurate image size in Word, I think the approach is still to
convert the document to another format. XML should be better than HTML. The
corresponding content for an image in a Word document saved in the Office
2003 XML format looks like:

<w:binData w:name="wordml://02000001.jpg">......(binary code of the image)
</w:binData>

Consequently, I use the following code to get the sizes of all the images:

// Requires: using System.Windows.Forms;
//           using Word = Microsoft.Office.Interop.Word;

private void ThisAddIn_Startup(object sender, System.EventArgs e)
{
    this.Application.DocumentOpen +=
        new Word.ApplicationEvents4_DocumentOpenEventHandler(Application_DocumentOpen);
}

void Application_DocumentOpen(Word.Document Doc)
{
    // Copy the opened document's content into an invisible scratch document.
    object visible = false;
    Word.Document newDoc = this.Application.Documents.Add(
        ref missing, ref missing, ref missing, ref visible);
    newDoc.Content.FormattedText = Doc.Content.FormattedText;

    // Save the scratch document in Word 2003 XML format, then close it.
    object fileName = @"D:\temp.xml";
    object fileFormat = Word.WdSaveFormat.wdFormatXML;
    newDoc.SaveAs(ref fileName, ref fileFormat, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing);
    newDoc.Close(ref missing, ref missing, ref missing);

    string xmlContent = System.IO.File.ReadAllText(@"D:\temp.xml");
    const string openTag = "<w:binData w:name=\"wordml://";
    const string closeTag = "</w:binData>";
    int searchFrom = 0; // start at the first character, not at index 1
    while (true)
    {
        int start = xmlContent.IndexOf(openTag, searchFrom);
        if (start == -1) break;
        // The encoded data begins right after the '>' that ends the opening tag.
        int dataStart = xmlContent.IndexOf('>', start) + 1;
        int end = xmlContent.IndexOf(closeTag, dataStart);
        if (end == -1) break; // malformed file: no closing tag
        searchFrom = end + closeTag.Length;
        // Base64 holds 3 bytes per 4 characters (0.75); the slightly lower 0.73
        // roughly allows for line breaks inside the encoded data.
        MessageBox.Show("File size is about: " +
            ((end - dataStart) * 0.73).ToString("F0") + " bytes");
    }
}

I have tried it on my side and it works fine. Every time a document is
opened, the code creates an invisible document and copies the opened
document's content into it. It then saves the invisible document to disk in
XML format. By searching the saved XML content, we can find each
<w:binData></w:binData> element and get the length of its binary content.

Of course, this approach is slower than using EnhMetaFileBits, so which one
to choose depends on your scenario. Hope this helps. If you have any further
questions, please feel free to let me know.
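
If the estimate is too coarse, the saved XML can also be parsed and the Base64 data decoded to get an exact byte count per image. The following is only a sketch under assumptions: the class and method names (BinDataSketch, BinDataSizes) are hypothetical, and it takes the XML as a string, so for the temporary file described above the caller would first read it with System.IO.File.ReadAllText.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class BinDataSketch
{
    // Namespace used by the w: prefix in Word 2003 XML documents.
    static readonly XNamespace W = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Returns (w:name, decoded byte count) for every w:binData element.
    public static List<KeyValuePair<string, int>> BinDataSizes(string wordXml)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (XElement bin in XDocument.Parse(wordXml).Descendants(W + "binData"))
        {
            string name = (string)bin.Attribute(W + "name") ?? "";
            // FromBase64String skips the line breaks inside the encoded data,
            // so no correction factor is needed here.
            int byteCount = Convert.FromBase64String(bin.Value).Length;
            result.Add(new KeyValuePair<string, int>(name, byteCount));
        }
        return result;
    }
}
```

Decoding costs more memory than measuring the string length, because each image is materialised as a byte array, so for very large documents the length-based estimate may still be the better trade-off.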


Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

 
