Determine the filesize of an image in a document

Peter Karlström · Aug 26, 2008

Hi

In one of my projects I want to check all images in the active document and
determine the actual "file size" of each image, i.e. the image's "physical" size.

The next step will be to compress images that are too large, in order to keep
the document file size to a minimum.

I have looked around in the InlineShape object without finding a suitable
property or method.

Is this possible? Can anybody help me?

Thanks in advance

Ji Zhou [MSFT] · Aug 27, 2008

Hello Peter,

Thanks for using Microsoft Newsgroup Support Service, this is Ji Zhou
[MSFT] and I will be working on this issue with you.

Firstly, I want to ensure I understand the issue correctly. We are trying
to get the physical size of images in the active document. If an image is
too big, we then compress it. If I have misunderstood your question,
please let me know.

I think the approach depends on which version of Word we are using:

1. If we are using the Word 2007 API and the .docx file format, the active
document exposes a property named Document.WordOpenXML
(http://msdn.microsoft.com/en-us/library/bb242889.aspx). It returns the
OpenXML content of the active document. If the user has not modified the
document at the XML level, the inline images are stored by default in parts
named image1, image2, image3, and so on. The corresponding content in the
OpenXML looks like:

<pkg:part pkg:name="/word/media/image1.gif" pkg:contentType="image/gif"
pkg:compression="store">
<pkg:binaryData>(binary content that represents the image)
</pkg:binaryData>
</pkg:part>

Thus, I think we can search Document.WordOpenXML for the string
"<pkg:part pkg:name=\"/word/media/image" + index.ToString() with the
String.IndexOf() method, and then find the positions of the next
"<pkg:binaryData>" and "</pkg:binaryData>" markers in the same way. The
length of the binary data between them reflects the physical size the image
occupies in the document.

2. If we are using Word 2003 and the .doc file format, no API is exposed to
developers for getting an inline image's size. The only way I can think of is
to call the document.SaveAs() method to save the active document in HTML
format. All images are then saved in the folder that accompanies that HTML
file, and we can iterate through those image files to find their actual sizes.
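
As a rough illustration of approach 1, the search could also be done with an XML parser instead of String.IndexOf(). This is only a sketch under assumptions: the class and method names (FlatOpcImages, GetMediaSizes) are made up for this example, the pkg: prefix is assumed to map to the xmlPackage namespace shown in the code, and the binary data is assumed to be base64 text, so the decoded byte count is what gets reported.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper, not part of the Word object model.
public static class FlatOpcImages
{
    // Assumed namespace behind the pkg: prefix in the flat package XML.
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Maps each part name under /word/media/ to its decoded size in bytes.
    public static Dictionary<string, int> GetMediaSizes(string flatOpcXml)
    {
        var sizes = new Dictionary<string, int>();
        XDocument doc = XDocument.Parse(flatOpcXml);

        foreach (XElement part in doc.Descendants(Pkg + "part"))
        {
            string name = (string)part.Attribute(Pkg + "name");
            XElement data = part.Element(Pkg + "binaryData");
            if (name == null || data == null) continue;
            if (!name.StartsWith("/word/media/", StringComparison.OrdinalIgnoreCase)) continue;

            // FromBase64String ignores white space, so wrapped data decodes as-is.
            sizes[name] = Convert.FromBase64String(data.Value).Length;
        }
        return sizes;
    }
}
```

In an add-in this might be called as FlatOpcImages.GetMediaSizes(doc.WordOpenXML), where doc is the active Word.Document. Matching on the /word/media/ prefix rather than on image1, image2, ... avoids depending on the default part names.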

Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

Delighting our customers is our #1 priority. We welcome your comments and
suggestions about how we can improve the support we provide to you. Please
feel free to let my manager know what you think of the level of service
provided. You can send feedback directly to my manager at:
(e-mail address removed).

==================================================
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/en-us/subscriptions/aa948868.aspx#notifications.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://support.microsoft.com/select/default.aspx?target=assistance&ln=en-us.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.

Peter Karlström · Aug 27, 2008

Hi Ji Zhou

Thanks for your reply.

In this project we're dealing with Office 2003, so it's a pity that the
Word 2007 approach won't work. The SaveAs suggestion will probably be a
problem, since this check will be done every time a document is opened.
Thanks anyway

Regards
--
Peter Karlström
Midrange AB
Sweden

Tony Jollans · Aug 27, 2008

You can get a guide with something along these lines:

Dim s As InlineShape
For Each s In ActiveDocument.InlineShapes
    ' EnhMetaFileBits returns a byte array; UBound is its last index
    MsgBox "Guide size is " & UBound(s.Range.EnhMetaFileBits)
Next s

However, this does not really reflect size on disk, especially in 2007 -
which may compress images itself (even before zipping), or may hold more
than one copy, depending on how the image has been edited.

Peter Karlström · Aug 27, 2008

Hi Tony

I much appreciate your reply. After my testing, it actually seems to
reflect the size in some way. I will see if this can be of any use in the project.

Thanks a million

Best Regards
--
Peter Karlström
Midrange AB
Sweden

Ji Zhou [MSFT] · Aug 28, 2008

Hello all,

Thanks to Tony for his comment. As to why the EnhMetaFileBits property only
reflects, but does not equal, the image file size, I think I can give some
explanation. According to the MSDN document
http://msdn.microsoft.com/en-us/library/aa172538.aspx, EnhMetaFileBits
returns a variant that represents a picture representation of how a
selection or range of text appears. Thus, I think this property just
captures a picture of the range. It may also be used internally by the
Range.CopyAsPicture() method. However, we cannot know which algorithm Word
uses to generate the picture, and it may already omit some information in
order to compress it.

To get a more accurate image size in Word, I think the approach is still to
convert the document to another format. XML should be better than HTML. The
corresponding image content in a Word document saved in the Office 2003 XML
format looks like:

<w:binData w:name="wordml://02000001.jpg">......(binary code of the image)
</w:binData>

Consequently, I use the following code to get the sizes of all image files:

// Requires: using System.Windows.Forms;
//           using Word = Microsoft.Office.Interop.Word;
private void ThisAddIn_Startup(object sender, System.EventArgs e)
{
    this.Application.DocumentOpen +=
        new Word.ApplicationEvents4_DocumentOpenEventHandler(Application_DocumentOpen);
}

void Application_DocumentOpen(Word.Document Doc)
{
    // Copy the opened document's content into a new, invisible document.
    object visible = false;
    Word.Document newDoc = this.Application.Documents.Add(
        ref missing, ref missing, ref missing, ref visible);
    newDoc.Content.FormattedText = Doc.Content.FormattedText;

    // Save the copy in the Word 2003 XML format, then close it.
    object fileName = @"D:\temp.xml";
    object fileFormat = Word.WdSaveFormat.wdFormatXML;
    newDoc.SaveAs(ref fileName, ref fileFormat, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing);
    newDoc.Close(ref missing, ref missing, ref missing);

    string xmlContent = System.IO.File.ReadAllText(@"D:\temp.xml");
    int startIndex = 0;
    while (true)
    {
        int start = xmlContent.IndexOf("<w:binData w:name=\"wordml://", startIndex);
        if (start == -1) break;
        int end = xmlContent.IndexOf("</w:binData>", start);
        if (end == -1) break;   // malformed XML: no closing tag
        startIndex = end;
        // 41 roughly accounts for the opening tag's length; 0.73 is an empirical
        // factor close to base64's ratio of 3 bytes per 4 characters (0.75).
        MessageBox.Show("File size is about: " +
            ((end - start - 41) * 0.73).ToString() + " bytes");
    }
}

I have tried it on my side and it works fine. Every time a document is opened,
the code creates an invisible document into which we paste the opened
document's content. It then saves the invisible document to disk in XML
format. By searching the saved XML content, we can find each
<w:binData></w:binData> element and get the length of its binary content.
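
The expression (end - start - 41) * 0.73 only approximates the size, since base64 text stores 3 bytes in every 4 characters. If an exact figure is wanted, one possible variation is to decode the text between the tags and count the bytes. The following is just a sketch of that idea: BinDataSizes and GetSizes are hypothetical names, and it assumes the content of each <w:binData> element is plain base64 text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Word object model.
public static class BinDataSizes
{
    // Returns the decoded size in bytes of each <w:binData> element in the XML text.
    public static List<int> GetSizes(string xmlContent)
    {
        const string openPrefix = "<w:binData";
        const string closeTag = "</w:binData>";
        var sizes = new List<int>();
        int pos = 0;

        while (true)
        {
            int start = xmlContent.IndexOf(openPrefix, pos, StringComparison.Ordinal);
            if (start == -1) break;

            // The base64 payload begins right after the '>' that ends the opening tag.
            int dataStart = xmlContent.IndexOf('>', start) + 1;
            int end = xmlContent.IndexOf(closeTag, dataStart, StringComparison.Ordinal);
            if (dataStart == 0 || end == -1) break;   // malformed XML: stop searching

            string base64 = xmlContent.Substring(dataStart, end - dataStart);
            // FromBase64String ignores white space such as line breaks.
            sizes.Add(Convert.FromBase64String(base64).Length);
            pos = end + closeTag.Length;
        }
        return sizes;
    }
}
```

It could be called on the saved file, for example BinDataSizes.GetSizes(System.IO.File.ReadAllText(@"D:\temp.xml")). Locating the end of the opening tag with IndexOf('>') avoids hard-coding the tag length, at the cost of decoding every image into memory once.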

Of course, this approach is slower than using EnhMetaFileBits, so which one to
choose depends on your scenario. Hope this helps. If you have any further
questions, please feel free to let me know.

Best regards,
Ji Zhou ([email protected], remove 'online.')
Microsoft Online Community Support

