MDB size estimation

Z

zofficedepot

Are there rules of thumb? Back in the days before Microsoft vanquished
programming efficiency, you could calculate sizes fairly precisely -
multiply integers by 2 and longs by 4 and so forth (allowing an
address table and code and certain overhead).

I'm basically trying to get a handle on some huge files, and relate
their size to their tables, and see from where the fat arises.
Ideally I could add up field lengths giving record lengths which I
could multiply by record counts. But there must be more to it.

Notably, what about all those strings that have a default size of 255?
 
J

John Spencer

Well, a field that has a size of 255 can store zero to 255 characters. The
problem is that the storage taken up is dependent of how many characters are
actually stored in the field. If you store 25 characters in the field, you
will need roughly one-tenth of the storage space as if you fill the field with
255 characters.

The Access 97 Developer's Handbook had a list of the overhead used by each
data type. I'm afraid I've never seen an updated list. I'm really not sure
that what you describe is possible. The ADH says that the figures it gives
were derived from a Microsoft KB article which states that the numbers can be
used to calculate only an estimated record size. Even if the overheads haven't
changed in JET 4 - and they very well may have done - I don't think you're
ever going to be able to calculate the exact size.

With that caveat, here's the info from the ADH ...

Overhead includes ...

Seven bytes per record for record overhead

One byte variable-length column overhead for each text, memo, OLE and
hyperlink column.

One additional byte for every 256 bytes of the total space occupied by all
text, memo, OLE and hyperlink data type columns.

One byte fixed-column overhead for each yes/no, byte, integer, long integer,
counter, single, double, date/time, and replication ID column.

The figures were derived from KB article Q114215, "INF: Estimating Microsoft
Access Table Sizes". I haven't checked if that article is still on-line, it
*might* have been updated for JET 4, on the other hand, it's equally possible
that it may have been archived by now.

Good luck.

--
Brendan Reynolds


From: Chris O'C via AccessMonster.com
Used a hex editor to see what is happening

Every data page in Jet 4 is 4096 bytes. No more, no less. If any data page
isn't exactly 4096 bytes, Jet won't open the db.

For data pages that contain records, the overhead is as follows:

1 - 14 bytes for the data page definition.
2 - 2 bytes for each row pointer.
3 - Row definition is variable with a minimum of 8 bytes. It's impossible to
predict exactly how many bytes for the entire row definition because you have
to set aside bytes for however many null columns in that row. How do you know
ahead of time which, if any, columns will be null when users enter data?
4 - x bytes for fixed length columns.
5 - y bytes for variable length columns.
6 - 2 bytes for each pointer to each variable length column in the row.

The more columns you have, the less data you can fit into a single data page
because the row definition has to define those columns for that row. The
maximum data for a row is about 4,000 bytes but you'll usually fit less in there.

John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
 
Ð

жанна Ñ€Ñбчикова

Are there rules of thumb? Back in the days before Microsoft vanquished
programming efficiency, you could calculate sizes fairly precisely -
multiply integers by 2 and longs by 4 and so forth (allowing an
address table and code and certain overhead).

I'm basically trying to get a handle on some huge files, and relate
their size to their tables, and see from where the fat arises.
Ideally I could add up field lengths giving record lengths which I
could multiply by record counts. But there must be more to it.

Notably, what about all those strings that have a default size of 255?
 
A

Albert D. Kallal

Notably, what about all those strings that have a default size of 255?

You have some really good information from John Spencer on this matter. The
simple fact is that the field length is not padded with blanks like the old
systems of Foxpro dBase etc. In those old system records were fixed length
and it they did not care much about space efficiency and simply wasted all
that extra space for no reason at all. (well except technology was not
really that advanced back then)
Back in the days before Microsoft vanquished
programming efficiency, you could calculate sizes fairly precisely

Actually in the name of efficiency is exactly why access and the jet
database engine works the way it does. It is quite efficient in the sense
that it doesn't store extra text when you define the field of particular
length. Furthermore it actually uses compression algorithms on each of the
text fields to compress and save even more text (and in table desing mode,
you can turn this feature off). Note that compression is not to be confused
with a compact and repair opton that you use on a regular basis to keep the
size the database from bloating too much.
 
D

David W. Fenton

You have some really good information from John Spencer on this
matter. The simple fact is that the field length is not padded
with blanks like the old systems of Foxpro dBase etc. In those old
system records were fixed length and it they did not care much
about space efficiency and simply wasted all that extra space for
no reason at all. (well except technology was not really that
advanced back then)

There was a *very* good reason -- performance. It was much quicker
to parse the record linearly than by using tokens to delimit the
data. On the other hand, it also meant having to read more data, but
apparently the designers of the dBase format felt (at least at the
time they were designing it, i.e., early 80s) that the fixed-length
model was faster to read/write. It certainly would improve write
performance, since a variable length record would require rewriting
all the fields *after* a field that had grown/shrunk.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top