Subtotals in very large datasets

H

heather

I have a number of very large datasets of household water consumption which I
want to combine so I have an average reading per house per year (eg I have
four spreadsheets per year, for an area (or subset of area, it is a city, and
some are VERY large)). Each row has an identifier, LotPlan, most of which are
unique, but there are some duplicates (where this is so, they refer to the
same parcel, so I can just average or sum the readings with no problem). I
have deleted all zero records.

What I want to do is run subtotals and average the Avg daily water
consumption per LotPlan (some have four readings, some may only have one).
However, Excel will not let me subtotal, as it gives me the end of data
error, which I figure is related to the size of the spreadsheet. What I was
doing was appending the data so I had one worksheet with all the information,
identified by LotPlan, and also by reading period; however, I cannot do this
either, as then it is more than 65K rows. Are there any ways I can get around
these problems in Excel 2003 or 2007? If not, can I do it in Access or SPSS
or other software? I have 4Gig RAM, although it is still very slow! Thanks
for any help.
 
S

smartin

heather said:
I have a number of very large datasets of household water consumption which I
want to combine so I have an average reading per house per year (eg I have
four spreadsheets per year, for an area (or subset of area, it is a city, and
some are VERY large)). Each row has an identifier, LotPlan, most of which are
unique, but there are some duplicates (where this is so, they refer to the
same parcel, so I can just average or sum the readings with no problem). I
have deleted all zero records.

What I want to do is run subtotals and average the Avg daily water
consumption per LotPlan (some have four readings, some may only have one).
However, Excel will not let me subtotal, as it gives me the end of data
error, which I figure is related to the size of the spreadsheet. What I was
doing was appending the data so I had one worksheet with all the information,
identified by LotPlan, and also by reading period; however, I cannot do this
either, as then it is more than 65K rows. Are there any ways I can get around
these problems in Excel 2003 or 2007? If not, can I do it in Access or SPSS
or other software? I have 4Gig RAM, although it is still very slow! Thanks
for any help.

A few thoughts...

In E2007 AFAIK you will not have this problem unless you have up around
1M rows. But, I do not have E2007 so cannot say for sure.

In E2003 you might be able to get subtotals adequately by using a Pivot
Table. Oh, maybe not since the consolidated data exceeds 64K rows.

Were it me hitting the limit, as it were, it would be time to port the
data to something more capable. Access is a logical choice. With the
consolidated data in one table Access can easily aggregate totals and
averages.
 
H

heather

Thanks, after posting, I did as you said - ported it to Access, and it is
working fine! Still need to figure out some things re averaging in Access,
but shouldn't be an issue. (I couldn't do a pivot table in 2003 or 2007
because of the row limitation...)
thanks again
 
A

Ashish Mathur

If ou are using Excel 2003, one option you have is to take the data to
Access and then create a pivot in Excel where the data source will be the
Access database.

--
Regards,

Ashish Mathur
Microsoft Excel MVP
www.ashishmathur.com
 
S

smartin

You might be there already, but in Access it is as simple as creating a
query on the table, click the summation symbol (fancy 'E' [Sigma] icon),
Group By the dimensions (LotPlan, 'reading period'), and Average the
value 'water consumption'.
 
S

Shane Devenshire

Hi,

A few points:

1. If the data source is greater than 65,535 in 2003 it is best to put the
data into Access. (Because the Excel 2003 solution for handling more than
65,536 rows in the source is pretty tricky.)
2. If the data source is greater than 1,048,575 in 2007 (1 row for a
header) then again Access is the prefered solution.
3. If the resulting Pivot Table would have more than 65,536 rows then you
can still create the pivot table in 2003, but some of it won't be viewable.
Of course if you had a page filter then you could hide or display blocks of
data, so the fact that you could see it all at one time would not be critical.
4. Similar pont for Excel 2007 pivot tables as #3 but with the greater
number of displayable rows.
5. In Access choose the Query object and click New, pick your table from
the Add Tables (Show Tables) dialog box. Add the fields you want to see by
double-clicking each one. Click the Total button and leave Group By on the
Total row under LotPlan, open the drop down on the Total row under the column
you want to average and choose Avg. Don't include unnecessary fields, Access
will group on them also, although there are workarounds. Note that Access
will only display the averages, not the detail and averages; if you want both
you should consider a Report.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top