T. Valko said:
You made some good points for consideration
Thanks.
but I think you may have drifted from the question posed:
Granted, "CPU utilization" may not be the true consideration,
but I think they meant overall efficiency.
Whether you call it "CPU utilization" or "overall efficiency", my point
remains the same. It might have gotten lost in the "noise" (TMI).
I understood the OP's question as it was posed, and I understand that SUMIFS
might be more efficient than SUMPRODUCTS, at least under some/most
circumstances.
I do not believe I "drifted" from those facts. In fact, I reiterated them.
("Yes, in some contexts, SUMIF is more efficient than SUMPRODUCT".)
But my point is still the same: the difference is usually "mouse nuts" in
the grand scheme of things. Ergo, users are usually misdirected when they
focus on such miniscule details.
Arguably, since I believe the differences are probably miniscule, it really
should not matter if the OP chooses SUMIFS because he has deluded himself
into believing it will make a significant difference performance-wise.
But I'm concerned with the principle of the matter.
Certainly, there are instances where the choice of function does make a
difference. But those are functions that have global impact; for example,
volatile functions. That's what I mean by paying "reasonable attention to
good design".
In Excel 2007 SUMIFS is "significantly" more efficient than SUMPRODUCT.
[....]
Range = 18,000 rows
[....]
.....................SUMPRODUCT vs SUMIFS
[....]
4 conditions........0.077808...........0.014776
How complex were the formulas that you had in the referenced cells? That
is, how significant was the "extraneous" computations compared to the SUMIF
and SUMPRODUCT evaluation themselves?
Perhaps the following will illustrate the concepts I am trying to explain.
I cannot offer a counter-example using SUMIFS, since I don't have Excel
2007. Perhaps you can follow my lead (below) and post an update.
In B1:B18000, I have the constants 1 through 18000. In C1:F18000, I have
the formula =CHOOSE(RANDBETWEEN(1,4),"w","x","y","z"). Of course, since
RANDBETWEEN() is volatile, this will ensure that all 72,000 cells are
re-evaluated when they are calculated.
I compare the performance of each of the following solutions separately:
A1: =SUMPRODUCT(--(C1:C18000="w"),B1:B18000) +
SUMPRODUCT(--(D1
18000="x"),B1:B18000) +
SUMPRODUCT(--(E1:E18000="y"),B1:B18000) +
SUMPRODUCT(--(F1:F18000="z"),B1:B18000)
A2: =SUMIF(C1:C18000,"w",B1:B18000) +
SUMIF(D1
18000,"x",B1:B18000) +
SUMIF(E1:E18000,"y",B1:B18000) +
SUMIF(F1:F18000,"z",B1:B18000)
A3: =SUMPRODUCT(--(C1:C18000="w"),B1:B18000)
A4: =SUMIF(C1:C18000,"w",B1:B18000)
A1 and A2 are not the same logic as SUMIFS, of course. And I am making 4
function calls per execution instead of 1, incurring more overhead.
But then again, that's the point: to demonstrate the effect of overhead v.
individual function execution.
When I measure the time to calculate A3 and A4 10 times without
recalculating C1:F18000, I get the following results:
A3 (SUMPROD): total 0.294 sec, avg 26.278 msec
A4 (SUMIF): total 0.073 sec, avg 6.656 msec
So the SUMIF solution seems to be 3.9 to 4.0 times faster (on my computer).
(Note: "total" does not equal to 10 times "avg" because "total" includes
macro overhead as well, e.g. looping logic, timer function calls, timer
maintenance, and VB overhead.)
When I measure the time to calculate A1 and A2 without recalculating
C1:F18000, I get the following results:
A1 (SUMPROD): total 0.747 sec, avg 61.958 msec
A2 (SUMIF): total 0.285 sec, avg 25.677 msec
So the SUMIF solution seems to be 2.4 to 2.6 times faster, but the
difference is about 37% less.
When I measure A1 and A2 with recalculation of C1:F18000 each time, I get
the following results:
A1 (SUMPROD): total 5.030 sec, avg 442.844 msec
A2 (SUMIF): total 4.576 sec, avg 405.718 msec
So the SUMIF solution seems to be only about 1.1 times faster.
As you can see, as the overhead computation increases, the difference
between the SUMIF and SUMPRODUCTS diminishes to the point where there is
almost no difference.
To reiterate my point: whether or not to focus on the performance
difference between SUMIFS and SUMPRODUCT depends on the relative
contribution that those calculations make to the whole set of calcuations.
If the user presumes that their contribution is large, we can see that even
though SUMIFS might be significantly faster, the total time difference is
not likely to be large (less 1 sec in my case).
On the other hand, if their contribution is small, we can see that the
difference between SUMIFS and SUMPRODUCT is diminished significantly by the
other calculations.
In either case, it does not seem to make sense to worry about the
performance difference between SUMIFS and SUMPRODUCT in particular.
One disadvantage of SUMIFS is that it's limited to
"straight comparisons" (just like SUMIF).
Of course, it does not make sense to ask about the relative performance of
SUMIFS v. SUMPRODUCT if SUMIFS cannot do the job. In that case, SUMPRODUCTS
is "infinitely" more efficient ;-).
So, since the OP is considering SUMPRODUCT v. SUMIFS, it is reasonable to
assume that both functions can be used. Ergo, the SUMPRODUCT expression
uses equivalently simple "straight comparisons".
An aside, for my edification....
There is no difference in efficiency between these formulas:
=SUMIFS(E:E,A:A,"w",B:B,"x",C:C,"y",D
,"z")
=SUMIFS(E1:E18000,A1:A18000,"w",B1:B18000,"x",C1:C18000,"y",D1
18000,"z")
That could make a significant difference between SUMPRODUCT v. SUMIFS with
sparse tables; more correctly, tables of potentially long columns, but with
significantly fewer rows filled in at the top.
Arguably, that is something for the OP to consider, if it is applicable.
But I'm curious about the "no difference" assertion.
I presume that is based on the assumption that some range property (I forget
which one) demarcates the last-used cell in a column. Right?
Is that demarcation always correct?
In Excel 2003, I notice that when I put a value or formula in some distance
cell (e.g. AI65000), then delete it, ctrl+End still goes to that distance
cell.
Does that also screw up the last-used cell demarcation in that column? For
example, would that cause the range AI:AI to be processed as AI1:AI65000,
for example? In Excel 2007 as well as Excel 2003?
----- original message -----
T. Valko said:
You made some good points for consideration but I think you may have
drifted from the question posed:
So my question is: Is it more efficient (CPU utilization)
to use SUMIFS or SUMPRODUCT?
Granted, "CPU utilization" may not be the true consideration, but I think
they meant overall efficiency.
In Excel 2007 SUMIFS is "significantly" more efficient than SUMPRODUCT.
Range = 18,000 rows
Average calc time (5 calcs)
.....................SUMPRODUCT vs SUMIFS
1 condition..........0.025024...........0.009200
2 conditions........0.042650...........0.012822
3 conditions........0.060202...........0.014202
4 conditions........0.077808...........0.014776
Another advantage is that SUMIFS only calculates based on the used range.
There is no difference in efficiency between these formulas:
=SUMIFS(E:E,A:A,"w",B:B,"x",C:C,"y",D
,"z")
=SUMIFS(E1:E18000,A1:A18000,"w",B1:B18000,"x",C1:C18000,"y",D1
18000,"z")
One disadvantage of SUMIFS is that it's limited to "straight comparisons"
(just like SUMIF).
--
Biff
Microsoft Excel MVP
JoeU2004 said:
Clarification....
I'm building an Excel model that will summarize data
from a table (18,000 rows x 65 columns)
[....]
for calculations that will take several seconds, not
to mention many minutes in your case
I'm sorry: I don't know how long your calculations will take. I don't
know the complexity of and how many SUMIFS or SUMPRODUCT formulas you are
talking about.
On the other hand, if your calculations do not take "several seconds, not
to mention many minutes", it is even more questionable whether you should
focus on the CPU utilization of SUMIFS v. SUMPRODUCT.
----- original message -----
JoeU2004 said:
I'm building an Excel model that will summarize data from
a table (18,000 rows x 65 columns), and I want to minimize
the number of calculations / CPU utilization.
So my question is: Is it more efficient (CPU utilization)
to use SUMIFS or SUMPRODUCT?
In my experience, it is a misdirection to focus on the CPU efficiency of
one particular operation, unless you are designing interrupt code. (And
even then, it depends.)
Generally, as long as you pay reasonable attention to good design, you
should choose the function to use based on which one makes more sense in
the context (readability, ease of maintenance, etc). If it's a toss-up
based on those factors, it is probably a toss-up with respect to
efficiency overall.
Especially for calculations that will take several seconds, not to
mention many minutes in your case, the total computation time is
influenced more by extraneous factors, namely: other workbook
calculations and external system processing such as interrupts.
I do not have Excel 2007, so I cannot compare SUMIFS to SUMPRODUCT. The
following is a comparison of SUMIF and SUMPRODUCT in Excel 2003.
Yes, in some contexts, SUMIF is more efficient than SUMPRODUCT. This is
based on a macro that I used to compare the calculation time for
SUMIF(B1:B10000,">=0",C1:C10000) and
SUMPRODUCT(--(B1:B10000>=0),C1:C10000).
But the important thing to notice is: the total run time of the macro
was about the same in both cases. This demonstrates the general
principle stated above: extraneous factors often outweigh the
differences between small parts of the entire operation.
Moreover, the difference between SUMIF and SUMPRODUCT depends on the
complexity of the formulas in the referenced cells. For example, when
the referenced cells contained constants, SUMPRODUCT took 5 times longer
than SUMIF. But when the referenced cells contained =RAND(), SUMPRODUCT
took 3.5 times longer. This, again, demonstrates the general principle:
as the complexity of the calculations in the referenced cells increases,
the differences between SUMIF and SUMPRODUCT decreases.
(In all cases, on my computer, we are talking about 2 to 13 milliseonds
per formula calculation. But the actual times and even the relative
differences will vary greatly from system to system.)
Finally, it should be noted that these calculation comparisons were done
in a pristine environment where the only calculations involved the SUMIF
or SUMPRODUCT formula and the referenced cells. In real life, if your
workbook is complicated enough to have many tens of thousands SUMIFS or
SUMPRODUCT formulas, it probably also has many thousands of other
formulas that may or may not be recalculated. Again, that overhead may
outweigh the difference between the SUMIFS and SUMPRODUCT efficiency.
One last comment, for which I have no empirical data. I suspect that
the performance difference between SUMIFS and SUMPRODUCT is less than
the difference between SUMIF and SUMPRODUCT. The basis for that is:
generally, more complex features cost more in over-all computational
time, even when used in an equivalent manner (one criterion argument, in
this case). For example, whereas the internal implementation of SUMIF
can be optimized for the one criterion, SUMIFS must be able to handle a
variable number of arguments, just as SUMPRODUCT does.
(Of course, SUMPRODUCT will always have the additional overhead of
adding products. But as that is "mouse nuts" compared to the other
overhead.)
I'm sorry if this is not the simple canned response ("this one" or "that
one") you are looking for. But the fact is: performance choices are
not simple to make. My point is simply: don't sweat the small stuff.
Undoubtedly, that "microfocus" will cause you to lose sight of the "big
picture".