Data analysis: find average, spikes, etc.

F

Fred Marshall

I have an array of data with separate (labeled) vertical sections of
data that all run together one after another. A Pivot Table will do
things like find the sum for each section, etc.

I call them "sections" because the data has been sorted accordingly. All
the data for one section is contiguous in a column - then the next ..
and so forth.

But, I'm not very good with Pivot Tables and here's what I'd want
conceptually:

1) Find the average of values in one column for each section in that column.

2) Subtract the average from the data in that one column for each section.

3) Detect values of the difference from (2) that are particularly large.

I can count the number of values in each section but I can't figure out
how to use that information usefully. So, I'm looking for ways to do
what's needed.

I don't want to use VB...

Fred
 
J

Jim Cone

With the first column of data in E5:E100...
Enter "=Average(E5:E100)" in E1
Enter " =Stdev(E5:E100)" in E2

Select the column data and use two Conditional Formatting formula similar to...
=E5>($E$1+(2*$E$2)) 'Red font
=E5<($E$1-(2*$E$2)) 'Red interior
'--
Jim Cone
Portland, Oregon USA
http://www.contextures.com/excel-sort-addin.html





"Fred Marshall" <fmarshallx@remove_the_xacm.org>
wrote in message
I have an array of data with separate (labeled) vertical sections of
data that all run together one after another. A Pivot Table will do
things like find the sum for each section, etc.

I call them "sections" because the data has been sorted accordingly. All
the data for one section is contiguous in a column - then the next ..
and so forth.

But, I'm not very good with Pivot Tables and here's what I'd want
conceptually:

1) Find the average of values in one column for each section in that column.

2) Subtract the average from the data in that one column for each section.

3) Detect values of the difference from (2) that are particularly large.

I can count the number of values in each section but I can't figure out
how to use that information usefully. So, I'm looking for ways to do
what's needed.

I don't want to use VB...

Fred
 
F

Fred Marshall

Jim said:
With the first column of data in E5:E100...
Enter "=Average(E5:E100)" in E1
Enter " =Stdev(E5:E100)" in E2

Select the column data and use two Conditional Formatting formula similar to...
=E5>($E$1+(2*$E$2)) 'Red font
=E5<($E$1-(2*$E$2)) 'Red interior
'--
Jim Cone
Portland, Oregon USA
http://www.contextures.com/excel-sort-addin.html
Jim,

I don't think that does it...
The data looks like this

Account Amount Average
1 5 5.25
1 6
1 7
1 3
2 2 2.50
2 3
2 4
2 3
2 1
2 2
etc.

I don't care where the Average values are shown as long as they are
associated with their "Account" or section of the data. In fact, they
could repeat for each row in each section.
Don't need STDEV.

Thanks,

Fred
Ilwaco, WA USA
 
M

Ms-Exl-Learner

Jim,

I don't think that does it...
The data looks like this

Account         Amount  Average
1               5       5.25
1               6      
1               7
1               3
2               2       2.50
2               3      
2               4
2               3
2               1      
2               2
etc.

I don't care where the Average values are shown as long as they are
associated with their "Account" or section of the data.  In fact, they
could repeat for each row in each section.
Don't need STDEV.

Thanks,

Fred
Ilwaco, WA USA


Based on your example data your first row contains the column header
and the Account Number is present in Column A and Amount is in Column
B.

Copy and paste the below formula in C2 cell.

=IF(COUNTIF($A$2:$A2,$A2)=1,SUMPRODUCT(($A$2:$A$11=$A2)*($B$2:$B$11))/
COUNTIF($A$2:$A$11,$A2),"")

Drag the C2 cell formula to the remaining cells of Column C depends
upon the A & B Column Data.

If you want to increase the formula to some more cells then Change the
end cells A11 & B11 to your desired range, if required.

Hope it’s clear!
 
F

Fred Marshall

Ms-Exl-Learner said:
Based on your example data your first row contains the column header
and the Account Number is present in Column A and Amount is in Column
B.

Copy and paste the below formula in C2 cell.

=IF(COUNTIF($A$2:$A2,$A2)=1,SUMPRODUCT(($A$2:$A$11=$A2)*($B$2:$B$11))/
COUNTIF($A$2:$A$11,$A2),"")

Drag the C2 cell formula to the remaining cells of Column C depends
upon the A & B Column Data.

If you want to increase the formula to some more cells then Change the
end cells A11 & B11 to your desired range, if required.

Hope it’s clear!

Thanks for the reply!

I can get it to work on a simple example but not on the actual data.
Part of the problem is that there are probably too many "sections".
But, I think I can handle that OK.

I wonder if you can explain the notation you're using for SUMPRODUCT:

SUMPRODUCT(($A$2:$A$11=$A2)*($B$2:$B$11))

I don't understand the use of x:y=z nor the "*" buried in the middle.
all I see for an argument is
(x:y,r:s) .... notation and here I don't even see the comma.
What are the underlying notation rules you're using?

Thanks,

Fred
 
A

AM

Thanks for the reply!

I can get it to work on a simple example but not on the actual data.
Part of the problem is that there are probably too many "sections".
But, I think I can handle that OK.

I wonder if you can explain the notation you're using for SUMPRODUCT:

SUMPRODUCT(($A$2:$A$11=$A2)*($B$2:$B$11))

I don't understand the use of x:y=z nor the "*" buried in the middle.
all I see for an argument is
(x:y,r:s) .... notation and here I don't even see the comma.
What are the underlying notation rules you're using?

Thanks,

Fred

The dollar signs make it an absolute call of that range A2:A11. The
equals makes me think you snipped out an IF statement or something.
The asterisk is common character for multiplication so it is multiplying
the value result of the first set of parentheses and the result of the
second.

The equal sign does seem strange or incorrect.
 
F

Fred Marshall

AM said:
The dollar signs make it an absolute call of that range A2:A11. The
equals makes me think you snipped out an IF statement or something.
The asterisk is common character for multiplication so it is multiplying
the value result of the first set of parentheses and the result of the
second.

The equal sign does seem strange or incorrect.

I looked it up. It's an array type of notation.....

Fred
 
M

Max

The simpler version to this:
SUMPRODUCT(($A$2:$A$11=$A2)*($B$2:$B$11))

would be:
SUMIF($A$2:$A$11,$A2,$B$2:$B$11)
which is easier to grasp intuitively what's happening
if you are dealing with a single criteria

To handle multiple criteria the power of Sumproduct can of course be
drawn upon, eg:
SUMPRODUCT((Cond1)*(Cond2)*(Cond3),SumRange)
where all range sizes in the conditions and sum range need to be
identical
 
M

Ms-Exl-Learner

Yes, but me too forget to use the sumif function while giving reply to
the OP. In addition to that in sumif we can refer it to the whole
column like this SUMIF(A:A,A2,B:B) and need not to worry about
selecting the ranges and protecting it using $ (Dollar) symbol. But
we can’t use the Sumproduct to whole column upto 2003 and this is
another advantage in using Sumif.
 
F

Fred Marshall

Well, I got the original working OK. Thanks!

It's now just that there are 9500 entries in total. So, I'm having to
apply the formula manually for a limited number of cells in the column
to avoid the maximum number of terms in each cell formula.

Now, it would be something to avoid having to do that!!

Fred
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top