"Drop the lowest" in computing average

  • Thread starter Matthew Leingang
  • Start date
M

Matthew Leingang

Hello,

First, let me say that I have a solution to this problem but I am looking
for a better one. Second, I apologize if this gets a little long.

I keep scores for homework assignments in a spreadsheet, recording not the
percentage but the numerator (points achieved) and the denominator (points
possible). A student's homework score is computed by summing all the
numerators and dividing by the sum of all the denominators. This way a
perfect score on a 50-point problem set improves your homework score more
than a perfect score on a 10-point problem set would.

But if only it were that easy! We have a policy of "dropping" the n lowest
problem sets, where n is usually 2 or 3. This is to give the students some
slack. My colleagues and I have tried various interpretations of
"lowest"--lowest by percentage, lowest by z-score relative to the rest of
the class; and other hacks. I've finally decided that the "lowest" score is
the one that improves your total score the most if you neglect it. That is,
for each problem set, sum the numerators skipping this one, sum the
denominators skipping this one, and compare that to the original homework
score. I'm trying to find the best way to do this in Excel.

Let's suppose the numerators are in A1:Z1 and the denominators are in A2:Z2.
Then the change in homework score by dropping the problem set in column G
(say) is (SUM(A1:Z1) - G1)/(SUM(A2:Z2) - G2). I put these changes in
AA1:AZ1. Then in BA1 I have IF(RANK(AA1,$AA1:$AZ1)<=3,0,1), and so on down
to BZ1. I'm sure I don't have to put these in their own cells, but it helps
with debugging and conditionally formatting the original scores.

The adjusted numerator is then SUMPRODUCT(A1:Z1,BA1:BZ1), and the adjusted
denominator is SUMPRODUCT(A2:Z2,BA1:BZ1). Then I have to hide 52 columns of
intermediate values. :)

It seems like there ought to be a one-cell formula to compute the adjusted
numerator. I'm not that good with array functions, though. I tried
something like

SUMPRODUCT(A1:Z1,IF(RANK((SUM(A1:Z1)-A1:Z1)/(SUM(A2:Z2)-A2:Z2),
(SUM(A1:Z1)-A1:Z1)/(SUM(AZ:Z2)-A2:Z2))<=3,0,1)

But this gives a non-descriptive error (not to mention the fact that the
first and second arguments are identical!).

Additional complications:

* I need to break ties so exactly three are dropped. I do this by
adding to the change cell (SUM(A1:Z1) - G1)/(SUM(A2:Z2) - G2) a small number
times the column number. That's klunky but it works.

* I sometimes need to override policy and force a problem set to be
dropped. Currently my IF(RANK(AA1,$AA1:$AZ1)<=3,0,1) formula also
short-circuits to 0 if AA1 has the string "DROP" in it. Can I instead look
for a comment attached to the cell?

Thanks for any answers, advice, or clues.

--Matthew Leingang
 
J

JE McGimpsey

I'm not sure I follow. Do you really mean that you drop the scores that
achieve the most improvement? That seems counterintuitive to me.

To simplify, take an example where there are four scores, and you drop
one.

A B C D
1) 9 7 37 9
2) 10 10 50 10

So before dropping a score, the average is SUM(A1:D1)/SUM(A2:D2) = 77.5%

Dropping the lowest score, in column B (7/10 = 70%) results in (55)/(70)
= 78.57%

Dropping the score in column C (37/50 = 74%) results in (25)/(30) =
83.33%

So you really want to drop a higher % score (74% vs 70%) on a more
important assignment (50 vs 10)???

That seems bizarre!
 
N

N Harkawat

Matthew

Assuming that you want to drop the 3 smallest % scores that are most
detrimental to the overall average I came up with the following formula.
based on this example below in the range A1:A8

Numerator Denominator
10 10
15 20
800 1100
30 30
40 45
50 60


=SUM(IF((A2:A7/B2:B7)>MAX(SMALL((A2:A7/B2:B7),{1,2,3})),A2:A7))/SUM(IF((A2:A7/B2:B7)>MAX(SMALL((A2:A7/B2:B7),{1,2,3})),B2:B7))

Array entered (ctrl+shift+enter)
gives me a score of 94.11%

In the above example the scores 2nd,3rd and 6th were dropped since they
were they were the smallest 3 % of the total

You can adopt it to fit your range accordingly.

Hope it helps...
 
M

Matthew Leingang

Dear JE,

Thanks for your response. Yes, you've got it right. If we're married to
the idea that homework scores should be computed as total achieved/total
possible and not average(achieved/possible) for each problem set, then I
think this is the fairest way to interpret "dropping the lowest", at least
from the POV of the student.

In practice, there are about 35 problem sets which are mostly the same size
although there are some larger and some smaller. Most of the time the
scores dropped are zeroes. So the situation you describe -- I'm impressed
by your example-choosing skills, btw -- would be rare for me.

Oh, and this component amounts to 10-15% of a student's course grade, and we
grade on a curve, so I'm not sure how much a difference in course grades
this policy actually makes. But the students appreciate the policy because
it's (somewhat) forgiving.

--Matt
 
M

Matthew Leingang

Hi,

I shouldn't have said "average" in the subject line, I guess, because your
formula drops the lowest three average scores. As I explained in the last
post, I want to drop the scores which improve the total achieved/total
possible after dropping.

But thanks for the formula. I might be able to play with SUM and IF and
adapt it to my policy.

--Matt


Matthew

Assuming that you want to drop the 3 smallest % scores that are most
detrimental to the overall average I came up with the following formula.
based on this example below in the range A1:A8

Numerator Denominator
10 10
15 20
800 1100
30 30
40 45
50 60


=SUM(IF((A2:A7/B2:B7)>MAX(SMALL((A2:A7/B2:B7),{1,2,3})),A2:A7))/SUM(IF((A2:A7/
B2:B7)>MAX(SMALL((A2:A7/B2:B7),{1,2,3})),B2:B7))

Array entered (ctrl+shift+enter)
gives me a score of 94.11%

In the above example the scores 2nd,3rd and 6th were dropped since they
were they were the smallest 3 % of the total

You can adopt it to fit your range accordingly.

Hope it helps...
 
J

JE McGimpsey

Matthew Leingang said:
Thanks for your response. Yes, you've got it right. If we're married to
the idea that homework scores should be computed as total achieved/total
possible and not average(achieved/possible) for each problem set, then I
think this is the fairest way to interpret "dropping the lowest", at least
from the POV of the student.

I'll take your word for it, especially since you said that most of the
weighting is the same. Still, if you drop a larger chunk, it's not
really giving everybody the same total possible, so you potentially end
up grading some students more harshly for attempting more.

However, having taught at several levels, there's very likely no
completely "fair" way to grade, anyway.
 
B

bj

if your data is in columns A and B
in column C1 enter 1 and copy down to the bottom of the column
in d1 enter
=if(C1=0,0,(sum(A:A)-A1)/(Sum(B:B)-B1))
copy down to the end of your data.
find the max in D
enter in C next to the max 0
refind the max in D
enter in C next to the new max 0
refind the max in D
enter in C next to the new max 0
The three rows with 0s in C are the three you should eliminate if you want
to eliminate three.
The score was the third max in D.

I think that you can eliminate them in sequence and have the right answer.
I would check it against the method you already have.
The value that was
 
B

bj

I screwed up, I simplified the equation I had tried and simplified it to much

Please change the equation in D1 to
=if(C1=0,0,(sumproduct(($A$1:$A$100),$C$1:$C$100)-A1),/(Sumproduct($B$1:$B$100,$C$1:$C$100)-B1))
 
M

Myrna Larson

I'm sure there is NO worksheet formula that will achieve the OP's stated
goal.

I set up a scenario with a total of 10 exams and 3 to be dropped. The number
of questions on each exam varied between 10 and 50, the score (% correct)
varied between 30% and 100%.

I tried the following 6 methods

a. Discard those with the smallest (absolute) number of correct answers,
regardless of total number of questions, i.e. an exam with 7 / 10 correct
would be discarded, an exam with 8 / 50 would be kept. (Before you scoff,
see below).

b. Discard those with the largest number of wrong answers (again, absolute
number, not %), regardless of total number of questions. With that approach,
7/10 has 3 wrong and would be kept, 8/50 has 42 wrong and would be
discarded.

c. Discard those with the lowest scores in percent 8/50 (16%) discarded,
7/10 (70%) kept.

d. For each exam, calculate number correct - number wrong, discard the
lowest 3. 7-3=4, keep; 8-42=-34, discard.

e. For each exam, calculate the change in score (using the method the OP
describes) if this exam is dropped. Discard those that produced the biggest
(positive) improvement in score. For an "uncomplicated" method, this seems
to me to be the most likely to be correct.

f. Brute force: List the 120 combinations of 7 exams selected out of the 10,
and for each of them, calculate the score based on those 7 exams only. For
this, I used the (rather long) VBA macro that I developed several years ago
to list combinations.

Conclusions:

1. Without question, the 6th method, "f", does what the OP says he wants,
and is therefore gives the "correct" answer.

2. Comparing the results, typically one or more of the first 5 methods also
gave the correct answer. If there was only one, most often it was (b): i.e.
discard exams with the greatest number of wrong answers. But sometimes it
was (e).

3. I found instances in which NONE of the first 5 methods were correct. In
this case, the difference was typically 1.5% or less.

4. Given points 2 and 3, it would seem that one must use method (f).

5. Method (f) can't be accomplished with a worksheet formula using only
built-in functions: there is no formula that tabulates combinations (there's
one that gives you the number of combinations, but not their make-up). A VBA
macro would be needed.

6. IMO, this isn't worth the effort involved, if it would even be possible
for the OP to do. It would be horribly slow to calculate for a large class
and/or a large number of exams unless you coded it in, say VB or C and
created an DLL add-in

All of that said, I'm not sure the logic here is correct: In calculating the
score as total number correct divided by total questions, as though it was
just one large exam, one seems to be saying that the way the questions are
organized into exams is not relevant. But in making the discards, you are
discarding an entire exam -- right and wrong answers -- so the grouping IS
influencing which questions are discarded and which are kept.

If the grouping of questions into exams is truly irrelevant, maybe a better
approach would be to discard, say, 10% of the answers. Remove wrong answers
first, then enough correct answers to make the required 10%. Anybody who
started out with a score of 90% or better would end up with 100.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top