calculating probabilty of x number of successes with multiple tria

M

mk9999999

how do i calculate the probability of a certain number of successes when
there are multiple trials each with different probabilities e.g.

Trial P (of success)
1 0.2
2 0.4
3 0.9
4 0.6
5 0.5
6 0.4
7 0.7
8 0.3
9 0.1
10 0.5
11 0.6
12 0.7
13 0.9
14 0.1
15 0.2
16 0.2
17 0.2
18 0.5
19 0.6
20 0.7
21 0.9

How would i calculate the probability of 10 successes for example?

I could calculate it manually, but there must be a quicker way of doing it
using excel?

cheers
 
J

Jerry W. Lewis

You have not provided enough information to answer the question. What
follows assumes that the trials are independent.

There are 352716=COMBIN(21,10) different ways to get 10 successes from 21
trials. You could enumerate them and sum the probability of each. For
example, the probability that the first 10 trials are successes and the rest
failures would be 3E-09, as calculated by the array formula
=PRODUCT(B2:B11)*PRODUCT(1-B12:B22), assuming that the stated probabilities
are in B2:B22. Of course this approach would be exceptionally tedious and
time consuming.

If it were my homework assignment, I would review properties of probability
generating functions
http://en.wikipedia.org/wiki/Probability_generating_function
and use 0.05=BINOMDIST(10,21,HARMEAN(B2:B22),FALSE) as a crude ball-park
estimate to ensure that I had done the work correctly

Jerry
 
J

Jerry W. Lewis

Sorry, I meant AVERAGE instead of HARMEAN for the ballpark calculation.

Jerry
 
M

mk9999999

Hi jerry,

thanks for your help - i forgot to mention that the trials are independent
of each other.

your methods are all good but unfortunately i need an exact answer rather
than an estimation. if this is not possible in excel would it be possible to
do in any other porgrams?

cheers
 
J

Jerry W. Lewis

The generating function approach does give the exact answer, but is not
easily implemented in Excel. You could do the generating fuctions manually,
or use a program that can do symbolic math, such as Maple, Mathematica, or
the open source package Maxima http://maxima.sourceforge.net/index.shtml

Jerry
 
J

joeu2004

There are 352716=COMBIN(21,10) different ways to get 10 successes from
21 trials. You could enumerate them and sum the probability of each.
[....]
Of course this approach would be exceptionally tedious and time consuming.

Only took less than 63 milliseconds for a VBA UDF to run through all
combinations. Took less than 2.4 seconds for the UDF to sum the
probability of exactly 10 successes over all combinations. Of course,
YMMV depending on the speed of your system. Took 5-10 minutes to
write the VBA UDF.

To the OP (mk9999999): Is this a programming assignment or an
assignment in statistics or probability?
 
J

Jerry W. Lewis

I agree that modern computer exection time is quite fast, but for me the real
issue would be the time required to program and validate a routine that would
run through all the combinations. That is the beauty of the generating
function approach.

Jerry

joeu2004 said:
There are 352716=COMBIN(21,10) different ways to get 10 successes from
21 trials. You could enumerate them and sum the probability of each.
[....]
Of course this approach would be exceptionally tedious and time consuming.

Only took less than 63 milliseconds for a VBA UDF to run through all
combinations. Took less than 2.4 seconds for the UDF to sum the
probability of exactly 10 successes over all combinations. Of course,
YMMV depending on the speed of your system. Took 5-10 minutes to
write the VBA UDF.

To the OP (mk9999999): Is this a programming assignment or an
assignment in statistics or probability?
 
J

joeu2004

I agree that modern computer exection time is quite fast, but for me the real
issue would be the time required to program and validate a routine that would
run through all the combinations.  That is the beauty of the generating
function approach.

Yes, stick with what you do best.

As for the program development time: as I said, 5-10 minutes.
Validation comes from scaling down to a tractable problem that can be
solved easily by hand. That was the lion's share of the 5+ minutes.
As trivial as the programming seems to be, I like to step through or
add debug prints to be sure that everything is copacetic. As it turns
out, my only error was in my "hand" calculations -- probably a typo on
the calculator.

But I agree that a mathematical solution often provides greater
insight.

My real point was: the OP might consider a programming solution that
does indeed enumerate all combinations exhaustively, if that is
acceptable and easier for him/her. You dismissed the possibility
based on incorrect presumptions ("exceptionally tedious and time-
consuming").
 
J

joeu2004

Took less than 2.4 seconds for the UDF to sum the
probability of exactly 10 successes over all combinations.  Of course,
YMMV depending on the speed of your system.  Took 5-10 minutes to
write the VBA UDF.

Reduced to about 0.9 seconds with a "tedious", but obvious
optimization. Makes the program look less "elegant". Add 2-3 minutes
to the development time ;-).
 
J

Jerry W. Lewis

Yet a 3rd way to skin this cat, using renewal theory instead of either
enumeration or the probability generating function.

Jerry
 
J

Jerry W. Lewis

Since Lori has already done the OP's homework assignment, I am curious how
you implemented an enumeration approach that was so fast and could be
programmed so quickly. My enumeration program required the better part of a
minute to run, and was tedious to program.

Jerry

joeu2004 said:
There are 352716=COMBIN(21,10) different ways to get 10 successes from
21 trials. You could enumerate them and sum the probability of each.
[....]
Of course this approach would be exceptionally tedious and time consuming.

Only took less than 63 milliseconds for a VBA UDF to run through all
combinations. Took less than 2.4 seconds for the UDF to sum the
probability of exactly 10 successes over all combinations. Of course,
YMMV depending on the speed of your system. Took 5-10 minutes to
write the VBA UDF.

To the OP (mk9999999): Is this a programming assignment or an
assignment in statistics or probability?
 
J

joeu2004

Since Lori has already done the OP's homework assignment

I really wish she hadn't. I was avoiding posting details myself. But
now that the cat is out of the bag.....

I am curious how
you implemented an enumeration approach that was so fast and could be
programmed so quickly. My enumeration program required the better part of a
minute to run, and was tedious to program.

Well, I might simply have a much faster system. For example, Lori
states that iandjmsmith's "xtextc3" function takes "a few
milliseconds" on her system. It takes about 70 microseconds(!) on
mine. I have to execute a loop of 10,000 calls to measure time that
small. Otherwise, time measurement is limited by the interrupt
frequency of the system clock -- probably on the order of 1-10
milliseconds.

Another possible factor: declaring all variables, especially loop
control variables. For example, when I strip my function down to
simply count the number of combinations (i.e. remove everything except
"cnt=cnt+1" in the innermost loop), the function takes about 250
milliseconds without declarations, but only about 60 milliseconds with
declarations. That 4X factor is pretty consistent on my system. When
I rewrote my function in its original form (it had morphed as result
of improvements), I inadvertently omitted the declaration for "i",
which is used as a control variable in the innermost loop. Without
the declaration, the function takes about 8.4 seconds on my system,
instead of about 2.2 seconds, which is what I had measured before and
now again, having added the declaration.

The function "prob10" below is the original function that I slapped
together in 5-10 minutes. Yes, it is tedious to repeat the several
similar statements. You will notice that the format is not pretty.
For example, no indentation. I also took the shortcut of using Timer
instead of getTickCount because I did not want to take the time to
find where I recorded how to declare getTickCount. (I better not time
these functions across midnight <wink>.) I should note that I have
written programs of this form many times over the years. So the
structure is very familiar; the skeletal code flows quickly from the
keyboard. The only part I struggle with is the VBAisms for copying
ranges. I am still something of a VBA neophyte.

Clearly, "prob10" is an inefficient implementation, even for
enumerating all possibilities. As I noted, it takes about 2.2 seconds
on my system. If I remove the innermost i-loop (which computes the
"failure" part of the solution), the function takes about 550
milliseconds. (But of course, it produces the wrong answer.)

The function "xprob10" below removes "constant" calculations from the
innermost loop -- that is, calculations whose results are invariant in
more-inner loop. Now, __that__ was a tedious change. But again, my
style of quick-and-dirty programming made it fast to write. Rather
than coding from top to bottom, I code similar statements all at the
same time. Not only does it make it fast to repeat, but also it makes
it easy to validate correctness and catch typos at the outset, so it
is more likley to work the first time. This function takes about 250
milliseconds on my system.

Of course, I am, by no means, holding either function up as either a
stellar example of programming technique or an elegant solution to the
problem. Normally, I would not publish such sloppy work. At a
minimum, I would take the time to indent. But more likely, I would
take more time to try to understand the concepts and program a truly
elegant solution (like "xtestc3"), if I can.

I am merely responding to your request to see how I could throw a
function together so quickly, which albeit not a refined
implementation, is an adequate solution insofar as it would have
allowed the OP to answer his/her question in less time than it took to
post and wait for a helpful, if not dispositive response. At the very
least, it would provide the OP with a way of validating a better,
perhaps more-clever implementation that might be worthy of turning in
with the assignment.

HTH.


' first enumeration implementation.
' takes about 2.2 seconds on my system (YMMV).
' computes probability of exactly 10 successes in 10 or more trials
' with varying independent probabilities of success in "rng"

Function prob10(rng As Range) As Double
Dim p As Variant
Dim startTime As Double, endtime As Double
Dim n As Long, cnt As Long, i As Long
Dim t1 As Long, t2 As Long, t3 As Long, t4 As Long, t5 As Long
Dim t6 As Long, t7 As Long, t8 As Long, t9 As Long, t10 As Long
Dim succ As Double
Debug.Print
Debug.Print "----- prob10 "; Time
startTime = Timer
p = rng.Value
n = UBound(p, 1)
If n < 10 Then Exit Function
For t1 = 1 To n - 9
For t2 = t1 + 1 To n - 8
For t3 = t2 + 1 To n - 7
For t4 = t3 + 1 To n - 6
For t5 = t4 + 1 To n - 5
For t6 = t5 + 1 To n - 4
For t7 = t6 + 1 To n - 3
For t8 = t7 + 1 To n - 2
For t9 = t8 + 1 To n - 1
For t10 = t9 + 1 To n
cnt = cnt + 1
succ = p(t1, 1) * p(t2, 1) * p(t3, 1) * p(t4, 1) * p(t5, 1) * _
p(t6, 1) * p(t7, 1) * p(t8, 1) * p(t9, 1) * p(t10, 1)
For i = 1 To n
If i <> t1 And i <> t2 And i <> t3 And i <> t4 And i <> t5 And i <> t6
And _
i <> t7 And i <> t8 And i <> t9 And i <> t10 Then succ = succ * (1 -
p(i, 1))
Next i
prob10 = prob10 + succ
Next t10: Next t9: Next t8: Next t7: Next t6
Next t5: Next t4: Next t3: Next t2: Next t1
endtime = Timer
Debug.Print cnt & Format(endtime - startTime, " 0.000000 "); prob10
End Function


' second, "improved" enumeration implementation. (I use the term
advisedly.)
' takes about 0.250 seconds on my system (YMMV).
' computes probability of exactly 10 successes in 10 or more trials
' with varying independent probabilities of success in "rng"

Function xprob10(rng As Range) As Double
Dim p() As Double, q() As Double
Dim startTime As Double, endtime As Double
Dim n As Long, cnt As Long, i As Long
Dim t1 As Long, t2 As Long, t3 As Long, t4 As Long, t5 As Long
Dim t6 As Long, t7 As Long, t8 As Long, t9 As Long, t10 As Long
Dim fail1 As Double, fail2 As Double, fail3 As Double, fail4 As Double
Dim fail5 As Double, fail6 As Double, fail7 As Double, fail8 As Double
Dim fail9 As Double, fail10 As Double, fail As Double
Dim succ1 As Double, succ2 As Double, succ3 As Double, succ4 As Double
Dim succ5 As Double, succ6 As Double, succ7 As Double, succ8 As Double
Dim succ9 As Double
Debug.Print
Debug.Print "----- xprob10 "; Time
startTime = Timer
n = rng.Count
If n < 10 Then Exit Function
ReDim p(1 To n), q(1 To n)
For i = 1 To n: p(i) = rng(i): q(i) = 1 - p(i): Next
fail1 = 1
For t1 = 1 To n - 9
succ1 = p(t1): fail2 = fail1
For t2 = t1 + 1 To n - 8
succ2 = succ1 * p(t2): fail3 = fail2
For t3 = t2 + 1 To n - 7
succ3 = succ2 * p(t3): fail4 = fail3
For t4 = t3 + 1 To n - 6
succ4 = succ3 * p(t4): fail5 = fail4
For t5 = t4 + 1 To n - 5
succ5 = succ4 * p(t5): fail6 = fail5
For t6 = t5 + 1 To n - 4
succ6 = succ5 * p(t6): fail7 = fail6
For t7 = t6 + 1 To n - 3
succ7 = succ6 * p(t7): fail8 = fail7
For t8 = t7 + 1 To n - 2
succ8 = succ7 * p(t8): fail9 = fail8
For t9 = t8 + 1 To n - 1
succ9 = succ8 * p(t9): fail10 = fail9
For t10 = t9 + 1 To n
fail = fail10: For i = t10 + 1 To n: fail = fail * q(i): Next
xprob10 = xprob10 + succ9 * p(t10) * fail
cnt = cnt + 1
fail10 = fail10 * q(t10)
Next t10
fail9 = fail9 * q(t9)
Next t9
fail8 = fail8 * q(t8)
Next t8
fail7 = fail7 * q(t7)
Next t7
fail6 = fail6 * q(t6)
Next t6
fail5 = fail5 * q(t5)
Next t5
fail4 = fail4 * q(t4)
Next t4
fail3 = fail3 * q(t3)
Next t3
fail2 = fail2 * q(t2)
Next t2
fail1 = fail1 * q(t1)
Next t1
endtime = Timer
Debug.Print cnt & Format(endtime - startTime, " 0.000000 "); xprob10
End Function
 
J

joeu2004

Function xprob10(rng As Range) As Double
[....]
fail1 = 1
For t1 = 1 To n - 9
succ1 = p(t1): fail2 = fail1
For t2 = t1 + 1 To n - 8
[....]

Not to beat a dead horse and certainly not to promote this approach
more than it deserves (which is simply in the genre of quick-and-dirty
solutions), but it occurred to me that in the OP's original posting,
10 successes in 21 trials was merely an example, and iandjmsmith's
excellent "xtestc3" function provides the generality to solve any K-in-
N problem. So I wondered how much more effort it would take to hack a
general solution along the same (inelegant) lines of generating all
combinations.

The structure of the "xprob10" function lends itself very nicely to a
recursive implementation, "probN" below, which finds the probability
of exactly K successes in any N trials. Initially, I was worried
about the execution time of a recursive solution; I was tempted to
avoid real recursion. But for the 10-in-21 case, although it does
take about twice as long, on my system, that is only about 460
milliseconds -- still not prohibitive. (Again, YMMV.)

The point again is not that this is an example of a good solution, but
that it should not dismisssed out-of-hand as an impracticable
solution, as long as the number of iterations is reasonable.


' hack implementation to compute probability of exactly K successes
' in K or more trials with varying independent probabilities of
success
' in "rng".
' For 10 in 21, takes about 0.460 seconds on my system (YMMV).

Private p() As Double, q() As Double
Private cnt As Long, n As Long
Private prob As Double


Function probN(rng As Range, nsucc As Long) As Double
Dim i As Long
Dim startTime As Double, endTime As Double
startTime = Timer
cnt = 0
prob = 0
n = rng.Count
If n < nsucc Then GoTo endit
ReDim p(1 To n), q(1 To n)
For i = 1 To n: p(i) = rng(i): q(i) = 1 - p(i): Next
If nsucc > 0 Then
Call probNloop(1, n - (nsucc - 1), 1, 1)
ElseIf nsucc = 0 Then
prob = 1
For i = 1 To n: prob = prob * q(i): Next
End If
endit:
probN = prob
endTime = Timer
Debug.Print nsucc & " in " & n & ": " & cnt & Format(endTime -
startTime, " 0.000000 ") & prob
End Function


Private Sub probNloop(ByVal tmin As Long, ByVal tmax As Long, ByVal
succ As Double, ByVal fail As Double)
Dim t As Long
Dim xfail As Double
For t = tmin To tmax
If tmax < n Then
Call probNloop(t + 1, tmax + 1, succ * p(t), fail)
Else
xfail = fail
For i = t + 1 To n: xfail = xfail * q(i): Next
prob = prob + succ * p(t) * xfail
cnt = cnt + 1
End If
fail = fail * q(t)
Next t
End Sub
 
J

Jerry W. Lewis

joeu2004 said:
Not to beat a dead horse and certainly not to promote this approach
more than it deserves (which is simply in the genre of quick-and-dirty
solutions), but it occurred to me that in the OP's original posting,
10 successes in 21 trials was merely an example, and iandjmsmith's
excellent "xtestc3" function provides the generality to solve any K-in-
N problem. So I wondered how much more effort it would take to hack a
general solution along the same (inelegant) lines of generating all
combinations.

I would still argue for the probability generating function as the simplest,
most elegant, and most general solution. In Maxima, only 2 lines of code are
required

p:[2,4,9,6,5,4,7,3,1,5,6,7,9,1,2,2,2,5,6,7,9]/10;
expand(product( 1-p+p*t, i,1,length(p)));

which returns a polynomial in t, where Pr(X=k) is the coefficient of t^k.
This also extends immediately to more complicated probability models as
described in the wikipedia article cited in my original post.

Jerry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top