מיכ×ל (מיקי) ×בידן said:
Although the chance of repetition two identical RAND() results
is like winning the Lotto - can one assure it won't happen ?
Not exactly.
KB 828795 claims to describe the algorithm and constants used by RAND()
(before XL2010). The claim is: "more than 10^13 numbers will be generated
before the repetition". I presume that "more than 10^13" is exactly
27,814,431,486,576 (about 2.78E+13), which is 30268*30306*30322.
So I guess one's "assurance" depends on your confidence in the correctness
of KBs in general, or at least KB 828795 in particular.
From my experience with other KBs, I must say that my confidence is less
than 100%. Many of the KBs that I have read about the numerical properties
of Excel algorithms contain technical errors, often material errors.
I can say with impunity that __if__ the algorithm and constants in KB 828795
are correct, each modulo expression for IX, IY and IZ runs through its
respective maximum sequence, not including zero, as long as Excel ensures
that IX, IY and IZ are not zero when the algorithm is seeded. Thus, the
algorithm will indeed produce about 2.78E+13 combinations of IX, IY and IZ.
I cannot say with impunity that when those factors are combined to produce
the pseudo-random number -- presumably IX/30269.0 + IY/30307.0 + IZ/30323.0
modulo 1 -- that produces 2.78E+13 unique floating point values, especially
taking the limits of IEEE 64-bit floating point representation and perhaps
Intel 80-bit floating point computation into account.
I believe that can be proved only by producing and sorting all 2.78E+13
pseudo-random numbers, then comparing them pairwise. I think that would be
computationally prohibitive on a single computer with an Intel CPU -- at
least, mine.
I did attempt to vet the algorithm and constants described in KB 828795 --
that is, to prove or disprove that they are correct. I was not successful.
My method was to generate a pseudo-random number with RAND() -- actually
several -- and try to reverse-engineer the factors IX, IY and IZ assuming
the constants 30269, 30307 and 30323 and the Wichman-Hill algorithm
described in KB 828795. I came very close; close enough to think that KB
828795 might be correct, but different enough to suspect that it might not
be. The difference in binary representations was large enough that I was
unable to explain it based on the numerical properties of alternative
floating point methods.
As an aside, I suspect that the algorithm described in a wiki article --
http://en.wikibooks.org/wiki/Statistics/Numerical_Methods/Numerics_in_Excel
-- is incorrect __if__ KB 828795 is correct about generating "more than
10^13" unique pseudo-random numbers. (Admittedly, that's a big "if".) The
wiki algorithm will not generate all 2.78E+13 combinations of IX, IY and IZ.
Each modulo expression will produce zero; and once that happens, the modulo
expression is stuck at zero. Of course, if/when that happens, an
implementation could self-correct by reseeding the PRNG. But depending on
the seed, the cycle can be very short. So I think the wiki algorithm is
unlikely. Every other description of the Wichman-Hill algorithm is similar
to KB 828795, at least functionally, if not in syntax.
(I hope my comments do not spark a diatribe about the weaknesses of the
Wichman-Hill(1982) algorithm. The weakness are well-known. That is why
XL2010 abandoned it.)
----- original message -----