Functional Principles and Design Decisions for PRNGD.
=====================================================

PRNGD has been designed to act as a /dev/urandom replacement. It features
an EGD-compatible socket interface, so that it can be used instead of EGD,
which is a /dev/random replacement.
In the following I want to explain the design properties of PRNGD and the
strong and weak points that result from them.

- PRNGD shall always return random bytes:
  * EGD collects entropy into a pool by calling programs and reading their
    output.
  * Other processes read random bytes from EGD, emptying the pool, and EGD
    refills it by calling more programs. If the random bytes are read faster
    than EGD can refill, EGD will not return random bytes until the pool is
    refilled.
    This makes EGD unusable if you have a large number of processes requiring
    entropy (e.g. inetd started processes like imap/pop daemons).
  * PRNGD uses a _pseudo_ random number generator to generate the random bytes.
    Thus it can never run dry and will always return random bytes.
    On the other hand, the random bytes generated are not truly random
    (actually, those generated by EGD are not truly random either) and there
    is a risk that an attacker who draws large amounts of random bytes from
    the daemon might guess the contents of the random pool and break your
    keys.
  * This potential risk cannot be avoided; it is present by design, as only
    a _pseudo_ random number generator can avoid the problem of running out
    of entropy. /dev/urandom faces the same problem.
    Only a hardware RNG using thermal noise or radioactive decay can generate
    truly random bytes.
    See below on how PRNGD tries to minimize this risk.
- PRNGD should be low on resource usage:
  * EGD is written in Perl and hence allows easy porting, but it requires a
    Perl interpreter to be running. In my experience this eats up
    resources.
  * PRNGD is written in C, using the OpenSSL built-in PRNG. Most activities
    are performed using system or library calls, avoiding the spawning of
    external processes where possible. It will never spawn more than one
    process at a time.
    On a 3-year-old HP-UX box, PRNGD tends to use around 10-20 minutes of CPU
    time per month, depending on the amount of entropy requested (and hence
    the amount of operations to be performed inside the PRNG). The memory
    footprint is around 100K.
- PRNGD should be robust against system malfunctions:
  * EGD sometimes tends to run out of entropy and does not refill.
    I don't speak Perl, so I am not completely sure, but it seems that EGD
    records the "failure" of started gathering processes and does not use
    them any longer. If the system runs out of memory or out of processes, no
    gatherers can be started and all gatherers are disabled.
    (This is just my theory; I don't speak Perl, as stated above, and don't
    have a clue how to debug EGD.)
  * PRNGD will always use the same gatherer processes, regardless of whether
    they fail at any time or not. This way a transient resource shortage is
    simply ignored and PRNGD will continue to work.
  * In case any gatherer fails, PRNGD will "kill -9" it after some time so
    that no processes are left hanging around.
- PRNGD should provide good random bytes:
  * There is an excellent paper by Peter Gutmann:
    http://www.cs.auckland.ac.nz/~pgut001/pubs/random2.pdf
    Read it!
  * PRNGD uses the OpenSSL built-in PRNG, which has the limitations listed in
    Peter Gutmann's paper, so we employ additional measures to avoid them.
  * On startup, PRNGD tries to seed its internal pool as well as possible
    by reading back its saved entropy state and calling all gatherers.
    (The entropy state is saved at shutdown time by retrieving random bytes
    from the PRNG, so that it does not reveal information about the internal
    state bits. It is fed back as if coming from an untrusted source, like
    any other input.)
  * Whenever entropy is requested, PRNGD will first request 1024 bytes from
    the PRNG pool (= one full pool size) to completely mix the pool.
    This way, all bits retrieved are influenced by all bits in the pool.
    An additional "random" number of bytes is then retrieved to move the
    state pointer inside the PRNG to a new "unknown" position. These bytes
    are thrown away.
    Then the number of random bytes requested is retrieved.
    Using these techniques, the random bytes finally returned are always
    from a different position inside the pool and all bits of the pool
    have influenced the result. In between, random bits have been thrown
    away, so that an attacker can never get a continuous stream of bytes
    from the internal PRNG. This should make it harder for an attacker to
    guess the internal state (difficult enough, as you only get back 256 bytes
    = 2048 bits at a time, leaving 8192-2048=6144 bits of the pool unknown).
  * Whenever entropy is added to the pool, the bytes are altered slightly
    by an "obfuscator mask", so that nobody can add their own "chosen
    entropy" using the EGD entropy submit function. (The actual effect is
    probably small.)
  * PRNGD uses the following seeding:
    + Quite often (by default around every 17 seconds), a seed_stat()
      is performed by stat()ing a file or directory like /etc/passwd,
      /tmp, ... which is changed or accessed very frequently. This will
      only give a very small amount of bits every time, but every bit helps :-)
    + Less frequently (by default around every 49 seconds), an external
      gathering process is spawned (similar to what EGD does, but in the case
      of PRNGD the frequency is not related to the retrieval of random bytes).
      The output of the process is mixed into the pool.
    + The exact schedule is not fixed, but it depends on the intervals given
      above (default 17 and 49 seconds). When PRNGD is idle, a seed_stat()
      is performed after the shorter interval (here 17 seconds). The external
      gatherer is started if more than 49 seconds have passed since the
      last gatherer was started. Since 49 is not a multiple of 17, the
      external gatherer is not spawned exactly every 49 seconds, but
      with some uncertainty.
      To further increase this uncertainty, this decision is performed after
      the select() call, which will also be triggered by external processes
      communicating with PRNGD...
    + Whenever the call to a gatherer process is finished, additional bits are
      mixed into the pool by "internal seeding". Internal seeding uses
      cheap system calls to times(), gettimeofday(), getpid(), getrusage()
      where available. Each of these calls will not provide much entropy
      (on an NTP-synchronized host only some microsecond values are
      uncertain, depending on granularity etc.; the resource usage will be
      quite static, etc.), but every bit helps...
