Tamper-Proof Public Random Numbers

There are a number of reasons society benefits from the presence of a stream of random numbers (or entropy for generating them) that can’t be entirely influenced by any single party (and that the inability to influence can be mathematically proven). When you pull the lever (or nowadays, push the button) on a slot machine, you are trusting the casino to use actual random numbers (rather than choosing particular numbers to make winning less likely when betting more, or similar). When you buy a lottery ticket, you are trusting the operator that the numbers chosen aren’t chosen to ensure there isn’t a winner, or that the winner is a particular person. And when you sign a message with ECDSA (which is used in Bitcoin), you are trusting that the curve you’re using (secp256k1) didn’t have its parameters chosen in such a way that signatures can be forged. In summary, tamper-proof public random numbers give us provably-fair gambling and backdoor-free cryptography.

How can we actually use blockchains to create random numbers that nobody influences, though?

Amazingly, blockchains are so good at providing tamper-proof public random entropy, that they could reliably continue to do so (with some degradation to the quality of the randomness) even in the event that 100% of the hashpower is owned by a single malicious party, as long as the underlying hash function used by the blockchain hasn’t been compromised!

Note: the “random numbers” or “random entropy” that can be provided by a blockchain aren’t truly random (‘true’ randomness is provided by observing things like cosmic background radiation or radioactive decay): a PoW blockchain acts as a distributed pseudorandom number generator (PRNG), which generates, when used correctly, well-distributed apparently-random data. The power isn’t in this data being technically random, but that the data is created in such a way that a malicious party would be unable to significantly influence it.

Interestingly enough, while PoW miners do create the hashes of blocks, they actually don’t control the data in the hash to any significant extent. As soon as they find a hash that falls below the target, they publish the block, without concern for the remaining bits in the hash. If a miner wanted to influence a single extra bit in the hash (say, ensure that the last bit is a zero), they would be at a 50% disadvantage to all other miners, because they would have to, on average, throw away every other solution which was still valid. In fact, even if a single miner had 100% of the hash power, they are still not in significant control of the random numbers provided by the block hashes, because it’s computationally infeasible to (short of finding a vulnerability in the underlying hash function) find a 256-bit collision (it would, on average, require 57,896,044,618,658,097,711,785,492,504,343,953,926,634,992,332,820,282,019,728,792,003,956,564,819,968, or 5.78 x 1078 hashes). All they could reasonably do is stop mining so that they stop providing the random numbers, but they wouldn’t be able to produce random numbers that they fully control.

In traditional PoW (the type used by Bitcoin), there are six fields in the header of a block: version, previous block hash, merkle root hash, timestamp, nBits (scientific-notation-like difficulty), and nonce. The miner can influence some of these fields–they can choose the version, they can choose any timestamp within a particular range, and they choose the nonce. While traditionally the nonce is incremented during the mining process, a miner could conceivably choose a single nonce or small range of allowed nonces, and constantly change the timestamp and merkle root hash (by changing the coinbase transaction) in an attempt to find a block. In fact, the nonce only allows for 232 possibilities when mining (around 4GH), and the granularity of the timestamp is a second, so miners that mine >~4GH/s (nowadays, almost all of them) actually do need to attempt multiple merkle roots every second. The previous block hash isn’t useful in this discussion (since it’s already accounted for by the last block). The merkle root hash is also random (in a similar way to the block hash itself). A miner could influence some, but not all, of the bits in the merkle root.

The highest quality of random number generation involves taking a bunch of blocks (which, during normal network operation, are likely from different sources), and using their block hashes and merkle roots to create a seed which is used to seed a PRNG. PRNGs are state machines designed such that, given an initial configuration, can continually produce well-distributed ‘random’ numbers in a deterministic manner. Two identical PRNGs (the same algorithm and seed) will produce identical ‘random’ numbers. Of course, which blocks will be used and how exactly they will be used (what PRNG will be used, how the data from the blocks will be combined to create a seed, etc.) needs to be announced before any of the blocks are created, because another way to influence apparently-uninfluenced numbers is to choose how to treat the source of randomness after the randomness is already known. 

It’s important to note that, in the event of a single entity controlling all of the hashrate of a blockchain, they could conceivably influence the PRNG to a small degree–it wouldn’t be infeasable for them to, for example, ensure that the first number out of the PRNG was even (it would require twice the work, on average, of mining the block normally). But ensuring that the first 32-bit number out of the PRNG was exactly 0x82193FE7 would mean that only, on average, 1 in 232 valid blocks the malicious miner found could be used. If, however, even a tiny portion of the hashrate was controlled by a miner who wasn’t colluding with the malicious miner, the significant disadvantage of the attacker would mean that the attacker would need 232 times more hashing power than the sum of the non-malicious hashing power to have a 50% chance of successfully producing the intended affect on the PRNG, which would consume more than 4 billion times the electrical power which everyone expects the miner to consume–something someone would surely notice as the oceans began to boil and life on earth came to a screeching halt.

The random numbers produced by a blockchain aren’t great for every purpose, though. Some random numbers (such as those used as private keys) should, of course, not be public information. You could still use entropy from the blockchain to generate random numbers that need to stay private, but you would need to keep the details of which blocks were used and what PRNG you used private. You’re far better off simply wiggling your mouse around and typing on your keyboard to give your computer additional entropy to work with. Remember the flow-chart on the front page: there isn’t a trust issue in generating private keys (unless you don’t trust your machine, which still isn’t solved by using an external source of entropy). And even if you did signatures by hand to prevent a computer from ever seeing your private keys, you’d still be better off creating random numbers with a coin or a die.

The bottom line: Under normal operation, blockchains provide a source of public randomness which no single party can influence to any significant degree. Even when the majority or entirety of the hashrate is under the control of a malicious party, the entropy provided by the blockchain is still reasonably reliable for purposes where it would still be infeasible (given the laws of thermodynamics) for the malicious miner to fully influence the PRNG, as the entropy is still not entirely chosen by the miner.