When I started writing children’s books, they were for my own children. Since I never stop singing the praises of science, I wasn’t much concerned about how scientifically literate they would be. But how am I doing outside my own family? I don’t know! That’s where you come it 😁

# Tag: statistics

## When will we have a quantum computer? Never, with that attitude

We are quantum drunks under the lamp post—we are only looking at stuff that we can shine photons on.

In a recently posted paper, M.I. Dyakonov outlines a simplistic argument for why quantum computing is impossible. It’s so far off the mark that it’s hard to believe that he’s even thought about math and physics before. I’ll explain why.

Find a coin. I know. Where, right? I actually had to steal one from my kid’s piggy bank. Flip it. I got heads. Flip it again. Heads. Again. Tails. Again, again, again… HHTHHTTTHHTHHTHHTTHT. Did you get the same thing? No, of course you didn’t. That feels obvious. But why?

Let’s do some math. Wait! Where are you going? Stay. It will be fun. Actually, it probably won’t. I’ll just tell you the answer then. There are about 1 million different combinations of heads and tails in a sequence of 20 coin flips. The chances that we would get the same string of H’s and T’s is 1 in a million. You might as well play the lottery if you feel that lucky. (You’re not that lucky, by the way, don’t waste your money.)

Now imagine 100 coin flips, or maybe a nice round number like 266. With just 266 coin flips, the number of possible sequences of heads and tails is just larger than the number of atoms in the entire universe. Written in plain English the number is 118 quinvigintillion 571 quattuorvigintillion 99 trevigintillion 379 duovigintillion 11 unvigintillion 784 vigintillion 113 novemdecillion 736 octodecillion 688 septendecillion 648 sexdecillion 896 quindecillion 417 quattuordecillion 641 tredecillion 748 duodecillion 464 undecillion 297 decillion 615 nonillion 937 octillion 576 septillion 404 sextillion 566 quintillion 24 quadrillion 103 trillion 44 billion 751 million 294 thousand 464. *Holy fuck*!

So obviously we can’t write them all down. What about if we just tried to count them one-by-one, one each second? We couldn’t do it alone, but what if all people on Earth helped us? Let’s round up and say there are 10 billion of us. That wouldn’t do it. What if each of those 10 billion people had a computer that could count 10 billion sequences per second instead? Still no. OK, let’s say, for the sake of argument, that there were 10 billion other planets like Earth in the Milky Way and we got all 10 billion people on each of the 10 billion planets to count 10 billion sequences per second. What? Still no? Alright, fine. What if there were 10 billion galaxies each with these 10 billion planets? Not yet? Oh, fuck off.

Even if there were 10 billion universes, each of which had 10 billion galaxies, which in turn had 10 billion habitable planets, which happened to have 10 billion people, all of which had 10 billion computers, which count count 10 billion sequences per second, it would still take 100 times the age of all those universes to count the number of possible sequences in just 266 coin flips. Mind. Fucking. Blown.

Why I am telling you all this? The point I want to get across is that humanity’s knack for pattern finding has given us the false impression that life, nature, the universe, or whatever, is simple. It’s not. It’s really fucking complicated. But like a drunk looking for their keys under the lamp post, we only see the simple things because that’s all we can process. The simple things, however, are the exception, not the rule.

Suppose I give you a problem: *simulate the outcome of 266 coin tosses*. Do you think you could solve it? Maybe you are thinking, *well you just told me that I couldn’t even hope to write down all the possibilities—how the hell could I hope to choose from one of them*. Fair. But, then again, you have the coin and 10 minutes to spare. As you solve the problem, you might realize that you are in fact a computer. You took an input, you are performing the steps in an algorithm, and will soon produce an output. You’ve solved the problem.

A problem you definitely could **not** solve is to simulate 266 coin tosses if the outcome of each toss depended on the outcome of the previous tosses in an arbitrary way, as if the coin had a memory. Now you have to keep track of the possibilities, which we just decided was impossible. Well, not impossible, just really really really time consuming. But all the ways that one toss could depend on previous tosses is yet even more difficult to count—in fact, it’s *uncountable*. One situation where it is not difficult is the one most familiar to us—when each coin toss is completely independent of all previous and future tosses. This seems like the only obvious situation because it is the only one we are familiar with. But we are only familiar with it because it is one we know how to solve.

Life’s complicated in general, but not so if we stay on the narrow paths of simplicity. Computers, deep down in their guts, are making sequences that look like those of coin-flips. Computers work by flipping transistors on and off. But your computer will never produce every possible sequence of bits. It stays on the simple path, or crashes. There is nothing innately special about your computer which forces it to do this. We never would have built computers that couldn’t solve problems quickly. So computers only work at solving problems that can we found can be solved because we are at the steering wheel forcing them to the problems which appear effortless.

In quantum computing it is no different. It can be in general very complicated. But we look for problems that are solvable, like flipping quantum coins. We are quantum drunks under the lamp post—we are only looking at stuff that we can shine photons on. A quantum computer will not be an all-powerful device that solves all possible problems by controlling more parameters than there are particles in the universe. It will only solve the problems we design it to solve, because those are the problems that can be solved with limited resources.

We don’t have to track (and “keep under control”) all the possibilities, as Dyakonov suggests, just as your digital computer does not need to track all its possible configurations. So next time someone tells you that quantum computing is complicated because there are so many possibilities involved, remind them that all of nature is complicated—the success of science is finding the patches of simplicity. In quantum computing, we know which path to take. It’s still full of debris and we are smelling flowers and picking the strawberries along the way, so it will take some time—but we’ll get there.

## The power of simulation: birthday paradox

The birthday paradox goes… *in a room of 23 people there is a 50-50 chance that two of them share a birthday*.

OK, so the first step in introducing a paradox is to explain why it is a paradox in the first place. One might think that for each person, there is **1/365** chance of another person having the same birthday as them. Indeed, I can think of only one other person I’ve met that has the same birthday as me—and he is my twin brother! Since I’ve met far more than **23** people, how can this be true?

This reasoning is flawed for several reasons, the first of which is that the question wasn’t asking about if there was another person in the room with a specific birthday—any pair of people (or more!) can share a birthday to increase the chances of the statement being true.

The complete answer gets heavy into the math, but I want to show you how to convince yourself it is true by **simulating** the experiment. Simulation is programming a computer or model to act as if the real thing was happening. Usually, you set this up so that the cost of simulation is much less than doing the actual thing. For example, putting a model airplane wing in a wind tunnel is a simulation. I’ve simulate the birthday paradox in a computer programming language called Python and this post is available in notebook-style here. Indeed, this is much easier than being in a room with 23 people.

Below I will *not* present the code (again, that’s over here), but I will describe how the simulation works and present the results.

## The simulation

Call the number of people we need to ask before we get a repeated birthday *n*. This is what is called a random variable because its value is not known and may change due to conditions we have no control over (like who happens to be in the room).

Now we simulate an experiment realising a value for *n* as follows.

- Pick a random person and ask their birthday.
- Check to see if someone else has given you that answer.
- Repeat step 1 and 2 until a birthday is said twice.
- Count the number of people that were asked and call that
*n*.

Getting to step 4 constitutes a single experiment. The number that comes out may be *n = 2* or *n = 100*. It all depends on who is in the room. So we repeat all the steps many many times and look at how the numbers fall. The more times we repeat, the more data we obtain and the better our understanding of what’s happening.

Here is what it looks like when we run the experiment one million times.

So what do all those numbers mean? Well, let’s look at how many times *n = 2* occurred, for example. In these one million trials, the result 2 occurred 2679 times, which is relatively 0.2679%. Note that this is close to 1/365 ≈ 0.274%, which is expected since the probability that the second person has the same as the first is exactly 1/365. So each number of occurrences divided by one million is approximately the probability that we would see that number in a single experiment.

We can then plot the same data considering the vertical axis the probability of needing to see *n* people before a repeated birthday.

Adding up the value of each of these bars sums to 100%. This is because one of the values must occur when we do an experiment. OK, so now we can just add up these probabilities starting at *n = 2 * and increasing until we get to 50%. Visually, it is the number which splits the coloured area above into two equal parts. That number will be the number of people we need to meet to have a 50-50 shot at getting a repeated birthday. Can you guess what it will be?

Drum roll… **23**! Tada! The birthday paradox simulated and solved by simulation!

## But, wait! There’s more.

What about those leap year babies? In fact, isn’t the assumption that birthdays are equally distributed wrong? If we actually tried this experiment out in real life, would we get **23** or some other number?

Happily, we can test this hypothesis with real data! At least for US births, you can find the data over at fivethirteight’s github page. Here is what the actual distribution looks like.

Perhaps by eye it doesn’t look too uniform. You can clearly see 25 Dec and 31 Dec have massive dips. Much has been written about this and many beautiful visualizations are out there. But, our question is whether this has an effect on the birthday paradox. Perhaps the fact that not many people are born on 25 Dec means it is easy to find a shared birthday on the remaining days, for example. Let’s test this hypothesis by simulating the experiment with the real distribution of birthdays.

To do this, we perform the same 4 steps as above, but randomly sampling answers from the actual distribution of birthdays. The result of another one million experiments is plotted below.

And the answer is the same! The birthday paradox persists with the actual distribution of birthdays.

## Nerd sniping

The above discussion is very good evidence that the birthday paradox is robust to the actual distribution of births. However, it does not constitute a mathematical *proof*. An experiment can only provide *evidence*. So I will end this with a technical question for those mathematical curiosos out there. (What I am about to do is also called Nerd Sniping.)

Here is the broad problem: quantify the above observation. I think there is more than one question here. For example, it should be possible to bound the 50-50 threshold as a function of the deviation from a uniform distribution.

(Cover image credit: Ed g2s, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=303792)