Will we see the start of a pandemic in 2020 from a new virus that will go on to kill tens of millions of people in the next few years? This is the question we’re all asking (more or less explicitly) given the appearance of a new coronavirus in Wuhan.
One way to think about this is to adopt a Bayesian approach to the question. That is, be explicit about your base forecast (your “prior”) and then update that forecast as new information arrives. Nate Silver has a great chapter on Bayesian reasoning in his 2012 book The Signal and the Noise. I’m going to follow his approach in this article. You can also find a really accessible explanation of Bayesian reasoning and a derivation of Bayes’s theorem at the LessWrong blog here and here.
Let’s call “A” the event that 2020 sees the start of a pandemic from a new virus that goes on to kill tens of millions of people over the following few years. We want to estimate the probability of A, given that we are observing a new virus in China today. But before we can do that, Bayesian reasoning requires us to be explicit about our prior: what was our forecast for event A before we found out about the virus in Wuhan?
Before we knew about Wuhan, if someone had asked us on January 1 to estimate the chance of a pandemic starting in 2020 from a new virus, what would we have said? That is, what would we have said the probability of A – “P(A)” – would be? Well, over the last 100 years two bad pandemics have emerged from new viruses: the 1918 Spanish Flu and HIV/AIDS. Both caused tens of millions of deaths fairly quickly after first arising (within two years for the Spanish Flu and within two decades, I think, for HIV/AIDS). I think that’s all. I’m not going to count Smallpox, Measles etc. – all of which have killed hundreds of millions over decades – because I’m talking about new viruses that rise up and very quickly reach a pandemic level. Neither am I going to count Marburg, Ebola, Zika, MERS, SARS, or any of the other new viruses of the last 100 years, because none of them has killed tens of millions of people in a global pandemic. I’m trying to look at the chances of the thing happening that we all fear.
So, given that two viruses in the last 100 years have reached bad pandemic levels, that is roughly two pandemic starts per 100 years, or about a 2% chance in any given year. Let’s set our prior probability – before knowing about Wuhan – at 2%. We’d say P(A) = 2% (“the probability of a bad pandemic arising this year is 2%”). This would be a sensible estimate given our past experience.
Now, of course, we have a new event, “B”, which is “a new virus has appeared”. We now need to update our forecast for “A”. Before Wuhan, we put the probability of our hypothesis (a bad pandemic starting in 2020) at 2%. After Wuhan, our forecast for that hypothesis changes. But how? I’m going to follow Nate Silver’s method – so have a look at his book.
Firstly, we need to look at the chance of a new virus appearing (“B”) if our hypothesis (“A”) is true. We write this “P(B|A)” – the probability of B happening given that A is true. So, what is the chance that a new virus arises if 2020 sees the start of a pandemic from a new virus? Well that chance is 100%. You’re not going to see a pandemic from a new virus unless a new virus has appeared. So P(B|A) = 100%.
Secondly, we now need to ask the question, what is the chance of a new virus appearing somewhere in the world (event “B”), if our hypothesis “A” is false? That is, are there any circumstances in which a new virus can appear this year, but 2020 still not see the start of a pandemic from a new virus? And if so what is that probability?
Well, we know that new viruses arise a lot and don’t go on to become pandemics (I mentioned a few earlier: Marburg, Ebola, Zika, MERS, SARS). So, let’s just limit ourselves to coronaviruses, given that the Wuhan virus is a coronavirus. We have seen two other coronaviruses appear over the last twenty years where a global pandemic killing tens of millions of people wasn’t starting. That is, we have seen coronaviruses arise under circumstances where our hypothesis “A” is false. Therefore, let’s be really aggressive and say the chance of a new virus arising in a year when a global pandemic is not starting is only 33% (I think that’s incredibly aggressive and should be higher, given that the last two coronaviruses haven’t gone on to cause pandemics, but let’s be aggressive and push this chance low). So, P(B|~A) = 33%. That is, the probability of a new virus appearing given that 2020 is not the year a pandemic starts is 33%.
We now use Bayes’s Theorem to update our forecast for our hypothesis “A”. (Again, if you want to see an intuitive derivation of Bayes’s Theorem, have a look here). The formal expression of the theorem is:
New forecast for A = P(B|A) * P(A) / [(P(B|A) * P(A)) + (P(B|~A) * P(~A))]
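Plugging in our assumptions (P(A) = 2%, so P(~A) = 98%; P(B|A) = 100%; P(B|~A) = 33%) gives 0.02 / (0.02 + 0.33 * 0.98) = 0.02 / 0.3434, or about 0.058. So Bayes would say to update our forecast of a global pandemic starting in 2020 from 2% to about 6%.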
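For anyone who wants to check the arithmetic, here is a minimal Python sketch of the update. The function and variable names are my own for illustration (they are not from Nate Silver's book); the three inputs are the assumptions made in this article.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|~A), using Bayes's theorem."""
    numerator = p_b_given_a * prior
    # P(~A) is simply 1 - P(A)
    denominator = numerator + p_b_given_not_a * (1.0 - prior)
    return numerator / denominator


# Assumptions from the article
prior = 0.02            # P(A): two bad pandemics in roughly 100 years
p_b_given_a = 1.00      # P(B|A): a pandemic from a new virus requires a new virus
p_b_given_not_a = 0.33  # P(B|~A): the deliberately low estimate used above

posterior = bayes_update(prior, p_b_given_a, p_b_given_not_a)
print(f"Updated forecast for A: {posterior:.1%}")  # prints "Updated forecast for A: 5.8%"
```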
This is really counter-intuitive, but Bayes is rigorous. When your prior is low, even a new event that seems frightening only moves the needle up a bit. In other words, our forecast for a pandemic in 2020 has roughly tripled from what it was at the start of the year, but it is still only a bit over 1 in 20.
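To see how much the result leans on the 33% assumption, here is a rough sensitivity sketch in Python. Only the 0.33 value comes from this article; the other values of P(B|~A) are hypothetical alternatives I have picked purely for illustration, and the function name is my own.

```python
def bayes_update(prior, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) given P(A), P(B|A) and P(B|~A), using Bayes's theorem."""
    numerator = p_b_given_a * prior
    return numerator / (numerator + p_b_given_not_a * (1.0 - prior))


prior = 0.02        # P(A) from the article
p_b_given_a = 1.00  # P(B|A) from the article

# 0.33 is the article's figure; 0.50, 0.75 and 1.00 are hypothetical alternatives
candidates = [0.33, 0.50, 0.75, 1.00]

results = {}
for p_b_given_not_a in candidates:
    results[p_b_given_not_a] = bayes_update(prior, p_b_given_a, p_b_given_not_a)
    print(f"P(B|~A) = {p_b_given_not_a:.2f} -> updated forecast = {results[p_b_given_not_a]:.1%}")
```

The higher the chance of a new virus appearing in a non-pandemic year, the less the appearance of one tells us. At the extreme of 100%, the forecast does not move from the 2% prior at all.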
Now that’s no reason for complacency. The thing about Bayesian reasoning is that you should continually update your priors as new information comes to hand. As we understand this new virus more we can update for factors like its rate of spread and its Case Fatality Rate. And as we do these updates we have to load in the new higher prior, not the old one of 2%. So the forecast could rapidly increase (“could”, not “will”). But right now in these early days when it is really hard to get a fix on relevant characteristics of the virus, our forecast should still be quite low.
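Here is a small Python sketch of what “loading in the new higher prior” looks like. The first update uses this article's numbers. The second piece of evidence, “C”, and its two likelihoods are entirely made up for illustration and are not estimates for the Wuhan virus. The sketch also assumes that C is independent of B once we know whether A is true.

```python
def bayes_update(prior, p_e_given_a, p_e_given_not_a):
    """Return P(A|E) given P(A), P(E|A) and P(E|~A), using Bayes's theorem."""
    numerator = p_e_given_a * prior
    return numerator / (numerator + p_e_given_not_a * (1.0 - prior))


# First update: the article's numbers for event B (a new virus has appeared)
after_b = bayes_update(prior=0.02, p_e_given_a=1.00, p_e_given_not_a=0.33)

# Second update: a HYPOTHETICAL later piece of evidence "C".
# These two likelihoods are invented placeholders, not real estimates.
p_c_given_a = 0.90
p_c_given_not_a = 0.30

# Correct: yesterday's posterior becomes today's prior
after_b_and_c = bayes_update(after_b, p_c_given_a, p_c_given_not_a)

# Mistake: going back to the original 2% prior throws away what B told us
stale = bayes_update(0.02, p_c_given_a, p_c_given_not_a)

print(f"After B:           {after_b:.1%}")
print(f"After B then C:    {after_b_and_c:.1%}")
print(f"C with stale 2%:   {stale:.1%}")
```

With these invented inputs the forecast climbs noticeably on the second update, which is the “could, not will” point: whether it rises or falls depends on what the new evidence actually turns out to be.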