Did CoVID leak from a lab? A Bayesian approach

Nick Ingram · Strategy · 4 Comments

Wuhan Institute of Virology

I’ve resisted the temptation to write about this, but I need to get something out. None of the pushback against the lab-leak theory acknowledges how unusual it is for a new virus to appear in the only city in China with a BSL-4 lab. Of all the large cities in China (all with wet markets, I assume), how weird is it that the virus first appears in the one big city with that lab?

How weird? Well, we can try to quantify it using Bayes’ theorem.

I blogged about Bayes’ theorem last year here. Have a read about the logic of it there and also look at the other pages I link to for its derivations.

Now, let’s try a thought experiment. We are at the start of 2020. You tell me: “There’s a new coronavirus that’s appeared in a big city in China”. (By the way, let’s define “big city” as over 10m people – Wuhan has 11m). You ask me: “What’s the chance that that virus has escaped from a lab, given that there is only one such lab that deals with coronaviruses in China?”

That’s all you initially tell me. You don’t tell me that the virus has appeared in Wuhan, which has the lab. You just say, it’s appeared in a big city in China.

Well, with that much information, what’s the chance it leaked from a lab? What’s my “prior”?

Let’s call this “event A”. Event A is: “the coronavirus we have just seen in a large Chinese city came from a leak at the only lab in China working with coronaviruses”. My estimate of its probability, P(A), is my prior. If that’s all the info you give me, I would say the probability that it’s from a lab leak is 2%: a 1 in 50 chance.

Why as high as 1 in 50? Well, if you’ve been keeping up with the media recently, you would know that lab leaks are not rare. The SARS-1 virus leaked from labs in China and elsewhere several times in the years after its initial, naturally caused outbreak. And the UK had a bad foot-and-mouth outbreak in 2007 that came from a lab leak. Some people argue that the 1977 flu, which looked identical to the 1950 flu (and was descended from the 1918 flu), was a lab leak in Russia, although there are other theories about that too. Anyway, my point is that lab leaks are not unheard of, so a 1 in 50 chance is reasonable.

Now you tell me a new piece of information. In Bayes’ language, that’s a new event; let’s call it “event B”. You tell me: “the city that has the outbreak is Wuhan, which has the only lab in China dealing with coronaviruses”. Event B is therefore: “The city where the outbreak occurred is the city with the only lab”.

What does that new piece of information – that new “event” – do to my thinking? Well, I now have to “update my prior”. Technically I have to work out P(A|B). That is, “what is the probability that the virus is from a lab leak, given that the virus has appeared in the city with the only lab in China it can leak from?”

Well, (and I’m just going to put the maths out there, you can check my formula elsewhere), Bayes’ formula tells me how to update my prior. It says:

New forecast for A (my updated prior, which in Bayesian terms is called the posterior) =

P(A|B) = P(B|A) * P(A) / [(P(B|A) * P(A)) + (P(B|~A) * P(~A))]

So let’s unpack these terms:

  • P(B|A) is the probability that the outbreak shows up in the big city with the only coronavirus lab, given that the virus leaked from that lab. If the virus has leaked from the lab, what are the chances that I will see it in the city where the lab is? I’d say 100%. So P(B|A) = 100%.
  • P(A) is my prior: the probability I first assigned to the virus I see in a city being a leak from the lab. We’ve set that at 2%.
  • P(B|~A) is the interesting one. It’s the probability that I observe the virus in the big city with the lab even though the virus didn’t leak from the lab. How would I estimate P(B|~A)? Well, there are 11 cities in China with over 10m people, and one of them got the virus. Given that there was no lab leak, the chance that it was the city with the lab is 1 in 11 (I assume they are all equally likely, since they all have wet markets etc.), or about 9%. So P(B|~A) ≈ 9%.
  • P(~A) is the probability that the virus did not come from a lab leak. I initially put the lab-leak probability at 2%, so P(~A) = 1 − 2% = 98%.

Plug these values into the formula above (using the exact 1 in 11 rather than the rounded 9%) and you get an updated prior of 18.33%.
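
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula, assuming only the four values above. The function and variable names are illustrative, not from any library. It also shows how much rounding 1 in 11 down to 9% moves the answer:

```python
def bayes_update(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Posterior P(A|B) for a yes/no hypothesis A after observing evidence B.

    Implements P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|~A)P(~A)],
    with P(~A) = 1 - P(A).
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    numerator = p_b_given_a * prior
    denominator = numerator + p_b_given_not_a * (1.0 - prior)
    return numerator / denominator


# Inputs as assumed in the post
P_A = 0.02                # prior: 1 in 50 that the virus is a lab leak
P_B_GIVEN_A = 1.0         # a leak would surface in the lab's own city
P_B_GIVEN_NOT_A = 1 / 11  # no leak: each of 11 big cities equally likely

exact = bayes_update(P_A, P_B_GIVEN_A, P_B_GIVEN_NOT_A)
rounded = bayes_update(P_A, P_B_GIVEN_A, 0.09)  # the rounded 9% instead

print(f"P(A|B) with 1/11: {exact:.2%}")    # 18.33%
print(f"P(A|B) with 9%:   {rounded:.2%}")  # 18.48%
```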

And this is my point. How weird is it that it shows up in Wuhan if it wasn’t a lab leak? Weird enough to increase my prior from 1 in 50 to almost 1 in 5.

Let me spell this out, because I don’t think the media quite gets what a coincidence this is. Pushing my conservative prior up roughly ninefold is a big deal.

That is, suppose you told me that a new coronavirus had just appeared in a large city in China, then told me there’s only one lab in China actively studying coronaviruses, and then asked me to estimate the chance that the virus arose from a lab leak. I would say 2%.

But if you then told me that the city where the virus appeared is the city with that unique coronavirus lab, I would have to update my prior. And I would update it to 18%.

From a 1 in 50 chance to roughly a 1 in 5 chance.
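
The odds form of Bayes’ theorem makes the size of the jump explicit. With P(B|A) = 1 and P(B|~A) = 1/11, the likelihood ratio P(B|A)/P(B|~A) is 11, so learning that the city is Wuhan multiplies the odds on a lab leak elevenfold:

```latex
\frac{P(A \mid B)}{P(\neg A \mid B)}
  = \frac{P(B \mid A)}{P(B \mid \neg A)} \times \frac{P(A)}{P(\neg A)}
  = 11 \times \frac{0.02}{0.98} \approx 0.224,
\qquad
P(A \mid B) = \frac{0.224}{1 + 0.224} \approx 0.183
```

The odds rise by a factor of 11, while the probability rises a little over ninefold (2% to 18.3%), because converting odds back into a probability compresses large values toward 1.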

So this is not a 50/50 chance, but the magnitude of the change is huge. And it should underscore how big an implicit assumption is being made by virologists who keep saying it has to be a natural origin.

Of course, if you have a prior other than 1 in 50, you get a different updated result:

  • If your prior on a lab leak is 1 in 100, then your updated prior, once you find out the city is Wuhan, should go to 10%. Still a huge move, and still a probability you shouldn’t casually dismiss.
  • If your prior was more cynical, say a 1 in 20 chance (5%) that it’s from a lab leak, then as soon as you find out it’s Wuhan, your update goes to about 37%: more than a 1 in 3 chance.
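
Sweeping the prior over the three values discussed here is a small change to the same calculation. A sketch, assuming as in the post that P(B|A) = 1 and P(B|~A) = 1/11 whatever the prior (`bayes_update` is a hypothetical helper, redefined so the snippet runs on its own):

```python
def bayes_update(prior: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Posterior P(A|B) from Bayes' theorem for a yes/no hypothesis A."""
    numerator = p_b_given_a * prior
    return numerator / (numerator + p_b_given_not_a * (1.0 - prior))


P_B_GIVEN_LEAK = 1.0        # outbreak city is the lab city if it leaked
P_B_GIVEN_NO_LEAK = 1 / 11  # otherwise one of 11 big cities, chosen uniformly

priors = {"1 in 100": 1 / 100, "1 in 50": 1 / 50, "1 in 20": 1 / 20}
results = {}
for label, prior in priors.items():
    results[label] = bayes_update(prior, P_B_GIVEN_LEAK, P_B_GIVEN_NO_LEAK)
    print(f"prior {label:>8} ({prior:.0%}) -> updated {results[label]:.1%}")
```

It prints 10.0%, 18.3% and 36.7% for the 1 in 100, 1 in 50 and 1 in 20 priors respectively.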

Hopefully this post shows why I think the conversation in the media, and even amongst virologists, has lacked a certain nuance. It really is very weird that Wuhan is where the virus first shows up if it wasn’t caused by a lab leak. I’m not saying it is a lab leak. I’m saying that casual dismissals too often fail to account for how weird it is.

4 Comments on “Did CoVID leak from a lab? A Bayesian approach”

  1. I think you made an incorrect comparison.
    I assume you are using an annual probability of a lab leak (once in 50 years).
    What is the annual probability of a SARS-type outbreak in China?
    Then you can scale that probability down to Wuhan.

    I think you have to start with an annual probability of a natural outbreak in China.
    Please let me know what you think.

  2. Interesting post. I wish I understood the mathematics of Bayes’ theorem and how to apply it, but unfortunately I don’t.

    I have a layperson’s question that to me seems central to the origins debate. It’s about the very different opinions people have on the likelihood of a coronavirus outbreak from an accident at a coronavirus research laboratory – which I think is related to the first prior in your post, and which you refer to near the end of your post in terms of people having different values for the prior.

    Opinions on lab risk seem to be so incredibly wide-ranging that I’m wondering how Bayes’ theorem would help resolve it. Or is it just a case of accepting that everyone has their own lab-risk prior and then everyone applies Bayes’ theorem and comes up with a different estimate for the probability that the origin was a lab-accident?

    I’m over-simplifying things, but imagine if there was consensus that a natural outbreak of 1,000+ coronavirus cases might happen in Wuhan 1 in 5,000 years. Then, if I think a 1,000+ case outbreak from a lab is also 1 in 5,000 years, that will make me 50/50 on the origins issue. But someone who thinks a lab accident outbreak is a 1 in 500-year event will favour the lab origins hypothesis. And someone else who thinks a lab accident outbreak is a 1 in 1-million-year event will favour the natural spillover hypothesis.

    In this case, for Bayes’ theorem to help us resolve the argument, do we need to take a few steps back and first look at the priors different people used to come up with their estimates of lab-risk – and reassess those priors?

    I’ve a related 10-minute read on this – nothing on Bayes’ theorem, just a layperson attempting to weigh natural spillover likelihood against the likelihood of a lab-accident origin. https://david-griffiths.medium.com/covid-19-origins-weighing-the-odds-28f1a012f396

    1. Hi David. You’re exactly right: it depends entirely on how you build your priors. Lab leaks are common, although a lab leak of a previously unknown virus hasn’t happened before. But if the WIV had these viruses, which it had collected and was cataloguing and researching, there’s no reason to think those viruses were somehow better protected against leaking out.

  3. Interesting stuff. Ok, so you have examined the probability of the market being infected by the lab.

    What is the probability you might calculate for the inverse condition? What is the probability it’s not from the lab, given the nature of the research at the physically proximate lab?
