Forecasting from a black box: applying machine learning to predicting the COVID-19 pandemic

Epidemics are usually modeled with the SIR model (named after its three groups: Susceptible, Infected, Recovered) and its variations: SEIR (which splits those who have caught the disease into “exposed” and infectious people), SIRS (which takes into account the limited “shelf life” of immunity in people who have recovered) and others. They differ mainly in which types of infection they describe best.

This kind of modeling can be called “modeling from first principles”: you decide which factors you are dealing with, determine the laws these factors obey, fit the coefficients, and get a tool for prediction.
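To make the idea concrete, here is a minimal sketch of the basic SIR model written as code. The function name `simulate_sir`, the forward-Euler integration and all parameter values are illustrative assumptions, not taken from any particular paper or from the competition.

```python
def simulate_sir(beta, gamma, s0, i0, days, steps_per_day=10):
    """Integrate the SIR equations with a simple forward-Euler scheme.

    dS/dt = -beta * S * I
    dI/dt =  beta * S * I - gamma * I
    dR/dt =  gamma * I
    S, I, R are fractions of the population, so S + I + R = 1.
    Returns one (S, I, R) tuple per day, starting with day 0.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: beta = 0.4/day and gamma = 0.2/day give R0 = beta / gamma = 2.
history = simulate_sir(beta=0.4, gamma=0.2, s0=0.999, i0=0.001, days=120)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"peak on day {peak_day}, {history[peak_day][1]:.1%} infected at once")
```

Here `beta` (transmission rate) and `gamma` (recovery rate) are exactly the coefficients that have to be fitted; everything the model predicts follows from them and from the starting conditions.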

But for the prediction to be good, you need to know these coefficients. With the novel coronavirus, our knowledge here is not very good: the virus is new, this is the first epidemic of it we have faced, and it has not yet ended.

We don’t know, for example, exactly what the basic reproduction number (R0) of COVID-19 is, that is, how many people one patient infects. And this parameter is key for the model. Some believe that the R0 of the coronavirus is four, others that it is two, but a deviation of even one unit produces very different curves.
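A rough way to see how much the curves diverge is to run the same basic SIR model with different values of R0. The sketch below is an illustration under assumed values: a five-day infectious period, 0.1% of the population infected at the start, and a hypothetical helper named `epidemic_peak`; none of these numbers are estimates for COVID-19.

```python
def epidemic_peak(r0, infectious_days=5.0, i0=0.001, days=200, steps_per_day=10):
    """Return (day, fraction infected) at the peak of a basic SIR epidemic."""
    gamma = 1.0 / infectious_days   # recovery rate (assumed infectious period)
    beta = r0 * gamma               # transmission rate implied by R0
    s, i = 1.0 - i0, i0
    dt = 1.0 / steps_per_day
    peak_day, peak_i = 0, i
    for day in range(1, days + 1):
        for _ in range(steps_per_day):   # forward-Euler sub-steps within one day
            new_infections = beta * s * i * dt
            s -= new_infections
            i += new_infections - gamma * i * dt
        if i > peak_i:
            peak_day, peak_i = day, i
    return peak_day, peak_i


for r0 in (2.0, 3.0, 4.0):
    day, frac = epidemic_peak(r0)
    print(f"R0 = {r0}: peak on day {day}, {frac:.0%} of the population infected at once")
```

With these assumptions, moving R0 from two to four shifts the peak weeks earlier and makes it much higher, which is why an uncertainty of one unit in R0 matters so much for the forecast.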

You can fit these coefficients, of course, and many people are doing just that, but the value of such fits, especially over a short horizon of weeks, is doubtful. The basic reproduction number itself is not an absolute metric: it is influenced by many external factors. Where a quarantine has been introduced the coefficient should fall, but by how much is impossible to say: it differs from country to country, so the effect of quarantine cannot be estimated unambiguously. In addition, we do not know the exact number of cases (this depends strongly, in particular, on the number of tests performed), the number of asymptomatic carriers is unknown, and so on. All of this also affects what the model predicts.

I looked at tools of this type: I opened several scientific articles and quickly realized that trying to build a short-term forecast by solving differential equations is almost hopeless. As far as I know, only one participant in the competition did this, letting R0 change a little every day: his model's results were not the worst, but not the best either.
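What "R0 changes a little every day" could look like in an SIR-type model is sketched below. This is not that participant's actual model, whose details are not given here; the function `simulate_sir_varying_r0` and the 2%-per-day decline are hypothetical choices made only for illustration.

```python
def simulate_sir_varying_r0(r0_by_day, infectious_days=5.0, i0=0.001, steps_per_day=10):
    """Basic SIR with a different R0 on each day; returns the infected fraction per day."""
    gamma = 1.0 / infectious_days   # recovery rate (assumed infectious period)
    s, i = 1.0 - i0, i0
    dt = 1.0 / steps_per_day
    infected = [i]
    for r0 in r0_by_day:
        beta = r0 * gamma           # transmission rate for this day only
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            s -= new_infections
            i += new_infections - gamma * i * dt
        infected.append(i)
    return infected


# Hypothetical schedule: R0 starts at 3 and shrinks by 2% per day (e.g. under quarantine).
days = 120
declining = [3.0 * 0.98 ** d for d in range(days)]
constant = [3.0] * days

peak_declining = max(simulate_sir_varying_r0(declining))
peak_constant = max(simulate_sir_varying_r0(constant))
print(f"peak infected: {peak_constant:.1%} with constant R0, {peak_declining:.1%} with declining R0")
```

The catch is that the daily schedule of R0 is itself an unknown that has to be fitted from noisy case counts, so the extra flexibility does not remove the uncertainty described above.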
