Parameter-free COVID Model Based on Encounter Density Data

Qi-Jun Hong
Last updated: July 31, 2020 (Next update scheduled: August 3)
Previous Projections: May 25 May 29 Jun 02 Jun 04 Jun 05 Jun 08 Jun 15 Jun 19 Jun 22 Jun 25 Jun 29 Jul 03 Jul 06 Jul 10 Jul 13 Jul 16 Jul 20 Jul 23 Jul 27
Source code on GitHub
This is a personal project and these are my own views.
Covid19 Encounter Model by Qi-Jun Hong is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://github.com/qjhong/covid19.

My model projects that the current wave of COVID cases has peaked, with a record high near 80,000 cases/day.

Fatalities are increasing. The 7-day average has already surpassed 1,000 deaths/day. It will continue to rise and stay above that level for a long period, likely throughout August.

States at risk: FL, CA, TX, GA, OK, MS, NV, LA, TN, IL, NC, SC, and almost all southern, midwestern and western US...

Top 5 States by Daily New Cases, Next Day (August 01): FL(9130), CA(7645), TX(6868), GA(3494), AZ(2185)

Top 5 States by Daily New Cases, in 5 Days (August 05): FL(8506), CA(7264), TX(5965), GA(3457), TN(2208)

Top 5 States by Daily New Cases, in 10 Days (August 10): FL(7894), CA(6805), TX(5111), GA(3357), TN(2205)

Projection of the Next 20 Days

State Projection: AK AL AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX US UT VA VT WA WI WV WY
Daily new confirmed cases:

[Figure: daily new confirmed cases]

Total deaths:

[Figure: total deaths]

Daily deaths:

[Figure: daily deaths]

Daily Tests and Daily New Cases

(Data source: the COVID Tracking Project)

[Figure: daily tests and daily new cases]

Daily New Cases in 50 US States

(Data source: the COVID Tracking Project)

[Figure: daily new cases in 50 US states]

What is the idea?

Today's "Daily New Confirmed Cases" + today's "Encounter Density" ==> today's newly infected cases ==> "Daily New Confirmed Cases" over the next 2-3 weeks

(Encounter Density D data source: Unacast's Social Distancing Scoreboard, which analyzes cell phone location data, counts "Human Encounters", defined as two cell phone devices that were in the same place at the same time, and then derives the probability and "Encounter Density".)

My model uses current "Encounter Density" D to predict future "Reproductive Number" R and "Daily New Confirmed Cases". This is the most fundamental idea and assumption in this model.

Why "Encounter" data?

Daily New Confirmed Cases data is "outdated". People who are confirmed today were infected days ago through a "Human Encounter" with a contagious person, and it took days for them to develop symptoms, seek a test, and get confirmed (infected -> symptomatic -> tested -> confirmed). In other words, today's "Daily New Confirmed Cases" reflects infections from days earlier, and it can be inferred from past "Daily New Confirmed Cases" data + past "Encounter" data.

Encounter data is up to date. Typically, yesterday's Human Encounter Density data is available online today.

How does it work?

The strong correlation between R and D (with D shifted by ~22 days) is evident in this figure. While social distancing quickly brought R down, the easing of restrictions is slowly pushing R back above 1.

Using (1) the past relation between R and D as a training set, (2) future D as input, and (3) machine learning / regression, my model can predict future R, and ultimately future Daily New Cases.
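The three ingredients above can be sketched in a few lines. This is only an illustration under stated assumptions, not the model's actual code: it swaps in a plain least-squares line for the unspecified "machine learning / regression" step, treats R as a day-over-day growth factor, and uses the most recently observed D (which, under the ~22-day shift, corresponds to upcoming R) as the input. The names `fit_r_from_d` and `project`, and all the numbers, are hypothetical.

```python
import numpy as np

def fit_r_from_d(d_hist, r_hist, shift):
    """Fit r[t] ~ a * d[t - shift] + b on the overlapping history (assumed linear)."""
    d_hist = np.asarray(d_hist, dtype=float)
    r_hist = np.asarray(r_hist, dtype=float)
    x = d_hist[: len(d_hist) - shift]  # D observed `shift` days earlier
    y = r_hist[shift:]                 # R on the matching later days
    a, b = np.polyfit(x, y, 1)         # slope and intercept
    return float(a), float(b)

def project(d_hist, last_cases, a, b, shift, horizon):
    """Predict R for the next `horizon` days and roll daily cases forward."""
    if horizon > shift:
        raise ValueError("horizon cannot exceed the shift: D is not observed yet")
    d_hist = np.asarray(d_hist, dtype=float)
    start = len(d_hist) - shift
    r_future = a * d_hist[start : start + horizon] + b
    cases = last_cases * np.cumprod(r_future)  # cases[k] = last * R_1 * ... * R_(k+1)
    return r_future, cases

# Synthetic data (made up for illustration only).
shift = 22
d_hist = np.linspace(0.4, 1.0, 60)
r_hist = np.full(60, 0.93)
r_hist[shift:] = 0.2 * d_hist[:-shift] + 0.85

a, b = fit_r_from_d(d_hist, r_hist, shift)
r_future, cases = project(d_hist, 1000.0, a, b, shift, horizon=20)
print(round(a, 3), round(b, 3), cases[-1])
```

Because the input D is already observed, the projection horizon in this sketch is capped at the shift, which is consistent with projecting roughly 20 days ahead.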

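[Figure: Daily Reproductive Number R_d (red) and Adjusted Encounter Density D_adj shifted forward by ~22 days (black dots)]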

Shown in red is the "Daily Reproductive Number" R_d, which is obtained by fitting the existing "Daily New Confirmed Cases" data. By definition, if the number of daily new cases on day 1 is N, the number on day 2 will be N*R_d. Hence, the ultimate goal is to keep R_d under 1.
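To make the definition concrete, here is a minimal sketch of estimating R_d from a short run of daily counts and projecting forward with it. The fitting method is an assumption (a log-linear least-squares fit over a trailing window); the text does not say how the fit is actually done. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def daily_reproductive_number(cases, window=7):
    """Estimate R_d as the average day-over-day growth factor.

    Fits log(cases) against the day index over the last `window` days;
    exp(slope) is the growth factor. Requires strictly positive counts.
    """
    recent = np.asarray(cases[-window:], dtype=float)
    days = np.arange(len(recent))
    slope, _intercept = np.polyfit(days, np.log(recent), 1)
    return float(np.exp(slope))

def project_cases(last_count, r_d, n_days):
    """Apply N -> N * R_d repeatedly for `n_days` days."""
    return [last_count * r_d ** k for k in range(1, n_days + 1)]

# Synthetic series growing 5% per day (made up for illustration only).
cases = [1000 * 1.05 ** k for k in range(10)]
r_d = daily_reproductive_number(cases)
print(round(r_d, 4))
print(project_cases(100, 0.5, 3))
```

With R_d below 1 the projected counts shrink every day, which is why keeping R_d under 1 is the goal.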

The black dots are the Adjusted Encounter Density D_adj, shifted forward by ~22 days. The two curves are remarkably close. For example, R started to decrease quickly around 3/20, which coincides with a sudden decrease of D at the end of February, ~20 days earlier. R reached its bottom around 4/15 and stayed at that level until ~5/15, which overlaps with low D between 3/20 and 4/20. The amount of shift is optimized to maximize overlap, and the value is determined as ~22 days. The values are normalized to pre-pandemic levels, so 1.0 means the activity level before the pandemic hit the US.
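The shift optimization can be illustrated with a small search over candidate lags. This is a sketch under assumptions: "overlap" is measured here with the Pearson correlation, which may not be the criterion the model uses, and the function name `best_shift` and the synthetic curves are hypothetical.

```python
import numpy as np

def best_shift(r, d, max_shift=40):
    """Return the shift (in days) of d that best matches r, and its correlation.

    For each candidate shift, r[t] is compared with d[t - shift], i.e. d is
    moved forward in time. Both series must have the same length.
    """
    r = np.asarray(r, dtype=float)
    d = np.asarray(d, dtype=float)
    best, best_corr = 0, -np.inf
    for shift in range(max_shift + 1):
        r_part = r[shift:]
        d_part = d[: len(d) - shift]
        corr = np.corrcoef(r_part, d_part)[0, 1]
        if corr > best_corr:
            best, best_corr = shift, corr
    return best, float(best_corr)

# Synthetic curves (made up): d dips around day 40, r repeats it 22 days later.
t = np.arange(120)
d = 1.0 - 0.6 * np.exp(-(((t - 40) / 12.0) ** 2))
r = np.concatenate([np.full(22, d[0]), d[:-22]])

shift, corr = best_shift(r, d)
print(shift)
```

Other overlap measures, such as the mean squared difference between the normalized curves, would fit the same loop.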

[Figure]