From predictions to prescriptions: A data-driven response to COVID-19
https://doi.org/10.1007/s XXXXXXXXXX
Dimitris Bertsimas1,2 · Leonard Boussioux2 · Ryan Cory-Wright2 · Arthur Delarue2 · Vassilis Digalakis2 ·
Alexandre Jacquillat1,2 · Driss Lahlou Kitane2 · Galit Lukin2 · Michael Li2 · Luca Mingardi2 ·
Omid Nohadani3 · Agni Orfanoudaki2 · Theodore Papalexopoulos2 · Ivan Paskov2 · Jean Pauphilet4 ·
Omar Skali Lami2 · Bartolomeo Stellato5 · Hamza Tazi Bouardi2 · Kimberly Villalobos Carballo2 ·
Holly Wiberg2 · Cynthia Zeng2
Received: 14 September 2020 / Accepted: 16 December 2020
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC part of Springer Nature 2021
Abstract
The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult
decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing
measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat
the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics
of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level
clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach,
combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most
comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized
calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab
values. Third, we develop a novel epidemiological model to project the pandemic’s spread and inform social distancing
policies. Fourth, we propose an optimization model to re-allocate ventilators and alleviate shortages. Our results have been
used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and re-distribute
ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and vaccine trial
location planning at Janssen Pharmaceuticals, and have been integrated into the US Center for Disease Control’s pandemic
forecast.
Keywords COVID-19 · Epidemiological modeling · Machine learning · Optimization
Electronic supplementary material The online version of
this article (https://doi.org/10.1007/s XXXXXXXXXX) contains
supplementary material, which is available to authorized users.
✉ Dimitris Bertsimas
XXXXXXXXXX
1 Sloan School of Management, Massachusetts Institute
of Technology, Cambridge, MA 02142, USA
2 Operations Research Center, Massachusetts Institute
of Technology, Cambridge, MA 02139, USA
3 Benefits Science Technologies, Boston, MA 02110, USA
4 London Business School, London, NW1 4SA, UK
5 Operations Research and Financial Engineering,
Princeton University, Princeton, NJ, 08544, USA
Highlights
• This paper introduces a comprehensive data-driven
approach for the COVID-19 pandemic with goals to
inform the overall scientific community, to estimate the
epidemiological spread of the virus, to provide clinical
insights, and to support ventilator allocation decisions
for policy makers.
• To consolidate medical insights, a clinical database
on the disease is aggregated from available scientific
literature.
• To assess the risk of infection and mortality, we
provide personalized risk prediction models from
the electronic health records by leveraging machine
learning algorithms.
Published online: 15 February 2021
Health Care Management Science XXXXXXXXXX:253–272
• To forecast the progression of COVID-19 and evaluate
the impact of various social distancing policies, we
develop a dynamic epidemiological model.
• To inform operational decisions for government offi-
cials, our optimization model addresses surges in ven-
tilator demand through state-level and federal realloca-
tion.
1 Introduction
In just a few weeks, the whole world was upended by
the outbreak of COVID-19, an acute respiratory disease
caused by a new coronavirus called SARS-CoV-2. The virus
is highly contagious: it is easily transmitted from person
to person via respiratory droplet nuclei and can persist on
surfaces for days [22, 43]. As a result, COVID-19 has spread
rapidly: the World Health Organization classified it as
a public health emergency on January 30, 2020 and as a
pandemic on March 11, 2020. As of November 2020, over 51
million cases and 1.2 million deaths had been reported
globally [20].
Given the uncertainty surrounding the disease and
its treatment, healthcare providers and policy makers
have wrestled with unprecedented challenges. Hospitals
and other care facilities have faced shortages of beds,
ventilators and personal protective equipment, raising hard
questions on how to treat COVID-19 patients with scarce
supplies and how to allocate resources to prevent further
shortages. At the policy level, most countries have imposed
“social distancing” measures and other non-pharmaceutical
interventions to slow the spread of the pandemic. These
measures allow strained healthcare systems to cope with
the disease by “flattening the curve” [2] but also come
at a steep economic price [11, 32]. This trade-off has
prompted difficult decisions balancing public health and
socio-economic outcomes.
This paper proposes a comprehensive data-driven
approach to combat the COVID-19 pandemic. We leverage
a broad range of data sources, which include (i)
our own cohort-level data aggregating hundreds of clinical
studies, (ii) patient-level data obtained from electronic
health records, and (iii) census reports on the scale of the
pandemic. We develop an integrated approach spanning
descriptive analytics (to derive a macroscopic understanding
of the disease), predictive analytics (to forecast the near-
term impact and longer-term dynamics of the pandemic),
and prescriptive analytics (to support healthcare and policy
decision-making).
Our approach comprises four steps (Fig. 1):
– Aggregating and visualizing the most comprehensive
clinical database on COVID-19 as of May 2020
(Section 1). We aggregate cohort-level data on demographics,
comorbidities, symptoms and lab values from
160 clinical studies. These data paint a broad picture
of the disease, identifying common symptoms, disparities
between mild and severe patients, and geographic
disparities. Such insights are hard to derive from any
single study and can orient future clinical research
on COVID-19, its mutations, and its disparate effects
across ethnic groups.
– Providing personalized indicators to assess the risk of
mortality and infection (Section 2). Using patient-level
data, we develop machine learning models to predict
mortality and infection risk as a function of demographics,
symptoms, comorbidities, and lab values. Using
gradient boosting methods, the models achieve strong
predictive performance, with an out-of-sample area
under the curve above 90%. These models yield personalized
calculators that can (i) guide triage, treatment,
and care management decisions for strained healthcare
systems, and (ii) serve as pre-screening tools for
patients before they visit healthcare or testing facilities.
[Figure 1: flow diagram with columns Data, Descriptive analytics, Predictive analytics, Prescriptive analytics and Impact. Boxes: cohort-level data (clinical research), patient-level data (electronic medical record), census-level data (public reports); patient characteristics (Section 1); mortality/infection risk (Section 2); DELPHI-pred and DELPHI-presc (Section 3); ventilator allocation (Section 4); clinical understanding, triaging and treatment, social distancing policies, resource allocation. Arrow labels: insights, parameters, inputs.]
Fig. 1 Overview of our end-to-end analytics approach. We leverage diverse data sources to inform a family of descriptive, predictive and
prescriptive tools for clinical and policy decision-making support
– Developing a novel epidemiological model to forecast
the evolution of the disease and assess the
effects of social distancing (Section 3). We propose
a new compartmental model called DELPHI,
which accounts for COVID-19 features such as
underdetection and government response. The model
estimates the disease’s spread with high accuracy;
notably, its projections from as early as April 3, 2020 have
matched the number of cases observed in the United
States up to mid-May and outperformed comparable
methods during that period. We also provide
a data-driven assessment of social distancing policies,
showing that the pandemic’s spread is highly
sensitive to the stringency and timing of mitigating
measures.
– Proposing an optimization model to support ventilator
allocation in response to the pandemic (Section 4).
We formulate a mixed-integer optimization model to
allocate ventilators efficiently in a semi-collaborative
setting where resources can be shared both between
healthcare facilities and through a central authority. In the
United States, this allows us to study the trade-offs of
managing the federal ventilator stockpile in conjunction
with inter-state transfers. Results show that limited
ventilator transfers could have eliminated shortages in
April 2020.
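As a rough illustration of why pooling resources can remove shortages, the sketch below greedily moves surplus ventilators to regions with shortages. It is not the mixed-integer model of the fourth step: it ignores the federal stockpile, time dynamics and transfer costs, and the function name `plan_transfers`, the region names and all quantities are hypothetical.

```python
def plan_transfers(supply, demand):
    """Greedily move surplus ventilators to regions with shortages.

    supply, demand: dicts mapping region name -> number of ventilators.
    Returns (transfers, remaining_shortage), where transfers is a list of
    (from_region, to_region, quantity) tuples.
    """
    surplus = {r: supply[r] - demand[r] for r in supply if supply[r] > demand[r]}
    shortage = {r: demand[r] - supply[r] for r in supply if demand[r] > supply[r]}
    transfers = []
    # Serve the largest shortages first, drawing from the largest surpluses.
    for needy in sorted(shortage, key=shortage.get, reverse=True):
        for donor in sorted(surplus, key=surplus.get, reverse=True):
            if shortage[needy] == 0:
                break
            qty = min(surplus[donor], shortage[needy])
            if qty > 0:
                transfers.append((donor, needy, qty))
                surplus[donor] -= qty
                shortage[needy] -= qty
    return transfers, sum(shortage.values())


# Hypothetical toy instance: region A has spare capacity, B and C fall short.
toy_supply = {"A": 120, "B": 80, "C": 50}
toy_demand = {"A": 70, "B": 100, "C": 70}
toy_transfers, toy_remaining = plan_transfers(toy_supply, toy_demand)
```

In this toy instance the total supply exceeds the total demand, so a small number of transfers covers every shortage. A greedy rule of this kind gives no optimality guarantee once transfer limits or costs are introduced, which is one reason to formulate the problem as an optimization model instead.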
This work makes two key contributions. First, we
derive data-driven insights about the early stages of the
COVID-19 pandemic. Although some of the results should
be treated with caution when extrapolated beyond the
period spanning March to May 2020, these insights help
understand the clinical characteristics of the disease, predict
its mortality, forecast its evolution, and ultimately alleviate
its impact. Second, we provide a comprehensive roadmap
to guide short-term responses to new and unforeseen
epidemics. The proposed approach involves four steps:
(i) gathering meta-data from early small-scale clinical
studies to derive a fast and broad understanding of the
disease, (ii) applying predictive analytics based on patient-
level data to identify the drivers of the disease and
its mortality, (iii) using population-level data on cases,
hospitalizations and deaths to predict the macroscopic
evolution of the disease, and (iv) leveraging these models
for resource allocation optimization to alleviate the near-
term damage of the disease. A major feature of this approach
is to treat these different questions as interdependent
challenges, as opposed to a series of isolated problems.
Indeed, clinical decision-making depends directly on patient
inflows and available supplies, while resource planning and
government responses react to patient-level outcomes. By
combining various data sources into descriptive, predictive
and prescriptive methods, this paper proposes an end-to-end
approach to design a comprehensive and cohesive response
to the COVID-19 pandemic and future epidemics.
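To illustrate step (iii), the following is a minimal discrete-time SEIR sketch in which a contact multiplier stands in for government response and a detection rate stands in for underdetection. It is not the DELPHI model: the logistic form of `contact_multiplier`, the function `simulate_seir` and every parameter value are assumptions chosen only for illustration.

```python
import math


def contact_multiplier(day, t_mid=30.0, width=5.0, floor=0.3):
    """Assumed logistic drop in contacts from 1.0 toward `floor` around day t_mid."""
    x = min((day - t_mid) / width, 700.0)  # cap the exponent to avoid overflow
    return floor + (1.0 - floor) / (1.0 + math.exp(x))


def simulate_seir(days, population=1_000_000, initial_infected=10,
                  beta=0.5, incubation_days=5.0, infectious_days=7.0,
                  detection_rate=0.2, with_response=True):
    """Forward-Euler SEIR with one-day steps.

    Returns the cumulative number of detected cases at the end of each day.
    Removed individuals are not tracked separately because they do not
    affect the dynamics.
    """
    s = float(population - initial_infected)  # susceptible
    e = 0.0                                   # exposed, not yet infectious
    i = float(initial_infected)               # infectious
    cumulative_infections = float(initial_infected)
    detected = []
    for day in range(days):
        m = contact_multiplier(day) if with_response else 1.0
        new_exposed = m * beta * s * i / population
        new_infectious = e / incubation_days
        new_removed = i / infectious_days
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        cumulative_infections += new_infectious
        # Only a fraction of true infections is assumed to be reported.
        detected.append(detection_rate * cumulative_infections)
    return detected


with_response = simulate_seir(120)
without_response = simulate_seir(120, with_response=False)
```

Comparing `with_response[-1]` with `without_response[-1]` shows, within this toy model, how strongly the detected case count depends on whether and when contacts are reduced. Shifting `t_mid` earlier or later gives a simple way to explore sensitivity to timing.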
Ultimately, this paper develops analytical tools to inform
clinical and policy responses to the COVID-19 pandemic.
These tools are available to the public on a dedicated
website.1 They have also been deployed in practice to
combat the spread of COVID-19 globally. Several hospitals
in Europe have used our risk calculators to support pre-triage
and post-triage decisions, and a major financial
institution in South America is applying our infection
risk calculator to determine how employees can safely
return to work. A major hospital system in the United
States, Hartford Healthcare, planned its intensive care unit
(ICU) capacity based on our forecasts, and leveraged our
optimization results to allocate ventilators across hospitals
when the number of cases was rising. Our epidemiological
predictions are used by one of the largest pharmaceutical
companies, Janssen Pharmaceuticals, to design a vaccine
trial location selection strategy. Our epidemiological model has also been one of
the top 5 models that are consistently incorporated into the
US Center for Disease Control’s forecasts [42] and its core
ensemble model.
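The risk calculators mentioned above are assessed in this paper by their out-of-sample area under the curve. As a reminder of what that metric measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper's data or code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney statistic).

    labels: iterable of 0/1 outcomes; scores: predicted risks, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical held-out outcomes (1 = event) and predicted risks.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the sample size, so a rank-based implementation would be preferable on real cohorts.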
2 Descriptive analytics: clinical outcomes database
Early responses to the COVID-19 pandemic have been
inhibited by the lack of available data on patient outcomes.
Individual centers released reports summarizing patient
characteristics. Yet, this decentralized effort makes it
difficult to construct a cohesive picture of the pandemic.
To address this problem, we construct a database
that aggregates demographics, comorbidities, symptoms,
laboratory blood