The current approach of mass quarantining aims to “flatten the curve.” However, how the virus has spread, and how it might return once our economy is eventually restored, is still unclear. Recent work by Abbott Labs (among others) on shortening testing times and mass-producing test kits at affordable prices looks promising.
However, despite the advancements by Abbott Labs, testing everyone in America is not attainable. As of today, April 5th, we have tested, on average, one in every two hundred Americans. This can be compared with the testing rate of a country like South Korea. The ramp-up in testing has not yet moved us closer to reopening our economy.
Some have proposed the idea of checking for antibodies. A positive antibody test would suggest immunity to the virus because of a prior infection. The logic behind this is that people who have these antibodies can safely return to work and take on the critical tasks needed to restart the economy. Nonetheless, recent news from across Asia warns us that patients previously diagnosed with COVID-19 are being readmitted to hospitals after testing positive for the virus again.
So as it stands, our current approach of mass quarantining, which media outlets have predicted could last up to twelve months, is not only slow but is also pushing us down a path of economic damage. If it continues, that damage may be difficult to recover from. Scaling up testing and developing new methods that check for antibodies, while advantageous, will not by themselves be enough to reopen our economy.
An Aggressive Data-Driven Approach is Needed
We need an aggressive data-driven approach to understanding how COVID-19 is spreading. Based on these insights, we can demarcate safe zones where essential economic activity can be reinstituted with minimal risk. There are two aspects of this approach:
- We can develop more in-depth insights into how the virus is spreading. We must acknowledge that mass quarantining alone is not the best approach.
- Based on the insights we develop, we may be able to open parts of our country once again, with a measure of confidence based on the data.
Ultimately, the solution boils down to data and computation problems. Imagine if we took the phone numbers of everyone infected with COVID-19 (using an anonymized identifier rather than the actual phone numbers, of course, to protect the privacy of the people involved). Then, using cell tower data to reconstruct the movements of those individuals from their smartphones, we would perform location- and time-based searches. These would determine who might have come in contact with infected persons in the last forty-five days. Next, we would algorithmically place the search results into bands based on the degree of overlap (time and location). This algorithm would be able to eliminate instances of location proximity where there is minimal chance of spread, for example, at a traffic intersection. Conversely, it would give a higher weight to proximity in places where there is a bigger chance of the virus spreading, for example, at a coffee shop or workplace. All these factors would feed into the calculation of a risk factor. Individuals who meet the high-risk criteria would be notified and instructed to self-quarantine immediately. We could go further and use the cell phone data to penalize those who do not comply. These individuals should be sent a self-test kit on a priority basis.
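To make the banding idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a worked-out design: the `Visit` record, the venue weights, the fallback weight of 0.5, and the band thresholds are all hypothetical values chosen for the example, and the forty-five-day window is assumed to have been applied before the visits reach this code.

```python
from dataclasses import dataclass

# Hypothetical venue weights. The only grounded idea is the ordering:
# a traffic intersection counts for little, a coffee shop or workplace for more.
VENUE_WEIGHT = {"intersection": 0.0, "coffee_shop": 1.0, "workplace": 1.0}


@dataclass(frozen=True)
class Visit:
    person_id: str  # anonymized identifier, never a phone number
    place_id: str   # e.g. a cell-tower sector or a mapped venue
    venue: str      # venue category used to look up the weight
    start: int      # minutes since an arbitrary reference time
    end: int


def overlap_minutes(a: Visit, b: Visit) -> int:
    """Minutes two visits overlap at the same place (0 if they do not)."""
    if a.place_id != b.place_id:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def risk_scores(infected_visits, other_visits):
    """Sum venue-weighted overlap minutes for each non-infected person."""
    scores = {}
    for visit in other_visits:
        for infected in infected_visits:
            minutes = overlap_minutes(visit, infected)
            if minutes:
                weight = VENUE_WEIGHT.get(visit.venue, 0.5)  # assumed default
                scores[visit.person_id] = (
                    scores.get(visit.person_id, 0.0) + weight * minutes
                )
    return scores


def band(score: float, high: float = 30.0, medium: float = 10.0) -> str:
    """Place a score into a band; the thresholds are illustrative only."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"
```

In this sketch, someone who shares forty minutes in a coffee shop with an infected person lands in the "high" band, while someone who merely crosses the same intersection scores zero. A real system would have to estimate the weights and thresholds from test results rather than hard-code them.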
If these individuals test positive, their immediate family would then receive instant notification to also self-quarantine. The location in which the individual came into contact with the COVID-19 patient who initiated the search will be notified as well. If they test negative, we will still learn a vital data point about how the virus is spreading. These learnings, including the test results, will be fed into a continuously retraining machine learning algorithm. This algorithm will keep track of the trajectory of an infected person and common intersection locations. It will also be able to account for an infected person being quarantined, which removes a virus carrier from the mix. In summary, this algorithm is akin to performing deep automated contact tracing at a level that cannot be matched by armies of volunteers.
Another important byproduct of the trained algorithm is the automatic extraction of “features”. In machine learning, a feature is an individual measurable property or characteristic of a phenomenon being observed [1]. For example, the algorithm may observe that many people are becoming infected without coming in direct contact with an already infected person. Based on observing millions of such data points, it can, on its own, identify discriminating features such as an infected mail carrier's route, or common meeting areas that include certain surfaces, like metals, where coronavirus can remain active for days.
Using a continuously retraining algorithm, we can start to open parts of the country where the threat of spread is low. Any discovery of a COVID-19 case in a low-risk area will trigger the actions mentioned above and will flow back as input to training. It should be evident that the dataset and algorithm described above are computationally challenging. We are talking about recursive data searches through a dataset covering millions of citizens and a continuously learning algorithm with potentially billions of parameters.
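The recursive searches can be pictured as a traversal of a contact graph. The sketch below is a simplified illustration, not the proposed system: the `contacts` mapping, the `trace_contacts` function, and the `max_depth` cutoff are hypothetical, and treating a quarantined person as removed from the graph is a simplifying assumption about how quarantine would be modeled.

```python
from collections import deque


def trace_contacts(contacts, seed, quarantined=frozenset(), max_depth=2):
    """Breadth-first search outward from one infected person.

    contacts:    dict mapping a person ID to the IDs they overlapped with
    quarantined: IDs already isolated; they are skipped and not expanded
    Returns {person_id: degrees of separation from the seed}.
    """
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not search beyond the cutoff
        for other in contacts.get(person, ()):
            if other in depth or other in quarantined:
                continue  # already reached, or removed from the mix
            depth[other] = depth[person] + 1
            queue.append(other)
    del depth[seed]  # report only the contacts, not the seed itself
    return depth
```

Each newly confirmed case becomes a new seed, which is why the search is recursive in practice. At the scale of millions of people the graph would not fit in a single in-memory dictionary, so this only shows the shape of the computation.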
But Hasn’t This Approach Already Been Used in Other Countries like Taiwan and Singapore?
There is no question that location tracking capabilities have been highly effective in controlling the spread of coronavirus. In Taiwan and Singapore, location tracking technologies were deployed very early in the outbreak and were mainly used for surveillance. In Korea, officials routinely send text messages to people’s phones alerting them to newly confirmed infections in their neighborhood, in some cases alongside details of where the unnamed person had traveled before entering quarantine. Based on my research, these countries did not rely on big data and deep learning techniques to derive insights from the data. In the case of Taiwan and Singapore, the dataset of infected persons is not large enough for such an analysis.
Summary
The U.S. Government has broad authority to request personal data in the case of a national emergency like the coronavirus pandemic. In the United States, phone companies such as AT&T and Verizon have extensive records on their customers' movements. However, it does not appear that we are leveraging this large body of movement data to combat coronavirus. According to a recent Washington Post story, “AT&T said it has not had talks with any government agencies about sharing this data for purposes of combating coronavirus. Verizon did not respond to requests for comment.”
The goal of this post is to start a collaborative discussion with experts in big data, ML, and medicine. Hopefully, there are efforts already underway based on a similar or better idea. Please send your comments via Twitter @vlele.