Predicting COVID-19 Cases Using LSTM Based on SARS Data: Methodology and Evaluation
This article investigates whether a short‑term time‑series algorithm, specifically an LSTM model trained on limited SARS data, can predict and assess COVID‑19 case numbers, describing data collection, model training, experimental validation, error analysis, and practical implications of the findings.
0x00 Declaration & Purpose The author states that the work is a purely academic exploration with no practical value. The data has few dimensions and was collected manually from WHO, national health commissions, and news outlets.
Purpose To explore whether an algorithm can predict and evaluate epidemic trends using existing data, enabling quick assessment of the current situation.
0x01 Hypotheses Two hypotheses are proposed: (1) if a virus shares similar regional, climatic, and biological characteristics with SARS, an algorithm that predicts SARS cases can be applied to COVID‑19; (2) a threshold Δ on the ratio between actual and predicted cases can classify the epidemic status as worsening, stable, or controlled.
0x02 Data Preparation SARS case data (2003‑03‑17 to 2003‑05‑30) from WHO and COVID‑19 case data (2020‑01‑15 to 2020‑01‑26) from Tencent News and the National Health Commission were collected manually. Only confirmed case counts were used as the target variable.
0x03 Algorithm Evaluation An LSTM (Long Short‑Term Memory) network built with Keras was employed, training on three‑day windows to predict the next day’s confirmed cases. Experiments were run on a 16‑core, 8 GB Alibaba Cloud instance.
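The three-day windowing described above can be sketched as follows. This is a minimal illustration, not the author's code: the helper names `make_windows` and `build_model`, the layer width of 32, and the Adam/MSE training setup are all assumptions, since the article only says that a Keras LSTM was trained on three-day windows.

```python
def make_windows(series, window=3):
    """Split a 1-D series into (inputs, targets).

    Each input holds `window` consecutive values; each target is the
    value that immediately follows that window.
    """
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(list(series[i:i + window]))
        targets.append(series[i + window])
    return inputs, targets


def build_model(window=3, units=32):
    """Build a small LSTM regressor (hypothetical sizes, not from the article)."""
    # Imported lazily so make_windows works even without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(window, 1)),  # `window` time steps, one feature
        keras.layers.LSTM(units),
        keras.layers.Dense(1),           # next day's (scaled) case count
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```

Before fitting, the window lists would be converted to an array of shape (samples, 3, 1), and the counts would usually be scaled to a small range, since LSTMs tend to train poorly on raw cumulative totals.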
0x04 Hypothesis Validation Using WHO cumulative SARS case numbers, the model was trained on the first 50 days and used to forecast the next 60 days. Predictions closely matched the actual data, with mean absolute errors of 5.7% (mainland China) and 5.3% (Hong Kong), supporting the claim that an LSTM can fit SARS case curves.
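The article does not say exactly how the error percentages were computed. One plausible reading is a mean absolute percentage error over the forecast days, sketched below. The function name and the formula itself are assumptions.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one data point is required")
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100 * total / len(actual)
```

For example, actual counts of 100 and 200 against predictions of 90 and 220 are each off by 10%, so the function returns roughly 10.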
0x05 Further Validation The model was tested on three epidemic phases (early, middle, late) using only mainland China data. Prediction errors grew in the early phase, which points to a rapid outbreak, while middle‑phase predictions exceeded the actual counts, suggesting that the spread had slowed.
0x06 Determining Δ By examining error rates near inflection points, Δ was estimated as the average of 0.21 (explosive phase) and 0.14 (control phase), yielding Δ ≈ 0.175. This threshold is used to judge epidemic status based on the deviation between predicted and actual cases.
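As a rough sketch of how such a threshold might be applied: the article gives the two error rates and their average, but not the exact decision rule. The rule below assumes a signed relative deviation of actual from predicted cases, with "worsening" when actual exceeds the prediction by more than Δ and "controlled" when it falls short by more than Δ. The name `classify_status` is hypothetical.

```python
# Threshold from the article: average of the two inflection-point error rates.
DELTA = (0.21 + 0.14) / 2  # approximately 0.175


def classify_status(actual, predicted, delta=DELTA):
    """Classify epidemic status from the signed relative deviation.

    Assumed rule (not spelled out in the article):
      deviation >  delta  -> "worsening"
      deviation < -delta  -> "controlled"
      otherwise           -> "stable"
    """
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    deviation = (actual - predicted) / predicted
    if deviation > delta:
        return "worsening"
    if deviation < -delta:
        return "controlled"
    return "stable"
```

Under this rule, a day with 130 actual cases against 100 predicted (deviation 0.30) would be flagged as worsening, while 110 against 100 would count as stable.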
0x07 COVID‑19 Prediction Models trained on data up to Jan 20 and up to Jan 25 were used to forecast the following days. The forecast for Jan 23 suggested a potential surge if actual cases exceeded 3,337, while the Jan 24 forecasts indicated continued spread without worsening.
Conclusion The study demonstrates that an LSTM model can reasonably predict epidemic curves using limited historical data, but acknowledges that real‑world deployment would require many additional variables (medical staff, testing accuracy, incubation period, etc.) and richer datasets.
Architecture Digest