Cleaning, Interpolating, and Forecasting Yellow River Sediment Data with Python
This article walks through cleaning missing hydrological data, applying various interpolation techniques, visualizing water level, flow, and sediment concentration, decomposing the time series into trend, seasonal, and residual components, and using Prophet to forecast future river sediment levels.
Problem Restatement
The Yellow River is the mother river of the Chinese nation, and studying its water‑sediment flux is crucial for environmental management, climate impact assessment, water resource allocation, flood control, and related research.
Question 1: Investigate the relationship between sediment concentration, time, water level, and flow at a hydrological station, and estimate the station's annual total water and sediment discharge over the past six years.
Question 2: Analyze the abruptness, seasonality, and periodicity of the water‑sediment flux over the same period.
Implementation Process
Data cleaning is performed first, filling missing values using methods such as mean filling or interpolation. Below are common interpolation methods:
Linear Interpolation – estimates intermediate values based on a straight line between two known points.
Polynomial Interpolation – fits a polynomial of degree n‑1 to n data points.
Spline Interpolation – uses low‑order polynomials between points, typically cubic splines, to ensure smoothness.
Lagrange Interpolation – constructs a polynomial that passes exactly through all known data points.
For this project:
Forward‑fill is used for date components (year, month, day).
Linear interpolation is applied to sediment concentration data.
<code>import pandas as pd
# List of sheets to read
sheets = ['2016', '2017', '2018', '2019', '2020', '2021']
# Read each sheet and append to a list
dataframes = [pd.read_excel('data/data_E_2023.xlsx', sheet_name=sheet) for sheet in sheets]
# Concatenate all dataframes
merged_data = pd.concat(dataframes, ignore_index=True)
merged_data.head()
</code>After merging, the date columns are forward‑filled and the sediment concentration column is interpolated:
<code># Forward fill the '年', '月', and '日' columns
merged_data[['年', '月', '日']] = merged_data[['年', '月', '日']].fillna(method='ffill')
# Interpolate the '含沙量(kg/m3)' column
merged_data['含沙量(kg/m3) '] = merged_data['含沙量(kg/m3) '].interpolate()
merged_data.head()
</code> <code># Convert date columns to integers
merged_data[['年', '月', '日']] = merged_data[['年', '月', '日']].astype(int)
# Change '24:00' to '0:00' and increment the day
mask = merged_data['时间'] == '24:00'
merged_data.loc[mask, '时间'] = '0:00'
merged_data.loc[mask, '日'] = merged_data.loc[mask, '日'] + 1
# Create a datetime column
merged_data['日期时间'] = pd.to_datetime(
merged_data['年'].astype(str) + '-' + merged_data['月'].astype(str) + '-' + merged_data['日'].astype(str) + ' ' + merged_data['时间'],
errors='coerce'
)
# Adjust datetime for the shifted rows
merged_data.loc[mask, '日期时间'] = (merged_data.loc[mask, '日期时间'] - pd.Timedelta(hours=1)) + pd.Timedelta(days=1)
</code>Time‑series plots are then generated for water level, flow, and sediment concentration:
<code># Create subplots for individual plots
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(15, 15))
# Plot water level
axes[0].plot(merged_data['日期时间'], merged_data['水位(m)'], color='b', label='水位(m)')
axes[0].set_xlabel('日期时间')
axes[0].set_ylabel('水位(m)')
axes[0].legend()
axes[0].set_title('随日期时间变化的水位')
# Plot flow
axes[1].plot(merged_data['日期时间'], merged_data['流量(m3/s)'], color='g', label='流量(m3/s)')
axes[1].set_xlabel('日期时间')
axes[1].set_ylabel('流量(m3/s)')
axes[1].legend()
axes[1].set_title('随日期时间变化的流量')
# Plot sediment concentration
axes[2].plot(merged_data['日期时间'], merged_data['含沙量(kg/m3) '], color='r', label='含沙量(kg/m3)')
axes[2].set_xlabel('日期时间')
axes[2].set_ylabel('含沙量(kg/m3)')
axes[2].legend()
axes[2].set_title('随日期时间变化的含沙量')
plt.tight_layout()
plt.show()
</code>Time Series Decomposition
The series is decomposed into trend, seasonal, cyclic, and residual components using an additive model:
<code>from statsmodels.tsa.seasonal import seasonal_decompose
# Extract the time series for sediment concentration
sand_series = merged_data.set_index('日期时间')['含沙量(kg/m3) ']
# Resample to daily frequency and interpolate missing values
sand_series_daily = sand_series.resample('D').mean().interpolate()
# Decompose the series
result = seasonal_decompose(sand_series_daily, model='additive')
# Plot components
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(15, 10))
result.observed.plot(ax=ax1, title='Observed')
result.trend.plot(ax=ax2, title='Trend')
result.seasonal.plot(ax=ax3, title='Seasonal')
result.resid.plot(ax=ax4, title='Residual')
plt.tight_layout()
plt.show()
</code>Forecasting with Prophet
The Facebook Prophet library is used to forecast future water level values, handling strong seasonality and missing data.
<code>from prophet import Prophet
# Prepare data for Prophet
water_level_data = merged_data[['日期时间', '水位(m)']].rename(columns={'日期时间': 'ds', '水位(m)': 'y'})
water_level_data.dropna(inplace=True)
# Initialize and fit the model
water_level_model = Prophet(yearly_seasonality=True, daily_seasonality=True)
water_level_model.fit(water_level_data)
# Create a future dataframe for 2 years
future_water_level = water_level_model.make_future_dataframe(periods=365, freq='D')
# Predict
forecast_water_level = water_level_model.predict(future_water_level)
# Plot the forecast
fig_water_level = water_level_model.plot(forecast_water_level)
plt.title('水位(m) 的预测')
plt.xlabel('日期时间')
plt.ylabel('水位(m)')
plt.show()
</code>Prophet also provides component decomposition results, shown in the accompanying figure.
Other details are omitted for brevity.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.