Building and Training a DQN Agent with highway‑env for Autonomous Driving Simulation
This article explains how to install gym and highway‑env, configure the environment, process state, action and reward data, build a DQN model in PyTorch, run training loops, and analyze results for autonomous driving simulations using reinforcement learning.
1. Install Environment
Install the gym library and the highway‑env package (which provides six driving scenarios) via pip.
<code>pip install gym
pip install --user git+https://github.com/eleurent/highway-env</code>
2. Configure Environment
After installation, create a gym environment for the "highway‑v0" scenario and optionally adjust configuration parameters such as observation type, vehicle count, and feature ranges.
<code>import gym
import highway_env
env = gym.make('highway-v0')
# optional custom configuration
config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {"x": [-100, 100], "y": [-100, 100], "vx": [-20, 20], "vy": [-20, 20]},
        "absolute": False,  # coordinates relative to the ego vehicle
        "order": "sorted"
    },
    "simulation_frequency": 8,  # simulation steps per second
    "policy_frequency": 2       # policy decisions per second
}
env.configure(config)</code>
3. Data Processing
State (Observation)
The environment can output observations in three formats: Kinematics (a V×F matrix), Grayscale Image, and Occupancy Grid. The example uses the Kinematics format, which returns a matrix of vehicle features (e.g., position, velocity, heading).
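As a concrete illustration of the data shape (using a dummy matrix in place of a real `env.reset()` return value, so the snippet runs without the environment installed), the 5×7 Kinematics observation is flattened into the 35-dimensional vector the DQN consumes:

```python
import numpy as np

# A Kinematics observation with vehicles_count=5 and 7 features is a 5x7
# matrix; this dummy observation stands in for env.reset()'s return value.
obs = np.zeros((5, 7), dtype=np.float32)
obs[0] = [1.0, 0.0, 0.0, 0.5, 0.0, 1.0, 0.0]  # first row describes the ego vehicle

# The DQN below consumes the observation as a flat 35-dimensional vector.
state = obs.flatten()
assert state.shape == (35,)
```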
Action
Actions are either continuous (throttle and steering) or discrete. The discrete set includes five meta‑actions: LANE_LEFT, IDLE, LANE_RIGHT, FASTER, and SLOWER.
<code>ACTIONS_ALL = {
0: 'LANE_LEFT',
1: 'IDLE',
2: 'LANE_RIGHT',
3: 'FASTER',
4: 'SLOWER'
}</code>Reward
All scenarios except the parking scenario share a common reward function defined inside the package; its weights can be adjusted through the environment configuration.
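For example, a configuration fragment like the following adjusts the reward weights. The key names shown here appear in the highway scenario's default configuration, but the exact set of keys depends on your highway-env version, so treat this as a sketch:

```python
# Hypothetical tweak of the built-in reward weights; verify the key names
# against the default_config of your highway-env version before using.
reward_config = {
    "collision_reward": -1,    # penalty applied when the ego vehicle crashes
    "right_lane_reward": 0.1,  # bonus for keeping to the rightmost lane
    "high_speed_reward": 0.4,  # bonus for driving near full speed
}
# env.configure(reward_config)  # apply to an existing environment
```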
4. Build the DQN Model
The DQN network is a simple feed‑forward neural network with an input size of 35 (5 vehicles × 7 features) and an output size of 5 (the discrete actions). The code uses PyTorch.
<code>import torch
import torch.nn as nn
class DQNNet(nn.Module):
def __init__(self):
super(DQNNet, self).__init__()
self.linear1 = nn.Linear(35, 35)
self.linear2 = nn.Linear(35, 5)
def forward(self, s):
s = torch.FloatTensor(s)
s = s.view(s.size(0), 1, 35)
s = self.linear1(s)
s = self.linear2(s)
return s
# DQN wrapper with replay memory, epsilon‑greedy policy, and learning routine omitted for brevity</code>5. Training Loop
The agent interacts with the environment, selects actions using an epsilon‑greedy strategy, stores transitions in a replay buffer, and updates the network periodically. Statistics such as average reward, episode time, and collision rate are plotted every 40 training steps.
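The loop calls `dqn.choose_action`, `dqn.push_memory`, `dqn.position`, and `dqn.learn`, whose implementation the article omits. As a minimal sketch of what such a wrapper might look like (the hyperparameters, target-network sync interval, and lack of terminal-state masking are my assumptions, mirroring the simplified loop):

```python
import random
from collections import namedtuple

import numpy as np
import torch
import torch.nn as nn

Transition = namedtuple('Transition', ('s', 'a', 'r', 's_'))

class DQN:
    """Sketch of a DQN wrapper matching the method names used in the loop."""

    def __init__(self, net_fn, capacity=2000, batch_size=32, gamma=0.99, lr=1e-3):
        self.eval_net = net_fn()            # network being trained
        self.target_net = net_fn()          # frozen copy used for targets
        self.target_net.load_state_dict(self.eval_net.state_dict())
        self.memory = []                    # replay buffer
        self.capacity = capacity
        self.position = 0                   # total transitions pushed so far
        self.batch_size = batch_size
        self.gamma = gamma
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=lr)
        self.learn_steps = 0

    def choose_action(self, s, epsilon):
        # epsilon-greedy: random action with probability epsilon
        if random.random() < epsilon:
            return random.randrange(5)
        with torch.no_grad():
            x = torch.as_tensor(np.asarray(s, dtype=np.float32)).reshape(1, -1)
            return int(self.eval_net(x).argmax().item())

    def push_memory(self, s, a, r, s_):
        t = Transition(np.asarray(s, dtype=np.float32), a, r,
                       np.asarray(s_, dtype=np.float32))
        if len(self.memory) < self.capacity:
            self.memory.append(t)
        else:
            self.memory[self.position % self.capacity] = t  # overwrite oldest
        self.position += 1

    def learn(self):
        batch = random.sample(self.memory, min(self.batch_size, len(self.memory)))
        s = torch.as_tensor(np.stack([t.s.reshape(-1) for t in batch]))
        a = torch.tensor([t.a for t in batch]).unsqueeze(1)
        r = torch.tensor([t.r for t in batch], dtype=torch.float32).unsqueeze(1)
        s_ = torch.as_tensor(np.stack([t.s_.reshape(-1) for t in batch]))

        q_eval = self.eval_net(s).gather(1, a)                        # Q(s, a)
        q_next = self.target_net(s_).detach().max(1)[0].unsqueeze(1)  # max Q(s', .)
        target = r + self.gamma * q_next  # Bellman target (terminal states not
                                          # masked, mirroring the simplified loop)
        loss = nn.functional.mse_loss(q_eval, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        self.learn_steps += 1
        if self.learn_steps % 10 == 0:    # periodically sync the target network
            self.target_net.load_state_dict(self.eval_net.state_dict())
        return loss.item()
```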
<code>import numpy as np

# `env` and `dqn` are assumed to be created as in the previous sections
count = 0    # total environment steps, used for epsilon decay
reward = []  # per-step rewards

while True:
    done = False
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)  # decay epsilon towards 0
        a = dqn.choose_action(s, e)
        s_, r, done, info = env.step(a)
        env.render()
        dqn.push_memory(s, a, r, s_)
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss = dqn.learn()
        count += 1
        if count % 40 == 0:
            # compute and plot statistics
            ...
        s = s_
        reward.append(r)
    # record episode time and collision flag
    ...
</code>
6. Results and Discussion
Training curves show that the average collision rate decreases as training progresses, while episode duration tends to increase (episodes end early when a collision occurs). The abstracted highway‑env environment simplifies algorithm development compared with full‑scale simulators like CARLA, but offers fewer knobs for low‑level control research.
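The smoothed curves described above can be produced with a simple sliding-window mean over per-episode logs. The values below are hypothetical stand-ins; in practice they come from the collision flags and episode times recorded in the training loop:

```python
import numpy as np

# Hypothetical per-episode logs: 1 if the episode ended in a collision,
# plus episode durations in steps; real values come from the training loop.
collisions = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
durations = np.array([12, 15, 30, 14, 28, 30, 16, 30, 29, 30])

def moving_average(x, w=4):
    # sliding-window mean used to smooth noisy training curves
    return np.convolve(x, np.ones(w) / w, mode='valid')

smoothed_collision_rate = moving_average(collisions)
smoothed_duration = moving_average(durations)

# As training progresses, collision rate falls and episode duration rises.
assert smoothed_collision_rate[-1] < smoothed_collision_rate[0]
assert smoothed_duration[-1] > smoothed_duration[0]
```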
7. Conclusion
highway‑env provides a lightweight, game‑style platform for reinforcement‑learning research in autonomous driving, allowing rapid prototyping of DQN agents without dealing with sensor models or real‑world data acquisition.