Reinforcement Learning with highway‑env: Installation, Configuration, and DQN Training in Python
This article demonstrates how to install and configure the highway‑env reinforcement‑learning environment, set up a DQN agent in Python, and train it on various traffic scenarios, providing code examples and performance visualizations.
1. Install the environment – The gym library and the highway-env package are installed via pip:
<code>pip install gym
pip install --user git+https://github.com/eleurent/highway-env</code>
The package provides six traffic scenarios: highway-v0, merge-v0, roundabout-v0, parking-v0, intersection-v0, and racetrack-v0. Documentation is available in the official highway-env docs.
2. Configure the environment – After installation, a simple script creates the environment and renders a few steps:
<code>import gym
import highway_env
# %matplotlib inline is only needed when running inside a Jupyter notebook
%matplotlib inline

env = gym.make('highway-v0')
env.reset()
for _ in range(3):
    action = env.action_type.actions_indexes['IDLE']
    obs, reward, done, info = env.step(action)
    env.render()
</code>The rendered scene shows the ego vehicle (green) and surrounding traffic.
3. Data processing – highway‑env supplies three observation types:
Kinematics – a V×F matrix where V is the number of observed vehicles (including the ego) and F is the number of features (e.g., presence, x, y, vx, vy, cos_h, sin_h). The configuration example sets vehicles_count to 5 and defines the feature range.
Grayscale Image – a W×H gray‑scale picture of the scene.
Occupancy Grid – a W×H×F tensor representing the occupancy of each cell.
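For intuition, each Kinematics feature is scaled into its configured features_range before the agent sees it. The following is a minimal sketch of that linear-mapping-with-clipping idea written for this article, not highway-env's actual code:

```python
def normalize(value, low, high):
    """Linearly map `value` from [low, high] to [-1, 1], clipping out-of-range inputs."""
    value = max(low, min(high, value))  # clip into the configured range
    return 2.0 * (value - low) / (high - low) - 1.0

features_range = {"x": [-100, 100], "vx": [-20, 20]}

# A vehicle 50 m ahead of the ego with a relative speed of 10 m/s:
x_norm = normalize(50, *features_range["x"])    # 0.5
vx_norm = normalize(10, *features_range["vx"])  # 0.5
```

Normalizing keeps all entries of the V×F observation matrix on a comparable scale, which stabilizes network training.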
Example configuration for the Kinematics observation:
<code>config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
        "features_range": {"x": [-100, 100], "y": [-100, 100], "vx": [-20, 20], "vy": [-20, 20]},
        "absolute": False,
        "order": "sorted"
    },
    "simulation_frequency": 8,
    "policy_frequency": 2
}
</code>4. Action space – The environment offers discrete meta‑actions:
<code>ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER'
}
</code>5. Reward function – All scenarios except parking share a common reward function defined inside the library; customizing it requires editing the library source (or subclassing the environment).
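The shared reward trades forward speed against collisions. A hedged sketch of that shape is below; the coefficients a, b and the speed range are illustrative placeholders, not the library's actual defaults:

```python
def highway_reward(speed, crashed, v_min=20.0, v_max=30.0, a=0.4, b=1.0):
    """Sketch of highway-env's speed-vs-safety trade-off.

    a, b, v_min and v_max are placeholder values; the real coefficients
    live in the library's per-scenario configuration.
    """
    speed_term = (speed - v_min) / (v_max - v_min)  # 0 at v_min, 1 at v_max
    speed_term = max(0.0, min(1.0, speed_term))     # clip outside the range
    return a * speed_term - b * (1.0 if crashed else 0.0)

print(highway_reward(30.0, False))  # full speed, no crash -> 0.4
print(highway_reward(25.0, True))   # mid-speed crash -> -0.8
```

The collision penalty dominates the speed bonus, so the agent is pushed toward driving fast only when it can do so safely.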
6. Build the DQN model – A simple fully‑connected network maps the flattened 5×7 Kinematics vector (size 35) to five discrete actions:
<code>import torch
import torch.nn as nn

class DQNNet(nn.Module):
    def __init__(self):
        super(DQNNet, self).__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)

    def forward(self, s):
        # Accept raw observations and flatten each 5x7 Kinematics matrix to a 35-vector
        s = torch.as_tensor(s, dtype=torch.float32)
        s = s.view(-1, 35)
        s = torch.relu(self.linear1(s))  # non-linearity so the two layers do not collapse into one linear map
        s = self.linear2(s)
        return s
</code>The surrounding DQN class implements experience replay, epsilon‑greedy action selection, and learning updates.
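The replay buffer and epsilon-greedy selection mentioned above can be sketched independently of the rest of the class. The names below mirror the article's push_memory/choose_action and the position counter used in the training loop, but the implementation is my own illustration, not the article's code:

```python
import random

class ReplayMemory:
    """Fixed-size ring buffer of (s, a, r, s_) transitions."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.position = 0  # next write index, as used in the training loop's modulo check

    def push(self, s, a, r, s_):
        if len(self.memory) < self.capacity:
            self.memory.append(None)          # grow until capacity is reached
        self.memory[self.position] = (s, a, r, s_)
        self.position = (self.position + 1) % self.capacity  # overwrite oldest entries

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

def choose_action(q_values, epsilon, n_actions=5):
    """Epsilon-greedy: a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda i: q_values[i])
```

With epsilon = 0 the selection is purely greedy, e.g. choose_action([0, 3, 1, 0, 0], 0.0) returns action 1 (IDLE in the table above).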
7. Training loop – The script repeatedly resets the environment, selects actions with a decaying epsilon, stores transitions, and calls learn() every 99 steps. Statistics such as average reward, episode time, and collision rate are recorded and plotted every 40 training iterations.
<code>import numpy as np

dqn = DQN()
count = 0  # global step counter driving the epsilon decay
while True:
    done = False
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)  # epsilon decays exponentially with the step count
        a = dqn.choose_action(s, e)
        s_, r, done, info = env.step(a)
        env.render()
        dqn.push_memory(s, a, r, s_)
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss = dqn.learn()
        count += 1
        s = s_
        # record reward, time, collision, etc.
</code>8. Results – After training, the average collision rate decreases, episode duration grows, and the average reward improves, indicating that the agent learns to drive more safely in the simulated highway.
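The per-episode statistics from step 7 are noisy, so they are easier to read as window averages before plotting. A minimal smoothing helper, written for this article rather than taken from highway-env, might look like:

```python
def moving_average(values, window=40):
    """Average each consecutive length-`window` slice, mirroring the
    'plot every 40 training iterations' convention from step 7."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

rewards = [0.1] * 40 + [0.5] * 40   # toy per-episode rewards
print(moving_average(rewards))      # two window means, approximately [0.1, 0.5]
```

The same helper can smooth episode durations and collision indicators to produce the curves described above.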
9. Conclusion – Compared with full‑scale simulators like CARLA, highway‑env offers a lightweight, game‑style abstraction that is convenient for algorithm prototyping, though it provides fewer knobs for low‑level control and sensor modeling.
Python Programming Learning Circle