
Tutorial: Setting Up highway‑env with OpenAI Gym and Training a DQN for Autonomous Driving

This article explains how to install the gym and highway‑env packages, configure the environment for various driving scenarios, define observations, actions and rewards, build a DQN network in PyTorch, run the training loop, and analyze the resulting performance metrics.


This guide shows how to install the gym library and the highway‑env package (a set of six driving scenarios) using pip, and provides a link to the official documentation.
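
In practice that boils down to two pip commands (package names as published on PyPI; a Python 3 environment is assumed):

pip install gym
pip install highway-env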

After installation, an example environment is created with gym.make('highway-v0') and rendered to verify the setup.
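
Concretely, a minimal sketch using the classic gym API that the rest of this article assumes (importing highway_env is what registers the environment with gym):

import gym
import highway_env  # importing registers the highway-v0 environment with gym

env = gym.make('highway-v0')
env.reset()
env.render()  # opens a window showing the simulated highway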

The environment offers three observation types: Kinematics (a V×F matrix of vehicle features), Grayscale Image (a W×H gray‑scale picture), and Occupancy Grid (a W×H×F 3‑D tensor describing surrounding traffic).
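
The observation type is chosen through the environment's config dictionary. The sketch below targets the Kinematics observation used later in this article; the seven feature names are an assumption picked to match the 5 × 7 input shape of the DQN, not a quote from the original setup:

env.configure({
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,  # V: the ego vehicle plus four neighbours
        "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],  # F = 7
    }
})
env.reset()  # the new config takes effect on reset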

The action space can be continuous or discrete; the discrete set includes LANE_LEFT, IDLE, LANE_RIGHT, FASTER, and SLOWER:

ACTIONS_ALL = {0: 'LANE_LEFT', 1: 'IDLE', 2: 'LANE_RIGHT', 3: 'FASTER', 4: 'SLOWER'}
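
The action type is likewise selected via the config; the type names below follow highway-env's conventions (DiscreteMetaAction yields the five meta-actions above, ContinuousAction exposes throttle and steering):

env.configure({"action": {"type": "DiscreteMetaAction"}})

# or, for low-level continuous control:
env.configure({"action": {"type": "ContinuousAction"}})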

The default reward function (identical for all scenarios except parking) is defined inside the package; customizing it requires editing the package source or subclassing the environment.
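
For instance, the reward can be customized without patching the installed files by subclassing the environment and overriding its reward hook. This is a minimal sketch assuming the _reward(self, action) method found in the package source; the extra collision penalty is purely illustrative:

from highway_env.envs import HighwayEnv

class MyHighwayEnv(HighwayEnv):
    def _reward(self, action):
        reward = super()._reward(action)  # start from the default reward
        if self.vehicle.crashed:          # add an extra collision penalty
            reward -= 1.0
        return reward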

A simple DQN network is built with PyTorch. The network takes a flattened Kinematics observation (5 vehicles × 7 features = 35 inputs) and outputs Q‑values for the five discrete actions.

import torch
import torch.nn as nn

class DQNNet(nn.Module):
    def __init__(self):
        super(DQNNet, self).__init__()
        self.linear1 = nn.Linear(35, 35)
        self.linear2 = nn.Linear(35, 5)

    def forward(self, s):
        # accept raw numpy observations as well as tensors
        s = torch.as_tensor(s, dtype=torch.float32)
        # flatten the (batch, 5, 7) kinematics matrix to (batch, 35)
        s = s.view(s.size(0), -1)
        s = torch.relu(self.linear1(s))  # nonlinearity between the two layers
        return self.linear2(s)           # Q-values for the 5 discrete actions
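
A quick shape check (the dummy observation matches the 5 × 7 Kinematics layout assumed above):

import numpy as np

net = DQNNet()
dummy = np.zeros((1, 5, 7), dtype=np.float32)  # a batch of one observation
print(net(dummy).shape)  # torch.Size([1, 5]) -- one Q-value per action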

The training loop follows the typical DQN pipeline: epsilon‑greedy action selection, experience replay, periodic target‑network updates, and loss optimization. Statistics such as average reward, episode duration, and collision rate are plotted every 40 training iterations. A minimal sketch of the agent object the loop relies on follows the listing.

import time

import numpy as np

# assumes env = gym.make('highway-v0') and a dqn agent (sketched below) exist
count = 0      # number of learning updates performed so far
reward = []    # per-step rewards, used for the average-reward plot

while True:
    done = False
    start_time = time.time()
    s = env.reset()
    while not done:
        e = np.exp(-count / 300)             # epsilon decays as training progresses
        a = dqn.choose_action(s, e)          # epsilon-greedy action selection
        s_, r, done, info = env.step(a)      # classic gym 4-tuple step API
        env.render()
        dqn.push_memory(s, a, r, s_)         # store the transition for replay
        if dqn.position != 0 and dqn.position % 99 == 0:
            loss_ = dqn.learn()              # sample a batch and update the network
            count += 1
            if count % 40 == 0:
                # plot metrics
                ...
        s = s_
        reward.append(r)
    # record episode time and collision
    ...
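
The loop relies on a dqn object exposing choose_action, push_memory, learn, and a position write index. The original article does not show that class, so the following is a hypothetical minimal implementation of the standard pieces (ring-buffer replay memory, epsilon-greedy policy, MSE TD loss, periodic target-network sync); the buffer size, batch size, discount, and sync period are illustrative defaults:

import random
from collections import namedtuple

import numpy as np
import torch
import torch.nn as nn

Transition = namedtuple('Transition', ('s', 'a', 'r', 's_'))

class DQN:
    def __init__(self, capacity=2000, batch_size=64, gamma=0.99, lr=1e-3):
        self.eval_net = DQNNet()
        self.target_net = DQNNet()
        self.target_net.load_state_dict(self.eval_net.state_dict())
        self.memory = []
        self.capacity = capacity
        self.position = 0                  # next write index into the ring buffer
        self.batch_size = batch_size
        self.gamma = gamma
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=lr)
        self.loss_fn = nn.MSELoss()
        self.learn_steps = 0

    def choose_action(self, s, epsilon):
        # epsilon-greedy over the five discrete actions
        if random.random() < epsilon:
            return random.randrange(5)
        with torch.no_grad():
            q = self.eval_net(np.expand_dims(s, 0))  # add a batch dimension
        return int(q.argmax())

    def push_memory(self, s, a, r, s_):
        # ring-buffer replay memory
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(s, a, r, s_)
        self.position = (self.position + 1) % self.capacity

    def learn(self):
        sample = random.sample(self.memory, self.batch_size)
        batch = Transition(*zip(*sample))
        a = torch.LongTensor(batch.a).view(-1, 1)
        r = torch.FloatTensor(batch.r).view(-1, 1)
        q_eval = self.eval_net(np.array(batch.s)).gather(1, a)
        with torch.no_grad():
            q_next = self.target_net(np.array(batch.s_)).max(1, keepdim=True)[0]
        loss = self.loss_fn(q_eval, r + self.gamma * q_next)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        self.learn_steps += 1
        if self.learn_steps % 100 == 0:    # periodic target-network update
            self.target_net.load_state_dict(self.eval_net.state_dict())
        return loss.item()

Note that terminal transitions are not masked out when bootstrapping, which mirrors the four-element (s, a, r, s_) memory used in the loop; a production implementation would also store done flags.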

Resulting plots show that the collision rate decreases as training progresses, while episode length tends to increase (episodes end early when a crash occurs).

In conclusion, compared with full‑scale simulators like CARLA, highway‑env provides a highly abstracted, game‑like environment that is convenient for rapid reinforcement‑learning experiments, though it offers limited control over low‑level vehicle dynamics.

Tags: simulation, reinforcement learning, DQN, autonomous driving, gym, highway-env
Written by Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
