By Ayoosh Kathuria and Shaoni Mukherjee

If you’re getting into reinforcement learning, one of the best tools you can use is OpenAI Gym. It gives you a collection of environments where your AI agents can learn by trying things out and learning from the results. These environments range from simple tasks like balancing a pole on a cart to more fun ones like playing Atari games such as Breakout, Pacman, and Seaquest.
But sometimes, the built-in environments just don’t fit what you’re trying to do. Maybe you have a unique idea or a specific task in mind that’s not already included. The good news is that OpenAI Gym makes it easy to create your own custom environment—and that’s exactly what we’ll be doing in this post.
We will build a simple environment where an agent controls a chopper (or helicopter) and has to fly it while dodging obstacles in the air. This is the second part of our OpenAI Gym series, so we’ll assume you’ve gone through Part 1. If not, you can check it out on our blog.
Prerequisites
- Python: A machine with Python installed and beginner experience with Python coding is recommended for this tutorial.
- OpenAI Gym: This package must be installed on the machine or droplet being used.
Dependencies/Imports
We begin by installing some important dependencies.
!pip install opencv-python
!pip install pillow
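If OpenAI Gym itself isn't installed yet, it can be added the same way. We also install jdc, which provides the %%add_to cell magic used later in this post to build our environment class incrementally (this assumes you are working in a Jupyter notebook):

!pip install gym
!pip install jdc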
Next, we add the necessary imports.
import numpy as np
import cv2
import matplotlib.pyplot as plt
import PIL.Image as Image
import gym
import random
import jdc   # enables the %%add_to cell magic used below
from gym import Env, spaces
import time
font = cv2.FONT_HERSHEY_COMPLEX_SMALL
Description of the Environment
The environment we’re building is kind of like a game, and it’s inspired by the classic Dino Run game—the one you see in Google Chrome when your internet goes down. In that game, there’s a little dinosaur that keeps running forward, and your job is to make it jump over cacti and avoid birds flying at it. The longer you survive and the more distance you cover, the higher your score—or in reinforcement learning terms, the more reward you get.
In our game, instead of a dinosaur, our agent will be a Chopper pilot.
- The Chopper has to cover as much distance as possible to get the maximum reward. There will be birds that the chopper has to avoid.
- The episode terminates in case of a bird strike. The episode can also terminate if the Chopper runs out of fuel.
- Like the birds, there are floating fuel tanks (yes, no points for being close to reality, I know!) which the Chopper can collect to refuel to its full capacity (fixed at 1000 L).
Note that this is going to be just a proof of concept and not the most aesthetically pleasing game. However, this post will give you enough knowledge to improve on it!
The first consideration when designing an environment is to decide what sort of observation and action space we will use.
- The observation space can be either continuous or discrete. An example of a discrete observation space is a grid world, where the space is defined by cells and the agent can be inside exactly one of those cells. An example of a continuous observation space is one where the agent’s position is described by real-valued coordinates.
- The action space can be either continuous or discrete. A discrete action space consists of specific behaviors that the agent can perform, but these behaviors cannot be quantified. For example, in a game like Mario Bros, the actions include moving left or right and jumping. While you can perform these actions, you cannot quantify them further — you can jump, but not jump higher or lower. In contrast, a game like Angry Birds features a continuous action space where you can decide how far to stretch the slingshot, allowing you to quantify your actions.
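As a quick, minimal sketch of the difference using gym’s spaces module (the shapes here are just placeholders):

from gym import spaces
import numpy as np

# Discrete: 5 distinct behaviours, identified only by an integer id
discrete_actions = spaces.Discrete(5)

# Continuous: a box of real values, e.g. an image of pixel intensities
continuous_obs = spaces.Box(low=0.0, high=1.0, shape=(600, 800, 3), dtype=np.float16)

print(discrete_actions.sample())       # e.g. 3
print(continuous_obs.sample().shape)   # (600, 800, 3)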
ChopperScape Class
We begin by implementing the __init__ function of our environment class, ChopperScape. In the __init__ function, we will define the observation and the action spaces. In addition to that, we will also implement a few other attributes:
- canvas: This represents our observation image.
- x_min, y_min, x_max, y_max: These define the legitimate area of our screen where various elements, such as the Chopper and birds, can be placed. Other areas are reserved for displaying information, such as fuel left, rewards, and padding.
- elements: This stores the active elements on the screen at any given time (e.g., chopper, birds).
- max_fuel: Maximum fuel that the chopper can hold.
class ChopperScape(Env):
    def __init__(self):
        super(ChopperScape, self).__init__()

        # Define a 2-D observation space
        self.observation_shape = (600, 800, 3)
        self.observation_space = spaces.Box(low = np.zeros(self.observation_shape),
                                            high = np.ones(self.observation_shape),
                                            dtype = np.float16)

        # Define an action space ranging from 0 to 4
        self.action_space = spaces.Discrete(5,)

        # Create a canvas to render the environment images upon
        self.canvas = np.ones(self.observation_shape) * 1

        # Define elements present inside the environment
        self.elements = []

        # Maximum fuel chopper can take at once
        self.max_fuel = 1000

        # Permissible area for the helicopter to be in
        self.y_min = int (self.observation_shape[0] * 0.1)
        self.x_min = 0
        self.y_max = int (self.observation_shape[0] * 0.9)
        self.x_max = self.observation_shape[1]
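As a quick sanity check (an illustrative snippet, not part of the class itself), we can instantiate the environment and inspect its spaces:

env = ChopperScape()
print(env.observation_space.shape)   # (600, 800, 3)
print(env.action_space.n)            # 5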
Elements of the Environment
Once we have determined the action space and the observation space, we need to finalize the elements of our environment. In our game, we have three distinct elements: the Chopper, Flying Birds, and Floating Fuel Stations. We will implement all of these as separate classes that inherit from a common base class called Point.
Point Base Class
The Point class defines any arbitrary point on our observation image. We define this class with the following attributes:
- (x, y): Position of the point on the image.
- (x_min, x_max, y_min, y_max): Permissible coordinates for the point. If we try to set the position of the point outside these limits, the position values are clamped to these limits.
- name: Name of the point.

We define the following member functions for this class:
- get_position: Get the coordinates of the point.
- set_position: Set the coordinates of the point to a certain value.
- move: Move the point by a certain offset.
class Point(object):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        self.x = 0
        self.y = 0
        self.x_min = x_min
        self.x_max = x_max
        self.y_min = y_min
        self.y_max = y_max
        self.name = name

    def set_position(self, x, y):
        # Clamp so that the icon stays fully inside the permissible area
        # (icon_w and icon_h are defined by the subclasses below)
        self.x = self.clamp(x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(y, self.y_min, self.y_max - self.icon_h)

    def get_position(self):
        return (self.x, self.y)

    def move(self, del_x, del_y):
        self.x += del_x
        self.y += del_y

        self.x = self.clamp(self.x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(self.y, self.y_min, self.y_max - self.icon_h)

    def clamp(self, n, minn, maxn):
        return max(min(maxn, n), minn)
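To see the clamping in action, here is a tiny throwaway example. Note that Dot is a hypothetical subclass invented just for this demo, since Point itself defines no icon dimensions:

class Dot(Point):
    def __init__(self, *args):
        super(Dot, self).__init__(*args)
        self.icon_w = 8
        self.icon_h = 8

p = Dot("dot", 800, 0, 600, 0)
p.set_position(10000, -50)   # well outside the permissible area
print(p.get_position())      # (792, 0), clamped to the limits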
Now we define the classes Chopper, Bird, and Fuel. These classes are derived from the Point class and introduce a set of new attributes:
- icon: Icon of the point that is displayed on the observation image when the game is rendered.
- (icon_w, icon_h): Dimensions of the icon.
class Chopper(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super(Chopper, self).__init__(name, x_max, x_min, y_max, y_min)
        self.icon = cv2.imread("chopper.png") / 255.0
        self.icon_w = 64
        self.icon_h = 64
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))

class Bird(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super(Bird, self).__init__(name, x_max, x_min, y_max, y_min)
        self.icon = cv2.imread("bird.png") / 255.0
        self.icon_w = 32
        self.icon_h = 32
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))

class Fuel(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super(Fuel, self).__init__(name, x_max, x_min, y_max, y_min)
        self.icon = cv2.imread("fuel.png") / 255.0
        self.icon_w = 32
        self.icon_h = 32
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))
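These classes assume that chopper.png, bird.png, and fuel.png exist in the working directory; if a file is missing, cv2.imread returns None and the division fails with a cryptic error. A small defensive helper (purely a convenience sketch, not part of the original design) could fall back to a solid placeholder:

def load_icon(path, w, h):
    # Load an icon image scaled to [0, 1]; fall back to a dark block if missing
    icon = cv2.imread(path)
    if icon is None:
        return np.zeros((h, w, 3))
    return cv2.resize(icon / 255.0, (w, h))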
Back to the ChopperScape Class
Recall from Part 1 that any gym Env class has two important functions:
- reset: Resets the environment to its initial state and returns the initial observation.
- step: Executes a step in the environment by applying an action. Returns the new observation, reward, completion status, and other info.
In this section, we will implement our environment’s reset and step functions, along with many other helper functions. We begin with the reset function.
Reset Function
When we reset our environment, we need to reset all the state-based variables. These include fuel consumed, episodic return, and the elements inside the environment.
In our case, when we reset our environment, we have nothing but the Chopper in its initial state. We initialize our chopper randomly in an area in the top left of our image. This area is 5-10 percent of the image width and 15-20 percent of the image height.
We also define a helper function called draw_elements_on_canvas that places the icon of each of the game’s elements at its position in the observation image. If a position is beyond the permissible range, the icon is placed on the range boundary instead. We also print important information, such as the remaining fuel, on the canvas.
Finally, we return the canvas on which the elements have been placed as the observation.
%%add_to ChopperScape

def draw_elements_on_canvas(self):
    # Init the canvas
    self.canvas = np.ones(self.observation_shape) * 1

    # Draw the elements on the canvas
    for elem in self.elements:
        elem_shape = elem.icon.shape
        x, y = elem.x, elem.y
        # Canvas rows are the y-axis, columns are the x-axis
        self.canvas[y : y + elem_shape[0], x : x + elem_shape[1]] = elem.icon

    text = 'Fuel Left: {} | Rewards: {}'.format(self.fuel_left, self.ep_return)

    # Put the info on canvas
    self.canvas = cv2.putText(self.canvas, text, (10,20), font,
               0.8, (0,0,0), 1, cv2.LINE_AA)

def reset(self):
    # Reset the fuel consumed
    self.fuel_left = self.max_fuel

    # Reset the reward
    self.ep_return = 0

    # Number of birds and fuel tanks spawned so far
    self.bird_count = 0
    self.fuel_count = 0

    # Determine a place to initialise the chopper in:
    # x is 5-10 percent of the image width, y is 15-20 percent of the image height
    x = random.randrange(int(self.observation_shape[1] * 0.05), int(self.observation_shape[1] * 0.10))
    y = random.randrange(int(self.observation_shape[0] * 0.15), int(self.observation_shape[0] * 0.20))

    # Initialise the chopper
    self.chopper = Chopper("chopper", self.x_max, self.x_min, self.y_max, self.y_min)
    self.chopper.set_position(x, y)

    # Initialise the elements
    self.elements = [self.chopper]

    # Reset the canvas
    self.canvas = np.ones(self.observation_shape) * 1

    # Draw elements on the canvas
    self.draw_elements_on_canvas()

    # Return the observation
    return self.canvas
Before we proceed further, let us now see what our initial observation looks like.
env = ChopperScape()
obs = env.reset()
plt.imshow(obs)
Since our observation is the same as the gameplay screen, our render function shall return the observation, too. We build functionality for two modes: human, which renders the game in a pop-up window, and rgb_array, which returns the observation as a pixel array.
%%add_to ChopperScape

def render(self, mode = "human"):
    assert mode in ["human", "rgb_array"], "Invalid mode, must be either \"human\" or \"rgb_array\""
    if mode == "human":
        cv2.imshow("Game", self.canvas)
        cv2.waitKey(10)

    elif mode == "rgb_array":
        return self.canvas

def close(self):
    cv2.destroyAllWindows()
env = ChopperScape()
obs = env.reset()
screen = env.render(mode = "rgb_array")
plt.imshow(screen)
Step Function
Now that we have the reset function out of the way, we begin work on implementing the step function, which will contain the code to transition our environment from one state to the next, given an action. In many ways, this section is the proverbial meat of our environment, and this is where most of the planning goes.
We first need to enlist things that need to happen in one transition step of the environment. This can be broken down into two parts:
- Applying actions to our agent.
- Everything else that happens in the environment, such as the behavior of the non-RL actors (e.g., birds and floating fuel stations).
So, let’s first focus on (1). We provide actions to the game that will control what our chopper does. We basically have five actions: move right, left, down, up, or do nothing, denoted by 0, 1, 2, 3, and 4, respectively.
We define a member function called get_action_meanings() that tells us, for reference, what each integer action is mapped to.
%%add_to ChopperScape

def get_action_meanings(self):
    return {0: "Right", 1: "Left", 2: "Down", 3: "Up", 4: "Do Nothing"}
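Calling it on an instance (a trivial illustration):

env = ChopperScape()
print(env.get_action_meanings())
# {0: 'Right', 1: 'Left', 2: 'Down', 3: 'Up', 4: 'Do Nothing'}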
We also check that the action being passed is valid, i.e., that it is present in the action space. If not, we raise an assertion error.
# Assert that it is a valid action
assert self.action_space.contains(action), "Invalid Action"
Once that is done, we change the position of the chopper accordingly using the move function we defined earlier. Each action results in movement by five coordinates in the respective direction.
# Apply the action to the chopper. Note that move takes (del_x, del_y),
# so the arguments here match the meanings in get_action_meanings.
if action == 0:
    self.chopper.move(5, 0)     # Right
elif action == 1:
    self.chopper.move(-5, 0)    # Left
elif action == 2:
    self.chopper.move(0, 5)     # Down
elif action == 3:
    self.chopper.move(0, -5)    # Up
elif action == 4:
    self.chopper.move(0, 0)     # Do Nothing
Now that we have taken care of applying the action to the chopper, we focus on the other elements of the environment:
- Birds spawn randomly at the right edge of the screen with a probability of 1% per frame (i.e., a bird is likely to appear at the right edge about once every hundred frames). A bird moves five coordinates to the left every frame. If a bird hits the Chopper, the game ends. Otherwise, birds disappear from the game once they reach the left edge.
- Fuel tanks spawn randomly at the screen’s bottom edge with a probability of 1% per frame (i.e., a fuel tank is likely to appear at the bottom edge about once every hundred frames). A fuel tank moves five coordinates up every frame. If a tank hits the Chopper, the Chopper is fuelled to its full capacity. Otherwise, tanks disappear from the game once they reach the top edge.
To implement the features outlined above, we need a helper function that determines whether two Point objects (such as the Chopper and a Bird, or the Chopper and a Fuel Tank) have collided. How do we define a collision? We say that two points have collided when, on each axis, the distance between their coordinates is less than half the sum of their icon dimensions. We call this function has_collided.
%%add_to ChopperScape

def has_collided(self, elem1, elem2):
    x_col = False
    y_col = False

    elem1_x, elem1_y = elem1.get_position()
    elem2_x, elem2_y = elem2.get_position()

    if 2 * abs(elem1_x - elem2_x) <= (elem1.icon_w + elem2.icon_w):
        x_col = True

    if 2 * abs(elem1_y - elem2_y) <= (elem1.icon_h + elem2.icon_h):
        y_col = True

    if x_col and y_col:
        return True

    return False
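As a quick illustration (throwaway objects, assuming the icon images are present so the Bird class can load them):

env = ChopperScape()
a = Bird("a", env.x_max, env.x_min, env.y_max, env.y_min)
b = Bird("b", env.x_max, env.x_min, env.y_max, env.y_min)
a.set_position(100, 100)
b.set_position(120, 110)        # 32x32 icons, overlapping on both axes
print(env.has_collided(a, b))   # True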
Apart from this, we have to do some bookkeeping. The reward for each step is 1, so the episodic return counter is incremented by one at every step. If there is a bird strike, the reward is -10 and the episode terminates. The fuel counter is decremented by one at every step.
Finally, we implement our step function. We have written extensive comments below to guide you through the process.
%%add_to ChopperScape

def step(self, action):
    # Flag that marks the termination of an episode
    done = False

    # Assert that it is a valid action
    assert self.action_space.contains(action), "Invalid Action"

    # Decrease the fuel counter
    self.fuel_left -= 1

    # Reward for executing a step.
    reward = 1

    # Apply the action to the chopper
    if action == 0:
        self.chopper.move(5, 0)     # Right
    elif action == 1:
        self.chopper.move(-5, 0)    # Left
    elif action == 2:
        self.chopper.move(0, 5)     # Down
    elif action == 3:
        self.chopper.move(0, -5)    # Up
    elif action == 4:
        self.chopper.move(0, 0)     # Do Nothing

    # Spawn a bird at the right edge with prob 0.01
    if random.random() < 0.01:

        # Spawn a bird
        spawned_bird = Bird("bird_{}".format(self.bird_count), self.x_max, self.x_min, self.y_max, self.y_min)
        self.bird_count += 1

        # Compute the x,y co-ordinates of the position from where the bird has to be spawned
        # Horizontally, the position is on the right edge and vertically, the height is randomly
        # sampled from the set of permissible values
        bird_x = self.x_max
        bird_y = random.randrange(self.y_min, self.y_max)
        spawned_bird.set_position(bird_x, bird_y)

        # Append the spawned bird to the elements currently present in the Env.
        self.elements.append(spawned_bird)

    # Spawn a fuel tank at the bottom edge with prob 0.01
    if random.random() < 0.01:

        # Spawn a fuel tank
        spawned_fuel = Fuel("fuel_{}".format(self.fuel_count), self.x_max, self.x_min, self.y_max, self.y_min)
        self.fuel_count += 1

        # Compute the x,y co-ordinates of the position from where the fuel tank has to be spawned
        # Horizontally, the position is randomly chosen from the list of permissible values and
        # vertically, the position is on the bottom edge
        fuel_x = random.randrange(self.x_min, self.x_max)
        fuel_y = self.y_max
        spawned_fuel.set_position(fuel_x, fuel_y)

        # Append the spawned fuel tank to the elements currently present in the Env.
        self.elements.append(spawned_fuel)

    # For elements in the Env (iterate over a copy, since we may remove items)
    for elem in list(self.elements):
        if isinstance(elem, Bird):
            # If the bird has reached the left edge, remove it from the Env
            if elem.get_position()[0] <= self.x_min:
                self.elements.remove(elem)
            else:
                # Move the bird left by 5 pts.
                elem.move(-5, 0)

                # If the bird has collided.
                if self.has_collided(self.chopper, elem):
                    # Conclude the episode and remove the chopper from the Env.
                    done = True
                    reward = -10
                    self.elements.remove(self.chopper)

        if isinstance(elem, Fuel):
            # If the fuel tank has reached the top, remove it from the Env
            if elem.get_position()[1] <= self.y_min:
                self.elements.remove(elem)
            else:
                # Move the tank up by 5 pts.
                elem.move(0, -5)

                # If the fuel tank has collided with the chopper.
                if self.has_collided(self.chopper, elem):
                    # Remove the fuel tank from the Env.
                    self.elements.remove(elem)

                    # Fill the fuel tank of the chopper to full.
                    self.fuel_left = self.max_fuel

    # Increment the episodic return
    self.ep_return += 1

    # Draw elements on the canvas
    self.draw_elements_on_canvas()

    # If out of fuel, end the episode.
    if self.fuel_left == 0:
        done = True

    return self.canvas, reward, done, {}
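Here is a minimal single-step sketch (again assuming the icon images are available in the working directory):

env = ChopperScape()
obs = env.reset()
obs, reward, done, info = env.step(4)   # "Do Nothing"
print(reward, done)                     # 1 False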
Seeing It in Action
This concludes the code for our environment. Now execute some steps in the environment using an agent that takes random actions!
from IPython import display

env = ChopperScape()
obs = env.reset()

while True:
    # Take a random action
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

    # Render the game
    env.render()

    if done:
        break

env.close()
Conclusion
That’s a wrap for this part, folks! I hope this tutorial gave you a clear idea of what goes into building a custom environment with OpenAI Gym—from the design decisions to the little details that make your game-like setup fun and challenging. Now that you’ve got the basics down, feel free to get creative and build your own environment from scratch—or improve the one we just made. Here are a few ideas to level it up:
- Instead of ending the game on the first bird hit, you could add a life system for the chopper.
- Create an evil alien race of mutated birds that can fire missiles, and make the chopper dodge them.
- Add some logic for what happens when a fuel tank and a bird collide—maybe a big explosion?

And if you want to run your training faster or scale your experiments, consider spinning up a GPU Droplet on DigitalOcean. It’s a great way to offload the heavy lifting and train your agents more efficiently. Until next time—happy coding!
About the author(s)

With a strong background in data science and over six years of experience, I am passionate about creating in-depth content on technologies. Currently focused on AI, machine learning, and GPU computing, working on topics ranging from deep learning frameworks to optimizing GPU-based workloads.