Getting Started#


  1. Setting up the Python environment

    • Using pip3

    • Using conda

    • Installing Jupyter Notebook

  2. Getting the data

Setting up the Python environment#

Python has been chosen as the main language of this book for several reasons. It doesn’t have too much jargon, and the signal-to-noise ratio in code is very high (it’s almost pseudocode). While Python is not the optimal language for scientific computing, it is familiar to most people with basic knowledge of programming, and its learning curve is not as steep as that of C++, Java, or Julia.

One of the drawbacks associated with using Python is that it doesn’t handle large graphs or datasets very well. C++ and Java both deal with larger structures more efficiently, and many authors of seminal papers in the field of GIS provide C++ and Java implementations alongside their papers.

Julia, on the other hand, is a good midpoint between Python and C++/Java. It maintains the readability of Python but was developed specifically for scientific computing applications.

For the sake of accessibility, the examples and exercises in this book will be based entirely in Python. Implementations in other languages are welcome as pull requests. Simply open a PR against this repo and we’ll include any submissions that look promising.

From this point forward, the book will assume that you already have Python 3.6 or newer installed on your system. For installation instructions specific to your operating system, see this Beginner’s Guide.

There are two primary ways to install the packages you’ll need: pip or conda. We recommend pip, as conda can have adverse effects on system dependencies when used improperly on Linux.


If you intend to develop on Windows, we also recommend the Windows Subsystem for Linux (WSL), which lets you use the full capabilities of Linux from within Windows. You can read more about setting up WSL here. After enabling WSL, you can proceed with the rest of this book as if you were operating in Linux.

Using pip3#

On Linux (Debian/Ubuntu), execute the following commands in the terminal:

$ sudo apt update
$ sudo apt install python3-pip

Install venv and create a Python virtual environment:

$ sudo apt install python3.8-venv
$ mkdir <new directory for venv>
$ python3 -m venv <path to venv directory>

Make sure that you replace python3.8 with the version of Python you are using.

You can now access your virtual environment using the following command:

$ source <path to venv>/bin/activate

Using the macOS terminal:

$ python3 -m ensurepip --upgrade

venv has been included with Python since version 3.3, so no separate installation is needed. You can run the following commands to create a virtual environment:

$ mkdir <new directory>
$ python3 -m venv <path to venv directory>

You can now access your virtual environment using the following command:

$ source <path to venv>/bin/activate

On Windows, pip should come preinstalled with Python 3.4 and newer.

Make sure to check the “Add Python to PATH” option when using the Python installer. If you don’t, you’ll need to add python, pip, and various other programs to the system path manually.

After installing Python, you may need to restart your computer in order for the changes to take effect.

To create a virtual environment, execute the following in either Command Prompt or Windows PowerShell:

C:> py -m venv <new directory>

To activate the virtual environment:

C:> cd <your venv directory>
C:> .\Scripts\activate


There are several things to note when using Python on Windows.

  1. Using the py or python command may open the Windows Store app in newer releases of Windows. This can be disabled by going to “Apps and Features”, selecting “Application Execution Aliases”, and disabling the sliders for any application related to Python (e.g. python3.8.exe, py.exe).

  2. pip3 is sometimes not added to the system PATH by default. You may choose to add it manually, or simply use pip, as it should link to the same program.

  3. You can use Python by calling the py command. If you prefer using python or python3, you may need to add these to the system PATH manually.
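
Whichever platform you use, it can be handy to confirm from inside Python that the virtual environment is actually active. The following is a minimal sketch using only the standard library; the helper name in_virtualenv is hypothetical and not part of this book's utilities.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```

Note that this check targets environments created with venv; a conda environment is not detected this way.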

Using conda#

Install conda for your OS using the guide found here.

You can create a conda environment with this command:

$ conda create --name <name of env> python=<your version of python>

To access the newly-created environment:

$ conda activate <your env name>
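
To double-check which conda environment is active from within Python, one option is to read the CONDA_DEFAULT_ENV variable that conda activate exports. This is a small sketch; the function name active_conda_env is hypothetical, and the environ parameter exists only to make the function easy to test.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active conda environment, or None if there is none."""
    # `conda activate` exports CONDA_DEFAULT_ENV with the environment's name.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("Active conda environment:", active_conda_env())
```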

Installing Jupyter Notebook#

All of the code in this book is stored in Jupyter Notebooks (.ipynb files). To access these files directly, you have two options:

  1. Open Binder using the icon on the top right of a page containing Python code.

  2. Install Jupyter Notebook locally.

Jupyter Notebook can be installed with either pip or conda.

Using pip:

$ pip3 install jupyterlab
$ pip3 install notebook

Using conda:

$ conda install -c conda-forge jupyterlab
$ conda install -c conda-forge notebook
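
After installing, you may want to verify that the packages are visible to the interpreter you intend to use. The sketch below relies only on the standard library; missing_packages is a hypothetical helper name, and it expects top-level module names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # "jupyterlab" and "notebook" are the module names of the two packages installed above.
    missing = missing_packages(["jupyterlab", "notebook"])
    print("Missing:", missing if missing else "nothing, you are ready to go")
```

If a package is reported as missing even though the install succeeded, the install most likely went into a different environment than the one currently active.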

Getting the data#

Most of the data used in this book will be sourced from OpenStreetMap (OSM). For datasets that are not OSM-related, you can download the data in whatever format is available and import it into Python using the appropriate methods. Open Data websites hosted by various government bodies are a great source of data related to infrastructure, population metrics, and land use. See the Datasets section in this book for more details.

For OpenStreetMap, there are two primary ways of retrieving the data:

  1. Download the data as-is from OpenStreetMap’s website and use tools like osmfilter to tune it as needed. This is not recommended, as it is more difficult and not very efficient.

  2. Use the Overpass API to query for data. This filters the data on the fly, so you only retrieve what you need. The API is accessible from both Java and Python.

Data Completeness

The data from OSM is not always “complete”. This doesn’t mean that there are major uncharted regions, but rather that neighbouring nodes are not always grouped correctly. Some nodes that are feasibly connected in real life have no way or relation connecting them in OSM. As a result, the osmnx parser places these nodes in separate graph components, which does not reflect real-world conditions. We can use osrm to find routes between these kinds of nodes and thus “complete” the graphs.

You can read more about the completeness of OpenStreetMaps data here:

  1. Completeness

  2. Completeness Metrics
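
To make the idea of “separate graph components” concrete, here is a small, self-contained sketch that counts the connected components of an undirected graph with a breadth-first search. The node IDs and edges are made up for illustration, and connected_components is a hypothetical helper rather than part of osmnx.

```python
from collections import deque


def connected_components(nodes, edges):
    """Group node ids into the connected components of an undirected graph."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbour in adjacency[current] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return components


# Toy data (made up): no way links nodes 4 and 5 to nodes 1-3.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (4, 5)]
components = connected_components(nodes, edges)
print(len(components))  # 2
```

A graph with more than one component is a hint that some real-world connections may be missing from the data.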

The data model of OpenStreetMap is surprisingly simple and consists of only three element types:

  1. A Node represents a specific point on the earth’s surface, defined by its latitude and longitude.

  2. A Way is an ordered list of between 2 and 2,000 nodes that define a polyline. Ways are used to represent linear features such as rivers and roads.

  3. A Relation is a multi-purpose data structure that documents a relationship between two or more data elements (nodes, ways, and/or other relations). Examples include:

    • A route relation, which lists the ways that form a major (numbered) highway, a cycle route, or a bus route.

    • A turn restriction that describes if a turn can be made from one way onto another.

    • A multipolygon that describes an area (whose boundary is the ‘outer way’) with holes (the ‘inner ways’).

All of the above can be found easily on the linked elements page, but there are two things you should be aware of:

  1. IDs are unique within each element type, but not across element types (you can find a Way with the same ID as a Node).

  2. Ways are defined by listing the IDs of the Nodes that constitute them, and Relations by listing the IDs of their member elements.
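
The three element types and the two caveats above can be sketched with plain dataclasses. This is only an illustration of the data model, not how osmnx or the Overpass API represent elements internally; all class names, field names, IDs, and coordinates below are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered references to Node ids
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    members: List[Tuple[str, int, str]]  # (element type, element id, role)
    tags: Dict[str, str] = field(default_factory=dict)


# A Node and a Way may legitimately share the id 1.
elements = {}
for kind, element in [
    ("node", Node(1, 43.6600, -79.3900)),
    ("node", Node(2, 43.6610, -79.3910)),
    ("way", Way(1, [1, 2], {"highway": "residential"})),
]:
    # Keying by (type, id) avoids collisions across element types.
    elements[(kind, element.id)] = element

# Resolve a Way into a polyline by looking up the Nodes it refers to.
way = elements[("way", 1)]
polyline = [(elements[("node", i)].lat, elements[("node", i)].lon) for i in way.node_ids]
print(polyline)
```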

Example: Querying points of interest (POIs) using Overpass API#

Overpass API is OSM’s querying API. It is incredibly powerful in that it can very quickly return queried features, and allows for selection of location, tags, proximity, and more.

Let’s query for restaurants near the University of Toronto.

import overpass

api = overpass.API()

# We're looking for restaurants within 1000m of a given point
overpass_query = """
(node["amenity"="restaurant"](around:1000,43.66, -79.39);
 way["amenity"="restaurant"](around:1000,43.66, -79.39);
 rel["amenity"="restaurant"](around:1000,43.66, -79.39);
);
out center;
"""

# Send the query to the Overpass server and collect the results
restaurants = api.get(overpass_query)

The example above uses the overpass package (installable with pip3 install overpass), which by default returns results in GeoJSON format. See the overpass documentation for more information.
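
If you find yourself writing the same kind of query for different tags or locations, a small helper can assemble the Overpass QL string for you. The sketch below only builds the text of the query and does not contact any server; around_query is a hypothetical helper name, not part of the overpass package.

```python
def around_query(key: str, value: str, radius_m: int, lat: float, lon: float) -> str:
    """Build an Overpass QL union of nodes, ways and relations near a point."""
    tag_filter = f'["{key}"="{value}"]'
    location = f"(around:{radius_m},{lat},{lon})"
    statements = "".join(
        f"{element}{tag_filter}{location};" for element in ("node", "way", "rel")
    )
    # Wrap the three statements in a union and ask for centre points.
    return f"({statements});out center;"


query = around_query("amenity", "restaurant", 1000, 43.66, -79.39)
print(query)
```

The resulting string follows the same structure as the restaurant query above, so it should be usable in its place.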

Next, let’s extract some data about each restaurant, and then plot all of them on a map. This time, we’ll use a plotly Scattermapbox trace, which renders an interactive tile map using Mapbox. You can refer to plotly’s documentation here. Each POI on that map has a tooltip that shows the restaurant’s name when hovered.

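import plotly.graph_objects as obj

# Extract the lon, lat and name of each restaurant:
coords = []
text = []
for elem in restaurants['features']:
    latlon = elem['geometry']['coordinates']
    if latlon == []:
        continue
    coords.append(latlon)
    # Fall back to a placeholder (an assumed label) when a restaurant has no name tag
    if 'name' not in elem['properties']:
        text.append('NONAME')
    else:
        text.append(elem['properties']['name'])

# Convert into a dictionary for plotly (GeoJSON coordinates are ordered lon, lat)
restaurant_dict = dict(type='scattermapbox',
                   lat=[x[1] for x in coords],
                   lon=[x[0] for x in coords],
                   mode='markers',
                   text=text,
                   marker=dict(size=8, color='blue'),
                   hoverinfo='text')

# plotting restaurants' locations around University of Toronto

center = (43.662643, -79.395689)  # UofT main building

fig = obj.Figure(obj.Scattermapbox(restaurant_dict))

# defining plot layout
# Assumption: the 'open-street-map' style is chosen here because it needs no Mapbox access token
fig.update_layout(mapbox_style='open-street-map')
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0}, mapbox = {'center': {'lat': center[0], 'lon': center[1]}, 'zoom': 13})
fig.show()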

Compile your dataset#

You can compile your own dataset using Overpass QL, the query language that runs on Overpass turbo. You can use it to mine OpenStreetMap data, filter it, and get it ready to be used by osmnx or any library that parses .osm files. Below is a quick review of the Overpass API, a read-only API for querying data from OpenStreetMap servers that many online routing services and tools rely on. Additionally, we will usually use Nominatim for geocoding and reverse geocoding, that is, translating addresses to and from latitude/longitude coordinates.
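
As a rough illustration of what a geocoding request looks like, the sketch below builds a search URL for Nominatim without sending it. The host is the public Nominatim instance, which is an assumption here (the book may use a client library or a different server), and geocode_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public Nominatim endpoint (an assumption; self-hosted instances use another host).
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def geocode_url(address: str, limit: int = 1) -> str:
    """Build a Nominatim search URL that asks for JSON results for a free-form address."""
    params = {"q": address, "format": "json", "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


url = geocode_url("University of Toronto")
print(url)
```

When actually sending such requests to the public server, keep the request rate low and identify your application, as its usage policy asks.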

Also be aware that when you build a dataset over a very large area, the graph that osmnx parses from the data is often not fully connected, even though physically feasible routes exist that would connect all the nodes. This is usually caused by incomplete relations and data in OSM; don’t worry about that for now.

For these cases, we will use OSRM to fill the gaps when needed, with the help of some utilities in utilities/src/. This can be seen in some of the case studies.
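
To give a feel for what asking OSRM for a route involves, here is a sketch that builds a request URL for its HTTP route service without sending it. It is not the code in utilities/src/; the host below is the public OSRM demo server (an assumption), and osrm_route_url is a hypothetical helper name.

```python
# Public OSRM demo server (an assumption; the book's utilities may use another host).
OSRM_ROUTE = "http://router.project-osrm.org/route/v1/driving"


def osrm_route_url(origin, destination):
    """Build an OSRM route request between two (lat, lon) points."""
    # OSRM expects coordinates as "lon,lat" pairs separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in (origin, destination))
    return f"{OSRM_ROUTE}/{coords}?overview=false"


url = osrm_route_url((43.66, -79.39), (43.65, -79.38))
print(url)
```

Note the coordinate order: OSRM takes longitude first, whereas the points in this chapter are written as latitude, longitude.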

Using Overpass QL#

Fire up Overpass turbo, run the following scripts, and export the results as .osm files.

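  • All hospitals around UofT

// Assumed reconstruction: the original filter lines were lost in extraction.
// The centre point and radius reuse the restaurant example and are placeholders.
nwr["amenity"="hospital"](around:1000,43.66,-79.39);
out center;

  • All Tim Hortons in Canada

node["name"="Tim Hortons"]({{bbox}});
out center;
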
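  • All fast food or restaurant places in London*

// Assumed reconstruction: the element types and tag filters were lost in extraction.
area["name"="London"]->.a;          // Redirect result to ".a"
out body qt;
(
  node["amenity"~"^(fast_food|restaurant)$"]
    (area.a);                   // Use result from ".a"
  way["amenity"~"^(fast_food|restaurant)$"]
    (area.a);                   // Use again result from ".a"
);
out body qt;
>;
out skel qt;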

Finding the bounding box around an area of interest is a recurring problem when writing Overpass QL queries. To solve it, we can use bbox finder. Don’t forget to change the coordinate format to latitude/longitude in the right corner after drawing the polygon around the area of interest.
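
If you already have a handful of coordinates for your area of interest, you can also compute the bounding box yourself. Overpass QL expects the order (south, west, north, east). The sketch below assumes the area does not cross the antimeridian; overpass_bbox is a hypothetical helper name and the points are made up.

```python
def overpass_bbox(points):
    """Return (south, west, north, east) for a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return min(lats), min(lons), max(lats), max(lons)


# Made-up points roughly around the University of Toronto.
points = [(43.655, -79.401), (43.667, -79.388), (43.660, -79.395)]
bbox = overpass_bbox(points)

# Format the tuple as a bounding-box clause that can replace {{bbox}} in a query.
clause = "({},{},{},{})".format(*bbox)
print(clause)  # (43.655,-79.401,43.667,-79.388)
```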

Using Overpass turbo’s Wizard#

Overpass turbo’s Wizard provides an easy way to auto-generate Overpass QL queries. Wizard syntax is similar to that of a search engine. An example of Wizard syntax is amenity=hospital that generates an Overpass QL query to find all the hospitals in a certain region of interest. Hospital locations will be visualized on the map and can be downloaded/copied using the “Export” button. The data can be exported as GeoJSON, GPX, KML, raw OSM data, or raw data directly from Overpass API. You can then use osmnx to read .osm files with osmnx.graph_from_xml.
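
Before handing an exported .osm file to osmnx, it can be useful to take a quick look at what it contains. The sketch below counts the elements in a tiny hand-written OSM XML document using only the standard library; the sample data is made up, and summarize_osm is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written .osm document (made-up ids and coordinates).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="43.6600" lon="-79.3900"/>
  <node id="2" lat="43.6610" lon="-79.3910">
    <tag k="amenity" v="hospital"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
  </way>
</osm>
"""


def summarize_osm(xml_text: str) -> dict:
    """Count the nodes, ways and relations in raw OSM XML."""
    root = ET.fromstring(xml_text)
    return {kind: len(root.findall(kind)) for kind in ("node", "way", "relation")}


summary = summarize_osm(SAMPLE_OSM)
print(summary)  # {'node': 2, 'way': 1, 'relation': 0}
```

For a real export, you could read the file's contents and pass them to the same function.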

Some examples of Overpass turbo’s wizard syntax include:

  • amenity=restaurant in "Toronto, Canada" to find all restaurants in the City of Toronto.

  • amenity=cafe and name="Tim Hortons" to find all Tim Hortons coffee shops.

  • (amenity=hospital or amenity=school) and (type:way) to find hospitals and schools that are mapped as ways.

  • amenity=hospital and name~"General Hospital" to find all hospitals whose names contain “General Hospital”.


Contributing: Do you have something cool and want to share it with us? If you collected your data by downloading it directly from OpenStreetMap and did some filtering with osmfilter, open a pull request with the details of the data and how you filtered it. If you collected your data with Overpass turbo, please attach your Overpass QL script with the data so we can replicate your results, and maybe we can learn a thing or two from your script.