Visualization#

Using matplotlib#

Up to this point we have been plotting our maps using the matplotlib package, which is built into geopandas. While matplotlib is both easy to use and already known to many python users (as it is popular for graphing in general), it has the drawback that it is not interactive.

Using folium#

folium is a python package built to allow the use of leaflet, an open-source JavaScript library. leaflet is an interactive map development toolkit that powers a lot of web-based interactive maps. Alternatives to leaflet include MapBox, Google Maps, as well as many other paid services.

Also built upon leaflet is ipyleaflet, which is a package designed specifically for rendering leaflet maps in Jupyter Notebook. While leaflet is highly interactive and easily builds beautiful maps with little configuration, it depends on JavaScript in order to function.

Let’s replot the above map in folium. Note that folium expects geospatial data to be in EPSG:4326.

import geopandas as gpd
import folium

# file downloaded from https://data.ontario.ca/dataset/ontario-s-health-region-geographic-data
ontario = gpd.read_file(r"../../data/ontario_health_regions/Ontario_Health_Regions.shp")
ontario = ontario[(ontario.REGION != "North")]
ontario = ontario.to_crs(epsg=4326)

# Set starting location, initial zoom, and base layer source.
m = folium.Map(location=[43.67621,-79.40530],zoom_start=6, tiles='cartodbpositron')

for index, row in ontario.iterrows():
    # Simplify each region's polygon as intricate details are unnecessary
    sim_geo = gpd.GeoSeries(row['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j, name=row['REGION'],style_function=lambda x: {'fillColor': 'black'})
    folium.Popup(row['REGION']).add_to(geo_j)
    geo_j.add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Note

You can click anywhere inside the marked regions to show a popup indicating the region’s name. This folium map uses OpenStreetMaps’s CartoDB styling for a base layer tileset. You can read more about using custom tilesets here.

Elevation Data#

We can also retrieve interesting geospatial information in the form of elevation data.

Let’s first get the centroids for each region:

ontario['centroid'] = ontario.centroid
ontario
C:\Users\Alaa\AppData\Local\Temp\ipykernel_12892\4260188429.py:1: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  ontario['centroid'] = ontario.centroid
Shape_Leng Shape_Area REGION REGION_ID geometry centroid
0 4.845977e+06 1.089122e+11 East 04 MULTIPOLYGON (((-77.54191 43.89913, -77.54234 ... POINT (-77.18411 44.89013)
2 4.860262e+06 7.543033e+10 West 01 MULTIPOLYGON (((-82.68871 41.68453, -82.68631 ... POINT (-81.07741 43.36065)
3 2.755817e+06 3.189867e+10 Central 02 MULTIPOLYGON (((-79.70667 43.39271, -79.70805 ... POINT (-79.65677 44.45330)
4 2.196396e+05 3.715020e+08 Toronto 03 MULTIPOLYGON (((-79.40671 43.74414, -79.40648 ... POINT (-79.40531 43.67619)

Warning

Notice that calculating the centroid raises a warning. That’s because we are using EPSG:4326, which uses degrees as a unit of measure. This makes polygon calculations inaccurate, especially at larger scales. We will ignore this warning for this example, but keep in mind that centroids will not be accurate in this projection. For better results, you can calculate the centroids of a projection that uses a flat projection (that retains area) and then reproject it back to EPSG:4326.

For web use, when the desired effect is a visual centroid, it is possible to continue using a Mercator projection like EPSG:4326, while applications that require a “true” centroid should use a projection like Equal Area Cylindrical, which avoids distortion at the poles. See here for more details.

Now, let’s query the Open Elevation API for the elevation (in metres) at the centroids for each region.

from requests import get

def get_elevation(centroid):
    query = (f'https://api.open-elevation.com/api/v1/lookup?locations={centroid.y},{centroid.x}')

    # Set a timeout on the request in case of a slow response
    r = get(query,timeout=30)

    # Only use the response if the status is successful
    if r.status_code!=200 and r.status_code!=201: return None

    elevation = r.json()['results'][0]['elevation']
    return elevation

elevations = []

for index, row in ontario.iterrows():
    elevations.append(get_elevation(row['centroid']))
    
ontario['elevations'] = elevations
ontario
Shape_Leng Shape_Area REGION REGION_ID geometry centroid elevations
0 4.845977e+06 1.089122e+11 East 04 MULTIPOLYGON (((-77.54191 43.89913, -77.54234 ... POINT (-77.18411 44.89013) 274.0
2 4.860262e+06 7.543033e+10 West 01 MULTIPOLYGON (((-82.68871 41.68453, -82.68631 ... POINT (-81.07741 43.36065) 353.0
3 2.755817e+06 3.189867e+10 Central 02 MULTIPOLYGON (((-79.70667 43.39271, -79.70805 ... POINT (-79.65677 44.45330) 261.0
4 2.196396e+05 3.715020e+08 Toronto 03 MULTIPOLYGON (((-79.40671 43.74414, -79.40648 ... POINT (-79.40531 43.67619) 125.0

Here they are plotted on the map, using Markers to show the centroid location. You can click on the marker to show the elevation at that point.

for index, row in ontario.iterrows():
    folium.Marker(location=[row['centroid'].y,row['centroid'].x], popup='Elevation: {}'.format(row['elevations'])).add_to(m)
m
Make this Notebook Trusted to load map: File -> Trust Notebook

Custom Polygons#

We can also plot some custom polygons on our map.

For this particular example, let’s plot the three campuses of the University of Toronto on a map of Toronto. The campus polygons were generated manually using QGIS, but they can also be done in any GIS editor like ArcGIS or AutoCAD Map 3D.

import matplotlib.pyplot as plt

# This file was created using the QGIS Editor
uoft = gpd.read_file(r"../../data/QGIS/UofT.shp")

fig, ax = plt.subplots(figsize=(15,15))

toronto = ontario[(ontario.REGION == "Toronto")]
t = toronto.plot(ax=ax, color='none',edgecolor='black')
t.set_title("University of Toronto campuses")
uoft.plot(ax=ax, color='blue', edgecolor='black')
<AxesSubplot:title={'center':'University of Toronto campuses'}>
../../_images/32357b7707531702352127fab4793b67c8e998edee094978a962ef7318b5b70b.png

This is nice, but we can see that two of the polygons (two campuses of the University of Toronto) are floating in the middle of nowhere. That’s because our Ontario dataset only encompasses most of downtown Toronto, but ignores the other boroughs. Let’s use a Toronto municipal map instead. In this example, we’ll actually read the input as a .geojson file. It works the same way as .shp files. Let’s use the Toronto municipalities as a base layer, and plot the University of Toronto campuses on top of it.

fig, ax = plt.subplots(figsize=(15,15))
toronto = gpd.read_file(r"../../data/ontario_health_regions/TorontoBoundary.geojson")
t = toronto.plot(ax=ax, cmap='Pastel1', edgecolor='black', column='AREA_NAME', figsize=(15,15), legend=True)
t.set_title("University of Toronto campuses")
uoft.plot(ax=ax, color='blue', edgecolor='white')
<AxesSubplot:title={'center':'University of Toronto campuses'}>
../../_images/d071988acd10283a7f97953e263b84585f2523920004e35a63b24c30b6e2fb80.png

Almost there! The Mississauga campus (bottom left) is still in limbo. That’s because Mississauga is part of Peel Region, which is outside of Toronto. Let’s redo the map, including the Mississauga boundaries! This time, we’ll use a .kml file for the input. KML files are not supported by default by geopandas, so we’ll need to use a “driver” to read it. Fortunately, we can leverage fiona, a powerful python IO API, which is already built into geopandas.

gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'
missisauga = gpd.read_file(r"../../data/ontario_health_regions/MissBoundary.kml",driver='KML')
missisauga
Name Description geometry
0 MULTIPOLYGON (((-79.63657 43.73509, -79.63405 ...
new_row = {
    'AREA_NAME' : 'MISSISSAUGA',
    'geometry': missisauga['geometry'][0]
}
gta = toronto.append(new_row, ignore_index=True)
C:\Users\Alaa\AppData\Local\Temp\ipykernel_12892\1538361152.py:5: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  gta = toronto.append(new_row, ignore_index=True)
C:\Users\Alaa\.conda\envs\uoft\lib\site-packages\pandas\core\dtypes\cast.py:1983: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  result[:] = values

Now let’s try plotting the University of Toronto campuses again…

fig, ax = plt.subplots(figsize=(15,15))
g = gta.plot(ax=ax, cmap='Pastel1', edgecolor='black', column='AREA_NAME', figsize=(15,15), legend=True)
g.set_title("University of Toronto campuses")
uoft.plot(ax=ax, color='blue', edgecolor='white')
<AxesSubplot:title={'center':'University of Toronto campuses'}>
../../_images/73bd0f62f043c24765805d18cb97d74d2cc1a36bad653fc9cf59b03bd4b8a8dc.png

Going further

As an exercise, we could try adding Markers and/or Popups to the campuses, to show their names when clicked. Alternatively we could also change the basemap layer to a road network map (like OpenStreetMap), and plot the shortest distance (by road) between the campuses…

Visualization Types#

There are various ways to visualize geospatial data on a map. Below are a few examples of some common visualization types.

For the example in this section, data will be sourced from Statistics Canada’s dataset for Motor Vehicle Collisions in 2018 [6].

import pandas
# Import polygons for Canadian provinces and concatenate to traffic statistics
canada = gpd.read_file(r"../../data/Visualization/canada.geojson")
traffic_accidents = pandas.read_csv(r"../../data/Visualization/Canada_traffic.csv").drop(columns=['Province'])
dataset = pandas.concat([canada,traffic_accidents],axis=1).reindex(canada.index)

Chloropleth Map#

A chloropleth map showcases data by representing each region with a different colour, often varying shading based on a target metric. For this example, we will use folium to show a chloropleth map of injuries as a result of motor vehicle collisions in 2018, by province.

m = folium.Map(location=[58.4052172,-109.6062729],zoom_start=3)

# Setup binning for legend
bins = list(dataset['Injuries per 100,000 population'].quantile([0,0.25,0.5,0.75,1]))

# Main map init
folium.Choropleth(
    geo_data=dataset,
    name='chloropleth',
    data=dataset,
    columns=['name','Injuries per 100,000 population'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.5,
    legend_name='Injuries per 100,000 population',
    bins=bins,
    reset=True
).add_to(m)
folium.LayerControl().add_to(m)

# Tooltip styling
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}

# Tooltip
NIL = folium.features.GeoJson(
    dataset,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['name','Fatalities per 100,000 population','Injuries per 100,000 population','Fatalities per Billion vehicles-kilometres','Injuries per Billion vehicles-kilometres','Fatalities per 100,000 licensed drivers','Injuries per 100,000 licensed drivers'],
        aliases=['Province','Fatalities per 100,000 population','Injuries per 100,000 population','Fatalities per Billion vehicles-kilometres','Injuries per Billion vehicles-kilometres','Fatalities per 100,000 licensed drivers','Injuries per 100,000 licensed drivers'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;") 
    )
)
m.add_child(NIL)
m.keep_in_front(NIL)
m
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 7>()
      4 bins = list(dataset['Injuries per 100,000 population'].quantile([0,0.25,0.5,0.75,1]))
      6 # Main map init
----> 7 folium.Choropleth(
      8     geo_data=dataset,
      9     name='chloropleth',
     10     data=dataset,
     11     columns=['name','Injuries per 100,000 population'],
     12     key_on='feature.properties.name',
     13     fill_color='BuPu',
     14     fill_opacity=0.7,
     15     line_opacity=0.5,
     16     legend_name='Injuries per 100,000 population',
     17     bins=bins,
     18     reset=True
     19 ).add_to(m)
     20 folium.LayerControl().add_to(m)
     22 # Tooltip styling

File ~\.conda\envs\uoft\lib\site-packages\folium\features.py:1289, in Choropleth.__init__(self, geo_data, data, columns, key_on, bins, fill_color, nan_fill_color, fill_opacity, nan_fill_opacity, line_color, line_weight, line_opacity, name, legend_name, overlay, control, show, topojson, smooth_factor, highlight, **kwargs)
   1283     self.geojson = TopoJson(
   1284         geo_data,
   1285         topojson,
   1286         style_function=style_function,
   1287         smooth_factor=smooth_factor)
   1288 else:
-> 1289     self.geojson = GeoJson(
   1290         geo_data,
   1291         style_function=style_function,
   1292         smooth_factor=smooth_factor,
   1293         highlight_function=highlight_function if highlight else None)
   1295 self.add_child(self.geojson)
   1296 if self.color_scale:

File ~\.conda\envs\uoft\lib\site-packages\folium\features.py:499, in GeoJson.__init__(self, data, style_function, highlight_function, name, overlay, control, show, smooth_factor, tooltip, embed, popup, zoom_on_click, marker)
    496         raise TypeError("Only Marker, Circle, and CircleMarker are supported as GeoJson marker types.")
    497 self.marker = marker
--> 499 self.data = self.process_data(data)
    501 if self.style or self.highlight:
    502     self.convert_to_feature_collection()

File ~\.conda\envs\uoft\lib\site-packages\folium\features.py:542, in GeoJson.process_data(self, data)
    540     if hasattr(data, 'to_crs'):
    541         data = data.to_crs('EPSG:4326')
--> 542     return json.loads(json.dumps(data.__geo_interface__))
    543 else:
    544     raise ValueError('Cannot render objects with any missing geometries'
    545                      ': {!r}'.format(data))

File ~\.conda\envs\uoft\lib\json\__init__.py:231, in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    226 # cached encoder
    227 if (not skipkeys and ensure_ascii and
    228     check_circular and allow_nan and
    229     cls is None and indent is None and separators is None and
    230     default is None and not sort_keys and not kw):
--> 231     return _default_encoder.encode(obj)
    232 if cls is None:
    233     cls = JSONEncoder

File ~\.conda\envs\uoft\lib\json\encoder.py:199, in JSONEncoder.encode(self, o)
    195         return encode_basestring(o)
    196 # This doesn't pass the iterator directly to ''.join() because the
    197 # exceptions aren't as detailed.  The list call should be roughly
    198 # equivalent to the PySequence_Fast that ''.join() would do.
--> 199 chunks = self.iterencode(o, _one_shot=True)
    200 if not isinstance(chunks, (list, tuple)):
    201     chunks = list(chunks)

File ~\.conda\envs\uoft\lib\json\encoder.py:257, in JSONEncoder.iterencode(self, o, _one_shot)
    252 else:
    253     _iterencode = _make_iterencode(
    254         markers, self.default, _encoder, self.indent, floatstr,
    255         self.key_separator, self.item_separator, self.sort_keys,
    256         self.skipkeys, _one_shot)
--> 257 return _iterencode(o, 0)

File ~\.conda\envs\uoft\lib\json\encoder.py:179, in JSONEncoder.default(self, o)
    160 def default(self, o):
    161     """Implement this method in a subclass such that it returns
    162     a serializable object for ``o``, or calls the base implementation
    163     (to raise a ``TypeError``).
   (...)
    177 
    178     """
--> 179     raise TypeError(f'Object of type {o.__class__.__name__} '
    180                     f'is not JSON serializable')

TypeError: Object of type Timestamp is not JSON serializable

Cartogram Map#

Cartogram maps distort the area or distance scale of a map to reflect a chosen variable. In this case, we can distort the area of each individual province to scale to its share of motor vehicle injuries in 2018.

import geoplot


ax = geoplot.cartogram(
    dataset,
    scale='Injuries per 100,000 population', 
    legend=True,
    hue='Injuries per 100,000 population',
    legend_var='hue',
    cmap='BuPu',
    limits=(0.15,2),
    projection=geoplot.crs.WebMercator(),
    figsize=(15,15))
geoplot.polyplot(dataset, facecolor='lightgray', edgecolor='white', ax=ax)
<GeoAxesSubplot:>
../../_images/1aa543144c3d3dbd294e643267dff0e861191ec6db96babb3d85bb5ec322d510.svg

In this cartogram map, we can see that Manitoba has a disproportionate share of the motor vehicle injuries.

Bubble Map#

Bubble maps show properties of the points being visualized through colour and size variations. In the example below, we vary Marker size by the Injuries per 100,000 population, while we colour provinces in blue and territories in red.

territories = ['Yukon Territory','Nunavut','Northwest Territories']

m = folium.Map(location=[58.4052172,-109.6062729],zoom_start=3)

for i,r in dataset.iterrows():
    folium.Circle(
        location=[r['geometry'].centroid.y,r['geometry'].centroid.x],
        radius = float(r['Injuries per 100,000 population'])*500,
        fill=True,
        popup=f"{r['name']}: {r['Injuries per 100,000 population']}",
        color = 'blue' if r['name'] not in territories else 'crimson'
        ).add_to(m)
m
Make this Notebook Trusted to load map: File -> Trust Notebook

Hexagonal Binning#

Hexagonal binning is a kind of aggregate visualization method where hexagons are drawn over the map, with each hexagon summarizing/abstracting the data that it contains.

The example below uses traffic accident data for Toronto, retrieved from the Toronto Police Service’s Public Safety Data Portal [7].

import plotly.figure_factory as ff

ksi = gpd.read_file(r"../../data/Visualization/KSI.geojson")
fig = ff.create_hexbin_mapbox(
    data_frame=ksi, 
    lat="LATITUDE", 
    lon="LONGITUDE", 
    nx_hexagon=15, 
    opacity=0.5,
    min_count=1,
    labels={'color':'Reported Accidents'},
    color_continuous_scale='turbo'
    )

fig.update_layout(mapbox_style="open-street-map")
fig.show()

Heat Map#

Heat maps use a colour gradient to represent increasing density in data. As the underlying data points increases in density, so does the intensity (or shade) of the heatmap colour. Below is an example using the same Toronto traffic data from before.

import plotly.express as px

fig = px.density_mapbox(
    data_frame=ksi,
    lat="LATITUDE", 
    lon="LONGITUDE",
    mapbox_style='open-street-map',
    radius=1,
    zoom=9,
)
fig.update_layout(coloraxis_showscale=False, hovermode=False)

Cluster Map#

Cluster maps aggregate data by merging neighouring points into one larger “cluster”. This generally improves performance when rendering larger datasets, and provides an easy-to-access indicator of density.

from folium.plugins import FastMarkerCluster
m = folium.Map(location=[43.6532,-79.3832], zoom_start=10)
FastMarkerCluster([[r.LATITUDE,r.LONGITUDE] for r in ksi.itertuples()]).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook