CSC 578D / Data Mining / Fall 2018 / University of Victoria

Python Notebook for Final Project

The datasets for this project are the following:

  1. Crime statistics for the city of Chicago from 2017. Link to dataset.
    1. Download here
  2. The Police beats boundaries for the city of Chicago. Link to dataset.
    1. Download here
  3. List of crimes by special code, in this case named IUCR (Illinois Uniform Crime Reporting). Link to dataset.
    1. Download here

Goals of this analysis:

  1. Look for patterns in crime.
  2. Visualize crime in a concise and efficient manner.
  3. Analyze crime as a time series.
  4. Overlay the data on a map to reveal spatial patterns.
  5. Build a Ball tree of selected crime types, segmented by time into 1-hour intervals. Ball trees can be queried very quickly to determine whether a person is in danger of becoming the victim of a crime, based on their location and the time of day. Instead of the Euclidean distance I will use the Haversine distance, which lets me query the tree directly with a person's GPS coordinates. The following crimes will be included in the tree:
    1. Homicide: Code 1XX
    2. Sexual Assault: Code 2XX
    3. Robbery: Code 3XX
    4. Battery: Code 4XX
    5. Assault: Code 5XX
    6. Theft: Code 8XX
    7. Motor Vehicle Theft: Code 9XX
    8. Deceptive Practice: Code 11XX
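
The Haversine distance mentioned in goal 5 can be sketched in plain Python as follows. This is only an illustration of the metric, not the notebook's implementation: the function name `haversine_km` is hypothetical, and the Earth radius reuses the value that this notebook assigns to `KMS_PER_RADIAN`.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; same value as KMS_PER_RADIAN in this notebook

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# one degree of latitude along the same meridian (roughly 111 km)
one_degree = haversine_km(41.0, -87.623177, 42.0, -87.623177)
```

Dividing a distance in kilometers by the Earth radius gives the same distance in radians, which is the unit a haversine Ball tree works in.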

TODO:

  1. Separate violent crimes from non-violent ones.
  2. Add more dimensions to the Ball tree. Maybe I can add one type of crime per dimension, instead of all crimes in 1D.
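
A minimal sketch of how TODO item 1 could start, assuming IUCR codes are zero-padded 4-character strings, so that "Code 1XX" corresponds to the prefix `01`. The split into violent and non-violent categories is an assumption made for illustration, not a classification taken from the dataset, and the names `CATEGORY_BY_PREFIX`, `VIOLENT`, and `categorize` are hypothetical.

```python
# IUCR prefix (first two characters of the zero-padded code) -> crime category,
# following the list of crimes planned for the Ball tree
CATEGORY_BY_PREFIX = {
    '01': 'Homicide',
    '02': 'Sexual Assault',
    '03': 'Robbery',
    '04': 'Battery',
    '05': 'Assault',
    '08': 'Theft',
    '09': 'Motor Vehicle Theft',
    '11': 'Deceptive Practice',
}

# ASSUMPTION: which categories count as violent is chosen here for illustration only
VIOLENT = {'Homicide', 'Sexual Assault', 'Robbery', 'Battery', 'Assault'}

def categorize(iucr):
    """Return (category, is_violent) for an IUCR code, or (None, None) if it is not in the list."""
    category = CATEGORY_BY_PREFIX.get(str(iucr).zfill(4)[:2])
    if category is None:
        return None, None
    return category, category in VIOLENT

examples = {code: categorize(code) for code in ['0281', '0860', '1150', '1750']}
```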

Author: Andreas P. Koenzen akoenzen@uvic.ca

Version: 0.1

Disclaimer: This analysis was made as a project for the course CSC 578D, at the University of Victoria during Fall of 2018. I do not own the data used in this analysis, but the analysis was entirely devised by me, including the code.

In [96]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sbn
import json
import re
import datetime

from sklearn.neighbors import BallTree
from ast import literal_eval

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from IPython.display import Javascript, IFrame
from IPython.core.display import HTML

Miscellaneous Configuration:

In [97]:
HTML(
    '<style>'
    'ol           {counter-reset:item}'
    'ol li        {display:block}'
    'ol li:before {content:counter(item) ". ";counter-increment:item;font-weight:bold}'
    'iframe       {border:0px;}'
    'table        {float:left;}'
    '</style>'
)

plt.style.use(['default', 'ggplot'])
plt.rcParams.update({'font.size': 8})
Out[97]:

Environment Variables and Constants:

In [98]:
# must be set to replicate
data_set_home = %env DATA_SETS_HOME
mapbox_token  = %env MAPBOX_TOKEN

# the earth's radius in kilometers
KMS_PER_RADIAN = 6371.0088

Functions:

In [99]:
# function which returns the hours of daylight for a given date
# (the date is converted internally to days elapsed since the winter solstice of 2000)
def hours_of_daylight(date, axis=23.44, latitude=47.61):
    """
    Compute the hours of daylight for the given date.
    
    :param date:     A pandas' date object.
    :param axis:     Earth's tilt.
    :param latitude: The latitude of the location for which to compute
                     the daylight hours.
                     
    :returns: A scalar value denoting the amount of daylight hours.
    """
    diff = date - datetime.datetime(2000, 12, 21)  # pd.datetime is deprecated; use the stdlib datetime
    day = diff.total_seconds() / 24. / 3600
    day %= 365.25
    m = 1. - np.tan(np.radians(latitude)) * np.tan(np.radians(axis) * np.cos(day * np.pi / 182.625))
    m = max(0, min(m, 2))
    
    return 24. * np.degrees(np.arccos(1 - m)) / 180.

# function which reports whether a person may be in danger of becoming the victim of a crime,
# based on the current time of day and their location within the boundaries
# of the city of Chicago.
def am_i_in_danger(tree_list, time: datetime.time, coordinates=(41.841832, -87.623177), radius=0.5):
    """
    Query a previously trained Ball tree for the crimes within a fixed radius;
    if enough points are returned, according to a constant, we conclude
    that the person *MAY* be in danger.
    
    Definition of danger: If the count of crimes within the radius (500 meters by default)
    is greater than or equal to 10, then it *MAY* be a danger zone.
    
    :param tree_list:   A list of Ball trees. Each index corresponds to 1 hour, starting at midnight.
    :param time:        The time of the day as a datetime.time object.
    :param coordinates: A tuple containing the latitude and longitude in degrees.
    :param radius:      The radius in kilometers.
    
    :returns: A message stating whether the person may be in danger.
    """
    # count_only=True returns an array with one count per query point
    count = tree_list[time.hour].query_radius(
        np.radians(coordinates).reshape(1, -1), 
        r=(radius / KMS_PER_RADIAN), 
        count_only=True
    )[0]
    # np.asscalar is deprecated, so the single count is taken by indexing instead
    return 'Yes, you may be in danger.' if count >= 10 else 'No danger around here, but depends on your luck :)'
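
The `tree_list` argument is a list of 24 Ball trees, one per hour. A minimal sketch of how such a list could be built and queried is shown below, assuming scikit-learn's `BallTree` with `metric='haversine'`, which expects latitude and longitude in radians. The helper name `build_hourly_trees` and the synthetic coordinates are hypothetical and are not taken from the dataset.

```python
import numpy as np
from sklearn.neighbors import BallTree

KMS_PER_RADIAN = 6371.0088  # same constant as defined above

def build_hourly_trees(hours, latitudes, longitudes):
    """Return a list of 24 entries: a haversine Ball tree per hour, or None if the hour has no crimes."""
    hours = np.asarray(hours)
    coords = np.radians(np.column_stack([latitudes, longitudes]))  # [lat, lon] in radians
    trees = []
    for hour in range(24):
        points = coords[hours == hour]
        trees.append(BallTree(points, metric='haversine') if len(points) > 0 else None)
    return trees

# synthetic data: 12 crimes at 2 PM close to one spot, plus one crime about 11 km further north
lats = [41.8800 + 0.0001 * i for i in range(12)] + [41.9800]
lons = [-87.6300] * 13
hrs = [14] * 13

trees = build_hourly_trees(hrs, lats, lons)

# count the 2 PM crimes within 0.5 km of the query point (radius converted to radians)
query = np.radians([[41.8800, -87.6300]])
count = int(trees[14].query_radius(query, r=0.5 / KMS_PER_RADIAN, count_only=True)[0])
```

Hours without any crime are stored as `None` in this sketch, so a caller would have to handle that case before querying.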

Load the entire dataset into memory as a DataFrame:

  1. Index the dataset by the Date column.
  2. One row corresponds to one crime.
In [100]:
raw_data = pd.read_csv(
    "{0}/CSC_578D/Project/Chicago_Crimes_2017.csv".format(data_set_home),
    index_col=['Date'],
    parse_dates=True,
    date_parser=lambda x: datetime.datetime.strptime(x, "%m/%d/%Y %I:%M:%S %p")  # pd.datetime is deprecated
)
raw_data.sort_index(inplace=True)

# print basic information about the data
print('Shape: {}'.format(raw_data.shape))
print('Size in memory: {} MB'.format(int(raw_data.memory_usage(index=True, deep=True).sum() / 1024 ** 2)))
raw_data.head()
Shape: (267766, 21)
Size in memory: 184 MB
Out[100]:
Date | ID | Case Number | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | Beat | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location
2017-01-01 | 10959471 | JA281085 | 026XX W LITHUANIAN PLAZA CT | 1750 | OFFENSE INVOLVING CHILDREN | CHILD ABUSE | RESIDENCE | False | True | 831 | ... | 18.0 | 66 | 20 | 1159768.0 | 1858789.0 | 2017 | 02/10/2018 03:50:01 PM | 41.768226 | -87.689933 | (41.768226489, -87.689933063)
2017-01-01 | 10801862 | JA100930 | 079XX S RACINE AVE | 0281 | CRIM SEXUAL ASSAULT | NON-AGGRAVATED | RESIDENCE | False | False | 612 | ... | 21.0 | 71 | 02 | 1169694.0 | 1852368.0 | 2017 | 02/14/2017 03:49:42 PM | 41.750397 | -87.653735 | (41.750396912, -87.653735437)
2017-01-01 | 11061031 | JA352488 | 069XX W DIVERSEY AVE | 1544 | SEX OFFENSE | SEXUAL EXPLOITATION OF A CHILD | RESIDENCE | False | False | 2511 | ... | 36.0 | 18 | 17 | 1129425.0 | 1917836.0 | 2017 | 02/10/2018 03:50:01 PM | 41.930830 | -87.799812 | (41.930830024, -87.799812262)
2017-01-01 | 11061056 | JA397774 | 059XX S HONORE ST | 1753 | OFFENSE INVOLVING CHILDREN | SEX ASSLT OF CHILD BY FAM MBR | RESIDENCE | False | False | 714 | ... | 15.0 | 67 | 02 | 1165036.0 | 1865263.0 | 2017 | 02/10/2018 03:50:01 PM | 41.785882 | -87.670440 | (41.785882234, -87.67044034)
2017-01-01 | 11103097 | JA450689 | 079XX S DR MARTIN LUTHER KING JR DR | 1585 | SEX OFFENSE | OTHER | RESIDENCE | False | False | 624 | ... | 6.0 | 44 | 17 | 1180276.0 | 1852475.0 | 2017 | 02/10/2018 03:50:01 PM | 41.750455 | -87.614955 | (41.750454627, -87.614955093)

5 rows × 21 columns

Pre-processing:

  1. Since the dataset is large, I need to remove unused or redundant columns. The following columns will be removed:
    1. ID
    2. Case Number
    3. Block: We can remove the block because we will use other information to pinpoint locations, like Beat, etc.
    4. Primary Type: I will use the IUCR to categorize danger.
    5. Location Description: I will use the "Domestic" column to filter out crimes committed indoors.
    6. Arrest
    7. Community Area
    8. Ward
    9. FBI Code
    10. Updated On
    11. X Coordinate
    12. Y Coordinate
  2. Keep the Year column, which is used later to label the crime counts.
  3. Index the entire dataset by the Date column.
In [101]:
# filter out redundant columns
data = raw_data.drop([
    'ID', 
    'Case Number',
    'Block',
    'Primary Type',
    'Location Description',
    'Arrest',
    'Community Area',
    'Ward',
    'FBI Code',
    'Updated On',
    'X Coordinate',
    'Y Coordinate'
], axis=1)
data.head()

# filtered data
non_domestic_crime = data.loc[(data['Domestic'] == False)]
domestic_crime = data.loc[(data['Domestic'] == True)]
Out[101]:
Date | IUCR | Description | Domestic | Beat | District | Year | Latitude | Longitude | Location
2017-01-01 | 1750 | CHILD ABUSE | True | 831 | 8.0 | 2017 | 41.768226 | -87.689933 | (41.768226489, -87.689933063)
2017-01-01 | 0281 | NON-AGGRAVATED | False | 612 | 6.0 | 2017 | 41.750397 | -87.653735 | (41.750396912, -87.653735437)
2017-01-01 | 1544 | SEXUAL EXPLOITATION OF A CHILD | False | 2511 | 25.0 | 2017 | 41.930830 | -87.799812 | (41.930830024, -87.799812262)
2017-01-01 | 1753 | SEX ASSLT OF CHILD BY FAM MBR | False | 714 | 7.0 | 2017 | 41.785882 | -87.670440 | (41.785882234, -87.67044034)
2017-01-01 | 1585 | OTHER | False | 624 | 6.0 | 2017 | 41.750455 | -87.614955 | (41.750454627, -87.614955093)

Describe the dataset:

Gather metrics for:

  1. Top 10 Most Violent Beats: In police terminology, a beat is the territory and time that a police officer patrols.
  2. Describe the dataset using Time Series analysis.

Top 10 Most Violent Beats:

In [102]:
tmp = data.groupby('Beat').size().sort_values(ascending=False)[0:10]
top_10_mv_beats = pd.DataFrame(index=range(1, 11), data={
    'Beat #': tmp.index,
    'Crime Count ' + str(data['Year'].unique()[0]): tmp.values
})
top_10_mv_beats
Out[102]:
   | Beat # | Crime Count 2017
1  | 1834   | 3153
2  | 111    | 2272
3  | 112    | 2253
4  | 1831   | 2211
5  | 421    | 2198
6  | 511    | 1992
7  | 624    | 1907
8  | 122    | 1897
9  | 1011   | 1886
10 | 423    | 1847

Result:

The most violent Police Beat in the city of Chicago, measured by the total number of reported crimes in 2017, is Beat #1834 with 3,153 crimes. Later on we will see more about the crimes that are committed in this particular Beat and in others as well.

Segment beats into 5 crime categories using the count of crimes for each beat:

The colors for each category in the choropleth were drawn from this website.

      | Category 1 | Category 2 | Category 3 | Category 4 | Category 5
Range | 0-600      | 601-1200   | 1201-1800  | 1801-2400  | 2401-10000
Color | #fee5d9    | #fcae91    | #fb6a4a    | #de2d26    | #a50f15
In [103]:
beats_sorted = non_domestic_crime.groupby('Beat').size().sort_values(ascending=False)

categories = []
for i in range(0, 5):
    categories.append(beats_sorted[(beats_sorted > (i * 600)) & (beats_sorted <= (10000 if (i == 4) else (i * 600) + 600))])
for e in categories:
    pd.DataFrame(data=e.head(2), columns=['Crime Count']) # pretty print the first 2 of each
Out[103]:
Beat | Crime Count
726  | 598
1235 | 598
Out[103]:
Beat | Crime Count
1824 | 1181
813  | 1167
Out[103]:
Beat | Crime Count
1833 | 1698
511  | 1649
Out[103]:
Beat | Crime Count
111  | 2227
112  | 2222
Out[103]:
Beat | Crime Count
1834 | 3055
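
The binning rule above can also be expressed as a small lookup. The sketch below is an alternative formulation using `bisect`, not the notebook's code; the function name `category_for` is hypothetical, and the upper bounds mirror the table of ranges.

```python
from bisect import bisect_left

UPPER_BOUNDS = [600, 1200, 1800, 2400, 10000]  # inclusive upper bound of each category
COLORS = ['#fee5d9', '#fcae91', '#fb6a4a', '#de2d26', '#a50f15']

def category_for(count):
    """Return (category number, color) for a crime count, or None if it falls outside 1-10000."""
    if count <= 0 or count > UPPER_BOUNDS[-1]:
        return None
    i = bisect_left(UPPER_BOUNDS, count)  # index of the first bound that is >= count
    return i + 1, COLORS[i]

results = [category_for(c) for c in (598, 600, 601, 1181, 3055)]
```

A count of exactly 600 still lands in Category 1, matching the `<=` comparison in the cell above.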

Group data by crime type and keep the top 10:

Group by both Beat and IUCR. This yields a multi-index DataFrame whose first index level is Beat and whose second is IUCR, with a single column named Crime Count holding the number of crimes. Finally, join this DataFrame with the one built from the Chicago_Police_IUCR_Codes.csv file to add both description columns from the latter.

In [104]:
# use Pandas Hierarchical Indexing
tmp = non_domestic_crime.groupby(['Beat', 'IUCR']).size()
tmp = tmp.sort_values(ascending=False).sort_index(level='Beat', sort_remaining=False) # pass ascending by keyword; the positional 0 was only the axis
tmp = tmp.groupby(level=0).head(10) # list the top 10 for each Beat
tmp = pd.DataFrame(index=tmp.index, data={'Crime Count': tmp})
# tmp.head()

# now match each IUCR with a description from Chicago_Police_IUCR_Codes.csv file
iucr_codes = pd.read_csv(
    "{0}/CSC_578D/Project/Chicago_Police_IUCR_Codes.csv".format(data_set_home)
)
iucr_codes['IUCR'] = iucr_codes['IUCR'].apply(lambda x: x.zfill(4)) # pad with leading zeros if the IUCR entry has fewer than 4 characters
iucr_codes.set_index('IUCR', inplace=True)
# iucr_codes.head()

# add new column with the secondary description
# see: http://blog.rlucas.net/bugfix/pandas-merge-woes-on-multiindex-solved/
tmp.index.names = ['Beat', 'IUCR'] # set the level names on the MultiIndex itself
iucr_codes.index.name = 'IUCR'
tmp = tmp.join(iucr_codes) # magic! no need to specify the merging key!

# drop unnecessary columns
tmp = tmp.drop(['INDEX CODE'], axis=1)
tmp.columns = [
    'Crime Count', 
    'Primary', 
    'Secondary'
]

crime_by_beat_iucr = tmp
tmp.head(20) # pretty print the first 2 beats
Out[104]:
Beat | IUCR | Crime Count | Primary            | Secondary
111  | 0860 | 510         | THEFT              | RETAIL THEFT
     | 0890 | 375         | THEFT              | FROM BUILDING
     | 1150 | 225         | DECEPTIVE PRACTICE | CREDIT CARD FRAUD
     | 0810 | 164         | THEFT              | OVER $500
     | 0820 | 164         | THEFT              | $500 AND UNDER
     | 0870 | 153         | THEFT              | POCKET-PICKING
     | 0460 | 119         | BATTERY            | SIMPLE
     | 0560 | 69          | ASSAULT            | SIMPLE
     | 1330 | 46          | CRIMINAL TRESPASS  | TO LAND
     | 1153 | 35          | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT OVER $ 300
112  | 0860 | 508         | THEFT              | RETAIL THEFT
     | 0890 | 486         | THEFT              | FROM BUILDING
     | 1150 | 210         | DECEPTIVE PRACTICE | CREDIT CARD FRAUD
     | 0820 | 160         | THEFT              | $500 AND UNDER
     | 0870 | 156         | THEFT              | POCKET-PICKING
     | 0460 | 122         | BATTERY            | SIMPLE
     | 0810 | 106         | THEFT              | OVER $500
     | 0560 | 61          | ASSAULT            | SIMPLE
     | 1330 | 38          | CRIMINAL TRESPASS  | TO LAND
     | 1152 | 36          | DECEPTIVE PRACTICE | ILLEGAL USE CASH CARD
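
The group, rank, and join steps can be reproduced on a tiny synthetic frame. The sketch below uses made-up rows and an explicit `merge` on the IUCR column instead of the implicit index join, purely as an illustration; the variable names are hypothetical.

```python
import pandas as pd

# made-up crime records: one row per crime
crimes = pd.DataFrame({
    'Beat': [111, 111, 111, 111, 112, 112, 112],
    'IUCR': ['0860', '0860', '0890', '0860', '0890', '0890', '0460'],
})
# made-up excerpt of the code table
codes = pd.DataFrame({
    'IUCR': ['0860', '0890', '0460'],
    'Primary': ['THEFT', 'THEFT', 'BATTERY'],
    'Secondary': ['RETAIL THEFT', 'FROM BUILDING', 'SIMPLE'],
})

# count crimes per (Beat, IUCR) pair
counts = crimes.groupby(['Beat', 'IUCR']).size().rename('Crime Count')

# keep only the most frequent code of each beat
top = counts.sort_values(ascending=False).groupby(level='Beat').head(1)

# attach the descriptions and restore the two-level index
joined = (top.reset_index()
             .merge(codes, on='IUCR', how='left')
             .set_index(['Beat', 'IUCR']))
```

Using `how='left'` keeps a beat's row even if its code is missing from the code table, in which case the description columns are left empty.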

Time Series analysis:

Requisite: The dataset needs to be indexed by date.

Plot data by month:

In [105]:
# build a new index for months
months = pd.period_range('2017-01', periods=12, freq='M')

month = data.groupby(data.index.month).size()
month = pd.DataFrame(index=months, data={'Crimes': list(month)})

fig, ax = plt.subplots(figsize=(6, 4))
_ = month.plot(
    ax=ax,
    title='Crimes by Month',
    style=['--']
)
_ = ax.set_ylabel('Count')
_ = ax.set_xlabel('Month')

Result:

The chart above shows a pattern in the data: crimes are committed more often during the summer months. So I will formulate a hypothesis and look further to see if it holds.

Hypothesis 1:

Crime counts are higher during the summer months because more people are outdoors than indoors.

Hypothesis 1 Testing:

To test this hypothesis I will separate domestic crimes from non-domestic crimes and see if the pattern is still present.

In [106]:
months = pd.period_range('2017-01', periods=12, freq='M')

month_outdoor = non_domestic_crime.groupby(non_domestic_crime.index.month).size()
month_outdoor = pd.DataFrame(index=months, data={'Crimes': list(month_outdoor)})

month_indoor = domestic_crime.groupby(domestic_crime.index.month).size()
month_indoor = pd.DataFrame(index=months, data={'Crimes': list(month_indoor)})

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(12, 4))
_ = ax_left.set_ylabel('Count')
_ = ax_left.set_xlabel('Month')
_ = ax_right.set_ylabel('Count')
_ = ax_right.set_xlabel('Month')

_ = month_outdoor.plot(
    ax=ax_left,
    title='Non-Domestic Crimes by Month',
    style=['--']
)
_ = ax_left.legend(["{} Crimes Total".format(month_outdoor['Crimes'].sum())])

_ = month_indoor.plot(
    ax=ax_right,
    title='Domestic Crimes by Month',
    style=['--']
)
_ = ax_right.legend(["{} Crimes Total".format(month_indoor['Crimes'].sum())])

Result:

After separating crimes into the two sets (outdoor and indoor), we can see that Hypothesis 1 is rejected, since both charts follow the same pattern: the right chart shows that indoor crimes also spike during the summer months. Maybe we can formulate another hypothesis for why this pattern occurs.

Hypothesis 2:

Non-domestic crime counts are higher during the summer months because there are more daylight hours and more people are in the streets.

Hypothesis 2 Testing:

In [107]:
hour = non_domestic_crime.groupby(non_domestic_crime.index.hour).size()
hour = pd.DataFrame(index=hour.index, data={'Crimes': list(hour)})
# hour

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(12, 4))
_ = hour.plot(
    ax=ax_left,
    title='Crimes by Hour',
    style=['--']
)
_ = ax_left.set_ylabel('Count')
_ = ax_left.set_xlabel('Hour')

start = datetime.datetime(year=2017, month=1, day=1)
days = start + pd.to_timedelta(np.arange(365), "D")

# iterate over all days and compute the daylight hours
hours = pd.DataFrame(
    data=days.map(lambda day: hours_of_daylight(day, latitude=41.841832)),
    columns=["Daylight Hours"],
    index=days
)
_ = hours.plot(
    ax=ax_right,
    title='Daylight Hours Chicago',
    style=['--'],
    color='b'
)
_ = ax_right.set_ylabel('Hours of Daylight')
_ = ax_right.set_xlabel('Month')

I have plotted Chicago crimes by hour on the left and daylight hours by month on the right. Crime starts to rise at 5:00 AM, increases during the day up to a peak at noon, and starts to decrease at roughly 6:00 PM. This suggests that crimes happen more often during daylight hours, which may be one reason why we observed more crimes during the summer months than during the winter months. That conclusion, however, requires much more analysis, including other variables such as temperature.
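
As a worked example of the daylight formula, the sketch below evaluates it at the two solstices for the Chicago latitude used in the cell above. It restates the formula of `hours_of_daylight` with the standard `math` module and takes the day offset directly; the function name `daylight_hours` is hypothetical.

```python
import math

def daylight_hours(days_since_winter_solstice, latitude, axis=23.44):
    """Same formula as hours_of_daylight, but taking the day offset directly."""
    day = days_since_winter_solstice % 365.25
    m = 1.0 - math.tan(math.radians(latitude)) * math.tan(
        math.radians(axis) * math.cos(day * math.pi / 182.625))
    m = max(0.0, min(m, 2.0))  # clamp for latitudes with polar day or polar night
    return 24.0 * math.degrees(math.acos(1.0 - m)) / 180.0

CHICAGO_LATITUDE = 41.841832  # latitude used in the cell above

winter = daylight_hours(0.0, CHICAGO_LATITUDE)      # winter solstice
summer = daylight_hours(182.625, CHICAGO_LATITUDE)  # half a year later
```

Under this model the winter solstice has about 9 hours of daylight and the summer solstice about 15, and the two values add up to 24 hours.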

Weekdays vs. Weekends:

In [108]:
compound = non_domestic_crime.groupby([
    np.where(non_domestic_crime.index.weekday < 5, "Weekday", "Weekend"), 
    non_domestic_crime.index.hour
]).size()
# compound

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(12, 4))
_ = compound['Weekday'].plot(
    ax=ax_left,
    title='Crimes during Weekdays',
    style=['--']
)
_ = ax_left.set_ylabel('Count')
_ = ax_left.set_xlabel('Time')

_ = compound['Weekend'].plot(
    ax=ax_right,
    title='Crimes during Weekends',
    style=['--']
)
_ = ax_right.set_ylabel('Count')
_ = ax_right.set_xlabel('Time')

Results:

Above I plotted crimes segmented by weekdays and weekends to further the analysis, but no distinctive pattern emerged, other than that crime starts to rise slightly later on weekends. More analysis is needed to test Hypothesis 2, for example by segmenting crimes by type before doing the time series analysis.
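
The weekday/weekend segmentation can be illustrated without pandas. The sketch below labels a handful of hypothetical timestamps the same way the `np.where` call does (Monday is weekday 0, so values below 5 are weekdays) and counts them per label and hour; the function name `weekday_or_weekend` is hypothetical.

```python
from collections import Counter
from datetime import datetime

def weekday_or_weekend(timestamp):
    """Label a timestamp as 'Weekday' (Monday to Friday) or 'Weekend'."""
    return 'Weekday' if timestamp.weekday() < 5 else 'Weekend'

# hypothetical timestamps, not taken from the dataset
timestamps = [
    datetime(2017, 1, 1, 14, 5),   # Sunday
    datetime(2017, 1, 2, 14, 30),  # Monday
    datetime(2017, 1, 2, 9, 0),    # Monday
    datetime(2017, 1, 7, 14, 45),  # Saturday
]

# count crimes per (label, hour), mirroring the two-key groupby
counts = Counter((weekday_or_weekend(t), t.hour) for t in timestamps)
```

Note that the weekday group covers five days and the weekend group only two, so raw counts per hour are not directly comparable between the two charts.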

Visualization:

  1. Plot crimes by Beat.
  2. Build a Choropleth using the Chicago Boundaries dataset.
In [109]:
ax = top_10_mv_beats.plot.bar(
    x=0,
    y=1,
    figsize=(6, 4),
    title='Top 10 Crimes Police Beats',
    color='r'
);
ax.set_ylabel('Count');
ax.set_xlabel('Beat #');
ax.set_axisbelow(True);
ax.yaxis.grid(color='gray', linestyle='dashed');
ax.xaxis.grid(color='gray', linestyle='dashed');

Choropleth:

  1. First separate the map into beats covering the entire city of Chicago.
  2. Plot the data using a choropleth chart.
  3. Filter crime by outdoor only.
  4. Filter crime by top 10 crimes per beat.
In [110]:
# read the boundaries dataset
boundaries = ''
with open("{0}/CSC_578D/Project/Chicago_Boundaries.geojson".format(data_set_home)) as json_file:  
    boundaries = json.load(json_file)
In [111]:
# build the multipolygons layers
layers = ''
layer = """
map.addLayer({{
  'id':'Beat_{}_{}',
  'type':'{}',
  'source':{{
    'type':'geojson',
    'data':{{
      'type':'Feature',
      'geometry':{{
        'type':'Polygon',
        'coordinates':{}
      }}
    }}
  }},
  'layout': {{}},
  'paint': {{
    '{}':'{}',
    '{}':{}
  }}
}});
"""
popups = ''
popup = """
map.on('click', 'Beat_{}_Fill', function (e) {{
    new mapboxgl.Popup().setLngLat(e.lngLat).setHTML('{}').addTo(map);
}});
map.on('mouseenter', 'Beat_{}_Fill', function () {{
    map.getCanvas().style.cursor = 'pointer';
}});
map.on('mouseleave', 'Beat_{}_Fill', function () {{
    map.getCanvas().style.cursor = '';
}});
"""
for _, e in enumerate(boundaries['features']):
    beat_num = e['properties']['beat_num']
    coordinates = e['geometry']['coordinates'][0]
    crimes = 0
    
    if int(beat_num) == 3100:
        continue
    
    # use a different color for each category
    colors = ['#fee5d9', '#fcae91', '#fb6a4a', '#de2d26', '#a50f15']
    color = ''
    for i, category in enumerate(categories):
        crimes = category.get(int(beat_num), default=None)
        if crimes is not None:
            color = colors[i]
            break
            
    # get beat information
    # (the loop variable is named row so it does not overwrite the global DataFrame data)
    beat_info = '<ol>'
    for index, row in crime_by_beat_iucr.iterrows():
        if int(index[0]) == int(beat_num):
            beat_info += '<li>{}-{} ({} crimes)</li>'.format(row['Primary'], row['Secondary'], row['Crime Count'])
    beat_info += '</ol>'
    
    layers += layer.format(
        beat_num, 
        'Fill',
        'fill',
        json.dumps(coordinates),
        'fill-color',
        color if color != '' else '#999',
        'fill-opacity',
        0.75
    )
    layers += layer.format(
        beat_num, 
        'Outline',
        'line',
        json.dumps(coordinates), 
        'line-color', 
        '#67000d', 
        'line-width', 
        0.60
    )
    popup_html = "<h1>Beat ID: {}</h1><p>Crimes #: {} crimes in 2017</p>{}".format(
        beat_num, 
        crimes,
        beat_info
    )
    popups += popup.format(
        beat_num,
        popup_html,
        beat_num,
        beat_num
    )
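
The layer template above doubles its curly braces because `str.format` treats single braces as placeholders. A reduced sketch of the same technique follows, with a hypothetical two-field template, a made-up beat number, and made-up coordinates:

```python
import json

# '{{' and '}}' produce literal braces; '{}' is filled in by format()
template = "map.addLayer({{'id':'Beat_{}','coordinates':{}}});"

# made-up polygon ring as [longitude, latitude] pairs
ring = [[-87.63, 41.88], [-87.62, 41.88], [-87.62, 41.89], [-87.63, 41.88]]

# json.dumps turns the nested Python list into a JavaScript array literal
snippet = template.format('0111', json.dumps([ring]))
```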
In [112]:
replacements = {
    'mapbox_token': mapbox_token,
    'layers': layers,
    'popups': popups,
    'total_crimes': str(beats_sorted.sum())
}

html = """
<!DOCTYPE html>
<html>
<head>
    <meta charset='utf-8'/>
    <title>Project</title>
    <meta name='viewport' content='initial-scale=1,maximum-scale=1,user-scalable=no'/>
    <script src='https://api.tiles.mapbox.com/mapbox-gl-js/v0.51.0/mapbox-gl.js'></script>
    <link href='https://api.tiles.mapbox.com/mapbox-gl-js/v0.51.0/mapbox-gl.css' rel='stylesheet'/>
    <style>
    body {margin:0; padding:0;}
    #map {position:absolute;top:0;bottom:0;width:100%;}
    .legend {
        background-color:#fff;
        border-radius:3px;
        bottom:30px;
        box-shadow:0 1px 2px rgba(0,0,0,0.10);
        font:12px/20px 'Helvetica Neue', Arial, Helvetica, sans-serif;
        padding:10px;
        position:absolute;
        right:10px;
        z-index:1;
    }
    .legend h4 {margin:0 0 10px;}
    .legend div span {
        border-radius:50%;
        display:inline-block;
        height:10px;
        margin-right:5px;
        width:10px;
    }
    .mapboxgl-popup {max-width:400px;font:12px/20px 'Helvetica Neue', Arial, Helvetica, sans-serif;}
    </style>
</head>
<body>
    <div id='map'></div>
    <div id='crime-legend' class='legend'>
    <h4>Chicago Non-Domestic Crimes by Police Beat per Year (2017)</h4>
    <div><span style='background-color:transparent'></span>No Data</div>
    <div><span style='background-color:#fee5d9'></span>0 - 600 crimes</div>
    <div><span style='background-color:#fcae91'></span>601 - 1200</div>
    <div><span style='background-color:#fb6a4a'></span>1201 - 1800</div>
    <div><span style='background-color:#de2d26'></span>1801 - 2400</div>
    <div><span style='background-color:#a50f15'></span>2401 - 10000</div>
    <h4>Total Non-Domestic Crimes: (total_crimes)</h4>
</div>
    <script>
    mapboxgl.accessToken = '(mapbox_token)';
    var map = new mapboxgl.Map({
        container: 'map',
        style: 'mapbox://styles/mapbox/streets-v10',
        center: [-87.623177, 41.841832],
        zoom: 9.5
    });
    map.on('load', function () {
        (layers)
        (popups)
    });
    </script>
</body>
</html>
"""

for key, value in replacements.items():
    html = html.replace("({})".format(key), value)

display(HTML('<iframe srcdoc="{srcdoc}" style="width:100%;height:800px;"></iframe>'.format(srcdoc=html)))
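
The `(key)` placeholders are filled by plain string replacement. Because the page is then embedded in a double-quoted `srcdoc` attribute, any double quote inside the generated HTML would end the attribute early. A hedged sketch of the same substitution, with attribute escaping added through the standard library's `html.escape`, is shown below; the tiny template, the made-up total, and the name `fill_template` are hypothetical.

```python
import html as html_lib  # aliased so it does not clash with the notebook's html variable

def fill_template(template, replacements):
    """Replace every '(key)' placeholder with its value."""
    for key, value in replacements.items():
        template = template.replace("({})".format(key), value)
    return template

page = fill_template(
    '<h4 class="total">Total Non-Domestic Crimes: (total_crimes)</h4>',
    {'total_crimes': '1234'}  # made-up number
)

# escape quotes and angle brackets so the page is safe inside srcdoc="..."
iframe = '<iframe srcdoc="{}" style="width:100%;height:800px;"></iframe>'.format(
    html_lib.escape(page, quote=True)
)
```

The browser decodes the entities again when it reads the attribute, so the embedded page itself is unchanged.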