Visualizing Police Traffic Stops in Iowa City – Part 1 – Geocoding the data

Click here to explore this project!

 

Previously, I used Scrapy to set up a web scraper. I designed this scraper to collect data daily from the Iowa City Police Department’s activity log. One interesting question we could ask using this data is where are you most likely to get stopped by the police in Iowa City.

The data I have collected is all of the police activity, but specifically I want to just look at traffic stops. Lucky for me, I have been saving this information to a database. We can get what we are looking for by firing off a simple SQL query.

SELECT * FROM ICPDLog WHERE activity='TRAFFIC STOP';

At the time of data collection this returned approximately 1600 rows. Therefore we have data on 1600 traffic stops conducted by the Iowa City Police Department. Included in this data is the address (or intersection) that the traffic stop occurred at. Awesome!

BUT…. We want to visualize theses traffic stops on a map. Plotting these traffic stop locations would be much easier if we had their GPS coordinates instead of just their street address or intersection. This process of assigning a Latitude and Longitude  to an address is called Geocoding. There are several services that allow this to be done online. I am selecting to do this through Google Maps. They allow you 2500 geocoding requests per day, and have a easy-to-use Python library which we can use to make requests to the API in just a few lines of code.

So what I want to do first request all of these rows from the SQL database and write them to a file so I can give it a quick inspection before I start wasting my 2500 requests by sending bad data. We can even reuse part of the code we wrote before to store these records into the database. To the dbconn.py file we can add this easy query to our database

def get_items(activity):
    import pandas as pd
 
    df = pd.DataFrame(session.query(LogItem.dispatch, LogItem.activity, LogItem.addr, LogItem.disposition)
                      .filter(LogItem.activity==activity)
                      .all())
 
    return df

So after a quick inspection everything looks good. However, after testing the Google Geocoding API in a web browser. I find it does not work with intersections which contain slashes. For example Jefferson St/Clinton St fails to return a geocode. But some experimenting does show that Jefferson St and Clinton St does successfully return a correct geocode! So that is an easy fix in our data!

This quick script below does the trick.

import dbconn
import pandas as pd
 
# Get the dataframe again
df = dbconn.get_items("TRAFFIC STOP")
 
# Create a new list to hold our new addresses
new_addr = []
 
# Check each address and split the string if it contains a slash
for addr in df.addr:
    new = addr.split('/')
    new = " and ".join(new) # join them back together using the word "and"
    new += ', Iowa City, IA' # append the city and state for accurate geocoding
    new_addr.append(new)
 
# add the new addresses as a new column to the dataframe
df['new_addr'] = new_addr
 
# write to file, inspect one more time
df.to_csv('data_for_geodocoding.csv')

Again, everything looks good. Its time to geocode all of these addresses with the next script:

import pandas as pd
import googlemaps
import time
from datetime import datetime
 
api_key="[YOUR GOOGLE API KEY]" 
data = "data_for_geocode.csv"
 
df = pd.read_csv(data)
 
# initialize the googlemaps api library with your api key
gmaps = googlemaps.Client(key=api_key)
 
geocodes = []
 
 
for i in range(len(df)):
    address = df.iloc[i].new_addr
    dispatch = df.iloc[i].dispatch
    print("Trying: %s *** %i" % (address, i))
    geocode_result = gmaps.geocode(address)
    try: # not all addresses will successfully geocode, this try/catch block will make sure we do not crash while looping
        a = geocode_result[0]
        lat = a['geometry']['location']['lat']
        lng = a['geometry']['location']['lng']
        output = {"dispatch": dispatch, "new_addr": address, "lat": lat, "lng": lng}
        print("Successfully got: " + str(output))
        geocodes.append(output)
    except: # if the geocode result returns nothing we will just insert 0's at the lat and lng
        output = {"dispatch": dispatch, "new_addr": address, "lat": 0, "lng": 0}
        geocodes.append(output)
 
    time.sleep(0.5) # wait half a second as to not overwhelm the API
 
 
df_geocodes = pd.DataFrame(geocodes)
df_geocodes.to_csv('geocoded_data.csv') # save our geocoded data to a file
 
print("DONE!!!")

Now, for simplicity we just saved the dispatch (the unique id of each data point), address, and geocode to the file. Lets combine this with out other file so we can have the full records.

df = dbconn.get_items("TRAFFIC STOP")
df_geo = pd.read_csv("geocoded_data.csv")
 
# merge the dataframes on the unique ID
df_join = pd.merge(df, df_geo, how='left', on='dispatch')
 
df_join.drop(['Unnamed: 0'], axis=1, inplace=True) # clean up
df_join.to_csv('joined_data.csv')

Awesome! We now have our database records with their lat and lng in a data frame. Lets add this back to the database for safekeeping.

To our dbconn.py file lets add another method:

def update_geocode(dispatch, lat, lng):
 
    session.query(LogItem).filter(LogItem.dispatch == int(dispatch))\
        .update({"geocode_lat": float(lat), "geocode_lng": float(lng)})
 
    session.commit()

This will take the dispatch, lat, and lng. The dispatch will identify the unique record, and the lat and lng will be the values we want to add to the database. But before this will work, we need to alter our database table so that these columns exist. The following sql statements will add the columns:

ALTER TABLE ICPDLog ADD COLUMN lat FLOAT;
ALTER TABLE ICPDLog ADD COLUMN lng FLOAT;

Now lets write a script to do this work:

import dbconn
import pandas as pd
 
data = "joined_data.csv"
df = pd.read_csv(data)
 
# get just the columns we need to update our records
df = df[['dispatch','lat','lng']]
 
for i in range(len(df)):
    dispatch = df.iloc[i].dispatch
    lat = df.iloc[i].lat
    lng = df.iloc[i].lng
 
    dbconn.update_geocode(dispatch, float(lat), float(lng))
 
print("Done, go check though")

We now have updated our database so every record with an activity of TRAFFIC STOP now has a geocode. All of the other records are left untouched (their geocodes are null, but feel free to geocode them if you want)

We can now move on to the next step, mapping all of this newly geocoded data

Part 2 (Coming Soon)

 

Leave a Reply

Your email address will not be published. Required fields are marked *