Reverse Geocode a Set of Lat-Long Coordinates to City + Country

This tutorial demonstrates how to reverse geocode a set of latitude-longitude coordinates to city and country using Python and the Google Maps API.

I have previously written about my GPS location data from this summer’s travels. The data set, gathered with the OpenPaths app, contains lat-long coordinates and timestamps. With nothing but coordinates and timestamps, any visualization would be fairly simplistic. Reverse geocoding these coordinates adds city and country data to each point, which lets me create richer, more informative marker popups.

Texas A&M Geoservices runs a nice web service that allows you to upload a data set of lat-long values as a batch, and receive address data back. However, their database only covers the United States (it requires you to have a state field in addition to lat and long) so it won’t work for this case.

Reverse Geocode with the Google Maps API

Instead, I will use the Google Maps API. Google provides a JSON API that allows you to request address data for a coordinate pair. Using Python, I will reverse geocode each of the 1,759 GPS coordinates in my data set to city + country. The original data set is available here and all of this code is available in this GitHub repo, particularly this IPython notebook. First I import the necessary modules:

import pandas as pd
import requests

Next I load the data set that contains lat-long coordinates and add three new columns – geocode_data (to contain the JSON blob Google sends back), city, and country:

df = pd.read_csv('summer-trip-gps.csv')
df['geocode_data'] = ''
df['city'] = ''
df['country'] = ''
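The request step that follows works on a single "lat,lng" string per row. If the CSV stores latitude and longitude in separate columns, a small helper along these lines could build that string. This is only a sketch: the helper name format_latlng and the column names lat and lon are assumptions for illustration, not taken from the data set.

```python
def format_latlng(lat, lon):
    """Join a latitude and a longitude into the 'lat,lng' string form."""
    return '{},{}'.format(lat, lon)

# Hypothetical usage with the dataframe, assuming columns named 'lat' and 'lon':
# df['latlng'] = [format_latlng(a, b) for a, b in zip(df['lat'], df['lon'])]

print(format_latlng(48.355328, 11.7917104))  # 48.355328,11.7917104
```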

Now I write a function to handle the reverse geocoding requests. This function takes a string argument, latlng, in the form of “48.355328,11.7917104” and sends it to the geocoding API. If the API returns any results, the function returns the first one; otherwise it returns an empty dictionary.

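def reverse_geocode(latlng):
    # default to an empty dict so the caller always gets a dict back
    result = {}
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}'
    request = url.format(latlng)
    data = requests.get(request).json()
    # keep only the first match, if the response contains any
    if len(data.get('results', [])) > 0:
        result = data['results'][0]
    return result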
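Google's geocoding responses also carry a top-level status field (for example OK, ZERO_RESULTS, or OVER_QUERY_LIMIT), and current versions of the API expect an API key in a key parameter. As a hedged sketch, the response handling could be pulled into a small helper that tells "no match" apart from "request failed". The helper name first_result and the sample payloads are made up for illustration.

```python
def first_result(data):
    """Return the first geocoding result, {} if there is none, or raise on an API error."""
    status = data.get('status')
    results = data.get('results') or []
    if status == 'OK' and results:
        return results[0]
    if status in ('OK', 'ZERO_RESULTS'):
        # the request succeeded but nothing matched these coordinates
        return {}
    raise RuntimeError('Geocoding request failed with status: {}'.format(status))

# Illustrative payloads, not real API output
ok = {'status': 'OK', 'results': [{'formatted_address': '1 Example Street'}]}
empty = {'status': 'ZERO_RESULTS', 'results': []}
print(first_result(ok)['formatted_address'])
print(first_result(empty))
```

Inside reverse_geocode, the parsed JSON could be passed through this helper, and the key could be supplied with requests.get(url, params={'latlng': latlng, 'key': api_key}), where api_key is a placeholder for your own key.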

Then I map my reverse_geocode function to each latitude-longitude value in the dataframe’s latlng column:

df['geocode_data'] = df['latlng'].map(reverse_geocode)
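Mapping the function directly sends one request per row. Because some rows may share identical coordinates and the API enforces usage limits, one possible refinement is to geocode each distinct string once and pause between requests. The sketch below illustrates that idea only; the name geocode_unique, the pause value, and the stand-in fake_geocode are all hypothetical.

```python
import time

def geocode_unique(latlngs, geocode, pause=0.0):
    """Call geocode once per distinct latlng string and return a lookup dict."""
    cache = {}
    for latlng in latlngs:
        if latlng not in cache:
            cache[latlng] = geocode(latlng)
            time.sleep(pause)  # simple throttle between real requests
    return cache

# Stand-in for reverse_geocode so the sketch runs without network access
calls = []

def fake_geocode(latlng):
    calls.append(latlng)
    return {'latlng': latlng}

lookup = geocode_unique(['1,2', '1,2', '3,4'], fake_geocode)
print(len(calls))  # 2
```

With the real function, the lookup could then be applied with df['latlng'].map(lookup), since a pandas Series can be mapped through a dictionary.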

Parsing City and Country Data

Due to the ambiguity of terminology in different countries – city vs town, state vs province, county vs region, etc. – most reverse geocoders return a full address with normalized address elements (see Factual’s API for an example using OSM data). Google, however, returns a more flexible list of address components, each tagged with one or more “types.” I’ll have to parse this data to find the city name – or the closest thing to it – for each set of coordinates.

For each element in the series of reverse-geocode data I received from Google, I search the address components list for an item whose list of types contains the string “country”. Then I do the same to find a component that represents a municipality. Depending on the country, different types of components can represent what I would call a city or town in the United States, so I check for several types and return the first match.

def parse_country(geocode_data):
    if (geocode_data is not None) and ('address_components' in geocode_data):
        # return the name of the first component tagged as a country
        for component in geocode_data['address_components']:
            if 'country' in component['types']:
                return component['long_name']
    return None

def parse_city(geocode_data):
    # types that can stand in for a city or town, depending on the country
    city_types = ('locality', 'postal_town',
                  'administrative_area_level_2', 'administrative_area_level_1')
    if (geocode_data is not None) and ('address_components' in geocode_data):
        # return the first component that carries any of these types
        for component in geocode_data['address_components']:
            if any(t in component['types'] for t in city_types):
                return component['long_name']
    return None
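Note that the loop above returns the first component, in the order Google lists them, that carries any of the four types. If the preference order among the types matters more than the component order, a variant could check the types in priority order instead. This is a sketch of that alternative, not the approach used for this data set, and the sample dictionary is hand-made for illustration rather than real API output.

```python
CITY_TYPES = ['locality', 'postal_town',
              'administrative_area_level_2', 'administrative_area_level_1']

def parse_city_by_priority(geocode_data):
    """Return the long_name of the highest-priority city-like component."""
    components = (geocode_data or {}).get('address_components', [])
    for wanted in CITY_TYPES:
        # scan all components for this type before falling back to the next
        for component in components:
            if wanted in component['types']:
                return component['long_name']
    return None

# Hand-made sample in which the broader region is listed before the locality
sample = {'address_components': [
    {'long_name': 'Bavaria', 'types': ['administrative_area_level_1', 'political']},
    {'long_name': 'Freising', 'types': ['locality', 'political']},
    {'long_name': 'Germany', 'types': ['country', 'political']},
]}
print(parse_city_by_priority(sample))  # Freising
```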

Finally, I map my parse_city and parse_country functions one at a time to the series of reverse-geocode data I received from Google, and then save to CSV:

df['city'] = df['geocode_data'].map(parse_city)
df['country'] = df['geocode_data'].map(parse_country)
df.to_csv('geocoded.csv', encoding='utf-8', index=False)

Next steps

That’s it. I now have a data set that contains lat-long coordinate pairs, timestamps, city name, and country. For reference, once again here is the original data set and here is the new reverse geocoded data set. Interestingly, Google’s API returned no results for any of the lat-long coordinates in Kosovo, so I had to enter the city and country for these (few) rows manually.

This Python code can easily be changed to use a different geocoding API (such as Factual’s) or to extract the full address text instead of the city and country components. You could also tweak this geocoder to search for the municipality in other types of address components, but the four I used covered my entire data set accurately. The data can now be visualized with informative popups using tools like CartoDB, Leaflet, or Mapbox and TileMill.
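To illustrate the full-address variation mentioned above: each Google geocoding result includes a formatted_address string alongside address_components, so extracting it could look like the following sketch. The function name parse_address and the sample result are illustrative assumptions.

```python
def parse_address(geocode_data):
    """Return the full address string from a geocoding result, or None."""
    if geocode_data:
        return geocode_data.get('formatted_address')
    return None

# Hand-made sample, not real API output
sample = {'formatted_address': '1 Example Street, Example City, Exampleland'}
print(parse_address(sample))
print(parse_address({}))  # None
```

It could be mapped over the same column as the other parsers, for example df['address'] = df['geocode_data'].map(parse_address).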

9 replies on “Reverse Geocode a Set of Lat-Long Coordinates to City + Country”

Hi Geoff,
nice writeup!
Have you heard about the geopy package available in the PyPI repository:
https://pypi.python.org/pypi/geopy/1.4.0

It does basically the same things you do, but it has wrapped everything in a nice-to-use API and you can use many more geo information providers like OpenStreetMap…
Maybe it helps you for further projects!

Best regards,
Sebastian

Yes – geopy is a great tool for geocoding and simple spatial analysis with Python. My goal was more to understand the Google API and provide a guide for others learning how to use it. I’d also recommend geopy as an easy-to-use package.

[…] One of the biggest challenges came from the Great Schools API. While the GS nearby API provides schools within a specified radius of a geographic position, I noticed that schools weren’t showing up within the radius specified. Fortunately, I was familiar with Google’s Directions API, which provides a reverse geocode feature that provides a town name for a given spot. This was far from straightforward since Google provides multiple layers of data for each coordinate. On top of that, town names can be locality, postal_town, administrative_area_level_2, administrative_area_level_1. This necessitated the following code from Geoff Boeing: […]

We are in the middle of developing a mapping application and for some reason the google apis are not returning the correct city. We are getting good results with the county being returned based on the geocoordinates, but not the city. Sometimes the city is off by miles. For example, I am sending these coordinates:
Latitude: 45.049783885754024
Longitude: -93.06243156597827
And the routine is returning Saint Paul, MN instead of Vadnais Heights, MN.
It has been very frustrating.

Hi.
The operation takes about a second to return the district/city name, given the coordinates.
What do we do if the lat lon data is in GBs? (say millions of records.)

We are struggling using the Google APIs and getting the correct city and county when we pass the geocoordinates to the API. In our application the resolution of city and county with the geocoordinates is very important, and recently we had geocoordinates that we passed to the API return a city that was 80 miles north of where the geocoordinates were plotted on the map, and we tested plotting the coordinates in two ways independently. This is frustrating. Also, we have a set of coordinates that are clearly in Coon Rapids (city) and in Anoka (county), but when the API is called it returns Mora (city) and Kanabec (county) in one case and returns Minneapolis (city) and Anoka (county) in another case. Minneapolis and Coon Rapids are close to one another, but they do not border each other! So, we are getting sporadic results.