8 Petabytes of Map Tiles

The Task

I recently had the task of serving 200+ terabytes of map tiles to users through a web map. The two map products were 250-meter crop type layers for 2000 to 2014. A little background: this is for a USGS/NASA-funded project with the mission of mapping the world's croplands at 30-meter resolution over two decades.

The process is quite simple and nothing unusual, except for the ability to make these maps dynamic! It's a far simpler task to create the tiles once and serve them forever as originally created. Granted, my work is hardly dynamic at this point (switching between years in an image stack and applying different masks), but I can easily extend the concept. All of this is accomplished using Google Earth Engine, not to be confused with Google Earth.

About the Solution

The primary pieces of the solution are AWS CloudFront, Redis, Heroku, Python Flask, and Google Earth Engine. Each of these tools was chosen for a specific purpose. First, Google Earth Engine provides the power for creating the tiles.

Google Earth Engine brings together the world's satellite imagery — trillions of scientific measurements dating back over 40 years — and makes it available online with tools for scientists, independent researchers, and nations to mine this massive warehouse of data to detect changes, map trends and quantify differences on the Earth's surface. Applications include: detecting deforestation, classifying land cover, estimating forest biomass and carbon, and mapping the world’s roadless areas.

It's an amazing tool, and I always find myself running into the mathematics and statistics wall when playing around with it. That's how powerful a tool it is for the more experienced remote sensing scientist.

The next piece is my proxy, which connects to Google Earth Engine to create the map, build the tile URL, and return it to the user. Google Earth Engine has a Python API for interacting with its many commands, and except for a few quirks, this API is nearly identical to its more popular JavaScript playground counterpart. This little app lives on my Python Flask server on Heroku. Below is the basic process.

  1. An xyz slippy map tile request comes into my server.
  2. A Flask view parses out the information needed to build the map and retrieves a map id and token from Google Earth Engine.
  3. The map id and token are stored in Redis for 12 hours for future requests.
  4. The app then builds a Google Earth Engine URL for a tile on this map with the same xyz from the first step.
  5. The Python app then fetches this tile and returns it to the original request.
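The steps above can be sketched in a few lines of Python. This is a simplified stand-in, not the production code: the dictionary plays the role of Redis, `fetch` stands in for the Earth Engine `getMapId()` call made in the Flask view, and the legacy Earth Engine tile endpoint format is assumed.

```python
import time

_CACHE = {}            # stand-in for Redis: (layer, year) -> (mapid, token, expiry)
TTL = 12 * 60 * 60     # cache map ids for 12 hours, as in step 3

def get_map(layer, year, fetch):
    """Return a (mapid, token) pair, calling Earth Engine only on a cache miss.

    `fetch` stands in for the getMapId() call made by the Flask view.
    """
    key = (layer, year)
    cached = _CACHE.get(key)
    if cached is not None and cached[2] > time.time():
        return cached[0], cached[1]
    mapid, token = fetch(layer, year)
    _CACHE[key] = (mapid, token, time.time() + TTL)
    return mapid, token

def tile_url(mapid, token, z, x, y):
    """Build the Earth Engine URL for one xyz tile (legacy /map endpoint)."""
    return "https://earthengine.googleapis.com/map/{}/{}/{}/{}?token={}".format(
        mapid, z, x, y, token)
```

In the real app, a Flask view receives the z/x/y request, calls something like `get_map`, fetches the resulting `tile_url(...)`, and streams the image back to the original requester.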

For the most part this works quite smoothly. However, with more complex maps, tiles can take a significant amount of time to be generated by Google Earth Engine.

The solution is to put a cache layer in front of these tiles. Amazon's CloudFront is perfect for this, bringing incredible ease of use at low cost, charging pennies per gigabyte. (My next step is to require signed URLs for accessing these tiles, but in the meantime I configured some alarms!)
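For CloudFront to cache a tile at the edge, the origin response needs a Cache-Control header. A minimal sketch, with an illustrative max-age rather than any production value:

```python
def cache_headers(days=30):
    """Cache-Control header telling CloudFront (and browsers) to keep a tile."""
    return {"Cache-Control": "public, max-age={}".format(days * 24 * 60 * 60)}
```

In Flask, a dictionary like this gets merged into the tile response's headers before it is returned.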

How many tiles?

The neat part of this solution is the number of tiles that are suddenly available to the user. Each global layer, with all of the zoom levels from 1 to 18, amounts to 91,625,968,980 tiles, EACH! Now consider that there are numerous layers, each with several years... 2,748,779,069,400 tiles. With an average size of 2-3 KB, that is a possible 8 petabytes of data.
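The arithmetic checks out. Assuming 30 year/product layers (which matches the total quoted above) and ~3 KB per tile:

```python
# Tiles in one global layer, zoom levels 1 through 18 (4^z tiles per level).
tiles_per_layer = sum(4 ** z for z in range(1, 19))   # 91,625,968,980

# Assuming 30 layers (e.g. two products x 15 years) at ~3 KB per tile.
total_tiles = tiles_per_layer * 30                    # 2,748,779,069,400
petabytes = total_tiles * 3 * 1024 / 1e15             # roughly 8.4 PB
```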

Maybe I need to reconsider that signed URL on AWS CloudFront after all.

Sample Tiles

Here is a view of a Leaflet map with one of these layers and several tiles from somewhere in Africa. Please ignore the control aesthetics, as they are still a work in progress, but the play buttons work and let the user loop through the years!

Leaflet Tiles from Google Earth Engine

Next time I will talk about the Leaflet maps part of this and how I incorporate Angular into the mix.

Flagstaff Soap Co

Why Soap

My wife and I first started making soap a few years ago. We had always experimented with similar crafts and soap was something that we eventually tried. I actually made the first few batches without any assistance from my wife, but she soon took over and greatly improved on my work.

The Store

It wasn't long until we had piles of different varieties of soap in our house. At the time, I was waiting to hear back about a permanent position with the Coconino National Forest as the Wilderness and Trails Manager on the Red Rock Ranger District, a process that was already many months old. Eventually I heard back from the USFS that I did not get the position, which gave us the motivation to pursue some alternate ideas.

While my job situation was in flux, my wife and I found a cute little space in downtown Flagstaff. We thought, what the heck, let's go for it! We signed a lease in March of that year and began a two-month build-out to get the store ready and our inventory stocked.

Making Everything

We have always prided ourselves on making our own products and not simply reselling products made by some other company. This attitude even applied to all of our furniture!

Building a Retail Counter

Building the Counter for the Store

Building a Table


Soap Work Bench

Building the Counter for the Store

Three Years Later

It has now been three years since we first started, and my only regret is that our space isn't bigger! We have been fortunate that our sales have increased by about 20% year over year. I'm not sure how long that can continue, as eventually we will hit a maximum amount of revenue per square foot of our space.

One of the best things about our business getting older is that everything is easier and faster. Alisha makes soap in half the time compared to when we first opened, and all of the supporting activities have been streamlined.

Even more importantly, we have some wonderful employees who give my wife and me time away from the business. In our first year, we were open seven days a week with no employees!


Owning a business has been a tremendous learning experience and I wouldn't trade it for anything else. While the soap store does not fit into our long term plan, it will be difficult to leave behind.

Soap Work Bench

A Bar of Soap

Averaging Mobile Locations

The Problem

When collecting GPS point data, it is generally accepted that averaging multiple captures leads to improved results. How does a mobile phone determining location with a combination of Wi-Fi, cell towers, and GPS change the data collection process? Should a weighted average be used, based on the accuracy reported by the device?
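One candidate answer to the weighting question is inverse-variance weighting, treating each fix's reported accuracy (in meters) as a standard deviation. This is a sketch of the idea, not a validated field method:

```python
def weighted_position(fixes):
    """Average (lat, lon, accuracy_m) fixes, weighting each by 1/accuracy^2."""
    weights = [1.0 / (acc ** 2) for _, _, acc in fixes]
    total = sum(weights)
    lat = sum(w * f[0] for w, f in zip(weights, fixes)) / total
    lon = sum(w * f[1] for w, f in zip(weights, fixes)) / total
    return lat, lon
```

With equal reported accuracies this reduces to a plain mean; otherwise a single high-accuracy fix dominates the result, which is exactly the behavior an accuracy-based weighting should have.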


Coming Soon

Any Other Ideas?

If you have any please let me know. I will be updating this entry as I explore this problem.

Google Maps Imagery Date

The Problem

One of the issues I have been running into during my internship with the USGS is attaching a date to the very high resolution imagery provided by Google Maps. Without the temporal context, a user cannot provide useful training or validation data, since the land cover type may have changed from one year to another. For example, a forest may have been converted to agricultural land.

Potential Solutions

Ideally I would find a programmatic solution to this issue, as I need to determine the imagery date for thousands of locations. I have found a few partial solutions and will discuss them below.

Google Earth Timeline

Using the Google Earth API plugin, it is possible to find all of the images available for a location. However, there are three primary issues with this method. First, the plugin is deprecated and will only be live until the end of the year. Second, the most recent image in the stack may be a cloud-covered image that was filtered out of Google Maps. Third, the plugin is not even supported on all Chrome versions.

Bing Maps Headers

A tool exists that extracts the capture date from the headers of the imagery, but it hasn't been updated in four years. I'm hesitant to put much effort into exploring this option.

Temporal Comparison Based on Higher Resolution Data

It may be possible to detect whether a change has occurred in the last n years using freely available imagery sources such as Landsat or MODIS. However, I can see two issues with this method. First, there would be a loss in resolution compared to the Google satellite image. Second, it may create false positives in highly variable land cover types.

For example, it would not be difficult to detect a change from forest to urban. However there may be errors in the agricultural land cover type due to changes in crop type or environmental variables such as weather.

This method could likely give me a score for whether a particular location has been the same land cover type for the last n years, while the user tells me which land cover type it is.
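A sketch of what that score might look like, assuming one NDVI value per year for a location (e.g. from annual Landsat composites); the threshold is an illustrative guess, not a calibrated value:

```python
def stability_score(ndvi_by_year, threshold=0.2):
    """Fraction of consecutive year pairs whose NDVI change stays below threshold."""
    pairs = list(zip(ndvi_by_year, ndvi_by_year[1:]))
    stable = sum(1 for a, b in pairs if abs(b - a) < threshold)
    return stable / len(pairs)
```

A score near 1 suggests the land cover has been stable over the period, while a forest-to-urban conversion would produce at least one large year-to-year drop.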

Any Other Ideas?

If you have any please let me know. I will be updating this entry as I explore this problem.

Update 7/10/15

Direct access through the federal government's contract with DigitalGlobe has made this issue moot, as I can request imagery and retrieve the metadata containing the capture date. I have put some of these images into a proof of concept at croplands.org/app/classify.

IBM Extreme Blue

Summer Internship

For the summer of 2015, I am an IBM Extreme Blue intern. It has been an amazing opportunity so far, and I highly recommend it. Instead of doing testing or QA, I am part of a four-person team that takes a high-level idea and builds it into a product with a legitimate business case. At the end of the 12-week process, we present our product to top IBM executives.

Our team is creating a Predictive Analytics environment for IBM’s IT service support team at a major IBM customer. We are working towards improving their capability to track, analyze, and understand their IT service tickets. The project involves tying together several different IBM products and services, extracting data from different sources, performing statistical analysis, making actionable predictions, and providing visualizations through both web and mobile dashboards.

Extreme Blue Team

The team with one of our mentors

USGS Internship

Student Developer

I began my position with the US Geological Survey in May of 2014 as a student developer. My work is in support of a NASA-funded project to map global agricultural lands using remote sensing (satellites). The justification for the project is that with better information on global cropland, food security may be improved and water productivity increased.

My Role(s)

In this position I wear a few different hats. I am the sole software-focused member of the team, supporting the team in various tasks while also creating some specific products for the project. In creating these products, I am responsible for all software development activities, from architecture to implementation to maintenance and everything in between.

Three Parts

The product I am creating has three primary parts: a RESTful server and database, an Angular web application, and a mobile application for data collection.

REST Server

The backend is an instance of Python Flask running on Heroku with a PostgreSQL database. Additional components include the following:

  • Redis caches data and reduces the workload on the server.
  • AWS CloudFront caches many of the GET responses.
  • Python Celery with RabbitMQ handles all of the async tasks.
  • Google Earth Engine provides tiles for different map layers and spatial analysis.

Web Application

Angular and Leaflet dominate the web application, and the map view dominates the application. The web application has several objectives, both to facilitate work on the project and to present products to the public.

  • A simple map view with a legend and opacity controls to display products created by the team.
  • Ability to create, filter, review, and download training and validation data.

Web Application Leaflet Map

Web Application Map View using Leaflet and Google Earth Engine

Mobile Application

One of the most challenging aspects of this project is collecting significant amounts of training and validation data. In the United States this is less of an issue, and the work is less of a focus here, given the amount of agricultural data already available. The goal of the project is global in scale, and remote areas of Africa are even more important for protecting food security.

One possible way to collect much of the information needed is to leverage volunteers with the UN's Food and Agriculture Organization. The FAO currently collects statistics on agriculture and yield from local governments. Our project is a natural complement to these statistics, since our method of obtaining the information is removed from local or regional politics.

The easiest way to collect this information today is with a mobile application. I am currently working on a Cordova-based app to facilitate this.

Mobile Application

Collect View