What is Kepler.gl?
Kepler.gl, or just Kepler, is an open-source browser-based mapping visualization tool built on top of deck.gl, an open-source WebGL framework designed to visualize large-scale geospatial datasets. Kepler offers extensive capabilities for mapping geospatial data without the need to download any software or send data to another server. All the magic happens at home, on your machine.
In this tutorial, we’ll look at how to make a few different types of visualizations using a data set already provided by Kepler. Use this link to open Kepler in a new tab and follow along.
Loading Data
When Kepler is first opened, it gives the option to load data. Kepler accepts CSV, GeoJSON, and Arrow files. There is also the option to load a JSON. In this context, JSONs are used for configuration rather than uploading data. More information on geospatial data types can be found here.
For this tutorial, we will be working with a data set provided by Kepler. Usually, this data set can be loaded directly from Kepler’s demo page by clicking on the Try Sample Data tab in the upper right corner. As of the update to Kepler 3.0, however, this is no longer possible, so we will load it manually. Download the data.csv file from Kepler’s GitHub repo and upload it to follow along.
When you upload your data, the map probably won’t look the same as the image above, but that’s not important just yet. To start, let’s hide the control panel and change the perspective to 3D (buttons outlined in red). From here you should see data already loaded onto the map. Have a look around and get a feel for the controls.
Exploring Data
Before diving into visualizations, let’s take a look at the data itself. The dataset we are working with represents New York City taxi trips for a single day, January 15th, 2015.
Bring back the control panel and mouse over to the data set. An icon will appear that opens up the data table (see red circle). Now we can see a table showing our data.
Besides spatial data, Kepler is capable of reading temporal and attribute data. Here we see the table provides spatial data in the form of latitude and longitude of both pickups and dropoffs. There is also a cost breakdown for each trip including fare and tip amount as well as pickup and dropoff times. An example record from the table is provided below.
VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | fare_amount | tip_amount | total_amount |
---|---|---|---|---|---|---|---|---|---|---|---|
2 | 2015-01-15 19:05:39 +00:00 | 2015-01-15 19:23:42 +00:00 | 1 | 1.59 | -73.99389648 | 40.75011063 | -73.97478485 | 40.75061798 | 12 | 3.25 | 17.05 |
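To get a feel for how the temporal and attribute columns relate, here is a minimal Python sketch that reads the sample record above and derives the trip duration and the tip as a share of the fare. It assumes the timestamps in the CSV are formatted exactly as shown in the table; the helper names `trip_minutes` and `tip_share` are made up for this example and are not part of Kepler.

```python
import csv
import io
from datetime import datetime

# The sample record from the table above, rewritten as CSV text.
# Only the columns needed here are included.
SAMPLE_CSV = (
    "tpep_pickup_datetime,tpep_dropoff_datetime,fare_amount,tip_amount\n"
    "2015-01-15 19:05:39 +00:00,2015-01-15 19:23:42 +00:00,12,3.25\n"
)

# Assumption: timestamps are formatted exactly as shown in the table.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def trip_minutes(row):
    """Return the trip duration in minutes for one CSV row."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
    return (dropoff - pickup).total_seconds() / 60


def tip_share(row):
    """Return the tip as a fraction of the fare."""
    return float(row["tip_amount"]) / float(row["fare_amount"])


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(f"{trip_minutes(row):.2f} min, tip share {tip_share(row):.1%}")
```

Derived fields like these could be added to the CSV before uploading if you want to color or filter by them in Kepler.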
The data can also be explored spatially using Kepler’s tooltip function. This allows us to mouse over individual features on the map and see a popup with information related to the selected point.
In the interactions menu of Kepler’s control panel, we turn on the tooltip function. In the panel below, we can select which fields from the data table should appear when we mouse over a given point.
Mapping Data
Now we will jump into creating visualizations of the taxi trips. While Kepler is capable of striking visualizations, the real goal is to gain deeper insights into our data that would not be possible without maps.
Ride Start & End Points
In the layers panel, we see which layers are currently being displayed on the map. The layers are generated from the underlying spatial data. Multiple layers can be created from the same data table. For example, ride startpoints, ride endpoints, and ride trajectories are three ways in which our data can be mapped spatially.
Let’s delete the pickup -> dropoff line layer. We will keep pickup -> dropoff arc for later. Let’s also make the dropoff points visible so they are displayed with the pickup points.
Now we make the start and end points different colors. The dropdown arrow opens up styling options for the dropoff points layer. The three dots next to Fill Color open up the coloring options. If a field is selected under Color Based On, click the x to clear it. Later on, we will dynamically color our layers. But for now, we just want one color. Let’s also make the points smaller, setting 5 as the radius. This will keep the map from being too visually cluttered.
Now if we zoom in, we see the points mapped in two colors. Let’s change the layer blending to additive so any area with a significant concentration of points will become lighter. Now if we move around the map, zooming in and out, we detect high densities of points. Subtractive blending would have a similar effect but make the overlapping points darker instead of lighter.
Ride Trajectories
Next, we will take a look at ride trajectories. Deleting the dropoff and pickup points will help keep the control panel clean and the map free of clutter. Now we will be working with the pickup -> dropoff arc layer. Make it visible with the eye icon in the control panel.
We’ll color the arcs based on the number of passengers on each trip. Select the three dots next to the arc color and use passenger_count as the field the color is based on. Since this is a number and not a category, we need a continuous color scale as opposed to discrete color categories. It is also helpful to use a quantile color scale here. This ensures a roughly equal number of trips for each color category.
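To see why a quantile scale evens out the color categories, here is a small Python sketch of the idea. It is not Kepler’s actual implementation, just one simple way to compute quantile breaks; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def quantile_breaks(values, n_classes):
    """Return n_classes - 1 thresholds that split values into similar-sized groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_classes] for i in range(1, n_classes)]


def color_class(value, breaks):
    """Return the index of the color class a value falls into."""
    return bisect_right(breaks, value)


# Hypothetical, skewed values (for example trip distances in miles).
values = [0.5, 1.0, 1.5, 2.0, 3.0, 4.5, 8.0, 17.0]
breaks = quantile_breaks(values, 4)
counts = Counter(color_class(v, breaks) for v in values)
print(breaks, dict(counts))
```

With these values every class ends up with two entries, even though the data is heavily skewed. A field with many repeated values, such as passenger_count, produces ties at the thresholds, which is why the classes are only roughly equal in size.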
Orienting ourselves from 3D to 2D will help us detect overlapping trajectories.
Let’s also change the layer blending back to normal so arc overlaps don’t end up as blobs of white light. We’ll also change the opacity to 0.3 so we can see more of the data. Feel free to experiment with different opacity values.
This is still a tremendous mass of visual information, so next we will filter based on time. We select the filter section of the control panel and add a new filter, using tpep_pickup_datetime for the field.
One of the many benefits of Kepler is that it also understands non-spatial data. Kepler recognizes this new field relates to time and provides a suitable interface for filtering. Let’s drag the selectors until we are left with a 15-minute interval. Clicking the play button, we see a time window slowly move forward and dynamically filter the data on the map.
Now let’s create one more filter for trips having a total payment of 40 dollars or more. This time we select total_amount as the field. Enter 40 as the number on the lower end of the range. Now when we play the animation it becomes clear that trips to the airports around New York constitute the majority of trips with a payment of 40 or more dollars.
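Conceptually, the two filters combine like a logical AND: a trip stays on the map only if it falls inside the time window and meets the payment threshold. The following Python sketch illustrates that logic with a few made-up trips. The function `filter_trips` and the sample data are hypothetical, and the window is advanced in coarse 15-minute jumps rather than the smooth motion Kepler shows.

```python
from datetime import datetime, timedelta


def filter_trips(trips, window_start, window_minutes=15, min_total=40.0):
    """Keep trips picked up inside the time window that cost at least min_total."""
    window_end = window_start + timedelta(minutes=window_minutes)
    return [
        trip for trip in trips
        if window_start <= trip["pickup"] < window_end
        and trip["total_amount"] >= min_total
    ]


# Hypothetical trips; only the two filtered fields are included.
trips = [
    {"pickup": datetime(2015, 1, 15, 19, 5, 39), "total_amount": 17.05},
    {"pickup": datetime(2015, 1, 15, 19, 10, 0), "total_amount": 58.30},
    {"pickup": datetime(2015, 1, 15, 19, 40, 0), "total_amount": 45.00},
]

# Sliding the window forward roughly mimics pressing the play button.
start = datetime(2015, 1, 15, 19, 0)
frames = [
    filter_trips(trips, start + timedelta(minutes=15 * step))
    for step in range(3)
]
for step, frame in enumerate(frames):
    print(step, [trip["total_amount"] for trip in frame])
```

Each entry in `frames` corresponds to what would remain visible for one position of the time window.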
Heat Map
Next, we’ll create a brand new layer from the data. Let’s delete all the previous layers and start fresh. We’ll be building a heatmap based on a grid.
We click on add layer and specify the type as grid. We’ll be basing the grid on dropoff points so we select the dropoff latitude and longitude in the corresponding fields. We can also name the layer heatmap.
Now we select a continuous color scale for the heatmap. In the image, I’ve chosen a more traditional white to red. If you can’t see the grid, make sure the layer isn’t hidden in the control panel. We will also make the color based on tip_amount. Kepler performs the data aggregations in your browser, creating an average tip amount for each grid cell.
Now let’s make the grid size 0.25 (the value is in kilometers) for a more manageable scale. This means each cell is now 250 x 250 meters in size. This granularity works well for a city-level analysis. By reducing the opacity, we better see the underlying map of the city.
We’ll turn on height and select height based on Point Count. Let’s change the map perspective back to 3D and we can see the data in a new way. The darker the color, the higher the tip amount, and the higher the column, the more data points are in that grid cell. If you want to amplify the height differences, you can increase the Height Multiplier.
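The grid layer boils down to two aggregations per cell: an average (here the tip amount, used for color) and a count (used for height). Below is a rough Python sketch of that idea. It is an approximation for illustration and not how Kepler computes its grid: it converts degrees to kilometers with a fixed factor, assumes a reference latitude near Manhattan, and uses made-up dropoff points. The names `cell_of` and `aggregate` are hypothetical.

```python
import math
from collections import defaultdict

CELL_KM = 0.25           # grid size used in the tutorial
KM_PER_DEG_LAT = 111.32  # rough length of one degree of latitude
REF_LAT = 40.75          # assumption: a latitude near Manhattan


def cell_of(lat, lon):
    """Map a coordinate to a (row, col) grid cell of roughly CELL_KM per side."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(REF_LAT))
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    return row, col


def aggregate(dropoffs):
    """Return point count and average tip per cell for (lat, lon, tip) tuples."""
    tips = defaultdict(list)
    for lat, lon, tip in dropoffs:
        tips[cell_of(lat, lon)].append(tip)
    return {
        cell: {"point_count": len(values), "avg_tip": sum(values) / len(values)}
        for cell, values in tips.items()
    }


# Hypothetical dropoffs: two at the same spot in Midtown, one near JFK.
dropoffs = [
    (40.75061798, -73.97478485, 3.25),
    (40.75061798, -73.97478485, 1.75),
    (40.6413, -73.7781, 10.00),
]
cells = aggregate(dropoffs)
for cell, stats in cells.items():
    print(cell, stats)
```

In the map, `avg_tip` would drive the color of a column and `point_count` its height.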
Now to make the map more readable for someone unfamiliar with the city, we will add labels. We select the basemap section of the control panel and select which features should be visible. We can also turn on the legend to see which values correspond with which colors.
Now that we’ve created the map, let’s interpret it. The darker red indicates a higher average tip amount. The higher a cell’s height, the more trips that have ended there. It seems that the highest tip amounts with a significant number of trips occur at dropoff points at the airports. Take a look yourself at the data you’ve prepared and see if you detect anything unusual or unexpected.
Exporting & Saving
Now that we’ve created a map, we probably want to share it. Kepler offers a few options.
HTML
The easiest way to share a map is to export it as an .html file. This allows anyone with a browser and internet connection to open the map. The HTML file also contains all of the data you uploaded into Kepler. If there is a lot of data, these file sizes can become unwieldy. Double-check if there are any data privacy concerns before sharing a map this way.
When exporting, you will be asked to provide a Mapbox access token for the basemap. If you don’t provide one, your data will still be visible, but there won’t be a map underneath.
JSON
We can also export the map configuration as a .json. The JSON will configure another Kepler instance to display the same visualizations provided that the same underlying data has been uploaded as well.
Cloud Sharing
Kepler also provides the option to share maps via the cloud using Dropbox, Foursquare, or CARTO.
Image
The results can also be shared with a simple static image. Your map will lose its interactivity and animation capabilities, but for some simple analyses, an image more than suffices.
Other Features
This tutorial is a very brief introduction to Kepler and its capabilities. Kepler also offers integrations with Jupyter and an API. Check out the docs for some ideas of what you can accomplish with Kepler.
Quirks
Kepler isn’t a perfect product and you will encounter issues. For example, as of writing this tutorial, much of the demo data can’t be loaded into Kepler on their demo page. JSON configuration files sometimes go unrecognized, even when the same data has been uploaded. And Kepler may experience performance issues with datasets that are not exceptionally large by general standards.
Based on my experience with Kepler, here are a few ideas for optimization:
- Perform data aggregations in your database or notebook before uploading your dataset.
- Clean up data fields to reduce dataset size, e.g. the tip_amount field has more significant figures than are necessary for currency.
- Recognize the resources a browser-based visualization software will require from your machine and work accordingly.
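As an illustration of the field cleanup idea, here is a minimal Python sketch that rounds selected columns of a CSV before upload. The function `slim_csv`, the sample row, and the chosen precisions are assumptions for this example: two decimals for currency and five for coordinates (roughly one meter) may or may not suit your data.

```python
import csv
import io


def slim_csv(text, decimals):
    """Round the given columns; decimals maps column name to decimal places."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, places in decimals.items():
            row[column] = f"{float(row[column]):.{places}f}"
        writer.writerow(row)
    return out.getvalue()


# Hypothetical raw row with more precision than needed.
raw = (
    "pickup_longitude,pickup_latitude,tip_amount\n"
    "-73.99389648,40.75011063,3.250000\n"
)
slim = slim_csv(
    raw,
    {"pickup_longitude": 5, "pickup_latitude": 5, "tip_amount": 2},
)
print(slim)
print(len(raw), "->", len(slim), "characters")
```

The savings per row are small, but they add up across hundreds of thousands of records.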
Conclusion
Kepler is a powerful tool for visualizing static datasets in your browser. Kepler is not designed to perform spatial operations or work with dynamically loaded data. Just like any other tool in your tool belt, it cannot and will not do it all. But Kepler certainly has its place in the repertoire of any geospatial professional.
I hope this tutorial helped you get started with Kepler and see its potential. If you need help with a specific geospatial problem or need similar content written for your own blog, please reach out.
Happy mapping!