How Far Can New York Taxi Data Take You?
This is a conceptual piece showcasing location-based business intelligence in Tableau. It comes from DataBlick Labs, which showcases projects created jointly by Allan Walker, Noah Salvaterra and Anya A'Hearn. Data from the New York City Taxi and Limousine Commission is animated in Tableau, showing all the taxis flowing through New York City for part of a day (the "How Far" tab). The concept becomes relevant to a merchant or transaction processor if the taxi data can be tied to the consumer via a payment or other loyalty program data capture (the "Where" tab). Current customers could then be compared to potential customers to understand whether there are differences in demographics and travel patterns. Using "nearest neighbor" analysis, one could see all the times a taxi or consumer passed by or stopped at a specific merchant over the course of a day or other given time period (the "What" tab). This information could be used to incentivize purchases at a specific merchant. Finally, the "What's Next" tab shows how this geospatial and consumer information can be explored and analyzed further inside Tableau to gain valuable consumer insights. Please explore the viz below, as well as highlights of a few capabilities explored using the NY Taxi data:
- Video version - demo of concept
- Downloadable version, for users to be able to interact with it locally, and see the animation via the standard pages shelf (a subset of the data)
- Animation of the Taxis pages shelf via the JS API
- Alerting capabilities - i.e. when a certain event occurs, such as the number of taxis at Grand Central dipping below a certain point, pop up an alert
Video Version:
How Far Tab: Data from the New York City Taxi and Limousine Commission is animated in Tableau, showing all the taxis flowing through New York City for part of a day.
Where Tab: The concept becomes relevant to a merchant or transaction processor if the taxi data can be tied to the consumer via a payment or other loyalty program data capture. Current customers could then be compared to potential customers to understand whether there are differences in demographics and travel patterns.
What Tab: Using "near" analysis, one could see all the times a taxi or consumer passed by or stopped at a specific merchant over the course of a day or other given time period. This information could be used to incentivize purchases at a specific merchant.
What's Next Tab: Finally, the "What's Next" tab shows how this geospatial and consumer information can be interacted with and analyzed further inside of Tableau to gain valuable consumer insights.
Alerts and Animation via JS API
By looping over an integer array that is mapped to interpolated time, we can filter the worksheet on each step, creating a pseudo-animation effect.
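To make the idea concrete, here is a minimal Python sketch of the frame loop. It does not call the Tableau JS API at all; it only illustrates how an integer array can stand in for time steps and how each integer selects the rows to show. The names `build_frames` and `rows_for_frame`, the 5-minute step, and the sample rows are all made up for illustration.

```python
from datetime import datetime, timedelta


def build_frames(start, end, step_seconds):
    """Return (frame_index, timestamp) pairs covering start..end inclusive."""
    count = int((end - start).total_seconds() // step_seconds)
    return [(i, start + timedelta(seconds=i * step_seconds)) for i in range(count + 1)]


def rows_for_frame(rows, frame_index):
    """Keep only rows tagged with this frame, mimicking a worksheet filter."""
    return [row for row in rows if row["frame"] == frame_index]


# Hypothetical data: each row is one taxi position already tagged with a frame.
rows = [
    {"trip": "A", "frame": 0},
    {"trip": "A", "frame": 1},
    {"trip": "B", "frame": 1},
    {"trip": "B", "frame": 2},
]

# Assumed window and step size, chosen only to keep the example small.
frames = build_frames(datetime(2013, 12, 1, 8, 0), datetime(2013, 12, 1, 8, 10), 300)

for index, stamp in frames:
    visible = rows_for_frame(rows, index)
    print(index, stamp.isoformat(), [row["trip"] for row in visible])
```

In the real viz the filter step would be applied to the worksheet rather than to a Python list, with a pause between frames so the eye reads it as motion.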
View the animation here.
Allan Walker's map-tastik process for creating New York City Taxi & Limousine Commission (NYC TLC) Trips for 1st of December 2013 in Tableau
- Hard drive sent to NYC TLC under Freedom of Information (FOI) Request.
- Hard drive populated with 2 series of comma separated value (CSV) files for one year (2013);
- Origin and Destination Latitude and Longitude, Pick Up and Drop Off Date/Times, Duration, Medallion ID
- Medallion ID, Fare in USD, Tip in USD, Number of Passengers, Date/Time
- Imported CSV files to Postgres/PostGIS
- Exported the first series, (a), WHERE date/time = 1/12/2013 (1 December 2013) and Origin and Destination Lat/Long !=NULL
- Create index column “Join”
- Imported CSV to QGIS as two files, Origin & Destination Latitude and Longitude
- Clipped Point files with NYC political boundary file “boroughs” to shoreline as a mask, ensuring Origins and Destinations were on land
- Joined Origins and Destinations back as one file as a table, 360,000 Node Pairs
- Batched up Origins and Destinations into 100,000 blocks
- Using ESRI Network Analysis extension, and NYC “LION” Street Database, created routes using “Join” as “Name” and “RouteID”
- Parameterized “LION” with the following rules: U-Turns allowed, adhere to Max Speed limits, adhere to one-way
- Used NAVTEQ Database Time of Day setting for 1/12/2013
- Snapped all Origins and Destinations node pairs to network edges
- Generate multiple shortest path routes using Dijkstra's algorithm as ESRI multipolyline shapefiles per batch
- Generate ESRI point shapefiles per batch using ET Wizards for ESRI polyline to point (vertices) to give network path order
- Merge all point files
- Join back to CSV as left inner on “Join” where “Name” was converted from string to Long (integer)
- Export to TSV
- Imported back to ESRI as joined file and reverse geocoded by ESRI World Rooftop for every point, allocating street address, city, state and 5-digit ZIP code within a radius of 100 ft
- Generate POI database by extracting OpenStreetMap Metro Extracts for NYC filtered for Amenities, Shop and Tourism where Name !=NULL
- Reverse Geocode POI database by ESRI World Rooftop for every point, allocating street address, city, state and 5-digit ZIP code within a radius of 100 ft
- Filter POI point file for Street Address !=NULL
- Export POI point shapefile to TSV
- Left outer join Taxi TSV and POI database TSV on Street Address
- Allocate the great-circle distance in miles (the spherical law of cosines, not Vincenty's ellipsoidal formula) in both tables as a Calculated Field: =ACOS(COS(RADIANS(90-TaxiLat1)) *COS(RADIANS(90-POILat2)) +SIN(RADIANS(90-TaxiLat1)) *SIN(RADIANS(90-POILat2)) *COS(RADIANS(TaxiLong1-POILong2))) *3958.75
- Animate visualization based on Noah's interpolated date time using a case statement to convert to integer
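The distance calculated field in the list above has the form of the spherical law of cosines written with colatitudes (90 minus latitude). As a rough cross-check outside Tableau, the same arithmetic can be sketched in Python. The function name `great_circle_miles` and the sample coordinates (approximate positions for Grand Central and Times Square) are illustrative assumptions; the 3958.75 mile radius is taken from the calculated field. The clamp before `acos` is an addition to guard against floating-point rounding and is not part of the original formula.

```python
import math

EARTH_RADIUS_MILES = 3958.75  # same constant as the calculated field


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the spherical law of cosines."""
    colat1 = math.radians(90 - lat1)  # colatitude of the taxi point
    colat2 = math.radians(90 - lat2)  # colatitude of the POI
    dlon = math.radians(lon1 - lon2)
    cos_angle = (
        math.cos(colat1) * math.cos(colat2)
        + math.sin(colat1) * math.sin(colat2) * math.cos(dlon)
    )
    # Added guard: rounding can push the value slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_MILES


# Approximate, illustrative coordinates only.
grand_central = (40.7527, -73.9772)
times_square = (40.7580, -73.9855)
print(round(great_circle_miles(*grand_central, *times_square), 3))
```

For points only a few hundred feet apart, the law of cosines loses some precision in floating point, so a haversine form may be the safer choice when distances that small matter.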
Noah's math-tastick-ness:
"I seriously don't know what to say other than time interpolation. I'm seriously not trying to be sassy."