Christoph Oberhofer

The Making of Switch - Part 2: A location-based app

Highly motivated and ready to switch into ‘build mode’, I started experimenting with the idea of building a location-based app. But first, I had to get more familiar with the key ingredients of a location-based app. Having never built one myself, I had a clear gap in my knowledge about where to start. To stay focused, I kept the goal in front of me at all times: tracking train journeys.

Thought experiment

To get a better perspective on the problem space, I started a thought experiment. I imagined myself sitting on a train and trying to figure out which train I was on, using only the tools at hand, namely my phone and a spotty internet connection.

First attempt

  1. Open the Scotty app (Austrian Railways - Schedule)
  2. Navigate to the Live Map view
  3. Tap the train that’s closest to my current location
[Images: Scotty Live map; Scotty timetable]

That was easy enough, but it came with a catch: the approach clearly conflicted with the constraints I had originally outlined, such as limited internet availability and independence from the train agency.

Second attempt

During the second attempt, I forced myself to use only my current location, the time of day and the train schedule. As it turned out, a train schedule by itself isn’t very helpful when you are on a train between stations. So I sat back and waited to arrive at the next stop. Once the train stopped, I noted the current station’s name, the arrival time, and the list of departing trains, simply by copying the timetable at the platform.

Looking at the departure monitor below as your train departs, it should be obvious that your train is the one at the top of the list, provided the platforms match.

[Image: Departure monitor]
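The platform check above can be expressed as a tiny lookup. This is a minimal sketch, not the app’s code. It assumes the monitor has been copied into a list of entries, each with a platform and a scheduled time, and `Departure` and `my_train` are hypothetical names. It is a slight variation on reading the top line: it first filters to my platform and then picks the earliest departure at or after the current time.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass(frozen=True)
class Departure:
    train: str        # train name as shown on the monitor
    platform: str     # platform the train departs from
    scheduled: time   # scheduled departure time


def my_train(departures: List[Departure], my_platform: str, now: time) -> Optional[Departure]:
    """Return the earliest departure from my platform at or after `now`, if any."""
    candidates = [d for d in departures if d.platform == my_platform and d.scheduled >= now]
    # On a time-sorted monitor, this is the top entry of the platform-filtered list.
    return min(candidates, key=lambda d: d.scheduled, default=None)
```

Filtering by platform first avoids picking a train that happens to leave at the same minute from a different platform.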

Once the train had departed and was zooming through the landscape, figuring out which train I was on became difficult again, simply because the schedule only references the stops along the way. Additionally, how would a system know that I’m traveling on a train and not in a car? Then I realized I was missing yet another crucial piece: knowledge about the tracks the train was actually running on.
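Knowing the tracks would let a system answer the train-versus-car question geometrically, because a fix that stays close to a rail line is more likely to come from a train. Here is a minimal sketch. It assumes the track is available as a polyline of (lat, lon) vertices, and the function names are hypothetical. The equirectangular projection is only an approximation that works over short distances.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0
LatLon = Tuple[float, float]


def _to_xy(lat: float, lon: float, lat0: float) -> Tuple[float, float]:
    """Project to local metres with an equirectangular approximation around latitude lat0."""
    x = math.radians(lon) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def distance_to_track_m(point: LatLon, track: List[LatLon]) -> float:
    """Approximate shortest distance in metres from `point` to the polyline `track`."""
    if len(track) == 1:
        track = track * 2  # degenerate polyline: distance to a single vertex
    lat0 = point[0]
    px, py = _to_xy(point[0], point[1], lat0)
    best = math.inf
    for (a_lat, a_lon), (b_lat, b_lon) in zip(track, track[1:]):
        ax, ay = _to_xy(a_lat, a_lon, lat0)
        bx, by = _to_xy(b_lat, b_lon, lat0)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Position of the perpendicular foot along the segment, clamped to its endpoints.
        t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
        best = min(best, math.hypot(px - (ax + t * dx), py - (ay + t * dy)))
    return best
```

A threshold such as 50 m (an arbitrary illustrative value) could then separate fixes on the line from fixes on a nearby road. A road running parallel to the tracks would still be ambiguous.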

Data Needs

After that trip, I summarized my data requirements.

To make use of the schedule, I needed to match it against my phone’s movement pattern. Movement, in the context of Switch, is the combination of physical activity, like walking or driving, and geographical location. This is exactly the kind of data Google Maps collects when its Timeline feature is turned on. Timeline lets you go back in time and revisit locations you’ve been to in the past. It is shockingly detailed, and even provides insight into the means of transportation.

[Images: Google Timeline map; Google Timeline list]

I wished I had access to this data, but unfortunately there isn’t any API available for third parties. Nevertheless, the existence of this feature gave me hope that apps are indeed capable of tracking movement accurately, even when they run in the background without user interaction.

The path forward was clearer now: carefully combine movement and train schedule data, shake it, and enjoy the results.
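To give a rough idea of what that combination could look like, one option is to score each scheduled trip by how many observed stop visits it explains. This sketch assumes stop visits have already been derived from movement data and that times are minutes after midnight. `best_matching_trip` and its default tolerance are hypothetical, and a real matcher would also need to handle ties and missed stops.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Visit = Tuple[str, int]  # (stop_id, minutes after midnight)


def best_matching_trip(
    visits: List[Visit],
    stop_times: Dict[str, List[Visit]],
    tolerance_min: int = 3,
) -> Optional[Tuple[str, int]]:
    """Return (trip_id, matched_visits) for the trip explaining the most visits, or None."""
    scores: Counter = Counter()
    for trip_id, scheduled in stop_times.items():
        for stop_id, minute in visits:
            # A visit matches if the trip serves that stop within the time tolerance.
            if any(s == stop_id and abs(m - minute) <= tolerance_min for s, m in scheduled):
                scores[trip_id] += 1
    return scores.most_common(1)[0] if scores else None
```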

Data Sources

Now, with my data needs defined, the next step was to figure out how to source the data. I decided to split this part into two streams: movement data and scheduling data.

Movement Data

Today’s smartphone operating systems offer various ways to access movement data, including location and activity. Fortunately, accessing this information is only possible after users grant permission to the consuming app, protecting their privacy. The movement data is a combination of:

  1. Location data, the approximate geographical location of the person
  2. Physical activity, the type of activity currently being performed, such as walking, cycling or driving
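One way to model these two inputs is as separate timestamped streams that are joined by time. The sketch below uses hypothetical types, not the Android API. It pairs each location fix with the most recent activity reading at or before it.

```python
from bisect import bisect_right
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Activity(Enum):
    STILL = "still"
    WALKING = "walking"
    ON_BICYCLE = "on_bicycle"
    IN_VEHICLE = "in_vehicle"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class LocationFix:
    timestamp_ms: int
    lat: float
    lon: float
    accuracy_m: float  # reported radius of uncertainty


@dataclass(frozen=True)
class ActivityReading:
    timestamp_ms: int
    activity: Activity
    confidence: int  # 0..100


@dataclass(frozen=True)
class MovementSample:
    fix: LocationFix
    activity: Optional[ActivityReading]  # None if no reading preceded the fix


def pair_movement(fixes: List[LocationFix], readings: List[ActivityReading]) -> List[MovementSample]:
    """Pair each fix (in time order) with the latest activity reading at or before it."""
    readings = sorted(readings, key=lambda r: r.timestamp_ms)
    times = [r.timestamp_ms for r in readings]
    samples = []
    for fix in sorted(fixes, key=lambda f: f.timestamp_ms):
        i = bisect_right(times, fix.timestamp_ms)  # number of readings at or before the fix
        samples.append(MovementSample(fix, readings[i - 1] if i > 0 else None))
    return samples
```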

There’s a dedicated chapter on movement data collection: Part 3 - Movement Data

Scheduling Data

Whereas movement data is recorded on the phone directly, data about a train’s schedule is provided by a third party. Other services, such as Google Maps, also depend on public transportation schedule data provided by the respective agencies.

[Images: Google Maps; Google Maps trains]

During my research I stumbled upon GTFS (General Transit Feed Specification), a file format specification for the public transportation sector. It defines scheduling information and even includes geographical details about stops and routes. This was exactly what I was looking for.
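To give a flavour of the format: a GTFS feed is a set of CSV files, and `stops.txt` carries columns such as `stop_id`, `stop_name`, `stop_lat` and `stop_lon`. The sketch below parses that file from a string and uses the haversine formula to find the stop nearest to a fix. The sample rows are made up. Rows without coordinates, which GTFS permits for some location types, are skipped.

```python
import csv
import io
import math
from typing import Dict, Tuple

Stop = Tuple[str, float, float]  # (stop_name, lat, lon)


def load_stops(stops_txt: str) -> Dict[str, Stop]:
    """Parse the text of a GTFS stops.txt file into {stop_id: (name, lat, lon)}."""
    stops = {}
    for row in csv.DictReader(io.StringIO(stops_txt)):
        if not row.get("stop_lat") or not row.get("stop_lon"):
            continue  # e.g. generic nodes without coordinates
        stops[row["stop_id"]] = (row.get("stop_name") or "", float(row["stop_lat"]), float(row["stop_lon"]))
    return stops


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(min(1.0, math.sqrt(a)))


def nearest_stop(stops: Dict[str, Stop], lat: float, lon: float) -> str:
    """Return the stop_id closest to (lat, lon)."""
    return min(stops, key=lambda sid: haversine_m(lat, lon, stops[sid][1], stops[sid][2]))
```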

Since the train schedule is such an essential part of Switch, I dedicated an entire chapter to it: Part 4 - GTFS Timetable

Data persistence

With the data formats figured out, I had to find a way to store all this data. Keep in mind that some of the requirements forced me to keep data stored locally on the user’s device, with no server-side component allowed. There are plenty of storage options on Android: you can store data in files, or in a key-value store like SharedPreferences. Yet another option is SQLite, which is natively supported in Android apps.

SQLite

I decided to use SQLite for both movement and scheduling data. Here’s why:

  1. It’s widely supported across all platforms, including desktop, which is especially important for testing and debugging.
  2. SQLite operates on a single file, reducing complexity and making backup scenarios super simple.
  3. SQLite is a relational database, supporting powerful querying techniques to offload some of the work to the SQL engine, instead of application code.
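As an example of the third point, a self-join can answer “which trips stop at A and later at B?” entirely inside SQLite. The sketch runs on the desktop with Python’s built-in `sqlite3` module. The two tables are a simplified, hypothetical subset of GTFS, and departure times are stored as minutes after midnight instead of GTFS’s `HH:MM:SS` strings.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE stops (
    stop_id   TEXT PRIMARY KEY,
    stop_name TEXT NOT NULL
);
CREATE TABLE stop_times (
    trip_id       TEXT NOT NULL,
    stop_id       TEXT NOT NULL REFERENCES stops (stop_id),
    stop_sequence INTEGER NOT NULL,
    departure_min INTEGER NOT NULL,  -- minutes after midnight (simplified)
    PRIMARY KEY (trip_id, stop_sequence)
);
CREATE INDEX idx_stop_times_stop ON stop_times (stop_id);
"""

# Trips serving `origin` and, later in the same trip, `destination`.
TRIPS_FROM_TO = """
SELECT a.trip_id, a.departure_min, b.departure_min
FROM stop_times AS a
JOIN stop_times AS b
  ON b.trip_id = a.trip_id AND b.stop_sequence > a.stop_sequence
WHERE a.stop_id = ? AND b.stop_id = ?
ORDER BY a.departure_min
"""


def trips_from_to(conn: sqlite3.Connection, origin: str, destination: str) -> List[Tuple[str, int, int]]:
    return conn.execute(TRIPS_FROM_TO, (origin, destination)).fetchall()
```

After `conn.executescript(SCHEMA)` and some inserts, `trips_from_to(conn, "A", "B")` returns `(trip_id, departure at A, departure at B)` tuples ordered by departure. The direction check (`stop_sequence`) is handled by the SQL engine instead of application code.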

SQLite isn’t a silver bullet either and comes with its own baggage of problems. Since it’s a relational database, one has to think about schema modeling, indexing, and data migrations.
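For migrations, one common approach (not necessarily the one Switch uses) is to track the schema version in SQLite’s `PRAGMA user_version` and apply numbered steps in order, each in its own transaction. The `movement` table and its columns are hypothetical.

```python
import sqlite3

# MIGRATIONS[i] upgrades the schema from version i to version i + 1.
MIGRATIONS = [
    "CREATE TABLE movement (ts INTEGER NOT NULL, lat REAL NOT NULL, lon REAL NOT NULL)",
    "ALTER TABLE movement ADD COLUMN activity TEXT",
    "CREATE INDEX idx_movement_ts ON movement (ts)",
]


def schema_version(conn: sqlite3.Connection) -> int:
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations, one transaction per step, and return the new version."""
    for target in range(schema_version(conn) + 1, len(MIGRATIONS) + 1):
        conn.execute("BEGIN")
        try:
            conn.execute(MIGRATIONS[target - 1])
            # PRAGMA does not accept bound parameters, so the integer is formatted in.
            conn.execute(f"PRAGMA user_version = {int(target)}")
            conn.commit()
        except Exception:
            conn.rollback()
            raise
    return schema_version(conn)
```

Calling `migrate` on every app start is safe, because steps that have already been applied are skipped.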

One caveat I discovered early on was the lack of support for geospatial data. There are ways to build SQLite binaries with R-Tree support, but this would have meant shipping my own binary instead of using the one provided by the operating system. I accepted this limitation and moved on.
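Without R-Tree support, a common workaround is a bounding-box prefilter on plain indexed `lat`/`lon` columns, followed by an exact distance check in application code. The sketch below assumes a hypothetical `stops` table. It ignores the antimeridian, which should not matter for a regional rail network.

```python
import math
import sqlite3
from typing import List

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def create_stops_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE stops (stop_id TEXT PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL)")
    conn.execute("CREATE INDEX idx_stops_lat_lon ON stops (lat, lon)")  # plain B-tree, not R-Tree


def stops_within(conn: sqlite3.Connection, lat: float, lon: float, radius_m: float) -> List[str]:
    """Return the ids of stops within radius_m of (lat, lon), sorted."""
    d = radius_m / EARTH_RADIUS_M  # angular radius in radians
    dlat = math.degrees(d)
    ratio = math.sin(d) / math.cos(math.radians(lat))
    dlon = 180.0 if ratio >= 1 else math.degrees(math.asin(ratio))
    # Coarse filter: a rectangle that contains the whole circle; the lat range can use the index.
    rows = conn.execute(
        "SELECT stop_id, lat, lon FROM stops WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    ).fetchall()
    # Exact filter: true great-circle distance.
    return sorted(sid for sid, s_lat, s_lon in rows if haversine_m(lat, lon, s_lat, s_lon) <= radius_m)
```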

Data visualization / debugging

Once the ingestion part of the data pipeline was figured out, the next question was how to access this data for debugging purposes. Since all the data was stored on-device and organized in tables, it turned out to be quite difficult to debug and reason about. Once again, I turned to the Google Maps Timeline view for inspiration.

The way I approached this problem was to turn the recorded movement data into events and order them chronologically. I then turned this event stream into a state representation, giving me the ability to step through “time”. This allowed me to plot the data directly onto a map, including the ability to jump from one event to another, making visual debugging much more pleasant.
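The event-to-state idea resembles a reducer. Each event produces a new immutable state, and keeping every intermediate state makes stepping back and forth trivial. The sketch below is minimal, and its event kinds and fields are hypothetical rather than the app’s actual data model.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    ts: int        # milliseconds since epoch
    kind: str      # hypothetical kinds: "location" or "activity"
    payload: tuple


@dataclass(frozen=True)
class State:
    ts: int = 0
    position: Optional[Tuple[float, float]] = None
    activity: Optional[str] = None


def apply(state: State, event: Event) -> State:
    """Pure reducer: derive the next state from the previous state and one event."""
    if event.kind == "location":
        return replace(state, ts=event.ts, position=event.payload)
    if event.kind == "activity":
        return replace(state, ts=event.ts, activity=event.payload[0])
    return replace(state, ts=event.ts)  # unknown kinds only advance the clock


def snapshots(events: Iterable[Event]) -> List[State]:
    """Replay events chronologically; element i is the state right after event i."""
    states: List[State] = []
    state = State()
    for event in sorted(events, key=lambda e: e.ts):
        state = apply(state, event)
        states.append(state)
    return states
```

In a debug view, stepping forward or backward then amounts to moving an index through the `snapshots` list.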

[Images: Debugging map view; debugging events]

This screen turned out to be extremely useful while debugging, and helped me uncover a good number of issues related to movement tracking and journey matching.

Real-World testing

By the time the first iteration of the data collection was in place, I was eager to test it in the real world. This part wasn’t too difficult, since all I had to do was keep my phone with me at all times. Ideally, I wanted to collect a few days’ worth of data, just to get a better understanding of the APIs’ capabilities and their accuracy.

It became clear early in the testing phase that the real world is a messy place, and individual data points are not to be trusted. Neither are guarantees about applications running in the background. This was when I realized, once again, that data collected from phone sensors is highly probabilistic. To make it useful, I could no longer rely on my traditional 0-or-1 thinking, but had to switch paradigms, from purely deterministic to probabilistic modeling.
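As a small example of this probabilistic mindset, noisy activity readings can be combined with Bayes’ rule instead of trusting any single one of them. This sketch assumes the readings are conditionally independent given the true state, which is a naive-Bayes simplification. The sensor rates are invented placeholders, not measured values.

```python
import math
from typing import Iterable

# Invented placeholder sensor model, NOT measured values:
P_VEHICLE_IF_TRAIN = 0.7       # P(reading says IN_VEHICLE | actually on a train)
P_VEHICLE_IF_NOT_TRAIN = 0.1   # P(reading says IN_VEHICLE | not on a train)


def update(prior: float, saw_vehicle: bool) -> float:
    """Bayes update of P(on train) for one reading; prior must be strictly between 0 and 1."""
    if saw_vehicle:
        likelihood_ratio = P_VEHICLE_IF_TRAIN / P_VEHICLE_IF_NOT_TRAIN
    else:
        likelihood_ratio = (1 - P_VEHICLE_IF_TRAIN) / (1 - P_VEHICLE_IF_NOT_TRAIN)
    # Posterior odds = prior odds * likelihood ratio (computed in log space).
    log_odds = math.log(prior / (1 - prior)) + math.log(likelihood_ratio)
    return 1 / (1 + math.exp(-log_odds))


def posterior(prior: float, readings: Iterable[bool]) -> float:
    """Fold a sequence of readings, assuming they are conditionally independent."""
    p = prior
    for saw_vehicle in readings:
        p = update(p, saw_vehicle)
    return p
```

With these placeholder rates, a prior of 0.5 followed by one `IN_VEHICLE` reading gives 0.875. One further non-vehicle reading brings it back down to 0.7.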

In the next part (Part 3 - Movement Data), I describe how I collected and transformed the movement data to make it useful inside Switch.