April 2014 - Vestigial Blog

[The project referred to in this post can be found at http://vestigial.org/MMT/ ]

I’ve been looking for some better tools to produce interactive, data driven, visually appealing web content. In the past couple of years, I’ve become enamored with R for analysis and visualization, but the graphic results are static. (Sure, there are tricks to create animations, but I’m not looking for workarounds.) I occasionally use Google Charts when I need to put together a quick visualization, but they don’t provide quite the level of flexibility I’d like. I started looking at either working directly with SVG or Canvas DOM elements, or using a Javascript SVG library that would allow me to avoid the low-level details.

The most interesting possibility was the D3 framework. D3 — for Data-Driven Documents — is an entire framework for DOM manipulation in data-driven sites. Browsing through the examples on the D3 site, I recognized several memorable visualizations that have appeared on one of my favorite blogs through the years, Flowing Data. It is possible to use D3 for SVG construction and manipulation while non-data-driven portions of the site are handled by, eg, jQuery or standard Javascript. But as long as you’re already using the bandwidth to load the framework, you might as well drop other frameworks, and use the tools that D3 provides.

I was keen to get some experience with D3. When learning a new technology, I prefer to dive straight in — come up with a short, but non-trivial project that I can build. In this case, I came up with a project that melds technology, data visualization, and ultrarunning. The Massanutten Mountain 100 Mile Trail Run (or MMT) is in a few weeks. In such a long race, runners and crews like to have some idea when they’ll arrive at intermediate points along the course if they’re aiming for some given finish time. Conversely, knowing when they’ve arrived at points along the course can help to predict what sort of finish time to expect. While I’m not the first person to provide a visualization, or some tool to correlate aid station splits with finish times, it’s fun to put together something that’s visually appealing and useful.

Showing data from 2011 and 2013 for finishers who finished between 20:59 and 25:55, race time. The horizontal axis is time and the vertical axis is distance, labeled on the left with mileage at each aid station, and on the right with the aid station name. Each diagonal line, or “track”, represents a single racer. Intermediate times on the graph show first and last racer times of arrival at each aid station (for racers in the result set). Tufte would be proud.

There are several interactive components that I think are noteworthy. First, I provide on-demand data loading. When the page loads, none of the race results is loaded. When a year is selected, the page checks whether the data have been downloaded. If not, it fires an AJAX request, and saves the data so the results can be turned on and off.

The page also provides sliders to limit the result set based on finish time. Each limiter consists of three components: a triangular slider widget (represented by an SVG path element), a time display (represented by an SVG text element), and a vertical guide line (represented by an SVG line element). When the widget is slid, all three elements should move in unison, and the time display should update with the time value at the current point. As a bonus, the vertical guide gets brighter. So I needed to be able to address each element individually, but move them in unison. To build that, first I needed to define the shape for my widget (note that in SVG coordinates, the top left is [0,0]):

var limpolygon = [{x: 0, y: 0}, {x: 10, y: 0}, {x: 5, y: 10}, {x: 0, y: 0}];

I also need to define a function to tell D3 how to interpret the data above. I can use d3.svg.line() to return a function for this purpose. Since I’ve built the object with straight-forward X and Y coordinates, I just need to build a simple function based on those values:

var limline = d3.svg.line()
  .interpolate("linear")
  .x(function(d) { return d.x; })
  .y(function(d) { return d.y; });

Finally, I put the group together. I define a group element (“g”), and append the widget, which I construct in place. I then use the D3 selector to reselect the group, and add the line, then the text:

svg.append("g")  // Create the group, append it to the svg object
  .attr("id", "lim1")
  .attr("transform", "translate("+lim1x+","+limy+")")  // Put it into position
  .append("path")  // Create "path" element for widget, and append it to group
    .attr("id", "lim1_point")
    .attr("d", limline(limpolygon))  // A path has a "d" attribute which gives
                                     // instructions for drawing. Our limline()
                                     // translates raw data into path data
    .attr("fill", "white")
    .on("mousedown", function() {
      capt = "lim1";
      d3.select("#lim1_line").style("stroke-opacity", "1");
    });

d3.select("#lim1").append("svg:line")   // Create line element, append to group
  .attr("x1", limhalfw)
  .attr("y1", ex_pad.top)
  .attr("x2", limhalfw)
  .attr("y2", height - ex_pad.bottom)
  .attr("id", "lim1_line");

d3.select("#lim1").append("svg:text")   // Create text element, append to group
  .attr("id", "lim1_time")
  .text("00:00")
  .style("text-anchor", "end")
  .attr("transform", "translate(-2)");  // Push it 2px to left, for a nice gap

In my view, the coolest trick is making the data respond to the sliders. Whereas showing or hiding the individual years relies on a small number (3) of discrete values, I need to show or hide individual race results based on what is essentially a continuous scale. This involves several steps. First, when adding each track to the graph, I need to attach the finish time to it. Fortunately, HTML5 provides the ability to specify arbitrary data attributes with the data-* construct.

lineset.enter()
  .append("path")
  .attr("data-finish", function(d) {  // Add the data-finish attribute
    return d.finish;
   })
  .style("stroke-opacity", function(d) {
    if (d.finish > finScale(lim2x) || d.finish < finScale(lim1x)) return "0";
    else return ".3";
   })
  .datum(function(d) { return d.splits; })
  .attr("class", "rtrack line " + iden)  // Classes to use later in selectors
  .attr("d", line);

Above is the code to add the tracks. While it might not make much sense if you are not familiar with D3, the key point is the third line. The object has a data object, d, applied to it, and on that line, we set the data-finish attribute to the value of d.finish. (Directly below that, we set the opacity of the line to 0 (making it invisible) if it falls outside of our specified range, or .3 if it is inside the range. But we’re getting ahead of ourselves.)

The next thing we need to a way to translate the location of a slider into a finish time. D3 provides “scales” for just such a purpose. Usually, D3 scales are used to translate some real world value to a pixel position. In this case, we want to do the reverse. I want to build a function that will translate an input domain of a pixel position into the output range of a race time, which in this case is between 0 and 36 hours.

var finScale = d3.scale.linear()
  .domain([lim1x, lim2x])
  .range([0, 36]);

(An astute reader who is familiar with D3 might note that somewhere else, I must have defined a scale to translate from times to pixel values. In that case, someone might wonder why I don’t just use linear.invert() to translate a range value into its corresponding domain value. The answer is that the scale that translates from time to position uses a domain defined by the time of day as a date object, whereas in this case, I want to translate between position and a floating point number representing the finish time in hours (with minutes represented in the fractional portion of the number). Hence the need to define a new scale.)

In this case, lim1x is the initial pixel position of the lower limit slider, and lim2x is the pixel position of the upper limit slider. That produces a function that can be called as finScale(px_pos) to return a corresponding race time. I can then use that in the function that is called when a slider is released.

function updateRange() {
  var fin1 = finScale(lim1x);  // Translate pixel positions to finish times
  var fin2 = finScale(lim2x);

  d3.selectAll(".rtrack").transition(500).style("stroke-opacity", function(d) {
    if (this.getAttribute("data-finish") > fin2 ||
        this.getAttribute("data-finish") < fin1) return "0";
    else return ".3";
  });

  updateAidStationTimes();
}

That function translates the current pixel positions of the sliders into race times (fin1 and fin2). Then it uses d3.selectAll to get every item with the class “rtrack” (which is every race line displayed on the graph), applies a 500ms transition time to the following step, then sets the stroke-opacity style based on a function that checks whether the custom attribute data-finish is in the range defined by the limiters. Finally, it calls updateAidStationTimes(), which I won’t explain in detail here, but it uses d3.extent() with a custom accessor function to find the first and last arrival time of racers in the result set at each aid station. (If you’re particularly interested, you can always dig it out of the source.) It then updates the times displayed on the graph, and moves them into the proper positions.

I started the project on Saturday morning with no experience in D3 (or with SVG graphics), and I finished Sunday evening. I even had time to get out for a bike ride, a run, and a trip to the library to get a movie (which I also watched over the weekend). In the course of this project, I came to appreciate just how massive D3 is. I’m starting to get a feel for it, but this project just scratched the tip of the D3 iceberg (though I’m not sure one would really scratch an iceberg, the tip or otherwise).

[The project referred to in this post can be found at http://vestigial.org/MMT/ ]

When I was very young — maybe around second grade — I started taking piano lessons from Mrs. Bickley, who also happened to be my school’s librarian. But I had little patience for practicing. Working through a new song, note by note, making the same mistakes again and again created such visceral frustration that there came a point when I refused to practice at all. Sometime during fourth or fifth grade, my parents allowed me to quit. I felt some shame whenever I would venture into the library and see Mrs. Bickley, but that couldn’t nearly outweigh how much I disliked practicing.

Some years later, when my brother was taking guitar lessons, I tried again. My parents could take us both to the music store with the sound-proof practice rooms in the back, and we could take our lessons. That time around, I lasted perhaps a year before deciding that it wasn’t for me.

In middle school and high school, I was involved in theater (some, of the musical variety) and choral groups. I wasn’t much of a singer, but it turns out that at certain times and places, if you are a male, it’s sufficient just to be willing to show up. I was the beneficiary of just such a situation.

Many, many years later, I found myself regretting that I didn’t stick with music. I’m envious of people who can sit down with an instrument, and play a song. As time went on, my regret grew more and more palpable. I still had a mini Casio keyboard that I received as a gift as a child. I plugged it in, and started to pick out songs on its miniature keys. After several weeks of the tinny, synthesized keyboard sound, I decided that it was time to commit. I bought a full-sized keyboard, and I started to play again. Whereas I had been an impatient child, I have become an adult who is quite content to spend an hour or two working through a troublesome section of a song.

I can’t claim that I’ve become a very good piano player. But I can pick through a song, and repeat it enough that it eventually starts to sound coherent. One of my favorites is Traumerei by Schumann. It’s short and melancholy and open to oh-so-much expression. Here, I play it too fast, and I’m sort of distracted by the concerns of self-recording. I hammer the keys too hard and unevenly in many places, and I continually turn even eighth notes into dotted eighth notes and sixteenth notes.

But it brings me joy to be able to sit down at an instrument, and play a song.

(I should note that the C above middle C on my keyboard is busted. The key itself is loose and most of the time, it doesn’t play when struck. So if you are paying close attention, you’ll notice that that note is almost entirely absent from the song. Also, I recorded the sound on Garageband on my iPad. For some reason, that app has recently decided that my songs really need occasional, randomly placed gong sounds. So again, if you are paying close attention, you’ll hear a couple of Garageband gongs.)

Vestigial Blog

Monthly Archives: April 2014

Race Progress Visualization Using D3

Traumerei

Climb Some Mountains

Recent Posts

Archives