Week 8: A Grammar of Graphics I (Florence - a grammar for Svelte)

Introduction

In the previous weeks we covered the basics of using HTML, SVG, JS and CSS to construct visualizations in the browser. With the addition of Svelte as a reactive framework, we have a powerful arsenal of tools to help us create all kinds of different visualizations. However, creating even a simple scatter plot can be quite tedious with these basic tools as they are relatively 'low-level' and are not specifically geared towards data visualization. In other words, it would be great if we had another system or library that would make creating visualizations based on data a bit more convenient.

Lucky for us, many such approaches and frameworks exist. The behemoth in this space is D3, which is short for 'Data-Driven Documents'. Started in 2011 by Mike Bostock, it has evolved into a universe of small modular libraries that are incredibly useful for all facets of data visualization. D3 does not use a plug-and-play approach: you can't just say 'make a scatterplot of my data' but instead need to give it relatively fine-grained instructions.

On the other end of the spectrum, we have charting libraries that are very high-level: they make it simple to construct common visualization idioms with only a few lines of code, for example Chart.js and Apache ECharts (originally developed at Baidu). These libraries are useful if you want to make a quick chart, but as a result they offer fewer options for customization, which matters especially if you're interested in building a visualization system.

There are some midway approaches as well. For example, the Interactive Data Lab at UW has built a very powerful visualization language called Vega. Vega (and its high-level sibling Vega-Lite) uses d3 under the hood but provides a specification language (in JSON) that is catered more specifically towards visualization design. The theoretical framework behind approaches like Vega and others (e.g. ggplot2 within the R universe) is often referred to as a grammar of graphics.

Specifying a visualization as a JSON configuration can be counterintuitive and, more importantly, moves us away from the syntax familiar to us from plain HTML/SVG and Svelte templates. For this reason, in this course we will make use of a visualization library, Florence, that is built on the same philosophy of visualization design (i.e. the grammar of graphics) but implements it on top of Svelte's template syntax. In this way, you can use everything you have learned so far, together with the system provided by Florence and the grammar of graphics, to make more advanced visualizations easily.

Using Florence

Florence is built on top of Svelte's component model. This means that Florence exports a series of small, modular building blocks that you can import to help build your visualizations. Just like with our use of d3-scale, we first have to install florence in our local project. We can do so by running npm install @snlab/florence.

After you install florence, you can import it in your Svelte components as per usual. The below sandbox has florence installed and ready for your use.

Core components

Florence is built on top of the grammar of graphics. You will see many familiar concepts fly by: scales, marks, aesthetics etc. So how do we get started with, say, visualizing our data from the Du Bois chart with this system?

The first key component that we need to import is the Graphic. Every Florence graphic starts with the Graphic component. Think of it as a supercharged svg root element.

We can import the Graphic component into our project like so:

import { Graphic } from '@snlab/florence'

After importing we can use it within HTML markup as per normal:

<div class="main-chart">
  <!-- main chart -->
  <Graphic width={500} height={500}>
  </Graphic>
</div>

When you inspect the DOM, you will see that all this has done so far is to create an SVG element with some default attributes.

To start drawing something, we need to use Marks. Florence provides all the marks you need to make visualizations: points, symbols, rectangles, areas, polygons, lines, and labels. These might seem simple, but together they can build very advanced visualizations, a bit like how modern games are basically a giant collection of triangles!

For now, let's just use the Point mark. From its documentation we know that it needs an x and y property. Let's try to plot a point in the middle of our Graphic.

We will do this section in class together.

Solution

Using scales

So far, we have positioned things within our graphic using pixel values (i.e. an x value of 100 will be placed 100 pixels away from the origin). But we already know that scaling is an essential step for any visualization that is based on data (it is called 'data visualization' after all!). To make that process easier, Florence has a built-in understanding of scales to help us create a local coordinate system. Basically, what this means is that we can stop thinking and working in pixels and instead work with the actual data values. We will put this into practice for our Du Bois chart.

Our data has two variables: year and population. So far we have plotted population on the y-axis, following Du Bois, but today it is much more common to have time on the x-axis. We will create two scales for our data, using d3-scale. We will only specify the 'domain' (the min/max of our data). We don't need to add the 'range' (the min/max of our screen pixel space) as we are going to let Florence figure that out.

import { scaleLinear } from 'd3-scale'
const scaleX = scaleLinear().domain([1740, 1900])
const scaleY = scaleLinear().domain([0, 8000000])
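To make the domain/range idea concrete, here is a minimal hand-rolled sketch of the mapping a linear scale performs. It is not d3's or Florence's actual implementation: the helper `makeLinearScale` is hypothetical, and the pixel range of 0 to 500 is an assumption that matches a 500px wide Graphic without padding.

```javascript
// Hypothetical helper (not part of d3 or Florence):
// maps a value from a data domain onto a pixel range.
function makeLinearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Same domains as the scales above; the 0..500 range is an assumption.
const toPixelX = makeLinearScale([1740, 1900], [0, 500]);
const toPixelY = makeLinearScale([0, 8000000], [0, 500]);

console.log(toPixelX(1800));    // 187.5
console.log(toPixelY(6000000)); // 375
```

The point of handing only the domain to Florence is that we never have to write the range ourselves: the Graphic knows its own size and can fill it in.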

Once we have the scales set up, we can supply them to the Graphic to create a local coordinate system.

<div class="main-chart">
  <!-- main chart -->
  <Graphic width={500} height={500} {scaleX} {scaleY}>
  </Graphic>
</div>

In the previous section we placed a point by specifying 'pixel space'. We can now place points in 'data space'. For example, to place a point at the year 1800, with a population of 6 million, we can do:

<div class="main-chart">
  <!-- main chart -->
  <Graphic width={500} height={500} {scaleX} {scaleY}>
    <Point x={1800} y={6000000} />
  </Graphic>
</div>

Now that we have placed a single point, let's see if we can extend this logic to place all our data points.

We will do this section in class together.

Solution

Axis

To help the reader orient themselves, it makes sense to add some axes. Florence has two components for this: one for x axes and one for y axes. They offer all kinds of options to customize your axes, but the defaults are often OK to start with. You can add an axis simply by including it within the Graphic. It will infer the appropriate scale from its parent Graphic automatically and try to set up some decent tick marks etc.

<Graphic {scaleX} {scaleY} flipY>
  <!-- snip -->
  <XAxis />
  <YAxis />
</Graphic>

You will immediately see that the axes don't completely display. This is because there's currently no space available between the edge of the Graphic and the start of the data/content. Think about it like this: you want your axis to be outside, on the edge of the data visualization. But right now the visualization connects seamlessly to the edge of the Graphic. To solve that, we need to create some space between the data content and the edge of the Graphic. We can do this by adding some padding on the Graphic.

<Graphic {scaleX} {scaleY} flipY padding={60}>
  <!-- snip -->
  <XAxis />
  <YAxis />
</Graphic>
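As a rough mental model of what padding does, the sketch below computes the pixel interval that remains for the data once space is reserved on both sides. This is an illustration under stated assumptions, not Florence's internal logic: the helper `paddedRange` is hypothetical, and we assume a 500px Graphic with the same padding on every side.

```javascript
// Hypothetical helper: the pixel interval left for the data
// after reserving `padding` pixels at both ends of an axis.
function paddedRange(size, padding) {
  return [padding, size - padding];
}

console.log(paddedRange(500, 0));  // [ 0, 500 ]  -> data touches the edges
console.log(paddedRange(500, 60)); // [ 60, 440 ] -> 60px free for each axis
```

With a padding of 60, the data occupies the pixels from 60 to 440, and the strips outside that interval are where the axes and their tick labels can be drawn.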

We will do this section in class together.

Solution

Additional marks

We have now recreated a very basic version of our Du Bois chart. One of the advantages of the grammar of graphics is that it becomes easy to switch to different marks. Let's try to replace our original Point mark with the following other marks:

  • Label
  • Rect
  • Line
  • The linear scale we are using for years does not produce 'pretty' tick marks by default. Let's replace it with a temporal scale

We will do this section in class together.

Solution

To replace the original Point mark with a Label mark, all we need to do is import the correct mark and supply it with its required properties. In addition to the x and y that we used for Point, we need to supply the text to display for each label.
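The label text usually has to be derived from the data. As a small sketch, the hypothetical helper `populationLabel` below turns a raw population count into a short string; the format (millions with one decimal) is just one possible choice and is not prescribed by Florence.

```javascript
// Hypothetical formatter: turns a raw population count into short label text.
function populationLabel(population) {
  return `${(population / 1e6).toFixed(1)}M`;
}

console.log(populationLabel(6000000)); // 6.0M
console.log(populationLabel(500000));  // 0.5M
```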

To display a bar or rectangle for each year, we need to define both the start and the end point on the x axis. In practice, we can use d3's band scale to do this. But since we know that each observation covers one decade, we can simply make the bars 5 years wide as an easier shortcut.
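The shortcut can be sketched as a small calculation in data space. The helper `barExtent` and its `start`/`end` field names are hypothetical, and centering the bar on its year is an assumption; you could just as well let each bar start at its year.

```javascript
// Hypothetical helper: start and end (in years) of a bar centered on `year`.
function barExtent(year, widthInYears = 5) {
  const half = widthInYears / 2;
  return { start: year - half, end: year + half };
}

console.log(barExtent(1800)); // { start: 1797.5, end: 1802.5 }
console.log(barExtent(1810)); // { start: 1807.5, end: 1812.5 }
```

Because the observations are 10 years apart and the bars only 5 years wide, neighbouring bars never overlap and a gap remains between them.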

Displaying a single line instead of individual points changes the nature of our visualization: instead of 1 data point -> 1 mark, we now visualize many data points -> 1 mark. To do that, we need to supply an array of all x values and an array of all y values to a single line mark. We can do this using JavaScript's map method, which allows us to create a new array from our original data array with the right properties (year and population respectively) extracted.
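As a minimal sketch of that extraction step, the snippet below uses `map` on a tiny array of records. The values are placeholders made up for illustration, not the actual Du Bois figures, and the variable names are our own.

```javascript
// Placeholder records for illustration only; not the actual Du Bois figures.
const sampleData = [
  { year: 1800, population: 1000 },
  { year: 1810, population: 2000 },
  { year: 1820, population: 3000 },
];

// One array per dimension, in the same order as the original records.
const xValues = sampleData.map((d) => d.year);
const yValues = sampleData.map((d) => d.population);

console.log(xValues); // [ 1800, 1810, 1820 ]
console.log(yValues); // [ 1000, 2000, 3000 ]
```

Note that `map` leaves the original array untouched, so the same data can still feed other marks.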

Finally, to move away from a simple linear scale based on the year as an integer, we bring in d3's temporal scale. To do this, we also need to convert the year values to proper JavaScript Date objects.
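The conversion itself can be sketched with plain JavaScript. The helper `yearToDate` is hypothetical, and representing each year as January 1st in local time is an assumption; the resulting Date objects could then be used as the domain of d3-scale's `scaleTime`.

```javascript
// Hypothetical helper: represent a year as January 1st of that year (local time).
// Months are zero-based in JavaScript, so 0 means January.
function yearToDate(year) {
  return new Date(year, 0, 1);
}

const domainStart = yearToDate(1740);
const domainEnd = yearToDate(1900);

console.log(domainStart.getFullYear(), domainEnd.getFullYear()); // 1740 1900
console.log(domainEnd > domainStart); // true
```

The same helper would need to be applied to the year of every data point, so that the marks and the scale agree on the type of the x values.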