I've not written much in about two weeks for various reasons, but I'm back now to what I'm hoping will be a more regular schedule (although a good chunk of the rest of the world seems to be heading out for holidays). I thought I'd kick it back off on the blog with some thoughts about data visualisation.
The slightly embarrassing truth is that I've been spending unhealthy amounts of time thinking about ways to compare time series, or line graphs. The question often boils down to whether to index the lines. Do we adjust the lines so that they are on the same scale?
Let's say I want to know how well shares of LionGold Corp are doing. Maybe the shares are down 70 per cent from six months ago. That sounds pretty rough, but if the rest of the market fell 80 per cent, LionGold would actually have outperformed. So it's useful to overlay LionGold's price change against some benchmark, like the Straits Times Index (STI), for example. But we can't just plot LionGold's and the STI's absolute prices on the same chart, because they don't share the same units and their absolute prices are so far apart. Even if we use a left-hand axis for LionGold and a right-hand axis for the STI, it's still going to be a problematic overlay that's subject to the chartmaker's biases. If I wanted to mute LionGold's price drop, I could just zoom out with a larger scale for LionGold and zoom in with a tighter scale for the STI.
A fairer way to compare them would be to index them. We simply plot each of LionGold's and the STI's prices as a percentage of its own price six months ago, so that both lines start at 100, and we get a pretty neat comparison. Like this:
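For anyone who prefers to see the arithmetic, here is a minimal sketch of that indexing step in Python. The function name `index_series` and all the numbers are my own inventions for illustration; they are not real LionGold or STI prices, and this is just one simple way of doing it.

```python
def index_series(prices, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    if not prices:
        raise ValueError("prices must not be empty")
    first = prices[0]
    if first == 0:
        raise ValueError("cannot index against a zero starting price")
    return [base * p / first for p in prices]


# Hypothetical numbers, invented for illustration only (not market data).
stock_prices = [1.00, 0.80, 0.50, 0.30]               # a share price, in dollars
benchmark_levels = [3000.0, 2700.0, 2400.0, 2100.0]   # an index level, in points

stock_indexed = index_series(stock_prices)
benchmark_indexed = index_series(benchmark_levels)

# Both series now start at 100, so they can share a single axis.
for s, b in zip(stock_indexed, benchmark_indexed):
    print(f"stock {s:6.1f}   benchmark {b:6.1f}")
```

With these made-up figures, the stock ends at about 30 (down 70 per cent) and the benchmark at about 70 (down 30 per cent), and the gap between the two lines can be read straight off one axis without anyone choosing the scales.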
We've been indexing quite a bit, because when the nature of the data is quite similar, shifting them to the same scale makes for quite a natural comparison. But that is not always the case.
Consider this chart that I worked on in October, in a week when global markets were in turmoil amid fears of a sharp drop in worldwide demand. The equities and commodities charts were indexed, because we wanted to show how the different benchmarks were moving relative to each other, and the benchmarks were largely on different scales and units. But we didn't index the charts looking at bond yields and credit default swap spreads. Why not? It was not simply because the benchmarks for each of those categories were already on the same scale, but also because indexing them would have turned the data into something meaningless for readers. The market cares about the absolute value of the government bond yield, not the percentage by which it has changed from six months ago. And the absolute spread between the US Treasury and Singapore Government Securities yields has meaning, which indexing would have obscured. One disclaimer, though: If the story were about the percentage change in yields or spreads, which is possible, we might have used indexed charts instead.
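To make the point about spreads concrete, here is a small sketch in Python. The yields are hypothetical numbers I made up for illustration, not actual US Treasury or Singapore Government Securities data, and the helper names are my own.

```python
def spread_bp(a, b):
    """Absolute spread between two yield series (in per cent), in basis points."""
    return [round((x - y) * 100, 6) for x, y in zip(a, b)]


def index_series(values, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    return [base * v / values[0] for v in values]


# Hypothetical yields in per cent, invented for illustration only.
us_yields = [2.50, 2.40, 2.20]
sg_yields = [2.30, 2.30, 2.20]

# The absolute spread narrows from 20 basis points to zero.
spread = spread_bp(us_yields, sg_yields)

# After indexing, both lines start at 100, so the initial 20 bp gap vanishes,
# and the lines end several index points apart even though the real spread is zero.
us_indexed = index_series(us_yields)
sg_indexed = index_series(sg_yields)
gap_start = us_indexed[0] - sg_indexed[0]
gap_end = us_indexed[-1] - sg_indexed[-1]

print(spread, gap_start, round(gap_end, 2))
```

In this toy case the indexed chart tells almost the opposite story from the raw one: it shows no gap where there was a real spread, and a gap where the yields had actually converged.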
So I've come up with my two laws of indexing.
In general, index only if:
Does that sound right? Am I missing anything?
To people who deal with share prices regularly, indexing of this sort is commonplace. But at a recent workshop of data professionals and enthusiasts that I attended, there was a discussion among some participants about how to visualise a comparison of the prices of different stocks, and indexing did not seem to be a natural response for many of those who did not deal regularly with markets. I think there's no doubt that everyone there knew how to index two lines. In fact, I think most of the people there could do a lot of fancy visualisations. But in many cases, it's the little and simple stuff that works better. I think one constant challenge for the data community is figuring out the communication part of the work, in terms of what kind of visualisation is the most useful or meaningful.