by PhilipJ on 14 February 2006
Edward R. Tufte is well known as an expert in, well, visually displaying quantitative information, and I’ve just finished reading his book on the subject. He has created a couple of indices that you can use to help yourself make better graphics. The first of these is the data-ink ratio, which he defines as “the proportion of a graphic’s ink devoted to the non-redundant display of data-information”. In short, don’t waste ink saying (or displaying) the same information multiple times. Here’s an example of a bad data-ink ratio.
The labeled, shaded bar chart displays the height in six ways: (1) the height of the left line, (2) the height of the right line, (3) the height of the shaded region, (4) the position of the horizontal bar, (5) the position of the number above the bar, and finally (6) the number itself.
Tufte argues that any five of these could be erased and the graph would still convey its information to the reader just as effectively. A cycle he suggests keeping in mind when creating and editing graphs is: (1) Above all else, show the data, (2) Maximize the data-ink ratio, within reason, (3) Erase non-data ink, within reason, (4) Erase redundant data-ink, (5) Revise and edit.
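To make the ratio concrete, here is a minimal Python sketch of the bar chart above. The ink amounts are made-up units chosen purely for illustration (nothing here is measured from Tufte's figure), and the function name `data_ink_ratio` is my own. It treats only one of the redundant elements as non-redundant data-ink, following the definition quoted earlier.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink spent on non-redundant data display."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


# Hypothetical ink cost (arbitrary units) of each inked element of the bar.
# These numbers are assumptions for illustration, not measurements.
ink_used = {
    "left line": 2.0,
    "right line": 2.0,
    "shaded region": 10.0,
    "horizontal bar": 1.0,
    "printed number": 1.0,
}

total = sum(ink_used.values())

# Every element encodes the same height, so only one of them counts as
# non-redundant data-ink. Here we keep the horizontal bar.
ratio_before = data_ink_ratio(ink_used["horizontal bar"], total)

# After erasing everything else, all remaining ink is data-ink.
ratio_after = data_ink_ratio(ink_used["horizontal bar"], ink_used["horizontal bar"])

print(f"before erasing: {ratio_before:.4f}")
print(f"after erasing:  {ratio_after:.4f}")
```

With these invented numbers the ratio climbs from well under 0.1 to exactly 1 once the redundant elements are erased, which is the direction the editing cycle above pushes you in.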
A second index introduced is the data density of a graphic, defined as the number of entries in a data matrix over the area of the graphic. This, as you may have guessed, is meant to be maximized, for
A cartographer writes that “the resolving power of the eye enables it to differentiate 0.1 mm when provoked to do so. Clearly, therefore, conciseness is of the essence …”
Some numbers tossed around as being respectable range from 28 numbers (data points) per square centimetre for a time series of the weather in New York City to an enormous 17,000 per square cm in a greyscale map of the galaxy, the record as of the printing date of the book. While I’m less convinced that this is as critical as the data-ink arguments brought up above, it is still a good idea to be concise with graphs.
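Data density is simple arithmetic, so a quick Python sketch may help. The function name `data_density` and the chart dimensions below are assumptions of mine, picked so that the result lands on the 28 per square centimetre figure mentioned above. They are not the dimensions of the actual New York City weather graphic.

```python
def data_density(n_entries: int, width_cm: float, height_cm: float) -> float:
    """Entries in the data matrix divided by the area of the graphic (per cm^2)."""
    area = width_cm * height_cm
    if area <= 0:
        raise ValueError("the graphic must have a positive area")
    return n_entries / area


# Hypothetical chart: 1,400 plotted numbers on a 10 cm x 5 cm graphic.
density = data_density(1400, 10.0, 5.0)
print(f"{density:.1f} numbers per square cm")

# The same data stretched over a full 20 cm x 25 cm page is ten times sparser.
sparse = data_density(1400, 20.0, 25.0)
print(f"{sparse:.1f} numbers per square cm")
```

Shrinking the graphic (or showing more data in the same space) raises the density, which is why Tufte pairs this index with the advice to be concise.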
As to the actual style of the graphs themselves, Tufte shows no mercy:
Like weeds, many varieties of chartjunk flourish. Here three widespread types found in scientific and technical research work are catalogued—unintentional optical art, the dreaded grid, and the self-promoting graphical duck.
Optical art is one of my least favourites as well: graphics (particularly histograms) that use crosshatches or angled lines lead to the moiré effect, in which the design interacts with tremors in the eye to produce distracting vibrations and movement. Tufte says, and I agree, that “the noise clouds the flow of information”, which is the exact opposite of the effect a good graph should have.
On grids, another pet peeve of mine, Tufte says,
the grid should usually be muted or completely suppressed so that its presence is only implicit—lest it compete with data. [...] Dark grid lines are chartjunk. They carry no information, clutter up the graphic, and generate activity unrelated to data information.
Certain fairly pervasive software ships with grids turned on by default for all graphs, charts, etc., and I have never understood why.
Perhaps the worst of all offenses is creating a duck, which is
[w]hen a graphic is taken over by decorative forms or computer debris, when the data measures and structures become design elements, and when the overall design purveys graphical style rather than quantitative information
named after the Big Duck, a store in the shape of a duck in the U.S.:
Ducks are usually created because of computers, and evoke responses such as “Isn’t it remarkable that the computer can be programmed to draw like that?” Tufte then gives this graph as an example of a duck, and claims it is possibly the worst graphic ever to find its way into print:
Hard to argue with that! So remember, keep your data-ink ratios near 1, your data densities maximized, and forgo the duck.