Introduction
When they're done right, graphs are a useful tool for telling compelling data stories and supporting data models. However, too often graphs lack the right components to truly enhance understanding.
In this blog, we look at how a few quick customizations help make graphs more impactful. In particular, we will consider:
- Using grid lines without cluttering a graph.
- Changing tick labels for readability.
- Using clear axis labels.
- Marking events and outcomes with lines, bars, and annotations.
Data
As an example, we will use New York Times COVID tracking data (available on GitHub). This data is part of the New York Times U.S. tracking project.
From this data, we will use the rolling 7-day average of COVID cases per 100k people, reported by date, for five states: Arizona, California, Florida, Texas, and Washington.
Creating a Basic Graph
Let's start by creating a basic panel data plot using:
- The plotXY procedure with dates.
- A formula string and the by keyword.
First we will load our data:
// Load original data
fname = "us_state_covid_cases.csv";
covid_cases = loadd(fname,
"date($date) + cat(state) + cases + cases_avg_per_100k");
// Filter desired states
covid_cases = selif(covid_cases,
rowcontains(covid_cases[., "state"],
"Florida"$|"California"$|
"Arizona"$|"Washington"$|
"Texas"));
Note that in this step we've:
- Specified the variables we want to load and their variable types.
- Filtered our data to include only our states of interest.
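Before plotting, it can help to confirm that the filter kept what we expect. The sketch below assumes the loading code above has run and that your GAUSS version provides the head procedure. It prints the number of remaining rows and previews the first five.

```gauss
// Count the rows kept after filtering to the five states
n_obs = rows(covid_cases);
print "Observations after filtering:";
print n_obs;

// Preview the first five rows of the filtered dataframe
// (assumes head() is available in your GAUSS version)
preview = head(covid_cases, 5);
print preview;
```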
Now, we can make a preliminary plot of the rolling 7-day average number of COVID-19 cases per 100,000 people:
// Plot COVID cases per 100K by state
plotXY(covid_cases, "cases_avg_per_100k ~ date + by(state)");
The by keyword tells GAUSS to split the data on a particular variable, here state. Both the by keyword and the ability to use plotXY with date variables were introduced in GAUSS 22.
Customizing Our Graph
Our quick graph was a good starting point. However, a few customizations will help present a clearer picture:
- Adding y-axis grid lines will help us read COVID cases values more easily.
- Reformatting our x-axis tick labels to include months rather than quarters will make the dates more recognizable.
- Updating the axis labels will make clear what each axis measures.
Declaring a plotControl Structure
The first step for customizing graphs is to declare a plotControl structure and to fill it with the appropriate defaults:
// Declare plot control structure
struct plotControl myPlot;
// Fill with defaults for "xy" graph
myPlot = plotGetDefaults("xy");
Customizing Plot Attributes
After declaring the plotControl structure, we can use plotSet procedures to change the desired attributes of our graph.
Adding Y-Axis Grid Lines
First, to make levels of COVID cases clearer, let's add y-axis grid lines to our plot using plotSetYGridPen.
The plotSetYGridPen procedure can be used to:
- Turn on y-axis major and/or minor grid lines.
- Set the width, color, and style of the grid lines.
Input | Description |
---|---|
which_grid | Specifies which grid lines to modify. Options: "major", "minor", or "both". |
width | Specifies the thickness of the line(s) in pixels. The default value is 1. |
color | Optional argument, specifying the name or RGB value of the new color(s) for the line(s). |
style | Optional argument, the style(s) of the pen for the line(s). Options: 1 = solid, 2 = dash, 3 = dot, 4 = dash-dot, 5 = dash-dot-dot. |
// Turn on y-axis grid for the major ticks. Set the
// grid lines to be solid, 1 pixel and light grey
plotSetYGridPen(&myPlot, "major", 1, "Light Grey", 1);
In every plotSet procedure, the first input is a pointer to a declared plotControl structure. The & symbol indicates that we are passing a pointer.
Because GAUSS allows us to add and format y-axis and x-axis grid lines separately, we can improve readability with y-axis lines without the clutter of a full grid.
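To see the clutter a full grid adds, the sketch below draws the same plot with major grid lines on both axes. It assumes covid_cases has been loaded as above and that plotSetXGridPen takes the same inputs as plotSetYGridPen. A separate plotControl structure is used so that myPlot keeps its y-axis-only settings.

```gauss
// Separate plotControl structure so myPlot is left unchanged
struct plotControl fullGrid;
fullGrid = plotGetDefaults("xy");

// Solid, 1 pixel, light grey major grid lines on the y-axis...
plotSetYGridPen(&fullGrid, "major", 1, "Light Grey", 1);

// ...and on the x-axis (assumed to mirror plotSetYGridPen's inputs)
plotSetXGridPen(&fullGrid, "major", 1, "Light Grey", 1);

// Draw the fully gridded version for comparison
plotXY(fullGrid, covid_cases, "cases_avg_per_100k ~ date + by(state)");
```

Comparing the two graphs side by side shows why the y-axis-only grid reads more cleanly for a time series with many overlapping lines.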
Customizing X-Axis Ticks
Next, let's turn our attention to the x-axis ticks. We will use three GAUSS procedures to help us customize our ticks:
Procedure | Description |
---|---|
plotSetXTicLabel | Controls the formatting and angle of x-axis tick labels for 2-D graphs. |
plotSetXTicInterval | Controls the interval between x-axis tick labels and also allows the user to specify the first tick to be labeled for 2-D graphs. |
plotSetTicLabelFont | Controls the font name, size and color for the X and Y axis tick labels. |
First, let's change the format of the labels on the x-axis to indicate months rather than quarters:
// Display 4 digit year and month on 'X' tick labels
plotSetXTicLabel(&myPlot, "YYYY-MO");
Second, let's set the x-axis ticks to:
- Start in March of 2020 to correspond with the start of the pandemic.
- Occur every 3 months.
// Place first 'X' tick mark on March 1st, 2020
// with ticks occurring every 3 months
plotSetXTicInterval(&myPlot, 3, "months", asDate("2020-03"));
Third, let's increase the size of the axis tick labels:
// Change tic label font size
plotSetTicLabelFont(&myPlot, "Arial", 12);
Updating Axis Labels
Finally, we change the axis labels:
// Specify the text for the Y-axis label as well as
// the font and font size for both labels
plotSetYLabel(&myPlot, "Cases per 100k", "Arial", 14);
// Specify text for the x-axis label
plotSetXLabel(&myPlot, "Date");
The font, font size, and font color passed to plotSetYLabel or plotSetXLabel apply to both axis labels, so there is no need to specify them again.
Now we can create our formatted graph:
// Plot COVID cases per 100K by state. Pass in the 'plotControl'
// structure, 'myPlot', to use the settings we applied above.
plotXY(myPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");
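For reference, the steps so far can be collected into a single script. This sketch only reassembles the code above; it assumes us_state_covid_cases.csv is in the GAUSS working directory and that you are running GAUSS 22 or later.

```gauss
// Load the date, state, and case variables
fname = "us_state_covid_cases.csv";
covid_cases = loadd(fname,
    "date($date) + cat(state) + cases + cases_avg_per_100k");

// Keep only the five states of interest
covid_cases = selif(covid_cases,
    rowcontains(covid_cases[., "state"],
                "Florida"$|"California"$|
                "Arizona"$|"Washington"$|
                "Texas"));

// Declare the plotControl structure and fill it with "xy" defaults
struct plotControl myPlot;
myPlot = plotGetDefaults("xy");

// Solid, 1 pixel, light grey major grid lines on the y-axis only
plotSetYGridPen(&myPlot, "major", 1, "Light Grey", 1);

// Year-month tick labels every 3 months, starting March 2020
plotSetXTicLabel(&myPlot, "YYYY-MO");
plotSetXTicInterval(&myPlot, 3, "months", asDate("2020-03"));
plotSetTicLabelFont(&myPlot, "Arial", 12);

// Axis labels; the font settings apply to both labels
plotSetYLabel(&myPlot, "Cases per 100k", "Arial", 14);
plotSetXLabel(&myPlot, "Date");

// Draw the customized graph
plotXY(myPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");
```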
Highlighting Events
With time series plots, we often want to mark specific dates or periods on the graph. GAUSS 22 introduced four procedures that make highlighting events easy:
Procedure | Description | Example |
---|---|---|
plotAddVLine | Adds one or more vertical lines to an existing plot. | plotAddVLine("2020-01-01"); |
plotAddVBar | Adds one or more vertical bars spanning the full extent of the y-axis to an existing graph. | plotAddVBar("2020-01", "2020-03"); |
plotAddHLine | Adds one or more horizontal lines to an existing plot. | plotAddHLine(500); |
plotAddHBar | Adds one or more horizontal bars spanning the full extent of the x-axis to an existing graph. | plotAddHBar(580, 740); |
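The horizontal procedures work the same way. As a sketch, the code below marks a hypothetical reference level of 100 cases per 100k with plotAddHLine and shades a band from 50 to 100 with plotAddHBar. These values are illustrative only, not taken from the data. The code assumes myPlot and covid_cases are set up as above, and it reuses plotSetLegend, plotSetLinePen, and plotSetFill, which are described in the sections that follow.

```gauss
// Draw the base graph first; the plotAdd procedures add to an existing graph
plotXY(myPlot, covid_cases, "cases_avg_per_100k ~ date + by(state)");

// Hypothetical reference level, for illustration only
threshold = 100;

// Keep the horizontal line out of the legend
plotSetLegend(&myPlot, "");

// Dashed, dark grey horizontal line at the reference level
plotSetLinePen(&myPlot, 2, "#555555", 2);
plotAddHLine(myPlot, threshold);

// Light grey band from 50 to 100 cases per 100k, using bar defaults
struct plotControl band;
band = plotGetDefaults("bar");
plotSetFill(&band, 1, 0.15, "grey");
plotSetLegend(&band, "");
plotAddHBar(band, 50, threshold);
```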
As an example, let's add vertical lines to help compare July 4th, 2020 to July 4th, 2021.
Specifying Legend Behavior When Adding Lines
First, when adding new data to an existing plot, we need to specify how we want this data treated on the legend using the plotSetLegend procedure.
We can add a label for the line to the legend:
// Label next added line "Independence Day"
// and add to the legend
plotSetLegend(&myPlot, "Independence Day");
or we can tell GAUSS to not make any changes to the current legend:
// The empty string specifies that the legend
// should remain unchanged when the next line is added.
plotSetLegend(&myPlot, "");
Specifying Line Style
Next, we will specify the style of our lines using the plotSetLinePen procedure. This procedure lets us set the width, color, and style of the lines added to the graph.
Attribute | Description |
---|---|
width | Specifies the thickness of the line(s) in pixels. The default value is 2. |
color | Optional argument, specifying the name or RGB value of the new color(s) for the line(s). |
style | Optional argument, the style(s) of the pen for the line(s). Options: 1 = solid, 2 = dash, 3 = dot, 4 = dash-dot, 5 = dash-dot-dot. |
// Set the line width to be 2 pixels,
// the line color to be #555555,
// and the line to be dashed
plotSetLinePen(&myPlot, 2, "#555555", 2);
Adding Lines to Mark Events
Finally, let's add the lines marking Independence Day in 2020 and 2021.
We first convert the dates where we want to add lines into GAUSS dates using asDate:
// Convert the Independence Day strings to dates
ind_days = asDate("2020-07-04"$|"2021-07-04");
Then we add our holidays to the existing graph using plotAddVLine:
// Add holidays to graph
plotAddVLine(myPlot, ind_days);
The complete code for adding the lines looks like this:
// Do not add vertical lines to the legend
plotSetLegend(&myPlot, "");
// Set the line width to be 2 pixels
// the line color to be a dark grey color, #555555,
// and the line to be dashed
plotSetLinePen(&myPlot, 2, "#555555", 2);
// Convert the Independence Day strings to dates
ind_days = asDate("2020-07-04"$|"2021-07-04");
// Add holidays to graph
plotAddVLine(myPlot, ind_days);
Adding Bars to Mark Events
Now, let's add a vertical bar to mark the 2020 winter holiday period, spanning from Thanksgiving 2020 to New Year's Day 2021.
We first need to create a new plotControl structure to format our bars. Since we are adding a bar to the graph, we will fill our new plotControl structure with the defaults for a bar graph:
// Create plotControl structure
struct plotControl plt;
// Fill with default bar settings
plt = plotGetDefaults("bar");
Next, we can format our bar using the plotSetFill procedure. The plotSetFill procedure allows us to control the fill style, opacity, and color of graphed bars:
// Set bar to have solid fill with 20% opacity
// and grey color
plotSetFill(&plt, 1, 0.20, "grey");
We also have to specify the legend behavior when the bar is added. This time let's add a label to the legend for the "Winter Holidays":
// Add "Winter Holidays" to the legend
plotSetLegend(&plt, "Winter<br>Holidays");
The <br> is an HTML tag that tells GAUSS to insert a line break between the words "Winter" and "Holidays".
Now we are ready to add the bar to our graph using the plotAddVBar procedure:
// Add a vertical bar to graph starting
// on November 26th, 2020 and
// ending January 1st, 2021
plotAddVBar(plt, asDate("2020-11-26"), asDate("2021-01"));
Adding Notes to Graphs
As a final customization, let's add a note to our graph to label one of our holidays. We can do this using the plotAddTextBox procedure.
The plotAddTextBox procedure takes three required inputs:
- The text to be added to the graph.
- The x location where the text should start.
- The y location where the text should start.
Optionally, a plotAnnotation structure can be passed to format the text box and its text content.
// Label the 2020 Independence Day line
plotAddTextBox("&larr; Independence Day", asDate("2020-07-04"), 80);
The &larr; is an HTML entity that tells GAUSS to draw a left arrow on the graph.
Conclusion
In this blog, we saw how a few customizations and enhancements can make plots easier to read and more impactful.
In particular, we covered:
- Using grid lines without cluttering a graph.
- Changing tick labels for readability.
- Using clear axis labels.
- Marking events and outcomes with lines, bars, and annotations.
Further Reading
- How to Create Tiled Graphs in GAUSS
- How to Interactively Create Reusable Graphics Profiles
- Five Hacks For Creating Custom GAUSS Graphics
- How to Mix, Match, and Style Different Graph Types
References
"The New York Times. (2021). Coronavirus (Covid-19) Data in the United States. Retrieved 12-05-2021, from https://github.com/nytimes/covid-19-data."
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 18 years of combined industry and academic experience in data analysis and research.
Great! I found the data here: https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-states.csv
Also, I had to change the position of the legend in my settings (Tools > Preferences > Graphics), because it was located at the top right.
Best,
Jamel
Hi Jamel,
Thanks for your comment! You're correct that, depending on your preference settings, you may need to update the preferred location of your legend to replicate the results in this blog. Alternatively, the position of the legend can be programmatically set using the plotSetLegend function.
Best,
Eric