Intro
Often we need to mix multiple graph types to create a plot that most effectively tells the story of our data. In this post, we will create a plot of the Phillips Curve in the United States over two separate time periods.
We will show how to add scatter points and lines, as well as data series of different lengths, to a single plot. However, our main focus will be showing you how to control the styling of all aspects of the plot in these cases.
How to add data to an existing graph
The main GAUSS graphing functions, shown in the second column of the table below, treat each column of the input data as a separate series. This makes it very simple to plot data in which the plot type and series length are the same.
Plot type | Main function | Add function |
---|---|---|
Scatter | plotScatter | plotAddScatter |
XY | plotXY | plotAddXY |
Time Series | plotTS, plotTSHF | plotAddTS, plotAddTSHF |
Bar / Histogram | plotBar, plotHist, plotHistF, plotHistP | plotAddBar, plotAddHist, plotAddHistF, plotAddHistP |
Box | plotBox | plotAddBox |
We can add a new series to a graph created with one of the main functions in column two by using one of the add functions in column three.
The add functions take the same inputs as the main functions. For example, the code below will create a scatter plot and then add a line at the mean of the Y variable.
// Load variables from dataset
data = loadd("phillips-us-1955.csv", "CPI + Unemployment");
// Plot initial scatter points
plotScatter(data[.,1], data[.,2]);
// Compute mean of unemployment
mu_u = meanc(data[.,2]);
// Compute range of CPI
range_cpi = minc(data[.,1]) | maxc(data[.,1]);
// Add a line across the range of the CPI
// data at the height of the average of the
// Unemployment data
plotAddXY(range_cpi, mu_u | mu_u);
The code will create a graph which looks like this:
How to style a graph with plotAdd functions
When styling graphs made with plotAdd functions, an important distinction is made between axis level attributes and series level attributes.
Axis level attributes
Axis level attributes are the elements which are generally independent of the data. For example, the title, axes labels and tick labels are axis level attributes. These attributes become fixed during the creation of the initial plot and cannot be modified with a plotAdd call.
The axis level attributes include:
- Title
- Axes labels
- Tick label font and format
- Axes line attributes
- X and Y-axis range (if explicitly set with plotSetXRange/plotSetYRange)
- Canvas size
- Legend location, font and background style
For example, if we modify our earlier code like this:
// Load variables from dataset
data = loadd("phillips-us-1955.csv", "CPI + Unemployment");
struct plotControl myPlot;
myPlot = plotGetDefaults("scatter");
// Set axis element, title
plotSetTitle(&myPlot, "Phillips Curve in the US", "Helvetica Neue", 14);
// Plot initial scatter points
// Note that we pass in 'myPlot' this time
plotScatter(myPlot, data[.,1], data[.,2]);
// Compute mean of unemployment
mu_u = meanc(data[.,2]);
// Compute range of CPI
range_cpi = minc(data[.,1]) | maxc(data[.,1]);
// Attempt to reset axis level property
myPlot = plotGetDefaults("xy");
plotSetTitle(&myPlot, "Title which will be ignored");
// Attempt to set axis level property
plotSetXLabel(&myPlot, "CPI");
// Add a line across the range of the CPI
// data at the height of the average of the
// Unemployment data
// Note that we pass in 'myPlot' this time
plotAddXY(myPlot, range_cpi, mu_u | mu_u);
we will get a graph that looks like this:
The code reset the title before the plotAddXY call from "Phillips Curve in the US" to "Title which will be ignored". The code also reset the X-axis label from nothing to "CPI". As you can see, both of those changes were ignored because axis level attributes must be set on the initial plot creation.
Series level attributes
Series level attributes are those settings which are specific to the data series. For example, line color, line thickness and scatter symbol are all series level attributes.
The series level attributes include:
- Line color, thickness, style
- Scatter symbol type, color
- Legend text
You have the choice whether to set the series level attributes at the time of the initial plot call or to modify them during the plotAdd call. Passing a plotControl structure to the plotAdd call tells GAUSS to update the series level attributes. If you do not pass a plotControl structure to the plotAdd call, GAUSS will continue using the attributes set in the initial plot creation.
More specifically, when a plotControl structure is not passed into the plotAdd call, GAUSS will continue cycling through the previously set series attributes where the last plot call left off. It is as if the last plot call left a bookmark in the list of series attributes.
For example, you may have noticed that the mean line in the first graph we made is blue, while the line in the second graph is orange. Looking at the default colors in the table below, we can see why.
Default color 1 | Default color 2 | Default color 3 | Default color 4 | Default color 5 |
---|---|---|---|---|
#FC8D62 | #8DA0CB | #66C2A5 | #E78AC3 | #A6D854 |
When we created the first graph, the initial plotScatter call set the graph's line and scatter symbol colors to the list of five colors shown above. The initial scatter plot used the first color, orange, so the plotAddXY call without a plotControl structure started with blue, the next color in the list.
In the second graph, the plotAddXY call was given a plotControl structure. This told GAUSS to create the new line using the series settings from the new plotControl structure, starting from the beginning. Since we had not changed the sequence of colors, the added mean line used the same orange color as the scatter plot.
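To avoid the added line reusing the first default color, one option is to set the color in the plotControl structure that is passed to the plotAdd call. The following is a minimal sketch along those lines. The structure name linePlot and the choice of "black" are illustrative assumptions, not part of the original example.

```gauss
// Load variables from dataset, as in the earlier examples
data = loadd("phillips-us-1955.csv", "CPI + Unemployment");

// Draw the initial scatter plot with default settings
plotScatter(data[.,1], data[.,2]);

// Endpoints for a horizontal line at mean unemployment
mu_u = meanc(data[.,2]);
range_cpi = minc(data[.,1]) | maxc(data[.,1]);

// Separate plotControl structure (hypothetical name) that
// carries the series level settings for the added line
struct plotControl linePlot;
linePlot = plotGetDefaults("xy");

// Illustrative choice: draw the mean line in black
plotSetLineColor(&linePlot, "black");

// Because 'linePlot' is passed in, GAUSS reads the series
// attributes from it, starting with its first entry
plotAddXY(linePlot, range_cpi, mu_u | mu_u);
```

Any axis level settings placed in linePlot, such as a title, would still be ignored, for the reasons described in the previous section.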
How to create the full Phillips Curve graph
It is generally considered a best practice to apply all axis and series level settings with the initial plot call. That is what we will do here. Our steps will be:
- Load the data.
- Set the axis level attributes.
- Set the series level attributes.
- Draw the initial scatter plot.
- Add the second set of scatter points.
- Add the two regression lines.
Step 1: Load and define data
We will start by loading the data for both sets of scatter points. To keep our focus on the graphs, we have simply defined the start and endpoints for the regression lines instead of computing them.
// Load inflation and unemployment data
// (CPI percent change) for 1955–71 and 1974–84
phil_1955 = loadd("phillips-us-1955.csv", "CPI + Unemployment");
phil_1974 = loadd("phillips-us-1974.csv", "CPI + Unemployment");
// For simplicity, define start and
// end points for regression lines
reg_x_1955 = { -0.8538, 6.4246 };
reg_y_1955 = { 5.6356, 3.8925 };
reg_x_1974 = { 2.3590, 14.5923 };
reg_y_1974 = { 9.0762, 5.9248 };
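The endpoints above are hard-coded. As a rough sketch of how such endpoints could be computed instead, the code below regresses unemployment on a constant and inflation with olsqr, then evaluates the fitted line at the smallest and largest observed inflation values. The variable names infl, unemp, rhs, b, fit_x and fit_y are illustrative assumptions, and the endpoints it produces need not match the hard-coded values above.

```gauss
// Load the 1955-1971 data, as in the tutorial
phil_1955 = loadd("phillips-us-1955.csv", "CPI + Unemployment");

// Column 1 is inflation (CPI percent change),
// column 2 is unemployment
infl = phil_1955[.,1];
unemp = phil_1955[.,2];

// Right-hand side: a column of ones for the intercept,
// followed by inflation
rhs = ones(rows(infl), 1) ~ infl;

// Least squares coefficients: b[1] is the intercept,
// b[2] is the slope
b = olsqr(unemp, rhs);

// Evaluate the fitted line at the smallest and
// largest observed inflation values
fit_x = minc(infl) | maxc(infl);
fit_y = b[1] + b[2] * fit_x;
```

The resulting fit_x and fit_y are 2x1 column vectors, the same shape as reg_x_1955 and reg_y_1955, so they could be passed to plotAddXY in the same way.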
Step 2: Set the axis level attributes
Next, we will apply the most straightforward axis level settings. The only point to note about these is that they cannot be changed by a plotAdd call.
// Declare 'myPlot' to be a plotControl structure
// and fill with default settings
struct plotControl myPlot;
myPlot = plotGetDefaults("scatter");
plotSetTitle(&myPlot, "Phillips Curve in the US", "Arial", 14);
// Note that the font settings applied in
// plotSetXLabel will control the Y-label font
// as well. Setting the Y-label without font
// settings will leave previous font settings unchanged
plotSetXLabel(&myPlot, "Inflation", "Arial", 14);
plotSetYLabel(&myPlot, "Unemployment");
plotSetYRange(&myPlot, 0, 12.5);
Setting the legend requires a little more thought. We will set the legend background to be completely transparent, which is clearly an axis level setting.
However, the function plotSetLegend allows us to set the legend location, which is also an axis level attribute, in addition to the text for each item in the legend, which is a series level attribute.
Fortunately, since we have decided to set all axis and series level attributes before the first scatter plot, we don't need to worry about this distinction. We simply need to make sure that we set our desired text for every data series we plan to add to the plot.
// Make the legend background completely
// transparent, i.e. 0% opacity
plotSetLegendBkd(&myPlot, 0);
// The text for each data series we plan to add
// to our graph, in order. Note that the last
// item is an empty string, "", because we only
// want a legend item for one of the regression lines.
leg_text = "1955-1971" $| "1974-1984" $| "Regression line" $| "";
plotSetLegend(&myPlot, leg_text, "top left inside");
Step 3: Set the series level attributes
After setting the legend text, we have one more series level attribute to set. That is the line or scatter symbol color. To keep things simple and focused, we will just use the string hex codes and a text name for the colors.
However, GAUSS has many built-in color palettes which are attractive and easy to use. Our blog post 5 Hacks for Creating Custom GAUSS Graphics includes a section which links to the available color palettes and the functions needed to access them.
// Set the colors for the scatter plots, a green and an orange
// color, followed by one "black" for each regression line
clrs = "#66C2A5" $| "#FC8D62" $| "black" $| "black";
plotSetLineColor(&myPlot, clrs);
Step 4: Draw the initial scatter plot
We'll now draw the initial scatter plot, containing the inflation and unemployment data from 1955 to 1971. We will pass in the plotControl structure which contains all the settings we have applied for every data series we plan to add to the graph.
// Draw initial scatter plot
plotScatter(myPlot, phil_1955[.,1], phil_1955[.,2]);
This results in a graph which looks like this:
Step 5: Add the second scatter plot
// Add next scatter series, using previously
// set series level attributes since a
// plotControl structure was not passed in
plotAddScatter(phil_1974[.,1], phil_1974[.,2]);
This code will result in a plot which looks like this:
Most of this graph should look like you expect. The added scatter plot is using the second legend text item and the second color. The title, axes labels, and Y-axis range have stayed just as we set them.
The X-axis range, however, has changed. Since we made a big deal earlier that axis level attributes do not change after a plotAdd call, you may wonder why the X-axis range did not stay the same.
The reason for this is that the default state for the X and Y-axis ranges is to expand to fit the data being drawn. Since we had not set the X-axis range, it remained at "expand to fit data".
However, if we had set the X-axis range to the range of the first scatter plot, -1 to +7, then the X-axis range would have remained unchanged. This would mean that all data to the right of X=7 would be out of view.
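As a sketch of that alternative, the code below fixes the X-axis range with plotSetXRange before the first plot call. It assumes the same data files as the rest of this tutorial and omits the title, label, legend and color settings for brevity.

```gauss
// Load inflation and unemployment data for both periods
phil_1955 = loadd("phillips-us-1955.csv", "CPI + Unemployment");
phil_1974 = loadd("phillips-us-1974.csv", "CPI + Unemployment");

struct plotControl myPlot;
myPlot = plotGetDefaults("scatter");

// Explicitly set both axis ranges before the first plot call.
// The X-axis will no longer expand to fit added data.
plotSetXRange(&myPlot, -1, 7);
plotSetYRange(&myPlot, 0, 12.5);

// Draw initial scatter plot
plotScatter(myPlot, phil_1955[.,1], phil_1955[.,2]);

// Add the second scatter series. Points with inflation
// above 7 fall outside the fixed X-axis range
plotAddScatter(phil_1974[.,1], phil_1974[.,2]);
```

For the full graph we leave the X-axis range unset, so that it expands to show both periods.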
Step 6: Add the regression lines
Since the plotAdd functions, like the standard plot functions, will plot each column of data as a separate series, either of these code snippets will produce the same result:
// Option 1: Add the regression lines in separate calls
plotAddXY(reg_x_1955, reg_y_1955);
plotAddXY(reg_x_1974, reg_y_1974);
// Option 2: Add both regression lines in one call
plotAddXY(reg_x_1955 ~ reg_x_1974, reg_y_1955 ~ reg_y_1974);
Remember to use only one of the above options. This should result in the plot shown at the top of this tutorial.
Conclusion
Congratulations! You have learned:
- Which plot attributes are axis level elements vs series level elements.
- That axis level elements need to be set on the first plot call.
- That series level elements can be set on the first plot call, or during a plotAdd call, if a plotControl structure is passed in.
These concepts you've learned will make it much easier for you to create and style GAUSS graphs with data of different lengths and different plot types.