Preparing and Cleaning FRED data in GAUSS

Introduction

The FRED data repository provides access to a wide variety of economic time-series data. It is widely used in research across a number of fields including social sciences, economics, and finance.

FRED provides extensive access to data. However, that data must be pre-processed and properly cleaned before usage. This can be a time-consuming and tedious part of data analysis.

In today’s blog, we look at how to save time and reduce errors using GAUSS’s new data management tools.

Using the quarterly real GDP dataset from the FRED database we explore GAUSS’s new data management tools.

In particular, we examine how to:

  • Deal with irregular dataset headers.
  • Change variable names.
  • Filter dates and change the date display.

Simple Data Import Example

The real GDP series from the FRED database was downloaded and is stored in the file fred_rgdp.xlsx. To start, we open our dataset in the Data Import window by double-clicking on the filename, fred_rgdp.xlsx in the Project Folders tab.

This opens the Data Import tool:

The data preview window indicates that there are a number of issues to address in our raw dataset:

  • The raw data includes an irregular header.
  • The variable names are not correct.

Updating the header row

We can see in our data preview that more appropriate variable names are located row 11. This can be specified using the Header Row text box on the Import Options tab:

The variable names are automatically updated in the preview to reflect the names contained in row 11. Also notice that the Row Range input is automatically updated to 12, one past the header row.

Importing data

The FRED real GDP data is now ready for import and we can select Import to bring the data into the GAUSS workspace.

Note that when we do, the GAUSS code used to import the data is auto-generated and can be used to replicate the interactive steps we performed.

The auto-generated code can be accessed a few different ways:

  1. The command history. To send the code from the command history to a file, right-click on the command in the History list and select Send to File.
  2. The Program Input/Output window. The code snippet can be directly copied by selecting the code snippet, right-clicking, and selecting Copy.

Cleaning Data in Memory

Suppose that after importing the real GDP data, we decide we want to perform some data cleaning steps.

Specifically, let’s consider performing the following:

  • Changing the name of the GDPC1 variable.
  • Changing the display format of the observation_date.
  • Filtering by date to only include observations from 1970 or later.

To begin, double-click fred_rgdp in the Symbols window on the left to open it in the symbol editor.

Changing a variable name

To change a variable name, click the triangle next to the GCPC1 variable name to open the Variable settings menu and select Rename:

Enter the new variable name, real_gdp, in the Rename Column dialog box and select OK:

Changing the date display format

The observation dates for our quarterly data, contained in observation_date, are currently displayed in the ISO-8601 (“Year-Month-Day”) format (1947-01-01). Since we are working with quarterly data, let’s update this to display the year and quarter, 1947-Q1.

This can be done by expanding the Variable settings menu for observation_date and selecting Properties. This opens the Specify Date Format dialog:

We enter our new date display format, %Y-Q%q, in the Date Display text box and click OK.

The data editor preview now shows:

The asterisk on the matrix name tab indicates that our dataframe fred_rgdp has unsaved changes.

Filtering dates

To finish our real GDP FRED data cleaning example, let's filter the dataframe to include only observations that occur in 1970 and later.

To do this, select the Filter tab and set:

  1. The Variable equal to observation_date (this is the default).
  2. The Operation to >=.
  3. The Value equal to 1970-01-01.

After setting all three parts of the filter, we click the + button to add the filter. When we do this, the data preview updates to indicate which values will be included after hitting Apply:

Conclusion

The first, and least glamorous, step in successful data modeling is data cleaning -- the quality of any model depends on the quality of the data coming in.

Today, we’ve examined how the interactive data management tools in GAUSS can make this process less painful and less tedious.

Specifically, we’ve seen how to interactively:

  • Remove irregular headers.
  • Change variable names.
  • Filter by date.
  • Change a date display format.

Further reading:

2 thoughts on “Preparing and Cleaning FRED data in GAUSS

Leave a Reply