Dates and Times Made Easy

Introduction

Working with dates in data analysis software can be tedious and error-prone. The new GAUSS date type, introduced in GAUSS 21, can save you time and prevent frustration and errors.

The date data type is part of the GAUSS dataframe alongside the category, string, and numeric type.

In this blog, we will explore the advantages the date type has to offer, including:

  • Loading and viewing dates side-by-side with other data types.
  • Viewing and displaying dates in easy-to-read formats.
  • Easily changing the date format.
  • Using familiar date formats for filtering data.

Loading date variables

Date variables can be loaded interactively using the Data Import window or programmatically using loadd. To learn more about both options, we will use a sample time series dataset containing quarterly real GDP growth data spanning from 1947-Q2 to 2021-Q1.

After modifying the variable names to improve readability, the first five rows of the CSV file look like this:

Date,rgdp
1947-04-01,-1.0
1947-07-01,-0.8
1947-10-01,6.4
1948-01-01,6.2

Interactively loading dates

First, we will double-click the filename, real_gdp.csv, in the Project Folders window to open the Data Import window.

Loading data in GAUSS from filename

This will load a preview of our data: Previewing data in GAUSS

From this preview we can see that:

  • GAUSS has automatically detected that Date, highlighted in green, should be loaded as a date variable.
  • The Date variable is in a standard format, 4/1/1947, which is auto-detected by GAUSS.

After loading our data, we can preview the new dataframe:

print real_gdp[1:5, .];
            Date       rgdp
      04/01/1947       -1.0
      07/01/1947       -0.8
      10/01/1947        6.4
      01/01/1948        6.2
      04/01/1948        6.8

With the date data type, we can see our dates, in their original and familiar format, side-by-side with rgdp.

Non-standard date formats

Set GAUSS date format.

GAUSS can automatically detect most standard date formats but if it doesn't detect our dates, we can select Properties from the drop-down to the right of the variable name to open the Specify Date Format window.

Selecting date formats

The Specify Date Format window is an easy-to-use interactive tool which:

  • Allows us to easily specify the date format of raw data.
  • Includes a BSD specifier list that allows us to quickly select the appropriate specifiers.
  • Allows us to filter the specifiers, to simplify specifier selection.
  • Dynamically updates a Sample Date so we can view the date format built by our selected specifiers.

Programmatically loading dates

We can use the date keyword in formula strings to indicate that loadd should load a variable as a date:

real_gdp = loadd("real_gdp.csv", "date($Date) + rgdp");

In this case, because Date is in a standard format we don't need to include anything else to load our data.

If our date variable was not in a standard form, we would need to include BSD string specifiers to indicate the format.

For example, suppose our dates look like 2021, June-09. The formula string would be modified to specify our format:

real_gdp = loadd("real_gdp.csv", "date($Date, '%Y, %B-%d') + rgdp");

The date format 2021, June-09 is described by the BSD specifier format string, %Y, %B-%d:

Original contentsDescriptionTypeFormat string contents
2021The year with century as a decimal number.Date%Y
,A comma.Literal,
JuneThe full month name.Date%B
-A dash.Literal-
31The day of month as a number (01-31).Date%d

There are a few helpful things to note about our BSD specifiers:

  • They are enclosed within single ticks.
  • Any literal symbol can be added into the specifiers, as we did with the comma and dash. For example, if our dates look like 2021 # May-31 our BSD specifiers should be %Y # %b-%d.

Changing display formats

Suppose that we want to display our quarterly real GDP data in a format that more clearly indicates the quarter, 1947-Q2. This can easily be done by:

  1. Opening the dataframe real_gdp in the Data Editor.
  2. Opening the Date variable Properties using the drop-down menu in the Management pane. Changing date format after loading in GAUSS.
  3. Entering our desired date format in the Date Display text box from the Specify Date Format window, using BSD date format specifiers:

Once we've done this, the variable name, Date, will be highlighted in red. This indicates that we have unsaved changes and still need to click Apply to save our changes.

Once we click Apply, the code GAUSS uses to change the date formats is generated in the Program Input/Output window. You can copy-and-paste this code into a program file to reproduce these steps in the future without using the window.

real_gdp = setcoldateformats(move(real_gdp), "%Y-Q%q", "Date");

Now if we print the first five observations of real_gdp, the quarters are clearly displayed:

print real_gdp[1:5, .];
            Date       rgdp
         1947-Q2       -1.0
         1947-Q3       -0.8
         1947-Q4        6.4
         1948-Q1        6.2
         1948-Q2        6.8 

Using dates for data cleaning

The date data type not only makes loading and reading dates easier, it also has a number of useful data cleaning applications like data indexing and filtering.

Example #1: Viewing data for one date

Suppose we are interested in comparing the real GDP at the beginning of the 2008 Financial Crisis (2008-Q1) to the real GDP at the beginning of the COVID-19 pandemic (2020-Q1) in the U.S.

Let's print the GDP for the first of these dates using selif and a conditional statement checking for dates equal to 2008-Q1 and 2020-Q1:

selif(real_gdp, real_gdp[., "Date"] .$== "2008-Q1");
       Date             rgdp
       2008-Q1          -2.3
selif(real_gdp, real_gdp[., "Date"] .$== "2020-Q1");
       Date             rgdp
       2020-Q1          -5.0

We can see two important advantages of using the date data type in this example:

  • We can index data using the Date variable name (this is true of any variable in a dataframe).
  • Using the $ in our conditional statement allows us to use the current date labels to select subsets of data ("YYYY-Qq").

If we didn't use the dataframe and date format, filtering by dates would be far less intuitive and readable:

selif(real_gdp, real_gdp[., 1] .== dttoposix(20080101));
           Date         rgdp
       20080101         -2.3

Example #2: Filtering a subset of data

While we may want to occasionally view a single data point, usually we will want to filter a subset of data. For example, suppose that we only want to use observations occurring on or after 1970-Q1:

data = selif(real_gdp, real_gdp[., "Date"] .$>= "1970-Q1");

Now if we view the first five observations:

print data[1:5, .];
            Date      rgdp
         1970-Q1      -0.6
         1970-Q2       0.6
         1970-Q3       3.7
         1970-Q4       4.2
         1971-Q1      11.3

As we saw before, using the $ in our conditional statement tells GAUSS that we want to use our current date format label for filtering.

While it is often nice to reference our data in a preferred format, sometimes you will want to write code that will work for data imported with any format style.

You can do this by using the date format YYYY-MM-DD along with a conditional statement that does not use the $.

Let's look at a couple of examples. First, let's include data that occurs in or after my year of birth, 1978:

data = selif(real_gdp, real_gdp[., "Date"] .>= "1978");

Now if we view the first five observations:

print data[1:5, .];
            Date       rgdp
         1978-Q1        1.3
         1978-Q2       16.4
         1978-Q3        4.1
         1978-Q4        5.5
         1979-Q1        0.7

If we want to refine this, we can add a month to our conditional statement:

data = selif(real_gdp, real_gdp[., "Date"] .>= "1978-06");
print data[1:5, .];
            Date       rgdp
         1978-Q3        4.1
         1978-Q4        5.5
         1979-Q1        0.7
         1979-Q2        0.4
         1979-Q3        3.0

Conclusion

In today's blog, we've explored the new date data type and ways it makes life easier when working with dates. After today, you should have a better understanding of the advantages offered by the date data type including:

  • Loading and viewing dates side-by-side with other data types.
  • Viewing and displaying dates in easy-to-read formats.
  • Easily changing date format.
  • Using familiar date formats for filtering data.

Leave a Reply