Introduction
Time series data with inconsistently formatted dates and times can make your work frustrating. Dates and times are often stored as strings or text data and converting to a consistent, numeric format might seem like a daunting task.
Fortunately, GAUSS includes an easy tool for loading and converting dates and times – the date
keyword.
Using the date keyword
The date
keyword is used with GAUSS's formula string syntax to indicate to GAUSS that a variable is a date. The rest of the work is done internally and GAUSS automatically converts dates and times from strings to the POSIX format.
Consider the file bike_count.xlsx which contains string dates in a fairly standard Month/Day/Year HH:MM:SS
format:
Two steps must be taken to load the dates using the GAUSS loadd
function:
- A
$
in front of the variable name tells GAUSS that the variable is a string. - The
date
operator tells GAUSS that the variable contains dates.
data = loadd("bike_count.xlsx", "date($Time) + Count");
The POSIX date format
The date
keyword converts string formatted dates into the POSIX date and time format. This format represents a date as a scalar number reflecting the seconds that have passed since midnight on January 1st, 1970.
Looking at the data we loaded earlier, the POSIX dates are stored in the first column:
Date formats
One of the most convenient features of the date
keyword is that it internally detects a wide variety of string date and time formats. For example, dates stored as
"10/31/2018 15:00:00"
or
"31-10-2018 15:00:00"
or
"2018-10-31 15:00:00"
can all be automatically detected and converted to POSIX time using the date
keyword. A full list of acceptable date formats is found at the bottom of this blog.
Loading non-standard formats
Though GAUSS can recognize a large number of date and time formats, you still may come across non-standard dates and times outside of the scope of the date
function.
Consider a file named temperature.csv
with the following contents:
"day","Phoenix","Seattle" "November-8-2018",81,51 "November-9-2018",79,45 "November-10-2018",79,50 "November-11-2018",24,50 "November-12-2018",66,59 "November-13-2018",65,56 "November-14-2018",70,55
The date format shown above is not one that GAUSS will automatically recognize. In these cases, it is helpful to write a custom GAUSS function which uses the GAUSS string to POSIX conversion function, strctoposix
. The strctoposix
function uses format specifiers to load custom dates.
Below is a GAUSS procedure which will take dates in the format shown above (i.e. November-8-2018
) and convert it to POSIX time.
proc (1) = loadMyDates(str);
local fmt;
// %B - full month name
// %e - day of month
// %Y - 4 digit year
fmt = "%B-%e-%Y";
retp(strctoposix(str, fmt));
endp;
This function can now be used in place of the date
keyword to load your dates as shown below.
temp_data = loadd("temperature.csv", "loadMyDates($day) + Phoenix + Seattle");
This will create a GAUSS matrix named temp_data
with the contents shown below:
Conclusion
In this blog you've learned:
- How to load date and time data using the
date
keyword. - What date formats are compatible with the
date
keyword. - What format the
date
keyword converts string dates to. - How to use a custom function to load non-standard dates.
Code and data from this blog can be found here.
Recognizable date formats
20181031 31-10-2018 10-31-2018 2018-10-31 10/31/2018 2018/10/31 31 October 2018 201810311830 20181031 1830 31-10-2018 18:30 2018-10-31 18:30 10/31/2018 18:30 2018/10/31 18:30 31 October 2018 18:30 20181031183000 20181031 183000 18:30:00 31-10-2018 18:30:00 2018-10-31 18:30:00 10/31/2018 18:30:00 2018/10/31 18:30:00 10/31/2018 18:30:00 PM 2018/10/31 18:30:00.000 10/31/2018 18:30:00.000 31 October 2018 18:30:00 2018-10-31T18:30 2018-10-31T18:30T 2018-10-31T18:30:00 2018-10-31T18:30:00 2018-10-31T18:30:00.000 2018-10-31T18:30:00.000