Introduction
Time series data with inconsistently formatted dates and times can make your work frustrating. Dates and times are often stored as strings or text data and converting to a consistent, numeric format might seem like a daunting task.
Fortunately, GAUSS includes easy tools for loading and converting dates and times.
Loading Dates Into GAUSS
One of the most convenient things about GAUSS dataframes, is that they allow you load different data types, including dates with little or no additional steps. Fore example, Many date formats are automatically identified by GAUSS, and will load using the GAUSS loadd function, without any additional steps.
Consider the file bike_count.xlsx which contains string dates in a fairly standard Month/Day/Year HH:MM:SS
format:
Because the date column is stored in a common format, GAUSS will automatically detect that it should be loaded as a date.
// Load all data from bike_count.xlsx
data = loadd("bike_count.xlsx");
// Check column types
getColTypes(data);
The column types confirm that the first column we've loaded is a date column:
type date number
The GAUSS Date Variable
The GAUSS date variable allows us to display dates in recognizable form while storing related POSIX date and time format under the hood. This makes them easy to read, while keeping them compatible with GAUSS date and time functions.
For example, we could find the time difference in hours between the second and first observation of time.
// Find difference in hours
timediffposix(data[2, "Time"], data[1, "Time"], "hours")
This tells us that one hour has passed between the first observation and the second:
1
Loading Non-Standard Formats
Though GAUSS can recognize a large number of date and time formats, you may come across non-standard dates and times. These can still be loaded as date variables but you will need to use:
- The
date
keyword to indicate that you are loading a date variable. - The correct BSD specifier format string to tell GAUSS the format of your dates.
Consider a file named temperature.csv
with the following contents:
"day","Phoenix","Seattle" "2018, November-08",81,51 "2018, November-09",79,45 "2018, November-10",79,50 "2018, November-11",24,50 "2018, November-12",66,59 "2018, November-13",65,56 "2018, November-14",70,55
The date format shown above is not one that GAUSS will automatically recognize but we can load it as a date variable using a BSD specifier.
The date format 2021, June-09
is described by the BSD specifier format string, %Y, %B-%d
:
Original contents | Description | Type | Format string contents |
---|---|---|---|
2021 | The year with century as a decimal number. | Date | %Y |
, | A comma. | Literal | , |
June | The full month name. | Date | %B |
- | A dash. | Literal | - |
31 | The day of month as a number (01-31). | Date | %d |
There are a few helpful things to note about our BSD specifiers:
- They are enclosed within single ticks.
- Any literal symbol can be added into the specifiers, as we did with the comma and dash. For example, if our dates look like
2021 # May-31
our BSD specifiers should be%Y # %b-%d
.
Now we can load our temperature data using loadd
:
temp_data = loadd("temperature.csv", "date(day, '%Y, %B-%d') + Phoenix + Seattle");
Conclusion
In this blog you've learned:
- How to load common date and time data using
loadd
. - What the GAUSS date variable is.
- How to load non-standard dates using the
date
keyword.
Further Reading
1.Dates and Times Made Easy
2.What Is A GAUSS Dataframe And Why Should You Care?
Recognizable date formats
20181031 31-10-2018 10-31-2018 2018-10-31 10/31/2018 2018/10/31 31 October 2018 201810311830 20181031 1830 31-10-2018 18:30 2018-10-31 18:30 10/31/2018 18:30 2018/10/31 18:30 31 October 2018 18:30 20181031183000 20181031 183000 18:30:00 31-10-2018 18:30:00 2018-10-31 18:30:00 10/31/2018 18:30:00 2018/10/31 18:30:00 10/31/2018 18:30:00 PM 2018/10/31 18:30:00.000 10/31/2018 18:30:00.000 31 October 2018 18:30:00 2018-10-31T18:30 2018-10-31T18:30T 2018-10-31T18:30:00 2018-10-31T18:30:00 2018-10-31T18:30:00.000 2018-10-31T18:30:00.000