Introduction
The new GAUSS 23 is the most practical GAUSS yet! It is built to save you time on everyday research tasks like finding, importing, and modeling data.
Data at Your Fingertips
- Access millions of global economic and financial data series with FRED and DBnomics integration.
- Aggregate, filter, sort, and transform FRED data series during import.
- Search FRED series from GAUSS.
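As a rough sketch of what the FRED workflow can look like, the snippet below loads one series and runs a keyword search. It assumes that the FRED functions are named fred_load and fred_search, that a FRED API key has already been configured for GAUSS, and that UNRATE (the U.S. unemployment rate) is the series of interest. Check the names and arguments against the GAUSS 23 documentation before relying on them.

```gauss
// Assumption: a FRED API key is already configured for GAUSS

// Load a series by its FRED series ID (UNRATE = U.S. unemployment rate)
unemp = fred_load("UNRATE");

// Preview the first rows of the returned dataframe
head(unemp);

// Search FRED for series matching a phrase
results = fred_search("unemployment rate");
head(results);
```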
Load Data from Anywhere on the Internet
// Load an Excel file from the aptech website
file_url = "https://www.aptech.com/wp-content/uploads/2019/03/skincancer2.xlsx";
skin_cancer = loadd(file_url);
// Print the first 5 rows of the dataframe
head(skin_cancer);
State         Lat     Mort    Ocean    Long
Alabama       33      219     1        87
Arizona       34.5    160     0        112
Arkansas      35      170     0        92.5
California    37.5    182     1        119.5
Colorado      39      149     0        105.5
Simplified Data Loading with...
Automatic Type Detection
Previous versions required formula strings with keywords to specify date, string, and categorical variables from some file types.
Smart data type detection in GAUSS 23 figures out the variable type so you do not have to specify it manually. It also automatically detects nearly 40 popular date formats.
Automatic Header and Delimiter Detection
Replace old code like this:
load X[127,4] = mydata.txt;
with
X = loadd("mydata.txt");
loadd automatically handles:
- Present or absent header row.
- Delimiter (tab, comma, semicolon or space).
- Number of rows and columns.
- Variable types.
First-Class Dataframe Storage
There is no new code to learn. Just use the .gdat file extension with loadd and saved to load and store your dataframes.
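A minimal sketch of a round trip through a .gdat file is shown below. It reuses the auto2.dta example file that ships with GAUSS. The output file name auto2_copy.gdat is hypothetical, and the two-argument form of saved (dataframe, file name) is assumed here.

```gauss
// Load an example dataset that ships with GAUSS
fname = getGAUSShome("examples/auto2.dta");
auto = loadd(fname);

// Store the dataframe; the .gdat extension selects the storage format
// (hypothetical output file name, written to the working directory)
call saved(auto, "auto2_copy.gdat");

// Load it back; variable names and types are expected to be preserved
auto_copy = loadd("auto2_copy.gdat");
head(auto_copy);
```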
Expanded Quantile Regressions
hitters = loadd("islr_hitters.xlsx");
tau = 0.90;
call quantileFit(hitters, "ln(salary) ~ AtBat + Hits + HmRun", tau);
Linear quantile regression

===============================================================================
Valid cases:          263          Dependent variable:       ln_salary_
Missing cases:          0          Deletion method:                None
Number variables:       3          DF model                           3
                                   DF residuals                     259
===============================================================================
Name         Coeff.   Standard    t-value    P >|t|        lb        ub
                         Error
-------------------------------------------------------------------------------
Tau = 0.90

CONSTANT      6.285      0.194     32.433    0.0000     5.905     6.664
AtBat        -0.001      0.002     -0.737    0.4621    -0.004     0.002
Hits          0.008      0.005      1.526    0.1281    -0.002     0.018
HmRun         0.017      0.009      1.951    0.0521    -0.000     0.034
- New kernel estimated variance-covariance matrix.
- Up to 4x speed improvement.
- Expanded model diagnostics including pseudo R-squared, coefficient t-statistics and p-values, and degrees of freedom.
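For readers new to quantile regression, the sketch below shows the standard check (pinball) loss that a linear quantile regression minimizes at a given tau. It illustrates the textbook objective only. The internals of quantileFit are not described here, and checkLoss is a hypothetical helper name.

```gauss
// Example residuals and the quantile level used above
u = { -2, -1, 0, 1, 2 };
tau = 0.90;

// With tau = 0.90, positive residuals (under-predictions) cost 0.9 per
// unit, while negative residuals cost only 0.1 per unit
print checkLoss(u, tau);

// Hypothetical helper: check loss rho_tau(u) = u .* (tau - 1{u < 0})
proc (1) = checkLoss(u, tau);
    retp(u .* (tau - (u .< 0)));
endp;
```

The asymmetric weights are what pull the fitted line toward the upper tail of ln(salary) when tau is 0.90.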
Kernel Density Estimations
- Estimate unknown probability functions with 13 available kernels.
- Automatic or user-specified bandwidth.
- Kernel density plots with easy-to-use options for customization.
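To make the idea concrete, here is a hand-rolled Gaussian kernel density estimate with a rule-of-thumb bandwidth. It is a sketch of the general technique on simulated data, not the interface of the new GAUSS 23 kernel density function. The bandwidth rule (Silverman's 1.06 * sigma * n^(-1/5)) and the evaluation grid are assumptions chosen for illustration.

```gauss
// Simulated sample (illustrative data only)
rndseed 42;
x = rndn(200, 1);
n = rows(x);

// Rule-of-thumb bandwidth (assumption: Silverman's rule)
h = 1.06 * stdc(x) * n^(-1/5);

// Evaluation grid from -3 to 3 in steps of 0.1
grid = seqa(-3, 0.1, 61);

// 61 x 200 matrix of scaled distances between grid points and observations
u = (grid - x') / h;

// Average the Gaussian kernel over observations at each grid point
f = sumr(pdfn(u)) / (n * h);

// Plot the estimated density
plotXY(grid, f);
```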
Improved Covariance Computations
// Load data
fname = getGAUSShome("examples/auto2.dta");
auto = loadd(fname);
// Declare control structure
struct olsmtControl ctl;
ctl = olsmtControlCreate();
// Turn on residuals
ctl.res = 1;
// Turn on HAC errors
ctl.cov = "hac";
call olsmt(auto, "mpg ~ weight + foreign", ctl);
Valid cases:                    74      Dependent variable:                 mpg
Missing cases:                   0      Deletion method:                   None
Total SS:                 2443.459      Degrees of freedom:                  71
R-squared:                   0.663      Rbar-squared:                     0.653
Residual SS:               824.172      Std error of est:                 3.407
F(2,71):                    69.748      Probability of F:                 0.000
Durbin-Watson:               2.421

                                 Std                 Prob      Std     Cor with
Variable            Estimate    Error    t-value     >|t|      Est     Dep Var
-------------------------------------------------------------------------------
CONSTANT             41.6797   1.8989      21.95    0.000      ---          ---
weight              -0.00659   0.0006     -11.99    0.000   -0.885    -0.807175
foreign: Foreign    -1.65003   0.9071     -1.819    0.073   -0.131     0.393397

Note: HAC robust standard errors reported
- New procedure for computing Newey-West HAC robust standard errors.
- All robust covariance procedures now include the option to turn off small sample corrections.
- Expanded dataframe and formula string compatibility.
New Functions for Data Cleaning and Exploration
between
Returns a binary vector indicating which observations fall in a specified range. It can be used with selif to select rows. Dates and ordinal categorical columns are supported.
// Return a 1 if the observation is between the listed dates
match = between(unemp[.,"DATE"], "2020-03", "2020-08");
// Select the matching observations
unemp = selif(unemp, match);
      DATE     UNRATE
2020-03-01     4.4000
2020-04-01     14.700
2020-05-01     13.200
2020-06-01     11.000
2020-07-01     10.200
2020-08-01     8.4000
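For comparison, the same kind of range filter can be written by hand for a plain numeric vector using element-by-element comparisons. This is a sketch of the idea rather than the implementation of between. It assumes both endpoints are included, which is consistent with the output above, and the bound names lower and upper are illustrative.

```gauss
// Example numeric data
x = { 3, 7, 12, 15, 21 };

// Illustrative bounds, both treated as inclusive
lower = 7;
upper = 15;

// 1 where lower <= x <= upper, otherwise 0
match = (x .>= lower) .and (x .<= upper);

// Keep only the matching rows: 7, 12 and 15
x_sel = selif(x, match);
print x_sel;
```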
where
Provides a convenient and intuitive way to combine or modify data. It returns elements from either a or b depending upon condition.
// Daily hotel room price
hotel_price = { 238, 405, 405, 329, 238 };
// Daily temperature forecast
temperature = { 89, 94, 110, 103, 97 };
// Decrease the price by 10% if the
// temperature will be more than 100 degrees
new_price = where(temperature .> 100,
hotel_price .* 0.9,
hotel_price);
new_price =
   238
   405
   364.50
   296.10
   238
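To see what where is doing, the same result can be built by hand from a 0/1 mask and element-by-element arithmetic. This is only an equivalent formulation for this numeric example, not a description of how where is implemented.

```gauss
// Daily hotel room price and temperature forecast, as above
hotel_price = { 238, 405, 405, 329, 238 };
temperature = { 89, 94, 110, 103, 97 };

// 1 on days hotter than 100 degrees, otherwise 0
mask = temperature .> 100;

// Take the discounted price where mask is 1, the original price elsewhere
new_price = mask .* (hotel_price .* 0.9) + (1 - mask) .* hotel_price;
print new_price;
```

The where version is shorter and states the intent directly, which is the main reason to prefer it.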
- Explore sample symmetry and tails with the skewness and kurtosis functions.
- Test for normality using the new JarqueBera function.
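The Jarque-Bera statistic combines sample skewness and kurtosis into a single normality test. The sketch below computes the textbook version by hand on simulated data, using population (divide-by-n) moments. The built-in functions may use different conventions or small-sample adjustments, so treat this as an illustration of the formula rather than a reproduction of their output.

```gauss
// Simulated sample (illustrative data only)
rndseed 2023;
x = rndn(500, 1);
n = rows(x);

// Central moments with divide-by-n normalization
d = x - meanc(x);
m2 = meanc(d.^2);
m3 = meanc(d.^3);
m4 = meanc(d.^4);

// Sample skewness and (non-excess) kurtosis
s3 = m3 / m2^1.5;
k4 = m4 / m2^2;

// Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)
jb = n/6 * (s3^2 + (k4 - 3)^2 / 4);

// Asymptotic p-value from the chi-square distribution with 2 df
p = cdfChic(jb, 2);
print jb~p;
```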
Speed-ups and Efficiency Improvements
- Up to 10x speed-up and 50% decrease in memory usage for lag creation with shiftc and lagn.
- Up to 2x speed-up (or more for large data) and 50% decrease in memory usage for miss and missrv.
- Up to 2x speed-up (or more for large data) and 50% decrease in memory usage for element-by-element mathematical (+, -, .*, ./), relational (.>, .<, .>=, .<=, .==, .!=) and logical (.and, .not, .or, .xor) operators.
- Up to 100x speed-up for some cases with indsav.
- Up to 40% speed-up for reclassify.
- Up to 3x speed-up for loading Excel® files with loadd and the Data Import Window.
Conclusion
For a full list of everything GAUSS 23 offers, please see the complete changelog.