Reclassification and recoding
GAUSS 16 comes with new functions that make it easy to convert categorical variables from text labels to numeric labels, convert numeric labels back to text labels, or place numeric ranges into separate categories.
The first function is reclassify. You can use it to:
- Reclassify text labels to numeric category labels.
- Reclassify numeric labels to text labels.
- Reclassify a single vector, an entire matrix or a multidimensional array.
Reclassify text labels to numeric categories
//Create a 7x1 string vector
X = "EU" $| "GBP" $| "USD" $| "GBP" $| "USD" $| "EU" $| "EU";

//Use 'uniquesa' to create a string vector
//with the unique strings in 'X' listed
//in alphabetical order
from = uniquesa(X);

//Create 3x1 vector of numeric category labels
to = { 0, 1, 2 };

//Reclassify elements in 'X' from
// EU -> 0
// GBP -> 1
// USD -> 2
X_numeric = reclassify(X, from, to);
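The reverse direction from the list above, numeric labels back to text labels, can be sketched by swapping the roles of 'from' and 'to'. This is a minimal sketch, not a definitive usage: it assumes reclassify accepts a numeric 'from' vector together with a string 'to' vector, and the data and variable names (X_num, X_text) are made up for illustration.

```gauss
//Hypothetical 5x1 vector of numeric category labels
X_num = { 0, 2, 1, 0, 2 };

//Numeric labels to be replaced
from = { 0, 1, 2 };

//3x1 string vector of text labels (assumed mapping)
to = "EU" $| "GBP" $| "USD";

//Reclassify elements in 'X_num' from
// 0 -> EU
// 1 -> GBP
// 2 -> USD
X_text = reclassify(X_num, from, to);
```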
The second new function is reclassifyCuts, which:
- Places data into numeric categories based upon range.
- Allows intervals to be open or closed on the right.
- Takes vector, matrix or multidimensional array inputs.
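The points above might look like the following in practice. This is a hedged sketch: it assumes reclassifyCuts takes the data, a vector of cut points, and an optional third input that closes each interval on the right when set to 1. The data, cut points and variable names are hypothetical, so check the function reference for the exact inputs and the category labels returned.

```gauss
//Hypothetical 6x1 vector of measurements
x = { 95, 120, 130, 119, 145, 160 };

//Cut points separating three categories
cut_pts = { 120, 140 };

//Assumed optional third input: 1 closes each interval on the right,
//so a value equal to a cut point falls in the lower category
x_cat = reclassifyCuts(x, cut_pts, 1);
```

Under these assumptions, a value of exactly 120 would land in the lowest category. With intervals open on the right it would land in the middle one.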
Data scaling
One of the most common reasons for a maximum likelihood estimation or optimization routine to fail is poorly scaled data. The new rescale function gives you 8 different scaling options with one simple line of code. You can either:
Use a named method and return the scaled data plus the location and scale factors:
//Scale each column of 'x_train'
{ x_train, location, scale } = rescale(x_train, "standardize");
Or pass in the returned location and scale later to scale another sample from the same data set:
//Scale each column of 'x_test' with scale and
//location parameters created from training data above
x_test = rescale(x_test, location, scale);
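To make the role of the location and scale factors concrete, here is a rough manual equivalent. It is only a sketch: it assumes the "standardize" method subtracts each column's mean and divides by its sample standard deviation, and the data and the names x_train_s and x_test_s are invented for illustration. It uses the standard GAUSS functions meanc and stdc.

```gauss
//Hypothetical training and test samples with 2 columns each
x_train = { 1 10, 2 20, 3 30, 4 40 };
x_test = { 2 15, 5 35 };

//Column means and sample standard deviations of the training data,
//transposed to 1x2 row vectors so they apply column by column
location = meanc(x_train)';
scale = stdc(x_train)';

//Scale the training data with its own factors
x_train_s = (x_train - location) ./ scale;

//Scale the test data with the SAME training factors
x_test_s = (x_test - location) ./ scale;
```

Reusing the training factors keeps both samples on the same scale, which is the point of passing location and scale back in.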