GAUSS rfClassifyFit example

This example uses the red wine quality dataset from Cortez et al. (2009) to fit a random forest classification model. Predictions are then made from the fitted model. The dataset contains 1599 observations and 12 variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, and quality.

Split the dataset

Prior to fitting the model, the testTrainSplit function is used to split the data into training and test sets. The testTrainSplit function is compatible with the GAUSS formula string syntax, so the training and test datasets are created without loading the full dataset. In the classification model, quality is used to create the response variable and all other variables are used as predictors. The response variable is an indicator variable equal to 1 if quality is greater than 6:

// Load wine quality dataset
dataset = getGAUSSHome() $+ "pkgs/gml/examples/winequality-red.csv";

// Split data into training and test sets
{y_train, y_test, x_train, x_test} = testTrainSplit(dataset, "quality ~ . ", 0.7);

// Create indicator variable
y_test = y_test .>6;
y_train = y_train .>6;
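
Because only wines with a quality score above 6 are coded as 1, the two classes may be unbalanced. As a quick check (a minimal sketch that is not part of the original example), the share of 1s in each set can be printed with meanc, which returns the column mean. It assumes y_train and y_test already hold the 0/1 indicator created above:

// Share of observations with quality greater than 6
print "share of 1s in training set = " meanc(y_train);
print "share of 1s in test set = " meanc(y_test);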

Estimate the model

The rfClassifyFit function is used on the y_train and x_train datasets to fit a random forest classification model. All results are stored in an rfModel structure:

// Output structure
struct rfModel rfm;

// Fit training data using random forest
rfm = rfClassifyFit(y_train, x_train);

Make predictions

Once the model is fit, predictions can be made from the x_test dataset using the rfClassifyPredict function. The rfClassifyPredict function requires two inputs: an rfModel structure and a data matrix of predictors:

// Make predictions using test data
predictions = rfClassifyPredict(rfm, x_test);

// Print predictions alongside observed values
print predictions~y_test;

// Print share of correct predictions
print "accuracy = " meanc(predictions .== y_test);
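
Accuracy alone can be misleading when one class is much more common than the other. The following minimal sketch, an addition to the original example, tabulates the four prediction outcomes. It assumes that predictions and y_test are both column vectors of 0s and 1s, as in the output shown for this example. The names n_tp, n_tn, n_fp, n_fn and confusion are introduced here for illustration only:

// The product of two 0/1 vectors is 1 only where both are 1
n_tp = sumc(predictions .* y_test);              // predicted 1, actual 1
n_tn = sumc((1 - predictions) .* (1 - y_test));  // predicted 0, actual 0
n_fp = sumc(predictions .* (1 - y_test));        // predicted 1, actual 0
n_fn = sumc((1 - predictions) .* y_test);        // predicted 0, actual 1

// Rows: actual 0, actual 1. Columns: predicted 0, predicted 1
confusion = (n_tn~n_fp)|(n_fn~n_tp);
print confusion;

// Accuracy obtained by predicting 0 for every observation, for comparison
print "baseline accuracy = " (1 - meanc(y_test));

If the model's accuracy is close to the baseline value, the off-diagonal counts show whether the errors are mostly missed 1s or false 1s.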

Output

The output from the code above looks similar to:

    0.00000000       0.00000000
    0.00000000       0.00000000
    0.00000000       0.00000000
    0.00000000        1.0000000
    0.00000000       0.00000000
    0.00000000       0.00000000
    0.00000000        1.0000000
    0.00000000       0.00000000
    0.00000000       0.00000000
    0.00000000       0.00000000
accuracy =       0.88541667 
