GAUSS kmeansFit Example

This example uses the k-means clustering method to analyze the "iris.dat" dataset. The features used to compute the clusters are:

  • sepal length
  • sepal width
  • petal length
  • petal width

Split the dataset

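The loadd function is used to load the data from the dataset. The formula string ". -group" loads every variable except group. Before fitting the k-means model, the splitData function is used to split the data into training and test sets, with 70% of the observations assigned to the training set.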

new;
cls;
library gml;

// Specify dataset name with full path
dataset = getGAUSSHome() $+ "pkgs/gml/examples/iris.dat";

// Load all variables except 'group' from the dataset
x = loadd(dataset, ". -group");

// Split data into x_train and x_test
{ x_train, x_test } = splitData(x, 0.70);
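
To make the idea of a 70/30 split concrete, the following is a minimal hand-rolled sketch on a small made-up matrix. It is not the implementation of splitData: the random shuffling, the rounding rule, and the toy data are all assumptions made for illustration.

```gauss
new;

// Fix the seed so the shuffle is repeatable (arbitrary value)
rndseed 4521;

// Hypothetical 10 x 2 matrix standing in for the feature data
x = seqa(1, 1, 10) ~ seqa(101, 1, 10);

n = rows(x);

// Number of training rows; the rounding rule is an assumption
n_train = round(0.70 * n);
first_test = n_train + 1;

// Random permutation of the row indices
idx = sortind(rndu(n, 1));

// The first 70% of the shuffled rows form the training set,
// and the remaining rows form the test set
x_train = x[idx[1:n_train, .], .];
x_test = x[idx[first_test:n, .], .];

print "Training rows: " rows(x_train);
print "Test rows: " rows(x_test);
```

Every row of x lands in exactly one of the two matrices, so the test set contains only observations the model has not seen during fitting.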

Estimate the model

The kmeansFit function is applied to the x_train matrix to cluster the data with the k-means model. All results are stored in a kmeansModel structure:

// Number of clusters
n_clusters = 3;

// Step One: Declare kmeansModel struct
struct kmeansModel mdl;

// Step Two: Fit kmeans model
mdl = kmeansFit(x_train, n_clusters);
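
For intuition about what fitting a k-means model involves, here is a minimal sketch of the textbook (Lloyd's) iteration on a tiny made-up dataset. It is not the kmeansFit implementation, which may initialize and stop differently. The toy data, the choice of starting centroids, and the fixed count of 10 iterations are assumptions.

```gauss
new;

// Hypothetical toy data: two well-separated groups in two dimensions
x = { 1.0 1.1, 0.9 1.0, 1.2 0.8, 5.0 5.1, 5.2 4.9, 4.8 5.0 };
k = 2;
n = rows(x);

// Start from the first and last observations (an arbitrary choice)
centroids = x[1, .] | x[n, .];

for iter(1, 10, 1);
    // Assignment step: squared Euclidean distance to each centroid
    dist = zeros(n, k);
    for j(1, k, 1);
        dist[., j] = sumc(((x - centroids[j, .])^2)');
    endfor;

    // Index of the nearest centroid for each observation
    labels = minindc(dist');

    // Update step: each centroid becomes the mean of its members
    for j(1, k, 1);
        members = selif(x, labels .== j);
        centroids[j, .] = meanc(members)';
    endfor;
endfor;

print centroids;
```

The sketch assumes every cluster keeps at least one member, which holds for this toy data. A production routine would also need to handle empty clusters and test for convergence.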

Make predictions

Once the model is fit, predictions can be made for the x_test data using the kmeansPredict function. The kmeansPredict function requires two inputs: a kmeansModel structure or centroid matrix, and a data matrix of predictors:

// Step Three: Predict cluster assignments for the test data
predictions = kmeansPredict(mdl, x_test);
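
Because the function also accepts a plain centroid matrix, it can help to see how a prediction follows from centroids alone. The sketch below assumes the standard k-means rule, where each observation is assigned to its nearest centroid by Euclidean distance. The centroid and data values are hypothetical, and the code illustrates the rule rather than the internals of kmeansPredict.

```gauss
new;

// Hypothetical centroids (k x p) and new observations (n x p)
centroids = { 1 1, 5 5, 9 1 };
pts = { 0.8 1.2, 5.1 4.7, 8.5 0.9, 4.0 6.0 };

k = rows(centroids);
n = rows(pts);

// Squared Euclidean distance from every observation to every centroid
dist = zeros(n, k);
for j(1, k, 1);
    dist[., j] = sumc(((pts - centroids[j, .])^2)');
endfor;

// Cluster label = row index of the nearest centroid
labels = minindc(dist');

print labels;
```

Here the labels are row indices into the centroid matrix, so the first observation, which lies closest to the first centroid, receives label 1.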
