GAUSS kmeansFit Example

This example uses the k-means clustering method to analyze the "iris.dat" dataset. The features used to compute the clusters are:

sepal length
sepal width
petal length
petal width

Split the dataset

The loadd function loads the data from the dataset. The formula string ". -group" tells loadd to load every variable except group. Before the k-means model is fit, the splitData function splits the data into a training set and a test set.


new;
cls;
library gml;
 
// Specify dataset name with full path
dataset = getGAUSSHome() $+ "pkgs/gml/examples/iris.dat";
 
// Load all variables except 'group' from the dataset
x = loadd(dataset, ". -group");
 
// Split data into x_train and x_test
{ x_train, x_test } = splitData(x, 0.70);
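
A quick check of the split can catch loading problems early. The lines below are a minimal sketch that assumes x, x_train and x_test exist from the code above and that splitData partitions the data by rows; they use only the built-in rows and cols functions.

```gauss
// Assumes x, x_train and x_test were created by the code above
print "Total rows:    " rows(x);
print "Training rows: " rows(x_train);
print "Test rows:     " rows(x_test);

// Both pieces should keep the same number of feature columns as x
print "Columns:       " cols(x_train) cols(x_test);
```

If the split is by rows, the training and test row counts should add up to the total.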

Estimate the model

The kmeansFit function clusters the rows of the x_train matrix using the k-means algorithm. All results are stored in a kmeansModel structure:


// Number of clusters
n_clusters = 3;
 
// Step One: Declare kmeansModel struct
struct kmeansModel mdl;
 
// Step Two: Fit kmeans model
mdl = kmeansFit(x_train, n_clusters);

Make predictions

Once the model is fit, predictions can be made for the x_test data using the kmeansPredict function. The kmeansPredict function requires two inputs: a kmeansModel structure or centroid matrix, and a data matrix of predictors:


// Step Three: Predict cluster assignments for the test data
predictions = kmeansPredict(mdl, x_test);
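
To illustrate what predicting from a centroid matrix means, the sketch below assigns each row of a small data matrix to its nearest centroid by squared Euclidean distance. This shows the general nearest-centroid rule and is not necessarily how kmeansPredict is implemented. The names centroids_demo, x_demo and labels, and all of the numbers, are made up for the illustration.

```gauss
// Hypothetical centroid matrix: 2 clusters, 2 features (illustrative values)
centroids_demo = { 0 0, 10 10 };

// Hypothetical observations to classify: 3 rows, 2 features
x_demo = { 1 1, 9 8, 2 1 };

n = rows(x_demo);
labels = zeros(n, 1);

for i(1, n, 1);
    // Squared Euclidean distance from observation i to each centroid
    d = sumc(((centroids_demo - x_demo[i, .])^2)');

    // Row index of the closest centroid
    labels[i] = minindc(d);
endfor;

print labels;
```

With these values, the first and third observations are closest to the first centroid and the second observation is closest to the second centroid.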
