This example uses the k-means clustering method to analyze the "iris.dat" dataset. The features used to compute the clusters are:
- sepal length
- sepal width
- petal length
- petal width
Split the dataset
The loadd function loads the data from the dataset file. Before the k-means model is fit, the splitData function splits the data into a training set and a test set, with 70% of the observations assigned to the training set:
new;
cls;
library gml;
// Specify dataset name with full path
dataset = getGAUSSHome() $+ "pkgs/gml/examples/iris.dat";
// Step One: Load data from data set
x = loadd(dataset, ". -group");
// Split data into x_train and x_test
{ x_train, x_test } = splitData(x, 0.70);
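A percentage-based train/test split can be sketched as follows. This is a minimal Python illustration, not the GML splitData implementation: it assumes the rows are shuffled before splitting, and the helper split_data, its seed parameter, and the placeholder rows are all hypothetical.

```python
import random


def split_data(rows, train_pct, seed=None):
    """Shuffle a copy of rows and split it into (train, test).

    Hypothetical helper: illustrates a percentage-based split,
    not the GML splitData implementation.
    """
    if not 0.0 < train_pct < 1.0:
        raise ValueError("train_pct must be between 0 and 1")
    shuffled = list(rows)
    # Shuffle with a seeded generator so the split is reproducible
    random.Random(seed).shuffle(shuffled)
    # First train_pct of the shuffled rows go to the training set
    n_train = round(train_pct * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Usage: 150 placeholder rows (the size of the classic iris data), split 70/30
rows = [[float(i)] * 4 for i in range(150)]
x_train, x_test = split_data(rows, 0.70, seed=42)
print(len(x_train), len(x_test))  # 105 45
```

Shuffling first matters when the file is ordered by class, as iris data often is; a plain head/tail split would otherwise leave some groups out of the training set.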
Estimate the model
The kmeansFit function is applied to the x_train matrix to cluster the data with the k-means algorithm. All results are stored in a kmeansModel structure:
// Number of clusters
n_clusters = 3;
// Step One: Declare kmeansModel struct
struct kmeansModel mdl;
// Step Two: Fit kmeans model
mdl = kmeansFit(x_train, n_clusters);
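The standard way to fit k-means is Lloyd's algorithm: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until the assignments stop changing. The Python sketch below shows that loop under stated assumptions; it is not the GML kmeansFit implementation, which may use a different initialization and stopping rule. The functions sq_dist, nearest, and kmeans_fit and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    # Squared Euclidean distance between two points
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(point, centroids):
    # Index of the centroid closest to point (ties go to the lower index)
    return min(range(len(centroids)), key=lambda k: sq_dist(point, centroids[k]))


def kmeans_fit(x, n_clusters, max_iters=100, seed=None):
    """Lloyd's algorithm; returns (centroids, labels). Hypothetical sketch."""
    rng = random.Random(seed)
    # Initialize centroids with n_clusters randomly chosen rows
    centroids = [list(p) for p in rng.sample(x, n_clusters)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid
        new_labels = [nearest(p, centroids) for p in x]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(n_clusters):
            members = [p for p, lab in zip(x, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Usage: two well-separated groups of 2-D points
x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels = kmeans_fit(x, 2, seed=0)
print(sorted([round(v, 3) for v in c] for c in centroids))
# [[0.333, 0.333], [10.333, 10.333]]
```

Cluster numbers are arbitrary: a different seed may swap labels 0 and 1 while producing the same partition.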
Make predictions
Once the model is fit, predictions can be made for the x_test data using the kmeansPredict function. The kmeansPredict function requires two inputs: a kmeansModel structure (or a matrix of centroids) and a data matrix of predictors:
// Step Three: Predict cluster assignments for the test data
predictions = kmeansPredict(mdl, x_test);
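Because kmeansPredict also accepts a centroid matrix, prediction for k-means can be understood as nearest-centroid assignment. The Python sketch below assumes the standard rule (each row goes to the centroid with the smallest Euclidean distance); kmeans_predict and the sample centroids are hypothetical, and clusters here are numbered from 0, whereas GAUSS indexing is 1-based, so the GML output may be numbered differently.

```python
def sq_dist(a, b):
    # Squared Euclidean distance; same ordering as Euclidean distance
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_predict(centroids, x):
    """Return the index of the nearest centroid for each row of x."""
    return [
        min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
        for p in x
    ]


# Usage: two centroids, three new observations
centroids = [[0.0, 0.0], [10.0, 10.0]]
x_test = [[1.0, 2.0], [9.0, 8.5], [4.0, 4.0]]
print(kmeans_predict(centroids, x_test))  # [0, 1, 0]
```

No refitting happens at this stage: the centroids learned from the training data stay fixed, and each test row is simply compared against them.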