K-Means Clustering exercise session
In this exercise session, you will read an external file of iris flower data and build an internal database in Java, as was done in the previous exercise session. The new file lists 150 observations of iris flowers from 3 different species: iris-setosa, iris-versicolor and iris-virginica. Each observation has 4 measurements: sepal length, sepal width, petal length and petal width, all in centimetres. Your task in this session is to implement k-means clustering to partition the database into K clusters.
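As a rough illustration of how one row of the file could map onto a flower object, here is a minimal sketch. The class name FlowerSketch, its field names, the helpers fromCsvLine and measurements, and the column order (four measurements followed by the species label) are all assumptions made for this example; the Flower class from the previous session may look different.

```java
// Hypothetical stand-in for the Flower class from the earlier session.
final class FlowerSketch {
    final double sepalLength;
    final double sepalWidth;
    final double petalLength;
    final double petalWidth;
    final String species;

    FlowerSketch(double sepalLength, double sepalWidth,
                 double petalLength, double petalWidth, String species) {
        this.sepalLength = sepalLength;
        this.sepalWidth = sepalWidth;
        this.petalLength = petalLength;
        this.petalWidth = petalWidth;
        this.species = species;
    }

    // Parses one comma-separated line.
    // Assumed column order: sepal length, sepal width, petal length, petal width, species.
    static FlowerSketch fromCsvLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns but got: " + line);
        }
        return new FlowerSketch(
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim()),
                Double.parseDouble(parts[2].trim()),
                Double.parseDouble(parts[3].trim()),
                parts[4].trim());
    }

    // Returns the four measurements as one row, ready to copy into a values array.
    double[] measurements() {
        return new double[] {sepalLength, sepalWidth, petalLength, petalWidth};
    }
}
```

Keeping the four measurements available as a double[] row makes it straightforward to fill a 150 x 4 array later, while the species label stays separate so that it does not influence the clustering.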
Exercise instructions:
- Use the code developed in Java exercise session VI to create an internal database flowerList (an ArrayList of Flower objects) from the CSV file.
- Use k-means clustering to partition the data set into K clusters:
  - Arbitrarily choose K centroids from within the data set.
  - Calculate the distance of each value from each of the centroids.
  - Add each value to the cluster whose centroid is closest.
  - Calculate the new centroids after all values have been placed in clusters.
  - Calculate the difference between the old and new centroids.
  - Repeat the previous 4 steps until the difference between the old and new centroids is less than a specified tolerance.
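The steps listed above could be sketched roughly as follows. This is one possible arrangement under stated assumptions, not a reference solution: the class and method names (KMeansSketch, squaredDistance, nearest, cluster) are hypothetical, the distance is assumed to be Euclidean, the initial centroids are simply K evenly spaced rows of the data, and the "difference" between old and new centroids is taken to be the largest distance any single centroid moved.

```java
// Hypothetical sketch of the k-means loop on a plain double[][] data set.
final class KMeansSketch {

    // Squared Euclidean distance; the square root is not needed to compare distances.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Returns, for every row of values, the index of the cluster it ends up in.
    static int[] cluster(double[][] values, int k, double tolerance) {
        if (k < 1 || k > values.length || tolerance <= 0.0) {
            throw new IllegalArgumentException("need 1 <= k <= rows and tolerance > 0");
        }
        int dims = values[0].length;

        // Arbitrary initial centroids (assumption: k evenly spaced rows of the data).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = values[c * values.length / k].clone();
        }

        int[] assignment = new int[values.length];
        while (true) {
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];

            // Place every value in the cluster of its closest centroid.
            for (int i = 0; i < values.length; i++) {
                int c = nearest(values[i], centroids);
                assignment[i] = c;
                counts[c]++;
                for (int d = 0; d < dims; d++) {
                    sums[c][d] += values[i][d];
                }
            }

            // New centroid = mean of the cluster members; an empty cluster keeps its old centroid.
            double largestShift = 0.0;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                double[] updated = new double[dims];
                for (int d = 0; d < dims; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                // Difference between old and new centroid, measured as the distance moved.
                largestShift = Math.max(largestShift,
                        Math.sqrt(squaredDistance(centroids[c], updated)));
                centroids[c] = updated;
            }

            // Stop once no centroid moved by the tolerance or more.
            if (largestShift < tolerance) {
                return assignment;
            }
        }
    }
}
```

With the iris data, a call such as KMeansSketch.cluster(values, 3, 0.0001) would return one cluster index per flower. Other choices for the initial centroids (for example random rows) and for the convergence measure are equally valid and may lead to different clusters.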
Core structure of the code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;

public class KmeansClustering {
    static ArrayList<Flower> flowerList = new ArrayList<Flower>();
    static ArrayList<String> type = new ArrayList<String>();
    static double[][] values = new double[150][4];
    // Declare variables to store the centroids, the clusters and the flower types in each cluster
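    public static void main(String[] args) {
        String dataFile = "iris.csv";
        read_data(dataFile);  // build the internal database from the CSV file
        kMeanClustering();    // cluster the loaded observations
    } // end main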
    public static void read_data(String dataFile) {
        BufferedReader br = null;
        String line = "";
        String SplitBy = ",";
        try {
            // use code from Java exercise IV to create internal database of iris flowers
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // populate the values array and the type list from the above created database
    }
    public static void kMeanClustering() {
        // arbitrarily assign centroids from within the dataset
        while (true) {
            // calculate distance of each value from the centroids
            // place the value in the closest cluster
            // calculate the new centroids
            // calculate difference between new and old centroids
            // if difference is less than a specified tolerance clustering is done
        }
    }
}
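The skeleton asks for variables that store the flower type in each cluster. One possible way to build such a summary after clustering is sketched below. The class ClusterTallySketch, the method tally and the int[] assignment parameter (one cluster index per flower, in the same order as the type list) are assumptions made for this example and are not part of the skeleton.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that counts the flower types found in each cluster.
final class ClusterTallySketch {

    // assignment[i] is the assumed cluster index of flower i; type.get(i) is its species label.
    // The result holds one map per cluster, from species label to number of flowers.
    static List<Map<String, Integer>> tally(int[] assignment, List<String> type, int k) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        for (int c = 0; c < k; c++) {
            counts.add(new TreeMap<>());
        }
        for (int i = 0; i < assignment.length; i++) {
            // merge adds 1 to the existing count or starts a new count at 1
            counts.get(assignment[i]).merge(type.get(i), 1, Integer::sum);
        }
        return counts;
    }
}
```

Printing the returned maps shows how well the clusters line up with the three species. A cluster dominated by a single species suggests that the measurements separate that species clearly.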