Posted By

kamthornp on 04/27/18


Tagged

Kindergarden


Kmean


Published in: Python
 

Objective: build a k-means clustering model on the iris dataset.

1) Train k-means models over a range of k to choose the best k 2) plot the SSE for each k 3) train a k-means model with k = 3 4) print its 3 centroids 5) plot the clusters on 2D scatter plots (sepals and petals)
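The SSE plotted in step 2 is the quantity scikit-learn stores in `inertia_`: the sum of squared distances from each sample to its closest centroid. As a rough illustration only, here is a pure-Python sketch on made-up 2D points. The helper name `sum_squared_error` and the toy data are assumptions for this example and are not part of the snippet.

```python
def sum_squared_error(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        # squared distance from p to every centroid; keep the smallest
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centroids
        )
    return total


# toy data (made up for illustration): two tight groups of 2D points
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_centroid = [(5.0, 1.0)]                 # k = 1
two_centroids = [(0.0, 1.0), (10.0, 1.0)]   # k = 2

print(sum_squared_error(points, one_centroid))   # 104.0
print(sum_squared_error(points, two_centroids))  # 4.0
```

Because the best achievable SSE keeps falling as k grows, the SSE plot is usually read for the bend (the "elbow") where the drop slows down, not for its minimum.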

    import seaborn as sns
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    # load the iris dataset (150 rows: 4 measurements plus species)
    iris = sns.load_dataset("iris")
    X_iris = iris.iloc[:, 0:4]   # features
    y_iris = iris.iloc[:, 4]     # species labels (not used for training)

    # run k-means for k = 2 to 29
    # collect the SSE (inertia) of each k in the "sse" list
    # the fitted models themselves are discarded
    K = range(2, 30)
    sse = list()
    for k in K:
        kmean = KMeans(n_clusters=k)
        kmean.fit(X_iris)
        sse.append(kmean.inertia_)

    # plot SSE against k
    sns.set()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.plot(K, sse)
    ax.set_xticks(K)
    ax.set_xlabel("k")
    ax.set_ylabel("SSE")

    # run k-means with k=3 and keep this model
    kmean = KMeans(n_clusters=3)
    y_kmean = kmean.fit_predict(X_iris)  # fits the model and returns the labels
    iris['cluster'] = y_kmean

    # print its centroids
    print(kmean.cluster_centers_)

    # plot clusters using 2D scatterplots
    # (x and y are passed by keyword, as recent seaborn versions require)
    sns.lmplot(x="sepal_length", y="sepal_width", data=iris, hue='species', col='cluster', fit_reg=False)
    sns.lmplot(x="petal_length", y="petal_width", data=iris, hue='species', col='cluster', fit_reg=False)

    # display all figures
    plt.show()
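The scatter plots compare clusters with species visually; a cross-tabulation gives the same comparison as counts. The following is a sketch under assumptions that are not in the original snippet: it loads iris from `sklearn.datasets.load_iris` (bundled with scikit-learn, so no download is needed) instead of seaborn, and it fixes `random_state=0` and `n_init=10` only so the run is repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# load the bundled iris data (assumption: used here in place of sns.load_dataset)
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
species = pd.Series(data.target_names[data.target], name="species")

# fit k=3; random_state and n_init are fixed only to make the run repeatable
kmean = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster = pd.Series(kmean.fit_predict(X), name="cluster")

# rows: true species, columns: cluster id assigned by k-means
table = pd.crosstab(species, cluster)
print(table)
```

Cluster ids are arbitrary labels, so the columns will not necessarily line up with the species order; what matters is how cleanly each species row concentrates in a single column.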
