Option to scale data before clustering, so that output isn't biased by different feature scales
Works with high-dimensional data
Install
gem install kmeans-clusterer
Usage
Simple example:
    require 'kmeans-clusterer'

    data = [[40.71, -74.01], [34.05, -118.24], [39.29, -76.61],
            [45.52, -122.68], [38.9, -77.04], [36.11, -115.17]]

    labels = ['New York', 'Los Angeles', 'Baltimore',
              'Portland', 'Washington DC', 'Las Vegas']

    k = 2 # find 2 clusters in data

    kmeans = KMeansClusterer.run k, data, labels: labels, runs: 5

    kmeans.clusters.each do |cluster|
      puts cluster.id.to_s + '. ' +
           cluster.points.map(&:label).join(", ") + "\t" +
           cluster.centroid.to_s
    end

    # Use existing clusters for prediction with new data:
    predicted = kmeans.predict [[41.85, -87.65]] # Chicago
    puts "\nClosest cluster to Chicago: #{predicted[0]}"

    # Clustering quality score. Value between -1.0..1.0 (1.0 is best)
    puts "\nSilhouette score: #{kmeans.silhouette.round(2)}"
Output of simple example:
0. New York, Baltimore, Washington DC [39.63, -75.89]
1. Los Angeles, Portland, Las Vegas [38.56, -118.7]
Closest cluster to Chicago: 0
Silhouette score: 0.91
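To make the prediction step concrete, here is a minimal plain-Ruby sketch of nearest-centroid assignment, using the two centroids printed in the output above. It assumes plain Euclidean distance; `euclidean` and `nearest_centroid` are hypothetical helpers written for illustration, not part of the gem, and the gem's internals may differ.

```ruby
# Euclidean distance between two points given as arrays of numbers.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Index of the centroid closest to the given point.
# Hypothetical helper for illustration; not the gem's API.
def nearest_centroid(point, centroids)
  centroids.each_index.min_by { |i| euclidean(point, centroids[i]) }
end

# Centroids copied from the example output above.
centroids = [[39.63, -75.89], [38.56, -118.7]]
chicago   = [41.85, -87.65]

puts nearest_centroid(chicago, centroids) # prints 0
```

Chicago is roughly 12 units from centroid 0 and roughly 31 units from centroid 1, which agrees with the "Closest cluster to Chicago: 0" line in the output.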
Options
The following options can be passed in to KMeansClusterer.run:
| option | default | description |
| --- | --- | --- |
| :labels | nil | optional array of Ruby objects to collate with data array |
| :runs | 10 | number of times to run kmeans |
| :log | false | print stats after each run |
| :init | :kmpp | algorithm for picking initial cluster centroids. Accepts :kmpp, :random, or an array of k centroids |
| :scale_data | false | scales features before clustering using formula (data - mean) / std |
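Two of the options above, :scale_data and :init with :kmpp, can be sketched in plain Ruby. This is an illustration under stated assumptions, not the gem's implementation: `standardize`, `sq_dist`, and `kmpp_init` are hypothetical helpers, the standard deviation is assumed to be the population form (the source does not say which form is used), and :kmpp is assumed to refer to the usual k-means++ seeding procedure.

```ruby
# Scale each feature (column) with (value - mean) / std.
# Assumption: population standard deviation; a constant column maps to 0.0.
def standardize(data)
  n = data.size.to_f
  columns = data.transpose
  means = columns.map { |col| col.sum / n }
  stds = columns.each_with_index.map do |col, j|
    Math.sqrt(col.sum { |x| (x - means[j])**2 } / n)
  end
  data.map do |row|
    row.each_with_index.map do |x, j|
      stds[j].zero? ? 0.0 : (x - means[j]) / stds[j]
    end
  end
end

# Squared Euclidean distance between two points.
def sq_dist(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# k-means++ seeding: the first centroid is drawn uniformly at random; each
# further centroid is drawn with probability proportional to its squared
# distance from the nearest centroid chosen so far.
def kmpp_init(data, k, rng = Random.new)
  centroids = [data[rng.rand(data.size)]]
  while centroids.size < k
    d2 = data.map { |p| centroids.map { |c| sq_dist(p, c) }.min }
    target = rng.rand * d2.sum
    cumulative = 0.0
    # Walk the cumulative weights until the random target is passed.
    idx = d2.index { |d| (cumulative += d) > target } || d2.size - 1
    centroids << data[idx]
  end
  centroids
end

points = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize(points)               # both columns now share one scale
seeds  = kmpp_init(scaled, 2, Random.new(42))
```

In this toy input the second feature is 100 times larger than the first, so without scaling it would dominate every distance; after standardizing, both columns contribute equally. Seeding on squared distance tends to spread the initial centroids apart, which is the usual motivation for k-means++ over purely random starts.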