You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Fast and Provably Good Seedings for k-Means using k-MC^2 and AFK-MC^2
Introduction
The package provides a Cython implementation of the algorithms k-MC^2 and AFK-MC^2 described in the two papers:
Approximate K-Means++ in Sublinear Time.Olivier Bachem, Mario Lucic, S. Hamed Hassani and Andreas Krause.
In Proc. Conference on Artificial Intelligence (AAAI), 2016.
Fast and Provably Good Seedings for k-Means.Olivier Bachem, Mario Lucic, S. Hamed Hassani and Andreas Krause.
To appear in Neural Information Processing Systems (NIPS), 2016.
The implementation is compatible with Python 2.7.
Installation
First make sure that numpy is installed by running
pip install numpy
The following command will then install kmc2 from PyPI:
pip install kmc2
To install kmc2 locally from this repository, you may use
pip install .
Quickstart
The kmc2 function may be used to run the algorithm and obtain a seeding. The data should be provided in a Numpy array or a Scipy CSR matrix.
importkmc2X=<Numpyarraycontainingthedata>seeding=kmc2.kmc2(X, 5) # Run k-MC2 with k=5
The seeding can then be refined using MiniBatchKMeans of scikit-learn:
The kmc2 module exposes a single function kmc2(...) with all the functionality:
defkmc2(X, k, chain_length=200, afkmc2=True, random_state=None, weights=None):
"""Cython implementation of k-MC2 and AFK-MC2 seeding Args: X: (n,d)-shaped np.ndarray with data points (or scipy CSR matrix) k: number of cluster centers chain_length: length of the MCMC chain afkmc2: Whether to run AFK-MC2 (if True) or vanilla K-MC2 (if False) random_state: numpy.random.RandomState instance or integer to be used as seed weights: n-sized np.ndarray with weights of data points (default: uniform weights) Returns: (k, d)-shaped numpy.ndarray with cluster centers """
...
Tests
To run the unittests, use nose in the package directory