Class: KMeans

eclairjs/mllib/clustering.KMeans

new KMeans()

Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means||", initializationSteps: 5, epsilon: 1e-4, seed: random}.
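A minimal sketch of overriding a few of these defaults, assuming the `KMeans` class has already been obtained from an EclairJS session. The helper name `configureKMeans` and its `options` object are hypothetical; only the setter names come from this reference. Setters are called one per statement because this page does not state whether they return the instance.

```javascript
// Hypothetical helper, not part of EclairJS: builds a KMeans instance and
// overrides only the parameters present in `options`.
// `KMeans` is passed in so the sketch does not assume how the class is loaded.
function configureKMeans(KMeans, options) {
  var kmeans = new KMeans();
  if (options.k !== undefined) {
    kmeans.setK(options.k);
  }
  if (options.maxIterations !== undefined) {
    kmeans.setMaxIterations(options.maxIterations);
  }
  if (options.epsilon !== undefined) {
    kmeans.setEpsilon(options.epsilon);
  }
  if (options.seed !== undefined) {
    kmeans.setSeed(options.seed);
  }
  return kmeans;
}
```

Any parameter left out of `options` keeps the default listed above.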

Methods

(static) train(data, k, maxIterations, [runs], [initializationMode], [seed]) → {module:eclairjs/mllib/clustering.KMeansModel}

Trains a k-means model using the given set of parameters.
Parameters:
Name Type Attributes Description
data module:eclairjs/rdd.RDD training points stored as `RDD[Vector]`
k number number of clusters
maxIterations number max number of iterations
runs number <optional> number of parallel runs; defaults to 1. The best model is returned.
initializationMode string <optional> initialization mode, either "random" or "k-means||" (default).
seed number <optional> random seed value for cluster initialization
Returns:
Type
module:eclairjs/mllib/clustering.KMeansModel
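The optional arguments are positional, so supplying `seed` means supplying `runs` and `initializationMode` before it. A minimal sketch, assuming `KMeans` and an RDD of vectors are provided by the caller; the wrapper name `trainSeeded` and the chosen values of `k` and `maxIterations` are illustrative only.

```javascript
// Hypothetical wrapper around the documented static method.
// `KMeans` and `points` (an RDD of vectors) are supplied by the caller.
function trainSeeded(KMeans, points, seed) {
  var k = 3;               // number of clusters (illustrative value)
  var maxIterations = 20;  // iteration cap (illustrative value)
  var runs = 1;            // documented default
  var mode = "k-means||";  // documented default initialization mode
  return KMeans.train(points, k, maxIterations, runs, mode, seed);
}
```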

getEpsilon() → {Promise.<number>}

The distance threshold within which centers are considered to have converged.
Returns:
Type
Promise.<number>

getInitializationMode() → {Promise.<string>}

The initialization algorithm. This can be either "random" or "k-means||".
Returns:
Type
Promise.<string>

getInitializationSteps() → {Promise.<number>}

Number of steps for the k-means|| initialization mode.
Returns:
Type
Promise.<number>

getK() → {Promise.<number>}

Number of clusters to create (k).
Returns:
Type
Promise.<number>

getMaxIterations() → {Promise.<number>}

Maximum number of iterations to run.
Returns:
Type
Promise.<number>

getRuns() → {Promise.<number>}

Experimental. Number of runs of the algorithm to execute in parallel.
Returns:
Type
Promise.<number>

getSeed() → {Promise.<number>}

The random seed for cluster initialization.
Returns:
Type
Promise.<number>
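Each getter above returns a Promise, so values have to be read asynchronously. A minimal sketch that gathers several settings into one plain object using the standard `Promise.all`; the helper name `readSettings` and the shape of the returned object are hypothetical.

```javascript
// Hypothetical helper: resolves several getters together and returns
// a Promise for a plain object holding the current settings.
function readSettings(kmeans) {
  return Promise.all([
    kmeans.getK(),
    kmeans.getMaxIterations(),
    kmeans.getEpsilon(),
    kmeans.getInitializationMode()
  ]).then(function (values) {
    return {
      k: values[0],
      maxIterations: values[1],
      epsilon: values[2],
      initializationMode: values[3]
    };
  });
}
```

A caller would consume the result with `readSettings(kmeans).then(function (settings) { ... })`.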

run(data) → {module:eclairjs/mllib/clustering.KMeansModel}

Train a K-means model on the given set of points; `data` should be cached for high performance, because this is an iterative algorithm.
Parameters:
Name Type Description
data module:eclairjs/rdd.RDD
Returns:
Type
module:eclairjs/mllib/clustering.KMeansModel
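Because `run` makes repeated passes over `data`, caching the RDD first avoids recomputing it on every iteration. A minimal sketch, assuming the RDD exposes a `cache()` method that returns the cached RDD (that method is not described on this page); the helper name `runCached` is hypothetical.

```javascript
// Hypothetical helper: caches the input RDD, then trains on it.
// Assumption: `points.cache()` exists and returns the cached RDD.
function runCached(kmeans, points) {
  var cached = points.cache();
  return kmeans.run(cached);
}
```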

setEpsilon(epsilon)

Set the distance threshold within which centers are considered to have converged. If all centers move less than this Euclidean distance, iteration stops for that run.
Parameters:
Name Type Description
epsilon number
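The stopping rule described above can be sketched in plain JavaScript. This illustrates the criterion as stated here, not the library's internal implementation; the function names and the representation of centers as arrays of numbers are assumptions.

```javascript
// Euclidean distance between two points given as equal-length arrays.
function euclideanDistance(a, b) {
  var sum = 0;
  for (var i = 0; i < a.length; i++) {
    var d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// True when every center moved less than `epsilon` between two iterations.
// `previous` and `current` are arrays of centers in the same order.
function centersConverged(previous, current, epsilon) {
  return previous.every(function (center, i) {
    return euclideanDistance(center, current[i]) < epsilon;
  });
}
```

A smaller `epsilon` demands tighter agreement between successive iterations, so a run may use more of its `maxIterations` budget before stopping.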

setInitializationMode(initializationMode)

Set the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
Parameters:
Name Type Description
initializationMode string

setInitializationSteps(initializationSteps)

Set the number of steps for the k-means|| initialization mode. This is an advanced setting -- the default of 5 is almost always enough. Default: 5.
Parameters:
Name Type Description
initializationSteps number

setInitialModel(model)

Set the initial starting point, bypassing both random initialization and k-means||. The condition model.k == this.k must hold; otherwise an IllegalArgumentException is thrown.
Parameters:
Name Type Description
model module:eclairjs/mllib/clustering.KMeansModel

setK(k)

Set the number of clusters to create (k). Default: 2.
Parameters:
Name Type Description
k number

setMaxIterations(maxIterations)

Set the maximum number of iterations to run. Default: 20.
Parameters:
Name Type Description
maxIterations number

setRuns(runs)

Experimental. Set the number of runs of the algorithm to execute in parallel. We initialize the algorithm this many times with random starting conditions (configured by the initialization mode), then return the best clustering found over any run. Default: 1.
Parameters:
Name Type Description
runs number
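The "best clustering over any run" idea can be sketched in plain JavaScript. This page does not say how "best" is measured, so the sketch assumes each run reports a numeric `cost` and that lower is better; `bestOfRuns`, `runOnce`, and the `cost` field are hypothetical names, and the loop is sequential rather than parallel.

```javascript
// Hypothetical illustration: executes `runOnce` once per run and keeps the
// result with the lowest cost. Assumption: each result has a numeric `cost`.
function bestOfRuns(runs, runOnce) {
  var best = null;
  for (var i = 0; i < runs; i++) {
    var result = runOnce(i);
    if (best === null || result.cost < best.cost) {
      best = result;
    }
  }
  return best;
}
```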

setSeed(seed)

Set the random seed for cluster initialization.
Parameters:
Name Type Description
seed number