JSDoc: Class: KMeans

new KMeans()

Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means||", initializationSteps: 5, epsilon: 1e-4, seed: random}.

Source:

mllib/clustering/KMeans.js, line 42

Methods

(static) train(data, k, maxIterations, [runs], [initializationMode], [seed]) → {module:eclairjs/mllib/clustering.KMeansModel}

Trains a k-means model using the given set of parameters.

Parameters:

Name	Type	Attributes	Description
`data`	module:eclairjs/rdd.RDD		training points stored as `RDD[Vector]`
`k`	number		number of clusters
`maxIterations`	number		max number of iterations
`runs`	number	<optional>	number of parallel runs, defaults to 1. The best model is returned.
`initializationMode`	string	<optional>	initialization mode, either "random" or "k-means||" (default).
`seed`	number	<optional>	random seed value for cluster initialization

Source:

mllib/clustering/KMeans.js, line 209

Returns:

Type: module:eclairjs/mllib/clustering.KMeansModel

getEpsilon() → {Promise.<number>}

The distance threshold within which centers are considered to have converged.

Source:

mllib/clustering/KMeans.js, line 142

Returns:

Type: Promise.<number>

getInitializationMode() → {Promise.<string>}

The initialization algorithm. This can be either "random" or "k-means||".

Source:

mllib/clustering/KMeans.js, line 84

Returns:

Type: Promise.<string>

getInitializationSteps() → {Promise.<number>}

Number of steps for the k-means|| initialization mode.

Source:

mllib/clustering/KMeans.js, line 124

Returns:

Type: Promise.<number>

getK() → {Promise.<number>}

Number of clusters to create (k).

Source:

mllib/clustering/KMeans.js, line 50

Returns:

Type: Promise.<number>

getMaxIterations() → {Promise.<number>}

Maximum number of iterations to run.

Source:

mllib/clustering/KMeans.js, line 67

Returns:

Type: Promise.<number>

getRuns() → {Promise.<number>}

:: Experimental :: Number of runs of the algorithm to execute in parallel.

Source:

mllib/clustering/KMeans.js, line 104

Returns:

Type: Promise.<number>

getSeed() → {Promise.<number>}

The random seed for cluster initialization.

Source:

mllib/clustering/KMeans.js, line 160

Returns:

Type: Promise.<number>
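All getters above return a Promise rather than a plain value, so a caller needs `.then` or `await` to read a setting. The following is a minimal sketch of that pattern. `fakeKMeans` and `readSettings` are hypothetical names, and the stand-in object only mimics the documented getter shape (with the documented default values) so that the snippet runs without EclairJS.

```javascript
// Hypothetical stand-in that mimics the promise-returning getters documented
// above. It is NOT the EclairJS class; the values mirror the documented defaults.
const fakeKMeans = {
  getK: () => Promise.resolve(2),
  getMaxIterations: () => Promise.resolve(20),
  getEpsilon: () => Promise.resolve(1e-4),
};

// Reads several settings concurrently and returns them as one plain object.
async function readSettings(kmeans) {
  const [k, maxIterations, epsilon] = await Promise.all([
    kmeans.getK(),
    kmeans.getMaxIterations(),
    kmeans.getEpsilon(),
  ]);
  return { k, maxIterations, epsilon };
}

readSettings(fakeKMeans).then((settings) => console.log(settings));
```

With the real library, the argument passed to `readSettings` would presumably be a constructed `KMeans` instance instead of the stand-in.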

run(data) → {module:eclairjs/mllib/clustering.KMeansModel}

Trains a k-means model on the given set of points. `data` should be cached for high performance, because this is an iterative algorithm.
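To illustrate why caching matters, here is a minimal sketch of the iterative loop on plain arrays, assuming a standard Lloyd-style iteration. It is not the EclairJS or Spark implementation. `lloydStep` and `runKMeans` are hypothetical helpers, and for brevity the loop always runs `maxIterations` passes without a convergence check.

```javascript
// Illustrative sketch only; all function names are hypothetical.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Index of the center closest to the given point.
function nearestIndex(point, centers) {
  let best = 0;
  for (let i = 1; i < centers.length; i++) {
    if (squaredDistance(point, centers[i]) < squaredDistance(point, centers[best])) {
      best = i;
    }
  }
  return best;
}

// One pass: assign every point to its nearest center, then average each cluster.
function lloydStep(points, centers) {
  const dim = points[0].length;
  const sums = centers.map(() => new Array(dim).fill(0));
  const counts = centers.map(() => 0);
  for (const p of points) {
    const idx = nearestIndex(p, centers);
    counts[idx]++;
    for (let d = 0; d < dim; d++) sums[idx][d] += p[d];
  }
  // An empty cluster keeps its previous center.
  return centers.map((c, i) =>
    counts[i] === 0 ? c : sums[i].map((s) => s / counts[i])
  );
}

function runKMeans(points, initialCenters, maxIterations) {
  let centers = initialCenters;
  for (let iter = 0; iter < maxIterations; iter++) {
    centers = lloydStep(points, centers); // every pass reads all of `points`
  }
  return centers;
}

const points = [[0], [2], [10], [12]];
console.log(JSON.stringify(runKMeans(points, [[0], [12]], 20))); // [[1],[11]]
```

Each pass scans the full data set, which is why an uncached distributed data set would be recomputed on every iteration.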

Parameters:

Name	Type	Description
`data`	module:eclairjs/rdd.RDD

Source:

mllib/clustering/KMeans.js, line 190

Returns:

Type: module:eclairjs/mllib/clustering.KMeansModel

setEpsilon(epsilon)

Set the distance threshold within which centers are considered to have converged. If all centers move less than this Euclidean distance, iteration stops for that run.
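The stopping rule described here can be sketched on plain arrays as follows. This is an illustration of the rule as worded in this description, not the library's code. `euclideanDistance` and `hasConverged` are hypothetical helpers.

```javascript
// Illustrative sketch only; not the EclairJS or Spark implementation.
// Euclidean distance between two points given as equal-length arrays.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// True when every center moved less than epsilon since the previous iteration.
function hasConverged(oldCenters, newCenters, epsilon) {
  return oldCenters.every(
    (center, i) => euclideanDistance(center, newCenters[i]) < epsilon
  );
}

const before = [[0, 0], [5, 5]];
const after = [[0.00001, 0], [5, 5.00002]];
console.log(hasConverged(before, after, 1e-4)); // true
console.log(hasConverged(before, [[0.1, 0], [5, 5]], 1e-4)); // false
```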

Parameters:

Name	Type	Description
`epsilon`	number

Source:

mllib/clustering/KMeans.js, line 152

setInitializationMode(initializationMode)

Set the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
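For intuition about the non-random mode, the sketch below shows sequential k-means++ seeding, the algorithm that k-means|| parallelizes. It is not k-means|| itself and not the EclairJS or Spark implementation. `kMeansPlusPlusInit` is a hypothetical helper, and it assumes points are plain numeric arrays.

```javascript
// Illustrative sketch only; function names are hypothetical.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Sequential k-means++ seeding: the first center is chosen uniformly at random,
// and each later center is sampled with probability proportional to its squared
// distance from the nearest center chosen so far.
function kMeansPlusPlusInit(points, k, rand = Math.random) {
  const centers = [points[Math.floor(rand() * points.length)]];
  while (centers.length < k) {
    const weights = points.map((p) =>
      Math.min(...centers.map((c) => squaredDistance(p, c)))
    );
    const total = weights.reduce((s, w) => s + w, 0);
    let target = rand() * total;
    let chosen = points.length - 1; // fallback guards against rounding error
    for (let i = 0; i < points.length; i++) {
      target -= weights[i];
      if (target < 0) {
        chosen = i;
        break;
      }
    }
    centers.push(points[chosen]);
  }
  return centers;
}

console.log(kMeansPlusPlusInit([[0, 0], [0, 1], [10, 10], [10, 11]], 2));
```

Points far from every existing center are favored, which tends to spread the initial centers across the data, unlike the "random" mode.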

Parameters:

Name	Type	Description
`initializationMode`	string

Source:

mllib/clustering/KMeans.js, line 95

setInitializationSteps(initializationSteps)

Set the number of steps for the k-means|| initialization mode. This is an advanced setting -- the default of 5 is almost always enough. Default: 5.

Parameters:

Name	Type	Description
`initializationSteps`	number

Source:

mllib/clustering/KMeans.js, line 134

setInitialModel(model)

Set the initial starting point, bypassing the random or k-means|| initialization. The condition model.k == this.k must be met; otherwise an IllegalArgumentException results.

Parameters:

Name	Type	Description
`model`	module:eclairjs/mllib/clustering.KMeansModel

Source:

mllib/clustering/KMeans.js, line 180

setK(k)

Set the number of clusters to create (k). Default: 2.

Parameters:

Name	Type	Description
`k`	number

Source:

mllib/clustering/KMeans.js, line 59

setMaxIterations(maxIterations)

Set maximum number of iterations to run. Default: 20.

Parameters:

Name	Type	Description
`maxIterations`	number

Source:

mllib/clustering/KMeans.js, line 76

setRuns(runs)

:: Experimental :: Set the number of runs of the algorithm to execute in parallel. We initialize the algorithm this many times with random starting conditions (configured by the initialization mode), then return the best clustering found over any run. Default: 1.
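The "best clustering found over any run" idea can be sketched as below, assuming "best" means the lowest sum of squared distances from each point to its nearest center. `clusteringCost`, `bestOfRuns`, and the `runOnce` callback are hypothetical, and the runs execute sequentially here rather than in parallel.

```javascript
// Illustrative sketch only; not the EclairJS or Spark implementation.
// Sum of squared distances from each point to its nearest center.
function clusteringCost(points, centers) {
  let cost = 0;
  for (const p of points) {
    let nearest = Infinity;
    for (const c of centers) {
      let d = 0;
      for (let i = 0; i < p.length; i++) d += (p[i] - c[i]) ** 2;
      if (d < nearest) nearest = d;
    }
    cost += nearest;
  }
  return cost;
}

// runOnce(runIndex) is a hypothetical callback returning the centers of one run.
function bestOfRuns(points, runs, runOnce) {
  let best = null;
  for (let r = 0; r < runs; r++) {
    const centers = runOnce(r);
    const cost = clusteringCost(points, centers);
    if (best === null || cost < best.cost) best = { centers, cost };
  }
  return best;
}

const points = [[0], [1], [10], [11]];
// Pretend two runs ended with these centers.
const outcomes = [[[0], [1]], [[0.5], [10.5]]];
const best = bestOfRuns(points, outcomes.length, (r) => outcomes[r]);
console.log(best.cost, JSON.stringify(best.centers)); // 1 [[0.5],[10.5]]
```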

Parameters:

Name	Type	Description
`runs`	number

Source:

mllib/clustering/KMeans.js, line 116

setSeed(seed)

Set the random seed for cluster initialization.
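The effect of a fixed seed can be illustrated with a "random"-style initialization. This is a sketch, not the EclairJS or Spark implementation. `seededRandom` (a mulberry32-style generator) and `pickRandomCenters` are hypothetical helpers, and the sketch assumes `k` does not exceed the number of points.

```javascript
// Illustrative sketch only; function names are hypothetical.
// Small seeded generator (mulberry32 style), used because Math.random
// cannot be seeded. Returns a function producing values in [0, 1).
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Picks k distinct points as initial centers, driven entirely by the seed.
function pickRandomCenters(points, k, seed) {
  const rand = seededRandom(seed);
  const indices = points.map((_, i) => i);
  // Partial Fisher-Yates shuffle: the first k slots end up holding a random sample.
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (indices.length - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  return indices.slice(0, k).map((i) => points[i]);
}

const points = [[0, 0], [1, 1], [5, 5], [9, 9], [10, 10]];
const first = pickRandomCenters(points, 2, 42);
const second = pickRandomCenters(points, 2, 42);
// The same seed yields the same initial centers.
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```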

Parameters:

Name	Type	Description
`seed`	number

Source:

mllib/clustering/KMeans.js, line 169