Class: KMeans

eclairjs/mllib/clustering.KMeans

new KMeans()

Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means||", initializationSteps: 5, epsilon: 1e-4, seed: random}.

Methods

(static) train(data, k, maxIterations, [runs], [initializationMode], [seed]) → {KMeansModel}

Trains a k-means model using the given set of parameters.
Parameters:
Name                Type                 Attributes  Description
data                module:eclairjs.RDD              training points stored as `RDD[Vector]`
k                   number                           number of clusters
maxIterations       number                           max number of iterations
runs                number               <optional>  number of parallel runs, defaults to 1. The best model is returned.
initializationMode  string               <optional>  initialization mode, either "random" or "k-means||" (default).
seed                number               <optional>  random seed value for cluster initialization
Returns:
Type
KMeansModel
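
To make the roles of `k` and `maxIterations` concrete, here is a minimal sketch of the classic assign-and-update (Lloyd-style) iteration in plain JavaScript. It is not the EclairJS or Spark implementation: plain arrays stand in for an `RDD[Vector]`, and the helper names (`squaredDistance`, `nearestCenter`, `lloydSketch`) are hypothetical. The number of initial centers plays the role of `k`.

```javascript
// Illustrative only: plain arrays stand in for an RDD of vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the center closest to `point`.
function nearestCenter(point, centers) {
  let best = 0;
  for (let j = 1; j < centers.length; j++) {
    if (squaredDistance(point, centers[j]) < squaredDistance(point, centers[best])) {
      best = j;
    }
  }
  return best;
}

// Hypothetical helper: refine the given centers for exactly `maxIterations` passes.
function lloydSketch(points, initialCenters, maxIterations) {
  const dim = points[0].length;
  let centers = initialCenters.map((c) => c.slice());
  for (let iter = 0; iter < maxIterations; iter++) {
    const sums = centers.map(() => new Array(dim).fill(0));
    const counts = centers.map(() => 0);
    // Assignment step: every point is scanned once per iteration.
    for (const p of points) {
      const j = nearestCenter(p, centers);
      counts[j] += 1;
      for (let i = 0; i < dim; i++) sums[j][i] += p[i];
    }
    // Update step: move each center to the mean of its points.
    // A center with no assigned points keeps its previous position.
    centers = centers.map((c, j) =>
      counts[j] === 0 ? c : sums[j].map((s) => s / counts[j])
    );
  }
  return centers;
}

const toyPoints = [[0, 0], [0, 2], [10, 10], [10, 12]];
const toyCenters = lloydSketch(toyPoints, [[0, 0], [10, 10]], 5);
console.log(toyCenters);
```

The sketch always runs the full `maxIterations` passes; a real implementation would normally also stop early once the centers stop moving.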

getEpsilon() → {number}

The distance threshold within which we consider centers to have converged.
Returns:
Type
number

getInitializationMode() → {string}

The initialization algorithm. This can be either "random" or "k-means||".
Returns:
Type
string

getInitializationSteps() → {number}

Number of steps for the k-means|| initialization mode.
Returns:
Type
number

getK() → {number}

Number of clusters to create (k).
Returns:
Type
number

getMaxIterations() → {number}

Maximum number of iterations to run.
Returns:
Type
number

getRuns() → {number}

Experimental. Number of runs of the algorithm to execute in parallel.
Returns:
Type
number

getSeed() → {number}

The random seed for cluster initialization.
Returns:
Type
number

run(data) → {KMeansModel}

Trains a k-means model on the given set of points. `data` should be cached for high performance, because the algorithm is iterative and rescans the data on every iteration.
Parameters:
Name Type Description
data module:eclairjs.RDD
Returns:
Type
KMeansModel

setEpsilon(epsilon)

Set the distance threshold within which we consider centers to have converged. If all centers move less than this Euclidean distance, we stop iterating that run.
Parameters:
Name Type Description
epsilon number
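
The convergence rule described for `epsilon` can be sketched as a simple check between two consecutive sets of centers. This is an illustration in plain JavaScript, not the library's code; `euclideanDistance` and `hasConverged` are hypothetical names.

```javascript
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// True only when every center moved less than `epsilon` since the last iteration.
function hasConverged(oldCenters, newCenters, epsilon) {
  return oldCenters.every(
    (center, j) => euclideanDistance(center, newCenters[j]) < epsilon
  );
}

const before = [[0, 0], [5, 5]];
const smallShift = [[0, 0.00005], [5, 5]]; // moved by 5e-5
const largeShift = [[0, 0.5], [5, 5]];     // moved by 0.5

console.log(hasConverged(before, smallShift, 1e-4)); // true
console.log(hasConverged(before, largeShift, 1e-4)); // false
```

A larger `epsilon` stops a run sooner at the price of less precisely placed centers; a smaller one does the opposite.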

setInitializationMode(initializationMode)

Set the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
Parameters:
Name Type Description
initializationMode string
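
As a rough picture of the "random" mode, the sketch below picks `k` of the input points as initial centers. It is a plain JavaScript illustration under stated assumptions: `pickRandomCenters` is a hypothetical helper, and sampling without replacement is an assumption rather than something this page specifies. The k-means|| mode is more involved and is not shown.

```javascript
// Hypothetical helper: choose k distinct points as initial cluster centers.
// `random` is any function returning a number in [0, 1).
function pickRandomCenters(points, k, random = Math.random) {
  if (k > points.length) {
    throw new Error("k must not exceed the number of points");
  }
  const pool = points.slice();
  // Partial Fisher-Yates shuffle: only the first k slots need to be randomized.
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  // Copy the chosen points so later updates do not mutate the input.
  return pool.slice(0, k).map((p) => p.slice());
}

const samplePoints = [[1, 1], [2, 2], [3, 3], [4, 4]];
const initialCenters = pickRandomCenters(samplePoints, 2);
console.log(initialCenters.length); // 2
```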

setInitializationSteps(initializationSteps)

Set the number of steps for the k-means|| initialization mode. This is an advanced setting -- the default of 5 is almost always enough. Default: 5.
Parameters:
Name Type Description
initializationSteps number

setInitialModel(model)

Set the initial starting point, bypassing the random or k-means|| initialization. The condition model.k == this.k must be met; otherwise an IllegalArgumentException is thrown.
Parameters:
Name Type Description
model KMeansModel
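
The `model.k == this.k` precondition amounts to a guard like the one below. This is only a sketch: the plain object with a `k` field stands in for a `KMeansModel`, `checkInitialModel` is a hypothetical name, and a generic `Error` stands in for the IllegalArgumentException raised on the JVM side.

```javascript
// Hypothetical guard mirroring the documented precondition.
function checkInitialModel(configuredK, initialModel) {
  if (initialModel.k !== configuredK) {
    throw new Error(
      "mismatched cluster count: model.k=" + initialModel.k +
      ", configured k=" + configuredK
    );
  }
  return initialModel;
}

// Stand-in for a previously trained model with three centers.
const previousModel = {
  k: 3,
  clusterCenters: [[0, 0], [5, 5], [9, 9]],
};

const accepted = checkInitialModel(3, previousModel); // passes
let rejectedMessage = null;
try {
  checkInitialModel(2, previousModel); // k differs, so this throws
} catch (err) {
  rejectedMessage = err.message;
}
console.log(rejectedMessage);
```

In practice this means calling `setK` with the same value the initial model was trained with before supplying that model.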

setK(k)

Set the number of clusters to create (k). Default: 2.
Parameters:
Name Type Description
k number

setMaxIterations(maxIterations)

Set the maximum number of iterations to run. Default: 20.
Parameters:
Name Type Description
maxIterations number

setRuns(runs)

Experimental. Set the number of runs of the algorithm to execute in parallel. We initialize the algorithm this many times with random starting conditions (configured by the initialization mode), then return the best clustering found over any run. Default: 1.
Parameters:
Name Type Description
runs number
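
Selecting "the best clustering found over any run" can be sketched as follows. The page does not say how "best" is measured; the sketch assumes the usual k-means cost, the sum of squared distances from each point to its nearest center, and picks the run with the lowest value. `clusteringCost` and `bestOfRuns` are hypothetical names.

```javascript
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Assumed quality measure: sum of squared distances to the nearest center.
function clusteringCost(points, centers) {
  let total = 0;
  for (const p of points) {
    let best = Infinity;
    for (const c of centers) best = Math.min(best, squaredDistance(p, c));
    total += best;
  }
  return total;
}

// Each entry of `runResults` is the set of centers produced by one run.
function bestOfRuns(points, runResults) {
  let best = null;
  for (const centers of runResults) {
    const cost = clusteringCost(points, centers);
    if (best === null || cost < best.cost) best = { centers, cost };
  }
  return best;
}

const observations = [[0, 0], [0, 2], [10, 10], [10, 12]];
const runA = [[0, 1], [10, 11]]; // cost 4
const runB = [[0, 0], [5, 5]];   // cost 128
const winner = bestOfRuns(observations, [runB, runA]);
console.log(winner.cost); // 4
```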

setSeed(seed)

Set the random seed for cluster initialization.
Parameters:
Name Type Description
seed number
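
The reason to fix a seed is reproducibility: the same seed yields the same sequence of random draws, and therefore the same initial centers. The sketch below shows this with a small mulberry32-style generator in plain JavaScript. It is an assumption for illustration only and is unrelated to the generator Spark uses internally; `seededRandom` is a hypothetical name.

```javascript
// Small deterministic generator (mulberry32-style); returns numbers in [0, 1).
function seededRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw `count` values from a fresh generator created with `seed`.
function draw(seed, count) {
  const random = seededRandom(seed);
  const values = [];
  for (let i = 0; i < count; i++) values.push(random());
  return values;
}

const firstPass = draw(42, 3);
const secondPass = draw(42, 3);
// Same seed, same draws: initialization driven by them would repeat exactly.
console.log(firstPass.every((v, i) => v === secondPass[i])); // true
```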