new KMeans()
Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1,
initializationMode: "k-means||", initializationSteps: 5, epsilon: 1e-4, seed: random}.
Methods
(static) train(data, k, maxIterations, [runs], [initializationMode], [seed]) → {module:eclairjs/mllib/clustering.KMeansModel}
Trains a k-means model using the given set of parameters.
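Training alternates two steps until a run converges or `maxIterations` is reached. First, each point is assigned to its nearest center. Then each center moves to the mean of its assigned points. Below is a minimal sketch of that loop (Lloyd's algorithm) in plain JavaScript. It assumes points are in-memory arrays of numbers rather than an `RDD[Vector]`. `lloyd`, `closestCenter`, and `squaredDistance` are hypothetical helpers, not EclairJS functions. The early stop follows the `epsilon` rule documented under `setEpsilon`.

```javascript
// Squared Euclidean distance between two points of equal dimension.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the center nearest to point p.
function closestCenter(p, centers) {
  let best = 0;
  let bestDist = Infinity;
  centers.forEach((c, j) => {
    const d = squaredDistance(p, c);
    if (d < bestDist) {
      bestDist = d;
      best = j;
    }
  });
  return best;
}

// At most maxIterations rounds of assign-then-average; stops early once
// every center moved less than epsilon (Euclidean distance).
function lloyd(points, initialCenters, maxIterations, epsilon) {
  let centers = initialCenters.map((c) => c.slice());
  for (let iter = 0; iter < maxIterations; iter++) {
    const sums = centers.map((c) => c.map(() => 0));
    const counts = centers.map(() => 0);
    // Assignment step: accumulate each point into its nearest center.
    for (const p of points) {
      const j = closestCenter(p, centers);
      counts[j]++;
      p.forEach((v, i) => {
        sums[j][i] += v;
      });
    }
    // Update step: move centers to their cluster means (empty clusters stay put).
    const next = centers.map((c, j) =>
      counts[j] === 0 ? c.slice() : sums[j].map((s) => s / counts[j])
    );
    const converged = next.every(
      (c, j) => Math.sqrt(squaredDistance(c, centers[j])) < epsilon
    );
    centers = next;
    if (converged) break;
  }
  return centers;
}

const points = [[0, 0], [0, 1], [10, 10], [10, 11]];
const centers = lloyd(points, [[0, 0], [10, 10]], 20, 1e-4);
console.log(centers); // → [[0, 0.5], [10, 10.5]]
```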
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
data | module:eclairjs/rdd.RDD | | training points stored as `RDD[Vector]` |
k | number | | number of clusters |
maxIterations | number | | maximum number of iterations |
runs | number | <optional> | number of parallel runs, defaults to 1. The best model is returned. |
initializationMode | string | <optional> | initialization mode, either "random" or "k-means\|\|" (default). |
seed | number | <optional> | random seed value for cluster initialization |
Returns:
- Type
- module:eclairjs/mllib/clustering.KMeansModel
getEpsilon() → {Promise.<number>}
The distance threshold within which we consider centers to have converged.
Returns:
- Type
- Promise.<number>
getInitializationMode() → {Promise.<string>}
The initialization algorithm. This can be either "random" or "k-means||".
Returns:
- Type
- Promise.<string>
getInitializationSteps() → {Promise.<number>}
Number of steps for the k-means|| initialization mode.
Returns:
- Type
- Promise.<number>
getK() → {Promise.<number>}
Number of clusters to create (k).
Returns:
- Type
- Promise.<number>
getMaxIterations() → {Promise.<number>}
Maximum number of iterations to run.
Returns:
- Type
- Promise.<number>
getRuns() → {Promise.<number>}
:: Experimental ::
Number of runs of the algorithm to execute in parallel.
Returns:
- Type
- Promise.<number>
getSeed() → {Promise.<number>}
The random seed for cluster initialization.
Returns:
- Type
- Promise.<number>
run(data) → {module:eclairjs/mllib/clustering.KMeansModel}
Train a K-means model on the given set of points; `data` should be cached for high
performance, because this is an iterative algorithm.
Parameters:
Name | Type | Description |
---|---|---|
data | module:eclairjs/rdd.RDD | training points stored as `RDD[Vector]` |
Returns:
- Type
- module:eclairjs/mllib/clustering.KMeansModel
setEpsilon(epsilon)
Set the distance threshold within which we consider centers to have converged.
If all centers move less than this Euclidean distance, iteration stops for that run. Default: 1e-4.
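The stopping rule can be checked without taking a square root for each center. Comparing squared distances against `epsilon * epsilon` gives the same answer for non-negative distances. Below is a minimal sketch in plain JavaScript, assuming centers are arrays of numbers. `hasConverged` is a hypothetical helper, not an EclairJS function.

```javascript
// True when every center moved less than epsilon since the last iteration.
// Comparing squared distances with epsilon * epsilon avoids Math.sqrt.
function hasConverged(oldCenters, newCenters, epsilon) {
  const eps2 = epsilon * epsilon;
  return newCenters.every((c, j) => {
    let d2 = 0;
    for (let i = 0; i < c.length; i++) {
      const diff = c[i] - oldCenters[j][i];
      d2 += diff * diff;
    }
    return d2 < eps2;
  });
}

const before = [[0, 0], [5, 5]];
const after = [[0, 0.00005], [5, 5]]; // first center moved 5e-5
console.log(hasConverged(before, after, 1e-4)); // → true
console.log(hasConverged(before, after, 1e-5)); // → false
```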
Parameters:
Name | Type | Description |
---|---|---|
epsilon | number | |
Returns:
setInitializationMode(initializationMode)
Set the initialization algorithm. This can be either "random" to choose random points as
initial cluster centers, or "k-means||" to use a parallel variant of k-means++
(Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
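k-means++ seeding picks each new center with probability proportional to its squared distance from the nearest center chosen so far. This spreads the initial centers out. The sketch below shows the sequential k-means++ version and a plain random choice in plain JavaScript. It is not the parallel k-means|| procedure of Bahmani et al. that the "k-means||" mode uses. `kMeansPlusPlusInit`, `randomInit`, and the injected `rand` function are hypothetical. Passing a fixed `rand` makes the result reproducible.

```javascript
function squaredDistance(a, b) {
  return a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);
}

// Sequential k-means++ (D² sampling); rand returns a float in [0, 1).
function kMeansPlusPlusInit(points, k, rand) {
  // First center: a uniformly random point.
  const centers = [points[Math.floor(rand() * points.length)]];
  while (centers.length < k) {
    // Weight each point by its squared distance to the nearest chosen center.
    const weights = points.map((p) =>
      Math.min(...centers.map((c) => squaredDistance(p, c)))
    );
    const total = weights.reduce((s, w) => s + w, 0);
    if (total === 0) break; // every point coincides with a chosen center
    // Pick the next center with probability proportional to its weight.
    let r = rand() * total;
    let idx = 0;
    while (idx < points.length - 1 && r >= weights[idx]) {
      r -= weights[idx];
      idx++;
    }
    centers.push(points[idx]);
  }
  return centers;
}

// "random" mode, roughly: pick k distinct points uniformly at random.
function randomInit(points, k, rand) {
  const pool = points.slice();
  const centers = [];
  while (centers.length < k && pool.length > 0) {
    centers.push(pool.splice(Math.floor(rand() * pool.length), 1)[0]);
  }
  return centers;
}

const pts = [[0, 0], [0, 1], [10, 10]];
console.log(kMeansPlusPlusInit(pts, 2, () => 0.5)); // → [[0, 1], [10, 10]]
console.log(randomInit(pts, 2, Math.random).length); // → 2
```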
Parameters:
Name | Type | Description |
---|---|---|
initializationMode | string | |
Returns:
setInitializationSteps(initializationSteps)
Set the number of steps for the k-means|| initialization mode. This is an advanced
setting; the default of 5 is almost always enough. Default: 5.
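Each k-means|| step makes one pass over the data and oversamples candidate centers. Each point is kept with probability proportional to its squared distance from the current candidates. The candidates are later weighted and reduced to k centers. Below is a minimal sketch of the sampling steps only, in plain JavaScript. It assumes an oversampling factor of `2 * k` and an injected `rand` function. `kMeansParallelCandidates` is hypothetical, and the final weighting and reduction step is omitted.

```javascript
function squaredDistance(a, b) {
  return a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);
}

// Candidate sampling of k-means|| (sketch). Each step keeps every point
// independently with probability min(1, l * d2 / phi), where d2 is its
// squared distance to the nearest current candidate and phi is the sum of
// d2 over all points. Assumption: oversampling factor l = 2 * k.
function kMeansParallelCandidates(points, k, steps, rand) {
  const l = 2 * k;
  const candidates = [points[Math.floor(rand() * points.length)]];
  for (let s = 0; s < steps; s++) {
    // Distances are measured against the candidates from before this step.
    const d2 = points.map((p) =>
      Math.min(...candidates.map((c) => squaredDistance(p, c)))
    );
    const phi = d2.reduce((sum, d) => sum + d, 0);
    if (phi === 0) break; // every point coincides with a candidate
    points.forEach((p, i) => {
      if (rand() < (l * d2[i]) / phi) candidates.push(p);
    });
  }
  return candidates;
}

const pts = [[0, 0], [1, 0], [5, 0]];
// With rand always 0, every point at positive distance is kept.
console.log(kMeansParallelCandidates(pts, 1, 5, () => 0).length); // → 3
```

More steps give more chances to pick up distant points, at the cost of one extra pass over the data per step.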
Parameters:
Name | Type | Description |
---|---|---|
initializationSteps | number | |
Returns:
setInitialModel(model)
Set the initial starting point, bypassing the random or k-means|| initialization.
The condition `model.k == this.k` must be met; otherwise an IllegalArgumentException
is thrown.
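The precondition works as a guard before initialization. A supplied model replaces the random or k-means|| step, but only if it has exactly k centers. Below is a minimal sketch in plain JavaScript. The model shape `{ k, clusterCenters }`, the `chooseInitialCenters` helper, and the use of `RangeError` in place of the JVM's IllegalArgumentException are all assumptions for illustration.

```javascript
// Hypothetical guard mirroring the documented precondition model.k == this.k.
// A RangeError stands in for the JVM's IllegalArgumentException.
function chooseInitialCenters(k, initialModel, initFn) {
  if (initialModel) {
    if (initialModel.k !== k) {
      throw new RangeError(
        `mismatched cluster count: model.k=${initialModel.k}, k=${k}`
      );
    }
    // A valid model bypasses random / k-means|| initialization entirely.
    return initialModel.clusterCenters.map((c) => c.slice());
  }
  // No model supplied: fall back to the configured initialization.
  return initFn(k);
}

const model = { k: 2, clusterCenters: [[0, 0], [10, 10]] };
console.log(chooseInitialCenters(2, model, () => [])); // → [[0, 0], [10, 10]]
try {
  chooseInitialCenters(3, model, () => []);
} catch (e) {
  console.log(e instanceof RangeError); // → true
}
```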
Parameters:
Name | Type | Description |
---|---|---|
model | module:eclairjs/mllib/clustering.KMeansModel | |
Returns:
setK(k)
Set the number of clusters to create (k). Default: 2.
Parameters:
Name | Type | Description |
---|---|---|
k | number | |
Returns:
setMaxIterations(maxIterations)
Set maximum number of iterations to run. Default: 20.
Parameters:
Name | Type | Description |
---|---|---|
maxIterations | number | |
Returns:
setRuns(runs)
:: Experimental ::
Set the number of runs of the algorithm to execute in parallel. We initialize the algorithm
this many times with random starting conditions (configured by the initialization mode), then
return the best clustering found over any run. Default: 1.
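Choosing "the best clustering" requires a score. This sketch assumes the usual k-means cost: the sum of squared distances from each point to its nearest center, where lower is better. Under that assumption, the selection over runs can be sketched in plain JavaScript as below. `cost`, `bestOfRuns`, and the `runOnce` callback are hypothetical. In the library the runs execute in parallel, whereas this loop runs them one after another.

```javascript
function squaredDistance(a, b) {
  return a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0);
}

// k-means cost: sum over points of the squared distance to the nearest center.
function cost(points, centers) {
  return points.reduce(
    (s, p) => s + Math.min(...centers.map((c) => squaredDistance(p, c))),
    0
  );
}

// Calls runOnce(r) for r = 0..runs-1 and keeps the lowest-cost result.
function bestOfRuns(points, runs, runOnce) {
  let best = null;
  let bestCost = Infinity;
  for (let r = 0; r < runs; r++) {
    const centers = runOnce(r);
    const c = cost(points, centers);
    if (c < bestCost) {
      bestCost = c;
      best = centers;
    }
  }
  return { centers: best, cost: bestCost };
}

const points = [[0, 0], [0, 1], [10, 10], [10, 11]];
// Stand-ins for the centers two independently initialized runs might produce.
const runResults = [
  [[0, 0], [0, 1]], // poor split: cost 381
  [[0, 0.5], [10, 10.5]], // good split: cost 1
];
const best = bestOfRuns(points, 2, (r) => runResults[r]);
console.log(best.cost); // → 1
```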
Parameters:
Name | Type | Description |
---|---|---|
runs | number | |
Returns:
setSeed(seed)
Set the random seed for cluster initialization.
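A fixed seed makes the random choices during initialization repeatable, so two trainings on the same data should start from the same centers. The sketch below illustrates this in plain JavaScript with mulberry32, a small 32-bit seeded generator. It is not the generator Spark uses, and `mulberry32` and `pickInitialIndices` are hypothetical helpers.

```javascript
// mulberry32: a small seeded 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Picks k distinct point indices out of n using a seeded generator.
function pickInitialIndices(n, k, seed) {
  if (k > n) throw new RangeError("k must not exceed the number of points");
  const rand = mulberry32(seed);
  const chosen = [];
  while (chosen.length < k) {
    const i = Math.floor(rand() * n);
    if (!chosen.includes(i)) chosen.push(i);
  }
  return chosen;
}

const first = pickInitialIndices(1000, 3, 42);
const second = pickInitialIndices(1000, 3, 42);
console.log(JSON.stringify(first) === JSON.stringify(second)); // → true
```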
Parameters:
Name | Type | Description |
---|---|---|
seed | number | |