Class: FloatRDD

eclairjs.FloatRDD

new FloatRDD(srdd)

Parameters:
Name Type Description
srdd module:eclairjs.RDD
Source:

Extends

Methods

aggregate(zeroValue, func1, func2, bindArgs1opt, bindArgs2opt) → {object}

Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
Parameters:
Name Type Attributes Description
zeroValue module:eclairjs.RDD (undocumented)
func1 function seqOp - (undocumented) Function with two parameters
func2 function combOp - (undocumented) Function with two parameters
bindArgs1 Array.<Object> <optional>
array whose values will be added to func1's argument list.
bindArgs2 Array.<Object> <optional>
array whose values will be added to func2's argument list.
Inherited From:
Source:
Returns:
Type
object
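To make the roles of func1 (seqOp) and func2 (combOp) concrete, here is a plain-JavaScript sketch that mimics the semantics on local arrays standing in for partitions. It does not call EclairJS; `localAggregate` and the sample data are hypothetical names used only for illustration.

```javascript
// Hypothetical local stand-in for aggregate(); each inner array plays a partition.
function localAggregate(partitions, zeroValue, seqOp, combOp) {
  // Copy zeroValue so every fold starts from its own neutral accumulator.
  function freshZero() { return JSON.parse(JSON.stringify(zeroValue)); }
  // Step 1: fold each partition on its own with seqOp (merges a T into a U).
  var partials = partitions.map(function (part) {
    return part.reduce(function (acc, x) { return seqOp(acc, x); }, freshZero());
  });
  // Step 2: merge the per-partition results with combOp (merges two U's).
  return partials.reduce(function (a, b) { return combOp(a, b); }, freshZero());
}

var partitions = [[1.0, 2.0], [3.0], [4.0, 5.0]];
var result = localAggregate(
  partitions,
  { sum: 0, count: 0 },
  function (acc, x) { acc.sum += x; acc.count += 1; return acc; },
  function (a, b) { a.sum += b.sum; a.count += b.count; return a; }
);
console.log(result.sum / result.count); // prints 3
```

Here T is a number and U is a `{sum, count}` object, which is why two separate functions are needed. Both functions mutate and return their first argument, as the description permits.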

cache() → {module:eclairjs.FloatRDD}

Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

cartesian(other) → {module:eclairjs.RDD}

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`.
Parameters:
Name Type Description
other module:eclairjs.RDD (undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

checkpoint()

Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint directory set with `SparkContext#setCheckpointDir` and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD is persisted in memory; otherwise, saving it to a file will require recomputation.
Inherited From:
Source:
Returns:
void

coalesce(numPartitions, shuffleopt) → {module:eclairjs.FloatRDD}

Return a new RDD that is reduced into `numPartitions` partitions.
Parameters:
Name Type Attributes Description
numPartitions integer
shuffle boolean <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

collect() → {Array}

Return an array that contains all of the elements in this RDD.
Inherited From:
Source:
Returns:
Type
Array

context() → {module:eclairjs.SparkContext}

Return the SparkContext that this RDD was created on.
Inherited From:
Source:
Returns:
Type
module:eclairjs.SparkContext

count() → {integer}

Return the number of elements in the RDD.
Inherited From:
Source:
Returns:
Type
integer

countApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

:: Experimental :: Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
Parameters:
Name Type Attributes Description
timeout number (undocumented)
confidence number <optional>
(undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

countApproxDistinct(relativeSD) → {number}

Return the approximate number of distinct elements in the RDD. The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
Parameters:
Name Type Description
relativeSD number Relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.
Inherited From:
Source:
Returns:
Type
number

countByValueApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

:: Experimental :: Approximate version of countByValue().
Parameters:
Name Type Attributes Description
timeout number (undocumented)
confidence number <optional>
(undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

distinct(numPartitionsopt) → {module:eclairjs.FloatRDD}

Return a new RDD containing the distinct elements in this RDD.
Parameters:
Name Type Attributes Description
numPartitions number <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

filter(func, bindArgsopt) → {module:eclairjs.FloatRDD}

Return a new RDD containing only the elements that satisfy a predicate.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<object> <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

first() → {object}

Return the first element in this RDD.
Inherited From:
Source:
Returns:
Type
object

flatMap(func, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
Parameters:
Name Type Attributes Description
func function (undocumented) - Function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
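The "apply, then flatten" behavior can be sketched in plain JavaScript on a local array. This is an illustration only, not EclairJS code; `localFlatMap` is a hypothetical helper.

```javascript
// Hypothetical local stand-in for flatMap() on a plain array.
function localFlatMap(elements, func) {
  var out = [];
  elements.forEach(function (x) {
    // func returns an array per element; its items are appended in order.
    func(x).forEach(function (y) { out.push(y); });
  });
  return out;
}

var flattened = localFlatMap([1.0, 2.0], function (x) {
  return [x, x * 10];
});
console.log(flattened); // prints [ 1, 10, 2, 20 ]
```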

flatMapToPair(func, bindArgsopt) → {module:eclairjs.PairRDD}

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.PairRDD

foreach(func, bindArgsopt) → {void}

Applies a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function Function with one parameter that returns void
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Overrides:
Source:
Returns:
Type
void
Example
rdd3.foreach(function(record) {
   var connection = createNewConnection();
   connection.send(record);
   connection.close();
});

fromRDD(rdd) → {module:eclairjs.FloatRDD}

Parameters:
Name Type Description
rdd module:eclairjs.RDD
Source:
Returns:
Type
module:eclairjs.FloatRDD

getCheckpointFile() → {string}

Gets the name of the directory to which this RDD was checkpointed. This is not defined if the RDD is checkpointed locally.
Inherited From:
Source:
Returns:
Type
string

getStorageLevel() → {module:eclairjs/storage.StorageLevel}

Inherited From:
Source:
Returns:
Type
module:eclairjs/storage.StorageLevel

glom() → {module:eclairjs.RDD}

Return an RDD created by coalescing all elements within each partition into an array.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

groupBy(func, numPartitionsopt, bindArgsopt) → {module:eclairjs.RDD}

Return an RDD of grouped items. Each group consists of a key and a sequence of elements mapping to that key. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated. Note: This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using aggregateByKey or reduceByKey will provide much better performance.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
numPartitions number <optional>
How many partitions to use in the resulting RDD (if non-zero, the partitioner is ignored)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Overrides:
Source:
Returns:
Type
module:eclairjs.RDD
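As a rough local illustration of the grouping semantics (a key plus the sequence of elements mapping to it), the following plain-JavaScript sketch groups a small array. `localGroupBy` is a hypothetical helper and does not reflect how EclairJS or Spark distribute the work.

```javascript
// Hypothetical local stand-in for groupBy() on a plain array.
function localGroupBy(elements, func) {
  var groups = new Map();
  elements.forEach(function (x) {
    var key = func(x); // the key function decides which group x joins
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(x);
  });
  // Return [key, elements] pairs, one per distinct key.
  return Array.from(groups.entries());
}

var grouped = localGroupBy([0.5, 1.5, 2.5, 3.5], function (x) {
  return Math.floor(x) % 2 === 0 ? "even" : "odd";
});
console.log(grouped); // [ [ 'even', [ 0.5, 2.5 ] ], [ 'odd', [ 1.5, 3.5 ] ] ]
```

In this local sketch the order within each group follows the input order; on a real RDD that ordering is not guaranteed.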

histogram(buckets, evenBucketsopt) → {Array.<number>|module:eclairjs.Tuple2}

Compute a histogram of the data using bucketCount number of buckets evenly spaced between the minimum and maximum of the RDD. For example, if the min value is 0, the max is 100, and there are two buckets, the resulting buckets will be [0, 50) and [50, 100].
Parameters:
Name Type Attributes Description
buckets Array.<float> | integer
evenBuckets boolean <optional>
Source:
Returns:
Type
Array.<number> | module:eclairjs.Tuple2
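For the integer form of `buckets`, evenly spaced bucketing can be sketched locally as follows. This is plain JavaScript on an array, not EclairJS code. `localHistogram` is a hypothetical helper, and it assumes (as in Spark's histogram) that every bucket is half-open except the last, which also includes the maximum, and that max is greater than min.

```javascript
// Hypothetical local stand-in for histogram(bucketCount) on a plain array.
// Assumes values is non-empty and max > min.
function localHistogram(values, bucketCount) {
  var min = Math.min.apply(null, values);
  var max = Math.max.apply(null, values);
  var width = (max - min) / bucketCount;

  // bucketCount buckets need bucketCount + 1 boundaries.
  var boundaries = [];
  for (var i = 0; i <= bucketCount; i++) {
    boundaries.push(min + i * width);
  }

  var counts = new Array(bucketCount).fill(0);
  values.forEach(function (v) {
    // Clamp so that v === max falls into the last (closed) bucket.
    var idx = Math.min(Math.floor((v - min) / width), bucketCount - 1);
    counts[idx] += 1;
  });
  return { boundaries: boundaries, counts: counts };
}

var h = localHistogram([0, 10, 50, 75, 100], 2);
console.log(h.boundaries); // prints [ 0, 50, 100 ]
console.log(h.counts);     // prints [ 2, 3 ]
```

The value 50 lands in the second bucket because the first bucket excludes its upper bound, and 100 is counted in the last bucket because that one is closed.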

intersection(other) → {module:eclairjs.FloatRDD}

Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did. Note that this method performs a shuffle internally.
Parameters:
Name Type Description
other module:eclairjs.FloatRDD
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD
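The duplicate-free result can be illustrated locally with plain JavaScript. `localIntersection` is a hypothetical helper; it shows only the set semantics, not the shuffle a real RDD performs.

```javascript
// Hypothetical local stand-in for intersection() on plain arrays.
function localIntersection(xs, ys) {
  var inYs = new Set(ys);
  var seen = new Set();
  return xs.filter(function (x) {
    // Keep x only if it also occurs in ys and has not been emitted yet.
    if (!inYs.has(x) || seen.has(x)) {
      return false;
    }
    seen.add(x);
    return true;
  });
}

var common = localIntersection([1, 1, 2, 3], [1, 3, 3, 4]);
console.log(common); // prints [ 1, 3 ]
```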

isCheckpointed() → {boolean}

Return whether this RDD is checkpointed and materialized, either reliably or locally.
Inherited From:
Source:
Returns:
Type
boolean

isEmpty() → {boolean}

Inherited From:
Source:
Returns:
true if and only if the RDD contains no elements at all. Note that an RDD may be empty even when it has at least 1 partition.
Type
boolean

keyBy(func, bindArgsopt) → {module:eclairjs.RDD}

Creates tuples of the elements in this RDD by applying `func`.
Parameters:
Name Type Attributes Description
func function (undocumented)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

map(func, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
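The optional bindArgs parameter, which recurs throughout this class, can be pictured with a local sketch. The plain-JavaScript code below assumes that the bound values are appended after the element argument. That ordering is an assumption for illustration, and `localMap` is a hypothetical helper rather than an EclairJS function.

```javascript
// Hypothetical local stand-in for map(func, bindArgs) on a plain array.
// Assumption: bound values follow the element in func's argument list.
function localMap(elements, func, bindArgs) {
  var extra = bindArgs || [];
  return elements.map(function (x) {
    return func.apply(null, [x].concat(extra));
  });
}

var scaled = localMap([1.0, 2.0, 3.0], function (x, factor, offset) {
  return x * factor + offset;
}, [10, 0.5]);
console.log(scaled); // prints [ 10.5, 20.5, 30.5 ]
```

Passing values this way keeps the function free of references to variables outside its own body.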

mapPartitions(func, preservesPartitioningopt, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to each partition of this RDD. Similar to map, but runs separately on each partition (block) of the RDD, so func must accept an Array. func should return an array rather than a single item.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
preservesPartitioning boolean <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
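The difference from map (func receives a whole partition as an array and returns an array) can be sketched locally. The code is plain JavaScript with inner arrays standing in for partitions; `localMapPartitions` is a hypothetical helper.

```javascript
// Hypothetical local stand-in for mapPartitions(); inner arrays are partitions.
function localMapPartitions(partitions, func) {
  // func is called once per partition, not once per element.
  return partitions.map(function (part) {
    return func(part);
  });
}

var inputPartitions = [[1.0, 2.0], [3.0, 4.0, 5.0]];
var sums = localMapPartitions(inputPartitions, function (part) {
  var total = part.reduce(function (a, b) { return a + b; }, 0);
  return [total]; // an array, even though it holds a single item
});
console.log(sums); // prints [ [ 3 ], [ 12 ] ]
```

Each partition contributes one partial sum, so the resulting data set holds one element per partition.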

mapPartitionsWithIndex(func, preservesPartitioningopt, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to each partition of this RDD, while tracking the index of the original partition. `preservesPartitioning` indicates whether the input function preserves the partitioner, which should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
preservesPartitioning boolean <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

mapToFloat(func, bindArgsopt) → {module:eclairjs.FloatRDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter that returns a float
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.FloatRDD

mapToPair(func, bindArgsopt) → {module:eclairjs.PairRDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter that returns tuple
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.PairRDD

max() → {float}

Returns the maximum element from this RDD as defined by the default comparator (natural order).
Overrides:
Source:
Returns:
the maximum of the RDD
Type
float

mean() → {float}

Source:
Returns:
Type
float

meanApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

Parameters:
Name Type Attributes Description
timeout number
confidence float <optional>
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

min() → {float}

Returns the minimum element from this RDD as defined by the default comparator (natural order).
Overrides:
Source:
Returns:
the minimum of the RDD
Type
float

name() → {string}

A friendly name for this RDD.
Inherited From:
Source:
Returns:
Type
string

persist(newLevel) → {module:eclairjs.RDD}

Parameters:
Name Type Description
newLevel module:eclairjs/storage.StorageLevel
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

pipe(command, env) → {module:eclairjs.RDD}

Return an RDD created by piping elements to a forked external process. The print behavior can be customized by providing two functions.
Parameters:
Name Type Description
command List | string command to run in forked process.
env Map environment variables to set.
Inherited From:
Source:
Returns:
the result RDD
Type
module:eclairjs.RDD

reduce(func, bindArgsopt) → {object}

Reduces the elements of this RDD using the specified commutative and associative binary operator.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with two parameters
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
object

repartition(numPartitions) → {module:eclairjs.FloatRDD}

Return a new RDD that has exactly numPartitions partitions. Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data. If you are decreasing the number of partitions in this RDD, consider using `coalesce`, which can avoid performing a shuffle.
Parameters:
Name Type Description
numPartitions number
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

sample(withReplacement, fraction, seedopt) → {module:eclairjs.FloatRDD}

Return a sampled subset of this RDD.
Parameters:
Name Type Attributes Description
withReplacement boolean
fraction float
seed number <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

sampleStdev() → {float}

Compute the sample standard deviation of this RDD's elements (which corrects for bias in estimating the standard deviation by dividing by N-1 instead of N).
Source:
Returns:
Type
float

sampleVariance() → {float}

Compute the sample variance of this RDD's elements (which corrects for bias in estimating the variance by dividing by N-1 instead of N).
Source:
Returns:
Type
float
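The N-1 versus N distinction for the sample and population statistics can be checked with a small local calculation. This is plain JavaScript on an array, not EclairJS code; `localVariance` is a hypothetical helper.

```javascript
// Hypothetical helper: variance of a plain array.
// sample === true divides by N-1 (sample variance), otherwise by N.
function localVariance(values, sample) {
  var n = values.length;
  var mean = values.reduce(function (a, b) { return a + b; }, 0) / n;
  var sumSquares = values.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean);
  }, 0);
  return sumSquares / (sample ? n - 1 : n);
}

var data = [2, 4, 4, 4, 5, 5, 7, 9];
var populationVariance = localVariance(data, false); // 32 / 8
var sampleVariance = localVariance(data, true);      // 32 / 7
console.log(populationVariance, Math.sqrt(populationVariance)); // prints 4 2
console.log(sampleVariance > populationVariance);               // prints true
```

The sample standard deviation is the square root of the sample variance, so it is likewise slightly larger than the population value for the same data.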

saveAsObjectFile(path, overwriteopt) → {void}

Save this RDD as a SequenceFile of serialized objects.
Parameters:
Name Type Attributes Description
path string
overwrite boolean <optional>
defaults to false, if true overwrites file if it exists
Inherited From:
Source:
Returns:
Type
void

saveAsTextFile(path, overwriteopt) → {void}

Save this RDD as a text file, using string representations of elements.
Parameters:
Name Type Attributes Description
path string
overwrite boolean <optional>
defaults to false, if true overwrites file if it exists
Inherited From:
Source:
Returns:
Type
void

setName(name) → {module:eclairjs.FloatRDD}

Parameters:
Name Type Description
name string
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

sortBy(func, ascending, numPartitions, bindArgsopt) → {module:eclairjs.RDD}

Return this RDD sorted by the given key function.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
ascending boolean
numPartitions int
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

sparkContext() → {module:eclairjs.SparkContext}

The SparkContext that created this RDD.
Inherited From:
Source:
Returns:
Type
module:eclairjs.SparkContext

stats() → {module:eclairjs/util.StatCounter}

Return a module:eclairjs/util.StatCounter object that captures the mean, variance and count of the RDD's elements in one operation.
Source:
Returns:
Type
module:eclairjs/util.StatCounter
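Capturing count, mean, and variance "in one operation" amounts to a single pass over the data. The sketch below uses Welford's online update in plain JavaScript as one way to do that. It is not the StatCounter implementation, and `localStats` is a hypothetical helper.

```javascript
// Hypothetical single-pass statistics over a plain array (Welford's update).
function localStats(values) {
  var count = 0;
  var mean = 0;
  var m2 = 0; // running sum of squared deviations from the current mean
  values.forEach(function (x) {
    count += 1;
    var delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  });
  return {
    count: count,
    mean: mean,
    variance: count > 0 ? m2 / count : NaN // population variance
  };
}

var s = localStats([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(s.count, s.mean.toFixed(3), s.variance.toFixed(3));
```

For this data set the mean is 5 and the population variance is 4, up to floating-point rounding.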

stdev() → {float}

Source:
Returns:
Type
float

subtract(other, numPartitionsopt) → {module:eclairjs.FloatRDD}

Return an RDD with the elements from `this` that are not in `other`.
Parameters:
Name Type Attributes Description
other module:eclairjs.FloatRDD
numPartitions number <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

sum() → {float}

Source:
Returns:
Type
float

sumApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

Approximate operation to return the sum within a timeout.
Parameters:
Name Type Attributes Description
timeout number
confidence float <optional>
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

take(num) → {Array}

Take the first num elements of the RDD.
Parameters:
Name Type Description
num int
Inherited From:
Source:
Returns:
Type
Array

takeOrdered(num, funcopt, bindArgsopt) → {Array}

Returns the first k (smallest) elements from this RDD as defined by the specified implicit Ordering[T] and maintains the ordering. This does the opposite of top.
Parameters:
Name Type Attributes Description
num number the number of elements to return
func function <optional>
compares two arguments
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
an array of top elements
Type
Array
Example
var result = rdd.takeOrdered(25, function(a, b) {
   return a > b ? -1 : a == b ? 0 : 1;
});

takeSample(withReplacement, num, seedopt) → {Array}

Return a fixed-size sampled subset of this RDD in an array.
Parameters:
Name Type Attributes Description
withReplacement boolean whether sampling is done with replacement
num number size of the returned sample
seed number <optional>
seed for the random number generator
Inherited From:
Source:
Returns:
sample of specified size in an array
Type
Array

toArray() → {Array}

Return an array that contains all of the elements in this RDD.
Inherited From:
Source:
Returns:
Type
Array

toDebugString() → {string}

A description of this RDD and its recursive dependencies for debugging.
Inherited From:
Source:
Returns:
Type
string

top(num) → {Array}

Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T]. This does the opposite of takeOrdered. For example (Scala):
sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) // returns Array(12)
sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2) // returns Array(6, 5)
Parameters:
Name Type Description
num number k, the number of top elements to return
Inherited From:
Source:
Returns:
an array of top elements
Type
Array
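The relationship between top and takeOrdered under the natural numeric ordering can be shown on local arrays. The plain-JavaScript helpers below, `localTop` and `localTakeOrdered`, are hypothetical and only mirror the described results.

```javascript
// Hypothetical local stand-ins using the natural numeric ordering.
function localTop(elements, num) {
  // Largest first, then keep the first num items.
  return elements.slice().sort(function (a, b) { return b - a; }).slice(0, num);
}

function localTakeOrdered(elements, num) {
  // Smallest first, then keep the first num items.
  return elements.slice().sort(function (a, b) { return a - b; }).slice(0, num);
}

console.log(localTop([10, 4, 2, 12, 3], 1));         // prints [ 12 ]
console.log(localTop([2, 3, 4, 5, 6], 2));           // prints [ 6, 5 ]
console.log(localTakeOrdered([10, 4, 2, 12, 3], 2)); // prints [ 2, 3 ]
```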

toRDD(rdd) → {module:eclairjs.RDD}

Parameters:
Name Type Description
rdd module:eclairjs.FloatRDD
Source:
Returns:
Type
module:eclairjs.RDD

toString() → {string}

Inherited From:
Source:
Returns:
Type
string

treeAggregate(zeroValue, func1, func2, bindArgs1opt, bindArgs2opt) → {object}

Aggregates the elements of this RDD in a multi-level tree pattern.
Parameters:
Name Type Attributes Description
zeroValue (undocumented)
func1 function (undocumented) Function with two parameters
func2 function combOp - (undocumented) Function with two parameters
bindArgs1 Array.<Object> <optional>
array whose values will be added to func1's argument list.
bindArgs2 Array.<Object> <optional>
array whose values will be added to func2's argument list.
Inherited From:
Source:
See:
  • org.apache.spark.rdd.RDD#aggregate
Returns:
Type
object

treeReduce(func, depth, bindArgsopt) → {object}

Reduces the elements of this RDD in a multi-level tree pattern.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with two parameters
depth number suggested depth of the tree (default: 2)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
See:
  • org.apache.spark.rdd.RDD#reduce
Returns:
Type
object
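A multi-level tree pattern combines partial results in rounds rather than all at once. The plain-JavaScript sketch below pairs up neighbours at each level. It is a simplified illustration that ignores the depth parameter and is not how Spark schedules the work; `localTreeReduce` is a hypothetical helper.

```javascript
// Hypothetical pairwise tree reduction over per-partition partial results.
// Assumes partials is non-empty and func is commutative and associative.
function localTreeReduce(partials, func) {
  var level = partials.slice();
  var rounds = 0;
  while (level.length > 1) {
    var next = [];
    for (var i = 0; i < level.length; i += 2) {
      // Combine neighbours; an unpaired last item is carried up unchanged.
      next.push(i + 1 < level.length ? func(level[i], level[i + 1]) : level[i]);
    }
    level = next;
    rounds += 1;
  }
  return { value: level[0], rounds: rounds };
}

var reduced = localTreeReduce([1, 2, 3, 4, 5], function (a, b) { return a + b; });
console.log(reduced.value, reduced.rounds); // prints 15 3
```

The five partial values collapse as [3, 7, 5], then [10, 5], then [15], so no single step has to combine every partial result.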

union(other) → {module:eclairjs.FloatRDD}

Return the union of this RDD and another one. Any identical elements will appear multiple times (use `.distinct()` to eliminate them).
Parameters:
Name Type Description
other module:eclairjs.FloatRDD
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

unpersist(blockingopt) → {module:eclairjs.FloatRDD}

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
Parameters:
Name Type Attributes Description
blocking boolean <optional>
Whether to block until all blocks are deleted.
Overrides:
Source:
Returns:
Type
module:eclairjs.FloatRDD

variance() → {float}

Source:
Returns:
Type
float

wrapRDD(rdd) → {module:eclairjs.FloatRDD}

Parameters:
Name Type Description
rdd module:eclairjs.RDD
Source:
Returns:
Type
module:eclairjs.FloatRDD

zip(other) → {module:eclairjs.RDD}

Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the *same number of partitions* and the *same number of elements in each partition* (e.g. one was made through a map on the other).
Parameters:
Name Type Description
other module:eclairjs.RDD (undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
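The pairing of first with first, second with second, and so on can be sketched on local arrays. This plain-JavaScript helper, `localZip`, is hypothetical; it enforces only the equal-length requirement, since a local array has no partitions.

```javascript
// Hypothetical local stand-in for zip() on plain arrays.
function localZip(xs, ys) {
  if (xs.length !== ys.length) {
    throw new Error("zip requires the same number of elements on both sides");
  }
  return xs.map(function (x, i) {
    return [x, ys[i]];
  });
}

var values = [1.5, 2.5, 3.5];
// The second array is derived from the first through a map, so sizes match.
var doubled = values.map(function (v) { return v * 2; });
var zipped = localZip(values, doubled);
console.log(zipped); // prints [ [ 1.5, 3 ], [ 2.5, 5 ], [ 3.5, 7 ] ]
```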

zipPartitions(rdd2, func, bindArgsopt) → {module:eclairjs.RDD}

Zip this RDD's partitions with another RDD and return a new RDD by applying a function to the zipped partitions. Assumes that both the RDDs have the same number of partitions, but does not require them to have the same number of elements in each partition.
Parameters:
Name Type Attributes Description
rdd2 module:eclairjs.RDD
func function Function with two parameters
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

zipWithIndex() → {module:eclairjs.RDD}

Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index. This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a Spark job when this RDD contains more than one partition. Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The index assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
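The index assignment (partition order first, then position within the partition) can be sketched with inner arrays standing in for partitions. `localZipWithIndex` is a hypothetical plain-JavaScript helper, not an EclairJS call.

```javascript
// Hypothetical local stand-in for zipWithIndex(); inner arrays are partitions.
function localZipWithIndex(partitions) {
  var index = 0; // running index across all partitions
  return partitions.map(function (part) {
    return part.map(function (x) {
      return [x, index++];
    });
  });
}

var indexed = localZipWithIndex([[1.5, 2.5], [], [3.5]]);
console.log(JSON.stringify(indexed)); // prints [[[1.5,0],[2.5,1]],[],[[3.5,2]]]
```

A partition's starting index depends on the sizes of all earlier partitions, which is why the distributed version has to count elements before it can assign indices.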

zipWithUniqueId() → {module:eclairjs.RDD}

Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k, 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method won't trigger a Spark job, which is different from zipWithIndex. Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The unique ID assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
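The id scheme k, n+k, 2*n+k, ... can be worked through on local data. In this plain-JavaScript sketch, inner arrays stand in for partitions and `localZipWithUniqueId` is a hypothetical helper.

```javascript
// Hypothetical local stand-in for zipWithUniqueId(); inner arrays are partitions.
function localZipWithUniqueId(partitions) {
  var n = partitions.length; // number of partitions
  return partitions.map(function (part, k) {
    return part.map(function (x, i) {
      // The i-th item of partition k gets id i * n + k.
      return [x, i * n + k];
    });
  });
}

var withIds = localZipWithUniqueId([[1.5, 2.5, 3.5], [4.5], [5.5, 6.5]]);
console.log(JSON.stringify(withIds));
// prints [[[1.5,0],[2.5,3],[3.5,6]],[[4.5,1]],[[5.5,2],[6.5,5]]]
```

Each id depends only on the partition number and the position inside that partition, so no counting across partitions is needed. The missing id 4 shows the kind of gap the description mentions.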