Class: PairRDD

eclairjs. PairRDD

new PairRDD(rdd)

Parameters:
Name Type Description
rdd module:eclairjs.RDD of Tuple(value, value).
Source:

Extends

Methods

aggregate(zeroValue, func1, func2, bindArgs1opt, bindArgs2opt) → {object}

Aggregate the elements of each partition, and then the results for all the partitions, using the given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into a U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U, to avoid memory allocation.
Parameters:
Name Type Attributes Description
zeroValue module:eclairjs.RDD (undocumented)
func1 function seqOp - (undocumented) Function with two parameters
func2 function combOp - (undocumented) Function with two parameters
bindArgs1 Array.<Object> <optional>
array whose values will be added to func1's argument list.
bindArgs2 Array.<Object> <optional>
array whose values will be added to func2's argument list.
Inherited From:
Source:
Returns:
Type
object
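The two-step merge described for aggregate can be modelled in plain JavaScript. This is a sketch only: `aggregateModel` is a hypothetical helper, partitions are assumed to be ordinary arrays, and the zero value is supplied as a factory function (an assumption of this sketch) so that each partition starts from its own copy.

```javascript
// Plain-JavaScript model of aggregate(); not the EclairJS implementation.
// aggregateModel and makeZero are hypothetical names used for illustration.
function aggregateModel(partitions, makeZero, seqOp, combOp) {
  // Step 1: fold every partition into its own accumulator (merging a T into a U).
  var partials = partitions.map(function (partition) {
    return partition.reduce(function (acc, x) { return seqOp(acc, x); }, makeZero());
  });
  // Step 2: merge the per-partition accumulators (merging two U's).
  return partials.reduce(function (a, b) { return combOp(a, b); }, makeZero());
}

// T is a number, U is {sum, count}: compute both in a single pass.
var aggregateResult = aggregateModel(
  [[1, 2, 3], [4, 5]],
  function () { return { sum: 0, count: 0 }; },
  // seqOp modifies and returns its first argument, as the description allows.
  function (acc, x) { acc.sum += x; acc.count += 1; return acc; },
  function (a, b) { a.sum += b.sum; a.count += b.count; return a; }
);
```

Here `aggregateResult` holds the sum and the count of all five elements, from which a mean can be derived.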

aggregateByKey(zeroValue, seqFunc, combFunc, numPartitionsopt, bindArgsopt) → {module:eclairjs.PairRDD}

Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's, as in scala.TraversableOnce. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.
Parameters:
Name Type Attributes Description
zeroValue module:eclairjs.Serializable
seqFunc func
combFunc func
numPartitions number <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.PairRDD
Example
var Serializable = require(EclairJS_Globals.NAMESPACE + '/Serializable');
var s = new Serializable();
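var result = pairRdd.aggregateByKey(s,
  // seqFunc: count each value seen for this key within a partition
  function(hashSetA, b) {
    hashSetA[b] = hashSetA[b] ? hashSetA[b] + 1 : 1;
    return hashSetA;
  },
  // combFunc: merge the counts from setB into setA, keeping keys found only in setB
  function(setA, setB) {
    for (var k in setB) {
      if (setB.hasOwnProperty(k)) {
        setA[k] = (setA[k] || 0) + setB[k];
      }
    }
    return setA;
  });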

cache() → {module:eclairjs.RDD}

Persist this RDD with the default storage level (`MEMORY_ONLY`).
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

cartesian(other) → {module:eclairjs.RDD}

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`.
Parameters:
Name Type Description
other module:eclairjs.RDD (undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

checkpoint()

Mark this RDD for checkpointing. It will be saved to a file inside the checkpoint directory set with `SparkContext#setCheckpointDir` and all references to its parent RDDs will be removed. This function must be called before any job has been executed on this RDD. It is strongly recommended that this RDD is persisted in memory, otherwise saving it on a file will require recomputation.
Inherited From:
Source:
Returns:
void

coalesce(numPartitions, shuffle) → {module:eclairjs.RDD}

Return a new RDD that is reduced into `numPartitions` partitions. This results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions. However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1). To avoid this, you can pass shuffle = true. This will add a shuffle step, but means the current upstream partitions will be executed in parallel (per whatever the current partitioning is). Note: With shuffle = true, you can actually coalesce to a larger number of partitions. This is useful if you have a small number of partitions, say 100, potentially with a few partitions being abnormally large. Calling coalesce(1000, shuffle = true) will result in 1000 partitions with the data distributed using a hash partitioner.
Parameters:
Name Type Description
numPartitions int
shuffle boolean
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
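The narrow-dependency case (1000 partitions reduced to 100, each new partition claiming 10 parents) can be sketched as follows. `coalesceModel` is a hypothetical helper, and grouping adjacent parents is an assumption of this sketch; the real grouping may take other factors, such as data locality, into account.

```javascript
// Plain-JavaScript model of coalesce() without a shuffle; not the EclairJS implementation.
// coalesceModel is a hypothetical helper that merges adjacent parent partitions.
function coalesceModel(partitions, numPartitions) {
  var result = [];
  for (var i = 0; i < numPartitions; i++) {
    // Each new partition claims a contiguous range of parent partitions.
    var start = Math.floor(i * partitions.length / numPartitions);
    var end = Math.floor((i + 1) * partitions.length / numPartitions);
    var merged = [];
    for (var p = start; p < end; p++) {
      merged = merged.concat(partitions[p]);
    }
    result.push(merged);
  }
  return result;
}

// 1000 single-element parent partitions reduced to 100 partitions.
var parentPartitions = [];
for (var n = 0; n < 1000; n++) {
  parentPartitions.push([n]);
}
var coalesced = coalesceModel(parentPartitions, 100);
```

No element changes its relative order, which is why no shuffle is needed in this case.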

cogroup(other1, other2opt, other3opt, numPartitionsopt) → {module:eclairjs.PairRDD}

For each key k in `this` or `other`, return a resulting RDD that contains a tuple with the list of values for that key in `this` as well as `other`.
Parameters:
Name Type Attributes Description
other1 module:eclairjs.PairRDD
other2 module:eclairjs.PairRDD <optional>
other3 module:eclairjs.PairRDD <optional>
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD
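A plain-JavaScript model of cogroup for two inputs may help to picture the result shape. It is a sketch under assumptions: pairs are represented as two-element arrays rather than Tuple2 objects, keys are strings, and `cogroupModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of cogroup() for two inputs; not the EclairJS implementation.
// Pairs are assumed to be [key, value] arrays with string keys.
function cogroupModel(left, right) {
  var groups = {};
  // Return the [valuesFromLeft, valuesFromRight] slot for a key, creating it if needed.
  function slot(key) {
    if (!groups.hasOwnProperty(key)) {
      groups[key] = [[], []];
    }
    return groups[key];
  }
  left.forEach(function (pair) { slot(pair[0])[0].push(pair[1]); });
  right.forEach(function (pair) { slot(pair[0])[1].push(pair[1]); });
  return Object.keys(groups).map(function (key) { return [key, groups[key]]; });
}

var cogrouped = cogroupModel(
  [['a', 1], ['b', 2], ['a', 3]],
  [['a', 'x'], ['c', 'y']]
);
```

Every key that occurs in either input appears once in the output; a side with no values for that key contributes an empty list.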

collect() → {Array}

Return an array that contains all of the elements in this RDD.
Inherited From:
Source:
Returns:
Type
Array

collectAsMap() → {object}

Return the key-value pairs in this RDD to the master as a Map.
Source:
Returns:
key, value hash map
Type
object

combineByKey(createCombiner, mergeValue, mergeCombiners, numPartitions) → {module:eclairjs.PairRDD}

Simplified version of combineByKey that hash-partitions the output RDD and uses map-side aggregation.
Parameters:
Name Type Description
createCombiner func
mergeValue func
mergeCombiners func
numPartitions number
Source:
Returns:
Type
module:eclairjs.PairRDD
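The roles of the three functions can be sketched in plain JavaScript. This model assumes partitions are arrays of [key, value] arrays with string keys; `combineByKeyModel` is a hypothetical helper, not part of EclairJS.

```javascript
// Plain-JavaScript model of combineByKey(); not the EclairJS implementation.
function combineByKeyModel(partitions, createCombiner, mergeValue, mergeCombiners) {
  // Map side: build one combiner per key inside each partition.
  var perPartition = partitions.map(function (partition) {
    var combiners = {};
    partition.forEach(function (pair) {
      var k = pair[0];
      var v = pair[1];
      combiners[k] = combiners.hasOwnProperty(k)
        ? mergeValue(combiners[k], v)   // key already seen in this partition
        : createCombiner(v);            // first value for this key in this partition
    });
    return combiners;
  });
  // Reduce side: merge the combiners produced by different partitions.
  var merged = {};
  perPartition.forEach(function (combiners) {
    Object.keys(combiners).forEach(function (k) {
      merged[k] = merged.hasOwnProperty(k)
        ? mergeCombiners(merged[k], combiners[k])
        : combiners[k];
    });
  });
  return merged;
}

// Track [sum, count] per key so that an average can be derived afterwards.
var sumsAndCounts = combineByKeyModel(
  [[['a', 1], ['b', 4]], [['a', 3]]],
  function (v) { return [v, 1]; },
  function (c, v) { return [c[0] + v, c[1] + 1]; },
  function (c1, c2) { return [c1[0] + c2[0], c1[1] + c2[1]]; }
);
```

The combiner type ([sum, count]) differs from the value type (number), which is the main reason to reach for combineByKey rather than reduceByKey.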

context() → {module:eclairjs.SparkContext}

Return the SparkContext that this RDD was created on.
Inherited From:
Source:
Returns:
Type
module:eclairjs.SparkContext

count() → {integer}

Return the number of elements in the RDD.
Inherited From:
Source:
Returns:
Type
integer

countApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

:: Experimental :: Approximate version of count() that returns a potentially incomplete result within a timeout, even if not all tasks have finished.
Parameters:
Name Type Attributes Description
timeout number (undocumented)
confidence number <optional>
(undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

countApproxDistinct(relativeSD) → {number}

Return the approximate number of distinct elements in the RDD. The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
Parameters:
Name Type Description
relativeSD number Relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.
Inherited From:
Source:
Returns:
Type
number

countByKey() → {object}

Count the number of elements for each key, and return the result to the master as a Map.
Source:
Returns:
key, value hash map
Type
object

countByKeyApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

Approximate version of countByKey that can return a partial result if it does not finish within a timeout.
Parameters:
Name Type Attributes Description
timeout number
confidence number <optional>
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

countByValueApprox(timeout, confidenceopt) → {module:eclairjs/partial.PartialResult}

:: Experimental :: Approximate version of countByValue().
Parameters:
Name Type Attributes Description
timeout number (undocumented)
confidence number <optional>
(undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs/partial.PartialResult

distinct(numPartitionsopt) → {module:eclairjs.RDD}

Return a new RDD containing the distinct elements in this RDD.
Parameters:
Name Type Attributes Description
numPartitions int <optional>
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

filter(func, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD containing only the elements that satisfy a predicate.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

first() → {module:eclairjs.Tuple2}

Overrides:
Source:
Returns:
Type
module:eclairjs.Tuple2

flatMap(func, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
Parameters:
Name Type Attributes Description
func function (undocumented) - Function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

flatMapToPair(func, bindArgsopt) → {module:eclairjs.PairRDD}

Return a new RDD by first applying a function to all elements of this RDD, and then flattening the results.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.PairRDD

flatMapValues(f, bindArgsopt) → {module:eclairjs.PairRDD}

Pass each value in the key-value pair RDD through a flatMap function without changing the keys; this also retains the original RDD's partitioning.
Parameters:
Name Type Attributes Description
f func
bindArgs Array.<object> <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD
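As a sketch of the behaviour, the following plain-JavaScript model applies a function to each value and emits one pair per returned item, always under the original key. Pairs are assumed to be [key, value] arrays and `flatMapValuesModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of flatMapValues(); not the EclairJS implementation.
function flatMapValuesModel(pairs, f) {
  var out = [];
  pairs.forEach(function (pair) {
    // f returns an array; each item becomes its own pair with the unchanged key.
    f(pair[1]).forEach(function (v) {
      out.push([pair[0], v]);
    });
  });
  return out;
}

var wordsByDoc = flatMapValuesModel(
  [['doc1', 'a b'], ['doc2', 'c']],
  function (text) { return text.split(' '); }
);
```

Because keys never change, the original partitioning by key remains valid for the result.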

foldByKey(zeroValue, func, numPartitionsopt, bindArgsopt) → {module:eclairjs.PairRDD}

Merge the values for each key using an associative function and a neutral "zero value", which may be added to the result an arbitrary number of times and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).
Parameters:
Name Type Attributes Description
zeroValue module:eclairjs.Serializable | number
func func
numPartitions integer <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.PairRDD
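Why the zero value must be neutral can be seen in a small plain-JavaScript model. It assumes the zero value is applied once per key in every partition that contains the key; `foldByKeyModel` is a hypothetical helper and partitions are arrays of [key, value] arrays.

```javascript
// Plain-JavaScript model of foldByKey(); not the EclairJS implementation.
function foldByKeyModel(partitions, zeroValue, func) {
  var merged = {};
  partitions.forEach(function (partition) {
    var local = {};
    partition.forEach(function (pair) {
      var k = pair[0];
      // The zero value seeds the fold once per key per partition.
      var acc = local.hasOwnProperty(k) ? local[k] : zeroValue;
      local[k] = func(acc, pair[1]);
    });
    // Merge the per-partition results for each key.
    Object.keys(local).forEach(function (k) {
      merged[k] = merged.hasOwnProperty(k) ? func(merged[k], local[k]) : local[k];
    });
  });
  return merged;
}

var addNumbers = function (a, b) { return a + b; };
var foldInput = [[['a', 1], ['a', 2]], [['a', 3], ['b', 4]]];

// 0 is neutral for addition, so the result is the plain per-key sum.
var foldedNeutral = foldByKeyModel(foldInput, 0, addNumbers);
// 10 is not neutral: it is added once per partition that holds the key.
var foldedSkewed = foldByKeyModel(foldInput, 10, addNumbers);
```

With the non-neutral value the result depends on how the data happens to be partitioned, which is exactly what the description rules out.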

foreach(func, bindArgsopt) → {void}

Applies a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function Function with one parameter that returns void
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Overrides:
Source:
Returns:
Type
void
Example
rdd3.foreach(function(record) {
   var connection = createNewConnection()
   connection.send(record);
   connection.close()
});

fullOuterJoin(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Perform a full outer join of `this` and `other`. For each element (k, v) in `this`, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for w in `other`, or the pair (k, (Some(v), None)) if no elements in `other` have key k. Similarly, for each element (k, w) in `other`, the resulting RDD will either contain all pairs (k, (Some(v), Some(w))) for v in `this`, or the pair (k, (None, Some(w))) if no elements in `this` have key k.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD
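The shape of a full outer join result can be sketched in plain JavaScript. In this model a missing side is represented by `null` (an assumption standing in for None), pairs are [key, value] arrays, and `fullOuterJoinModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of fullOuterJoin(); not the EclairJS implementation.
// null stands in for None on the side that has no element with the key.
function fullOuterJoinModel(left, right) {
  var out = [];
  left.forEach(function (l) {
    var matches = right.filter(function (r) { return r[0] === l[0]; });
    if (matches.length === 0) {
      out.push([l[0], [l[1], null]]);          // key only in `this`
    }
    matches.forEach(function (r) {
      out.push([l[0], [l[1], r[1]]]);          // key in both
    });
  });
  right.forEach(function (r) {
    var hasLeft = left.some(function (l) { return l[0] === r[0]; });
    if (!hasLeft) {
      out.push([r[0], [null, r[1]]]);          // key only in `other`
    }
  });
  return out;
}

var fullJoined = fullOuterJoinModel(
  [['a', 1], ['b', 2]],
  [['a', 'x'], ['a', 'y'], ['c', 'z']]
);
```

Key `a` yields one pair per matching combination, while `b` and `c` each yield a single pair with one side missing.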

getCheckpointFile() → {string}

Gets the name of the directory to which this RDD was checkpointed. This is not defined if the RDD is checkpointed locally.
Inherited From:
Source:
Returns:
Type
string

getStorageLevel() → {module:eclairjs/storage.StorageLevel}

Inherited From:
Source:
Returns:
Type
module:eclairjs/storage.StorageLevel

glom() → {module:eclairjs.RDD}

Return an RDD created by coalescing all elements within each partition into an array.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

groupBy(func, numPartitionsopt, bindArgsopt) → {module:eclairjs.RDD}

Return an RDD of grouped items. Each group consists of a key and a sequence of elements mapping to that key. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting RDD is evaluated. Note: This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using aggregateByKey or reduceByKey will provide much better performance.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
numPartitions number <optional>
How many partitions to use in the resulting RDD (if non-zero partitioner is ignored)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Overrides:
Source:
Returns:
Type
module:eclairjs.RDD

groupByKey(numberopt) → {module:eclairjs.PairRDD}

Group the values for each key in the RDD into a single sequence. Allows controlling the partitioning of the resulting key-value pair RDD by passing the number of partitions. Note: If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using reduceByKey or combineByKey will provide much better performance.
Parameters:
Name Type Attributes Description
number integer <optional>
number of partitions
Source:
Returns:
Type
module:eclairjs.PairRDD

groupWith(other1, other2opt, other3opt) → {module:eclairjs.PairRDD}

Alias for cogroup.
Parameters:
Name Type Attributes Description
other1 module:eclairjs.PairRDD
other2 module:eclairjs.PairRDD <optional>
other3 module:eclairjs.PairRDD <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

intersection(other) → {module:eclairjs.RDD}

Return the intersection of this RDD and another one. The output will not contain any duplicate elements, even if the input RDDs did. Note that this method performs a shuffle internally.
Parameters:
Name Type Description
other module:eclairjs.RDD the other RDD
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

isCheckpointed() → {boolean}

Return whether this RDD is checkpointed and materialized, either reliably or locally.
Inherited From:
Source:
Returns:
Type
boolean

isEmpty() → {boolean}

Inherited From:
Source:
Returns:
true if and only if the RDD contains no elements at all. Note that an RDD may be empty even when it has at least 1 partition.
Type
boolean

join(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Return an RDD containing all pairs of elements with matching keys in `this` and `other`. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in `this` and (k, v2) is in `other`.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD
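A join pairs up elements of `this` and `other` that share a key and drops keys found on only one side. The plain-JavaScript sketch below models that behaviour; pairs are assumed to be [key, value] arrays and `joinModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of join() (inner join by key); not the EclairJS implementation.
function joinModel(left, right) {
  var out = [];
  left.forEach(function (l) {
    right.forEach(function (r) {
      // Emit (k, (v1, v2)) for every combination of elements with the same key.
      if (l[0] === r[0]) {
        out.push([l[0], [l[1], r[1]]]);
      }
    });
  });
  return out;
}

var innerJoined = joinModel(
  [['a', 1], ['b', 2]],
  [['a', 'x'], ['a', 'y'], ['c', 'z']]
);
```

Keys `b` and `c` do not appear in the result because each occurs in only one of the inputs.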

keyBy(func, bindArgsopt) → {module:eclairjs.RDD}

Creates tuples of the elements in this RDD by applying `func`.
Parameters:
Name Type Attributes Description
func function (undocumented)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

leftOuterJoin(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Perform a left outer join of `this` and `other`. For each element (k, v) in `this`, the resulting RDD will either contain all pairs (k, (v, Some(w))) for w in `other`, or the pair (k, (v, None)) if no elements in `other` have key k.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

lookup(key) → {Array.<object>}

Return the list of values in the RDD for key `key`. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.
Parameters:
Name Type Description
key object
Source:
Returns:
Type
Array.<object>
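The benefit of a known partitioner can be sketched in plain JavaScript: if every key is assigned to a partition by a hash function, a lookup only has to scan that one partition. The hash function and the helpers `partitionFor`, `partitionByKey` and `lookupModel` are hypothetical and stand in for whatever partitioner the RDD really uses.

```javascript
// Plain-JavaScript model of lookup() on a hash-partitioned pair collection.
// Not the EclairJS implementation; keys are assumed to be strings.
function partitionFor(key, numPartitions) {
  var hash = 0;
  for (var i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) % numPartitions;
  }
  return hash;
}

// Distribute [key, value] pairs so that equal keys always share a partition.
function partitionByKey(pairs, numPartitions) {
  var partitions = [];
  for (var p = 0; p < numPartitions; p++) {
    partitions.push([]);
  }
  pairs.forEach(function (pair) {
    partitions[partitionFor(pair[0], numPartitions)].push(pair);
  });
  return partitions;
}

// Only the partition that the key maps to is searched.
function lookupModel(partitions, key) {
  var target = partitions[partitionFor(key, partitions.length)];
  return target
    .filter(function (pair) { return pair[0] === key; })
    .map(function (pair) { return pair[1]; });
}

var lookupPartitions = partitionByKey([['a', 1], ['b', 2], ['a', 3], ['c', 4]], 3);
var lookedUp = lookupModel(lookupPartitions, 'a');
```

Without a known partitioner, every partition would have to be scanned for the key.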

map(func, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

mapPartitions(func, preservesPartitioningopt, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to each partition of this RDD. Similar to map, but runs separately on each partition (block) of the RDD, so func must accept an Array. func should return an array rather than a single item.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
preservesPartitioning boolean <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
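The array-in, array-out contract of func can be sketched in plain JavaScript. Partitions are modelled as ordinary arrays and `mapPartitionsModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of mapPartitions(); not the EclairJS implementation.
function mapPartitionsModel(partitions, func) {
  // func is called once per partition with all of that partition's elements.
  return partitions.map(function (partition) {
    return func(partition);
  });
}

// Emit one sum per partition; note that func returns an array, not a bare number.
var partitionSums = mapPartitionsModel([[1, 2, 3], [4, 5]], function (elements) {
  var total = 0;
  elements.forEach(function (x) { total += x; });
  return [total];
});
```

Calling func once per partition is useful when per-call setup is expensive, since the setup cost is paid per partition instead of per element.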

mapPartitionsWithIndex(func, preservesPartitioningopt, bindArgsopt) → {module:eclairjs.RDD}

Return a new RDD by applying a function to each partition of this RDD, while tracking the index of the original partition. `preservesPartitioning` indicates whether the input function preserves the partitioner, which should be `false` unless this is a pair RDD and the input function doesn't modify the keys.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
preservesPartitioning boolean <optional>
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

mapToFloat(func, bindArgsopt) → {module:eclairjs.FloatRDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter that returns a number
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.FloatRDD

mapToPair(func, bindArgsopt) → {module:eclairjs.PairRDD}

Return a new RDD by applying a function to all elements of this RDD.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter that returns a tuple
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.PairRDD

mapValues(f, bindArgsopt) → {module:eclairjs.PairRDD}

Pass each value in the key-value pair RDD through a map function without changing the keys; this also retains the original RDD's partitioning.
Parameters:
Name Type Attributes Description
f func
bindArgs Array.<object> <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

max(comparator, bindArgsopt) → {object}

Returns the max of this RDD as defined by the given comparator.
Parameters:
Name Type Attributes Description
comparator function Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
bindArgs Array.<Object> <optional>
array whose values will be added to comparator's argument list.
Inherited From:
Source:
Returns:
the maximum element of the RDD
Type
object

min(comparator, bindArgsopt) → {object}

Returns the min of this RDD as defined by the given comparator.
Parameters:
Name Type Attributes Description
comparator function Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
bindArgs Array.<Object> <optional>
array whose values will be added to comparator's argument list.
Inherited From:
Source:
Returns:
the minimum element of the RDD
Type
object
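A comparator for max and min follows the usual convention: negative when the first argument sorts before the second, zero when they are equal, positive otherwise. The plain-JavaScript sketch below shows how one such function serves both methods; `maxModel`, `minModel` and `compareNumbers` are hypothetical names and the input is assumed to be a non-empty array.

```javascript
// Plain-JavaScript model of max() and min() with a comparator; not the EclairJS implementation.
function compareNumbers(a, b) {
  return a < b ? -1 : (a > b ? 1 : 0);
}

// Keep the element that the comparator ranks highest.
function maxModel(elements, comparator) {
  return elements.reduce(function (best, x) {
    return comparator(x, best) > 0 ? x : best;
  });
}

// Keep the element that the comparator ranks lowest.
function minModel(elements, comparator) {
  return elements.reduce(function (best, x) {
    return comparator(x, best) < 0 ? x : best;
  });
}

var largest = maxModel([10, 4, 2, 12, 3], compareNumbers);
var smallest = minModel([10, 4, 2, 12, 3], compareNumbers);
```

The same comparator is passed to both calls; only the direction of the test inside the method differs.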

name() → {string}

A friendly name for this RDD
Inherited From:
Source:
Returns:
Type
string

persist(newLevel) → {module:eclairjs.RDD}

Parameters:
Name Type Description
newLevel module:eclairjs/storage.StorageLevel
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

pipe(command, env) → {module:eclairjs.RDD}

Return an RDD created by piping elements to a forked external process. The print behavior can be customized by providing two functions.
Parameters:
Name Type Description
command List | string command to run in forked process.
env Map environment variables to set.
Inherited From:
Source:
Returns:
the result RDD
Type
module:eclairjs.RDD

reduce(func, bindArgsopt) → {object}

Reduces the elements of this RDD using the specified commutative and associative binary operator.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with two parameters
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
object

reduceByKey(func, bindArgsopt) → {module:eclairjs.PairRDD}

Merge the values for each key using an associative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.
Parameters:
Name Type Attributes Description
func func
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.PairRDD
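The effect of merging locally before sending results to a reducer can be sketched in plain JavaScript. The model counts how many records would cross partition boundaries; `reduceByKeyModel` is a hypothetical helper and partitions are arrays of [key, value] arrays.

```javascript
// Plain-JavaScript model of reduceByKey() with map-side combining.
// Not the EclairJS implementation.
function reduceByKeyModel(partitions, func) {
  var shuffled = 0;
  var result = {};
  partitions.forEach(function (partition) {
    // Local merge: at most one record per key leaves each partition.
    var local = {};
    partition.forEach(function (pair) {
      var k = pair[0];
      local[k] = local.hasOwnProperty(k) ? func(local[k], pair[1]) : pair[1];
    });
    // Final merge of the locally combined records.
    Object.keys(local).forEach(function (k) {
      shuffled += 1;
      result[k] = result.hasOwnProperty(k) ? func(result[k], local[k]) : local[k];
    });
  });
  return { result: result, shuffled: shuffled };
}

var wordCounts = reduceByKeyModel(
  [[['a', 1], ['a', 1], ['b', 1]], [['a', 1], ['b', 1], ['b', 1]]],
  function (x, y) { return x + y; }
);
```

Six input records shrink to four combined records before the final merge, which is the saving that the "combiner" comparison refers to.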

reduceByKeyLocally(func) → {Object}

Merge the values for each key using an associative reduce function, but return the results immediately to the master as a Map. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.
Parameters:
Name Type Description
func func
Source:
Returns:
Key value pair hashmap
Type
Object

repartition(numPartitions) → {module:eclairjs.RDD}

Return a new RDD that has exactly numPartitions partitions. Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data. If you are decreasing the number of partitions in this RDD, consider using `coalesce`, which can avoid performing a shuffle.
Parameters:
Name Type Description
numPartitions int (undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

rightOuterJoin(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Perform a right outer join of `this` and `other`. For each element (k, w) in `other`, the resulting RDD will either contain all pairs (k, (Some(v), w)) for v in `this`, or the pair (k, (None, w)) if no elements in `this` have key k.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

sample(withReplacement, fraction, seedopt) → {module:eclairjs.RDD}

Return a sampled subset of this RDD.
Parameters:
Name Type Attributes Description
withReplacement boolean can elements be sampled multiple times (replaced when sampled out)
fraction number expected size of the sample as a fraction of this RDD's size. Without replacement: probability that each element is chosen; fraction must be in [0, 1]. With replacement: expected number of times each element is chosen; fraction must be >= 0.
seed number <optional>
seed for the random number generator
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

sampleByKey(withReplacement, fractions, seedopt) → {module:eclairjs.PairRDD}

Return a subset of this RDD sampled by key (via stratified sampling). Create a sample of this RDD using variable sampling rates for different keys as specified by `fractions`, a key to sampling rate map, via simple random sampling with one pass over the RDD, to produce a sample of size that's approximately equal to the sum of math.ceil(numItems * samplingRate) over all key values.
Parameters:
Name Type Attributes Description
withReplacement boolean
fractions object key, value pair object Hash Map
seed number <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

sampleByKeyExact(withReplacement, fractions, seedopt) → {module:eclairjs.PairRDD}

Return a subset of this RDD sampled by key (via stratified sampling) containing exactly math.ceil(numItems * samplingRate) for each stratum (group of pairs with the same key). This method differs from sampleByKey in that we make additional passes over the RDD to create a sample size that's exactly equal to the sum of math.ceil(numItems * samplingRate) over all key values with a 99.99% confidence. When sampling without replacement, we need one additional pass over the RDD to guarantee sample size; when sampling with replacement, we need two additional passes.
Parameters:
Name Type Attributes Description
withReplacement boolean
fractions object key, value pair object Hash Map
seed number <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD
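The target sample size per stratum, math.ceil(numItems * samplingRate), can be worked through in plain JavaScript. The sketch only computes the sizes and does not draw a sample; `exactSampleSizes` is a hypothetical helper and pairs are assumed to be [key, value] arrays.

```javascript
// Compute ceil(numItems * samplingRate) for every key (stratum).
// Hypothetical helper for illustration; it performs no sampling.
function exactSampleSizes(pairs, fractions) {
  var counts = {};
  pairs.forEach(function (pair) {
    counts[pair[0]] = (counts[pair[0]] || 0) + 1;
  });
  var sizes = {};
  Object.keys(counts).forEach(function (k) {
    sizes[k] = Math.ceil(counts[k] * fractions[k]);
  });
  return sizes;
}

// Five pairs with key 'a' and three pairs with key 'b'.
var strataInput = [
  ['a', 1], ['a', 2], ['a', 3], ['a', 4], ['a', 5],
  ['b', 6], ['b', 7], ['b', 8]
];
// 'a': ceil(5 * 0.5) = 3, 'b': ceil(3 * 0.25) = 1
var strataSizes = exactSampleSizes(strataInput, { a: 0.5, b: 0.25 });
```

sampleByKeyExact aims for exactly these sizes per key, whereas sampleByKey only approximates their sum.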

saveAsObjectFile(path, overwriteopt) → {void}

Save this RDD as a SequenceFile of serialized objects.
Parameters:
Name Type Attributes Description
path string
overwrite boolean <optional>
defaults to false, if true overwrites file if it exists
Inherited From:
Source:
Returns:
Type
void

saveAsTextFile(path, overwriteopt) → {void}

Save this RDD as a text file, using string representations of elements.
Parameters:
Name Type Attributes Description
path string
overwrite boolean <optional>
defaults to false, if true overwrites file if it exists
Inherited From:
Source:
Returns:
Type
void

setName() → {module:eclairjs.RDD}

Assign a name to this RDD.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

sortBy(func, ascending, numPartitions, bindArgsopt) → {module:eclairjs.RDD}

Return this RDD sorted by the given key function.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with one parameter
ascending boolean
numPartitions int
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

sortByKey(ascendingopt, numPartitionsopt) → {module:eclairjs.PairRDD}

Sort the RDD by key, so that each partition contains a sorted range of the elements. Calling `collect` or `save` on the resulting RDD will return or output an ordered list of records (in the `save` case, they will be written to multiple `part-X` files in the filesystem, in order of the keys).
Parameters:
Name Type Attributes Description
ascending boolean <optional>
defaults to false
numPartitions number <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

sparkContext() → {module:eclairjs.SparkContext}

The SparkContext that created this RDD.
Inherited From:
Source:
Returns:
Type
module:eclairjs.SparkContext

subtract(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Return an RDD with the elements from `this` that are not in `other`.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Overrides:
Source:
Returns:
Type
module:eclairjs.PairRDD

subtractByKey(other, numPartitionsopt) → {module:eclairjs.PairRDD}

Return an RDD with the pairs from `this` whose keys are not in `other`.
Parameters:
Name Type Attributes Description
other module:eclairjs.PairRDD
numPartitions integer <optional>
Source:
Returns:
Type
module:eclairjs.PairRDD

take(num) → {Array}

Take the first num elements of the RDD.
Parameters:
Name Type Description
num int
Inherited From:
Source:
Returns:
Type
Array

takeOrdered(num, funcopt, bindArgsopt) → {Array}

Returns the first k (smallest) elements from this RDD as defined by the specified implicit Ordering[T] and maintains the ordering. This does the opposite of top.
Parameters:
Name Type Attributes Description
num number the number of elements to return
func function <optional>
compares two arguments
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
an array of the first num elements under the given ordering
Type
Array
Example
var result = rdd.takeOrdered(25, function(a, b){
      return a > b ? -1 : a == b? 0 : 1;
    });
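The relationship between takeOrdered and the comparator can be modelled in plain JavaScript as "sort with the comparator, then keep the first num elements". `takeOrderedModel` is a hypothetical helper operating on an ordinary array.

```javascript
// Plain-JavaScript model of takeOrdered(); not the EclairJS implementation.
function takeOrderedModel(elements, num, comparator) {
  // Copy before sorting so that the input array is left untouched.
  return elements.slice().sort(comparator).slice(0, num);
}

var ascendingOrder = function (a, b) { return a < b ? -1 : (a > b ? 1 : 0); };
var descendingOrder = function (a, b) { return ascendingOrder(b, a); };

// An ascending comparator yields the smallest elements.
var smallestTwo = takeOrderedModel([10, 4, 2, 12, 3], 2, ascendingOrder);
// A descending comparator yields the largest elements first.
var largestTwo = takeOrderedModel([10, 4, 2, 12, 3], 2, descendingOrder);
```

A comparator that returns -1 when a > b, as in the example above, is a descending comparator, so under this model it would return the largest elements first.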

takeSample(withReplacement, num, seedopt) → {Array}

Return a fixed-size sampled subset of this RDD in an array
Parameters:
Name Type Attributes Description
withReplacement boolean whether sampling is done with replacement
num number size of the returned sample
seed number <optional>
seed for the random number generator
Inherited From:
Source:
Returns:
sample of specified size in an array
Type
Array

toArray() → {Array}

Return an array that contains all of the elements in this RDD.
Inherited From:
Source:
Returns:
Type
Array

toDebugString() → {string}

A description of this RDD and its recursive dependencies for debugging.
Inherited From:
Source:
Returns:
Type
string

top(num) → {Array}

Returns the top k (largest) elements from this RDD as defined by the specified implicit Ordering[T]. This does the opposite of takeOrdered. For example:
sc.parallelize(Seq(10, 4, 2, 12, 3)).top(1) // returns Array(12)
sc.parallelize(Seq(2, 3, 4, 5, 6)).top(2) // returns Array(6, 5)
Parameters:
Name Type Description
num number k, the number of top elements to return
Inherited From:
Source:
Returns:
an array of top elements
Type
Array

toString() → {string}

Inherited From:
Source:
Returns:
Type
string

treeAggregate(zeroValue, func1, func2, bindArgs1opt, bindArgs2opt) → {object}

Aggregates the elements of this RDD in a multi-level tree pattern.
Parameters:
Name Type Attributes Description
zeroValue (undocumented)
func1 function (undocumented) Function with two parameters
func2 function combOp - (undocumented) Function with two parameters
bindArgs1 Array.<Object> <optional>
array whose values will be added to func1's argument list.
bindArgs2 Array.<Object> <optional>
array whose values will be added to func2's argument list.
Inherited From:
Source:
See:
  • [[org.apache.spark.rdd.RDD#aggregate]]
Returns:
Type
object

treeReduce(func, depth, bindArgsopt) → {object}

Reduces the elements of this RDD in a multi-level tree pattern.
Parameters:
Name Type Attributes Description
func function (undocumented) Function with two parameters
depth number suggested depth of the tree (default: 2)
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
See:
  • [[org.apache.spark.rdd.RDD#reduce]]
Returns:
Type
object
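A multi-level tree pattern combines partial results in rounds instead of merging them all in one place. The plain-JavaScript sketch below models this with an explicit fan-in; the `fanIn` parameter and the helper `treeReduceModel` are hypothetical, and how the real method derives its grouping from `depth` is not modelled here.

```javascript
// Plain-JavaScript model of a tree-shaped reduce; not the EclairJS implementation.
// partials must be non-empty and fanIn must be at least 2.
function treeReduceModel(partials, func, fanIn) {
  var level = partials.slice();
  var rounds = 0;
  while (level.length > 1) {
    var next = [];
    // Combine neighbouring groups of up to fanIn values into one value each.
    for (var i = 0; i < level.length; i += fanIn) {
      next.push(level.slice(i, i + fanIn).reduce(function (a, b) { return func(a, b); }));
    }
    level = next;
    rounds += 1;
  }
  return { value: level[0], rounds: rounds };
}

// Eight per-partition results combined pairwise: 8 -> 4 -> 2 -> 1.
var treeResult = treeReduceModel(
  [1, 2, 3, 4, 5, 6, 7, 8],
  function (a, b) { return a + b; },
  2
);
```

The final value is the same as for a flat reduce; the difference is that no single step has to merge all partial results at once.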

union(other) → {module:eclairjs.PairRDD}

Return the union of this RDD and another one. Any identical elements will appear multiple times (use `.distinct()` to eliminate them).
Parameters:
Name Type Description
other module:eclairjs.PairRDD
Overrides:
Source:
Returns:
Type
module:eclairjs.PairRDD

unpersist(blocking) → {module:eclairjs.RDD}

Mark the RDD as non-persistent, and remove all blocks for it from memory and disk.
Parameters:
Name Type Description
blocking boolean Whether to block until all blocks are deleted.
Inherited From:
Source:
Returns:
This RDD.
Type
module:eclairjs.RDD

wrapRDD(rdd) → {module:eclairjs.PairRDD}

Parameters:
Name Type Description
rdd module:eclairjs.RDD
Source:
Returns:
Type
module:eclairjs.PairRDD

zip(other) → {module:eclairjs.RDD}

Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the *same number of partitions* and the *same number of elements in each partition* (e.g. one was made through a map on the other).
Parameters:
Name Type Description
other module:eclairjs.RDD (undocumented)
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
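The two assumptions behind zip (same number of partitions, same number of elements in each partition) can be made explicit in a plain-JavaScript model. `zipModel` is a hypothetical helper; partitions are ordinary arrays and the resulting pairs are two-element arrays.

```javascript
// Plain-JavaScript model of zip(); not the EclairJS implementation.
function zipModel(leftPartitions, rightPartitions) {
  if (leftPartitions.length !== rightPartitions.length) {
    throw new Error('both sides need the same number of partitions');
  }
  return leftPartitions.map(function (left, p) {
    var right = rightPartitions[p];
    if (left.length !== right.length) {
      throw new Error('partition ' + p + ' differs in size');
    }
    // Pair the i-th element on the left with the i-th element on the right.
    return left.map(function (item, i) {
      return [item, right[i]];
    });
  });
}

var zipped = zipModel([[1, 2], [3]], [['a', 'b'], ['c']]);
```

Deriving one side from the other with a map keeps both assumptions true, which is why the description mentions that case.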

zipPartitions(rdd2, func, bindArgsopt) → {module:eclairjs.RDD}

Zip this RDD's partitions with another RDD and return a new RDD by applying a function to the zipped partitions. Assumes that both the RDDs have the same number of partitions, but does not require them to have the same number of elements in each partition.
Parameters:
Name Type Attributes Description
rdd2 module:eclairjs.RDD
func function Function with two parameters
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD

zipWithIndex() → {module:eclairjs.RDD}

Zips this RDD with its element indices. The ordering is first based on the partition index and then the ordering of items within each partition. So the first item in the first partition gets index 0, and the last item in the last partition receives the largest index. This is similar to Scala's zipWithIndex but it uses Long instead of Int as the index type. This method needs to trigger a Spark job when this RDD contains more than one partition. Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The index assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
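The index assignment can be sketched in plain JavaScript. The model first counts the elements of every partition to find each partition's starting offset, which illustrates why the sizes of earlier partitions must be known before indices can be assigned. `zipWithIndexModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of zipWithIndex(); not the EclairJS implementation.
function zipWithIndexModel(partitions) {
  // First pass: the starting index of a partition is the total size of all earlier ones.
  var offsets = [];
  var total = 0;
  partitions.forEach(function (partition) {
    offsets.push(total);
    total += partition.length;
  });
  // Second pass: index = partition offset + position within the partition.
  return partitions.map(function (partition, k) {
    return partition.map(function (item, i) {
      return [item, offsets[k] + i];
    });
  });
}

var withIndex = zipWithIndexModel([['a', 'b', 'c'], ['d'], ['e', 'f']]);
```

The indices run from 0 to 5 without gaps, in partition order.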

zipWithUniqueId() → {module:eclairjs.RDD}

Zips this RDD with generated unique Long ids. Items in the kth partition will get ids k, n+k, 2*n+k, ..., where n is the number of partitions. So there may exist gaps, but this method won't trigger a Spark job, which is different from zipWithIndex. Note that some RDDs, such as those returned by groupBy(), do not guarantee order of elements in a partition. The unique ID assigned to each element is therefore not guaranteed, and may even change if the RDD is reevaluated. If a fixed ordering is required to guarantee the same index assignments, you should sort the RDD with sortByKey() or save it to a file.
Inherited From:
Source:
Returns:
Type
module:eclairjs.RDD
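The id scheme k, n+k, 2*n+k, ... can be worked through in plain JavaScript. Each id depends only on the partition index and the position inside the partition, so no partition needs to know the size of any other. `zipWithUniqueIdModel` is a hypothetical helper.

```javascript
// Plain-JavaScript model of zipWithUniqueId(); not the EclairJS implementation.
function zipWithUniqueIdModel(partitions) {
  var n = partitions.length;
  return partitions.map(function (partition, k) {
    // The i-th item of partition k receives id i * n + k.
    return partition.map(function (item, i) {
      return [item, i * n + k];
    });
  });
}

var withUniqueIds = zipWithUniqueIdModel([['a', 'b', 'c'], ['d'], ['e', 'f']]);
```

With three partitions of sizes 3, 1 and 2, the ids 0, 3, 6, 1, 2 and 5 are assigned; id 4 is never used, which is the kind of gap the description mentions.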