Class: DStream

eclairjs/streaming/dstream.DStream

new DStream()

Source:

Methods

filter(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}

Return a new DStream containing only the elements that satisfy a predicate.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.DStream

flatMap(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}

Return a new DStream by first applying a function to all elements of this DStream, and then flattening the results.
Parameters:
Name Type Attributes Description
func
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.DStream

foreachRDD(remoteFunc, bindArgs, localFuncopt) → {Promise.<void>}

Apply a function to each RDD in this DStream. foreachRDD works slightly different in EclairJS-node compared to regular Spark due to the fact that it executes the code remotely in EclairJS-nashorn running in Apache Toree. Instead of just one lambda function, EclairJS-node's foreachRDD takes in two. The first argument is a function which is run remotely on EclairJS-nashorn in Apache Toree and has several restrictions: - Need to use the EclairJS-nashorn API (https://github.com/EclairJS/eclairjs-nashorn/wiki/API-Documentation). The main difference is that methods calls are always synchronous - so for example count() will return the number directly (while in EclairJS-node it would return a Promise). - The code in the remote function must return a JSON serializable value. This means no asynchronous calls can happen. The second argument in foreachRDD is a bindArgs - an array of values that will be added to the remote function's argument list. Set it to null if none are needed. The third argument is function that runs on the Node side. The remote function's return value (which has already been JSON parsed) will be passed into the local function as an argument. You will need to run all Spark computations in the remote function and use the local function to send the result to its final destination (your datastore of choice for example). Example: var dStream = ...; dStream.foreachRDD( function(rdd) { // runs remotely return rdd.collect(); }, null, function(res) { // runs locally in Node console.log('Results: ', res) } ) The remote function collects the contents of the RDD and returns the array, which then gets passed as an argument into the local function.
Parameters:
Name Type Attributes Description
remoteFunc function lambda function that runs on the Spark side. The returned result from the lambda will be passed into localFunc as an argument
bindArgs Array.<Object> array whose values will be added to remoteFunc's argument list.
localFunc function <optional>
lambda function that runs on the Node side.
Source:
Returns:
Type
Promise.<void>

map(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}

Return a new DStream by applying a function to all elements of this DStream.
Parameters:
Name Type Attributes Description
func
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.DStream

mapToPair(func, bindArgsopt) → {module:eclairjs/streaming/dstream.PairDStream}

Return a new DStream by applying a function to all elements of this DStream.
Parameters:
Name Type Attributes Description
func
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.PairDStream

persist(levelopt) → {module:eclairjs/streaming/dstream.DStream}

Persist RDDs of this DStream with the storage level
Parameters:
Name Type Attributes Description
level module:eclairjs/storage.StorageLevel <optional>
(MEMORY_ONLY_SER) by default
Source:
Returns:
Type
module:eclairjs/streaming/dstream.DStream

print() → {void}

Print the first ten elements of each RDD generated in this DStream. This is an output operator, so this DStream will be registered as an output stream and there materialized.
Source:
Returns:
Type
void

transformToPair(transformFunc, bindArgsopt) → {module:eclairjs/streaming/dstream.PairDStream}

Return a new DStream in which each RDD is generated by applying a function on each RDD of 'this' DStream.
Parameters:
Name Type Attributes Description
transformFunc func
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.PairDStream

window(duration) → {module:eclairjs/streaming/dstream.DStream}

Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream. The new DStream generates RDDs with the same interval as this DStream.
Parameters:
Name Type Description
duration width of the window; must be a multiple of this DStream's interval.
Source:
Returns:
Type
module:eclairjs/streaming/dstream.DStream