new DStream()
- Source:
Methods
filter(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}
Return a new DStream containing only the elements that satisfy a predicate.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func |
function | ||
bindArgs |
Array.<Object> |
<optional> |
array whose values will be added to func's argument list. |
- Source:
Returns:
flatMap(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}
Return a new DStream by first applying a function to all elements of this DStream, and then flattening the results.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func |
|||
bindArgs |
Array.<Object> |
<optional> |
array whose values will be added to func's argument list. |
- Source:
Returns:
foreachRDD(remoteFunc, bindArgs, localFuncopt) → {Promise.<void>}
Apply a function to each RDD in this DStream.
foreachRDD works slightly different in EclairJS-node compared to regular Spark due to the fact that it executes the
code remotely in EclairJS-nashorn running in Apache Toree. Instead of just one lambda function, EclairJS-node's
foreachRDD takes in two.
The first argument is a function which is run remotely on EclairJS-nashorn in Apache Toree and has several
restrictions:
- Need to use the EclairJS-nashorn API (https://github.com/EclairJS/eclairjs-nashorn/wiki/API-Documentation). The
main difference is that methods calls are always synchronous - so for example count() will return the number
directly (while in EclairJS-node it would return a Promise).
- The code in the remote function must return a JSON serializable value. This means no asynchronous calls can
happen.
The second argument in foreachRDD is a bindArgs - an array of values that will be added to the remote function's
argument list. Set it to null if none are needed.
The third argument is function that runs on the Node side. The remote function's return value (which has already
been JSON parsed) will be passed into the local function as an argument.
You will need to run all Spark computations in the remote function and use the local function to send the result to
its final destination (your datastore of choice for example).
Example:
var dStream = ...;
dStream.foreachRDD(
function(rdd) {
// runs remotely
return rdd.collect();
},
null,
function(res) {
// runs locally in Node
console.log('Results: ', res)
}
)
The remote function collects the contents of the RDD and returns the array, which then gets passed as an argument
into the local function.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
remoteFunc |
function | lambda function that runs on the Spark side. The returned result from the lambda will be passed into localFunc as an argument | |
bindArgs |
Array.<Object> | array whose values will be added to remoteFunc's argument list. | |
localFunc |
function |
<optional> |
lambda function that runs on the Node side. |
- Source:
Returns:
- Type
- Promise.<void>
map(func, bindArgsopt) → {module:eclairjs/streaming/dstream.DStream}
Return a new DStream by applying a function to all elements of this DStream.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func |
|||
bindArgs |
Array.<Object> |
<optional> |
array whose values will be added to func's argument list. |
- Source:
Returns:
mapToPair(func, bindArgsopt) → {module:eclairjs/streaming/dstream.PairDStream}
Return a new DStream by applying a function to all elements of this DStream.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func |
|||
bindArgs |
Array.<Object> |
<optional> |
array whose values will be added to func's argument list. |
- Source:
Returns:
persist(levelopt) → {module:eclairjs/streaming/dstream.DStream}
Persist RDDs of this DStream with the storage level
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
level |
module:eclairjs/storage.StorageLevel |
<optional> |
(MEMORY_ONLY_SER) by default |
- Source:
Returns:
print() → {void}
Print the first ten elements of each RDD generated in this DStream. This is an output operator, so this DStream will be
registered as an output stream and there materialized.
- Source:
Returns:
- Type
- void
transformToPair(transformFunc, bindArgsopt) → {module:eclairjs/streaming/dstream.PairDStream}
Return a new DStream in which each RDD is generated by applying a function
on each RDD of 'this' DStream.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
transformFunc |
func | ||
bindArgs |
Array.<Object> |
<optional> |
array whose values will be added to func's argument list. |
- Source:
Returns:
window(duration) → {module:eclairjs/streaming/dstream.DStream}
Return a new DStream in which each RDD contains all the elements in seen in a sliding window of time over this DStream.
The new DStream generates RDDs with the same interval as this DStream.
Parameters:
Name | Type | Description |
---|---|---|
duration |
width of the window; must be a multiple of this DStream's interval. |
- Source: