Class: SparkContext

eclairjs.SparkContext

A JavaScript-friendly version of SparkContext that returns RDDs. Only one SparkContext may be active per JVM. You must stop() the active SparkContext before creating a new one. This limitation may eventually be removed; see SPARK-2243 for more details.

Constructor

new SparkContext(master, name)

Parameters:
Name Type Description
master string Cluster URL to connect to
name string A name for your application, to display on the cluster web UI
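Example (a minimal sketch; the require path follows common EclairJS usage and is an assumption, not part of this page):
var SparkContext = require('eclairjs/SparkContext');
// Connect to a local master and name the application for the cluster web UI.
var sc = new SparkContext("local[*]", "Simple App");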

Methods

accumulable(initialValue, param, name) → {module:eclairjs.Accumulable}

Create an Accumulable shared variable of the given type, to which tasks can "add" values with add. Only the master can access the accumulable's value.
Parameters:
Name Type Description
initialValue object
param module:eclairjs.AccumulableParam
name string name of the accumulator, for display in Spark's web UI.
Returns:
Type
module:eclairjs.Accumulable
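Example (a minimal sketch; the FloatAccumulatorParam require path and constructor are assumptions about the EclairJS module layout):
var FloatAccumulatorParam = require('eclairjs/FloatAccumulatorParam');
var total = sc.accumulable(0.0, new FloatAccumulatorParam(), "running total");
// Tasks add to the accumulable with total.add(x); only the master can read the result.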

accumulator(initialValue, nameopt, paramopt) → {module:eclairjs.Accumulator}

Create an Accumulator variable, which tasks can "add" values to using the add method. Only the master can access the accumulator's value.
Parameters:
Name Type Attributes Description
initialValue int | float
name string | AccumulableParam <optional>
name of the accumulator for display in Spark's web UI, or an AccumulableParam (defaults to FloatAccumulatorParam)
param module:eclairjs.AccumulableParam <optional>
defaults to FloatAccumulatorParam; use only if a name is also specified
Returns:
Type
module:eclairjs.Accumulator
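Example (a minimal sketch; the RDD foreach call and its bound-arguments array follow typical EclairJS usage and are assumptions here):
var counter = sc.accumulator(0, "error count");
sc.parallelize([1, 2, 3, 4]).foreach(function (n, counter) {
  counter.add(n); // runs on the workers
}, [counter]);
// Only the master can read the accumulated value back on the driver.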

addFile(path, recursiveopt) → {Promise.<Void>}

Add a file to be downloaded with this Spark job on every node. The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs, use `SparkFiles.get(fileName)` to find its download location. A directory can be given if the recursive option is set to true. Currently directories are only supported for Hadoop-supported filesystems.
Parameters:
Name Type Attributes Description
path string
recursive boolean <optional>
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>
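Example (a minimal sketch):
sc.addFile("hdfs://namenode/data/lookup.txt").then(function () {
  // The file is now shipped with the job; inside a task, SparkFiles.get("lookup.txt")
  // returns its local download location, as described above.
});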

addJar(path) → {Promise.<Void>}

Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
Parameters:
Name Type Description
path string
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

applicationAttemptId() → {Promise.<string>}

Returns:
Type
Promise.<string>

applicationId() → {Promise.<string>}

A unique identifier for the Spark application. Its format depends on the scheduler implementation (e.g. 'local-1433865536131' for a local Spark app, or 'application_1433865536131_34483' on YARN).
Returns:
Type
Promise.<string>

appName() → {Promise.<string>}

Returns:
Type
Promise.<string>

broadcast(value) → {Broadcast}

Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions. The variable will be sent to each executor only once.
Parameters:
Name Type Description
value object
Returns:
Type
Broadcast
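Example (a minimal sketch; reading the broadcast inside a task with value(), and the map call's bound-arguments array, are assumptions based on typical EclairJS usage):
var countries = sc.broadcast({ us: "United States", fr: "France" });
var names = sc.parallelize(["us", "fr"]).map(function (code, countries) {
  return countries.value()[code]; // read-only access on the workers
}, [countries]);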

clearJobGroup() → {Promise.<Void>}

Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

files() → {Promise.<Array.<string>>}

Returns:
Type
Promise.<Array.<string>>

getConf() → {module:eclairjs.SparkConf}

Return a copy of this SparkContext's configuration. The configuration cannot be changed at runtime.
Returns:
Type
module:eclairjs.SparkConf

getLocalProperty(key) → {Promise.<string>}

Get a local property set in this thread, or null if it is missing. See setLocalProperty.
Parameters:
Name Type Description
key string
Returns:
Type
Promise.<string>

initLocalProperties() → {Promise.<Void>}

Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

isLocal() → {Promise.<boolean>}

Returns:
Type
Promise.<boolean>

isStopped() → {Promise.<boolean>}

Returns:
true if the context is stopped or in the midst of stopping.
Type
Promise.<boolean>

jars() → {Promise.<Array.<string>>}

Returns:
Type
Promise.<Array.<string>>

master() → {Promise.<string>}

Returns:
Type
Promise.<string>

objectFile(path, minPartitionsopt) → {module:eclairjs/rdd.RDD}

Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and BytesWritable values that contain a serialized partition. This is still an experimental storage format and may not be supported exactly as is in future releases.
Parameters:
Name Type Attributes Description
path string path to file
minPartitions int <optional>
Returns:
Type
module:eclairjs/rdd.RDD

parallelize(list, numSlicesopt) → {module:eclairjs/rdd.RDD}

Distribute a local array to form an RDD.
Parameters:
Name Type Attributes Description
list array
numSlices integer <optional>
Returns:
Type
module:eclairjs/rdd.RDD
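Example (a minimal sketch; collect() returning a Promise is assumed from the EclairJS RDD API, which is documented elsewhere):
var rdd = sc.parallelize([1, 2, 3, 4, 5], 2);
rdd.collect().then(function (values) {
  // values is [1, 2, 3, 4, 5]
});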

parallelizePairs(list, numSlicesopt) → {module:eclairjs/rdd.PairRDD}

Distribute a local array of key-value pairs to form a PairRDD.
Parameters:
Name Type Attributes Description
list array
numSlices integer <optional>
Returns:
Type
module:eclairjs/rdd.PairRDD
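Example (a minimal sketch; the Tuple2 require path is an assumption about the EclairJS module layout):
var Tuple2 = require('eclairjs/Tuple2');
var pairs = sc.parallelizePairs([new Tuple2("a", 1), new Tuple2("b", 2)], 2);
// pairs is a PairRDD, so key-oriented operations such as reduceByKey are available.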

range(start, end, step, numSlices) → {module:eclairjs/rdd.RDD}

Creates a new RDD containing the numbers from `start` to `end` (exclusive), increasing by `step` with each element.
Parameters:
Name Type Description
start number the start value.
end number the end value.
step number the incremental step
numSlices number the number of partitions of the new RDD.
Returns:
Type
module:eclairjs/rdd.RDD
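Example (a minimal sketch; collect() returning a Promise is assumed from the EclairJS RDD API):
sc.range(0, 10, 2, 1).collect().then(function (values) {
  // values is [0, 2, 4, 6, 8]; the end value 10 is excluded
});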

setCheckpointDir(directory) → {Promise.<Void>}

Set the directory under which RDDs are going to be checkpointed. The directory must be an HDFS path if running on a cluster.
Parameters:
Name Type Description
directory string
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

setJobDescription(value) → {Promise.<Void>}

Parameters:
Name Type Description
value string
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

setJobGroup(groupId, description, interruptOnCancel) → {Promise.<Void>}

Assigns a group ID to all the jobs started by this thread until the group ID is set to a different value or cleared. Often, a unit of execution in an application consists of multiple Spark actions or jobs. Application programmers can use this method to group all those jobs together and give a group description. Once set, the Spark web UI will associate such jobs with this group. The application can also use cancelJobGroup to cancel all running jobs in this group. For example,
Parameters:
Name Type Description
groupId string
description string
interruptOnCancel boolean
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>
Example
// In the main thread:
sc.setJobGroup("some_job_to_cancel", "some job description");
sc.parallelize([1, 2, 3, 4], 2).count(); // stands in for a long-running job

// In a separate thread:
sc.cancelJobGroup("some_job_to_cancel");


If interruptOnCancel is set to true for the job group, then job cancellation will result
in Thread.interrupt() being called on the job's executor threads. This is useful to help ensure
that the tasks are actually stopped in a timely manner, but is off by default due to HDFS-1208,
where HDFS may respond to Thread.interrupt() by marking nodes as dead.

setLocalProperty(key, value) → {Promise.<Void>}

Set a local property that affects jobs submitted from this thread, such as the Spark fair scheduler pool.
Parameters:
Name Type Description
key string
value string
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>
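Example (a minimal sketch; "spark.scheduler.pool" is the standard Spark property for assigning jobs to a fair scheduler pool):
sc.setLocalProperty("spark.scheduler.pool", "production");
// Jobs submitted from this thread now run in the "production" pool.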

setLogLevel(logLevel) → {Promise.<Void>}

Parameters:
Name Type Description
logLevel string The desired log level as a string. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

statusTracker() → {SparkStatusTracker}

Returns:
Type
SparkStatusTracker

textFile(path, minPartitionsopt) → {module:eclairjs/rdd.RDD}

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
Parameters:
Name Type Attributes Description
path string path to file
minPartitions int <optional>
Returns:
Type
module:eclairjs/rdd.RDD
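Example (a minimal sketch; count() returning a Promise is assumed from the EclairJS RDD API):
var lines = sc.textFile("hdfs://namenode/data/input.txt", 4);
lines.count().then(function (n) {
  // n is the number of lines in the file
});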

wholeTextFiles(path, minPartitions) → {module:eclairjs/rdd.RDD}

Read a directory of text files from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. Each file is read as a single record and returned in a key-value pair, where the key is the path of each file and the value is the content of each file.

For an example of the resulting key-value pairs, see Examples below.

Parameters:
Name Type Description
path string Directory to the input data files, the path can be comma separated paths as the list of inputs.
minPartitions number A suggestion value of the minimal splitting number for input data.
Returns:
Type
module:eclairjs/rdd.RDD
Examples
hdfs://a-hdfs-path/part-00000
  hdfs://a-hdfs-path/part-00001
  ...
  hdfs://a-hdfs-path/part-nnnnn


Do `var rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,

then `rdd` contains
(a-hdfs-path/part-00000, its content)
  (a-hdfs-path/part-00001, its content)
  ...
  (a-hdfs-path/part-nnnnn, its content)