Constructor
new SparkContext(conf)
Parameters:
Name | Type | Description |
---|---|---|
conf | module:eclairjs.SparkConf | an object specifying Spark parameters |
- Source:
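Example
A minimal sketch of creating a SparkContext from a SparkConf; the application name and master URL are placeholder values, and it assumes the SparkConf and SparkContext modules have already been loaded through the EclairJS module loader.
var conf = new SparkConf()
    .setAppName("wordcount")   // placeholder application name
    .setMaster("local[*]");    // placeholder master URL
var sparkContext = new SparkContext(conf);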
Methods
accumulable(initialValue, param, name) → {module:eclairjs.Accumulable}
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
Only the master can access the accumulable's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | object | |
param | module:eclairjs.AccumulableParam | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
accumulator(initialValue, nameopt, paramopt) → {module:eclairjs.Accumulator}
Create an Accumulator variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
initialValue | int or float | | |
name | string or AccumulableParam | <optional> | Name of the accumulator for display in Spark's web UI, or a param; defaults to FloatAccumulatorParam. |
param | module:eclairjs.AccumulableParam | <optional> | Defaults to FloatAccumulatorParam; use only if also specifying name. |
- Source:
Returns:
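Example
A hedged sketch of summing values with an accumulator; the data is illustrative, and passing the accumulator into the worker function through a trailing bind-arguments array follows the usual EclairJS lambda convention, which may differ by version.
var accum = sparkContext.accumulator(0);
sparkContext.parallelize([1, 2, 3, 4]).foreach(function (x, accum) {
    accum.add(x);   // tasks may only add; they cannot read the value
}, [accum]);
print(accum.value());   // 10, readable on the driver once the action completes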
addCustomModules()
Zip up all required files that are not in a JAR, preserving their paths, and add the archive to the worker nodes for download via addFile.
- Source:
addFile(path)
Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file,
a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
To access the file in Spark jobs, use SparkFiles.get(fileName) to find its download location.
Parameters:
Name | Type | Description |
---|---|---|
path | string | Path to the file |
- Source:
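Example
A short sketch; the file path is a placeholder, and SparkFiles.get is the documented way to locate the downloaded copy on each node.
sparkContext.addFile("/path/to/lookup.txt");   // placeholder path; HDFS, HTTP(S) and FTP URIs also work
// Inside a task, resolve the downloaded copy with SparkFiles.get("lookup.txt").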
addJar(path)
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
Parameters:
Name | Type | Description |
---|---|---|
path | string | Path to the jar |
- Source:
addModule()
Zip up a file in a directory, preserving its path, and add it to the worker nodes for download via addFile.
- Source:
applicationAttemptId() → {string}
- Source:
Returns:
- Type
- string
applicationId() → {string}
A unique identifier for the Spark application.
Its format depends on the scheduler implementation (e.g. something like 'local-1433865536131' for a local Spark app, or 'application_1433865536131_34483' for YARN).
- Source:
Returns:
- Type
- string
appName() → {string}
- Source:
Returns:
- Type
- string
broadcast(value) → {module:eclairjs/broadcast.Broadcast}
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
The variable will be sent to each executor only once.
Parameters:
Name | Type | Description |
---|---|---|
value | object | JSON object. |
- Source:
Returns:
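Example
A hedged sketch of broadcasting a small lookup table and reading it inside a transformation; the data is illustrative, and the trailing bind-arguments array follows the usual EclairJS lambda convention.
var lookup = sparkContext.broadcast({"en": "English", "fr": "French"});
var names = sparkContext.parallelize(["en", "fr"]).map(function (code, lookup) {
    return lookup.value()[code];   // read the broadcast value on the worker
}, [lookup]);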
clearJobGroup()
Clear the current thread's job group ID and its description.
- Source:
files() → {Array.<string>}
- Source:
Returns:
- Type
- Array.<string>
floatAccumulator(initialValue, name) → {module:eclairjs.Accumulator}
Create an Accumulator float variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | float | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
getConf() → {module:eclairjs.SparkConf}
Return a copy of this SparkContext's configuration. The configuration cannot be changed at runtime.
- Source:
Returns:
getHadoopConfiguration(key) → {string}
Get the value of the named property of the Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse.
Note: as it will be reused in all Hadoop RDDs, it's better not to modify it unless you plan to set some global configurations for all Hadoop RDDs.
Parameters:
Name | Type | Description |
---|---|---|
key | string | Name of the property |
- Source:
Returns:
The property value, or null if no such property exists.
- Type
- string
getLocalProperty(key) → {string}
Get a local property set in this thread, or null if it is missing. See
setLocalProperty.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
- Source:
Returns:
- Type
- string
intAccumulator(initialValue, name) → {module:eclairjs.Accumulator}
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | int | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
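Example
A one-line sketch; the initial value and display name are illustrative.
var recordCounter = sparkContext.intAccumulator(0, "records processed");   // the name appears in the web UI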
isLocal() → {boolean}
- Source:
Returns:
- Type
- boolean
isStopped() → {boolean}
- Source:
Returns:
true if context is stopped or in the midst of stopping.
- Type
- boolean
jars() → {Array.<string>}
- Source:
Returns:
- Type
- Array.<string>
listFiles() → {Array.<string>}
Returns a list of file paths that are added to resources.
- Source:
Returns:
- Type
- Array.<string>
master() → {string}
- Source:
Returns:
- Type
- string
objectFile(path, minPartitionsopt) → {module:eclairjs.RDD}
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and BytesWritable
values that contain a serialized partition. This is still an experimental storage format and may not be supported
exactly as is in future releases.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | |
minPartitions | integer | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
parallelize(list, numSlicesopt) → {module:eclairjs.RDD}
Distribute a local collection to form an RDD.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
list | array | | |
numSlices | integer | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
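Example
A brief sketch; the array contents and partition count are illustrative, and it assumes the synchronous (Nashorn) API where collect() returns a JavaScript array.
var rdd = sparkContext.parallelize([1, 2, 3, 4, 5], 2);   // 2 partitions
var doubled = rdd.map(function (x) { return x * 2; });
print(doubled.collect());   // [2, 4, 6, 8, 10]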
parallelizePairs(list, numSlices) → {module:eclairjs.PairRDD}
Distribute a local collection to form an RDD.
Parameters:
Name | Type | Description |
---|---|---|
list | array | array of Tuple2 |
numSlices | integer | |
- Source:
Returns:
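Example
A brief sketch; it assumes the eclairjs Tuple2 constructor has been loaded, and the pairs shown are illustrative.
var pairs = sparkContext.parallelizePairs([
    new Tuple2("apple", 3),
    new Tuple2("pear", 5)
], 2);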
range(start, end, step, numSlices) → {module:eclairjs.RDD}
Creates a new RDD[Long] containing elements from `start` to `end` (exclusive), increased by `step` for every element.
Parameters:
Name | Type | Description |
---|---|---|
start | number | the start value. |
end | number | the end value. |
step | number | the incremental step |
numSlices | number | the number of partitions of the new RDD. |
- Source:
Returns:
- Type
- module:eclairjs.RDD
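Example
A brief sketch; the bounds, step and partition count are illustrative.
var evens = sparkContext.range(0, 10, 2, 2);   // 0, 2, 4, 6, 8 across 2 partitions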
register(acc, nameopt)
Register the given accumulator. Note that accumulators must be registered before use, or an exception will be thrown.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
acc | module:eclairjs/util.AccumulatorV2 | | |
name | string | <optional> | |
- Source:
setCheckpointDir(dir)
Set the directory under which RDDs are going to be checkpointed.
The directory must be an HDFS path if running on a cluster.
Parameters:
Name | Type | Description |
---|---|---|
dir | string | |
- Source:
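Example
A brief sketch; the checkpoint directory is a placeholder, and it assumes the RDD exposes a checkpoint() method as in the Spark API.
sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints");   // placeholder HDFS path
var rdd = sparkContext.parallelize([1, 2, 3]);
rdd.checkpoint();   // checkpoint files are written under the directory set above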
setHadoopConfiguration(key, value) → {void}
Set the value of the named property of the Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse.
Note: as it will be reused in all Hadoop RDDs, it's better not to modify it unless you plan to set some global configurations for all Hadoop RDDs.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
- Source:
Returns:
- Type
- void
Example
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.auth.url", "https://identity.open.softlayer.com/v3/auth/tokens");
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.auth.endpoint.prefix", "endpoints");
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.tenant", "productid"); // IBM BlueMix Object Store product id
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.username", "userid"); // IBM BlueMix Object Store user id
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.password", "secret"); // IBM BlueMix Object Store password
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.apikey", "secret"); // IBM BlueMix Object Store api key
var rdd = sparkContext.textFile("swift://wordcount.softlayer/dream.txt").cache();
setJobDescription(value)
Parameters:
Name | Type | Description |
---|---|---|
value | string | |
- Source:
setJobGroup(groupId, description, interruptOnCancel)
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
Often, a unit of execution in an application consists of multiple Spark actions or jobs.
Application programmers can use this method to group all those jobs together and give a
group description. Once set, the Spark web UI will associate such jobs with this group.
The application can also use cancelJobGroup to cancel all
running jobs in this group. For example, see the sketch following the parameter table.
Parameters:
Name | Type | Description |
---|---|---|
groupId | string | |
description | string | |
interruptOnCancel | boolean | |
- Source:
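Example
A hedged sketch; the group ID and description are illustrative, and cancelling from another thread relies on cancelJobGroup as in the Spark API.
sparkContext.setJobGroup("nightly-etl", "nightly aggregation jobs", true);
sparkContext.parallelize([1, 2, 3, 4]).count();   // runs under the "nightly-etl" group
// From another thread, all jobs in the group could be cancelled with cancelJobGroup("nightly-etl").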
setLocalProperty(key, value)
Set a local property that affects jobs submitted from this thread, such as the Spark fair
scheduler pool. User-defined properties may also be set here. These properties are propagated
through to worker tasks and can be accessed there via
[[org.apache.spark.TaskContext#getLocalProperty]].
These properties are inherited by child threads spawned from this thread. This
may have unexpected consequences when working with thread pools. The standard Java
implementation of thread pools has worker threads spawn other worker threads.
As a result, local properties may propagate unpredictably.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
- Source:
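Example
A brief sketch; the fair-scheduler pool name is illustrative, and clearing the property by passing null follows the Spark convention.
sparkContext.setLocalProperty("spark.scheduler.pool", "reporting");   // jobs from this thread use the "reporting" pool
// ... submit jobs from this thread ...
sparkContext.setLocalProperty("spark.scheduler.pool", null);          // revert to the default pool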
setLogLevel(logLevel)
Parameters:
Name | Type | Description |
---|---|---|
logLevel | string | The desired log level as a string. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN |
- Source:
statusTracker() → {SparkStatusTracker}
- Source:
Returns:
- Type
- SparkStatusTracker
stop()
Shut down the SparkContext.
- Source:
textFile(path, minPartitionsopt) → {module:eclairjs.RDD}
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI,
and return it as an RDD of Strings.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | path to file |
minPartitions | int | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
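Example
A brief sketch; the path is a placeholder, and it assumes the synchronous API where count() returns a number.
var lines = sparkContext.textFile("hdfs:///logs/app.log", 4);   // placeholder path
var errors = lines.filter(function (line) {
    return line.indexOf("ERROR") > -1;
});
print(errors.count());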
uiWebUrl() → {string}
- Source:
Returns:
- Type
- string
version() → {string}
The version of EclairJS and Spark on which this application is running.
- Source:
Returns:
- Type
- string
wholeTextFiles(path, minPartitions) → {module:eclairjs.RDD}
Read a directory of text files from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI. Each file is read as a single record and returned in a
key-value pair, where the key is the path of each file, the value is the content of each file.
For example, if you have the following files:
Parameters:
Name | Type | Description |
---|---|---|
path | string | Directory of the input data files; the path can be a comma-separated list of paths. |
minPartitions | number | A suggested value for the minimal number of input splits. |
- Source:
Returns:
- Type
- module:eclairjs.RDD
Examples
hdfs://a-hdfs-path/part-00000
hdfs://a-hdfs-path/part-00001
...
hdfs://a-hdfs-path/part-nnnnn
Do `var rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,
then `rdd` contains
(a-hdfs-path/part-00000, its content)
(a-hdfs-path/part-00001, its content)
...
(a-hdfs-path/part-nnnnn, its content)