Constructor
new SparkContext(conf)
Parameters:
Name | Type | Description |
---|---|---|
conf | module:eclairjs.SparkConf | an object specifying Spark parameters |
- Source:
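Example
A minimal sketch of creating a SparkContext from a SparkConf; the application name and master URL are placeholder values, and it assumes the SparkConf and SparkContext modules have already been loaded through the EclairJS module loader.
var conf = new SparkConf()
    .setAppName("wordcount")   // placeholder application name
    .setMaster("local[*]");    // placeholder master URL
var sparkContext = new SparkContext(conf);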
Methods
accumulable(initialValue, param, name) → {module:eclairjs.Accumulable}
Create an Accumulable shared variable of the given type, to which tasks can "add" values with add.
Only the master can access the accumulable's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | object | |
param | module:eclairjs.AccumulableParam | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
accumulator(initialValue, nameopt, paramopt) → {module:eclairjs.Accumulator}
Create an Accumulator variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
initialValue | int or float | | |
name | string or AccumulableParam | <optional> | Name of the accumulator for display in Spark's web UI, or a param; defaults to FloatAccumulatorParam. |
param | module:eclairjs.AccumulableParam | <optional> | Defaults to FloatAccumulatorParam; use only if also specifying name. |
- Source:
Returns:
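Example
A hedged sketch of summing values with an accumulator; the data is illustrative, and passing the accumulator into the worker function through a trailing bind-arguments array follows the usual EclairJS lambda convention, which may differ by version.
var accum = sparkContext.accumulator(0);
sparkContext.parallelize([1, 2, 3, 4]).foreach(function (x, accum) {
    accum.add(x);   // tasks may only add; they cannot read the value
}, [accum]);
print(accum.value());   // 10, readable on the driver once the action completes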
addCustomModules()
Zip up all required files that are not in a JAR, preserving their paths, and add the archive to the worker nodes for download via addFile.
- Source:
addFile(path)
Add a file to be downloaded with this Spark job on every node. The path passed can be either a local file,
a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
To access the file in Spark jobs, use SparkFiles.get(fileName) to find its download location.
Parameters:
Name | Type | Description |
---|---|---|
path | string | Path to the file |
- Source:
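Example
A short sketch; the file path is a placeholder, and SparkFiles.get is the documented way to locate the downloaded copy on each node.
sparkContext.addFile("/path/to/lookup.txt");   // placeholder path; HDFS, HTTP(S) and FTP URIs also work
// Inside a task, resolve the downloaded copy with SparkFiles.get("lookup.txt").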
addJar(path)
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future. The path passed can be either a local file, a file in HDFS (or other Hadoop-supported filesystems), or an HTTP, HTTPS or FTP URI.
Parameters:
Name | Type | Description |
---|---|---|
path | string | Path to the jar |
- Source:
addModule()
Zip up a file in a directory, preserving its path, and add it to the worker nodes for download via addFile.
- Source:
applicationAttemptId() → {string}
- Source:
Returns:
- Type
- string
applicationId() → {string}
A unique identifier for the Spark application.
Its format depends on the scheduler implementation (e.g. something like 'local-1433865536131' for a local Spark app, or 'application_1433865536131_34483' for YARN).
- Source:
Returns:
- Type
- string
appName() → {string}
- Source:
Returns:
- Type
- string
broadcast(value) → {module:eclairjs/broadcast.Broadcast}
Broadcast a read-only variable to the cluster, returning a Broadcast object for reading it in distributed functions.
The variable will be sent to each executor only once.
Parameters:
Name | Type | Description |
---|---|---|
value | object | JSON object. |
- Source:
Returns:
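Example
A hedged sketch of broadcasting a small lookup table and reading it inside a transformation; the data is illustrative, and the trailing bind-arguments array follows the usual EclairJS lambda convention.
var lookup = sparkContext.broadcast({"en": "English", "fr": "French"});
var names = sparkContext.parallelize(["en", "fr"]).map(function (code, lookup) {
    return lookup.value()[code];   // read the broadcast value on the worker
}, [lookup]);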
clearJobGroup()
Clear the current thread's job group ID and its description.
- Source:
files() → {Array.<string>}
- Source:
Returns:
- Type
- Array.<string>
floatAccumulator(initialValue, name) → {module:eclairjs.Accumulator}
Create an Accumulator float variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | float | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
getConf() → {module:eclairjs.SparkConf}
Return a copy of this SparkContext's configuration. The configuration cannot be changed at runtime.
- Source:
Returns:
getHadoopConfiguration(key) → {string}
Get the value of the named property of the Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse.
Note: as it will be reused in all Hadoop RDDs, it's better not to modify it unless you plan to set some global configurations for all Hadoop RDDs.
Parameters:
Name | Type | Description |
---|---|---|
key | string | Name of the property |
- Source:
Returns:
The property value, or null if no such property exists.
- Type
- string
getLocalProperty(key) → {string}
Get a local property set in this thread, or null if it is missing. See
setLocalProperty.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
- Source:
Returns:
- Type
- string
intAccumulator(initialValue, name) → {module:eclairjs.Accumulator}
Create an Accumulator integer variable, which tasks can "add" values to using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | int | |
name | string | Name of the accumulator for display in Spark's web UI. |
- Source:
Returns:
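Example
A one-line sketch; the initial value and display name are illustrative.
var recordCounter = sparkContext.intAccumulator(0, "records processed");   // the name appears in the web UI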
isLocal() → {boolean}
- Source:
Returns:
- Type
- boolean
isStopped() → {boolean}
- Source:
Returns:
true if context is stopped or in the midst of stopping.
- Type
- boolean
jars() → {Array.<string>}
- Source:
Returns:
- Type
- Array.<string>
listFiles() → {Array.<string>}
Returns a list of file paths that are added to resources.
- Source:
Returns:
- Type
- Array.<string>
master() → {string}
- Source:
Returns:
- Type
- string
objectFile(path, minPartitionsopt) → {module:eclairjs.RDD}
Load an RDD saved as a SequenceFile containing serialized objects, with NullWritable keys and BytesWritable
values that contain a serialized partition. This is still an experimental storage format and may not be supported
exactly as is in future releases.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | |
minPartitions | integer | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
parallelize(list, numSlicesopt) → {module:eclairjs.RDD}
Distribute a local collection to form an RDD.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
list | array | | |
numSlices | integer | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
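Example
A brief sketch; the array contents and partition count are illustrative, and it assumes the synchronous (Nashorn) API where collect() returns a JavaScript array.
var rdd = sparkContext.parallelize([1, 2, 3, 4, 5], 2);   // 2 partitions
var doubled = rdd.map(function (x) { return x * 2; });
print(doubled.collect());   // [2, 4, 6, 8, 10]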
parallelizePairs(list, numSlices) → {module:eclairjs.PairRDD}
Distribute a local collection to form an RDD.
Parameters:
Name | Type | Description |
---|---|---|
list | array | array of Tuple2 |
numSlices | integer | |
- Source:
Returns:
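Example
A brief sketch; it assumes the eclairjs Tuple2 constructor has been loaded, and the pairs shown are illustrative.
var pairs = sparkContext.parallelizePairs([
    new Tuple2("apple", 3),
    new Tuple2("pear", 5)
], 2);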
range(start, end, step, numSlices) → {module:eclairjs.RDD}
Creates a new RDD[Long] containing elements from `start` to `end` (exclusive), increased by `step` for every element.
Parameters:
Name | Type | Description |
---|---|---|
start | number | the start value. |
end | number | the end value. |
step | number | the incremental step |
numSlices | number | the number of partitions of the new RDD. |
- Source:
Returns:
- Type
- module:eclairjs.RDD
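Example
A brief sketch; the bounds, step and partition count are illustrative.
var evens = sparkContext.range(0, 10, 2, 2);   // 0, 2, 4, 6, 8 across 2 partitions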
register(acc, nameopt)
Register the given accumulator. Note that accumulators must be registered before use, or an exception will be thrown.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
acc | module:eclairjs/util.AccumulatorV2 | | |
name | string | <optional> | |
- Source:
setCheckpointDir(dir)
Set the directory under which RDDs are going to be checkpointed.
The directory must be an HDFS path if running on a cluster.
Parameters:
Name | Type | Description |
---|---|---|
dir | string | |
- Source:
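Example
A brief sketch; the checkpoint directory is a placeholder, and it assumes the RDD exposes a checkpoint() method as in the Spark API.
sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints");   // placeholder HDFS path
var rdd = sparkContext.parallelize([1, 2, 3]);
rdd.checkpoint();   // checkpoint files are written under the directory set above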
setHadoopConfiguration(key, value) → {void}
Set the value of the named property of the Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse.
Note: as it will be reused in all Hadoop RDDs, it's better not to modify it unless you plan to set some global configurations for all Hadoop RDDs.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
- Source:
Returns:
- Type
- void
Example
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.auth.url", "https://identity.open.softlayer.com/v3/auth/tokens");
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.auth.endpoint.prefix", "endpoints");
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.tenant", "productid"); // IBM BlueMix Object Store product id
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.username", "userid"); // IBM BlueMix Object Store user id
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.password", "secret"); // IBM BlueMix Object Store password
sparkContext.setHadoopConfiguration("fs.swift.service.softlayer.apikey", "secret"); // IBM BlueMix Object Store api key
var rdd = sparkContext.textFile("swift://wordcount.softlayer/dream.txt").cache();
setJobDescription(value)
Parameters:
Name | Type | Description |
---|---|---|
value | string | |
- Source:
setJobGroup(groupId, description, interruptOnCancel)
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
Often, a unit of execution in an application consists of multiple Spark actions or jobs.
Application programmers can use this method to group all those jobs together and give a
group description. Once set, the Spark web UI will associate such jobs with this group.
The application can also use cancelJobGroup to cancel all
running jobs in this group. For example, see the sketch following the parameter table.
Parameters:
Name | Type | Description |
---|---|---|
groupId | string | |
description | string | |
interruptOnCancel | boolean | |
- Source:
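Example
A hedged sketch; the group ID and description are illustrative, and cancelling from another thread relies on cancelJobGroup as in the Spark API.
sparkContext.setJobGroup("nightly-etl", "nightly aggregation jobs", true);
sparkContext.parallelize([1, 2, 3, 4]).count();   // runs under the "nightly-etl" group
// From another thread, all jobs in the group could be cancelled with cancelJobGroup("nightly-etl").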
setLocalProperty(key, value)
Set a local property that affects jobs submitted from this thread, such as the Spark fair
scheduler pool. User-defined properties may also be set here. These properties are propagated
through to worker tasks and can be accessed there via
[[org.apache.spark.TaskContext#getLocalProperty]].
These properties are inherited by child threads spawned from this thread. This
may have unexpected consequences when working with thread pools. The standard Java
implementation of thread pools has worker threads spawn other worker threads.
As a result, local properties may propagate unpredictably.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
- Source:
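Example
A brief sketch; the fair-scheduler pool name is illustrative, and clearing the property by passing null follows the Spark convention.
sparkContext.setLocalProperty("spark.scheduler.pool", "reporting");   // jobs from this thread use the "reporting" pool
// ... submit jobs from this thread ...
sparkContext.setLocalProperty("spark.scheduler.pool", null);          // revert to the default pool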
setLogLevel(logLevel)
Parameters:
Name | Type | Description |
---|---|---|
logLevel | string | The desired log level as a string. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN |
- Source:
statusTracker() → {SparkStatusTracker}
- Source:
Returns:
- Type
- SparkStatusTracker
stop()
Shut down the SparkContext.
- Source:
textFile(path, minPartitionsopt) → {module:eclairjs.RDD}
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI,
and return it as an RDD of Strings.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | path to file |
minPartitions | int | <optional> | |
- Source:
Returns:
- Type
- module:eclairjs.RDD
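Example
A brief sketch; the path is a placeholder, and it assumes the synchronous API where count() returns a number.
var lines = sparkContext.textFile("hdfs:///logs/app.log", 4);   // placeholder path
var errors = lines.filter(function (line) {
    return line.indexOf("ERROR") > -1;
});
print(errors.count());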
uiWebUrl() → {string}
- Source:
Returns:
- Type
- string
version() → {string}
The version of EclairJS and Spark on which this application is running.
- Source:
Returns:
- Type
- string
wholeTextFiles(path, minPartitions) → {module:eclairjs.RDD}
Read a directory of text files from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI. Each file is read as a single record and returned in a
key-value pair, where the key is the path of each file, the value is the content of each file.
For example, if you have the following files:
Parameters:
Name | Type | Description |
---|---|---|
path | string | Directory of the input data files; the path can be a comma-separated list of paths. |
minPartitions | number | A suggested value for the minimal number of input splits. |
- Source:
Returns:
- Type
- module:eclairjs.RDD
Examples
hdfs://a-hdfs-path/part-00000
hdfs://a-hdfs-path/part-00001
...
hdfs://a-hdfs-path/part-nnnnn
Do `var rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,
then `rdd` contains
(a-hdfs-path/part-00000, its content)
(a-hdfs-path/part-00001, its content)
...
(a-hdfs-path/part-nnnnn, its content)