Constructor
new SparkContext(master, name)
Parameters:
Name | Type | Description |
---|---|---|
master | string | Cluster URL to connect to |
name | string | A name for your application, to display on the cluster web UI |
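Example
A minimal construction sketch. The require path, module handle, and `local[*]` master URL are illustrative assumptions, not values taken from this page:
```js
// Sketch only: the require path, module handle, and master URL are assumptions.
var eclairjs = require('eclairjs');
var spark = new eclairjs();

// Connect to a local master; the name appears on the cluster web UI.
var sc = new spark.SparkContext('local[*]', 'Example App');
```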
Methods
accumulable(initialValue, param, name) → {module:eclairjs.Accumulable}
Create an Accumulable shared variable of the given type, to which tasks can add values using the add method.
Only the master can access the accumulable's value.
Parameters:
Name | Type | Description |
---|---|---|
initialValue | object | |
param | module:eclairjs.AccumulableParam | |
name | string | Name of the accumulator, for display in Spark's web UI. |
Returns:
- Type
- module:eclairjs.Accumulable
accumulator(initialValue, [name], [param]) → {module:eclairjs.Accumulator}
Create an Accumulator variable, to which tasks can add values using the add method.
Only the master can access the accumulator's value.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
initialValue | int or float | | |
name | string or module:eclairjs.AccumulableParam | &lt;optional&gt; | Name of the accumulator for display in Spark's web UI, or a param. Defaults to FloatAccumulatorParam. |
param | module:eclairjs.AccumulableParam | &lt;optional&gt; | Defaults to FloatAccumulatorParam; use only if also specifying a name. |
Returns:
- Type
- module:eclairjs.Accumulator
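Example
A hedged sketch of creating and reading an accumulator; reading the value back through a Promise-returning value() accessor is an assumption, not something documented on this page:
```js
// Sketch only: assumes Accumulator#value() resolves a Promise with the
// current value (an assumption consistent with this Promise-based API).
var counter = sc.accumulator(0, 'records seen');

// Tasks add to the accumulator with its add method; only the driver
// (master) can read the value back:
counter.value().then(function (v) {
  console.log('accumulated so far: ' + v);
});
```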
addFile(path, [recursive]) → {Promise.<Void>}
Add a file to be downloaded with this Spark job on every node.
The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
filesystems), or an HTTP, HTTPS or FTP URI. To access the file in Spark jobs,
use `SparkFiles.get(fileName)` to find its download location.
A directory can be given if the recursive option is set to true. Currently directories are only
supported for Hadoop-supported filesystems.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | |
recursive | boolean | &lt;optional&gt; | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
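Example
A short sketch using a hypothetical local path; per the description above, tasks later resolve the file's download location with SparkFiles.get(fileName):
```js
// Hypothetical local path; the Promise resolves to nothing on success.
sc.addFile('/tmp/lookup-table.csv').then(function () {
  // As described above, job functions locate the downloaded copy with
  // SparkFiles.get('lookup-table.csv').
  console.log('file registered for download on every node');
});
```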
addJar(path) → {Promise.<Void>}
Adds a JAR dependency for all tasks to be executed on this SparkContext in the future.
The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
Parameters:
Name | Type | Description |
---|---|---|
path | string | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
applicationAttemptId() → {Promise.<string>}
Returns:
- Type
- Promise.<string>
applicationId() → {Promise.<string>}
A unique identifier for the Spark application.
Its format depends on the scheduler implementation.
(e.g. 'local-1433865536131' for a local Spark app, or 'application_1433865536131_34483' for a YARN app)
Returns:
- Type
- Promise.<string>
appName() → {Promise.<string>}
Returns:
- Type
- Promise.<string>
broadcast(value) → {Broadcast}
Broadcast a read-only variable to the cluster, returning a
Broadcast object for reading it in distributed functions.
The variable will be sent to each node only once.
Parameters:
Name | Type | Description |
---|---|---|
value | object | |
Returns:
- Type
- Broadcast
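Example
A small sketch of broadcasting a lookup object; reading it back through a Promise-returning value() accessor on the Broadcast object is an assumption of this sketch:
```js
// Broadcast a read-only lookup object; it is shipped to the cluster once.
var countries = sc.broadcast({ US: 'United States', FR: 'France' });

// Distributed functions read it through the Broadcast object; value()
// resolving a Promise here is an assumption.
countries.value().then(function (v) {
  console.log(v.FR);
});
```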
clearJobGroup() → {Promise.<Void>}
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
files() → {Promise.<Array.<string>>}
Returns:
- Type
- Promise.<Array.<string>>
getConf() → {module:eclairjs.SparkConf}
Return a copy of this SparkContext's configuration. The configuration cannot be
changed at runtime.
Returns:
- Type
- module:eclairjs.SparkConf
getLocalProperty(key) → {Promise.<string>}
Get a local property set in this thread, or null if it is missing. See
setLocalProperty.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
Returns:
- Type
- Promise.<string>
initLocalProperties() → {Promise.<Void>}
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
isLocal() → {Promise.<boolean>}
Returns:
- Type
- Promise.<boolean>
isStopped() → {Promise.<boolean>}
Returns:
true if context is stopped or in the midst of stopping.
- Type
- Promise.<boolean>
jars() → {Promise.<Array.<string>>}
Returns:
- Type
- Promise.<Array.<string>>
master() → {Promise.<string>}
Returns:
- Type
- Promise.<string>
parallelize(list, [numSlices]) → {module:eclairjs/rdd.RDD}
Distribute a local array to form an RDD.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
list | array | | |
numSlices | integer | &lt;optional&gt; | |
Returns:
- Type
- module:eclairjs/rdd.RDD
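Example
A minimal parallelize sketch; the Promise-returning count() action on the resulting RDD is an assumption consistent with the rest of this API:
```js
// Turn a local array into an RDD split across 2 partitions.
var rdd = sc.parallelize([1, 2, 3, 4, 5], 2);

// count() resolving a Promise is assumed, consistent with this API.
rdd.count().then(function (n) {
  console.log('elements: ' + n); // expected: 5
});
```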
parallelizePairs(list, [numSlices]) → {module:eclairjs/rdd.PairRDD}
Distribute a local array of key/value pairs to form a PairRDD.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
list | array | | |
numSlices | integer | &lt;optional&gt; | |
Returns:
- Type
- module:eclairjs/rdd.PairRDD
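Example
A parallelizePairs sketch; representing the key/value elements with a Tuple2 constructor from the module is an assumption, as this page only states that a local array is expected:
```js
// Assumed: a Tuple2 constructor is exposed on the eclairjs module handle
// (called `spark` in the constructor sketch above).
var pairs = [new spark.Tuple2('apple', 3), new spark.Tuple2('pear', 5)];

// Distribute the local array of key/value pairs as a PairRDD with 2 partitions.
var pairRdd = sc.parallelizePairs(pairs, 2);
```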
range(start, end, step, numSlices) → {module:eclairjs/rdd.RDD}
Creates a new RDD[Long] containing elements from `start` to `end` (exclusive), increased by
`step` for each element.
Parameters:
Name | Type | Description |
---|---|---|
start | number | The start value. |
end | number | The end value (exclusive). |
step | number | The incremental step. |
numSlices | number | The number of partitions of the new RDD. |
Returns:
- Type
- module:eclairjs/rdd.RDD
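Example
A range sketch; the Promise-returning collect() action is an assumption consistent with the rest of this API:
```js
// Elements 0, 2, 4, 6, 8 across 2 partitions (the end value 10 is exclusive).
var evens = sc.range(0, 10, 2, 2);

// collect() resolving a Promise with the materialized values is assumed here.
evens.collect().then(function (values) {
  console.log(values); // expected: [0, 2, 4, 6, 8]
});
```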
setCheckpointDir(directory) → {Promise.<Void>}
Set the directory under which RDDs are going to be checkpointed. The directory must
be an HDFS path if running on a cluster.
Parameters:
Name | Type | Description |
---|---|---|
directory | string | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
setJobDescription(value) → {Promise.<Void>}
Parameters:
Name | Type | Description |
---|---|---|
value | string | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
setJobGroup(groupId, description, interruptOnCancel) → {Promise.<Void>}
Assigns a group ID to all the jobs started by this thread until the group ID is set to a
different value or cleared.
Often, a unit of execution in an application consists of multiple Spark actions or jobs.
Application programmers can use this method to group all those jobs together and give a
group description. Once set, the Spark web UI will associate such jobs with this group.
The application can also use cancelJobGroup to cancel all
running jobs in this group. For example,
Parameters:
Name | Type | Description |
---|---|---|
groupId | string | |
description | string | |
interruptOnCancel | boolean | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
Example
// In the main thread:
sc.setJobGroup("some_job_to_cancel", "some job description");
sc.range(1, 10000, 1, 2).count(); // some long-running job started in this group

// In a separate thread:
sc.cancelJobGroup("some_job_to_cancel");
If interruptOnCancel is set to true for the job group, then job cancellation will result
in Thread.interrupt() being called on the job's executor threads. This is useful to help ensure
that the tasks are actually stopped in a timely manner, but is off by default due to HDFS-1208,
where HDFS may respond to Thread.interrupt() by marking nodes as dead.
setLocalProperty(key, value) → {Promise.<Void>}
Set a local property that affects jobs submitted from this thread, such as the
Spark fair scheduler pool.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
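Example
A sketch pairing setLocalProperty with getLocalProperty; the fair scheduler pool property name is standard Spark configuration, used here only for illustration:
```js
// Route jobs submitted from this thread to a named fair-scheduler pool.
// 'spark.scheduler.pool' is a standard Spark property, used illustratively.
sc.setLocalProperty('spark.scheduler.pool', 'reporting')
  .then(function () {
    return sc.getLocalProperty('spark.scheduler.pool');
  })
  .then(function (pool) {
    console.log('active pool: ' + pool); // 'reporting'
  });
```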
setLogLevel(logLevel) → {Promise.<Void>}
Parameters:
Name | Type | Description |
---|---|---|
logLevel | string | The desired log level as a string. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
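Example
A one-line sketch using one of the valid levels listed above:
```js
// Quiet the driver logs down to warnings and errors.
sc.setLogLevel('WARN').then(function () {
  console.log('log level updated');
});
```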
statusTracker() → {SparkStatusTracker}
Returns:
- Type
- SparkStatusTracker
textFile(path, [minPartitions]) → {module:eclairjs/rdd.RDD}
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI,
and return it as an RDD of Strings.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | | path to file |
minPartitions | int | &lt;optional&gt; | |
Returns:
- Type
- module:eclairjs/rdd.RDD
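Example
A textFile sketch with a hypothetical HDFS path; the Promise-returning count() action is an assumption consistent with the rest of this API:
```js
// Hypothetical HDFS path; minPartitions (4) is only a hint to Spark.
var lines = sc.textFile('hdfs://namenode:9000/data/events.log', 4);

// count() resolving a Promise is assumed, consistent with this API.
lines.count().then(function (n) {
  console.log('line count: ' + n);
});
```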
wholeTextFiles(path, minPartitions) → {module:eclairjs/rdd.RDD}
Read a directory of text files from HDFS, a local file system (available on all nodes), or any
Hadoop-supported file system URI. Each file is read as a single record and returned in a
key-value pair, where the key is the path of each file and the value is the content of each file.
For example, if you have the following files:
Parameters:
Name | Type | Description |
---|---|---|
path | string | Directory of the input data files; the path can be a comma-separated list of input paths. |
minPartitions | number | A suggested minimum number of splits for the input data. |
Returns:
- Type
- module:eclairjs/rdd.RDD
Examples
hdfs://a-hdfs-path/part-00000
hdfs://a-hdfs-path/part-00001
...
hdfs://a-hdfs-path/part-nnnnn
Do `var rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")`,
then `rdd` contains
(a-hdfs-path/part-00000, its content)
(a-hdfs-path/part-00001, its content)
...
(a-hdfs-path/part-nnnnn, its content)