Class: SparkSession

eclairjs/sql. SparkSession

The entry point to programming Spark with the Dataset and DataFrame API. In environments where a session has already been created up front (e.g. REPL, notebooks), use the builder to get the existing session:

Constructor

new SparkSession()

Source:
Examples
SparkSession.builder().getOrCreate()
 

The builder can also be used to create a new session:
SparkSession.builder()
    .master("local")
    .appName("Word Count")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
 

Methods

baseRelationToDataFrame(baseRelation) → {DataFrame}

Converts a BaseRelation created for external data sources into a DataFrame.
Parameters:
Name Type Description
baseRelation module:eclairjs/sql/sources.BaseRelation
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
DataFrame

builder() → {module:eclairjs/sql.SparkSessionBuilder}

Creates a module:eclairjs/sql.SparkSessionBuilder for constructing a module:eclairjs/sql.SparkSession.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.SparkSessionBuilder

clearActiveSession()

Clears the active SparkSession for the current thread. Subsequent calls to getOrCreate will return the first created context instead of a thread-local override.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:

clearDefaultSession()

Clears the default SparkSession that is returned by the builder.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:

createDataFrame(rowRDD_or_values, schema) → {module:eclairjs/sql.DataFrame}

Creates a DataFrame from an RDD of Rows (or an array of value arrays) using the given schema.
Parameters:
Name Type Description
rowRDD_or_values module:eclairjs.RDD.<module:eclairjs/sql.Row> | Array.<module:eclairjs/sql.Row> An RDD of Rows, or an array of arrays that contain values of valid DataTypes
schema module:eclairjs/sql/types.StructType -
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var df = sqlSession.createDataFrame([[1,1], [1,2], [2,1], [2,1], [2,3], [3,2], [3,3]], schema);

createDataFrameFromJson(schema) → {module:eclairjs/sql.Dataset}

Creates a Dataset from an RDD of JSON.
Parameters:
Name Type Description
module:eclairjs.RDD RDD of JSON
schema object object with keys corresponding to JSON field names (or getter functions), and values indicating Datatype
Source:
Returns:
Type
module:eclairjs/sql.Dataset
Example
var df = sqlSession.createDataFrameFromJson([{id:1,"name":"jim"},{id:2,"name":"tom"}], {"id":"Integer","name":"String"});
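The schema object in the example above maps each JSON field name to a type name. As a minimal sketch of that shape, the following plain-JavaScript helper checks records against such an object before they are handed to Spark. The helper name findSchemaMismatches and the CHECKS table are hypothetical and not part of EclairJS; only the two type names that appear in the example ("Integer" and "String") are covered, and how EclairJS itself validates values is not assumed here.

```javascript
// Hypothetical helper, not part of EclairJS: checks plain records against
// a schema object of the shape { fieldName: "TypeName" }.
// Only the two type names used in the example above are handled.
var CHECKS = {
  Integer: function (v) { return Number.isInteger(v); },
  String: function (v) { return typeof v === "string"; }
};

function findSchemaMismatches(records, schema) {
  var problems = [];
  records.forEach(function (record, index) {
    Object.keys(schema).forEach(function (field) {
      var typeName = schema[field];
      var check = CHECKS[typeName];
      if (!check) {
        // Type names outside the example are reported rather than guessed at.
        problems.push("record " + index + ": no check for type " + typeName);
      } else if (!check(record[field])) {
        problems.push("record " + index + ": field " + field + " is not " + typeName);
      }
    });
  });
  return problems;
}

var jsonSchema = { id: "Integer", name: "String" };
var good = findSchemaMismatches([{ id: 1, name: "jim" }, { id: 2, name: "tom" }], jsonSchema);
var bad = findSchemaMismatches([{ id: "1", name: "jim" }], jsonSchema);
console.log(good); // []
console.log(bad);  // [ 'record 0: field id is not Integer' ]
```

An empty result means every record has a value of the expected kind for each field named in the schema object.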

emptyDataset() → {module:eclairjs/sql.Dataset}

:: Experimental :: Creates a new Dataset of type T containing zero elements.
Since:
  • Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.Dataset

newSession() → {module:eclairjs/sql.SparkSession}

Starts a new session with isolated SQL configurations, temporary tables, and registered functions, while sharing the underlying SparkContext and cached data. Note: Other than the SparkContext, all shared state is initialized lazily. This method will force the initialization of the shared state to ensure that parent and child sessions are set up with the same shared state. If the underlying catalog implementation is Hive, this will initialize the metastore, which may take some time.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.SparkSession

range(tableName, start, end, step, numPartitions) → {module:eclairjs/sql.Dataset}

:: Experimental :: Creates a Dataset with a single LongType column named `id`, containing the elements in the range from `start` to `end` (exclusive) with the given step value and the specified number of partitions.
Parameters:
Name Type Attributes Description
tableName string
start number
end number
step number <optional>
numPartitions number <optional>
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.Dataset
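To make the "`start` to `end` (exclusive) with a step value" wording concrete, here is a minimal plain-JavaScript model of the `id` values such a range would contain. It does not call Spark or EclairJS; the function name rangeIds is hypothetical, defaulting the step to 1 is an assumption of this sketch, and non-positive steps are deliberately left out.

```javascript
// Plain-JavaScript model of the id values described above; it does not call Spark.
// Assumption of this sketch: step defaults to 1 when omitted.
function rangeIds(start, end, step) {
  var s = step === undefined ? 1 : step;
  if (!(s > 0)) {
    throw new Error("this sketch only handles a positive step");
  }
  var ids = [];
  // `end` is exclusive, so the loop stops before reaching it.
  for (var i = start; i < end; i += s) {
    ids.push(i);
  }
  return ids;
}

console.log(rangeIds(0, 10, 3)); // [ 0, 3, 6, 9 ]
console.log(rangeIds(2, 5));     // [ 2, 3, 4 ]
```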

read() → {module:eclairjs/sql.DataFrameReader}

Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.DataFrameReader
Example
sparkSession.read().parquet("/path/to/file.parquet")
sparkSession.read().schema(schema).json("/path/to/file.json")
 

readStream() → {module:eclairjs/sql/streaming.DataStreamReader}

:: Experimental :: Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql/streaming.DataStreamReader
Example
sparkSession.readStream().parquet("/path/to/directory/of/parquet/files")
sparkSession.readStream().schema(schema).json("/path/to/directory/of/json/files")
 

setActiveSession(session)

Changes the SparkSession that will be returned in this thread and its children when SparkSession.getOrCreate() is called. This can be used to ensure that a given thread receives a SparkSession with an isolated session, instead of the global (first created) context.
Parameters:
Name Type Description
session module:eclairjs/sql.SparkSession
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:

setDefaultSession(session)

Sets the default SparkSession that is returned by the builder.
Parameters:
Name Type Description
session module:eclairjs/sql.SparkSession
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
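The interaction between the active session, the default session, and getOrCreate can be hard to picture from the four method descriptions alone. The following is a toy, single-threaded model of the lookup order they describe: an active override wins, otherwise the default (first created) session is returned, otherwise a new one is created and becomes the default. It is not EclairJS code; makeSessionRegistry and the plain objects standing in for sessions are hypothetical, and the per-thread behaviour is not modelled.

```javascript
// Toy model, NOT EclairJS code: it only mimics the lookup order described
// by setActiveSession / clearActiveSession / setDefaultSession / clearDefaultSession.
function makeSessionRegistry() {
  var active = null;       // stands in for the thread-local override
  var defaultSession = null; // stands in for the default (first created) session
  return {
    setActiveSession: function (s) { active = s; },
    clearActiveSession: function () { active = null; },
    setDefaultSession: function (s) { defaultSession = s; },
    clearDefaultSession: function () { defaultSession = null; },
    // `create` is a hypothetical factory used only when no session exists yet.
    getOrCreate: function (create) {
      if (active !== null) { return active; }
      if (defaultSession === null) { defaultSession = create(); }
      return defaultSession;
    }
  };
}

var registry = makeSessionRegistry();
var first = registry.getOrCreate(function () { return { name: "first" }; });

var isolated = { name: "isolated" };
registry.setActiveSession(isolated);
var duringOverride = registry.getOrCreate(function () { return { name: "unused" }; });

registry.clearActiveSession();
var afterClear = registry.getOrCreate(function () { return { name: "unused" }; });

console.log(duringOverride.name); // isolated
console.log(afterClear.name);     // first
```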

sparkContext() → {module:eclairjs.SparkContext}

The underlying SparkContext.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs.SparkContext

sql(sqlText) → {module:eclairjs/sql.Dataset}

Executes a SQL query using Spark, returning the result as a module:eclairjs/sql.Dataset. The dialect that is used for SQL parsing can be configured with 'spark.sql.dialect'.
Parameters:
Name Type Description
sqlText string
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.Dataset

stop()

Stops the underlying module:eclairjs.SparkContext.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
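Because stop() shuts down the underlying SparkContext, it is usually worth making sure it runs even when the work in between fails. One possible pattern is sketched below, assuming the session methods are called synchronously. The helper withSession is hypothetical and not part of EclairJS, and the fakeSession object is a stand-in so that the sketch runs without a cluster; a real session would come from SparkSession.builder().getOrCreate().

```javascript
// Hypothetical helper, not part of EclairJS: runs `work` with a session
// and always calls stop() afterwards, even if `work` throws.
function withSession(spark, work) {
  try {
    return work(spark);
  } finally {
    spark.stop();
  }
}

// Stand-in object exposing only version() and stop(), so the sketch
// runs without Spark. It is not a real SparkSession.
var fakeSession = {
  stopped: false,
  version: function () { return "2.0.0"; },
  stop: function () { this.stopped = true; }
};

var seenVersion = withSession(fakeSession, function (s) {
  return s.version();
});

console.log(seenVersion);         // 2.0.0
console.log(fakeSession.stopped); // true
```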

streams() → {module:eclairjs/sql/streaming.StreamingQueryManager}

:: Experimental :: Returns a StreamingQueryManager that allows managing all the StreamingQueries active on this session.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql/streaming.StreamingQueryManager

table(tableName) → {module:eclairjs/sql.Dataset}

Returns the specified table as a module:eclairjs/sql.Dataset.
Parameters:
Name Type Description
tableName string
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.Dataset

udf() → {module:eclairjs/sql.UDFRegistration}

A collection of methods for registering user-defined functions (UDF). Note that the user-defined functions must be deterministic. Due to optimization, duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
module:eclairjs/sql.UDFRegistration
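The determinism requirement is easier to see with a small plain-JavaScript illustration that involves no Spark at all. The evaluate function below is a hypothetical stand-in for an engine that may call the same function once or several times for one expression; pureLabel and impureLabel are likewise made-up names. The pure function gives the same answer either way, while the one with hidden state does not.

```javascript
// Plain JavaScript, no Spark: shows why hidden state makes a function
// unsafe to register as a UDF.
var callCount = 0;

// Non-deterministic: the result depends on how often it has been called.
function impureLabel(x) {
  callCount += 1;
  return x + "-" + callCount;
}

// Deterministic: the result depends only on the argument.
function pureLabel(x) {
  return x + "-label";
}

// Hypothetical stand-in for an optimizer that may evaluate the same
// expression `times` times and keep the last result.
function evaluate(fn, input, times) {
  var result;
  for (var i = 0; i < times; i += 1) {
    result = fn(input);
  }
  return result;
}

var pureOnce = evaluate(pureLabel, "a", 1);
var pureTwice = evaluate(pureLabel, "a", 2);
var impureOnce = evaluate(impureLabel, "a", 1);
var impureTwice = evaluate(impureLabel, "a", 2);

console.log(pureOnce === pureTwice);     // true
console.log(impureOnce === impureTwice); // false
```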

version() → {string}

The version of Spark on which this application is running.
Since:
  • EclairJS 0.6 Spark 2.0.0
Source:
Returns:
Type
string