Class: DataFrameReader

eclairjs/sql.DataFrameReader

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores). Use SQLContext.read() to obtain an instance.

Constructor

new DataFrameReader()

Methods

format(source) → {module:eclairjs/sql.DataFrameReader}

Specifies the input data source format.
Parameters:
Name Type Description
source string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
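
A minimal sketch of selecting a source format before loading. Because `sqlContext` is supplied by the EclairJS runtime, the chain is wrapped in a function here; the format name and path are illustrative:

```javascript
// Hypothetical: read a JSON file by explicitly selecting the "json" format.
// `sqlContext` comes from the EclairJS environment; the path is illustrative.
function loadAsJson(sqlContext, path) {
  return sqlContext.read().format("json").load(path);
}
```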

jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.DataFrame}

Constructs a DataFrame representing the database table accessible via the given JDBC URL.
Parameters:
Name Type Description
url string
table string
connectionPropertiesMap|columnName|predicates object | string | Array.<string> If a connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least "user" and "password" properties should be included. If a columnName: the name of a column of integral type that will be used for partitioning. If predicates: conditions for the WHERE clause, one per partition.
lowerBound|connectionPropertiesMap number | object If lowerBound: the minimum value of `columnName` used to decide partition stride. If a connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least "user" and "password" properties should be included.
upperBound number the maximum value of `columnName` used to decide partition stride
numPartitions number the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions
connectionProperties object JDBC database connection arguments, a map of arbitrary string tag/value pairs. Normally at least "user" and "password" properties should be included.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
Example
// Connection setup shared by the variants below.
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];
// Illustrative partitioning parameters for the second variant.
var columnName = "age";
var lowerBound = 0;
var upperBound = 100;
var numPartitions = 4;

// URL, named table, and connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);

// or
// Partitions of the table will be retrieved in parallel based on the parameters
// passed to this function.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);

// or
// URL, named table, and connection properties. The `predicates` parameter gives a list
// of expressions suitable for inclusion in WHERE clauses; each one defines one partition
// of the DataFrame.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);

json(input) → {module:eclairjs/sql.DataFrame}

Loads a JSON file or an RDD[String] storing JSON objects (one object per line), and returns the result as a DataFrame.
Parameters:
Name Type Description
input string | RDD
Returns:
Type
module:eclairjs/sql.DataFrame
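
A minimal sketch of both accepted inputs, wrapped in a function because `sqlContext` comes from the EclairJS runtime; the path is illustrative:

```javascript
// Hypothetical: `input` may be a path string or an RDD of JSON strings,
// e.g. readJson(sqlContext, "people.json") or readJson(sqlContext, jsonRDD).
function readJson(sqlContext, input) {
  return sqlContext.read().json(input);
}
```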

load(path) → {module:eclairjs/sql.DataFrame}

Loads input as a DataFrame.
Parameters:
Name Type Attributes Description
path string <optional>
Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores).
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
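
A minimal sketch of the pathless form, for sources that don't need one. The format name and option key are hypothetical stand-ins, and `sqlContext` comes from the EclairJS runtime, so the chain is wrapped in a function:

```javascript
// Hypothetical: load from a source that doesn't require a path
// (e.g. an external key-value store). Format and options are illustrative.
function loadFromSource(sqlContext, sourceFormat, optionsMap) {
  return sqlContext.read().format(sourceFormat).options(optionsMap).load();
}
```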

option(key, value) → {module:eclairjs/sql.DataFrameReader}

Adds an input option for the underlying data source.
Parameters:
Name Type Description
key string
value string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader

options(options) → {module:eclairjs/sql.DataFrameReader}

Adds input options for the underlying data source.
Parameters:
Name Type Description
options Map
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
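
A minimal sketch combining option() and options(). The format name and option keys are illustrative (they belong to a hypothetical CSV source, not EclairJS itself), and `sqlContext` comes from the runtime, so the chain is wrapped in a function:

```javascript
// Hypothetical: configure a CSV-style source with a single option plus a
// map of options. Both calls return the reader, so they chain freely.
function configureCsvReader(sqlContext) {
  return sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .options({"delimiter": ",", "inferSchema": "true"});
}
```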

orc(path) → {module:eclairjs/sql.DataFrame}

Loads an ORC file and returns the result as a DataFrame.
Parameters:
Name Type Description
path string input path
Since:
  • EclairJS 0.1 Spark 1.5.0
Returns:
Type
module:eclairjs/sql.DataFrame

parquet(...paths) → {module:eclairjs/sql.DataFrame}

Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.
Parameters:
Name Type Description
...paths string one or more input paths
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
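
A minimal sketch of the variadic call, spreading an array of paths into separate arguments. The paths are illustrative, and `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: read several Parquet files at once by spreading an array
// of paths into the variadic parquet() call via Function.prototype.apply.
function readParquet(sqlContext, paths) {
  var reader = sqlContext.read();
  return reader.parquet.apply(reader, paths);
}
```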

schema(schema) → {module:eclairjs/sql.DataFrameReader}

Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema automatically from data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
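
A minimal sketch of supplying the schema before loading, so the reader can skip inference. The schema object would normally be built with the eclairjs/sql/types module; here it is passed in opaquely, and `sqlContext` comes from the runtime:

```javascript
// Hypothetical: set an explicit StructType before reading JSON, avoiding
// the (potentially slow) schema-inference pass over the data.
function readWithSchema(sqlContext, schema, path) {
  return sqlContext.read().schema(schema).json(path);
}
```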

table(tableName) → {module:eclairjs/sql.DataFrame}

Returns the specified table as a DataFrame.
Parameters:
Name Type Description
tableName string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
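
A minimal sketch, assuming a table already registered in the catalog (the name is illustrative); `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: read a catalog table, e.g. tableToDataFrame(sqlContext, "people").
function tableToDataFrame(sqlContext, tableName) {
  return sqlContext.read().table(tableName);
}
```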

textFile(path) → {module:eclairjs/sql.Dataset}

Loads text files and returns a Dataset of String, where each element is one line of the input.
Parameters:
Name Type Description
path string
Since:
  • EclairJS 0.5 Spark 2.0.0
Returns:
Type
module:eclairjs/sql.Dataset
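
A minimal sketch; the path is illustrative, and `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: load a text file as a Dataset of String, one element per line.
function readLines(sqlContext, path) {
  return sqlContext.read().textFile(path);
}
```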