Class: DataFrameReader

eclairjs/sql.DataFrameReader

Interface used to load a DataFrame from external storage systems (e.g. file systems, key-value stores). Use SQLContext.read() to obtain an instance.

Constructor

new DataFrameReader()

Methods

format(source) → {module:eclairjs/sql.DataFrameReader}

Specifies the input data source format.
Parameters:
Name Type Description
source string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
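
A minimal sketch of selecting a source format before loading. Because `sqlContext` is supplied by the EclairJS runtime, the chain is wrapped in a function here; the format name and path are illustrative:

```javascript
// Hypothetical: read a JSON file by explicitly selecting the "json" format.
// `sqlContext` comes from the EclairJS environment; the path is illustrative.
function loadAsJson(sqlContext, path) {
  return sqlContext.read().format("json").load(path);
}
```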

jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.DataFrame}

Constructs a DataFrame representing the database table accessible via the given JDBC URL.
Parameters:
Name Type Description
url string
table string
connectionPropertiesMap|columnName|predicates object | string | Array.<string> If a connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least "user" and "password" properties should be included. If a columnName: the name of a column of integral type that will be used for partitioning. If predicates: conditions for the WHERE clause, one per partition.
lowerBound|connectionPropertiesMap number | object If lowerBound: the minimum value of `columnName` used to decide partition stride. If a connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least "user" and "password" properties should be included.
upperBound number the maximum value of `columnName` used to decide partition stride
numPartitions number the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions
connectionProperties object JDBC database connection arguments, a map of arbitrary string tag/value pairs. Normally at least "user" and "password" properties should be included.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
Example
// Connection setup shared by the variants below.
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];
// Illustrative partitioning parameters for the second variant.
var columnName = "age";
var lowerBound = 0;
var upperBound = 100;
var numPartitions = 4;

// URL, named table, and connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);

// or
// Partitions of the table will be retrieved in parallel based on the parameters
// passed to this function.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);

// or
// URL, named table, and connection properties. The `predicates` parameter gives a list
// of expressions suitable for inclusion in WHERE clauses; each one defines one partition
// of the DataFrame.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);

json(input) → {module:eclairjs/sql.DataFrame}

Loads a JSON file or an RDD[String] storing JSON objects (one object per line), and returns the result as a DataFrame.
Parameters:
Name Type Description
input string | RDD
Returns:
Type
module:eclairjs/sql.DataFrame
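
A minimal sketch of both accepted inputs, wrapped in a function because `sqlContext` comes from the EclairJS runtime; the path is illustrative:

```javascript
// Hypothetical: `input` may be a path string or an RDD of JSON strings,
// e.g. readJson(sqlContext, "people.json") or readJson(sqlContext, jsonRDD).
function readJson(sqlContext, input) {
  return sqlContext.read().json(input);
}
```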

load(path) → {module:eclairjs/sql.DataFrame}

Loads input as a DataFrame.
Parameters:
Name Type Attributes Description
path string <optional>
Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores).
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
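
A minimal sketch of the pathless form, for sources that don't need one. The format name and option key are hypothetical stand-ins, and `sqlContext` comes from the EclairJS runtime, so the chain is wrapped in a function:

```javascript
// Hypothetical: load from a source that doesn't require a path
// (e.g. an external key-value store). Format and options are illustrative.
function loadFromSource(sqlContext, sourceFormat, optionsMap) {
  return sqlContext.read().format(sourceFormat).options(optionsMap).load();
}
```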

option(key, value) → {module:eclairjs/sql.DataFrameReader}

Adds an input option for the underlying data source.
Parameters:
Name Type Description
key string
value string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader

options(options) → {module:eclairjs/sql.DataFrameReader}

Adds input options for the underlying data source.
Parameters:
Name Type Description
options Map
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
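
A minimal sketch combining option() and options(). The format name and option keys are illustrative (they belong to a hypothetical CSV source, not EclairJS itself), and `sqlContext` comes from the runtime, so the chain is wrapped in a function:

```javascript
// Hypothetical: configure a CSV-style source with a single option plus a
// map of options. Both calls return the reader, so they chain freely.
function configureCsvReader(sqlContext) {
  return sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .options({"delimiter": ",", "inferSchema": "true"});
}
```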

orc(path) → {module:eclairjs/sql.DataFrame}

Loads an ORC file and returns the result as a DataFrame.
Parameters:
Name Type Description
path string input path
Since:
  • EclairJS 0.1 Spark 1.5.0
Returns:
Type
module:eclairjs/sql.DataFrame

parquet(...paths) → {module:eclairjs/sql.DataFrame}

Loads a Parquet file, returning the result as a DataFrame. This function returns an empty DataFrame if no paths are passed in.
Parameters:
Name Type Description
...paths string one or more input paths
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
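
A minimal sketch of the variadic call, spreading an array of paths into separate arguments. The paths are illustrative, and `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: read several Parquet files at once by spreading an array
// of paths into the variadic parquet() call via Function.prototype.apply.
function readParquet(sqlContext, paths) {
  var reader = sqlContext.read();
  return reader.parquet.apply(reader, paths);
}
```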

schema(schema) → {module:eclairjs/sql.DataFrameReader}

Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema automatically from data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
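
A minimal sketch of supplying the schema before loading, so the reader can skip inference. The schema object would normally be built with the eclairjs/sql/types module; here it is passed in opaquely, and `sqlContext` comes from the runtime:

```javascript
// Hypothetical: set an explicit StructType before reading JSON, avoiding
// the (potentially slow) schema-inference pass over the data.
function readWithSchema(sqlContext, schema, path) {
  return sqlContext.read().schema(schema).json(path);
}
```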

table(tableName) → {module:eclairjs/sql.DataFrame}

Returns the specified table as a DataFrame.
Parameters:
Name Type Description
tableName string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrame
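
A minimal sketch, assuming a table already registered in the catalog (the name is illustrative); `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: read a catalog table, e.g. tableToDataFrame(sqlContext, "people").
function tableToDataFrame(sqlContext, tableName) {
  return sqlContext.read().table(tableName);
}
```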

textFile(path) → {module:eclairjs/sql.Dataset}

Loads text files and returns a Dataset of String, where each element is one line of the input.
Parameters:
Name Type Description
path string
Since:
  • EclairJS 0.5 Spark 2.0.0
Returns:
Type
module:eclairjs/sql.Dataset
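
A minimal sketch; the path is illustrative, and `sqlContext` comes from the EclairJS runtime, so the call is wrapped in a function:

```javascript
// Hypothetical: load a text file as a Dataset of String, one element per line.
function readLines(sqlContext, path) {
  return sqlContext.read().textFile(path);
}
```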