Class: DataFrameReader

eclairjs/sql.DataFrameReader

Interface used to load a Dataset from external storage systems (e.g. file systems, key-value stores, etc.). Use SQLContext.read() to access this.
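Example
A one-line sketch (not taken from the EclairJS sources) of obtaining a DataFrameReader:
// access the reader through the SQLContext
var reader = sqlContext.read();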

Constructor

new DataFrameReader()

Methods

csv() → {module:eclairjs/sql.Dataset}

Loads a CSV file and returns the result as a Dataset.
Returns:
Type
module:eclairjs/sql.Dataset
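Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical, and csv() is assumed to accept an input path as in the underlying Spark API.
// load a CSV file into a Dataset
var peopleDF = sqlContext.read().csv("/path/to/people.csv");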

format(source) → {module:eclairjs/sql.DataFrameReader}

Specifies the input data source format.
Parameters:
Name Type Description
source string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
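Example
An illustrative sketch, not taken from the EclairJS sources; the format name "json" and the path are assumptions.
// select the JSON data source format, then load a path with it
var peopleDF = sqlContext.read().format("json").load("/path/to/people.json");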

jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.Dataset}

Construct a Dataset representing the database table accessible via JDBC URL
Parameters:
Name Type Description
url string
table string
connectionPropertiesMap|columnName|predicates object | string | Array.<string> If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. If columnName: the name of a column of integral type that will be used for partitioning. If predicates: conditions in the WHERE clause for each partition.
lowerBound|connectionPropertiesMap number | object If lowerBound: the minimum value of `columnName` used to decide partition stride. If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included.
upperBound number the maximum value of `columnName` used to decide partition stride
numPartitions number the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions
connectionProperties object JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
Example
// Set up the connection properties and partitioning parameters
// (the values below are illustrative).
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];
var columnName = "age";   // integral column used for partitioning
var lowerBound = 0;       // minimum value of columnName
var upperBound = 100;     // maximum value of columnName
var numPartitions = 4;    // number of partitions

// url, named table and connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);

// or
// Partitions of the table will be retrieved in parallel based on the parameters
// passed to this function.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);

// or
// url, named table using connection properties. The `predicates` parameter gives a list
// of expressions suitable for inclusion in WHERE clauses; each one defines one partition
// of the Dataset.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);

json(path) → {module:eclairjs/sql.Dataset}

Loads a JSON file, or an RDD[String] storing JSON objects (one object per line), and returns the result as a Dataset. If a path is given, this function goes through the input once to determine the input schema; if you know the schema in advance, use the version that specifies the schema to avoid the extra scan. If an RDD is given, unless the schema is specified with the schema function, this function goes through the input once to determine the input schema.
Parameters:
Name Type Description
path string | module:eclairjs.RDD input path, or an RDD of JSON strings
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
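Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load newline-delimited JSON objects into a Dataset
var peopleDF = sqlContext.read().json("/path/to/people.json");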

load(path) → {module:eclairjs/sql.Dataset}

Loads input as a Dataset.
Parameters:
Name Type Attributes Description
path string <optional>
Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores).
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
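Example
An illustrative sketch, not taken from the EclairJS sources; the format name "parquet" and the path are assumptions.
// load a path using an explicitly selected data source format
var usersDF = sqlContext.read().format("parquet").load("/path/to/users.parquet");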

option(keyOrMap) → {module:eclairjs/sql.DataFrameReader}

Adds an input option for the underlying data source.
Parameters:
Name Type Description
keyOrMap string | object If an object is passed, it is treated as a map of option key/value pairs; both the keys and the values must be strings.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
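Example
An illustrative sketch, not taken from the EclairJS sources; the format name "csv", the option key "header", and the path are assumptions that depend on the data source being read.
// add an input option using the documented map form, then load
var peopleDF = sqlContext.read()
    .format("csv")
    .option({"header": "true"})
    .load("/path/to/people.csv");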

options(map) → {module:eclairjs/sql.DataFrameReader}

Adds input options for the underlying data source.
Parameters:
Name Type Description
map Map
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
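Example
An illustrative sketch, not taken from the EclairJS sources; it assumes a plain JavaScript object is accepted as the map, and the option keys shown depend on the data source being read.
// set several input options at once
var reader = sqlContext.read().options({"header": "true", "delimiter": ";"});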

orc(…path) → {module:eclairjs/sql.Dataset}

Loads an ORC file and returns the result as a Dataset.
Parameters:
Name Type Attributes Description
path string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.5.0
Returns:
Type
module:eclairjs/sql.Dataset
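Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load an ORC file into a Dataset
var peopleDF = sqlContext.read().orc("/path/to/people.orc");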

parquet(…path) → {module:eclairjs/sql.Dataset}

Loads a Parquet file, returning the result as a Dataset. This function returns an empty Dataset if no paths are passed in.
Parameters:
Name Type Attributes Description
path string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
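Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load a Parquet file into a Dataset
var usersDF = sqlContext.read().parquet("/path/to/users.parquet");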

schema(schema) → {module:eclairjs/sql.DataFrameReader}

Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema automatically from data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
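Example
An illustrative sketch, not taken from the EclairJS sources; it assumes DataTypes helpers are available from the eclairjs/sql/types module (mirroring the Spark API), and the field names and path are hypothetical.
var DataTypes = require('eclairjs/sql/types').DataTypes;
// build an explicit schema so the JSON source can skip schema inference
var schema = DataTypes.createStructType([
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)
]);
var peopleDF = sqlContext.read().schema(schema).json("/path/to/people.json");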

table(tableName) → {module:eclairjs/sql.Dataset}

Returns the specified table as a Dataset.
Parameters:
Name Type Description
tableName string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
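Example
An illustrative sketch, not taken from the EclairJS sources; the table name "people" is hypothetical and must already be registered with the SQLContext.
// read a registered table as a Dataset
var peopleDF = sqlContext.read().table("people");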

text(…paths) → {module:eclairjs/sql.Dataset}

Loads a text file and returns a Dataset with a single string column named "value". Each line in the text file is a new row in the resulting Dataset.
Parameters:
Name Type Attributes Description
paths string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.6.0
Returns:
Type
module:eclairjs/sql.Dataset
Example
sqlContext.read().text("/path/to/spark/README.md")

textFile(path) → {module:eclairjs/sql.Dataset}

Loads text files and returns a Dataset of String, where each line in the text files becomes a row of the resulting Dataset.
Parameters:
Name Type Description
path string
Since:
  • EclairJS 0.5 Spark 2.0.0
Returns:
Type
module:eclairjs/sql.Dataset
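Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load a text file as a Dataset of String, one row per line
var linesDS = sqlContext.read().textFile("/path/to/spark/README.md");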