Class: DataFrameReader

eclairjs/sql.DataFrameReader

Interface used to load a Dataset from external storage systems (e.g. file systems, key-value stores, etc.). Use SQLContext.read() to access this.
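Example
A one-line sketch (not taken from the EclairJS sources) of obtaining a DataFrameReader:
// access the reader through the SQLContext
var reader = sqlContext.read();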

Constructor

new DataFrameReader()

Methods

csv() → {module:eclairjs/sql.Dataset}

Loads a CSV file and returns the result as a Dataset.
Returns:
Type
module:eclairjs/sql.Dataset
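Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical, and csv() is assumed to accept an input path as in the underlying Spark API.
// load a CSV file into a Dataset
var peopleDF = sqlContext.read().csv("/path/to/people.csv");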

format(source) → {module:eclairjs/sql.DataFrameReader}

Specifies the input data source format.
Parameters:
Name Type Description
source string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
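Example
An illustrative sketch, not taken from the EclairJS sources; the format name "json" and the path are assumptions.
// select the JSON data source format, then load a path with it
var peopleDF = sqlContext.read().format("json").load("/path/to/people.json");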

jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.Dataset}

Construct a Dataset representing the database table accessible via JDBC URL
Parameters:
Name Type Description
url string
table string
connectionPropertiesMap|columnName|predicates object | string | Array.<string> If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. If columnName: the name of a column of integral type that will be used for partitioning. If predicates: conditions in the WHERE clause for each partition.
lowerBound|connectionPropertiesMap number | object If lowerBound: the minimum value of `columnName` used to decide partition stride. If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included.
upperBound number the maximum value of `columnName` used to decide partition stride
numPartitions number the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions
connectionProperties object JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
Example
// Set up the connection properties and partitioning parameters
// (the values below are illustrative).
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];
var columnName = "age";   // integral column used for partitioning
var lowerBound = 0;       // minimum value of columnName
var upperBound = 100;     // maximum value of columnName
var numPartitions = 4;    // number of partitions

// url, named table and connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);

// or
// Partitions of the table will be retrieved in parallel based on the parameters
// passed to this function.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);

// or
// url, named table using connection properties. The `predicates` parameter gives a list
// of expressions suitable for inclusion in WHERE clauses; each one defines one partition
// of the Dataset.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);

json(path) → {module:eclairjs/sql.Dataset}

Loads a JSON file, or an RDD[String] storing JSON objects (one object per line), and returns the result as a Dataset. If a path is given, this function goes through the input once to determine the input schema; if you know the schema in advance, use the version that specifies the schema to avoid the extra scan. If an RDD is given, unless the schema is specified with the schema function, this function goes through the input once to determine the input schema.
Parameters:
Name Type Description
path string | module:eclairjs.RDD input path, or an RDD of JSON strings
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
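Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load newline-delimited JSON objects into a Dataset
var peopleDF = sqlContext.read().json("/path/to/people.json");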

load(path) → {module:eclairjs/sql.Dataset}

Loads input as a Dataset.
Parameters:
Name Type Attributes Description
path string <optional>
Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores).
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
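Example
An illustrative sketch, not taken from the EclairJS sources; the format name "parquet" and the path are assumptions.
// load a path using an explicitly selected data source format
var usersDF = sqlContext.read().format("parquet").load("/path/to/users.parquet");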

option(keyOrMap) → {module:eclairjs/sql.DataFrameReader}

Adds an input option for the underlying data source.
Parameters:
Name Type Description
keyOrMap string | object If an object is passed, it is treated as a map of option key/value pairs; both the keys and the values must be strings.
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
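Example
An illustrative sketch, not taken from the EclairJS sources; the format name "csv", the option key "header", and the path are assumptions that depend on the data source being read.
// add an input option using the documented map form, then load
var peopleDF = sqlContext.read()
    .format("csv")
    .option({"header": "true"})
    .load("/path/to/people.csv");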

options(map) → {module:eclairjs/sql.DataFrameReader}

Adds input options for the underlying data source.
Parameters:
Name Type Description
map Map
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
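Example
An illustrative sketch, not taken from the EclairJS sources; it assumes a plain JavaScript object is accepted as the map, and the option keys shown depend on the data source being read.
// set several input options at once
var reader = sqlContext.read().options({"header": "true", "delimiter": ";"});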

orc(…path) → {module:eclairjs/sql.Dataset}

Loads an ORC file and returns the result as a Dataset.
Parameters:
Name Type Attributes Description
path string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.5.0
Returns:
Type
module:eclairjs/sql.Dataset
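Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load an ORC file into a Dataset
var peopleDF = sqlContext.read().orc("/path/to/people.orc");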

parquet(…path) → {module:eclairjs/sql.Dataset}

Loads a Parquet file, returning the result as a Dataset. This function returns an empty Dataset if no paths are passed in.
Parameters:
Name Type Attributes Description
path string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
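Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load a Parquet file into a Dataset
var usersDF = sqlContext.read().parquet("/path/to/users.parquet");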

schema(schema) → {module:eclairjs/sql.DataFrameReader}

Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema automatically from data. By specifying the schema here, the underlying data source can skip the schema inference step, and thus speed up data loading.
Parameters:
Name Type Description
schema module:eclairjs/sql/types.StructType
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.DataFrameReader
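Example
An illustrative sketch, not taken from the EclairJS sources; it assumes DataTypes helpers are available from the eclairjs/sql/types module (mirroring the Spark API), and the field names and path are hypothetical.
var DataTypes = require('eclairjs/sql/types').DataTypes;
// build an explicit schema so the JSON source can skip schema inference
var schema = DataTypes.createStructType([
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)
]);
var peopleDF = sqlContext.read().schema(schema).json("/path/to/people.json");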

table(tableName) → {module:eclairjs/sql.Dataset}

Returns the specified table as a Dataset.
Parameters:
Name Type Description
tableName string
Since:
  • EclairJS 0.1 Spark 1.4.0
Returns:
Type
module:eclairjs/sql.Dataset
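Example
An illustrative sketch, not taken from the EclairJS sources; the table name "people" is hypothetical and must already be registered with the SQLContext.
// read a registered table as a Dataset
var peopleDF = sqlContext.read().table("people");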

text(…paths) → {module:eclairjs/sql.Dataset}

Loads a text file and returns a Dataset with a single string column named "value". Each line in the text file is a new row in the resulting Dataset.
Parameters:
Name Type Attributes Description
paths string <repeatable>
input path
Since:
  • EclairJS 0.1 Spark 1.6.0
Returns:
Type
module:eclairjs/sql.Dataset
Example
sqlContext.read().text("/path/to/spark/README.md")

textFile(path) → {module:eclairjs/sql.Dataset}

Loads text files and returns a Dataset of String, where each line in the text files becomes a row of the resulting Dataset.
Parameters:
Name Type Description
path string
Since:
  • EclairJS 0.5 Spark 2.0.0
Returns:
Type
module:eclairjs/sql.Dataset
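Example
An illustrative sketch, not taken from the EclairJS sources; the path is hypothetical.
// load a text file as a Dataset of String, one row per line
var linesDS = sqlContext.read().textFile("/path/to/spark/README.md");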