Methods
format(source) → {module:eclairjs/sql.DataFrameReader}
Specifies the input data source format.
Parameters:
Name | Type | Description |
---|---|---|
source | string | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrameReader}
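For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext and the JSON file path is hypothetical.
Example
// Pick the "json" data source, then load from a path; format() returns the same DataFrameReader, so calls chain.
var peopleDF = sqlContext.read().format("json").load("/path/to/people.json");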
jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.DataFrame}
Constructs a DataFrame representing the database table accessible via the given JDBC URL.
Parameters:
Name | Type | Description |
---|---|---|
url | string | |
table | string | |
connectionPropertiesMap \| columnName \| predicates | object \| string \| Array.<string> | If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. If columnName: the name of a column of integral type that will be used for partitioning. If predicates: conditions in the WHERE clause for each partition. |
lowerBound \| connectionPropertiesMap | number \| object | If lowerBound: the minimum value of `columnName` used to decide partition stride. If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. |
upperBound | number | the maximum value of `columnName` used to decide partition stride |
numPartitions | number | the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions |
connectionProperties | object | JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrame}
Example
// Set up the JDBC URL, table name, and connection properties.
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];
// Illustrative partitioning parameters for the partitioned-read form below.
var columnName = "age";
var lowerBound = 0;
var upperBound = 100;
var numPartitions = 4;
// Read the named table using the URL and connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);
// or
// Partitions of the table will be retrieved in parallel based on the parameters
// passed to this function.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);
// or
// Read the named table using connection properties. The `predicates` parameter gives a list
// of expressions suitable for inclusion in WHERE clauses; each one defines one partition of the DataFrame.
// Don't create too many partitions in parallel on a large cluster; otherwise Spark
// might crash your external database systems.
var peopleDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);
json(input) → {module:eclairjs/sql.DataFrame}
Loads a JSON file, or an RDD[String] storing JSON objects (one object per line), and returns the result as a DataFrame.
Parameters:
Name | Type | Description |
---|---|---|
input | string \| RDD | path to the JSON file, or an RDD of JSON strings |
Returns: {module:eclairjs/sql.DataFrame}
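For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext; the file path and the jsonLinesRDD variable are hypothetical.
Example
// From a JSON file (one JSON object per line).
var peopleDF = sqlContext.read().json("/path/to/people.json");
// or from an RDD of JSON strings built elsewhere in the program.
var peopleDF2 = sqlContext.read().json(jsonLinesRDD);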
load([path]) → {module:eclairjs/sql.DataFrame}
Loads input in as a DataFrame.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
path | string | optional | Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores). |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrame}
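For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext; the format name and path are hypothetical.
Example
// Path-based source: format() selects the source, load() reads from the path.
var df = sqlContext.read().format("parquet").load("/path/to/data.parquet");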
option(key, value) → {module:eclairjs/sql.DataFrameReader}
Adds an input option for the underlying data source.
Parameters:
Name | Type | Description |
---|---|---|
key | string | |
value | string | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrameReader}
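For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext; the "url" and "dbtable" keys are options understood by Spark's JDBC source, and the values are hypothetical.
Example
var df = sqlContext.read()
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/eclairjstesting")
    .option("dbtable", "people")
    .load();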
options(options) → {module:eclairjs/sql.DataFrameReader}
Adds input options for the underlying data source.
Parameters:
Name | Type | Description |
---|---|---|
options | Map | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrameReader}
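For illustration (not part of the original docs): same idea as option(), passing several keys at once; this sketch assumes a plain JavaScript object is accepted for the Map parameter, and the keys and values are hypothetical.
Example
var df = sqlContext.read()
    .format("jdbc")
    .options({"url": "jdbc:mysql://localhost:3306/eclairjstesting", "dbtable": "people"})
    .load();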
orc(path) → {module:eclairjs/sql.DataFrame}
Loads an ORC file and returns the result as a DataFrame.
Parameters:
Name | Type | Description |
---|---|---|
path | string | input path |
- Since: EclairJS 0.1, Spark 1.5.0
Returns: {module:eclairjs/sql.DataFrame}
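For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext (or HiveContext) with ORC support; the path is hypothetical.
Example
var df = sqlContext.read().orc("/path/to/data.orc");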
parquet(...paths) → {module:eclairjs/sql.DataFrame}
Loads a Parquet file, returning the result as a DataFrame. This function returns an empty
DataFrame if no paths are passed in.
Parameters:
Name | Type | Description |
---|---|---|
...paths | string | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrame}
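For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext; the paths are hypothetical.
Example
// One or more Parquet paths may be passed; with no paths an empty DataFrame is returned.
var df = sqlContext.read().parquet("/path/to/part1.parquet", "/path/to/part2.parquet");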
schema(schema) → {module:eclairjs/sql.DataFrameReader}
Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema
automatically from data. By specifying the schema here, the underlying data source can
skip the schema inference step, and thus speed up data loading.
Parameters:
Name | Type | Description |
---|---|---|
schema | module:eclairjs/sql/types.StructType | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrameReader}
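For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext and that the schema is built with the DataTypes helpers exposed by the eclairjs/sql/types module (require path and helper names assumed to mirror Spark's Java API); the file path is hypothetical.
Example
var DataTypes = require('eclairjs/sql/types').DataTypes;
// Declare the expected columns so the JSON source can skip schema inference.
var schema = DataTypes.createStructType([
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)
]);
var peopleDF = sqlContext.read().schema(schema).json("/path/to/people.json");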
table(tableName) → {module:eclairjs/sql.DataFrame}
Returns the specified table as a DataFrame.
Parameters:
Name | Type | Description |
---|---|---|
tableName | string | |
- Since: EclairJS 0.1, Spark 1.4.0
Returns: {module:eclairjs/sql.DataFrame}
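For illustration (not part of the original docs): a minimal sketch assuming sqlContext is an existing SQLContext and that a table named "people" has already been registered (e.g. via registerTempTable).
Example
var peopleDF = sqlContext.read().table("people");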