Methods
csv() → {module:eclairjs/sql.Dataset}
Loads a CSV file and returns the result as a Dataset.
Returns: {module:eclairjs/sql.Dataset}
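A minimal usage sketch, assuming the reader comes from `sqlContext.read()` and that `csv` accepts an input path like the other load methods (the path shown is a placeholder):

```javascript
// Load a CSV file into a Dataset; the path is hypothetical.
var peopleDF = sqlContext.read().csv("/path/to/people.csv");
```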
format(source) → {module:eclairjs/sql.DataFrameReader}
Specifies the input data source format.
Parameters:

Name | Type | Description
---|---|---
source | string | |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.DataFrameReader}
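`format` is typically chained with `load`; a sketch assuming a "json" source and a hypothetical input path:

```javascript
// Select the "json" data source format, then load a file with it.
var df = sqlContext.read().format("json").load("/path/to/people.json");
```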
jdbc(url, table, connectionPropertiesMap|columnName|predicates, lowerBound|connectionPropertiesMap, upperBound, numPartitions, connectionProperties) → {module:eclairjs/sql.Dataset}
Constructs a Dataset representing the database table accessible via a JDBC URL. The third and fourth parameters are overloaded; see the descriptions below.
Parameters:

Name | Type | Description
---|---|---
url | string | |
table | string | |
connectionPropertiesMap\|columnName\|predicates | object \| string \| Array.<string> | If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. If columnName: the name of a column of integral type that will be used for partitioning. If predicates: the condition in the WHERE clause for each partition. |
lowerBound\|connectionPropertiesMap | number \| object | If lowerBound: the minimum value of `columnName` used to decide partition stride. If connectionPropertiesMap: JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. |
upperBound | number | the maximum value of `columnName` used to decide partition stride |
numPartitions | number | the number of partitions; the range `lowerBound`-`upperBound` will be split evenly into this many partitions |
connectionProperties | object | JDBC database connection arguments, a map of arbitrary string tag/value pairs; normally at least a "user" and "password" property should be included. |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.Dataset}
Example
// Set up the JDBC URL, table name, and connection properties.
var url = "jdbc:mysql://localhost:3306/eclairjstesting";
var table = "people";
var connectionProperties = {"user": "root", "password": "mypassword"};
var predicates = ["age > 20"];

// Load the named table using the connection properties.
var peopleDF = sqlContext.read().jdbc(url, table, connectionProperties);

// Or retrieve partitions of the table in parallel based on the partitioning
// parameters. Don't create too many partitions in parallel on a large cluster;
// otherwise Spark might crash your external database systems.
var columnName = "age"; // a column of integral type, used for partitioning
var lowerBound = 0;
var upperBound = 100;
var numPartitions = 4;
var partitionedDF = sqlContext.read().jdbc(url, table, columnName, lowerBound, upperBound, numPartitions, connectionProperties);

// Or load the named table using connection properties and predicates. The
// `predicates` parameter gives a list of expressions suitable for inclusion in
// WHERE clauses; each one defines one partition of the Dataset. Don't create
// too many partitions in parallel on a large cluster; otherwise Spark might
// crash your external database systems.
var predicatedDF = sqlContext.read().jdbc(url, table, predicates, connectionProperties);
json(path) → {module:eclairjs/sql.Dataset}
Loads a JSON file, or an RDD of strings storing JSON objects (one object per line), and returns the result as a Dataset.
If given a path, this function goes through the input once to determine the input schema. If you know the schema in advance, use the version that specifies the schema to avoid the extra scan.
If given an RDD, unless the schema is specified using the schema function, this function goes through the input once to determine the input schema.
Parameters:

Name | Type | Description
---|---|---
path | string \| module:eclairjs.RDD | input path or RDD |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.Dataset}
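A sketch of both call forms, with hypothetical inputs; the RDD variant assumes an RDD of JSON strings, e.g. one built with `sparkContext.parallelize`:

```javascript
// From a file path (one JSON object per line).
var df = sqlContext.read().json("/path/to/people.json");

// From an RDD of JSON strings.
var jsonRDD = sparkContext.parallelize(['{"name":"Michael"}', '{"name":"Andy","age":30}']);
var df2 = sqlContext.read().json(jsonRDD);
```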
load(path) → {module:eclairjs/sql.Dataset}
Loads input as a DataFrame.
Parameters:

Name | Type | Attributes | Description
---|---|---|---
path | string | <optional> | Loads data sources that require a path (e.g. data backed by a local or distributed file system). If not specified, loads data sources that don't require a path (e.g. external key-value stores). |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.Dataset}
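A sketch of both cases; the sources, paths, and option names are illustrative only:

```javascript
// A path-based source.
var df = sqlContext.read().format("parquet").load("/path/to/data.parquet");

// A source that does not require a path.
var df2 = sqlContext.read()
    .format("jdbc")
    .options({"url": "jdbc:mysql://localhost:3306/eclairjstesting", "dbtable": "people"})
    .load();
```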
option(keyOrMap) → {module:eclairjs/sql.DataFrameReader}
Adds an input option for the underlying data source.
Parameters:

Name | Type | Description
---|---|---
keyOrMap | string \| object | If an object, it is expected to be a HashMap whose keys and values are both of type `String`. |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.DataFrameReader}
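A sketch of the HashMap form, with illustrative CSV-style option names (the keys shown are assumptions, not part of this API's contract):

```javascript
// Pass options as an object whose keys and values are both strings.
var reader = sqlContext.read().option({"header": "true", "inferSchema": "true"});
```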
options(map) → {module:eclairjs/sql.DataFrameReader}
Adds input options for the underlying data source.
Parameters:

Name | Type | Description
---|---|---
map | Map | input options for the underlying data source |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.DataFrameReader}
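A sketch, again with illustrative option names:

```javascript
// Add several input options at once.
var reader = sqlContext.read().options({"header": "true", "delimiter": ","});
```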
orc(…path) → {module:eclairjs/sql.Dataset}
Loads an ORC file and returns the result as a Dataset.
Parameters:

Name | Type | Attributes | Description
---|---|---|---
path | string | <repeatable> | input path |

- Since: EclairJS 0.1, Spark 1.5.0

Returns: {module:eclairjs/sql.Dataset}
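A minimal sketch with a placeholder path:

```javascript
// Load an ORC file into a Dataset.
var df = sqlContext.read().orc("/path/to/data.orc");
```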
parquet(…path) → {module:eclairjs/sql.Dataset}
Loads a Parquet file, returning the result as a Dataset. This function returns an empty
Dataset if no paths are passed in.
Parameters:

Name | Type | Attributes | Description
---|---|---|---
path | string | <repeatable> | input path |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.Dataset}
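Since the parameter is repeatable, multiple paths may be passed; both shown here are placeholders:

```javascript
// Load two Parquet files into one Dataset.
var df = sqlContext.read().parquet("/path/to/part-1.parquet", "/path/to/part-2.parquet");
```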
schema(schema) → {module:eclairjs/sql.DataFrameReader}
Specifies the input schema. Some data sources (e.g. JSON) can infer the input schema
automatically from data. By specifying the schema here, the underlying data source can
skip the schema inference step, and thus speed up data loading.
Parameters:

Name | Type | Description
---|---|---
schema | module:eclairjs/sql/types.StructType | |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.DataFrameReader}
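A sketch that builds a StructType with the `DataTypes` helpers from module:eclairjs/sql/types (the require path and field names are assumptions) and applies it before loading, skipping schema inference:

```javascript
var DataTypes = require('eclairjs/sql/types').DataTypes; // module path is illustrative

// Describe the expected columns up front.
var fields = [
    DataTypes.createStructField("name", DataTypes.StringType, true),
    DataTypes.createStructField("age", DataTypes.IntegerType, true)
];
var schema = DataTypes.createStructType(fields);

// With the schema supplied, the extra scan to infer it is avoided.
var df = sqlContext.read().schema(schema).json("/path/to/people.json");
```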
table(tableName) → {module:eclairjs/sql.Dataset}
Returns the specified table as a Dataset.
Parameters:

Name | Type | Description
---|---|---
tableName | string | |

- Since: EclairJS 0.1, Spark 1.4.0

Returns: {module:eclairjs/sql.Dataset}
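A minimal sketch; the table name is illustrative:

```javascript
// Return the registered table "people" as a Dataset.
var peopleDF = sqlContext.read().table("people");
```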
text(…paths) → {module:eclairjs/sql.Dataset}
Loads a text file and returns a Dataset with a single string column named "value".
Each line in the text file is a new row in the resulting Dataset. For example:
Parameters:

Name | Type | Attributes | Description
---|---|---|---
paths | string | <repeatable> | input path |

- Since: EclairJS 0.1, Spark 1.6.0

Returns: {module:eclairjs/sql.Dataset}
Example
sqlContext.read().text("/path/to/spark/README.md")
textFile(path) → {module:eclairjs/sql.Dataset}
Loads text files and returns a Dataset of String. See the documentation on the
other overloaded `textFile()` method for more details.
Parameters:

Name | Type | Description
---|---|---
path | string | |

- Since: EclairJS 0.5, Spark 2.0.0

Returns: {module:eclairjs/sql.Dataset}
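A minimal sketch, mirroring the `text()` example above:

```javascript
// Each line of the file becomes one String element of the Dataset.
var ds = sqlContext.read().textFile("/path/to/spark/README.md");
```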