JSDoc: Class: DataFrameWriter

new DataFrameWriter()

:: Experimental :: Interface used to write a DataFrame to external storage systems (e.g. file systems, key-value stores, etc.). Use `write` on a DataFrame to access this.

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 33

Methods

bucketBy(numBuckets, colName, …colNames) → {module:eclairjs/sql.DataFrameWriter}

Buckets the output by the given columns. If specified, the output is laid out on the file system similar to Hive's bucketing scheme. This is applicable for Parquet, JSON and ORC.

Parameters:

Name	Type	Attributes
`numBuckets`	number
`colName`	string
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 2.0

Source:

eclairjs/sql/DataFrameWriter.js, line 260

Returns:

Type: module:eclairjs/sql.DataFrameWriter
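
The idea behind bucketing can be sketched in plain JavaScript: each row is assigned to one of `numBuckets` buckets by hashing its bucketing column. This is only an illustration. `toyHash` and `bucketIndex` are hypothetical names, and the hash below is not the one Spark or Hive uses.

```javascript
// Toy string hash (NOT the hash Spark or Hive uses); any deterministic
// hash is enough to show the idea.
function toyHash(text) {
  var hash = 0;
  for (var i = 0; i < text.length; i++) {
    hash = (hash * 31 + text.charCodeAt(i)) | 0; // keep it a 32-bit integer
  }
  return hash;
}

// Rows with equal bucketing-column values always land in the same bucket.
function bucketIndex(value, numBuckets) {
  var remainder = toyHash(String(value)) % numBuckets;
  return remainder < 0 ? remainder + numBuckets : remainder; // non-negative
}

var names = ["alice", "bob", "alice"];
names.forEach(function (name) {
  console.log(name + " -> bucket " + bucketIndex(name, 4));
});
```

Because the assignment depends only on the column value, both `alice` rows end up in the same bucket.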

csv(path)

Saves the content of the DataFrame in CSV format at the specified path. This is equivalent to `format("csv").save(path)`.

Parameters:

Name	Type	Description
`path`	string

Since:

EclairJS 0.7 Spark 2.0.0

Source:

eclairjs/sql/DataFrameWriter.js, line 299

Example

format("csv").save(path)


You can set the following CSV-specific options for writing CSV files:

- `sep` (default `,`): sets the single character used as a separator for each field and value.
- `quote` (default `"`): sets the single character used for escaping quoted values where the separator can be part of the value.
- `escape` (default `\`): sets the single character used for escaping quotes inside an already quoted value.
- `escapeQuotes` (default `true`): a flag indicating whether values containing quotes should always be enclosed in quotes. The default is to escape all values containing a quote character.
- `quoteAll` (default `false`): a flag indicating whether all values should always be enclosed in quotes. The default is to only escape values containing a quote character.
- `header` (default `false`): writes the names of columns as the first line.
- `nullValue` (default empty string): sets the string representation of a null value.
- `compression` (default `null`): compression codec to use when saving to file. This can be one of the known case-insensitive shortened names (`none`, `bzip2`, `gzip`, `lz4`, `snappy` and `deflate`).
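
To make the effect of `sep`, `quote`, `escape`, `quoteAll` and `nullValue` concrete, here is a minimal plain-JavaScript sketch that formats one line under those options. It is an illustration only, not Spark's CSV writer: `formatCsvLine` is a hypothetical helper, and it assumes a value is quoted when it contains the separator, a quote character, or a newline.

```javascript
// Illustrative only: mimics how the documented CSV options shape one
// output line. This is NOT the writer Spark uses.
function formatCsvLine(values, options) {
  var opts = Object.assign(
    { sep: ",", quote: '"', escape: "\\", quoteAll: false, nullValue: "" },
    options || {}
  );
  return values
    .map(function (value) {
      // null/undefined are rendered with the configured nullValue string.
      if (value === null || value === undefined) {
        return opts.nullValue;
      }
      var text = String(value);
      var needsQuotes =
        opts.quoteAll ||
        text.indexOf(opts.sep) !== -1 ||
        text.indexOf(opts.quote) !== -1 ||
        text.indexOf("\n") !== -1;
      if (!needsQuotes) {
        return text;
      }
      // Escape quote characters inside the value, then wrap it in quotes.
      var escaped = text.split(opts.quote).join(opts.escape + opts.quote);
      return opts.quote + escaped + opts.quote;
    })
    .join(opts.sep);
}

var line = formatCsvLine(["a,b", null, 'say "hi"'], { nullValue: "NA" });
console.log(line); // prints "a,b",NA,"say \"hi\""
```

With a real DataFrame `df`, the corresponding options would presumably be passed as strings before the write, for example `df.write().option("header", "true").option("nullValue", "NA").csv(path)`.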

format(source) → {module:eclairjs/sql.DataFrameWriter}

Specifies the underlying output data source. Built-in options include "parquet", "json", etc.

Parameters:

Name	Type	Description
`source`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 68

Returns:

Type: module:eclairjs/sql.DataFrameWriter

insertInto(tableName)

Inserts the content of the DataFrame into the specified table. This requires that the schema of the DataFrame be the same as the schema of the table. Because the data is inserted into an existing table, any format or options are ignored.

Parameters:

Name	Type	Description
`tableName`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 149

jdbc(url, table, connectionProperties)

Saves the content of the DataFrame to an external database table via JDBC. If the table already exists in the external database, the behavior of this function depends on the save mode, specified by the `mode` function (which defaults to throwing an exception). Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash your external database systems.

Parameters:

Name	Type	Description
`url`	string	JDBC database url of the form `jdbc:subprotocol:subname`
`table`	string	Name of the table in the external database.
`connectionProperties`	object	JDBC database connection arguments, given as arbitrary string key/value pairs. Normally at least a "user" and a "password" property should be included.

Source:

eclairjs/sql/DataFrameWriter.js, line 192
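
A small sketch of preparing the three `jdbc` arguments described above. `checkJdbcArgs` is a hypothetical helper, not part of EclairJS, and the URL, table name and credentials are placeholders. It only checks the documented shape of the arguments.

```javascript
// Hypothetical helper: checks the arguments documented for jdbc() before use.
function checkJdbcArgs(url, table, connectionProperties) {
  var problems = [];
  // The URL is documented as jdbc:subprotocol:subname.
  if (!/^jdbc:[^:]+:.+/.test(url)) {
    problems.push("url must look like jdbc:subprotocol:subname");
  }
  if (typeof table !== "string" || table.length === 0) {
    problems.push("table must be a non-empty string");
  }
  // "user" and "password" are normally expected in the properties.
  ["user", "password"].forEach(function (key) {
    if (!connectionProperties || typeof connectionProperties[key] !== "string") {
      problems.push('connectionProperties should include "' + key + '"');
    }
  });
  return problems;
}

// Placeholder values; replace with your own database details.
var url = "jdbc:postgresql://localhost:5432/sales";
var props = { user: "report_user", password: "secret" };
console.log(checkJdbcArgs(url, "daily_totals", props)); // empty array: no problems

// With a real DataFrame `df`, the write itself would then presumably be:
//   df.write().mode("append").jdbc(url, "daily_totals", props);
```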

json(path)

Saves the content of the DataFrame in JSON format at the specified path. This is equivalent to `format("json").save(path)`.

Parameters:

Name	Type	Description
`path`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 210

Example

format("json").save(path)

mode(saveMode) → {module:eclairjs/sql.DataFrameWriter}

Specifies the behavior when data or table already exists. Options include:

- `overwrite`: overwrite the existing data.
- `append`: append the data.
- `ignore`: ignore the operation (i.e. no-op).
- `error`: default option, throw an exception at runtime.

Parameters:

Name	Type	Description
`saveMode`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 55

Returns:

Type: module:eclairjs/sql.DataFrameWriter
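
The four save modes can be read as a small decision table. The sketch below expresses that table in plain JavaScript. `resolveSaveAction` and the action names it returns are made up for illustration and are not part of the API.

```javascript
// Illustrative decision table for the four documented save modes.
// The returned action names are invented for this sketch.
function resolveSaveAction(saveMode, targetExists) {
  if (!targetExists) {
    return "write"; // nothing to collide with, so every mode just writes
  }
  switch (saveMode) {
    case "overwrite":
      return "replace"; // existing data is overwritten
    case "append":
      return "append"; // new data is added to the existing data
    case "ignore":
      return "skip"; // no-op: existing data is left untouched
    case "error":
      throw new Error("target already exists"); // the default behavior
    default:
      throw new Error("unknown save mode: " + saveMode);
  }
}

console.log(resolveSaveAction("ignore", true)); // prints skip
console.log(resolveSaveAction("append", false)); // prints write
```

In a real write, the mode is set on the writer before saving, for example `df.write().mode("overwrite").json(path)`.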

option(keyOrMap, value) → {module:eclairjs/sql.DataFrameWriter}

Adds an output option for the underlying data source.

Parameters:

Name	Type	Description
`keyOrMap`	string \| object	If an object, it is expected to be a HashMap whose keys are of type `String` and whose values are of type `String`.
`value`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 84

Returns:

Type: module:eclairjs/sql.DataFrameWriter
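
Because `format`, `mode` and `option` each return the writer, calls can be chained. The sketch below shows that pattern with a hypothetical `RecordingWriter` stand-in that only records what was requested, so it runs without Spark. It is not the EclairJS implementation, and it assumes a plain JavaScript object can stand in for the map form of `option`.

```javascript
// Hypothetical stand-in for a DataFrameWriter: it only records what was
// requested, so the chaining pattern can be run without Spark.
function RecordingWriter() {
  this.source = null;
  this.options = {};
  this.savedTo = null;
}
RecordingWriter.prototype.format = function (source) {
  this.source = source;
  return this; // returning the writer is what makes chaining possible
};
RecordingWriter.prototype.option = function (keyOrMap, value) {
  if (typeof keyOrMap === "string") {
    this.options[keyOrMap] = value; // single key/value form
  } else {
    // Map form: copy every key/value pair; values are expected to be strings.
    Object.keys(keyOrMap).forEach(function (key) {
      this.options[key] = keyOrMap[key];
    }, this);
  }
  return this;
};
RecordingWriter.prototype.save = function (path) {
  this.savedTo = path;
};

var writer = new RecordingWriter();
writer
  .format("csv")
  .option("header", "true")
  .option({ sep: "|", nullValue: "NA" })
  .save("/path/to/output");
console.log(writer.options);
```

With a real DataFrame, the chain would start from `df.write()` instead of `new RecordingWriter()`.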

orc(path)

Saves the content of the DataFrame in ORC format at the specified path. This is equivalent to `format("orc").save(path)`.

Parameters:

Name	Type	Description
`path`	string

Since:

EclairJS 0.1 Spark 1.5.0

Source:

eclairjs/sql/DataFrameWriter.js, line 241

Example

format("orc").save(path)

parquet(path)

Saves the content of the DataFrame in Parquet format at the specified path. This is equivalent to `format("parquet").save(path)`.

Parameters:

Name	Type	Description
`path`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 225

Example

format("parquet").save(path)

partitionBy() → {module:eclairjs/sql.DataFrameWriter}

Partitions the output by the given columns on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme. This is only applicable for Parquet at the moment.

Parameters:

Name	Type	Description
`colName,...colName`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 106

Returns:

Type: module:eclairjs/sql.DataFrameWriter
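
A Hive-style partitioned layout places each row under one `column=value` directory level per partitioning column. The plain-JavaScript sketch below computes that directory for a row. `partitionDirectory` is a hypothetical helper used only to illustrate the layout, and the paths and column names are placeholders.

```javascript
// Illustration of a Hive-style partition directory layout: one
// "column=value" directory level per partitioning column, in order.
function partitionDirectory(basePath, row, partitionColumns) {
  var segments = partitionColumns.map(function (name) {
    return name + "=" + String(row[name]);
  });
  return [basePath].concat(segments).join("/");
}

var row = { year: 2016, month: 7, amount: 12.5 };
console.log(partitionDirectory("/data/sales", row, ["year", "month"]));
// prints /data/sales/year=2016/month=7
```

With a real DataFrame, a layout like this would presumably result from `df.write().partitionBy("year", "month").parquet("/data/sales")`.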

saveAsTable(tableName)

Saves the content of the DataFrame as the specified table. If the table already exists, the behavior of this function depends on the save mode, specified by the `mode` function (which defaults to throwing an exception). When `mode` is `Overwrite`, the schema of the DataFrame does not need to be the same as that of the existing table. When `mode` is `Append`, the schema of the DataFrame needs to be the same as that of the existing table, and format or options will be ignored.

When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive built-in SerDe (i.e. ORC and Parquet), the table is persisted in a Hive-compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL-specific format.

Parameters:

Name	Type	Description
`tableName`	string

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 173

savewithPath(path)

Saves the content of the DataFrame as the specified table, unless `path` is specified, in which case the content is saved at that path.

Parameters:

Name	Type	Attributes	Description
`path`	string	<optional>	Saves the content of the DataFrame at the specified path.

Since:

EclairJS 0.1 Spark 1.4.0

Source:

eclairjs/sql/DataFrameWriter.js, line 131

sortBy(colName, …colNames) → {module:eclairjs/sql.DataFrameWriter}

Sorts the output in each bucket by the given columns. This is applicable for Parquet, JSON and ORC.

Parameters:

Name	Type	Attributes	Description
`colName`	string
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 2.0

Source:

eclairjs/sql/DataFrameWriter.js, line 281

Returns:

Type: module:eclairjs/sql.DataFrameWriter

text(path)

Saves the content of the DataFrame in a text file at the specified path. The DataFrame must have only one column, and that column must be of string type. Each row becomes a new line in the output file. For example, `df.write().text("/path/to/output")`.

Parameters:

Name	Type	Description
`path`	string

Since:

EclairJS 0.1 Spark 1.6.0

Source:

eclairjs/sql/DataFrameWriter.js, line 256

Example

df.write().text("/path/to/output")