JSDoc: Class: DataFrame

Constructor

new DataFrame()

Source:

sql/DataFrame.js, line 58

Examples

var people = sqlContext.read.parquet("...")

// Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in:
// DataFrame (this class), Column, and functions.
// To select a column from the data frame:
var ageCol = people.col("age")

Methods

(static) show(rows, truncateopt)

Displays the given DataFrame rows in tabular form. The array of rows is typically the result of take() or a similar method.

Parameters:

Name	Type	Attributes	Description
`rows`	Array.<module:spark/sql.Row>
`truncate`	boolean	<optional>	Whether to truncate long strings. Defaults to false. If true, strings longer than 20 characters are truncated and all cells are right-aligned.

Source:

sql/DataFrame.js, line 1193
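
Example

The effect of the truncate flag can be sketched in plain JavaScript. This is an illustration only, not EclairJS code: formatCell is a hypothetical helper, and the exact shape of a truncated cell (first 17 characters followed by "...") is an assumption.

```javascript
// Plain-JavaScript illustration only; formatCell is a hypothetical helper.
// Assumption: a truncated cell keeps its first 17 characters and ends in "...".
function formatCell(value, truncate, width) {
  const text = String(value);
  if (!truncate) {
    // Without truncation the value is passed through unchanged.
    return text;
  }
  const clipped = text.length > 20 ? text.substring(0, 17) + "..." : text;
  // Right-align by padding on the left up to the column width.
  return clipped.padStart(width, " ");
}

const longName = "a-very-long-customer-name"; // 25 characters
const shown = formatCell(longName, true, 20);  // "a-very-long-custo..."
const short = formatCell("Bob", true, 5);      // "  Bob"
const full = formatCell(longName, false, 20);  // unchanged
```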

agg(hashMap) → {module:eclairjs/sql.DataFrame}

Aggregates on the entire DataFrame without groups.

Parameters:

Name	Type	Description
`hashMap`	hashMap	Map from column name to the name of the aggregate function to apply, for example "max" or "avg".

Source:

sql/DataFrame.js, line 75

Returns:

Type: module:eclairjs/sql.DataFrame

Example

// df.agg(...) is a shorthand for df.groupBy().agg(...)
var map = {};
map["age"] = "max";
map["salary"] = "avg";
df.agg(map)
df.groupBy().agg(map)
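
What such a map asks for can be modelled on a plain array of objects. The sketch below is not EclairJS code: aggregateAll and aggregators are hypothetical names, and the naming of the result fields ("max(age)") is an assumption.

```javascript
// Plain-JavaScript model of a whole-table aggregation; not EclairJS code.
const aggregators = {
  max: (values) => Math.max(...values),
  min: (values) => Math.min(...values),
  avg: (values) => values.reduce((sum, v) => sum + v, 0) / values.length
};

// exprs maps a column name to the name of an aggregate function.
function aggregateAll(rows, exprs) {
  const result = {};
  for (const column of Object.keys(exprs)) {
    const fnName = exprs[column];
    const values = rows.map((row) => row[column]);
    // Assumption: result fields are named "function(column)".
    result[fnName + "(" + column + ")"] = aggregators[fnName](values);
  }
  return result;
}

const staff = [
  { age: 30, salary: 1000 },
  { age: 45, salary: 3000 }
];
const summary = aggregateAll(staff, { age: "max", salary: "avg" });
// summary is { "max(age)": 45, "avg(salary)": 2000 }
```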

apply(colName) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column. The column name can also refer to a nested column, such as a.b.

Parameters:

Name	Type	Description
`colName`	string

Source:

sql/DataFrame.js, line 112

Returns:

Type: module:eclairjs/sql.Column

as(alias) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with an alias set.

Parameters:

Name	Type	Description
`alias`	string

Source:

sql/DataFrame.js, line 93

Returns:

Type: module:eclairjs/sql.DataFrame

cache() → {module:eclairjs/sql.DataFrame}

Persist this DataFrame with the default storage level (`MEMORY_ONLY`).

Source:

sql/DataFrame.js, line 129

Returns:

Type: module:eclairjs/sql.DataFrame

coalesce(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency. For example, if you go from 1000 partitions to 100 partitions, there is no shuffle; instead, each of the 100 new partitions claims 10 of the current partitions.

Parameters:

Name	Type	Description
`numPartitions`	integer

Source:

sql/DataFrame.js, line 147

Returns:

Type: module:eclairjs/sql.DataFrame
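
Example

The narrow dependency can be pictured with nested arrays standing in for partitions. This is a plain-JavaScript sketch, not EclairJS code: coalescePartitions is a hypothetical helper, and the choice of contiguous runs of partitions is an assumption made for illustration.

```javascript
// Plain-JavaScript model of a narrow dependency; not EclairJS code.
// Each new partition claims a run of existing partitions, so rows are only
// concatenated locally and never redistributed (no shuffle).
function coalescePartitions(partitions, numPartitions) {
  // Assumption: the partition count can only be reduced, never increased.
  const target = Math.min(numPartitions, partitions.length);
  const merged = [];
  for (let i = 0; i < target; i++) {
    const start = Math.floor((i * partitions.length) / target);
    const end = Math.floor(((i + 1) * partitions.length) / target);
    // Concatenate the claimed partitions into one new partition.
    merged.push([].concat(...partitions.slice(start, end)));
  }
  return merged;
}

// 1000 single-row partitions reduced to 100 partitions of 10 rows each.
const before = Array.from({ length: 1000 }, (_, i) => [i]);
const after = coalescePartitions(before, 100);
```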

col(name) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column.

Parameters:

Name	Type	Description
`name`	string

Source:

sql/DataFrame.js, line 165

Returns:

Type: module:eclairjs/sql.Column

collect() → {Promise.<Array.<Row>>}

Returns an array that contains all of the Rows in this DataFrame.

Source:

sql/DataFrame.js, line 182

Returns:

A Promise that resolves to an array containing all Rows.

Type: Promise.<Array.<Row>>
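
Example

Because the result arrives through a Promise, the rows are only available inside a then callback. The sketch below uses a hypothetical stand-in object (stubDataFrame) so that it runs without a Spark connection; real Rows are Row objects, not the plain objects assumed here.

```javascript
// Hypothetical stand-in for a DataFrame so the snippet runs without Spark.
const stubDataFrame = {
  collect: function () {
    return Promise.resolve([{ name: "Ann", age: 31 }, { name: "Bo", age: 25 }]);
  }
};

const pending = stubDataFrame.collect();
pending
  .then(function (rows) {
    // The rows are only usable once the Promise has resolved.
    console.log("received " + rows.length + " rows");
  })
  .catch(function (err) {
    console.error("collect failed:", err);
  });
```

The same pattern applies to the other methods on this page whose return type is a Promise, such as count() and take().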

columns() → {Promise.<Array.<string>>}

Returns all column names as an array.

Source:

sql/DataFrame.js, line 198

Returns:

A Promise that resolves to an array containing all column names.

Type: Promise.<Array.<string>>

count() → {Promise.<integer>}

Returns the number of rows in the DataFrame.

Source:

sql/DataFrame.js, line 213

Returns:

A Promise that resolves to the number of rows in the DataFrame.

Type: Promise.<integer>

cube() → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional cube for the current DataFrame using the specified columns, so that aggregations can be run on them.

Parameters:

Name	Type	Description
`cols...`	string \| Column

Source:

sql/DataFrame.js, line 230

Returns:

Type: module:eclairjs/sql.GroupedData

Example

var df = dataFrame.cube("age", "expense");
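
A cube aggregates over every combination of the given columns, including the empty combination (the grand total). The plain-JavaScript sketch below lists those combinations; cubeGroupingSets is a hypothetical helper and not part of EclairJS.

```javascript
// Plain-JavaScript illustration; cubeGroupingSets is a hypothetical helper.
function cubeGroupingSets(columns) {
  const sets = [];
  const total = 1 << columns.length;
  // Walk bit masks from "all columns" down to the empty grand-total set.
  for (let mask = total - 1; mask >= 0; mask--) {
    sets.push(columns.filter((_, i) => mask & (1 << i)));
  }
  return sets;
}

const cubeSets = cubeGroupingSets(["age", "expense"]);
// cubeSets is [["age", "expense"], ["expense"], ["age"], []]
```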

describe() → {module:eclairjs/sql.DataFrame}

Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns. This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.

Parameters:

Name	Type	Description
`cols...`	string

Source:

sql/DataFrame.js, line 256

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var df = peopleDataFrame.describe("age", "expense");
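
The five statistics can be reproduced for a single numeric column in plain JavaScript. This is a sketch, not EclairJS code: describeColumn is a hypothetical helper, and treating stddev as the sample standard deviation (dividing by n - 1) is an assumption.

```javascript
// Plain-JavaScript illustration; describeColumn is a hypothetical helper.
// Assumption: stddev is the sample standard deviation (divides by n - 1).
function describeColumn(values) {
  const count = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / count;
  const squares = values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0);
  return {
    count: count,
    mean: mean,
    stddev: Math.sqrt(squares / (count - 1)),
    min: Math.min(...values),
    max: Math.max(...values)
  };
}

const ageStats = describeColumn([20, 30, 40]);
// ageStats is { count: 3, mean: 30, stddev: 10, min: 20, max: 40 }
```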

distinct() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.

Source:

sql/DataFrame.js, line 273

Returns:

Type: module:eclairjs/sql.DataFrame

drop(column) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column dropped.

Parameters:

Name	Type	Description
`column`	string \| Column

Source:

sql/DataFrame.js, line 288

Returns:

Type: module:eclairjs/sql.DataFrame

dropDuplicates(colNames) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. If colNames is given, only that subset of columns is considered when comparing rows.

Parameters:

Name	Type	Description
`colNames`	Array.<string>

Source:

sql/DataFrame.js, line 306

Returns:

Type: module:eclairjs/sql.DataFrame
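
Example

The difference between comparing whole rows and comparing a subset of columns can be shown on a plain array. This is a sketch, not EclairJS code: dropDuplicateRows is a hypothetical helper, and keeping the first occurrence of each duplicate is an assumption.

```javascript
// Plain-JavaScript illustration; dropDuplicateRows is a hypothetical helper.
// Assumption: the first occurrence of each duplicate is the one that is kept.
function dropDuplicateRows(rows, colNames) {
  const seen = new Set();
  return rows.filter((row) => {
    // Without colNames, every column takes part in the comparison.
    const columns = colNames || Object.keys(row);
    const key = JSON.stringify(columns.map((name) => row[name]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const visits = [
  { name: "Ann", city: "Oslo" },
  { name: "Ann", city: "Rome" },
  { name: "Ann", city: "Oslo" }
];
const uniqueRows = dropDuplicateRows(visits);            // 2 rows remain
const uniqueNames = dropDuplicateRows(visits, ["name"]); // 1 row remains
```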

dtypes() → {Promise.<Array>}

Returns all column names and their data types as an array of arrays, for example [["name","StringType"],["age","IntegerType"],["expense","IntegerType"]].

Source:

sql/DataFrame.js, line 321

Returns:

A Promise that resolves to an array of two-element arrays, each holding a column name and its data type.

Type: Promise.<Array>
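
Example

Once the Promise has resolved, the array of pairs is easy to turn into a lookup object. The sketch below starts from the sample value given in the description; typeByColumn is a hypothetical variable name.

```javascript
// The literal below is the sample value from the description of dtypes().
const dtypes = [["name", "StringType"], ["age", "IntegerType"], ["expense", "IntegerType"]];

// Build a lookup from column name to data type name.
const typeByColumn = {};
dtypes.forEach(function (pair) {
  typeByColumn[pair[0]] = pair[1];
});
// typeByColumn.age is "IntegerType"
```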

except(otherDataFrame) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.

Parameters:

Name	Type	Description
`otherDataFrame`	module:eclairjs/sql.DataFrame	to compare to this DataFrame

Source:

sql/DataFrame.js, line 348

Returns:

Type: module:eclairjs/sql.DataFrame
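
Example

The set difference can be modelled on plain arrays. This is a sketch, not EclairJS code: exceptRows is a hypothetical helper. It follows SQL's EXCEPT, which returns distinct rows by default; whether the library removes duplicates in the same way is an assumption.

```javascript
// Plain-JavaScript illustration; exceptRows is a hypothetical helper.
// Assumption: duplicates are removed, as with SQL's default EXCEPT.
function exceptRows(left, right) {
  const exclude = new Set(right.map((row) => JSON.stringify(row)));
  const seen = new Set();
  return left.filter((row) => {
    const key = JSON.stringify(row);
    if (exclude.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const remaining = exceptRows(
  [{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }],
  [{ id: 3 }]
);
// remaining is [{ id: 1 }, { id: 2 }]
```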

explain() → {Promise.<Void>}

Prints the plans (logical and physical) to the console for debugging purposes.

Source:

sql/DataFrame.js, line 364

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

filter(column) → {module:eclairjs/sql.DataFrame}

Filters rows using the given SQL expression string or the given Column.

Parameters:

Name	Type	Description
`column`	string \| Column

Source:

sql/DataFrame.js, line 380

Returns:

Type: module:eclairjs/sql.DataFrame

first() → {module:eclairjs/sql.Row}

Returns the first row.

Source:

sql/DataFrame.js, line 395

Returns:

Type: module:eclairjs/sql.Row

flatMap(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

sql/DataFrame.js, line 413

Returns:

Type: module:eclairjs/rdd.RDD
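
Example

How bindArgs reaches func, and how the per-row results are flattened, can be sketched on a plain array. This is not EclairJS code: flatMapRows is a hypothetical helper, and passing the bindArgs values after the row is an assumption.

```javascript
// Plain-JavaScript illustration; flatMapRows is a hypothetical helper.
function flatMapRows(rows, func, bindArgs) {
  const extra = bindArgs || [];
  const out = [];
  rows.forEach((row) => {
    // Assumption: func receives the row first, then each bindArgs value in order.
    const produced = func.apply(null, [row].concat(extra));
    // Flatten: every item of the returned array becomes its own element.
    produced.forEach((item) => out.push(item));
  });
  return out;
}

const lines = [{ text: "a,b" }, { text: "c" }];
const tokens = flatMapRows(lines, function (row, separator) {
  return row.text.split(separator);
}, [","]);
// tokens is ["a", "b", "c"]
```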

foreach(func, bindArgsopt) → {Promise.<Void>}

Applies a function func to all rows.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

sql/DataFrame.js, line 433

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

foreachPartition(func, bindArgsopt) → {Promise.<Void>}

Applies a function to each partition of this DataFrame.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

sql/DataFrame.js, line 453

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

groupBy() → {module:eclairjs/sql.GroupedData}

Groups the DataFrame using the specified columns, so that aggregations can be run on them.

Parameters:

Type	Description
Array.<string> \| Array.<Column>	Array of Column objects or column name strings

Source:

sql/DataFrame.js, line 472

Returns:

Type: module:eclairjs/sql.GroupedData

head() → {Promise.<module:eclairjs/sql.Row>}

Returns the first row.

Source:

sql/DataFrame.js, line 491

Returns:

Type: Promise.<module:eclairjs/sql.Row>

inputFiles() → {Promise.<Array.<string>>}

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

Source:

sql/DataFrame.js, line 509

Returns:

A Promise that resolves to a list of files.

Type: Promise.<Array.<string>>

intersect(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing only the rows that appear in both this frame and another frame. This is equivalent to INTERSECT in SQL.

Parameters:

Name	Type	Description
`other`	module:eclairjs/sql.DataFrame

Source:

sql/DataFrame.js, line 535

Returns:

Type: module:eclairjs/sql.DataFrame

isLocal() → {Promise.<boolean>}

Returns true if the collect and take methods can be run locally (without any Spark executors).

Source:

sql/DataFrame.js, line 550

Returns:

Type: Promise.<boolean>

join(Right, columnNamesOrJoinExpropt, joinTypeopt) → {module:eclairjs/sql.DataFrame}

Joins this DataFrame with another DataFrame. When only the right side is given, the result is a Cartesian join. Note that Cartesian joins are very expensive without an extra filter that can be pushed down.

Parameters:

Name	Type	Attributes	Description
`Right`	module:eclairjs/sql.DataFrame		Right side of the join operation.
`columnNamesOrJoinExpr`	string \| Array.<string> \| Column	<optional>	If a string or an array of strings, performs an inner equi-join with the other DataFrame using the given column names. Unlike other join functions, the join columns appear only once in the output, similar to SQL's JOIN USING syntax. If a Column object, performs an inner join with the other DataFrame using the given join expression.
`joinType`	string	<optional>	Only valid when a Column join expression is used.

Source:

sql/DataFrame.js, line 579

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var joinedDf = df1.join(df2);
// or
var joinedDf = df1.join(df2,"age");
// or
var joinedDf = df1.join(df2, ["age", "DOB"]);
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")));
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")), "outer");
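
The JOIN USING behaviour, where a join column appears only once in the output, can be modelled on plain arrays of objects. This is a sketch, not EclairJS code: joinUsing is a hypothetical helper and the nested loop is chosen for clarity, not speed.

```javascript
// Plain-JavaScript model of an inner equi-join on named columns.
// joinUsing is a hypothetical helper, not part of EclairJS.
function joinUsing(leftRows, rightRows, columnNames) {
  const joined = [];
  leftRows.forEach((l) => {
    rightRows.forEach((r) => {
      // A pair matches when every join column has the same value on both sides.
      const matches = columnNames.every((name) => l[name] === r[name]);
      if (matches) {
        // Object.assign merges both rows; a shared join column ends up once.
        joined.push(Object.assign({}, l, r));
      }
    });
  });
  return joined;
}

const owners = [{ name: "Ann", age: 31 }, { name: "Bo", age: 25 }];
const pets = [{ age: 31, pet: "cat" }, { age: 40, pet: "dog" }];
const matched = joinUsing(owners, pets, ["age"]);
// matched is [{ name: "Ann", age: 31, pet: "cat" }]
```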

limit(number) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by taking the first n rows, where n is the number argument. Unlike head, which returns row data, limit returns a new DataFrame.

Parameters:

Name	Type	Description
`number`	integer

Source:

sql/DataFrame.js, line 616

Returns:

Type: module:eclairjs/sql.DataFrame

map(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by applying a function to all rows of this DataFrame.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

sql/DataFrame.js, line 633

Returns:

Type: module:eclairjs/rdd.RDD

mapPartitions(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by applying a function to each partition of this DataFrame. Similar to map, but runs separately on each partition (block) of the DataFrame, so func must accept an Array. func should return an array rather than a single item.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

sql/DataFrame.js, line 655

Returns:

Type: module:eclairjs/rdd.RDD
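
Example

The contract for func, one call per partition with an Array in and an Array out, can be sketched with nested arrays standing in for partitions. This is not EclairJS code: mapPartitionsLocal is a hypothetical helper, and passing the bindArgs values after the partition is an assumption.

```javascript
// Plain-JavaScript illustration; mapPartitionsLocal is a hypothetical helper.
function mapPartitionsLocal(partitions, func, bindArgs) {
  const extra = bindArgs || [];
  // func sees one whole partition (an Array) per call and returns an Array.
  return partitions.map((partition) => func.apply(null, [partition].concat(extra)));
}

const partitions = [[1, 2, 3], [4, 5]];
const sums = mapPartitionsLocal(partitions, function (rows) {
  // Return an array, even when there is a single result per partition.
  return [rows.reduce((sum, v) => sum + v, 0)];
});
// sums is [[6], [9]]
```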

na() → {module:eclairjs/sql.DataFrameNaFunctions}

Returns a DataFrameNaFunctions for working with missing data.

Source:

sql/DataFrame.js, line 673

Returns:

Type: module:eclairjs/sql.DataFrameNaFunctions

orderBy() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the sort is in ascending order. This is an alias of the sort function.

Parameters:

Name	Type	Description
`columnName,...columnName`	string \| Column	Column names, or Column sort expressions (sortExprs,...sortExprs).

Source:

sql/DataFrame.js, line 691

Returns:

Type: module:eclairjs/sql.DataFrame

persist(newLevel) → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the given storage level.

Parameters:

Name	Type	Description
`newLevel`	module:eclairjs/storage.StorageLevel

Source:

sql/DataFrame.js, line 708

Returns:

Type: module:eclairjs/sql.DataFrame

printSchema() → {Promise.<Void>}

Prints the schema to the console in a nice tree format.

Source:

sql/DataFrame.js, line 725

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

queryExecution() → {module:eclairjs/sql.SQLContextQueryExecution}

Source:

sql/DataFrame.js, line 738

Returns:

Type: module:eclairjs/sql.SQLContextQueryExecution

randomSplit(weights, seed) → {Array.<DataFrame>}

Randomly splits this DataFrame with the provided weights.

Parameters:

Name	Type	Description
`weights`	Array.<float>	Weights for the splits. They are normalized if they don't sum to 1.
`seed`	int	Seed for sampling.

Source:

sql/DataFrame.js, line 756

Returns:

Type: Array.<DataFrame>
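
Example

Normalizing the weights means dividing each weight by their total, so that the resulting fractions sum to 1. The sketch below shows only that arithmetic in plain JavaScript; normalizeWeights is a hypothetical helper, not part of EclairJS.

```javascript
// Plain-JavaScript illustration; normalizeWeights is a hypothetical helper.
function normalizeWeights(weights) {
  const total = weights.reduce((sum, w) => sum + w, 0);
  // Each weight becomes its share of the total.
  return weights.map((w) => w / total);
}

const fractions = normalizeWeights([3, 1]);
// fractions is [0.75, 0.25]
```

Under this reading, weights of [3, 1] and [0.75, 0.25] describe the same split.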

rdd() → {module:eclairjs/rdd.RDD}

Represents the content of the DataFrame as an RDD of Rows.

Source:

sql/DataFrame.js, line 774

Returns:

Type: module:eclairjs/rdd.RDD

registerTempTable(tableName) → {Promise.<Void>}

Registers this DataFrame as a temporary table using the given name.

Parameters:

Name	Type	Description
`tableName`	string

Source:

sql/DataFrame.js, line 789

Returns:

A Promise that resolves when the temp table has been created.

Type: Promise.<Void>

repartition(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions.

Parameters:

Name	Type	Description
`numPartitions`	integer

Source:

sql/DataFrame.js, line 807

Returns:

Type: module:eclairjs/sql.DataFrame

rollup(columnName,) → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional rollup for the current DataFrame using the specified columns, so that aggregations can be run on them. See GroupedData for all the available aggregate functions.

Parameters:

Name	Type	Description
`columnName,`	string \| Column	One or more column names or Column objects (columnName,...columnName).

Source:

sql/DataFrame.js, line 831

Returns:

Type: module:eclairjs/sql.GroupedData

Example

var result = peopleDataFrame.rollup("age", "networth").count();
// or
var col = peopleDataFrame.col("age");
var result = peopleDataFrame.rollup(col).count();
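
A rollup aggregates over successively shorter prefixes of the column list, ending with the grand total. The plain-JavaScript sketch below lists those grouping sets; rollupGroupingSets is a hypothetical helper and not part of EclairJS.

```javascript
// Plain-JavaScript illustration; rollupGroupingSets is a hypothetical helper.
function rollupGroupingSets(columns) {
  const sets = [];
  // Start with all columns, then drop the last column one step at a time.
  for (let n = columns.length; n >= 0; n--) {
    sets.push(columns.slice(0, n));
  }
  return sets;
}

const rollupSets = rollupGroupingSets(["age", "networth"]);
// rollupSets is [["age", "networth"], ["age"], []]
```

Unlike a cube, a rollup does not include combinations that skip a leading column, such as ["networth"] alone.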

sample(withReplacement, fraction, seedopt) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by sampling a fraction of rows, using a random seed.

Parameters:

Name	Type	Attributes
`withReplacement`	boolean
`fraction`	float
`seed`	integer	<optional>

Source:

sql/DataFrame.js, line 853

Returns:

Type: module:eclairjs/sql.DataFrame

schema() → {module:eclairjs/sql/types.StructType}

Returns the schema of this DataFrame.

Source:

sql/DataFrame.js, line 872

Returns:

Type: module:eclairjs/sql/types.StructType

select() → {module:eclairjs/sql.DataFrame}

Selects a set of column based expressions.

Parameters:

Type	Description
Array.<module:eclairjs/sql.Column> \| Array.<string>

Source:

sql/DataFrame.js, line 889

Returns:

Type: module:eclairjs/sql.DataFrame

selectExpr() → {module:eclairjs/sql.DataFrame}

Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.

Parameters:

Name	Type	Description
`exprs,...exprs`	string

Source:

sql/DataFrame.js, line 909

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = peopleDataFrame.selectExpr("name", "age > 19");

show() → {Promise.<Void>}

Displays the top 20 rows of DataFrame in a tabular form.

Source:

sql/DataFrame.js, line 927

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

sort() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the sort is in ascending order.

Parameters:

Name	Type	Description
`columnName,...columnName`	string \| Column	Column names, or Column sort expressions (sortExprs,...sortExprs).

Source:

sql/DataFrame.js, line 948

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = peopleDataFrame.sort("age", "name");
// or
var col = peopleDataFrame.col("age");
var colExpr = col.desc();
var result = peopleDataFrame.sort(colExpr);

sqlContext() → {module:eclairjs/sql.SQLContext}

Returns the SQLContext associated with this DataFrame.

Source:

sql/DataFrame.js, line 965

Returns:

Type: module:eclairjs/sql.SQLContext

stat() → {module:eclairjs/sql.DataFrameStatFunctions}

Returns a DataFrameStatFunctions object that provides statistical functions.

Source:

sql/DataFrame.js, line 979

Returns:

Type: module:eclairjs/sql.DataFrameStatFunctions

Example

var stat = peopleDataFrame.stat().cov("income", "networth");

take(num) → {Promise.<Array>}

Returns the first num rows in the DataFrame.

Parameters:

Name	Type	Description
`num`	integer

Source:

sql/DataFrame.js, line 996

Returns:

A Promise that resolves to an array containing the first num elements in this DataFrame.

Type: Promise.<Array>

toDF() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with columns renamed. This can be quite convenient when converting an RDD of tuples into a DataFrame with meaningful names.

Parameters:

Name	Type	Description
`colNames,...colNames`	string

Source:

sql/DataFrame.js, line 1019

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = nameAgeDF.toDF("newName", "newAge");

toJSON() → {Promise.<Array.<object>>}

Returns the content of the DataFrame as JSON. Per the signature, the result is a Promise that resolves to an array of objects.

Source:

sql/DataFrame.js, line 1036

Returns:

Type: Promise.<Array.<object>>

toRDD() → {module:eclairjs/rdd.RDD}

Returns an RDD object.

Source:

sql/DataFrame.js, line 1062

Returns:

Type: module:eclairjs/rdd.RDD

unionAll(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing the union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.

Parameters:

Name	Type	Description
`other`	module:eclairjs/sql.DataFrame

Source:

sql/DataFrame.js, line 1087

Returns:

Type: module:eclairjs/sql.DataFrame
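
Example

UNION ALL keeps duplicate rows, which a plain array concatenation models directly. This is a sketch, not EclairJS code; unionAllRows is a hypothetical helper.

```javascript
// Plain-JavaScript illustration; unionAllRows is a hypothetical helper.
function unionAllRows(left, right) {
  // Concatenate without removing duplicates, as UNION ALL does.
  return left.concat(right);
}

const combined = unionAllRows([{ id: 1 }], [{ id: 1 }, { id: 2 }]);
// combined has 3 rows; { id: 1 } appears twice
```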

unpersist(blocking) → {Promise.<Void>}

Marks this DataFrame as non-persistent and removes all blocks for it from memory and disk.

Parameters:

Name	Type	Description
`blocking`	boolean

Source:

sql/DataFrame.js, line 1102

Returns:

A Promise that resolves to nothing.

Type: Promise.<Void>

where(condition) → {module:eclairjs/sql.DataFrame}

Filters rows using the given Column or SQL expression.

Parameters:

Name	Type	Description
`condition`	module:eclairjs/sql.Column \| string

Source:

sql/DataFrame.js, line 1118

Returns:

Type: module:eclairjs/sql.DataFrame

withColumn(name, col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.

Parameters:

Name	Type	Description
`name`	string
`col`	module:eclairjs/sql.Column

Source:

sql/DataFrame.js, line 1138

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var col = peopleDataFrame.col("age");
var df1 = peopleDataFrame.withColumn("newCol", col);

write() → {module:eclairjs/sql.DataFrameWriter}

Interface for saving the content of the DataFrame out into external storage.

Source:

sql/DataFrame.js, line 1153

Returns:

Type: module:eclairjs/sql.DataFrameWriter