Class: DataFrame

eclairjs/sql.DataFrame

A distributed collection of data organized into named columns. A DataFrame is equivalent to a relational table in Spark SQL.

Constructor

new DataFrame()

Source:
Examples
var people = sqlContext.read.parquet("...")
// Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in:
// DataFrame (this class), Column, and functions.
// To select a column from the data frame:
var ageCol = people.col("age")

Methods

(static) show(rows, truncateopt)

Displays DataFrame rows in tabular form. The rows array is typically the result of take() or a similar call.
Parameters:
Name Type Attributes Description
rows Array.<module:spark/sql.Row>
truncate boolean <optional>
Whether to truncate long strings; defaults to false. If true, strings longer than 20 characters are truncated and all cells are right-aligned.
Source:

agg(hashMap) → {module:eclairjs/sql.DataFrame}

Aggregates over the entire DataFrame without groups.
Parameters:
Name Type Description
hashMap hashMap Map of column names to aggregate function names, e.g. "age" to "max".
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
// df.agg(...) is a shorthand for df.groupBy().agg(...)
var map = {};
map["age"] = "max";
map["salary"] = "avg";
df.agg(map)
df.groupBy().agg(map)

apply(colName) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column. Note that the column name can also refer to a nested column such as a.b.
Parameters:
Name Type Description
colName string
Source:
Returns:
Type
module:eclairjs/sql.Column

as(alias) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with an alias set.
Parameters:
Name Type Description
alias string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

cache() → {module:eclairjs/sql.DataFrame}

Persist this DataFrame with the default storage level (`MEMORY_ONLY`).
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

coalesce(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency: for example, going from 1000 partitions to 100 partitions does not trigger a shuffle; instead, each of the 100 new partitions claims 10 of the current partitions.
Parameters:
Name Type Description
numPartitions integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
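
The partition arithmetic described for coalesce can be pictured with plain arrays. This is only an illustrative sketch that does not call EclairJS or Spark: coalescePartitions is a hypothetical helper, and the contiguous grouping it uses is an assumption made for the illustration (Spark decides the actual grouping).

```javascript
// Hypothetical model of a narrow-dependency coalesce: each new partition
// claims a group of existing partitions, and no rows are redistributed.
function coalescePartitions(currentPartitions, numPartitions) {
  const groups = Array.from({ length: numPartitions }, () => []);
  currentPartitions.forEach((partition, index) => {
    // Assumption for illustration: contiguous blocks of equal size.
    const target = Math.floor((index * numPartitions) / currentPartitions.length);
    groups[target].push(partition);
  });
  return groups;
}

// 1000 partition ids coalesced into 100 groups of 10.
const partitionIds = Array.from({ length: 1000 }, (_, i) => i);
const coalesced = coalescePartitions(partitionIds, 100);
console.log(coalesced.length, coalesced[0].length);
```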

col(name) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column.
Parameters:
Name Type Description
name string
Source:
Returns:
Type
module:eclairjs/sql.Column

collect() → {Promise.<Array.<Row>>}

Returns an array that contains all of the Rows in this DataFrame.
Source:
Returns:
A Promise that resolves to an array containing all Rows.
Type
Promise.<Array.<Row>>
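
Because collect() returns a Promise, the rows are only available inside a then callback. The sketch below illustrates that calling pattern with a stand-in object: fakeDataFrame and its rows are hypothetical, plain objects stand in for Row instances, and no Spark connection is involved.

```javascript
// Hypothetical stand-in that mimics the documented return type of collect():
// a Promise that resolves to an array of rows.
const fakeDataFrame = {
  collect: function () {
    return Promise.resolve([
      { name: "Ann", age: 31 },
      { name: "Bob", age: 27 }
    ]);
  }
};

// The rows are consumed inside the then callback, not from the return value.
const collected = fakeDataFrame.collect().then(function (rows) {
  rows.forEach(function (row) {
    console.log(row.name, row.age);
  });
  return rows.length;
});

// Report a failed collect instead of leaving the rejection unhandled.
collected.catch(function (err) {
  console.error("collect failed:", err);
});
```

The same pattern would apply to the other Promise-returning methods on this page, such as count() and take().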

columns() → {Promise.<Array.<string>>}

Returns all column names as an array.
Source:
Returns:
A Promise that resolves to an array containing all column names.
Type
Promise.<Array.<string>>

count() → {Promise.<integer>}

Returns the number of rows in the DataFrame.
Source:
Returns:
A Promise that resolves to the number of rows in the DataFrame.
Type
Promise.<integer>

cube() → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional cube for the current DataFrame using the specified columns, so that aggregations can be run on them.
Parameters:
Name Type Description
cols... string | Column
Source:
Returns:
Type
module:eclairjs/sql.GroupedData
Example
var df = dataFrame.cube("age", "expense");
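
In SQL terms, a cube over n columns aggregates over every subset of those columns. The helper below (cubeGroupingSets, a hypothetical name) only enumerates those grouping sets in plain JavaScript to show what an aggregation on the cubed data would cover; it does not call EclairJS.

```javascript
// Enumerate every subset of the given columns (2^n grouping sets),
// from the full column list down to the empty (grand total) set.
function cubeGroupingSets(columns) {
  const sets = [];
  for (let mask = (1 << columns.length) - 1; mask >= 0; mask--) {
    sets.push(columns.filter((_, i) => mask & (1 << (columns.length - 1 - i))));
  }
  return sets;
}

const cubeSets = cubeGroupingSets(["age", "expense"]);
console.log(cubeSets);
```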

describe() → {module:eclairjs/sql.DataFrame}

Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns. This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.
Parameters:
Name Type Description
cols... string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var df = peopleDataFrame.describe("age", "expense");

distinct() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

drop(column) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column dropped.
Parameters:
Name Type Description
column string | Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

dropDuplicates(colNames) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. If colNames is given, only that subset of columns is considered when comparing rows.
Parameters:
Name Type Description
colNames Array.<string>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
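
The effect of the optional colNames argument can be modeled on an array of plain objects. This is a sketch only: dropDuplicatesModel is a hypothetical helper, and keeping the first occurrence of each duplicate is an assumption of the model, not a documented guarantee.

```javascript
// Plain-JavaScript model of dropDuplicates on an array of row objects.
// colNames is optional; when omitted, every column is compared.
function dropDuplicatesModel(rows, colNames) {
  const seen = new Set();
  return rows.filter((row) => {
    const cols = colNames || Object.keys(row);
    const key = JSON.stringify(cols.map((c) => row[c]));
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}

const sampleRows = [
  { name: "Ann", age: 31 },
  { name: "Ann", age: 45 },
  { name: "Ann", age: 31 }
];
const uniqueAll = dropDuplicatesModel(sampleRows);
const uniqueByName = dropDuplicatesModel(sampleRows, ["name"]);
console.log(uniqueAll.length, uniqueByName.length);
```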

dtypes() → {Promise.<Array>}

Returns all column names and their data types as an array of arrays, for example [["name","StringType"],["age","IntegerType"],["expense","IntegerType"]].
Source:
Returns:
A Promise that resolves to an array of two-element [name, type] arrays.
Type
Promise.<Array>
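
Once the Promise has resolved, the pairs are easy to reshape. The sketch below starts from the sample value given in the description above rather than from a live DataFrame; the variable names are illustrative only.

```javascript
// The sample value from the description, as dtypes() would resolve it.
const dtypesResult = [
  ["name", "StringType"],
  ["age", "IntegerType"],
  ["expense", "IntegerType"]
];

// Build a column-name -> type lookup from the [name, type] pairs.
const typeByColumn = {};
dtypesResult.forEach(function (pair) {
  typeByColumn[pair[0]] = pair[1];
});

// Pick out the names of the integer columns.
const integerColumns = dtypesResult
  .filter(function (pair) { return pair[1] === "IntegerType"; })
  .map(function (pair) { return pair[0]; });

console.log(typeByColumn.age, integerColumns);
```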

except(otherDataFrame) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.
Parameters:
Name Type Description
otherDataFrame module:eclairjs/sql.DataFrame The DataFrame to compare with this DataFrame.
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

explain() → {Promise.<Void>}

Prints the plans (logical and physical) to the console for debugging purposes.
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

filter(column) → {module:eclairjs/sql.DataFrame}

Filters rows using the given SQL expression string or the given Column.
Parameters:
Name Type Description
column string | Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

first() → {module:eclairjs/sql.Row}

Returns the first row.
Source:
Returns:
Type
module:eclairjs/sql.Row

flatMap(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/rdd.RDD
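
The difference between mapping and flat-mapping can be seen with JavaScript's own array methods. In this sketch plain arrays stand in for the rows of a DataFrame; it illustrates the flattening step only and does not call EclairJS.

```javascript
// Plain arrays stand in for the rows of a DataFrame.
const rowsOfWords = [["spark", "sql"], ["data", "frame"]];

// map keeps one output element per input row ...
const mapped = rowsOfWords.map((row) => row.map((w) => w.toUpperCase()));

// ... whereas flatMap concatenates the arrays returned for each row.
const flattened = rowsOfWords.flatMap((row) => row.map((w) => w.toUpperCase()));

console.log(mapped.length, flattened.length);
```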

foreach(func, bindArgsopt) → {Promise.<Void>}

Applies a function func to all rows.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

foreachPartition(func, bindArgsopt) → {Promise.<Void>}

Applies a function to each partition of this DataFrame.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

groupBy() → {module:eclairjs/sql.GroupedData}

Groups the DataFrame using the specified columns, so that aggregations can be run on them.
Parameters:
Type Description
Array.<string> | Array.<Column> Array of Column objects or column name strings
Source:
Returns:
Type
module:eclairjs/sql.GroupedData

head() → {Promise.<module:eclairjs/sql.Row>}

Returns the first row.
Source:
Returns:
Type
Promise.<module:eclairjs/sql.Row>

inputFiles() → {Promise.<Array.<string>>}

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
Source:
Returns:
A Promise that resolves to a list of files.
Type
Promise.<Array.<string>>

intersect(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing only the rows that appear in both this frame and another frame. This is equivalent to INTERSECT in SQL.
Parameters:
Name Type Description
other module:eclairjs/sql.DataFrame
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

isLocal() → {Promise.<boolean>}

Returns true if the collect and take methods can be run locally (without any Spark executors).
Source:
Returns:
Type
Promise.<boolean>

join(Right, columnNamesOrJoinExpropt, joinTypeopt) → {module:eclairjs/sql.DataFrame}

Joins with another DataFrame. When only the Right argument is given, this is a Cartesian join; note that Cartesian joins are very expensive without an extra filter that can be pushed down.
Parameters:
Name Type Attributes Description
Right module:eclairjs/sql.DataFrame Right side of the join operation.
columnNamesOrJoinExpr string | Array.<string> | Column <optional>
If a string or an array of strings (column names), performs an inner equi-join with the other DataFrame using the given columns. Unlike the other join forms, the join columns appear only once in the output, similar to SQL's JOIN USING syntax. If a Column object, performs an inner join with the other DataFrame using the given join expression.
joinType string <optional>
Only valid when a Column join expression is used.
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var joinedDf = df1.join(df2);
// or
var joinedDf = df1.join(df2,"age");
// or
var joinedDf = df1.join(df2, ["age", "DOB"]);
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")));
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")), "outer");
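
The "join columns appear only once" behavior of the column-name form can be modeled on arrays of plain objects. This is a sketch under stated assumptions: joinUsing is a hypothetical helper, it does not call EclairJS, and it ignores the case where non-join columns share a name.

```javascript
// Plain-JavaScript model of an inner equi-join on shared column names.
function joinUsing(leftRows, rightRows, columnNames) {
  const joined = [];
  leftRows.forEach((left) => {
    rightRows.forEach((right) => {
      if (columnNames.every((c) => left[c] === right[c])) {
        // Object spread merges the rows; the join columns hold equal values
        // on both sides, so each appears only once in the result.
        joined.push({ ...left, ...right });
      }
    });
  });
  return joined;
}

const leftPeople = [{ age: 30, name: "Ann" }, { age: 41, name: "Bob" }];
const rightCities = [{ age: 30, city: "Oslo" }, { age: 52, city: "Rome" }];
const joinedRows = joinUsing(leftPeople, rightCities, ["age"]);
console.log(joinedRows);
```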

limit(number) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new DataFrame.
Parameters:
Name Type Description
number integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

map(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by applying a function to all rows of this DataFrame.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/rdd.RDD
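
The role of bindArgs can be sketched with a small model. Note the assumption: the description only says the values are added to func's argument list, and this sketch assumes they are appended after the row argument. mapWithBindArgs is a hypothetical helper, not part of EclairJS.

```javascript
// Hypothetical model of how bindArgs reaches func.
// Assumption: the bound values are appended after the row argument.
function mapWithBindArgs(rows, func, bindArgs) {
  const extra = bindArgs || [];
  return rows.map((row) => func(row, ...extra));
}

const ages = [20, 35, 50];
const shifted = mapWithBindArgs(
  ages,
  function (age, offset, factor) {
    return (age + offset) * factor;
  },
  [1, 2]
);
console.log(shifted);
```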

mapPartitions(func, bindArgsopt) → {module:eclairjs/rdd.RDD}

Returns a new RDD by applying a function to each partition of this DataFrame. Similar to map, but runs separately on each partition (block) of the DataFrame, so func must accept an Array. func should return an array rather than a single item.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs/rdd.RDD
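
The array-in, array-out contract for func can be illustrated without Spark. In this sketch the partitions array and sumPartition are hypothetical; the final reduce only models how per-partition results would be combined.

```javascript
// Hypothetical partitions: each inner array is one partition's rows.
const partitions = [[1, 2, 3], [4, 5], [6]];

// A mapPartitions-style function receives a whole partition (an Array)
// and must return an array, even when it yields a single value.
function sumPartition(rowsInPartition) {
  const total = rowsInPartition.reduce((acc, value) => acc + value, 0);
  return [total];
}

// Model: run the function once per partition and concatenate the results.
const partitionSums = partitions.reduce(
  (out, partition) => out.concat(sumPartition(partition)),
  []
);
console.log(partitionSums);
```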

na() → {module:eclairjs/sql.DataFrameNaFunctions}

Returns a DataFrameNaFunctions for working with missing data.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameNaFunctions

orderBy() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the sort is in ascending order. This is an alias of the sort function.
Parameters:
Name Type Description
columnName,...columnName string | Column Column names (sorted ascending), or Column sort expressions (sortExprs,...sortExprs).
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

persist(newLevel) → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the given storage level.
Parameters:
Name Type Description
newLevel module:eclairjs/storage.StorageLevel
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

printSchema() → {Promise.<Void>}

Prints the schema to the console in a nice tree format.
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

queryExecution() → {module:eclairjs/sql.SQLContextQueryExecution}

Source:
Returns:
Type
module:eclairjs/sql.SQLContextQueryExecution

randomSplit(weights, seed) → {Array.<DataFrame>}

Randomly splits this DataFrame with the provided weights.
Parameters:
Name Type Description
weights Array.<float> weights for splits, will be normalized if they don't sum to 1.
seed int Seed for sampling.
Source:
Returns:
Type
Array.<DataFrame>
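
The normalization mentioned for weights amounts to dividing each weight by their sum. The sketch below shows only that arithmetic; normalizeWeights is a hypothetical helper and is not part of the EclairJS API.

```javascript
// Scale weights proportionally so that they sum to 1.
function normalizeWeights(weights) {
  const total = weights.reduce((acc, w) => acc + w, 0);
  return weights.map((w) => w / total);
}

// Weights of 3 and 1 describe a 75% / 25% split.
const normalized = normalizeWeights([3, 1]);
console.log(normalized);
```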

rdd() → {module:eclairjs/rdd.RDD}

Represents the content of the DataFrame as an RDD of Rows.
Source:
Returns:
Type
module:eclairjs/rdd.RDD

registerTempTable(tableName) → {Promise.<Void>}

Registers this DataFrame as a temporary table using the given name.
Parameters:
Name Type Description
tableName string
Source:
Returns:
A Promise that resolves when the temp table has been created.
Type
Promise.<Void>

repartition(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions.
Parameters:
Name Type Description
numPartitions integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

rollup(columnName...) → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional rollup for the current DataFrame using the specified columns, so that aggregations can be run on them. See GroupedData for all the available aggregate functions.
Parameters:
Name Type Description
columnName... string | Column One or more column names or Column objects.
Source:
Returns:
Type
module:eclairjs/sql.GroupedData
Example
var result = peopleDataFrame.rollup("age", "networth").count();
// or
var col = peopleDataFrame.col("age");
var result = peopleDataFrame.rollup(col).count();
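
In SQL terms, a rollup aggregates hierarchically: over the full column list, then over each shorter prefix, and finally over no columns at all (the grand total). The helper below (rollupGroupingSets, a hypothetical name) only enumerates those grouping sets in plain JavaScript and does not call EclairJS.

```javascript
// Enumerate the hierarchical grouping sets of a rollup: every prefix of
// the column list, from the full list down to the empty (grand total) set.
function rollupGroupingSets(columns) {
  const sets = [];
  for (let n = columns.length; n >= 0; n--) {
    sets.push(columns.slice(0, n));
  }
  return sets;
}

const rollupSets = rollupGroupingSets(["age", "networth"]);
console.log(rollupSets);
```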

sample(withReplacement, fraction, seedopt) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by sampling a fraction of rows, using a random seed.
Parameters:
Name Type Attributes Description
withReplacement boolean
fraction float
seed integer <optional>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

schema() → {module:eclairjs/sql/types.StructType}

Returns the schema of this DataFrame.
Source:
Returns:
Type
module:eclairjs/sql/types.StructType

select() → {module:eclairjs/sql.DataFrame}

Selects a set of column-based expressions.
Parameters:
Type Description
Array.<module:eclairjs/sql.Column> | Array.<string>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

selectExpr() → {module:eclairjs/sql.DataFrame}

Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
Parameters:
Name Type Description
exprs,...exprs string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = peopleDataFrame.selectExpr("name", "age > 19");

show() → {Promise.<Void>}

Displays the top 20 rows of the DataFrame in tabular form.
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

sort() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the sort is in ascending order.
Parameters:
Name Type Description
columnName,...columnName string | Column Column names (sorted ascending), or Column sort expressions (sortExprs,...sortExprs).
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = peopleDataFrame.sort("age", "name");
// or
var col = peopleDataFrame.col("age");
var colExpr = col.desc();
var result = peopleDataFrame.sort(colExpr);

sqlContext() → {module:eclairjs/sql.SQLContext}

Returns the SQLContext for this DataFrame.
Source:
Returns:
Type
module:eclairjs/sql.SQLContext

stat() → {module:eclairjs/sql.DataFrameStatFunctions}

Returns a DataFrameStatFunctions for working with statistic functions.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameStatFunctions
Example
var stat = peopleDataFrame.stat().cov("income", "networth");

take(num) → {Promise.<Array>}

Returns the first num rows in the DataFrame.
Parameters:
Name Type Description
num integer
Source:
Returns:
A Promise that resolves to an array containing the first num elements in this DataFrame.
Type
Promise.<Array>

toDF() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with columns renamed. This can be quite convenient when converting an RDD of tuples into a DataFrame with meaningful names.
Parameters:
Name Type Description
colNames,...colNames string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = nameAgeDF.toDF("newName", "newAge");

toJSON() → {Promise.<Array.<object>>}

Returns the content of the DataFrame as JSON.
Source:
Returns:
A Promise that resolves to an array of JSON objects.
Type
Promise.<Array.<object>>

toRDD() → {module:eclairjs/rdd.RDD}

Returns an RDD object.
Source:
Returns:
Type
module:eclairjs/rdd.RDD

unionAll(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing the union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.
Parameters:
Name Type Description
other module:eclairjs/sql.DataFrame
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

unpersist(blocking) → {Promise.<Void>}

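Marks the DataFrame as non-persistent and removes all blocks for it from memory and disk.
Parameters:
Name Type Description
blocking boolean Whether to block until all blocks are deleted.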
Source:
Returns:
A Promise that resolves to nothing.
Type
Promise.<Void>

where(condition) → {module:eclairjs/sql.DataFrame}

Filters rows using the given Column or SQL expression.
Parameters:
Name Type Description
condition module:eclairjs/sql.Column | string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

withColumn(name, col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
Parameters:
Name Type Description
name string
col module:eclairjs/sql.Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var col = peopleDataFrame.col("age");
var df1 = peopleDataFrame.withColumn("newCol", col);

write() → {module:eclairjs/sql.DataFrameWriter}

Interface for saving the content of the DataFrame out into external storage.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameWriter