Class: DataFrame

eclairjs/sql.DataFrame

A distributed collection of data organized into named columns. A DataFrame is equivalent to a relational table in Spark SQL.

Constructor

new DataFrame()

Source:
Examples
var people = sqlContext.read().parquet("...");
// Once created, a DataFrame can be manipulated using the domain-specific-language (DSL) functions defined in
// DataFrame (this class), Column, and functions.
// To select a column from the DataFrame:
var ageCol = people.col("age");

Methods

agg(hashMap) → {module:eclairjs/sql.DataFrame}

Aggregates on the entire DataFrame without groups.
Parameters:
Name Type Description
hashMap hashMap Aggregate expressions (exprs): a map from column name to aggregate function name
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
// df.agg(...) is a shorthand for df.groupBy().agg(...)
var map = {};
map["age"] = "max";
map["salary"] = "avg";
df.agg(map);
// is equivalent to
df.groupBy().agg(map);
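
To make the column-to-function map concrete, the following plain JavaScript sketch mimics what "max" and "avg" would compute over a small in-memory array of rows. It does not use EclairJS: aggregateRows, the sample rows, and the "max(age)" style result keys are assumptions made only for illustration.

```javascript
// Plain-JavaScript illustration only; aggregateRows is a hypothetical helper,
// not part of EclairJS. Rows are modeled as simple objects.
function aggregateRows(rows, exprs) {
  var result = {};
  Object.keys(exprs).forEach(function (colName) {
    var values = rows.map(function (row) { return row[colName]; });
    var fn = exprs[colName];
    if (fn === "max") {
      result["max(" + colName + ")"] = Math.max.apply(null, values);
    } else if (fn === "avg") {
      var sum = values.reduce(function (a, b) { return a + b; }, 0);
      result["avg(" + colName + ")"] = sum / values.length;
    } else {
      throw new Error("Unsupported aggregate in this sketch: " + fn);
    }
  });
  return result;
}

var rows = [
  { age: 30, salary: 1000 },
  { age: 45, salary: 3000 },
  { age: 27, salary: 2000 }
];
var map = {};
map["age"] = "max";
map["salary"] = "avg";

// One summary object for the whole data set, with no grouping.
var summary = aggregateRows(rows, map);
console.log(JSON.stringify(summary));
```

The single result object corresponds to the one-row result an ungrouped aggregation would be expected to produce.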

apply(colName) → {module:eclairjs/sql.Column}

Selects a column based on the column name and returns it as a Column. Note that the column name can also refer to a nested column such as a.b.
Parameters:
Name Type Description
colName string
Source:
Returns:
Type
module:eclairjs/sql.Column

as(alias) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with an alias set.
Parameters:
Name Type Description
alias string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

cache() → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the default storage level (`MEMORY_AND_DISK`).
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

coalesce(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency: for example, if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.
Parameters:
Name Type Description
numPartitions integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
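
The 1000-to-100 example can be sketched in plain JavaScript. This is only a model of the idea that each new partition claims whole existing partitions; coalescePartitions is a hypothetical helper, and the choice of contiguous runs is an assumption of the sketch, not a statement about how Spark assigns partitions.

```javascript
// Plain-JavaScript illustration only; coalescePartitions is a hypothetical helper.
// Each new partition claims a contiguous run of whole existing partitions,
// so no individual row has to be redistributed (no shuffle).
// Assumption: this sketch never increases the partition count.
function coalescePartitions(partitions, numPartitions) {
  var target = Math.min(numPartitions, partitions.length);
  var merged = [];
  for (var i = 0; i < target; i++) {
    var start = Math.floor(i * partitions.length / target);
    var end = Math.floor((i + 1) * partitions.length / target);
    // Concatenate the claimed partitions into one new partition.
    merged.push([].concat.apply([], partitions.slice(start, end)));
  }
  return merged;
}

// 1000 single-row partitions coalesced down to 100.
var current = [];
for (var p = 0; p < 1000; p++) {
  current.push(["row" + p]);
}
var coalesced = coalescePartitions(current, 100);
console.log(coalesced.length, coalesced[0].length);
```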

col(name) → {module:eclairjs/sql.Column}

Selects a column based on the column name and returns it as a Column.
Parameters:
Name Type Description
name string
Source:
Returns:
Type
module:eclairjs/sql.Column

collect() → {Array.<module:eclairjs/sql.Row>}

Returns an array that contains all of the Rows in this DataFrame.
Source:
Returns:
Type
Array.<module:eclairjs/sql.Row>

columns() → {Array.<string>}

Returns all column names as an array.
Source:
Returns:
Type
Array.<string>

count() → {integer}

Returns the number of rows in the DataFrame.
Source:
Returns:
Type
integer

cube() → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional cube for the current DataFrame using the specified columns, so that aggregations can be run on them.
Parameters:
Name Type Description
cols... string | module:eclairjs/sql.Column
Source:
Returns:
Type
module:eclairjs/sql.GroupedData
Example
var df = peopleDataFrame.cube("age", "expense");

describe() → {module:eclairjs/sql.DataFrame}

Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns. This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.
Parameters:
Name Type Description
cols... string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var df = peopleDataFrame.describe("age", "expense");

distinct()

Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.
Source:

drop(col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column dropped.
Parameters:
Name Type Description
col string | module:eclairjs/sql.Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

dropDuplicates(colNames) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. If colNames is given, only that subset of columns is considered when comparing rows.
Parameters:
Name Type Description
colNames Array.<string>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
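
The effect of passing colNames can be sketched in plain JavaScript on an in-memory array. dropDuplicateRows is a hypothetical helper, not an EclairJS function, and keeping the first row seen for each key is an assumption of the sketch; the description above does not say which duplicate survives.

```javascript
// Plain-JavaScript illustration only; dropDuplicateRows is a hypothetical helper.
// Rows are compared on colNames when given, otherwise on all of their columns.
function dropDuplicateRows(rows, colNames) {
  var seen = {};
  return rows.filter(function (row) {
    var cols = colNames || Object.keys(row);
    var key = JSON.stringify(cols.map(function (c) { return row[c]; }));
    if (Object.prototype.hasOwnProperty.call(seen, key)) {
      return false; // an earlier row already had the same values
    }
    seen[key] = true;
    return true;
  });
}

var rows = [
  { name: "Ann", age: 30 },
  { name: "Ann", age: 31 },
  { name: "Bob", age: 30 },
  { name: "Ann", age: 30 }
];

var uniqueRows = dropDuplicateRows(rows);             // compares name and age
var uniqueNames = dropDuplicateRows(rows, ["name"]);  // compares name only
console.log(uniqueRows.length, uniqueNames.length);
```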

dtypes() → {Array}

Returns all column names and their data types as an array of arrays, for example [["name","StringType"],["age","IntegerType"],["expense","IntegerType"]].
Source:
Returns:
An array of two-element arrays, each holding a column name and its data type
Type
Array

except(otherDataFrame) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.
Parameters:
Name Type Description
otherDataFrame module:eclairjs/sql.DataFrame to compare to this DataFrame
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

explain()

Prints the plans (logical and physical) to the console for debugging purposes.
Source:

filter(arg) → {module:eclairjs/sql.DataFrame}

Filters rows using the given SQL expression string or the given Column.
Parameters:
Name Type Description
arg string | module:eclairjs/sql.Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

first() → {module:eclairjs/sql.Row}

Returns the first row. Alias for head().
Source:
Returns:
Type
module:eclairjs/sql.Row

flatMap(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.RDD

foreach(Function, [bindArgs]) → {void}

Applies a function to all elements of this DataFrame.
Parameters:
Name Type Attributes Description
Function function with one parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
void
Example
df.foreach(function(record) {
   var connection = createNewConnection();
   connection.send(record);
   connection.close();
});

foreachPartition(Function, [bindArgs]) → {void}

Applies a function to each partition of this DataFrame.
Parameters:
Name Type Attributes Description
Function function with one Array parameter
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
void
Example
df.foreachPartition(function(partitionOfRecords) {
   var connection = createNewConnection();
   partitionOfRecords.forEach(function(record) {
      connection.send(record);
   });
   connection.close();
});
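
A practical reason to prefer foreachPartition over foreach for this pattern is the number of connections opened. The plain JavaScript sketch below models partitions as nested arrays and counts calls to createNewConnection, which here is a hypothetical stand-in written for the sketch rather than a real API.

```javascript
// Plain-JavaScript illustration only. createNewConnection is a hypothetical
// stand-in that simply counts how many times it is called.
var connectionsOpened = 0;
function createNewConnection() {
  connectionsOpened++;
  var sent = [];
  return {
    send: function (record) { sent.push(record); },
    close: function () {}
  };
}

// Two partitions holding five records in total.
var partitions = [["a", "b", "c"], ["d", "e"]];

// foreach style: one connection per record.
partitions.forEach(function (partition) {
  partition.forEach(function (record) {
    var connection = createNewConnection();
    connection.send(record);
    connection.close();
  });
});
var perRecord = connectionsOpened;

// foreachPartition style: one connection per partition.
connectionsOpened = 0;
partitions.forEach(function (partitionOfRecords) {
  var connection = createNewConnection();
  partitionOfRecords.forEach(function (record) {
    connection.send(record);
  });
  connection.close();
});
var perPartition = connectionsOpened;

console.log(perRecord, perPartition); // 5 2
```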

groupBy() → {module:eclairjs/sql.GroupedData}

Groups the DataFrame using the specified columns, so that aggregations can be run on them.
Parameters:
Type Description
Array.<string> | Array.<module:eclairjs/sql.Column> Array of Column objects or column name strings
Source:
Returns:
Type
module:eclairjs/sql.GroupedData

head() → {module:eclairjs/sql.Row}

Returns the first row.
Source:
Returns:
Type
module:eclairjs/sql.Row

inputFiles() → {Array.<string>}

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.
Source:
Returns:
files
Type
Array.<string>

intersect(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing only the rows that appear in both this frame and another frame. This is equivalent to INTERSECT in SQL.
Parameters:
Name Type Description
other module:eclairjs/sql.DataFrame
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

isLocal() → {boolean}

Returns true if the collect and take methods can be run locally (without any Spark executors).
Source:
Returns:
Type
boolean

join(Right, [columnNamesOrJoinExpr], [joinType]) → {module:eclairjs/sql.DataFrame}

Joins this DataFrame with another DataFrame. When only the right-side DataFrame is given, this is a Cartesian join. Note that Cartesian joins are very expensive without an extra filter that can be pushed down.
Parameters:
Name Type Attributes Description
Right module:eclairjs/sql.DataFrame Right side of the join operation.
columnNamesOrJoinExpr string | Array.<string> | module:eclairjs/sql.Column <optional>
If a string or an array of strings (column names), performs an inner equi-join with the other DataFrame using the given columns. Unlike other join functions, the join columns appear only once in the output, similar to SQL's JOIN USING syntax. If a Column object (a join expression), performs an inner join with the other DataFrame using the given join expression.
joinType string <optional>
Only valid when a Column join expression is used.
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var joinedDf = df1.join(df2);
// or
var joinedDf = df1.join(df2,"age");
// or
var joinedDf = df1.join(df2, ["age", "DOB"]);
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")));
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")), "outer");
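
The statement that join columns appear only once when column names are passed can be sketched in plain JavaScript. joinUsing is a hypothetical helper operating on in-memory arrays, not an EclairJS function; it assumes that non-join columns have distinct names on the two sides.

```javascript
// Plain-JavaScript illustration only; joinUsing is a hypothetical helper.
// Inner equi-join on the given column names, JOIN USING style:
// each join column is emitted once, taken from the left row.
function joinUsing(leftRows, rightRows, colNames) {
  var out = [];
  leftRows.forEach(function (l) {
    rightRows.forEach(function (r) {
      var matches = colNames.every(function (c) { return l[c] === r[c]; });
      if (!matches) {
        return;
      }
      var row = {};
      Object.keys(l).forEach(function (k) { row[k] = l[k]; });
      Object.keys(r).forEach(function (k) {
        // Skip the join columns on the right side so they are not repeated.
        if (colNames.indexOf(k) === -1) { row[k] = r[k]; }
      });
      out.push(row);
    });
  });
  return out;
}

var people = [{ name: "Ann", age: 30 }, { name: "Bob", age: 41 }];
var salaries = [{ name: "Ann", salary: 1000 }, { name: "Cid", salary: 900 }];

var joined = joinUsing(people, salaries, ["name"]);
console.log(JSON.stringify(joined));
```

With a Column join expression, by contrast, both sides' columns would be expected in the output, which is why the two name columns in the last two examples above are referenced through df1 and df2.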

limit(number) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by taking the first n rows. The difference between this function and head is that head returns an array while limit returns a new DataFrame.
Parameters:
Name Type Description
number integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

map(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by applying a function to all rows of this DataFrame.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.RDD

mapPartitions(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by applying a function to each partition of this DataFrame. Similar to map, but runs separately on each partition (block) of the DataFrame, so func must accept an Array. func should return an array rather than a single item.
Parameters:
Name Type Attributes Description
func function
bindArgs Array.<Object> <optional>
array whose values will be added to func's argument list.
Source:
Returns:
Type
module:eclairjs.RDD
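
The array-in, array-out contract for func can be sketched in plain JavaScript. mapPartitionsLocal is a hypothetical helper that models partitions as nested arrays; it is not part of EclairJS and only illustrates how per-partition results are combined.

```javascript
// Plain-JavaScript illustration only; mapPartitionsLocal is a hypothetical helper.
// func receives one whole partition (an array) and must return an array;
// the per-partition arrays are concatenated into the final result.
function mapPartitionsLocal(partitions, func) {
  var results = [];
  partitions.forEach(function (partition) {
    var mapped = func(partition);
    if (!Array.isArray(mapped)) {
      throw new TypeError("func must return an array");
    }
    results = results.concat(mapped);
  });
  return results;
}

var partitions = [[1, 2, 3], [4, 5]];

// Produce one summary value per partition, wrapped in an array.
var sums = mapPartitionsLocal(partitions, function (rows) {
  var total = rows.reduce(function (a, b) { return a + b; }, 0);
  return [total];
});
console.log(JSON.stringify(sums)); // [6,9]
```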

na() → {module:eclairjs/sql.DataFrameNaFunctions}

Returns a DataFrameNaFunctions for working with missing data.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameNaFunctions

orderBy() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the rows are sorted in ascending order. This is an alias of the sort function.
Parameters:
Name Type Description
columnName,...columnName string | module:eclairjs/sql.Column or sortExprs,...sortExprs
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

persist(newLevel) → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the given storage level.
Parameters:
Name Type Description
newLevel module:eclairjs/storage.StorageLevel
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

printSchema()

Prints the schema to the console in a nice tree format.
Source:

queryExecution() → {SQLContextQueryExecution}

Source:
Returns:
Type
SQLContextQueryExecution

randomSplit(weights, seed) → {Array.<module:eclairjs/sql.DataFrame>}

Randomly splits this DataFrame with the provided weights.
Parameters:
Name Type Description
weights Array.<float> Weights for the splits; they are normalized if they do not sum to 1.
seed int Seed for sampling.
Source:
Returns:
Type
Array.<module:eclairjs/sql.DataFrame>
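
The normalization of weights can be sketched in plain JavaScript. normalizeWeights and splitBoundaries are hypothetical helpers; turning the normalized weights into cumulative boundaries on the interval from 0 to 1 is an assumption about one reasonable way to assign rows to splits, not a description of the actual implementation.

```javascript
// Plain-JavaScript illustration only; both helpers are hypothetical.
// Scale the weights so that they sum to 1.
function normalizeWeights(weights) {
  var total = weights.reduce(function (a, b) { return a + b; }, 0);
  if (total <= 0) {
    throw new Error("weights must have a positive sum");
  }
  return weights.map(function (w) { return w / total; });
}

// Cumulative boundaries: split i covers the range [bounds[i], bounds[i + 1]).
function splitBoundaries(weights) {
  var bounds = [0];
  normalizeWeights(weights).forEach(function (w) {
    bounds.push(bounds[bounds.length - 1] + w);
  });
  return bounds;
}

var normalized = normalizeWeights([3, 1]);
var bounds = splitBoundaries([3, 1]);
console.log(JSON.stringify(normalized), JSON.stringify(bounds));
```

Under this model, weights of [3, 1] and [0.75, 0.25] describe the same split.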

rdd() → {module:eclairjs.RDD}

Represents the content of the DataFrame as an RDD of Rows.
Source:
Returns:
Type
module:eclairjs.RDD

registerTempTable(tableName)

Registers this DataFrame as a temporary table using the given name.
Parameters:
Name Type Description
tableName string
Source:

repartition(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions.
Parameters:
Name Type Description
numPartitions integer
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

rollup(columnName,...columnName) → {module:eclairjs/sql.GroupedData}

Creates a multi-dimensional rollup for the current DataFrame using the specified columns, so that aggregations can be run on them. See GroupedData for all the available aggregate functions.
Parameters:
Name Type Description
columnName,...columnName string | module:eclairjs/sql.Column or sortExprs,...sortExprs
Source:
Returns:
Type
module:eclairjs/sql.GroupedData
Example
var result = peopleDataFrame.rollup("age", "networth").count();
// or
var col = peopleDataFrame.col("age");
var result = peopleDataFrame.rollup(col).count();

sample(withReplacement, fraction, [seed]) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by sampling a fraction of rows, using a random seed.
Parameters:
Name Type Attributes Description
withReplacement boolean
fraction float
seed integer <optional>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

schema() → {module:eclairjs/sql/types.StructType}

Returns the schema of this DataFrame.
Source:
Returns:
Type
module:eclairjs/sql/types.StructType

select() → {module:eclairjs/sql.DataFrame}

Selects a set of column-based expressions.
Parameters:
Type Description
Array.<module:eclairjs/sql.Column> | Array.<string>
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

selectExpr() → {module:eclairjs/sql.DataFrame}

Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
Parameters:
Name Type Description
exprs,...exprs string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = peopleDataFrame.selectExpr("name", "age > 19");

show([numberOfRows], [truncate])

Displays the DataFrame rows in a tabular form.
Parameters:
Name Type Attributes Description
numberOfRows integer <optional>
Defaults to 20.
truncate boolean <optional>
Defaults to false. Whether to truncate long strings. If true, strings of more than 20 characters are truncated and all cells are right-aligned.
Source:

sort() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. When column names are given, the rows are sorted in ascending order.
Parameters:
Name Type Description
columnName,...columnName string | module:eclairjs/sql.Column or sortExprs,...sortExprs
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = peopleDataFrame.sort("age", "name");
// or
var col = peopleDataFrame.col("age");
var colExpr = col.desc();
var result = peopleDataFrame.sort(colExpr);

sqlContext() → {module:eclairjs/sql.SQLContext}

Returns the SQLContext for this DataFrame.
Source:
Returns:
Type
module:eclairjs/sql.SQLContext

stat() → {module:eclairjs/sql.DataFrameStatFunctions}

Returns a DataFrameStatFunctions for working with statistical functions.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameStatFunctions
Example
var stat = peopleDataFrame.stat().cov("income", "networth");

take(num) → {Array.<module:eclairjs/sql.Row>}

Returns the first num rows in the DataFrame.
Parameters:
Name Type Description
num integer
Source:
Returns:
Type
Array.<module:eclairjs/sql.Row>

toDF() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with columns renamed. This can be quite convenient when converting an RDD of tuples into a DataFrame with meaningful names.
Parameters:
Name Type Description
colNames,...colNames string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var result = nameAgeDF.toDF("newName", "newAge");

toJSON() → {object}

Returns the content of the DataFrame as JSON.
Source:
Returns:
Type
object

toRDD() → {module:eclairjs.RDD}

Represents the content of the DataFrame as an RDD of Rows.
Source:
Returns:
Type
module:eclairjs.RDD

unionAll(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.
Parameters:
Name Type Description
other module:eclairjs/sql.DataFrame
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

unpersist(blocking)

Marks this DataFrame as non-persistent and removes all blocks for it from memory and disk.
Parameters:
Name Type Description
blocking boolean
Source:

where(condition) → {module:eclairjs/sql.DataFrame}

Filters rows using the given Column or SQL expression.
Parameters:
Name Type Description
condition module:eclairjs/sql.Column | string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame

withColumn(name, col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
Parameters:
Name Type Description
name string
col module:eclairjs/sql.Column
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
Example
var col = peopleDataFrame.col("age");
var df1 = peopleDataFrame.withColumn("newCol", col);

withColumnRenamed(existingName, newName) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column renamed. This is a no-op if the schema doesn't contain existingName.
Parameters:
Name Type Description
existingName string
newName string
Source:
Returns:
Type
module:eclairjs/sql.DataFrame
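
The no-op behavior can be sketched in plain JavaScript on a list of column names. renameColumn is a hypothetical helper used only to illustrate the rule; it is not an EclairJS function and operates on a plain array rather than a schema.

```javascript
// Plain-JavaScript illustration only; renameColumn is a hypothetical helper.
// Returns a new list of column names with existingName replaced by newName.
// If existingName is not present, the returned list equals the input.
function renameColumn(columns, existingName, newName) {
  return columns.map(function (c) {
    return c === existingName ? newName : c;
  });
}

var columns = ["name", "age"];
var renamed = renameColumn(columns, "age", "years");
var unchanged = renameColumn(columns, "salary", "pay"); // no such column
console.log(JSON.stringify(renamed), JSON.stringify(unchanged));
```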

write() → {module:eclairjs/sql.DataFrameWriter}

Interface for saving the content of the DataFrame out into external storage.
Source:
Returns:
Type
module:eclairjs/sql.DataFrameWriter