JSDoc: Class: DataFrame

Constructor

new DataFrame()

Source:

eclairjs/sql/DataFrame.js, line 37

Examples

var people = sqlContext.read.parquet("...")

// Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in:
// DataFrame (this class), Column, and functions.
// To select a column from the data frame:
var ageCol = people.col("age");

Methods

agg(hashMap) → {module:eclairjs/sql.DataFrame}

Aggregates over the entire DataFrame without groups.

Parameters:

Name	Type	Description
`hashMap`	hashMap	Map from column name to aggregate function name (for example, "age" to "max").

Source:

eclairjs/sql/DataFrame.js, line 62

Returns:

Type: module:eclairjs/sql.DataFrame

Example

// df.agg(...) is a shorthand for df.groupBy().agg(...)
var map = {};
map["age"] = "max";
map["salary"] = "avg";
df.agg(map)
df.groupBy().agg(map)
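
To make the shape of the map concrete, the following is a minimal sketch in plain JavaScript (not EclairJS code) that models what a map such as "age" to "max" and "salary" to "avg" computes when applied to all rows at once. The `aggregate` helper, the `rows` sample data, and the restriction to two aggregate names are assumptions made for illustration only.

```javascript
// Plain-JavaScript model of a whole-table aggregation; this is NOT the EclairJS implementation.
// Hypothetical sample data standing in for the rows of a DataFrame.
var rows = [
  { age: 30, salary: 1000 },
  { age: 45, salary: 3000 },
  { age: 27, salary: 2000 }
];

// Hypothetical helper: applies one named aggregate per column across every row.
function aggregate(rows, exprs) {
  var result = {};
  Object.keys(exprs).forEach(function (colName) {
    var values = rows.map(function (row) { return row[colName]; });
    if (exprs[colName] === "max") {
      result[colName] = Math.max.apply(null, values);
    } else if (exprs[colName] === "avg") {
      var sum = values.reduce(function (a, b) { return a + b; }, 0);
      result[colName] = sum / values.length;
    } else {
      throw new Error("Unsupported aggregate in this sketch: " + exprs[colName]);
    }
  });
  return result;
}

var map = {};
map["age"] = "max";
map["salary"] = "avg";
var summary = aggregate(rows, map);
console.log(summary);
```

The result is a single summary record rather than one record per group, which is the sense in which agg works "without groups".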

apply(colName) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column. Note that the column name can also refer to a nested column, such as a.b.

Parameters:

Name	Type	Description
`colName`	string

Source:

eclairjs/sql/DataFrame.js, line 81

Returns:

Type: module:eclairjs/sql.Column

as(alias) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with an alias set.

Parameters:

Name	Type	Description
`alias`	string

Source:

eclairjs/sql/DataFrame.js, line 72

Returns:

Type: module:eclairjs/sql.DataFrame

cache() → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the default storage level (`MEMORY_ONLY`).

Source:

eclairjs/sql/DataFrame.js, line 88

Returns:

Type: module:eclairjs/sql.DataFrame

coalesce(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions. Similar to coalesce defined on an RDD, this operation results in a narrow dependency. For example, if you go from 1000 partitions to 100 partitions, there will not be a shuffle; instead, each of the 100 new partitions will claim 10 of the current partitions.

Parameters:

Name	Type	Description
`numPartitions`	integer

Source:

eclairjs/sql/DataFrame.js, line 99

Returns:

Type: module:eclairjs/sql.DataFrame
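
The narrow dependency described above can be pictured with a small plain-JavaScript model. This is only a sketch under simplifying assumptions: the `coalescePartitions` helper is hypothetical, partitions are represented by their index, and parent partitions are grouped contiguously, whereas Spark's own coalescing also considers data locality.

```javascript
// Simplified model of a narrow-dependency coalesce; NOT the EclairJS or Spark implementation.
function coalescePartitions(partitions, numPartitions) {
  // This sketch never increases the partition count.
  var target = Math.min(numPartitions, partitions.length);
  var merged = [];
  for (var i = 0; i < target; i++) {
    merged.push([]);
  }
  partitions.forEach(function (partition, index) {
    // Each parent partition is assigned to exactly one child partition, so no shuffle is needed.
    var child = Math.floor(index * target / partitions.length);
    merged[child].push(partition);
  });
  return merged;
}

// 1000 hypothetical partitions, identified here only by their index.
var current = [];
for (var p = 0; p < 1000; p++) {
  current.push(p);
}
var coalesced = coalescePartitions(current, 100);
console.log(coalesced.length, coalesced[0].length);
```

Because every current partition feeds exactly one new partition, no record has to move between new partitions, which is what distinguishes this from a shuffle.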

col(name) → {module:eclairjs/sql.Column}

Selects a column by name and returns it as a Column.

Parameters:

Name	Type	Description
`name`	string

Source:

eclairjs/sql/DataFrame.js, line 107

Returns:

Type: module:eclairjs/sql.Column

collect() → {Array.<module:eclairjs/sql.Row>}

Returns an array that contains all of the Rows in this DataFrame.

Source:

eclairjs/sql/DataFrame.js, line 114

Returns:

Type: Array.<module:eclairjs/sql.Row>

columns() → {Array.<string>}

Returns all column names as an array.

Source:

eclairjs/sql/DataFrame.js, line 127

Returns:

Type: Array.<string>

count() → {integer}

Returns the number of rows in the DataFrame.

Source:

eclairjs/sql/DataFrame.js, line 139

Returns:

Type: integer

cube() → {module:eclairjs/sql.GroupedData}

Create a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.

Parameters:

Name	Type	Description
`cols...`	string \| Column

Source:

eclairjs/sql/DataFrame.js, line 149

Returns:

Type: module:eclairjs/sql.GroupedData

Example

var df = peopleDataFrame.cube("age", "expense");
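
As a rough illustration of what "multi-dimensional cube" means, a cube over a list of columns aggregates over every subset of those columns. The plain-JavaScript sketch below only enumerates those grouping sets; the `cubeGroupingSets` helper is hypothetical and is not part of EclairJS.

```javascript
// Enumerates the grouping sets covered by a cube: every subset of the given columns.
// Illustration only; this is NOT the EclairJS implementation.
function cubeGroupingSets(cols) {
  var sets = [];
  var count = Math.pow(2, cols.length);
  for (var mask = count - 1; mask >= 0; mask--) {
    var set = [];
    for (var i = 0; i < cols.length; i++) {
      // Bit i of the mask decides whether column i is part of this grouping set.
      if (mask & (1 << i)) {
        set.push(cols[i]);
      }
    }
    sets.push(set);
  }
  return sets;
}

var cubeSets = cubeGroupingSets(["age", "expense"]);
console.log(JSON.stringify(cubeSets));
```

The empty grouping set corresponds to the grand total across all rows.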

describe() → {module:eclairjs/sql.DataFrame}

Computes statistics for numeric columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical columns. This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame. If you want to programmatically compute summary statistics, use the agg function instead.

Parameters:

Name	Type	Description
`cols...`	string

Source:

eclairjs/sql/DataFrame.js, line 174

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var df = peopleDataFrame.describe("age", "expense");

distinct()

Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.

Source:

eclairjs/sql/DataFrame.js, line 181

drop(col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column dropped.

Parameters:

Name	Type	Description
`col`	string \| module:eclairjs/sql.Column

Source:

eclairjs/sql/DataFrame.js, line 189

Returns:

Type: module:eclairjs/sql.DataFrame

dropDuplicates(colNames) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that contains only the unique rows from this DataFrame. If colNames is given, only that subset of columns is considered when comparing rows.

Parameters:

Name	Type	Description
`colNames`	Array.<string>

Source:

eclairjs/sql/DataFrame.js, line 197

Returns:

Type: module:eclairjs/sql.DataFrame
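
The effect of passing a subset of columns can be sketched in plain JavaScript. This is a model for illustration, not EclairJS code: the `dropDuplicates` function below is a hypothetical stand-in that works on arrays of objects, and keeping the first row seen for each key is an assumption of the sketch.

```javascript
// Plain-JavaScript model of duplicate removal; NOT the EclairJS implementation.
// Keeping the first occurrence of each key is an assumption of this sketch.
function dropDuplicates(rows, colNames) {
  var seen = {};
  return rows.filter(function (row) {
    // Without colNames, every column takes part in the comparison.
    var cols = colNames || Object.keys(row);
    var key = JSON.stringify(cols.map(function (c) { return row[c]; }));
    if (seen.hasOwnProperty(key)) {
      return false;
    }
    seen[key] = true;
    return true;
  });
}

// Hypothetical sample data.
var people = [
  { name: "Ann", age: 30 },
  { name: "Ann", age: 31 },
  { name: "Bob", age: 30 },
  { name: "Ann", age: 30 }
];
var uniqueRows = dropDuplicates(people);
var uniqueNames = dropDuplicates(people, ["name"]);
console.log(uniqueRows.length, uniqueNames.length);
```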

dtypes() → {Array}

Returns all column names and their data types as an array of arrays. For example: [["name","StringType"],["age","IntegerType"],["expense","IntegerType"]]

Source:

eclairjs/sql/DataFrame.js, line 209

Returns:

Array of Array[2]

Type: Array
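
Since the result is an array of two-element arrays, it is often convenient to turn it into a lookup by column name. The sketch below works on a literal copy of the example value from the description, not on live output; `dtypesToObject` is a hypothetical helper name.

```javascript
// The array below copies the example shape from the description; it is sample data, not live output.
var dtypes = [["name", "StringType"], ["age", "IntegerType"], ["expense", "IntegerType"]];

// Hypothetical helper: turns the array of [name, type] pairs into a name -> type lookup.
function dtypesToObject(pairs) {
  var lookup = {};
  pairs.forEach(function (pair) {
    lookup[pair[0]] = pair[1];
  });
  return lookup;
}

var typeOf = dtypesToObject(dtypes);
// Collect the names of all columns reported as IntegerType.
var integerColumns = dtypes
  .filter(function (pair) { return pair[1] === "IntegerType"; })
  .map(function (pair) { return pair[0]; });
console.log(typeOf.age, integerColumns);
```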

except(otherDataFrame) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.

Parameters:

Name	Type	Description
`otherDataFrame`	module:eclairjs/sql.DataFrame	to compare to this DataFrame

Source:

eclairjs/sql/DataFrame.js, line 225

Returns:

Type: module:eclairjs/sql.DataFrame

explain()

Prints the plans (logical and physical) to the console for debugging purposes.

Source:

eclairjs/sql/DataFrame.js, line 232

filter(arg) → {module:eclairjs/sql.DataFrame}

Filters rows using the given SQL expression string or the given Column.

Parameters:

Name	Type	Description
`arg`	string \| module:eclairjs/sql.Column

Source:

eclairjs/sql/DataFrame.js, line 241

Returns:

Type: module:eclairjs/sql.DataFrame

first()

Returns the first row as a module:eclairjs/sql.Row. Alias for head().

Source:

eclairjs/sql/DataFrame.js, line 269

flatMap(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

eclairjs/sql/DataFrame.js, line 278

Returns:

Type: module:eclairjs.RDD
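
The "apply, then flatten" behaviour and the role of bindArgs can be modelled in plain JavaScript. This is a sketch, not EclairJS code: `flatMapRows` is a hypothetical helper over an array of objects, and the assumption that the bindArgs values follow the row in func's argument list is taken from the parameter description rather than verified against the implementation.

```javascript
// Plain-JavaScript model of flatMap with bindArgs; NOT the EclairJS implementation.
function flatMapRows(rows, func, bindArgs) {
  var extra = bindArgs || [];
  var out = [];
  rows.forEach(function (row) {
    // The row comes first; the bindArgs values follow as additional arguments (assumed order).
    var produced = func.apply(null, [row].concat(extra));
    // Flatten one level: each produced array contributes its elements to the result.
    produced.forEach(function (item) { out.push(item); });
  });
  return out;
}

// Hypothetical sample data.
var rows = [{ words: "a b" }, { words: "c" }];
var tokens = flatMapRows(rows, function (row, separator, prefix) {
  return row.words.split(separator).map(function (w) { return prefix + w; });
}, [" ", "w:"]);
console.log(tokens);
```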

foreach(Function, [bindArgs]) → {void}

Applies a function to all elements of this DataFrame.

Parameters:

Name	Type	Attributes	Description
`Function`	function		with one parameter
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

eclairjs/sql/DataFrame.js, line 293

Returns:

Type: void

Example

df.foreach(function(record) {
   var connection = createNewConnection();
   connection.send(record);
   connection.close();
});

foreachPartition(Function, [bindArgs]) → {void}

Applies a function to each partition of this DataFrame.

Parameters:

Name	Type	Attributes	Description
`Function`	function		with one Array parameter
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

eclairjs/sql/DataFrame.js, line 311

Returns:

Type: void

Example

df.foreachPartition(function(partitionOfRecords) {
   var connection = createNewConnection();
   partitionOfRecords.forEach(function(record) {
      connection.send(record);
   });
   connection.close();
});
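
One practical reason to prefer foreachPartition over foreach for this pattern is that a connection is created once per partition instead of once per record. The plain-JavaScript sketch below counts connections in both styles. Everything in it is hypothetical: `createNewConnection` is a local stub, and the partitions are ordinary nested arrays rather than DataFrame partitions.

```javascript
// Plain-JavaScript model comparing per-record and per-partition connection handling.
var opened = 0;
var sent = [];

// Hypothetical stand-in for the createNewConnection() used in the examples.
function createNewConnection() {
  opened++;
  return {
    send: function (record) { sent.push(record); },
    close: function () {}
  };
}

// Hypothetical data: two partitions holding five records in total.
var partitions = [[1, 2, 3], [4, 5]];

// foreach-style: one connection per record.
partitions.forEach(function (partition) {
  partition.forEach(function (record) {
    var connection = createNewConnection();
    connection.send(record);
    connection.close();
  });
});
var openedPerRecord = opened;

// foreachPartition-style: one connection per partition.
opened = 0;
partitions.forEach(function (partitionOfRecords) {
  var connection = createNewConnection();
  partitionOfRecords.forEach(function (record) {
    connection.send(record);
  });
  connection.close();
});
var openedPerPartition = opened;
console.log(openedPerRecord, openedPerPartition);
```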

groupBy() → {module:eclairjs/sql.GroupedData}

Groups the DataFrame using the specified columns, so we can run aggregation on them.

Parameters:

Type	Description
Array.<string> \| Array.<module:eclairjs/sql.Column>	Array of Column objects or column name strings

Source:

eclairjs/sql/DataFrame.js, line 321

Returns:

Type: module:eclairjs/sql.GroupedData

head() → {module:eclairjs/sql.Row}

Returns the first row.

Source:

eclairjs/sql/DataFrame.js, line 363

Returns:

Type: module:eclairjs/sql.Row

inputFiles() → {Array.<string>}

Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent BaseRelation for its respective files and takes the union of all results. Depending on the source relations, this may not find all input files. Duplicates are removed.

Source:

eclairjs/sql/DataFrame.js, line 372

Returns:

files

Type: Array.<string>

intersect(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing only the rows that appear in both this frame and another frame. This is equivalent to INTERSECT in SQL.

Parameters:

Name	Type	Description
`other`	module:eclairjs/sql.DataFrame

Source:

eclairjs/sql/DataFrame.js, line 385

Returns:

Type: module:eclairjs/sql.DataFrame

isLocal() → {boolean}

Returns true if the collect and take methods can be run locally (without any Spark executors).

Source:

eclairjs/sql/DataFrame.js, line 392

Returns:

Type: boolean

join(Right, [columnNamesOrJoinExpr], [joinType]) → {module:eclairjs/sql.DataFrame}

Joins with another DataFrame. When only the right DataFrame is given, this is a Cartesian join; note that Cartesian joins are very expensive without an extra filter that can be pushed down.

Parameters:

Name	Type	Attributes	Description
`Right`	module:eclairjs/sql.DataFrame		Right side of the join operation.
`columnNamesOrJoinExpr`	string \| Array.<string> \| module:eclairjs/sql.Column	<optional>	If a string or an array of strings, the column names for an inner equi-join with the other DataFrame. Unlike the other join forms, the join columns appear only once in the output, similar to SQL's JOIN USING syntax. If a Column object, the join expression for an inner join with the other DataFrame.
`joinType`	string	<optional>	Type of join; only valid when a Column join expression is used.

Source:

eclairjs/sql/DataFrame.js, line 414

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var joinedDf = df1.join(df2);
// or
var joinedDf = df1.join(df2,"age");
// or
var joinedDf = df1.join(df2, ["age", "DOB"]);
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")));
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")), "outer");
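
The "join columns appear only once" behaviour of the column-name form can be modelled in plain JavaScript. This is a sketch over arrays of objects, not EclairJS code; `joinUsing` and the sample rows are hypothetical, and the sketch does not handle non-join columns that share a name on both sides.

```javascript
// Plain-JavaScript model of an inner equi-join on shared column names (JOIN USING style).
// NOT the EclairJS implementation; same-named non-join columns are not handled.
function joinUsing(left, right, columnNames) {
  var names = [].concat(columnNames); // accept a single string or an array of strings
  var joined = [];
  left.forEach(function (l) {
    right.forEach(function (r) {
      var matches = names.every(function (n) { return l[n] === r[n]; });
      if (!matches) {
        return;
      }
      var row = {};
      // Join columns are copied once, followed by the remaining columns of each side.
      names.forEach(function (n) { row[n] = l[n]; });
      Object.keys(l).forEach(function (k) { if (names.indexOf(k) < 0) { row[k] = l[k]; } });
      Object.keys(r).forEach(function (k) { if (names.indexOf(k) < 0) { row[k] = r[k]; } });
      joined.push(row);
    });
  });
  return joined;
}

// Hypothetical sample data.
var df1Rows = [{ age: 30, name: "Ann" }, { age: 41, name: "Bob" }];
var df2Rows = [{ age: 30, city: "Oslo" }, { age: 52, city: "Rome" }];
var joinedRows = joinUsing(df1Rows, df2Rows, "age");
console.log(JSON.stringify(joinedRows));
```

With a Column join expression instead, both sides' columns would be kept separately, which is why the two forms are documented differently.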

limit(number) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing the first `number` rows. Unlike take, which returns an array of Rows, limit returns a new DataFrame.

Parameters:

Name	Type	Description
`number`	integer

Source:

eclairjs/sql/DataFrame.js, line 439

Returns:

Type: module:eclairjs/sql.DataFrame

map(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by applying a function to all rows of this DataFrame.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

eclairjs/sql/DataFrame.js, line 449

Returns:

Type: module:eclairjs.RDD

mapPartitions(func, [bindArgs]) → {module:eclairjs.RDD}

Returns a new RDD by applying a function to each partition of this DataFrame. Similar to map, but runs separately on each partition (block) of the DataFrame, so func must accept an Array. func should return an array rather than a single item.

Parameters:

Name	Type	Attributes	Description
`func`	function
`bindArgs`	Array.<Object>	<optional>	array whose values will be added to func's argument list.

Source:

eclairjs/sql/DataFrame.js, line 460

Returns:

Type: module:eclairjs.RDD
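
The contract for func (an Array in, an array out) can be illustrated with a plain-JavaScript model. This is a sketch only: the local `mapPartitions` function is a hypothetical stand-in working on nested arrays, and passing the bindArgs values after the partition is an assumption based on the parameter description.

```javascript
// Plain-JavaScript model of mapPartitions; NOT the EclairJS implementation.
function mapPartitions(partitions, func, bindArgs) {
  var extra = bindArgs || [];
  var out = [];
  partitions.forEach(function (partition) {
    // func receives the whole partition as an Array and must return an array.
    var produced = func.apply(null, [partition].concat(extra));
    out = out.concat(produced);
  });
  return out;
}

// Hypothetical data: two partitions of numeric records.
var partitions = [[1, 2, 3], [4, 5]];
// One output element per partition: the partition's sum scaled by a bound argument.
var sums = mapPartitions(partitions, function (records, factor) {
  var total = records.reduce(function (a, b) { return a + b; }, 0);
  return [total * factor];
}, [10]);
console.log(sums);
```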

na() → {module:eclairjs/sql.DataFrameNaFunctions}

Returns a DataFrameNaFunctions for working with missing data.

Source:

eclairjs/sql/DataFrame.js, line 468

Returns:

Type: module:eclairjs/sql.DataFrameNaFunctions

orderBy() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. Columns given by name are sorted in ascending order. This is an alias of the sort function.

Parameters:

Name	Type	Description
`columnName,...columnName`	string \| module:eclairjs/sql.Column	or sortExprs,... sortExprs

Source:

eclairjs/sql/DataFrame.js, line 477

Returns:

Type: module:eclairjs/sql.DataFrame

persist(newLevel) → {module:eclairjs/sql.DataFrame}

Persists this DataFrame with the given storage level.

Parameters:

Name	Type	Description
`newLevel`	module:eclairjs/storage.StorageLevel

Source:

eclairjs/sql/DataFrame.js, line 484

Returns:

Type: module:eclairjs/sql.DataFrame

printSchema()

Prints the schema to the console in a nice tree format.

Source:

eclairjs/sql/DataFrame.js, line 491

queryExecution() → {SQLContextQueryExecution}

Source:

eclairjs/sql/DataFrame.js, line 497

Returns:

Type: SQLContextQueryExecution

randomSplit(weights, seed) → {Array.<module:eclairjs/sql.DataFrame>}

Randomly splits this DataFrame with the provided weights.

Parameters:

Name	Type	Description
`weights`	Array.<float>	weights for splits, will be normalized if they don't sum to 1.
`seed`	integer	Seed for sampling.

Source:

eclairjs/sql/DataFrame.js, line 506

Returns:

Type: Array.<module:eclairjs/sql.DataFrame>
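
The note that weights "will be normalized if they don't sum to 1" can be made concrete with a little arithmetic. The plain-JavaScript sketch below normalizes a weight array and derives cumulative boundaries; how those boundaries are used to assign rows is an assumption of this sketch, and both helper names are hypothetical.

```javascript
// Plain-JavaScript model of weight normalization for a random split.
// NOT the EclairJS implementation.
function normalizeWeights(weights) {
  var total = weights.reduce(function (a, b) { return a + b; }, 0);
  return weights.map(function (w) { return w / total; });
}

// Hypothetical helper: cumulative boundaries in [0, 1]. A uniform random draw per row
// falling in [bounds[i], bounds[i + 1]) would place that row in split i.
function cumulativeBounds(normalized) {
  var bounds = [0];
  normalized.forEach(function (w) {
    bounds.push(bounds[bounds.length - 1] + w);
  });
  return bounds;
}

var normalized = normalizeWeights([1, 1, 2]);
var bounds = cumulativeBounds(normalized);
console.log(normalized, bounds);
```

Weights of [1, 1, 2] therefore describe the same split as [0.25, 0.25, 0.5]. The split sizes are only approximate for any given seed, since rows are assigned at random.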

rdd() → {module:eclairjs.RDD}

Represents the content of the DataFrame as an RDD of Rows.

Source:

eclairjs/sql/DataFrame.js, line 518

Returns:

Type: module:eclairjs.RDD

registerTempTable(tableName)

Registers this DataFrame as a temporary table using the given name.

Parameters:

Name	Type	Description
`tableName`	string

Source:

eclairjs/sql/DataFrame.js, line 525

repartition(numPartitions) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame that has exactly numPartitions partitions.

Parameters:

Name	Type	Description
`numPartitions`	integer

Source:

eclairjs/sql/DataFrame.js, line 533

Returns:

Type: module:eclairjs/sql.DataFrame

rollup(columnName,...columnName) → {module:eclairjs/sql.GroupedData}

Create a multi-dimensional rollup for the current DataFrame using the specified columns, so we can run aggregation on them. See GroupedData for all the available aggregate functions.

Parameters:

Name	Type	Description
`columnName,...columnName`	string \| module:eclairjs/sql.Column	or sortExprs,... sortExprs

Source:

eclairjs/sql/DataFrame.js, line 547

Returns:

Type: module:eclairjs/sql.GroupedData

Example

var result = peopleDataFrame.rollup("age", "networth").count();
// or
var col = peopleDataFrame.col("age");
var result = peopleDataFrame.rollup(col).count();
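
A rollup aggregates hierarchically: over all the given columns, then over each shorter leading prefix, down to the grand total. The plain-JavaScript sketch below only enumerates those grouping sets; `rollupGroupingSets` is a hypothetical helper, not part of EclairJS.

```javascript
// Enumerates the grouping sets covered by a rollup: every leading prefix of the columns.
// Illustration only; this is NOT the EclairJS implementation.
function rollupGroupingSets(cols) {
  var sets = [];
  for (var n = cols.length; n >= 0; n--) {
    // slice(0, n) keeps the first n columns; n === 0 yields the grand-total set.
    sets.push(cols.slice(0, n));
  }
  return sets;
}

var rollupSets = rollupGroupingSets(["age", "networth"]);
console.log(JSON.stringify(rollupSets));
```

Unlike a cube, a rollup does not include a set grouped by "networth" alone, so column order matters.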

sample(withReplacement, fraction, [seed]) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by sampling a fraction of rows, using a random seed.

Parameters:

Name	Type	Attributes
`withReplacement`	boolean
`fraction`	float
`seed`	integer	<optional>

Source:

eclairjs/sql/DataFrame.js, line 567

Returns:

Type: module:eclairjs/sql.DataFrame

schema() → {module:eclairjs/sql/types.StructType}

Returns the schema of this DataFrame.

Source:

eclairjs/sql/DataFrame.js, line 574

Returns:

Type: module:eclairjs/sql/types.StructType

select() → {module:eclairjs/sql.DataFrame}

Selects a set of column-based expressions.

Parameters:

Type	Description
Array.<module:eclairjs/sql.Column> \| Array.<string>

Source:

eclairjs/sql/DataFrame.js, line 582

Returns:

Type: module:eclairjs/sql.DataFrame

selectExpr() → {module:eclairjs/sql.DataFrame}

Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.

Parameters:

Name	Type	Description
`exprs,...exprs`	string

Source:

eclairjs/sql/DataFrame.js, line 597

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = peopleDataFrame.selectExpr("name", "age > 19");

show([numberOfRows], [truncate])

Displays the DataFrame rows in a tabular form.

Parameters:

Name	Type	Attributes	Description
`numberOfRows`	integer	<optional>	defaults to 20.
`truncate`	boolean	<optional>	defaults to false. Whether to truncate long strings. If true, strings longer than 20 characters are truncated and all cells are right-aligned.

Source:

eclairjs/sql/DataFrame.js, line 636
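
The truncation and right-alignment rule can be sketched for a single cell in plain JavaScript. This is a model, not the EclairJS implementation: `formatCell` is a hypothetical helper, and shortening to the first 17 characters followed by "..." is an assumption about how a truncated value is rendered.

```javascript
// Plain-JavaScript model of how one cell might be rendered; NOT the EclairJS implementation.
function formatCell(value, truncate, width) {
  var text = String(value);
  if (truncate && text.length > 20) {
    // Assumed rendering: keep 17 characters and mark the cut with "...".
    text = text.substring(0, 17) + "...";
  }
  // Right-align by padding on the left up to the column width.
  while (text.length < width) {
    text = " " + text;
  }
  return text;
}

var longValue = "abcdefghijklmnopqrstuvwxyz";
var truncatedCell = formatCell(longValue, true, 20);
var fullCell = formatCell(longValue, false, 20);
var shortCell = formatCell("Ann", true, 5);
console.log("[" + truncatedCell + "]", "[" + fullCell + "]", "[" + shortCell + "]");
```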

sort() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame sorted by the specified columns. Columns given by name are sorted in ascending order.

Parameters:

Name	Type	Description
`columnName,...columnName`	string \| module:eclairjs/sql.Column	or sortExprs,... sortExprs

Source:

eclairjs/sql/DataFrame.js, line 652

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = peopleDataFrame.sort("age", "name");
// or
var col = peopleDataFrame.col("age");
var colExpr = col.desc();
var result = peopleDataFrame.sort(colExpr);

sqlContext() → {module:eclairjs/sql.SQLContext}

Returns SQLContext

Source:

eclairjs/sql/DataFrame.js, line 669

Returns:

Type: module:eclairjs/sql.SQLContext

stat() → {module:eclairjs/sql.DataFrameStatFunctions}

Returns a DataFrameStatFunctions for working with statistical functions.

Source:

eclairjs/sql/DataFrame.js, line 679

Returns:

Type: module:eclairjs/sql.DataFrameStatFunctions

Example

var stat = peopleDataFrame.stat().cov("income", "networth");

take(num) → {Array.<module:eclairjs/sql.Row>}

Returns the first num rows in the DataFrame.

Parameters:

Name	Type	Description
`num`	integer

Source:

eclairjs/sql/DataFrame.js, line 687

Returns:

Type: Array.<module:eclairjs/sql.Row>

toDF() → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with columns renamed. This can be quite convenient when converting an RDD of tuples into a DataFrame with meaningful names.

Parameters:

Name	Type	Description
`colNames,...colNames`	string

Source:

eclairjs/sql/DataFrame.js, line 703

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var result = nameAgeDF.toDF("newName", "newAge");
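
The renaming is positional: the first new name replaces the first column, and so on. The plain-JavaScript sketch below models this on a simple table object. It is not EclairJS code: the `{columns, rows}` representation, the local `toDF` function, and the `_1`/`_2` starting names are assumptions for illustration, as is the check that the number of names matches the number of columns.

```javascript
// Plain-JavaScript model of positional column renaming; NOT the EclairJS implementation.
// Model: a table is a list of column names plus rows stored as positional arrays.
function toDF(table, newNames) {
  if (newNames.length !== table.columns.length) {
    throw new Error("Expected " + table.columns.length + " names, got " + newNames.length);
  }
  // Only the names change; the row data is reused as-is.
  return { columns: newNames.slice(), rows: table.rows };
}

// Hypothetical table with placeholder column names.
var nameAge = { columns: ["_1", "_2"], rows: [["Ann", 30], ["Bob", 41]] };
var renamed = toDF(nameAge, ["newName", "newAge"]);
console.log(renamed.columns, nameAge.columns);
```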

toJSON() → {object}

Returns the content of the DataFrame as JSON.

Source:

eclairjs/sql/DataFrame.js, line 711

Returns:

Type: object

toRDD() → {module:eclairjs.RDD}

Represents the content of the DataFrame as an RDD of Rows.

Source:

eclairjs/sql/DataFrame.js, line 722

Returns:

Type: module:eclairjs.RDD

unionAll(other) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame containing union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.

Parameters:

Name	Type	Description
`other`	module:eclairjs/sql.DataFrame

Source:

eclairjs/sql/DataFrame.js, line 730

Returns:

Type: module:eclairjs/sql.DataFrame

unpersist(blocking)

Parameters:

Name	Type	Description
`blocking`	boolean

Source:

eclairjs/sql/DataFrame.js, line 736

where(condition) → {module:eclairjs/sql.DataFrame}

Filters rows using the given Column or SQL expression.

Parameters:

Name	Type	Description
`condition`	module:eclairjs/sql.Column \| string

Source:

eclairjs/sql/DataFrame.js, line 744

Returns:

Type: module:eclairjs/sql.DataFrame

withColumn(name, col) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.

Parameters:

Name	Type	Description
`name`	string
`col`	module:eclairjs/sql.Column

Source:

eclairjs/sql/DataFrame.js, line 756

Returns:

Type: module:eclairjs/sql.DataFrame

Example

var col = peopleDataFrame.col("age");
var df1 = peopleDataFrame.withColumn("newCol", col);

withColumnRenamed(existingName, newName) → {module:eclairjs/sql.DataFrame}

Returns a new DataFrame with a column renamed. This is a no-op if schema doesn't contain existingName.

Parameters:

Name	Type	Description
`existingName`	string
`newName`	string

Source:

eclairjs/sql/DataFrame.js, line 765

Returns:

Type: module:eclairjs/sql.DataFrame
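
The no-op case is easy to overlook, so here is a minimal plain-JavaScript model of the rule. It is a sketch rather than EclairJS code: the `{columns, rows}` table representation and the local `withColumnRenamed` function are hypothetical.

```javascript
// Plain-JavaScript model of renaming a single column; NOT the EclairJS implementation.
function withColumnRenamed(table, existingName, newName) {
  var index = table.columns.indexOf(existingName);
  if (index < 0) {
    // No column with that name: return the table unchanged (the documented no-op).
    return table;
  }
  var columns = table.columns.slice();
  columns[index] = newName;
  return { columns: columns, rows: table.rows };
}

// Hypothetical table.
var peopleTable = { columns: ["name", "age"], rows: [["Ann", 30]] };
var renamedTable = withColumnRenamed(peopleTable, "age", "years");
var untouched = withColumnRenamed(peopleTable, "salary", "pay");
console.log(renamedTable.columns, untouched.columns);
```

Because a misspelled existingName does not raise an error, checking the resulting column names is a cheap way to confirm that the rename took effect.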

write() → {module:eclairjs/sql.DataFrameWriter}

Interface for saving the content of the DataFrame out into external storage.

Source:

eclairjs/sql/DataFrame.js, line 772

Returns:

Type: module:eclairjs/sql.DataFrameWriter