Class: RelationalGroupedDataset

eclairjs/sql.RelationalGroupedDataset

A set of methods for aggregations on a DataFrame, created by groupBy. The main method is the agg function, which has multiple variants. This class also contains convenience methods for some first-order statistics, such as mean and sum. This class was named `GroupedData` in Spark 1.x.

Constructor

new RelationalGroupedDataset()

Since:

EclairJS 0.7 Spark 2.0.0

Source:

sql/RelationalGroupedDataset.js, line 33

Methods

agg()

Compute aggregates by specifying a series of aggregate columns. Note that this function by default retains the grouping columns in its output. To not retain grouping columns, set spark.sql.retainGroupColumns to false. The available aggregate methods are defined in functions.

Parameters:

Name	Type	Description
`columnExpr,...columnExpr`	module:eclairjs/sql.Column | string	One or more aggregate column expressions, or column names given as strings (columnName, ...columnName).

Since:

EclairJS 0.1 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 49

Returns:

module:eclairjs/sql.Dataset

Example

// Java:
df.groupBy("department").agg(max("age"), sum("expense"));
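To make the semantics of agg concrete, the following is a minimal sketch in plain JavaScript that emulates grouping and aggregation over an in-memory array. It does not use EclairJS or Spark; `groupByAgg`, the `{ name, fn }` aggregate shape, and the sample rows are hypothetical names introduced only for illustration. It keeps the grouping column in each output row, mirroring the default behavior described above.

```javascript
// Plain-JavaScript emulation of groupBy(...).agg(...); not EclairJS code.
// Rows are plain objects; each aggregate is a hypothetical { name, fn } pair.
function groupByAgg(rows, groupCol, aggregates) {
  // Bucket the rows by the value of the grouping column.
  const groups = new Map();
  for (const row of rows) {
    const key = row[groupCol];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  const result = [];
  for (const [key, members] of groups) {
    // The grouping column is retained in the output row.
    const out = { [groupCol]: key };
    for (const { name, fn } of aggregates) {
      out[name] = fn(members);
    }
    result.push(out);
  }
  return result;
}

const employees = [
  { department: "eng", age: 30, expense: 100 },
  { department: "eng", age: 45, expense: 50 },
  { department: "ops", age: 28, expense: 70 },
];

const aggResult = groupByAgg(employees, "department", [
  { name: "max(age)", fn: (rs) => Math.max(...rs.map((r) => r.age)) },
  { name: "sum(expense)", fn: (rs) => rs.reduce((t, r) => t + r.expense, 0) },
]);
console.log(aggResult);
```

Each output row carries the department plus one field per aggregate, which is the shape the agg example above would produce for the same grouping.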

avg(…colNames)

Compute the mean value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are specified, the mean is computed only for those columns.

Parameters:

Name	Type	Attributes	Description
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 134

Returns:

module:eclairjs/sql.Dataset
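The difference between calling avg with and without column names can be sketched in plain JavaScript as follows. This is an in-memory emulation, not EclairJS code; `groupAvg` and the sample data are hypothetical. Skipping the grouping column when no names are given is a simplification chosen for this sketch.

```javascript
// Plain-JavaScript emulation of groupBy(...).avg(...colNames); not EclairJS code.
function groupAvg(rows, groupCol, ...colNames) {
  // With no column names, fall back to every column holding a number in the
  // first row, skipping the grouping column (a simplification of this sketch).
  const cols = colNames.length > 0
    ? colNames
    : Object.keys(rows[0] || {}).filter(
        (c) => c !== groupCol && typeof rows[0][c] === "number"
      );
  const sums = new Map(); // group key -> { count, totals per column }
  for (const row of rows) {
    const key = row[groupCol];
    if (!sums.has(key)) {
      sums.set(key, {
        count: 0,
        totals: Object.fromEntries(cols.map((c) => [c, 0])),
      });
    }
    const acc = sums.get(key);
    acc.count += 1;
    for (const c of cols) acc.totals[c] += row[c];
  }
  // Emit one row per group: the grouping column plus one mean per column.
  return [...sums].map(([key, acc]) => {
    const out = { [groupCol]: key };
    for (const c of cols) out[`avg(${c})`] = acc.totals[c] / acc.count;
    return out;
  });
}

const staff = [
  { department: "eng", age: 30, expense: 100 },
  { department: "eng", age: 40, expense: 50 },
  { department: "ops", age: 28, expense: 70 },
];

const avgAll = groupAvg(staff, "department");            // all numeric columns
const avgAgeOnly = groupAvg(staff, "department", "age"); // only the named column
console.log(avgAll, avgAgeOnly);
```

The same pattern applies to max, min, and sum, with the per-column accumulator swapped for the corresponding operation.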

count()

Count the number of rows for each group. The resulting DataFrame will also contain the grouping columns.

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 69

Returns:

module:eclairjs/sql.Dataset

max(…colNames)

Compute the max value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are specified, the max is computed only for those columns.

Parameters:

Name	Type	Attributes	Description
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 112

Returns:

module:eclairjs/sql.Dataset

mean(…colNames)

Compute the average value of each numeric column for each group. This is an alias for `avg`. The resulting DataFrame will also contain the grouping columns. When column names are specified, the average is computed only for those columns.

Parameters:

Name	Type	Attributes	Description
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 90

Returns:

module:eclairjs/sql.Dataset

min(…colNames)

Compute the min value of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are specified, the min is computed only for those columns.

Parameters:

Name	Type	Attributes	Description
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 156

Returns:

module:eclairjs/sql.Dataset

pivot(pivotColumn, [values]) → {module:eclairjs/sql.RelationalGroupedDataset}

Pivots a column of the current DataFrame and performs the specified aggregation. There are two versions of the pivot function: one that requires the caller to specify the list of distinct values to pivot on, and one that does not. The latter is more concise but less efficient, because Spark must first compute the list of distinct values internally.

Parameters:

Name	Type	Attributes	Description
`pivotColumn`	string		Name of the column to pivot.
`values`	module:eclairjs.List	<optional>	List of values that will be translated to columns in the output DataFrame.

Since:

EclairJS 0.1 Spark 1.6.0

Source:

sql/RelationalGroupedDataset.js, line 210

Returns:

Type: module:eclairjs/sql.RelationalGroupedDataset

Example

// Compute the sum of earnings for each year by course, with each course as a separate column
df.groupBy("year").pivot("course", new List(["dotNET", "Java"])).sum("earnings")

// Or without specifying column values (less efficient)
df.groupBy("year").pivot("course").sum("earnings")
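The shape of a pivoted result, and why omitting the value list costs an extra pass, can be illustrated with a small plain-JavaScript emulation. This sketch does not use EclairJS or Spark; `pivotSum` and the sample rows are hypothetical, and leaving empty cells as null is an assumption of the sketch.

```javascript
// Plain-JavaScript emulation of groupBy(...).pivot(...).sum(...); not EclairJS code.
function pivotSum(rows, groupCol, pivotCol, valueCol, values) {
  // Without an explicit list, an extra pass over the data is needed to
  // discover the distinct pivot values; this is the cost the text mentions.
  const pivotValues = values || [...new Set(rows.map((r) => r[pivotCol]))];
  const table = new Map(); // group key -> output row
  for (const row of rows) {
    const key = row[groupCol];
    if (!table.has(key)) {
      const fresh = { [groupCol]: key };
      for (const v of pivotValues) fresh[v] = null; // no data yet for this cell
      table.set(key, fresh);
    }
    // Rows whose pivot value was not requested contribute to no column.
    if (!pivotValues.includes(row[pivotCol])) continue;
    const out = table.get(key);
    out[row[pivotCol]] = (out[row[pivotCol]] || 0) + row[valueCol];
  }
  return [...table.values()];
}

const courseRows = [
  { year: 2012, course: "dotNET", earnings: 10000 },
  { year: 2012, course: "Java", earnings: 20000 },
  { year: 2012, course: "dotNET", earnings: 5000 },
  { year: 2013, course: "dotNET", earnings: 48000 },
  { year: 2013, course: "Java", earnings: 30000 },
];

// Explicit value list: no discovery pass is needed.
const pivotExplicit = pivotSum(courseRows, "year", "course", "earnings", ["dotNET", "Java"]);
// No value list: the distinct course names are computed first.
const pivotInferred = pivotSum(courseRows, "year", "course", "earnings");
console.log(pivotExplicit, pivotInferred);
```

Both calls yield one row per year with one column per course; only the amount of work done up front differs.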

sum(…colNames)

Compute the sum of each numeric column for each group. The resulting DataFrame will also contain the grouping columns. When column names are specified, the sum is computed only for those columns.

Parameters:

Name	Type	Attributes	Description
`colNames`	string	<repeatable>

Since:

EclairJS 0.7 Spark 1.3.0

Source:

sql/RelationalGroupedDataset.js, line 178

Returns:

module:eclairjs/sql.Dataset