Constructor
new DataFrame()
Examples
var people = sqlContext.read.parquet("...");
// Once created, a DataFrame can be manipulated using the domain-specific-language (DSL) functions defined in
// DataFrame (this class), Column, and functions.
// To select a column from the DataFrame:
var ageCol = people.apply("age");
Methods
(static) show(rows, [truncate])
Displays the DataFrame rows in tabular form.
The rows array is typically the result of a call such as take().
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
rows | Array.<module:spark/sql.Row> | | |
truncate | boolean | <optional> | Defaults to false. Whether to truncate long strings. If true, strings longer than 20 characters are truncated and all cells are right-aligned. |
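To make the truncate option concrete, the following is a minimal sketch in plain JavaScript of how a single cell could be formatted. It is not EclairJS code: the helper `formatCell`, the column width, and the rule of keeping 17 characters plus "..." are assumptions made for illustration. Only the 20-character threshold and the right alignment come from the parameter description.

```javascript
// Hypothetical helper, not part of EclairJS: formats one table cell.
function formatCell(value, truncate, width) {
  var text = String(value);
  // Assumption: a truncated cell keeps 17 characters and appends "...",
  // so the result is exactly 20 characters long.
  if (truncate && text.length > 20) {
    text = text.substring(0, 17) + "...";
  }
  // Right-align by padding on the left up to the column width.
  return text.padStart(width, " ");
}

var longName = "Bartholomew Featherstonehaugh";
console.log(formatCell(longName, true, 22));
console.log(formatCell(longName, false, 22));
console.log(formatCell("Ann", true, 22));
```

With truncate set to false the long value is returned unchanged, because padding never shortens a string.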
agg(hashMap) → {module:eclairjs/sql.DataFrame}
Aggregates over the entire DataFrame without groups.
Parameters:
Name | Type | Description |
---|---|---|
hashMap | hashMap | Object that maps column names to aggregate function names, for example {"age": "max"} |
Example
// df.agg(...) is shorthand for df.groupBy().agg(...)
var map = {};
map["age"] = "max";
map["salary"] = "avg";
df.agg(map);
// equivalent to:
df.groupBy().agg(map);
apply(colName) → {module:eclairjs/sql.Column}
Selects a column by name and returns it as a Column.
Note that the column name can also refer to a nested column, such as a.b.
Parameters:
Name | Type | Description |
---|---|---|
colName | string | |
as(alias) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame with an alias set.
Parameters:
Name | Type | Description |
---|---|---|
alias | string | |
cache() → {module:eclairjs/sql.DataFrame}
Persist this DataFrame with the default storage level (`MEMORY_ONLY`).
coalesce(numPartitions) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame that has exactly numPartitions partitions.
Similar to coalesce defined on an RDD, this operation results in a narrow dependency.
For example, going from 1000 partitions to 100 partitions does not cause a shuffle;
instead, each of the 100 new partitions claims 10 of the current partitions.
Parameters:
Name | Type | Description |
---|---|---|
numPartitions | integer | |
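The 1000-to-100 example can be sketched in plain JavaScript as a mapping from current partition indices to new partitions. This is only an illustration of a narrow dependency: `coalescePlan` is a hypothetical helper, and the contiguous grouping it uses is an assumption, since Spark may choose which partitions to combine differently (for instance by data locality).

```javascript
// Hypothetical illustration, not EclairJS code: assign each existing
// partition index to exactly one new partition, so no row has to be
// redistributed across partitions (no shuffle).
function coalescePlan(currentPartitions, numPartitions) {
  var plan = [];
  for (var i = 0; i < numPartitions; i++) {
    plan.push([]);
  }
  for (var p = 0; p < currentPartitions; p++) {
    // Assumption: contiguous blocks of current partitions are grouped together.
    var target = Math.floor(p * numPartitions / currentPartitions);
    plan[target].push(p);
  }
  return plan;
}

var plan = coalescePlan(1000, 100);
console.log(plan.length);    // number of new partitions
console.log(plan[0]);        // current partitions claimed by new partition 0
```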
col(name) → {module:eclairjs/sql.Column}
Selects a column by name and returns it as a Column.
Parameters:
Name | Type | Description |
---|---|---|
name | string | |
collect() → {Promise.<Array.<Row>>}
Returns an array that contains all of the Rows in this DataFrame.
Returns:
A Promise that resolves to an array containing all Rows.
- Type
- Promise.<Array.<Row>>
columns() → {Promise.<Array.<string>>}
Returns all column names as an array.
Returns:
A Promise that resolves to an array containing all column names.
- Type
- Promise.<Array.<string>>
count() → {Promise.<integer>}
Returns the number of rows in the DataFrame.
Returns:
A Promise that resolves to the number of rows in the DataFrame.
- Type
- Promise.<integer>
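Because collect(), columns(), and count() return Promises rather than plain values, their results have to be consumed asynchronously. The sketch below shows one way to combine two such results. The `fakeDataFrame` object is a stand-in invented for this example so that the snippet runs without a Spark cluster; a real DataFrame would come from a SQLContext, and `summarize` is a hypothetical helper.

```javascript
// Stand-in object for illustration only; it mimics the Promise-returning
// shape of count() and columns() described above.
var fakeDataFrame = {
  count: function () { return Promise.resolve(3); },
  columns: function () { return Promise.resolve(["name", "age"]); }
};

// Hypothetical helper: waits for both Promises, then builds a summary string.
function summarize(df) {
  return Promise.all([df.count(), df.columns()]).then(function (results) {
    return results[0] + " rows, columns: " + results[1].join(", ");
  });
}

var summary = summarize(fakeDataFrame);
summary.then(function (text) { console.log(text); });
```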
cube() → {module:eclairjs/sql.GroupedData}
Creates a multi-dimensional cube for the current DataFrame using the specified columns, so we can run aggregation on them.
Parameters:
Name | Type | Description |
---|---|---|
cols... | string or Column | |
Example
var df = dataFrame.cube("age", "expense");
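As a rough guide to what a cube covers, an aggregation over a cube is computed for every combination of the given columns, down to the grand total. The plain-JavaScript sketch below enumerates those combinations; `cubeGroupingSets` is a hypothetical helper written for this illustration and is not part of EclairJS.

```javascript
// Enumerates the grouping sets covered by a cube over the given columns:
// every subset of the columns, ending with the empty set (the grand total).
function cubeGroupingSets(cols) {
  var sets = [];
  var total = Math.pow(2, cols.length);
  for (var mask = total - 1; mask >= 0; mask--) {
    var set = [];
    for (var i = 0; i < cols.length; i++) {
      // The first column corresponds to the highest bit of the mask.
      if (mask & (1 << (cols.length - 1 - i))) {
        set.push(cols[i]);
      }
    }
    sets.push(set);
  }
  return sets;
}

console.log(JSON.stringify(cubeGroupingSets(["age", "expense"])));
// prints [["age","expense"],["age"],["expense"],[]]
```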
describe() → {module:eclairjs/sql.DataFrame}
Computes statistics for numeric columns, including count, mean, stddev, min, and max.
If no columns are given, this function computes statistics for all numerical columns.
This function is meant for exploratory data analysis, as we make no guarantee about the backward
compatibility of the schema of the resulting DataFrame. If you want to programmatically compute
summary statistics, use the agg function instead.
Parameters:
Name | Type | Description |
---|---|---|
cols... | string | |
Example
var df = peopleDataFrame.describe("age", "expense");
distinct() → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame that contains only the unique rows from this DataFrame. This is an alias for dropDuplicates.
drop(column) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame with a column dropped.
Parameters:
Name | Type | Description |
---|---|---|
column | string or Column | |
dropDuplicates(colNames) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame that contains only the unique rows from this DataFrame. If colNames is given, only that subset of columns is considered when comparing rows.
Parameters:
Name | Type | Description |
---|---|---|
colNames | Array.<string> | |
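The effect of restricting the comparison to a subset of columns can be sketched with plain JavaScript objects standing in for rows. `dropDuplicateRows` is a hypothetical helper, not the EclairJS implementation, and it keeps the first row of each duplicate group, which is an assumption made for this example.

```javascript
// Hypothetical model: rows are plain objects, not EclairJS Row instances.
function dropDuplicateRows(rows, colNames) {
  var seen = new Set();
  return rows.filter(function (row) {
    // With no colNames, compare on every column of the row.
    var keys = colNames && colNames.length ? colNames : Object.keys(row);
    var key = JSON.stringify(keys.map(function (k) { return row[k]; }));
    if (seen.has(key)) {
      return false;
    }
    seen.add(key);
    return true;
  });
}

var rows = [
  { name: "Ann", age: 30 },
  { name: "Ann", age: 31 },
  { name: "Ann", age: 30 },
  { name: "Bob", age: 30 }
];
console.log(dropDuplicateRows(rows).length);           // compares whole rows
console.log(dropDuplicateRows(rows, ["name"]).length); // compares only "name"
```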
dtypes() → {Promise.<Array>}
Returns all column names and their data types as an array of arrays, for example [["name","StringType"],["age","IntegerType"],["expense","IntegerType"]].
Returns:
A Promise that resolves to an array of two-element arrays, each holding a column name and its data type.
- Type
- Promise.<Array>
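An array of name and type pairs is often easier to use as a lookup object. The sketch below converts the example value from the description into such an object. `dtypesPromise` is a stand-in for the Promise that dtypes() would return, and `toTypeLookup` is a hypothetical helper; neither is part of EclairJS.

```javascript
// Stand-in for the value dtypes() resolves to, copied from the example above.
var dtypesPromise = Promise.resolve([
  ["name", "StringType"],
  ["age", "IntegerType"],
  ["expense", "IntegerType"]
]);

// Hypothetical helper: turns [name, type] pairs into a { name: type } object.
function toTypeLookup(pairs) {
  var lookup = {};
  pairs.forEach(function (pair) {
    lookup[pair[0]] = pair[1];
  });
  return lookup;
}

var lookupPromise = dtypesPromise.then(toTypeLookup);
lookupPromise.then(function (lookup) { console.log(lookup.age); });
```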
except(otherDataFrame) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame containing rows in this frame but not in another frame. This is equivalent to EXCEPT in SQL.
Parameters:
Name | Type | Description |
---|---|---|
otherDataFrame | module:eclairjs/sql.DataFrame | to compare to this DataFrame |
explain() → {Promise.<Void>}
Prints the plans (logical and physical) to the console for debugging purposes.
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
filter(column) → {module:eclairjs/sql.DataFrame}
Filters rows using the given SQL expression string or the given Column.
Parameters:
Name | Type | Description |
---|---|---|
column | string or Column | |
first() → {module:eclairjs/sql.Row}
Returns the first row.
flatMap(func, [bindArgs]) → {module:eclairjs/rdd.RDD}
Returns a new RDD by first applying a function to all rows of this DataFrame, and then flattening the results.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func | function | | |
bindArgs | Array.<Object> | <optional> | Array whose values will be added to func's argument list. |
foreach(func, [bindArgs]) → {Promise.<Void>}
Applies a function func to all rows.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func | function | | |
bindArgs | Array.<Object> | <optional> | Array whose values will be added to func's argument list. |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
foreachPartition(func, [bindArgs]) → {Promise.<Void>}
Applies a function to each partition of this DataFrame.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func | function | | |
bindArgs | Array.<Object> | <optional> | Array whose values will be added to func's argument list. |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
groupBy() → {module:eclairjs/sql.GroupedData}
Groups the DataFrame using the specified columns, so we can run aggregation on them.
Parameters:
Type | Description |
---|---|
Array.<string> or Array.<Column> | Array of Column objects or column name strings |
head() → {Promise.<module:eclairjs/sql.Row>}
Returns the first row.
Returns:
- Type
- Promise.<module:eclairjs/sql.Row>
inputFiles() → {Promise.<Array.<string>>}
Returns a best-effort snapshot of the files that compose this DataFrame. This method simply asks each constituent
BaseRelation for its respective files and takes the union of all results. Depending on the source relations,
this may not find all input files. Duplicates are removed.
Returns:
A Promise that resolves to a list of files.
- Type
- Promise.<Array.<string>>
intersect(other) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame containing only the rows that are present in both this frame and another frame. This is equivalent to INTERSECT in SQL.
Parameters:
Name | Type | Description |
---|---|---|
other | module:eclairjs/sql.DataFrame | |
isLocal() → {Promise.<boolean>}
Returns true if the collect and take methods can be run locally (without any Spark executors).
Returns:
- Type
- Promise.<boolean>
join(Right, [columnNamesOrJoinExpr], [joinType]) → {module:eclairjs/sql.DataFrame}
Joins this DataFrame with another DataFrame. When called with only the right-side DataFrame, this is a Cartesian join. Note that Cartesian joins are very expensive without an extra filter that can be pushed down.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
Right | module:eclairjs/sql.DataFrame | | Right side of the join operation. |
columnNamesOrJoinExpr | string or Array.<string> or Column | <optional> | If a string or an array of strings (column names), performs an inner equi-join with another DataFrame using the given columns. Unlike other join functions, the join columns appear only once in the output, similar to SQL's JOIN USING syntax. If a Column object (joinExprs), performs an inner join with another DataFrame using the given join expression. |
joinType | string | <optional> | Only valid when using Column joinExprs. |
Example
var joinedDf = df1.join(df2);
// or
var joinedDf = df1.join(df2,"age");
// or
var joinedDf = df1.join(df2, ["age", "DOB"]);
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")));
// or Column joinExpr
var joinedDf = df1.join(df2, df1.col("name").equalTo(df2.col("name")), "outer");
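The difference between a Cartesian join and an equi-join on named columns can be sketched with arrays of plain objects standing in for DataFrames. `joinUsing` is a hypothetical helper written for this illustration and says nothing about how EclairJS or Spark execute joins. It also simplifies one point: any non-join columns that share a name would collapse as well.

```javascript
// Hypothetical model: each "DataFrame" is an array of plain objects.
function joinUsing(left, right, columnNames) {
  var out = [];
  left.forEach(function (l) {
    right.forEach(function (r) {
      // With no column names every pair matches, which gives a Cartesian join.
      var matches = columnNames.every(function (c) { return l[c] === r[c]; });
      if (matches) {
        // Merging both rows collapses the shared join columns into one key,
        // so each join column appears once in the output row.
        out.push(Object.assign({}, l, r));
      }
    });
  });
  return out;
}

var people = [{ name: "Ann", age: 30 }, { name: "Bob", age: 41 }];
var cities = [{ name: "Ann", city: "Oslo" }, { name: "Cid", city: "Rome" }];

console.log(joinUsing(people, cities, ["name"])); // inner equi-join on "name"
console.log(joinUsing(people, cities, []).length); // Cartesian join
```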
limit(number) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame by taking the first n rows. The difference between this function and take is that take
resolves to an array of rows while limit returns a new DataFrame.
Parameters:
Name | Type | Description |
---|---|---|
number | integer | |
map(func, [bindArgs]) → {module:eclairjs/rdd.RDD}
Returns a new RDD by applying a function to all rows of this DataFrame.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func | function | | |
bindArgs | Array.<Object> | <optional> | Array whose values will be added to func's argument list. |
mapPartitions(func, [bindArgs]) → {module:eclairjs/rdd.RDD}
Returns a new RDD by applying a function to each partition of this DataFrame.
Similar to map, but runs separately on each partition (block) of the DataFrame, so func must accept an Array.
func should return an array rather than a single item.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
func | function | | |
bindArgs | Array.<Object> | <optional> | Array whose values will be added to func's argument list. |
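The contract for func (an Array in, an Array out, once per partition) can be sketched locally with nested arrays standing in for partitions. `mapPartitionsSketch` is a hypothetical helper, and passing the bindArgs values after the partition argument is an assumption about how "added to func's argument list" works; it is not taken from the EclairJS source.

```javascript
// Hypothetical model: a DataFrame's rows split into partitions (arrays).
function mapPartitionsSketch(partitions, func, bindArgs) {
  var extra = bindArgs || [];
  return partitions.map(function (partition) {
    // func receives the whole partition as an Array, followed by the
    // bindArgs values (assumed ordering), and returns an Array.
    return func.apply(null, [partition].concat(extra));
  });
}

var partitions = [[1, 2, 3], [4, 5]];
var result = mapPartitionsSketch(partitions, function (rows, factor) {
  var sum = rows.reduce(function (a, b) { return a + b; }, 0);
  return [sum * factor]; // one-element array per partition
}, [10]);

console.log(JSON.stringify(result)); // prints [[60],[90]]
```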
na() → {module:eclairjs/sql.DataFrameNaFunctions}
Returns a DataFrameNaFunctions for working with missing data.
orderBy() → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame sorted by the specified columns. If column names are used, the rows are sorted in ascending order.
This is an alias of the sort function.
Parameters:
Name | Type | Description |
---|---|---|
columnName, ...columnName | string or Column | Column names, or alternatively sort expressions (sortExprs, ...sortExprs) |
persist(newLevel) → {module:eclairjs/sql.DataFrame}
Persists this DataFrame with the given storage level.
Parameters:
Name | Type | Description |
---|---|---|
newLevel | module:eclairjs/storage.StorageLevel | |
printSchema() → {Promise.<Void>}
Prints the schema to the console in a nice tree format.
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
queryExecution() → {module:eclairjs/sql.SQLContextQueryExecution}
randomSplit(weights, seed) → {Array.<DataFrame>}
Randomly splits this DataFrame with the provided weights.
Parameters:
Name | Type | Description |
---|---|---|
weights | Array.<float> | Weights for splits; will be normalized if they don't sum to 1. |
seed | int | Seed for sampling. |
Returns:
- Type
- Array.<DataFrame>
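The normalization mentioned for the weights parameter means that only the ratios between weights matter. A minimal sketch of that arithmetic in plain JavaScript follows; `normalizeWeights` is a hypothetical helper used for illustration and is not how EclairJS or Spark necessarily implement it.

```javascript
// Hypothetical helper: scales weights so that they sum to 1.
function normalizeWeights(weights) {
  var total = weights.reduce(function (a, b) { return a + b; }, 0);
  return weights.map(function (w) { return w / total; });
}

console.log(normalizeWeights([3, 1]));       // prints [ 0.75, 0.25 ]
console.log(normalizeWeights([0.75, 0.25])); // already sums to 1, unchanged
```

Under this reading, weights of [3, 1] and [0.75, 0.25] describe the same split: roughly three quarters of the rows in the first DataFrame and one quarter in the second.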
rdd() → {module:eclairjs/rdd.RDD}
Represents the content of the DataFrame as an RDD of Rows.
registerTempTable(tableName) → {Promise.<Void>}
Registers this DataFrame as a temporary table using the given name.
Parameters:
Name | Type | Description |
---|---|---|
tableName | string | |
Returns:
A Promise that resolves when the temp table has been created.
- Type
- Promise.<Void>
repartition(numPartitions) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame that has exactly numPartitions partitions.
Parameters:
Name | Type | Description |
---|---|---|
numPartitions | integer | |
rollup(columnName, ...columnName) → {module:eclairjs/sql.GroupedData}
Creates a multi-dimensional rollup for the current DataFrame using the specified columns,
so we can run aggregation on them. See GroupedData for all the available aggregate functions.
Parameters:
Name | Type | Description |
---|---|---|
columnName, ...columnName | string or Column | Column names, or alternatively sort expressions (sortExprs, ...sortExprs) |
Example
var result = peopleDataFrame.rollup("age", "networth").count();
// or
var col = peopleDataFrame.col("age");
var result = peopleDataFrame.rollup(col).count();
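A rollup aggregates over a hierarchy of the given columns: all of them, then progressively fewer from the right, ending with the grand total. The plain-JavaScript sketch below lists those levels; `rollupGroupingSets` is a hypothetical helper for illustration only and is not part of EclairJS.

```javascript
// Enumerates the grouping sets covered by a rollup over the given columns:
// each prefix of the column list, ending with the empty set (the grand total).
function rollupGroupingSets(cols) {
  var sets = [];
  for (var n = cols.length; n >= 0; n--) {
    sets.push(cols.slice(0, n));
  }
  return sets;
}

console.log(JSON.stringify(rollupGroupingSets(["age", "networth"])));
// prints [["age","networth"],["age"],[]]
```

Unlike a cube, the order of the columns matters here, because only prefixes of the list are grouped.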
sample(withReplacement, fraction, [seed]) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame by sampling a fraction of rows, using a random seed.
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
withReplacement | boolean | | |
fraction | float | | |
seed | integer | <optional> | |
schema() → {module:eclairjs/sql/types.StructType}
Returns the schema of this DataFrame.
select() → {module:eclairjs/sql.DataFrame}
Selects a set of column-based expressions.
Parameters:
Type | Description |
---|---|
Array.<module:eclairjs/sql.Column> or Array.<string> | |
selectExpr() → {module:eclairjs/sql.DataFrame}
Selects a set of SQL expressions. This is a variant of select that accepts SQL expressions.
Parameters:
Name | Type | Description |
---|---|---|
exprs, ...exprs | string | |
Example
var result = peopleDataFrame.selectExpr("name", "age > 19");
show() → {Promise.<Void>}
Displays the top 20 rows of the DataFrame in tabular form.
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
sort() → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame sorted by the specified columns. If column names are used, the rows are sorted in ascending order.
Parameters:
Name | Type | Description |
---|---|---|
columnName, ...columnName | string or Column | Column names, or alternatively sort expressions (sortExprs, ...sortExprs) |
Example
var result = peopleDataFrame.sort("age", "name");
// or
var col = peopleDataFrame.col("age");
var colExpr = col.desc();
var result = peopleDataFrame.sort(colExpr);
sqlContext() → {module:eclairjs/sql.SQLContext}
Returns the SQLContext.
stat() → {module:eclairjs/sql.DataFrameStatFunctions}
Returns a DataFrameStatFunctions object, which provides statistic functions for this DataFrame.
Example
var stat = peopleDataFrame.stat().cov("income", "networth");
take(num) → {Promise.<Array>}
Returns the first num rows in the DataFrame.
Parameters:
Name | Type | Description |
---|---|---|
num | integer | |
Returns:
A Promise that resolves to an array containing the first num elements in this DataFrame.
- Type
- Promise.<Array>
toDF() → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame with columns renamed. This can be quite convenient when converting an
RDD of tuples into a DataFrame with meaningful names.
Parameters:
Name | Type | Description |
---|---|---|
colNames, ...colNames | string | |
Example
var result = nameAgeDF.toDF("newName", "newAge");
toJSON() → {Promise.<Array.<object>>}
Returns the content of the DataFrame as an RDD of JSON strings.
Returns:
- Type
- Promise.<Array.<object>>
toRDD() → {module:eclairjs/rdd.RDD}
Returns an RDD object.
unionAll(other) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame containing the union of rows in this frame and another frame. This is equivalent to UNION ALL in SQL.
Parameters:
Name | Type | Description |
---|---|---|
other | module:eclairjs/sql.DataFrame | |
unpersist(blocking) → {Promise.<Void>}
Parameters:
Name | Type | Description |
---|---|---|
blocking | boolean | |
Returns:
A Promise that resolves to nothing.
- Type
- Promise.<Void>
where(condition) → {module:eclairjs/sql.DataFrame}
Filters rows using the given Column or SQL expression.
Parameters:
Name | Type | Description |
---|---|---|
condition | module:eclairjs/sql.Column or string | |
withColumn(name, col) → {module:eclairjs/sql.DataFrame}
Returns a new DataFrame by adding a column or replacing the existing column that has the same name.
Parameters:
Name | Type | Description |
---|---|---|
name | string | |
col | module:eclairjs/sql.Column | |
Example
var col = peopleDataFrame.col("age");
var df1 = peopleDataFrame.withColumn("newCol", col);
write() → {module:eclairjs/sql.DataFrameWriter}
Interface for saving the content of the DataFrame out into external storage.