Package 'neopolars'

Title: R Bindings for the 'polars' Rust Library
Description: Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Authors: Tatsuya Shima [aut, cre], Authors of the dependency Rust crates [aut]
Maintainer: Tatsuya Shima <[email protected]>
License: MIT + file LICENSE
Version: 0.0.0.9000
Built: 2025-01-19 06:23:49 UTC
Source: https://github.com/eitsupi/neo-r-polars

Help Index


Create a Polars DataFrame from an R object

Description

The as_polars_df() function creates a polars DataFrame from various R objects. Polars DataFrame is based on a sequence of Polars Series, so basically, the input object is converted to a list of Polars Series by as_polars_series(), then a Polars DataFrame is created from the list.

Usage

as_polars_df(x, ...)

## Default S3 method:
as_polars_df(x, ...)

## S3 method for class 'polars_series'
as_polars_df(x, ..., column_name = NULL, from_struct = TRUE)

## S3 method for class 'polars_data_frame'
as_polars_df(x, ...)

## S3 method for class 'polars_group_by'
as_polars_df(x, ...)

## S3 method for class 'polars_lazy_frame'
as_polars_df(
  x,
  ...,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  no_optimization = FALSE,
  streaming = FALSE
)

## S3 method for class 'list'
as_polars_df(x, ...)

## S3 method for class 'data.frame'
as_polars_df(x, ...)

## S3 method for class ''NULL''
as_polars_df(x, ...)

Arguments

x

An R object.

...

Additional arguments passed to the methods.

column_name

A character or NULL. If not NULL, name/rename the Series column in the new DataFrame. If NULL, the column name is taken from the Series name.

from_struct

A logical. If TRUE (default) and the Series data type is a struct, the <Series>$struct$unnest() method is used to create a DataFrame from the struct Series. In this case, the column_name argument is ignored.

type_coercion

A logical, indicating whether to apply type coercion optimization.

predicate_pushdown

A logical, indicating whether to apply predicate pushdown optimization.

projection_pushdown

A logical, indicating whether to apply projection pushdown optimization.

simplify_expression

A logical, indicating whether to apply the simplify-expression optimization.

slice_pushdown

A logical, indicating whether to apply slice pushdown optimization.

comm_subplan_elim

A logical, indicating whether to try caching branching subplans that occur on self-joins or unions.

comm_subexpr_elim

A logical, indicating whether to try caching common subexpressions.

cluster_with_columns

A logical, indicating whether to combine sequential independent calls to with_columns.

no_optimization

A logical. If TRUE, turn off (certain) optimizations.

streaming

A logical. If TRUE, process the query in batches to handle larger-than-memory data. If FALSE (default), the entire query is processed in a single batch. Note that streaming mode is considered unstable. It may be changed at any point without it being considered a breaking change.

Details

The default method of as_polars_df() throws an error, so we need to define methods for the classes we want to support.

S3 method for list

  • The argument ... (except name) is passed to as_polars_series() for each element of the list.

  • All elements of the list must be convertible to Series of the same length by as_polars_series().

  • The name of each element is used as the column name of the DataFrame. For unnamed elements, the column name will be an empty string "", or, if the element is a Series, the name of the Series.

S3 method for data.frame

S3 method for polars_series

This is a shortcut for <Series>$to_frame() or <Series>$struct$unnest(), depending on the from_struct argument and the Series data type. The column_name argument is passed to the name argument of the $to_frame() method.

S3 method for polars_lazy_frame

This is a shortcut for <LazyFrame>$collect().
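The lazy-frame method above can be sketched as follows (a minimal illustration using the functions documented here; output not shown):

```r
# Build a LazyFrame, then materialize it with as_polars_df(),
# which is equivalent to `<LazyFrame>$collect()`.
lf <- as_polars_lf(data.frame(a = 1:3, b = c("x", "y", "z")))

# Default: all query optimizations enabled
as_polars_df(lf)

# Turn off (certain) optimizations, e.g. when debugging a query plan
as_polars_df(lf, no_optimization = TRUE)
```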

Value

A polars DataFrame

See Also

Examples

# list
as_polars_df(list(a = 1:2, b = c("foo", "bar")))

# data.frame
as_polars_df(data.frame(a = 1:2, b = c("foo", "bar")))

# polars_series
s_int <- as_polars_series(1:2, "a")
s_struct <- as_polars_series(
  data.frame(a = 1:2, b = c("foo", "bar")),
  "struct"
)

## Use the Series as a column
as_polars_df(s_int)
as_polars_df(s_struct, column_name = "values", from_struct = FALSE)

## Unnest the struct data
as_polars_df(s_struct)

Create a Polars expression from an R object

Description

The as_polars_expr() function creates a polars expression from various R objects. This function is used internally by various polars functions that accept expressions. In most cases, users should use pl$lit() instead of this function, which is a shorthand for as_polars_expr(x, as_lit = TRUE). (In other words, this function can be considered as an internal implementation to realize the lit function of the Polars API in other languages.)

Usage

as_polars_expr(x, ...)

## Default S3 method:
as_polars_expr(x, ...)

## S3 method for class 'polars_expr'
as_polars_expr(x, ..., structify = FALSE)

## S3 method for class 'polars_series'
as_polars_expr(x, ...)

## S3 method for class 'character'
as_polars_expr(x, ..., as_lit = FALSE)

## S3 method for class 'logical'
as_polars_expr(x, ...)

## S3 method for class 'integer'
as_polars_expr(x, ...)

## S3 method for class 'double'
as_polars_expr(x, ...)

## S3 method for class 'raw'
as_polars_expr(x, ...)

## S3 method for class ''NULL''
as_polars_expr(x, ...)

Arguments

x

An R object.

...

Additional arguments passed to the methods.

structify

A logical. If TRUE, convert multi-column expressions to a single struct expression by calling pl$struct(). Otherwise (default), nothing is done.

as_lit

A logical value indicating whether to treat the vector as literal values. This argument is always set to TRUE when this function is called from pl$lit(), which is expected to return literal values. See examples for details.

Details

Because R objects are typically mapped to Series, this function often calls as_polars_series() internally. However, unlike R, Polars has a notion of scalar values, so if an R object is converted to a Series of length 1, this function gets the first value of the Series and converts it to a scalar literal. If you want to implement your own conversion from an R class to a Polars object, define an S3 method for as_polars_series() instead of this function.

Default S3 method

Create a Series by calling as_polars_series() and then convert that Series to an Expr. If the length of the Series is 1, it will be converted to a scalar value.

Additional arguments ... are passed to as_polars_series().

S3 method for character

If the as_lit argument is FALSE (default), this function will call pl$col() and the character vector is treated as column names.

Value

A polars expression

Literal scalar mapping

Since R has no scalar class, each of the following types of length 1 cases is specially converted to a scalar literal.

  • character: String

  • logical: Boolean

  • integer: Int32

  • double: Float64

These types' NA is converted to a null literal with casting to the corresponding Polars type.

The raw type vector is converted to a Binary scalar.

  • raw: Binary

NULL is converted to a Null type null literal.

  • NULL: Null

For other R classes, the default S3 method is called and the R object is converted via as_polars_series(). So the type mapping is defined by as_polars_series().

See Also

Examples

# character
## as_lit = FALSE (default)
as_polars_expr("a") # Same as `pl$col("a")`
as_polars_expr(c("a", "b")) # Same as `pl$col("a", "b")`

## as_lit = TRUE
as_polars_expr(character(0), as_lit = TRUE)
as_polars_expr("a", as_lit = TRUE)
as_polars_expr(NA_character_, as_lit = TRUE)
as_polars_expr(c("a", "b"), as_lit = TRUE)

# logical
as_polars_expr(logical(0))
as_polars_expr(TRUE)
as_polars_expr(NA)
as_polars_expr(c(TRUE, FALSE))

# integer
as_polars_expr(integer(0))
as_polars_expr(1L)
as_polars_expr(NA_integer_)
as_polars_expr(c(1L, 2L))

# double
as_polars_expr(double(0))
as_polars_expr(1)
as_polars_expr(NA_real_)
as_polars_expr(c(1, 2))

# raw
as_polars_expr(raw(0))
as_polars_expr(charToRaw("foo"))

# NULL
as_polars_expr(NULL)

# default method (for list)
as_polars_expr(list())
as_polars_expr(list(1))
as_polars_expr(list(1, 2))

# default method (for Date)
as_polars_expr(as.Date(integer(0)))
as_polars_expr(as.Date("2021-01-01"))
as_polars_expr(as.Date(c("2021-01-01", "2021-01-02")))

# polars_series
## Unlike the default method, this method does not extract the first value
as_polars_series(1) |>
  as_polars_expr()

# polars_expr
as_polars_expr(pl$col("a", "b"))
as_polars_expr(pl$col("a", "b"), structify = TRUE)

Create a Polars LazyFrame from an R object

Description

The as_polars_lf() function creates a LazyFrame from various R objects. It is basically a shortcut for as_polars_df(x, ...) combined with the $lazy() method.

Usage

as_polars_lf(x, ...)

## Default S3 method:
as_polars_lf(x, ...)

## S3 method for class 'polars_lazy_frame'
as_polars_lf(x, ...)

Arguments

x

An R object.

...

Additional arguments passed to the methods.

Details

Default S3 method

Create a DataFrame by calling as_polars_df() and then create a LazyFrame from the DataFrame. Additional arguments ... are passed to as_polars_df().
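The default method above can be sketched as follows (a minimal example; the query shown is illustrative):

```r
# as_polars_lf() on a data.frame is equivalent to
# as_polars_df(mtcars)$lazy()
lf <- as_polars_lf(mtcars)

# The result is a LazyFrame: the query below is only planned here,
# and is executed when $collect() is called
lf$select("mpg", "cyl")$collect()
```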

Value

A polars LazyFrame


Create a Polars Series from an R object

Description

The as_polars_series() function creates a polars Series from various R objects. The Data Type of the Series is determined by the class of the input object.

Usage

as_polars_series(x, name = NULL, ...)

## Default S3 method:
as_polars_series(x, name = NULL, ...)

## S3 method for class 'polars_series'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'polars_data_frame'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'double'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'integer'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'character'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'logical'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'raw'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'factor'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'Date'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'POSIXct'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'POSIXlt'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'difftime'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'hms'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'blob'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'array'
as_polars_series(x, name = NULL, ...)

## S3 method for class ''NULL''
as_polars_series(x, name = NULL, ...)

## S3 method for class 'list'
as_polars_series(x, name = NULL, ..., strict = FALSE)

## S3 method for class 'AsIs'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'data.frame'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'integer64'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'ITime'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'vctrs_unspecified'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'vctrs_rcrd'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'clock_time_point'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'clock_sys_time'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'clock_zoned_time'
as_polars_series(x, name = NULL, ...)

## S3 method for class 'clock_duration'
as_polars_series(x, name = NULL, ...)

Arguments

x

An R object.

name

A single string or NULL. Name of the Series. Will be used as a column name when used in a polars DataFrame. When not specified, name is set to an empty string.

...

Additional arguments passed to the methods.

strict

A logical value indicating whether to throw an error when the input list's elements have different data types. If FALSE (default), all elements are automatically cast to the super type; if casting to the super type fails, the value becomes null. If TRUE, the first non-NULL element's data type is used as the data type of the inner Series.

Details

The default method of as_polars_series() throws an error, so we need to define S3 methods for the classes we want to support.

S3 method for list and list based classes

In R, a list can contain elements of different types, but in Polars (Apache Arrow), all elements must have the same type. So the as_polars_series() function automatically casts all elements to the same type or throws an error, depending on the strict argument. If you want to create a list with all elements of the same type in R, consider using the vctrs::list_of() function.

Since a list can contain another list, the strict argument is also used when creating Series from the inner list in the case of classes constructed on top of a list, such as data.frame or vctrs_rcrd.
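The effect of the strict argument can be sketched as follows (exact error messages may differ between versions):

```r
# strict = FALSE (default): elements are cast to a common super type,
# and values that cannot be cast become null
as_polars_series(list(list("bar"), "foo"))

# strict = TRUE: the first non-NULL element's data type is enforced,
# so a list with mixed types throws an error
try(as_polars_series(list(1, "foo"), strict = TRUE))
```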

S3 method for Date

Sub-day values will be ignored (floored to the day).

S3 method for POSIXct

Sub-millisecond values will be ignored (floored to the millisecond).

If the tzone attribute is not present or an empty string (""), the Series' dtype will be Datetime without timezone.

S3 method for POSIXlt

Sub-nanosecond values will be ignored (floored to the nanosecond).

S3 method for difftime

Sub-millisecond values will be rounded to milliseconds.

S3 method for hms

Sub-nanosecond values will be ignored (floored to the nanosecond).

If the hms vector contains values greater than or equal to 24:00:00 or less than 00:00:00, an error will be thrown.

S3 method for clock_duration

Calendrical durations (years, quarters, months) are treated chronologically, based on their internal representation in seconds. Please check the clock_duration documentation for more details.

S3 method for polars_data_frame

This method is a shortcut for <DataFrame>$to_struct().

Value

A polars Series

See Also

Examples

# double
as_polars_series(c(NA, 1, 2))

# integer
as_polars_series(c(NA, 1:2))

# character
as_polars_series(c(NA, "foo", "bar"))

# logical
as_polars_series(c(NA, TRUE, FALSE))

# raw
as_polars_series(charToRaw("foo"))

# factor
as_polars_series(factor(c(NA, "a", "b")))

# Date
as_polars_series(as.Date(c(NA, "2021-01-01")))

## Sub-day precision will be ignored
as.Date(c(-0.5, 0, 0.5)) |>
  as_polars_series()

# POSIXct with timezone
as_polars_series(as.POSIXct(c(NA, "2021-01-01 00:00:00.123456789"), "UTC"))

# POSIXct without timezone
as_polars_series(as.POSIXct(c(NA, "2021-01-01 00:00:00.123456789")))

# POSIXlt
as_polars_series(as.POSIXlt(c(NA, "2021-01-01 00:00:00.123456789"), "UTC"))

# difftime
as_polars_series(as.difftime(c(NA, 1), units = "days"))

## Sub-millisecond values will be rounded to milliseconds
as.difftime(c(0.0005, 0.0010, 0.0015, 0.0020), units = "secs") |>
  as_polars_series()

as.difftime(c(0.0005, 0.0010, 0.0015, 0.0020), units = "weeks") |>
  as_polars_series()

# NULL
as_polars_series(NULL)

# list
as_polars_series(list(NA, NULL, list(), 1, "foo", TRUE))

## 1st element will be `null` due to the casting failure
as_polars_series(list(list("bar"), "foo"))

# data.frame
as_polars_series(
  data.frame(x = 1:2, y = c("foo", "bar"), z = I(list(1, 2)))
)

# vctrs_unspecified
if (requireNamespace("vctrs", quietly = TRUE)) {
  as_polars_series(vctrs::unspecified(3L))
}

# hms
if (requireNamespace("hms", quietly = TRUE)) {
  as_polars_series(hms::as_hms(c(NA, "01:00:00")))
}

# blob
if (requireNamespace("blob", quietly = TRUE)) {
  as_polars_series(blob::as_blob(c(NA, "foo", "bar")))
}

# integer64
if (requireNamespace("bit64", quietly = TRUE)) {
  as_polars_series(bit64::as.integer64(c(NA, "9223372036854775807")))
}

# clock_naive_time
if (requireNamespace("clock", quietly = TRUE)) {
  as_polars_series(clock::naive_time_parse(c(
    NA,
    "1900-01-01T12:34:56.123456789",
    "2020-01-01T12:34:56.123456789"
  ), precision = "nanosecond"))
}

# clock_duration
if (requireNamespace("clock", quietly = TRUE)) {
  as_polars_series(clock::duration_nanoseconds(c(NA, 1)))
}

## Calendrical durations are treated as chronologically
if (requireNamespace("clock", quietly = TRUE)) {
  as_polars_series(clock::duration_years(c(NA, 1)))
}

Export the polars object as a tibble data frame

Description

This S3 method is basically a shortcut of as_polars_df(x, ...)$to_struct()$to_r_vector(ensure_vector = FALSE, struct = "tibble"). Additionally, you can check or repair the column names by specifying the .name_repair argument, because a polars DataFrame allows empty column names, which are not generally valid column names in an R data frame.

Usage

## S3 method for class 'polars_data_frame'
as_tibble(
  x,
  ...,
  .name_repair = c("check_unique", "unique", "universal", "minimal"),
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

## S3 method for class 'polars_lazy_frame'
as_tibble(
  x,
  ...,
  .name_repair = c("check_unique", "unique", "universal", "minimal"),
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

Arguments

x

A polars object

...

Passed to as_polars_df().

.name_repair

Treatment of problematic column names:

  • "minimal": No name repair or checks, beyond basic existence.

  • "unique": Make sure names are unique and not empty.

  • "check_unique" (default): No name repair, but check that names are unique.

  • "universal": Make the names unique and syntactic.

  • A function: apply custom name repair (e.g., .name_repair = make.names for names in the style of base R).

  • A purrr-style anonymous function; see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.

int64

Determine how to convert Polars' Int64, UInt32, or UInt64 type values to R type. One of the following:

date

Determine how to convert Polars' Date type values to R class. One of the following:

time

Determine how to convert Polars' Time type values to R class. One of the following:

decimal

Determine how to convert Polars' Decimal type values to R type. One of the following:

  • "double" (default): Convert to R's double type.

  • "character": Convert to R's character type.

as_clock_class

A logical value indicating whether to export datetimes and duration as the clock package's classes.

  • FALSE (default): Duration values are exported as difftime and datetime values are exported as POSIXct. Accuracy may be degraded.

  • TRUE: Duration values are exported as clock_duration, datetime without timezone values are exported as clock_naive_time, and datetime with timezone values are exported as clock_zoned_time. For this case, the clock package must be installed. Accuracy will be maintained.

ambiguous

Determine how to deal with ambiguous datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. Character vector or expression containing the following:

  • "raise" (default): Throw an error

  • "earliest": Use the earliest datetime

  • "latest": Use the latest datetime

  • "null": Return a NA value

non_existent

Determine how to deal with non-existent datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. One of the following:

  • "raise" (default): Throw an error

  • "null": Return a NA value

Value

A tibble

See Also

Examples

# Polars DataFrame may have empty column name
df <- pl$DataFrame(x = 1:2, c("a", "b"))
df

# Without checking or repairing the column names
tibble::as_tibble(df, .name_repair = "minimal")
tibble::as_tibble(df$lazy(), .name_repair = "minimal")

# You can make that unique
tibble::as_tibble(df, .name_repair = "unique")
tibble::as_tibble(df$lazy(), .name_repair = "unique")

Export the polars object as an R DataFrame

Description

This S3 method is a shortcut for as_polars_df(x, ...)$to_struct()$to_r_vector(ensure_vector = FALSE, struct = "dataframe").

Usage

## S3 method for class 'polars_data_frame'
as.data.frame(
  x,
  ...,
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

## S3 method for class 'polars_lazy_frame'
as.data.frame(
  x,
  ...,
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

Arguments

x

A polars object

...

Passed to as_polars_df().

int64

Determine how to convert Polars' Int64, UInt32, or UInt64 type values to R type. One of the following:

date

Determine how to convert Polars' Date type values to R class. One of the following:

time

Determine how to convert Polars' Time type values to R class. One of the following:

decimal

Determine how to convert Polars' Decimal type values to R type. One of the following:

  • "double" (default): Convert to R's double type.

  • "character": Convert to R's character type.

as_clock_class

A logical value indicating whether to export datetimes and duration as the clock package's classes.

  • FALSE (default): Duration values are exported as difftime and datetime values are exported as POSIXct. Accuracy may be degraded.

  • TRUE: Duration values are exported as clock_duration, datetime without timezone values are exported as clock_naive_time, and datetime with timezone values are exported as clock_zoned_time. For this case, the clock package must be installed. Accuracy will be maintained.

ambiguous

Determine how to deal with ambiguous datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. Character vector or expression containing the following:

  • "raise" (default): Throw an error

  • "earliest": Use the earliest datetime

  • "latest": Use the latest datetime

  • "null": Return a NA value

non_existent

Determine how to deal with non-existent datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. One of the following:

  • "raise" (default): Throw an error

  • "null": Return a NA value

Value

An R data frame

Examples

df <- as_polars_df(list(a = 1:3, b = 4:6))

as.data.frame(df)
as.data.frame(df$lazy())

Export the polars object as an R list

Description

This S3 method calls as_polars_df(x, ...)$get_columns() or as_polars_df(x, ...)$to_struct()$to_r_vector(ensure_vector = TRUE) depending on the as_series argument.

Usage

## S3 method for class 'polars_data_frame'
as.list(
  x,
  ...,
  as_series = FALSE,
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  struct = c("dataframe", "tibble"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

## S3 method for class 'polars_lazy_frame'
as.list(
  x,
  ...,
  as_series = FALSE,
  int64 = c("double", "character", "integer", "integer64"),
  date = c("Date", "IDate"),
  time = c("hms", "ITime"),
  struct = c("dataframe", "tibble"),
  decimal = c("double", "character"),
  as_clock_class = FALSE,
  ambiguous = c("raise", "earliest", "latest", "null"),
  non_existent = c("raise", "null")
)

Arguments

x

A polars object

...

Passed to as_polars_df().

as_series

Whether to convert each column to an R vector or a Series. If TRUE, return a list of Series; otherwise (default), return a list of R vectors.

int64

Determine how to convert Polars' Int64, UInt32, or UInt64 type values to R type. One of the following:

date

Determine how to convert Polars' Date type values to R class. One of the following:

time

Determine how to convert Polars' Time type values to R class. One of the following:

struct

Determine how to convert Polars' Struct type values to R class. One of the following:

  • "dataframe" (default): Convert to R's data.frame class.

  • "tibble": Convert to the tibble class. If the tibble package is not installed, a warning will be shown.

decimal

Determine how to convert Polars' Decimal type values to R type. One of the following:

  • "double" (default): Convert to R's double type.

  • "character": Convert to R's character type.

as_clock_class

A logical value indicating whether to export datetimes and duration as the clock package's classes.

  • FALSE (default): Duration values are exported as difftime and datetime values are exported as POSIXct. Accuracy may be degraded.

  • TRUE: Duration values are exported as clock_duration, datetime without timezone values are exported as clock_naive_time, and datetime with timezone values are exported as clock_zoned_time. For this case, the clock package must be installed. Accuracy will be maintained.

ambiguous

Determine how to deal with ambiguous datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. Character vector or expression containing the following:

  • "raise" (default): Throw an error

  • "earliest": Use the earliest datetime

  • "latest": Use the latest datetime

  • "null": Return a NA value

non_existent

Determine how to deal with non-existent datetimes. Only applicable when as_clock_class is set to FALSE and datetime without timezone values are exported as POSIXct. One of the following:

  • "raise" (default): Throw an error

  • "null": Return a NA value

Details

Arguments other than x and as_series are passed to <Series>$to_r_vector(), so they are ignored when as_series = TRUE.

Value

A list

See Also

Examples

df <- as_polars_df(list(a = 1:3, b = 4:6))

as.list(df, as_series = TRUE)
as.list(df, as_series = FALSE)

as.list(df$lazy(), as_series = TRUE)
as.list(df$lazy(), as_series = FALSE)

Check if the object is a polars object

Description

Functions to check if the object is a polars object. is_* functions return TRUE or FALSE depending on the class of the object. check_* functions throw an informative error if the object is not the correct class. Suffixes correspond to the polars object classes:

Usage

is_polars_dtype(x)

is_polars_df(x)

is_polars_expr(x, ...)

is_polars_lf(x)

is_polars_selector(x, ...)

is_polars_series(x)

is_list_of_polars_dtype(x, n = NULL)

check_polars_dtype(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_polars_df(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_polars_expr(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_polars_lf(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_polars_selector(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_polars_series(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

check_list_of_polars_dtype(
  x,
  ...,
  allow_null = FALSE,
  arg = caller_arg(x),
  call = caller_env()
)

Arguments

x

An object to check.

...

Arguments passed to rlang::abort().

n

Expected length of a vector.

allow_null

If TRUE, NULL is allowed as a valid input.

arg

An argument name as a string. This argument will be mentioned in error messages as the input that is at the origin of a problem.

call

The execution environment of a currently running function, e.g. caller_env(). The function will be mentioned in error messages as the source of the error. See the call argument of abort() for more information.

Details

check_polars_* functions are derived from the standalone-types-check functions from the rlang package (can be installed with usethis::use_standalone("r-lib/rlang", file = "types-check")).

Value

  • is_polars_* functions return TRUE or FALSE.

  • check_polars_* functions return NULL invisibly if the input is valid.

Examples

is_polars_df(as_polars_df(mtcars))
is_polars_df(mtcars)

# Use `check_polars_*` functions in a function
# to ensure the input is a polars object
sample_func <- function(x) {
  check_polars_df(x)
  TRUE
}

sample_func(as_polars_df(mtcars))
try(sample_func(mtcars))

Polars column selector function namespace

Description

cs is an environment class object that stores all selector functions of the R Polars API, which mimics the Python Polars API. It is intended to work the same way as in Python, as if you had imported the Python Polars selectors with import polars.selectors as cs.

Usage

cs

Format

An object of class polars_object of length 29.

Supported operators

There are 4 supported operators for selectors:

  • & to combine conditions with AND, e.g. select columns that contain "oo" and end with "t" with cs$contains("oo") & cs$ends_with("t");

  • | to combine conditions with OR, e.g. select columns that contain "oo" or end with "t" with cs$contains("oo") | cs$ends_with("t");

  • - to subtract conditions, e.g. select all columns that have alphanumeric names except those that contain "a" with cs$alphanumeric() - cs$contains("a");

  • ! to invert the selection, e.g. select all columns that are not of data type String with !cs$string().

Note that Python Polars uses ~ instead of ! to invert selectors.
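The four operators can be sketched together on a small DataFrame (column names here are illustrative):

```r
df <- pl$DataFrame(
  foo = c("x", "y"),
  bart = c(1, 2),
  boot = c(TRUE, FALSE)
)

df$select(cs$contains("oo") & cs$ends_with("t"))  # AND: both conditions
df$select(cs$contains("oo") | cs$ends_with("t"))  # OR: either condition
df$select(cs$alpha() - cs$contains("a"))          # subtraction
df$select(!cs$string())                           # inversion
```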

Examples

cs

# How many members are in the `cs` environment?
length(cs)

Select all columns

Description

Select all columns

Usage

cs__all()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(dt = as.Date(c("2000-1-1")), value = 10)

# Select all columns, casting them to string:
df$select(cs$all()$cast(pl$String))

# Select all columns except for those matching the given dtypes:
df$select(cs$all() - cs$numeric())

Select all columns with alphabetic names (e.g. only letters)

Description

Select all columns with alphabetic names (e.g. only letters)

Usage

cs__alpha(ascii_only = FALSE, ..., ignore_spaces = FALSE)

Arguments

ascii_only

Indicate whether to consider only ASCII alphabetic characters, or the full Unicode range of valid letters (accented, ideographic, etc.).

...

These dots are for future extensions and must be empty.

ignore_spaces

Indicate whether to ignore the presence of spaces in column names; if so, only the other (non-space) characters are considered.

Details

Matching column names cannot contain any non-alphabetic characters. Note that the definition of "alphabetic" consists of all valid Unicode alphabetic characters (\p{Alphabetic}) by default; this can be changed by setting ascii_only = TRUE.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  no1 = c(100, 200, 300),
  café = c("espresso", "latte", "mocha"),
  `t or f` = c(TRUE, FALSE, NA),
  hmm = c("aaa", "bbb", "ccc"),
  都市 = c("東京", "大阪", "京都")
)

# Select columns with alphabetic names; note that accented characters and
# kanji are recognised as alphabetic here:
df$select(cs$alpha())

# Constrain the definition of “alphabetic” to ASCII characters only:
df$select(cs$alpha(ascii_only = TRUE))
df$select(cs$alpha(ascii_only = TRUE, ignore_spaces = TRUE))

# Select all columns except for those with alphabetic names:
df$select(!cs$alpha())
df$select(!cs$alpha(ignore_spaces = TRUE))

Select all columns with alphanumeric names (e.g. only letters and the digits 0-9)

Description

Select all columns with alphanumeric names (e.g. only letters and the digits 0-9)

Usage

cs__alphanumeric(ascii_only = FALSE, ..., ignore_spaces = FALSE)

Arguments

ascii_only

Indicate whether to consider only ASCII alphabetic characters, or the full Unicode range of valid letters (accented, ideographic, etc.).

...

These dots are for future extensions and must be empty.

ignore_spaces

Indicate whether to ignore the presence of spaces in column names; if so, only the other (non-space) characters are considered.

Details

Matching column names cannot contain any non-alphanumeric characters. Note that the definition of “alphanumeric” consists of all valid Unicode alphabetic characters (\p{Alphabetic}) and digit characters (\d) by default; this can be changed by setting ascii_only = TRUE.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  `1st_col` = c(100, 200, 300),
  flagged = c(TRUE, FALSE, TRUE),
  `00prefix` = c("01:aa", "02:bb", "03:cc"),
  `last col` = c("x", "y", "z")
)

# Select columns with alphanumeric names:
df$select(cs$alphanumeric())
df$select(cs$alphanumeric(ignore_spaces = TRUE))

# Select all columns except for those with alphanumeric names:
df$select(!cs$alphanumeric())
df$select(!cs$alphanumeric(ignore_spaces = TRUE))

Select all binary columns

Description

Select all binary columns

Usage

cs__binary()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  a = charToRaw("hello"),
  b = "world",
  c = charToRaw("!"),
  d = ":"
)

# Select binary columns:
df$select(cs$binary())

# Select all columns except for those that are binary:
df$select(!cs$binary())

Select all boolean columns

Description

Select all boolean columns

Usage

cs__boolean()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  a = 1:4,
  b = c(FALSE, TRUE, FALSE, TRUE)
)

# Select and invert boolean columns:
df$with_columns(inverted = cs$boolean()$not())

# Select all columns except for those that are boolean:
df$select(!cs$boolean())

Select all columns matching the given dtypes

Description

Select all columns matching the given dtypes

Usage

cs__by_dtype(...)

Arguments

...

<dynamic-dots> Data types to select.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  dt = as.Date(c("1999-12-31", "2024-1-1", "2010-7-5")),
  value = c(1234500, 5000555, -4500000),
  other = c("foo", "bar", "foo")
)

# Select all columns with date or string dtypes:
df$select(cs$by_dtype(pl$Date, pl$String))

# Select all columns that are not of date or string dtype:
df$select(!cs$by_dtype(pl$Date, pl$String))

# Group by string columns and sum the numeric columns:
df$group_by(cs$string())$agg(cs$numeric()$sum())$sort("other")

Select all columns matching the given indices (or range objects)

Description

Select all columns matching the given indices (or range objects)

Usage

cs__by_index(indices)

Arguments

indices

One or more column indices (or ranges). Negative indexing is supported.

Details

Matching columns are returned in the order in which their indexes appear in the selector, not the underlying schema order.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

vals <- as.list(0.5 * 0:100)
names(vals) <- paste0("c", 0:100)
df <- pl$DataFrame(!!!vals)
df

# Select columns by index (the two first/last columns):
df$select(cs$by_index(c(0, 1, -2, -1)))

# Use seq()
df$select(cs$by_index(c(0, seq(1, 101, 20))))
df$select(cs$by_index(c(0, seq(101, 0, -25))))

# Select only odd-indexed columns:
df$select(!cs$by_index(seq(0, 100, 2)))
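
As the Details note, matching columns are returned in the order their indices appear in the selector, not in the underlying schema order. A small sketch continuing with the df above:

```r
# Indices listed out of schema order come back in selector order
# (here, c2 should appear before c1):
df$select(cs$by_index(c(2, 1)))
```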

Select all columns matching the given names

Description

Select all columns matching the given names

Usage

cs__by_name(..., require_all = TRUE)

Arguments

...

<dynamic-dots> Column names to select.

require_all

Whether to match all names (the default) or any of the names.

Details

Matching columns are returned in the order in which their names are declared in the selector, not the underlying schema order.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select columns by name:
df$select(cs$by_name("foo", "bar"))

# Match any of the given columns by name:
df$select(cs$by_name("baz", "moose", "foo", "bear", require_all = FALSE))

# Match all columns except for those given:
df$select(!cs$by_name("foo", "bar"))
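
Per the Details above, matched columns follow the order in which the names are declared in the selector, not the schema order. A sketch continuing with the df above:

```r
# "zap" is declared first, so it should come back before "foo":
df$select(cs$by_name("zap", "foo"))
```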

Select all categorical columns

Description

Select all categorical columns

Usage

cs__categorical()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("xx", "yy"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  .schema_overrides = list(foo = pl$Categorical()),
)

# Select categorical columns:
df$select(cs$categorical())

# Select all columns except for those that are categorical:
df$select(!cs$categorical())

Select columns whose names contain the given literal substring(s)

Description

Select columns whose names contain the given literal substring(s)

Usage

cs__contains(...)

Arguments

...

<dynamic-dots> Substring(s) that matching column names should contain.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select columns that contain the substring "ba":
df$select(cs$contains("ba"))

# Select columns that contain the substring "ba" or the letter "z":
df$select(cs$contains("ba", "z"))

# Select all columns except for those that contain the substring "ba":
df$select(!cs$contains("ba"))

Select all date columns

Description

Select all date columns

Usage

cs__date()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  dtm = as.POSIXct(c("2001-5-7 10:25", "2031-12-31 00:30")),
  dt = as.Date(c("1999-12-31", "2024-8-9"))
)

# Select date columns:
df$select(cs$date())

# Select all columns except for those that are dates:
df$select(!cs$date())

Select all datetime columns

Description

Select all datetime columns

Usage

cs__datetime(time_unit = c("ms", "us", "ns"), time_zone = list("*", NULL))

Arguments

time_unit

One (or more) of the allowed time unit precision strings, "ms", "us", and "ns". Default is to select columns with any valid time unit.

time_zone

One of the following. The value, or each element of the vector, will be passed to the time_zone argument of the pl$Datetime() function:

  • A character vector of one or more timezone strings, as defined in OlsonNames().

  • NULL to select Datetime columns that do not have a timezone.

  • "*" to select Datetime columns that have any timezone.

  • A list combining timezone strings, "*", and/or NULL, to select Datetime columns that have one of the specified timezones or no timezone. For example, the default value list("*", NULL) selects all Datetime columns.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

chr_vec <- c("1999-07-21 05:20:16.987654", "2000-05-16 06:21:21.123456")
df <- pl$DataFrame(
  tstamp_tokyo = as.POSIXlt(chr_vec, tz = "Asia/Tokyo"),
  tstamp_utc = as.POSIXct(chr_vec, tz = "UTC"),
  tstamp = as.POSIXct(chr_vec),
  dt = as.Date(chr_vec),
)

# Select all datetime columns:
df$select(cs$datetime())

# Select all datetime columns that have "ms" precision:
df$select(cs$datetime("ms"))

# Select all datetime columns that have any timezone:
df$select(cs$datetime(time_zone = "*"))

# Select all datetime columns that have a specific timezone:
df$select(cs$datetime(time_zone = "UTC"))

# Select all datetime columns that have NO timezone:
df$select(cs$datetime(time_zone = NULL))

# Select all columns except for datetime columns:
df$select(!cs$datetime())
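
The time_zone argument also accepts a list mixing timezone strings and NULL, as described in the Arguments section. A sketch continuing with the df above:

```r
# Select Datetime columns that are either in UTC or have no timezone:
df$select(cs$datetime(time_zone = list("UTC", NULL)))
```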

Select all decimal columns

Description

Select all decimal columns

Usage

cs__decimal()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c("2.0005", "-50.5555"),
  .schema_overrides = list(
    bar = pl$Decimal(),
    baz = pl$Decimal(scale = 5, precision = 10)
  )
)

# Select decimal columns:
df$select(cs$decimal())

# Select all columns except for those that are decimal:
df$select(!cs$decimal())

Select all columns having names consisting only of digits

Description

Select all columns having names consisting only of digits

Usage

cs__digit(ascii_only = FALSE)

Arguments

ascii_only

Indicate whether to consider only ASCII digits (0-9), or the full Unicode range of valid digit characters.

Details

Matching column names cannot contain any non-digit characters. Note that the definition of "digit" consists of all valid Unicode digit characters (\d) by default; this can be changed by setting ascii_only = TRUE.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  key = c("aaa", "bbb"),
  `2001` = 1:2,
  `2025` = 3:4
)

# Select columns with digit names:
df$select(cs$digit())

# Select all columns except for those with digit names:
df$select(!cs$digit())

# Demonstrate use of ascii_only flag (by default all valid unicode digits
# are considered, but this can be constrained to ascii 0-9):
df <- pl$DataFrame(`१९९९` = 1999, `२०७७` = 2077, `3000` = 3000)
df$select(cs$digit())
df$select(cs$digit(ascii_only = TRUE))

Select all duration columns, optionally filtering by time unit

Description

Select all duration columns, optionally filtering by time unit

Usage

cs__duration(time_unit = c("ms", "us", "ns"))

Arguments

time_unit

One (or more) of the allowed time unit precision strings, "ms", "us", and "ns". Default is to select columns with any valid time unit.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  dtm = as.POSIXct(c("2001-5-7 10:25", "2031-12-31 00:30")),
  dur_ms = clock::duration_milliseconds(1:2),
  dur_us = clock::duration_microseconds(1:2),
  dur_ns = clock::duration_nanoseconds(1:2),
)

# Select duration columns:
df$select(cs$duration())

# Select all duration columns that have "ms" precision:
df$select(cs$duration("ms"))

# Select all duration columns that have "ms" OR "ns" precision:
df$select(cs$duration(c("ms", "ns")))

# Select all columns except for those that are duration:
df$select(!cs$duration())

Select columns that end with the given substring(s)

Description

Select columns that end with the given substring(s)

Usage

cs__ends_with(...)

Arguments

...

<dynamic-dots> Substring(s) that matching column names should end with.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select columns that end with the substring "z":
df$select(cs$ends_with("z"))

# Select columns that end with either the letter "z" or "r":
df$select(cs$ends_with("z", "r"))

# Select all columns except for those that end with the substring "z":
df$select(!cs$ends_with("z"))

Select all columns except those matching the given columns, datatypes, or selectors

Description

Select all columns except those matching the given columns, datatypes, or selectors

Usage

cs__exclude(...)

Arguments

...

<dynamic-dots> Column names to exclude.

Details

If excluding a single selector, it is simpler to write !selector instead.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  aa = 1:3,
  ba = c("a", "b", NA),
  cc = c(NA, 2.5, 1.5)
)

# Exclude by column name(s):
df$select(cs$exclude("ba", "xx"))

# Exclude using a column name, a selector, and a dtype:
df$select(cs$exclude("aa", cs$string(), pl$Int32))
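
As the Details mention, excluding a single selector can be written more simply with !. The two forms below should be equivalent for the df above:

```r
# Equivalent ways to drop the string columns:
df$select(cs$exclude(cs$string()))
df$select(!cs$string())
```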

Select the first column in the current scope

Description

Select the first column in the current scope

Usage

cs__first()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123L, 456L),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select the first column:
df$select(cs$first())

# Select everything except for the first column:
df$select(!cs$first())

Select all float columns.

Description

Select all float columns.

Usage

cs__float()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123L, 456L),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE),
  .schema_overrides = list(baz = pl$Float32, zap = pl$Float64),
)

# Select all float columns:
df$select(cs$float())

# Select all columns except for those that are float:
df$select(!cs$float())

Select all integer columns.

Description

Select all integer columns.

Usage

cs__integer()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123L, 456L),
  baz = c(2.0, 5.5),
  zap = 0:1
)

# Select all integer columns:
df$select(cs$integer())

# Select all columns except for those that are integer:
df$select(!cs$integer())

Select the last column in the current scope

Description

Select the last column in the current scope

Usage

cs__last()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123L, 456L),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select the last column:
df$select(cs$last())

# Select everything except for the last column:
df$select(!cs$last())

Select all columns that match the given regex pattern

Description

Select all columns that match the given regex pattern

Usage

cs__matches(pattern)

Arguments

pattern

A valid regular expression pattern, compatible with the Rust regex crate (https://docs.rs/regex/latest/regex/).

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  zap = c(0, 1)
)

# Match column names containing an "a", preceded by a character that is not
# "z":
df$select(cs$matches("[^z]a"))

# Do not match column names ending in "R" or "z" (case-insensitively):
df$select(!cs$matches(r"((?i)R|z$)"))

Select all numeric columns.

Description

Select all numeric columns.

Usage

cs__numeric()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123L, 456L),
  baz = c(2.0, 5.5),
  zap = 0:1,
  .schema_overrides = list(bar = pl$Int16, baz = pl$Float32, zap = pl$UInt8),
)

# Select all numeric columns:
df$select(cs$numeric())

# Select all columns except for those that are numeric:
df$select(!cs$numeric())

Select all signed integer columns

Description

Select all signed integer columns

Usage

cs__signed_integer()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c(-123L, -456L),
  bar = c(3456L, 6789L),
  baz = c(7654L, 4321L),
  zap = c("ab", "cd"),
  .schema_overrides = list(bar = pl$UInt32, baz = pl$UInt64),
)

# Select signed integer columns:
df$select(cs$signed_integer())

# Select all columns except for those that are signed integer:
df$select(!cs$signed_integer())

# Select all integer columns (both signed and unsigned):
df$select(cs$integer())

Select columns that start with the given substring(s)

Description

Select columns that start with the given substring(s)

Usage

cs__starts_with(...)

Arguments

...

<dynamic-dots> Substring(s) that matching column names should start with.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c("x", "y"),
  bar = c(123, 456),
  baz = c(2.0, 5.5),
  zap = c(FALSE, TRUE)
)

# Select columns that start with the substring "b":
df$select(cs$starts_with("b"))

# Select columns that start with either the letter "b" or "z":
df$select(cs$starts_with("b", "z"))

# Select all columns except for those that start with the substring "b":
df$select(!cs$starts_with("b"))

Select all String (and, optionally, Categorical) columns

Description

Select all String (and, optionally, Categorical) columns.

Usage

cs__string(..., include_categorical = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

include_categorical

If TRUE, also select categorical columns.

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  w = c("xx", "yy", "xx", "yy", "xx"),
  x = c(1, 2, 1, 4, -2),
  y = c(3.0, 4.5, 1.0, 2.5, -2.0),
  z = c("a", "b", "a", "b", "b")
)$with_columns(
  z = pl$col("z")$cast(pl$Categorical())
)

# Group by all string columns, sum the numeric columns, then sort by the
# string cols:
df$group_by(cs$string())$agg(cs$numeric()$sum())$sort(cs$string())

# Group by all string and categorical columns:
df$
  group_by(cs$string(include_categorical = TRUE))$
  agg(cs$numeric()$sum())$
  sort(cs$string(include_categorical = TRUE))

Select all temporal columns

Description

Select all temporal columns

Usage

cs__temporal()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  dtm = as.POSIXct(c("2001-5-7 10:25", "2031-12-31 00:30")),
  dt = as.Date(c("1999-12-31", "2024-8-9")),
  value = 1:2
)

# Match all temporal columns:
df$select(cs$temporal())

# Match all temporal columns except for datetime columns:
df$select(cs$temporal() - cs$datetime())

# Match all columns except for temporal columns:
df$select(!cs$temporal())

Select all time columns

Description

Select all time columns

Usage

cs__time()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  dtm = as.POSIXct(c("2001-5-7 10:25", "2031-12-31 00:30")),
  dt = as.Date(c("1999-12-31", "2024-8-9")),
  tm = hms::parse_hms(c("0:0:0", "23:59:59"))
)

# Select time columns:
df$select(cs$time())

# Select all columns except for those that are time:
df$select(!cs$time())

Select all unsigned integer columns

Description

Select all unsigned integer columns

Usage

cs__unsigned_integer()

Value

A Polars selector

See Also

cs for the documentation on operators supported by Polars selectors.

Examples

df <- pl$DataFrame(
  foo = c(-123L, -456L),
  bar = c(3456L, 6789L),
  baz = c(7654L, 4321L),
  zap = c("ab", "cd"),
  .schema_overrides = list(bar = pl$UInt32, baz = pl$UInt64),
)

# Select unsigned integer columns:
df$select(cs$unsigned_integer())

# Select all columns except for those that are unsigned integer:
df$select(!cs$unsigned_integer())

# Select all integer columns (both signed and unsigned):
df$select(cs$integer())

Cast DataFrame column(s) to the specified dtype

Description

Cast DataFrame column(s) to the specified dtype

Usage

dataframe__cast(..., .strict = TRUE)

Value

A polars DataFrame

Examples

df <- pl$DataFrame(
  foo = 1:3,
  bar = c(6, 7, 8),
  ham = as.Date(c("2020-01-02", "2020-03-04", "2020-05-06"))
)

# Cast only some columns
df$cast(foo = pl$Float32, bar = pl$UInt8)

# Cast all columns to the same type
df$cast(pl$String)

Clone a DataFrame

Description

This is a cheap operation that does not copy data. Assigning does not copy the DataFrame (environment object). This is because environment objects have reference semantics. Calling $clone() creates a new environment, which can be useful when dealing with attributes (see examples).

Usage

dataframe__clone()

Value

A polars DataFrame

Examples

df1 <- as_polars_df(iris)

# Assigning does not copy the DataFrame (environment object), calling
# $clone() creates a new environment.
df2 <- df1
df3 <- df1$clone()
rlang::env_label(df1)
rlang::env_label(df2)
rlang::env_label(df3)

# Cloning can be useful to add attributes to data used in a function without
# adding those attributes to the original object.

# Make a function to take a DataFrame, add an attribute, and return a
# DataFrame:
give_attr <- function(data) {
  attr(data, "created_on") <- "2024-01-29"
  data
}
df2 <- give_attr(df1)

# Problem: the original DataFrame also gets the attribute while it shouldn't
attributes(df1)

# Use $clone() inside the function to avoid that
give_attr <- function(data) {
  data <- data$clone()
  attr(data, "created_on") <- "2024-01-29"
  data
}
df1 <- as_polars_df(iris)
df2 <- give_attr(df1)

# now, the original DataFrame doesn't get this attribute
attributes(df1)

Drop columns of a DataFrame

Description

Drop columns of a DataFrame

Usage

dataframe__drop(..., strict = TRUE)

Arguments

...

<dynamic-dots> Characters of column names to drop. Passed to pl$col().

strict

Validate that all column names exist in the schema and throw an exception if a column name does not exist in the schema.

Value

A polars DataFrame

Examples

as_polars_df(mtcars)$drop(c("mpg", "hp"))

# equivalent
as_polars_df(mtcars)$drop("mpg", "hp")
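
A sketch of the strict argument documented above: with strict = FALSE, names missing from the schema are silently ignored instead of raising an error.

```r
# "foo" is not a column of mtcars; strict = FALSE skips it silently:
as_polars_df(mtcars)$drop("mpg", "foo", strict = FALSE)
```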

Check whether the DataFrame is equal to another DataFrame

Description

Check whether the DataFrame is equal to another DataFrame

Usage

dataframe__equals(other, ..., null_equal = TRUE)

Arguments

other

DataFrame to compare with.

Value

A logical value

Examples

dat1 <- as_polars_df(iris)
dat2 <- as_polars_df(iris)
dat3 <- as_polars_df(mtcars)
dat1$equals(dat2)
dat1$equals(dat3)
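
A sketch of the null_equal argument shown in Usage: with the default null_equal = TRUE, null values in the same positions compare as equal.

```r
dat4 <- pl$DataFrame(x = c(1, NA))
dat5 <- pl$DataFrame(x = c(1, NA))
dat4$equals(dat5)
# With null_equal = FALSE, nulls no longer compare as equal:
dat4$equals(dat5, null_equal = FALSE)
```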

Filter rows of a DataFrame

Description

Filter rows of a DataFrame

Usage

dataframe__filter(...)

Value

A polars DataFrame

Examples

df <- as_polars_df(iris)

df$filter(pl$col("Sepal.Length") > 5)

# This is equivalent to
# df$filter(pl$col("Sepal.Length") > 5 & pl$col("Petal.Width") < 1)
df$filter(pl$col("Sepal.Length") > 5, pl$col("Petal.Width") < 1)

# rows where condition is NA are dropped
iris2 <- iris
iris2[c(1, 3, 5), "Species"] <- NA
df <- as_polars_df(iris2)

df$filter(pl$col("Species") == "setosa")

Get the DataFrame as a list of Series

Description

Get the DataFrame as a list of Series

Usage

dataframe__get_columns()

Value

A list of Series

Examples

df <- pl$DataFrame(foo = c(1, 2, 3), bar = c(4, 5, 6))
df$get_columns()

df <- pl$DataFrame(
  a = 1:4,
  b = c(0.5, 4, 10, 13),
  c = c(TRUE, TRUE, FALSE, TRUE)
)
df$get_columns()

Group a DataFrame

Description

Group a DataFrame

Usage

dataframe__group_by(..., .maintain_order = FALSE)

Details

Within each group, the order of the rows is always preserved, regardless of the .maintain_order argument.

Value

GroupBy (a DataFrame with special groupby methods like ⁠$agg()⁠)

See Also

  • <DataFrame>$partition_by()

Examples

df <- pl$DataFrame(
  a = c("a", "b", "a", "b", "c"),
  b = c(1, 2, 1, 3, 3),
  c = c(5, 4, 3, 2, 1)
)

df$group_by("a")$agg(pl$col("b")$sum())

# Set `.maintain_order = TRUE` to ensure the order of the groups is
# consistent with the input.
df$group_by("a", .maintain_order = TRUE)$agg(pl$col("c"))

# Group by multiple columns by passing a list of column names.
df$group_by(c("a", "b"))$agg(pl$max("c"))

# Or pass some arguments to group by multiple columns in the same way.
# Expressions are also accepted.
df$group_by("a", pl$col("b") %/% 2)$agg(
  pl$col("c")$mean()
)

# The columns will be renamed to the argument names.
df$group_by(d = "a", e = pl$col("b") %/% 2)$agg(
  pl$col("c")$mean()
)

Convert an existing DataFrame to a LazyFrame

Description

Start a new lazy query from a DataFrame.

Usage

dataframe__lazy()

Value

A polars LazyFrame

Examples

pl$DataFrame(a = 1:2, b = c(NA, "a"))$lazy()

Get number of chunks used by the ChunkedArrays of this DataFrame

Description

Get number of chunks used by the ChunkedArrays of this DataFrame

Usage

dataframe__n_chunks(strategy = c("first", "all"))

Arguments

strategy

Return the number of chunks of the "first" column, or "all" columns in this DataFrame.

Value

An integer vector.

Examples

df <- pl$DataFrame(
  a = c(1, 2, 3, 4),
  b = c(0.5, 4, 10, 13),
  c = c(TRUE, TRUE, FALSE, TRUE)
)

df$n_chunks()
df$n_chunks(strategy = "all")

Rechunk the data in this DataFrame to a contiguous allocation

Description

This will make sure all subsequent operations have optimal and predictable performance.

Usage

dataframe__rechunk()

Value

A polars DataFrame
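
Examples

A minimal sketch, assuming pl$concat() is available with a rechunk argument as in other polars bindings (names here are an assumption, not confirmed by this entry):

```r
df1 <- pl$DataFrame(a = 1:3)
df2 <- pl$DataFrame(a = 4:6)
# Concatenating without rechunking typically leaves multiple chunks:
df <- pl$concat(list(df1, df2), rechunk = FALSE)
df$n_chunks()
# Consolidate the data into a single contiguous allocation:
df$rechunk()$n_chunks()
```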


Select and modify columns of a DataFrame

Description

Select and perform operations on a subset of columns only. This discards unmentioned columns (similar to .() in data.table, and contrary to dplyr::mutate()).

One cannot use new variables in subsequent expressions in the same ⁠$select()⁠ call. For instance, if you create a variable x, you will only be able to use it in another ⁠$select()⁠ or ⁠$with_columns()⁠ call.

Usage

dataframe__select(...)

Arguments

...

<dynamic-dots> Name-value pairs of objects to be converted to polars expressions by the as_polars_expr() function. Characters are parsed as column names, other non-expression inputs are parsed as literals. Each name will be used as the expression name.

Value

A polars DataFrame

Examples

as_polars_df(iris)$select(
  abs_SL = pl$col("Sepal.Length")$abs(),
  add_2_SL = pl$col("Sepal.Length") + 2
)

Get a slice of the DataFrame.

Description

Get a slice of the DataFrame.

Usage

dataframe__slice(offset, length = NULL)

Arguments

offset

Start index, can be a negative value. This is 0-indexed, so offset = 1 skips the first row.

length

Length of the slice. If NULL (default), all rows starting at the offset will be selected.

Value

A polars DataFrame

Examples

# skip the first 2 rows and take the 4 following rows
as_polars_df(mtcars)$slice(2, 4)

# this is equivalent to:
mtcars[3:6, ]
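
The offset argument also accepts negative values, counting from the end of the frame; with the default length = NULL, all remaining rows are taken:

```r
# A negative offset starts counting from the end, so this should
# select the last two rows:
as_polars_df(mtcars)$slice(-2)
```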

Sort a DataFrame

Description

Sort a DataFrame

Usage

dataframe__sort(
  ...,
  descending = FALSE,
  nulls_last = FALSE,
  multithreaded = TRUE,
  maintain_order = FALSE
)

Value

A polars DataFrame

Examples

df <- mtcars
df$mpg[1] <- NA
df <- as_polars_df(df)
df$sort("mpg")
df$sort("mpg", nulls_last = TRUE)
df$sort("cyl", "mpg")
df$sort(c("cyl", "mpg"))
df$sort(c("cyl", "mpg"), descending = TRUE)
df$sort(c("cyl", "mpg"), descending = c(TRUE, FALSE))
df$sort(pl$col("cyl"), pl$col("mpg"))

Select column as Series at index location

Description

Select column as Series at index location

Usage

dataframe__to_series(index = 0)

Arguments

index

Index of the column to return as Series. Defaults to 0, which is the first column.

Value

Series or NULL

Examples

df <- as_polars_df(iris[1:10, ])

# default is to extract the first column
df$to_series()

# Polars is 0-indexed, so we use index = 1 to extract the *2nd* column
df$to_series(index = 1)

# returns NULL instead of erroring if the column isn't there
df$to_series(index = 8)

Convert a DataFrame to a Series of type Struct

Description

Convert a DataFrame to a Series of type Struct

Usage

dataframe__to_struct(name = "")

Arguments

name

A character. Name for the struct Series.

Value

A Series of the struct type

Examples

df <- pl$DataFrame(
  a = 1:5,
  b = c("one", "two", "three", "four", "five"),
)
df$to_struct("nums")

Modify/append column(s) of a DataFrame

Description

Add columns or modify existing ones with expressions. This is similar to dplyr::mutate() as it keeps unmentioned columns (unlike ⁠$select()⁠).

However, unlike dplyr::mutate(), one cannot use new variables in subsequent expressions in the same $with_columns() call. For instance, if you create a variable x, you will only be able to use it in another $with_columns() or $select() call.

Usage

dataframe__with_columns(...)

Arguments

...

<dynamic-dots> Name-value pairs of objects to be converted to polars expressions by the as_polars_expr() function. Characters are parsed as column names, other non-expression inputs are parsed as literals. Each name will be used as the expression name.

Value

A polars DataFrame

Examples

as_polars_df(iris)$with_columns(
  abs_SL = pl$col("Sepal.Length")$abs(),
  add_2_SL = pl$col("Sepal.Length") + 2
)

# same query
l_expr <- list(
  pl$col("Sepal.Length")$abs()$alias("abs_SL"),
  (pl$col("Sepal.Length") + 2)$alias("add_2_SL")
)
as_polars_df(iris)$with_columns(l_expr)

as_polars_df(iris)$with_columns(
  SW_add_2 = (pl$col("Sepal.Width") + 2),
  # unnamed expr will keep name "Sepal.Length"
  pl$col("Sepal.Length")$abs()
)

Compute absolute values

Description

Compute absolute values

Usage

expr__abs()

Value

A polars expression

Examples

df <- pl$DataFrame(a = -1:2)
df$with_columns(abs = pl$col("a")$abs())

Add two expressions

Description

Method equivalent of addition operator expr + other.

Usage

expr__add(other)

Arguments

other

Element to add. Can be a string (only if expr is a string), a numeric value, or another expression.

Value

A polars expression

See Also

  • Arithmetic operators

Examples

df <- pl$DataFrame(x = 1:5)

df$with_columns(
  `x+int` = pl$col("x")$add(2L),
  `x+expr` = pl$col("x")$add(pl$col("x")$cum_prod())
)

df <- pl$DataFrame(
  x = c("a", "d", "g"),
  y = c("b", "e", "h"),
  z = c("c", "f", "i")
)

df$with_columns(
  pl$col("x")$add(pl$col("y"))$add(pl$col("z"))$alias("xyz")
)

Get the group indexes of the group by operation

Description

Should be used in aggregation context only.

Usage

expr__agg_groups()

Value

A polars expression

Examples

df <- pl$DataFrame(
  group = rep(c("one", "two"), each = 3),
  value = c(94, 95, 96, 97, 97, 99)
)

df$group_by("group", .maintain_order = TRUE)$agg(pl$col("value")$agg_groups())

Rename the expression

Description

Rename the expression

Usage

expr__alias(name)

Arguments

name

The new name.

Value

A polars expression

Examples

# Rename an expression to avoid overwriting an existing column
df <- pl$DataFrame(a = 1:3, b = c("x", "y", "z"))
df$with_columns(
  pl$col("a") + 10,
  pl$col("b")$str$to_uppercase()$alias("c")
)

# Overwrite the default name of literal columns to prevent errors due to
# duplicate column names.
df$with_columns(
  pl$lit(TRUE)$alias("c"),
  pl$lit(4)$alias("d")
)

Check if all boolean values in a column are true

Description

This method is an expression, not to be confused with pl$all(), which is a function to select all columns.

Usage

expr__all(..., ignore_nulls = TRUE)

Arguments

...

These dots are for future extensions and must be empty.

ignore_nulls

If TRUE (default), ignore null values. If FALSE, Kleene logic is used to deal with nulls: if the column contains any null values and no FALSE values, the output is null.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(TRUE, TRUE),
  b = c(TRUE, FALSE),
  c = c(NA, TRUE),
  d = c(NA, NA)
)

# By default, ignore null values. If there are only nulls, then all() returns
# TRUE.
df$select(pl$col("*")$all())

# If we set ignore_nulls = FALSE, then we don't know if all values in column
# "c" are TRUE, so it returns null
df$select(pl$col("*")$all(ignore_nulls = FALSE))

Apply logical AND on two expressions

Description

Combine two boolean expressions with AND.

Usage

expr__and(other)

Arguments

other

A boolean literal or an expression that evaluates to a boolean Series, to combine with logical AND.

Value

A polars expression

Examples

pl$lit(TRUE) & TRUE
pl$lit(TRUE)$and(pl$lit(TRUE))

Check if any boolean value in a column is true

Description

Check if any boolean value in a column is true

Usage

expr__any(..., ignore_nulls = TRUE)

Arguments

...

These dots are for future extensions and must be empty.

ignore_nulls

If TRUE (default), ignore null values. If FALSE, Kleene logic is used to deal with nulls: if the column contains any null values and no TRUE values, the output is null.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(TRUE, FALSE),
  b = c(FALSE, FALSE),
  c = c(NA, FALSE)
)

df$select(pl$col("*")$any())

# If we set ignore_nulls = FALSE, then we don't know if any values in column
# "c" is TRUE, so it returns null
df$select(pl$col("*")$any(ignore_nulls = FALSE))

Append expressions

Description

Append expressions

Usage

expr__append(other, ..., upcast = TRUE)

Arguments

other

Expression to append.

...

These dots are for future extensions and must be empty.

upcast

If TRUE (default), cast both Series to the same supertype.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 8:10, b = c(NA, 4, 4))
df$select(pl$all()$head(1)$append(pl$all()$tail(1)))

Approximate count of unique values

Description

This is done using the HyperLogLog++ algorithm for cardinality estimation.

Usage

expr__approx_n_unique()

Value

A polars expression

Examples

df <- pl$DataFrame(n = c(1, 1, 2))
df$select(pl$col("n")$approx_n_unique())

df <- pl$DataFrame(n = 0:1000)
df$select(
  exact = pl$col("n")$n_unique(),
  approx = pl$col("n")$approx_n_unique()
)

Compute inverse cosine

Description

Compute inverse cosine

Usage

expr__arccos()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, cos(0.5), 0, 1, NA))$
  with_columns(arccos = pl$col("a")$arccos())

Compute inverse hyperbolic cosine

Description

Compute inverse hyperbolic cosine

Usage

expr__arccosh()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, cosh(0.5), 0, 1, NA))$
  with_columns(arccosh = pl$col("a")$arccosh())

Compute inverse sine

Description

Compute inverse sine

Usage

expr__arcsin()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, sin(0.5), 0, 1, NA))$
  with_columns(arcsin = pl$col("a")$arcsin())

Compute inverse hyperbolic sine

Description

Compute inverse hyperbolic sine

Usage

expr__arcsinh()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, sinh(0.5), 0, 1, NA))$
  with_columns(arcsinh = pl$col("a")$arcsinh())

Compute inverse tangent

Description

Compute inverse tangent

Usage

expr__arctan()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, tan(0.5), 0, 1, NA_real_))$
  with_columns(arctan = pl$col("a")$arctan())

Compute inverse hyperbolic tangent

Description

Compute inverse hyperbolic tangent

Usage

expr__arctanh()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, tanh(0.5), 0, 1, NA))$
  with_columns(arctanh = pl$col("a")$arctanh())

Get the index of the maximal value

Description

Get the index of the maximal value

Usage

expr__arg_max()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(20, 10, 30))
df$select(pl$col("a")$arg_max())

Get the index of the minimal value

Description

Get the index of the minimal value

Usage

expr__arg_min()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(20, 10, 30))
df$select(pl$col("a")$arg_min())

Index of a sort

Description

Get the index values that would sort this column.

Usage

expr__arg_sort(..., descending = FALSE, nulls_last = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

descending

Sort in descending order.

nulls_last

Place null values last.

Value

A polars expression

See Also

pl$arg_sort_by() to find the row indices that would sort multiple columns.

Examples

pl$DataFrame(
  a = c(6, 1, 0, NA, Inf, NaN)
)$with_columns(arg_sorted = pl$col("a")$arg_sort())

Return indices where expression is true

Description

Return indices where expression is true

Usage

expr__arg_true()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 1))
df$select((pl$col("a") == 1)$arg_true())
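For intuition, the same indices can be obtained in base R with which(); this is only an illustrative sketch, and note that polars indices are 0-based:

```r
# Base-R analogue of arg_true(): which() gives 1-based positions,
# so subtract 1 to match polars' 0-based indices.
a <- c(1, 1, 2, 1)
which(a == 1) - 1
# 0 1 3
```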

Get the index of the first unique value

Description

Get the index of the first unique value

Usage

expr__arg_unique()

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3, b = c(NA, 4, 4))
df$select(pl$col("a")$arg_unique())
df$select(pl$col("b")$arg_unique())

Fill missing values with the next non-null value

Description

Fill missing values with the next non-null value

Usage

expr__backward_fill(limit = NULL)

Arguments

limit

The number of consecutive null values to backward fill.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA),
  b = c(4, NA, 6),
  c = c(NA, NA, 2)
)
df$select(pl$all()$backward_fill())
df$select(pl$all()$backward_fill(limit = 1))

Return the k smallest elements

Description

Non-null elements are always preferred over null elements. The output is not guaranteed to be in any particular order; call $sort() after this function if you wish the output to be sorted. This has time complexity O(n).

Usage

expr__bottom_k(k = 5)

Arguments

k

Number of elements to return.

Value

A polars expression

Examples

df <- pl$DataFrame(value = c(1, 98, 2, 3, 99, 4))
df$select(
  top_k = pl$col("value")$top_k(k = 3),
  bottom_k = pl$col("value")$bottom_k(k = 3)
)

Return the elements corresponding to the k smallest elements of the by column(s)

Description

Non-null elements are always preferred over null elements. The output is not guaranteed to be in any particular order; call $sort() after this function if you wish the output to be sorted. This has time complexity O(n).

Usage

expr__bottom_k_by(by, k = 5, ..., reverse = FALSE)

Arguments

by

Column(s) used to determine the smallest elements. Accepts expression input. Strings are parsed as column names.

k

Number of elements to return.

...

These dots are for future extensions and must be empty.

reverse

Consider the k largest elements of the by column(s) instead of the k smallest. Can be a single logical value, or a logical vector of the same length as by to set the behavior per column.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = 1:6,
  b = 6:1,
  c = c("Apple", "Orange", "Apple", "Apple", "Banana", "Banana")
)

# Get the bottom 2 rows by column a or b:
df$select(
  pl$all()$bottom_k_by("a", 2)$name$suffix("_btm_by_a"),
  pl$all()$bottom_k_by("b", 2)$name$suffix("_btm_by_b")
)

# Get the bottom 2 rows by multiple columns with given order.
df$select(
  pl$all()$
    bottom_k_by(c("c", "a"), 2, reverse = c(FALSE, TRUE))$
    name$suffix("_btm_by_ca"),
  pl$all()$
    bottom_k_by(c("c", "b"), 2, reverse = c(FALSE, TRUE))$
    name$suffix("_btm_by_cb"),
)

# Get the bottom 2 rows by column a in each group
df$group_by("c", maintain_order = TRUE)$agg(
  pl$all()$bottom_k_by("a", 2)
)$explode(pl$all()$exclude("c"))

Cast between DataType

Description

Cast between DataType

Usage

expr__cast(dtype, ..., strict = TRUE, wrap_numerical = FALSE)

Arguments

dtype

DataType to cast to.

...

These dots are for future extensions and must be empty.

strict

If TRUE (default), an error is thrown if the cast fails at resolve time.

wrap_numerical

If TRUE, numeric casts wrap overflowing values instead of marking the cast as invalid.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3, b = c(1, 2, 3))
df$with_columns(
  pl$col("a")$cast(pl$dtypes$Float64),
  pl$col("b")$cast(pl$dtypes$Int32)
)

# strict FALSE, inserts null for any cast failure
pl$lit(c(100, 200, 300))$cast(pl$dtypes$UInt8, strict = FALSE)$to_series()

# strict TRUE, raises any failure as an error when the query is executed.
tryCatch(
  {
    pl$lit("a")$cast(pl$dtypes$Float64, strict = TRUE)$to_series()
  },
  error = function(e) e
)

Compute cube root

Description

Compute cube root

Usage

expr__cbrt()

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4))$
  with_columns(cbrt = pl$col("a")$cbrt())

Rounds up to the nearest integer value

Description

This only works on floating point Series.

Usage

expr__ceil()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(0.3, 0.5, 1.0, 1.1))
df$with_columns(
  ceil = pl$col("a")$ceil()
)

Set values outside the given boundaries to the boundary value

Description

This method only works for numeric and temporal columns. To clip other data types, consider writing a when-then-otherwise expression.

Usage

expr__clip(lower_bound = NULL, upper_bound = NULL)

Arguments

lower_bound

Lower bound. Accepts expression input. Non-expression inputs are parsed as literals.

upper_bound

Upper bound. Accepts expression input. Non-expression inputs are parsed as literals.

Details

This method only works for numeric and temporal columns. To clip other data types, consider writing a when-then-otherwise expression.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(-50, 5, 50, NA))

# Specifying both a lower and upper bound:
df$with_columns(
  clip = pl$col("a")$clip(1, 10)
)

# Specifying only a single bound:
df$with_columns(
  clip = pl$col("a")$clip(upper_bound = 10)
)

Compute cosine

Description

Compute cosine

Usage

expr__cos()

Value

A polars expression

Examples

pl$DataFrame(a = c(0, pi / 2, pi, NA))$
  with_columns(cosine = pl$col("a")$cos())

Compute hyperbolic cosine

Description

Compute hyperbolic cosine

Usage

expr__cosh()

Value

A polars expression

Examples

pl$DataFrame(a = c(-1, acosh(2), 0, 1, NA))$
  with_columns(cosh = pl$col("a")$cosh())

Compute cotangent

Description

Compute cotangent

Usage

expr__cot()

Value

A polars expression

Examples

pl$DataFrame(a = c(0, pi / 2, -5, NA))$
  with_columns(cotangent = pl$col("a")$cot())

Get the number of non-null elements in the column

Description

Get the number of non-null elements in the column

Usage

expr__count()

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3, b = c(NA, 4, 4))
df$select(pl$all()$count())

Return the cumulative count of the non-null values in the column

Description

Return the cumulative count of the non-null values in the column

Usage

expr__cum_count(..., reverse = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

reverse

If TRUE, reverse the count.

Value

A polars expression

Examples

pl$DataFrame(a = 1:4)$with_columns(
  cum_count = pl$col("a")$cum_count(),
  cum_count_reversed = pl$col("a")$cum_count(reverse = TRUE)
)

Return the cumulative max computed at every element.

Description

Return the cumulative max computed at every element.

Usage

expr__cum_max(..., reverse = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

reverse

If TRUE, start from the last value.

Details

The Dtypes Int8, UInt8, Int16 and UInt16 are cast to Int64 before computing the cumulative max to prevent overflow issues.

Value

A polars expression

Examples

pl$DataFrame(a = c(1:4, 2L))$with_columns(
  cum_max = pl$col("a")$cum_max(),
  cum_max_reversed = pl$col("a")$cum_max(reverse = TRUE)
)

Return the cumulative min computed at every element.

Description

Return the cumulative min computed at every element.

Usage

expr__cum_min(..., reverse = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

reverse

If TRUE, start from the last value.

Details

The Dtypes Int8, UInt8, Int16 and UInt16 are cast to Int64 before computing the cumulative min to prevent overflow issues.

Value

A polars expression

Examples

pl$DataFrame(a = c(1:4, 2L))$with_columns(
  cum_min = pl$col("a")$cum_min(),
  cum_min_reversed = pl$col("a")$cum_min(reverse = TRUE)
)

Return the cumulative product computed at every element.

Description

Return the cumulative product computed at every element.

Usage

expr__cum_prod(..., reverse = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

reverse

If TRUE, start with the total product of elements and divide each row one by one.

Details

The Dtypes Int8, UInt8, Int16 and UInt16 are cast to Int64 before multiplying to prevent overflow issues.

Value

A polars expression

Examples

pl$DataFrame(a = 1:4)$with_columns(
  cum_prod = pl$col("a")$cum_prod(),
  cum_prod_reversed = pl$col("a")$cum_prod(reverse = TRUE)
)

Return the cumulative sum computed at every element.

Description

Return the cumulative sum computed at every element.

Usage

expr__cum_sum(..., reverse = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

reverse

If TRUE, start with the total sum of elements and subtract each row one by one.

Details

The Dtypes Int8, UInt8, Int16 and UInt16 are cast to Int64 before summing to prevent overflow issues.

Value

A polars expression

Examples

pl$DataFrame(a = 1:4)$with_columns(
  cum_sum = pl$col("a")$cum_sum(),
  cum_sum_reversed = pl$col("a")$cum_sum(reverse = TRUE)
)

Run an expression over a sliding window that increases 1 slot every iteration

Description

Run an expression over a sliding window that increases 1 slot every iteration

Usage

expr__cumulative_eval(expr, ..., min_periods = 1, parallel = FALSE)

Arguments

expr

Expression to evaluate.

...

These dots are for future extensions and must be empty.

min_periods

Number of valid values (i.e. length - null_count) there should be in the window before the expression is evaluated.

parallel

Run in parallel. Don’t do this in a group by or another operation that already has much parallelization.

Details

This can be really slow as it can have O(n^2) complexity. Don’t use this for operations that visit all elements.

Value

A polars expression

Examples

df <- pl$DataFrame(values = 1:5)
df$with_columns(
  pl$col("values")$cumulative_eval(
    pl$element()$first() - pl$element()$last()**2
  )
)
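The query above can be mimicked in base R to see what is being computed: the expression is evaluated on each growing prefix of the column. This is a sketch for intuition only, not how polars evaluates it:

```r
# Evaluate first(window) - last(window)^2 on each expanding prefix.
values <- 1:5
res <- sapply(seq_along(values), function(i) {
  window <- values[seq_len(i)]
  window[1] - window[i]^2
})
res
# 0 -3 -8 -15 -24
```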

Bin continuous values into discrete categories

Description

[Experimental]

Usage

expr__cut(
  breaks,
  ...,
  labels = NULL,
  left_closed = FALSE,
  include_breaks = FALSE
)

Arguments

breaks

List of unique cut points.

...

These dots are for future extensions and must be empty.

labels

Names of the categories. The number of labels must be equal to the number of cut points plus one.

left_closed

Set the intervals to be left-closed instead of right-closed.

include_breaks

Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

Value

A polars expression

Examples

# Divide a column into three categories.
df <- pl$DataFrame(foo = -2:2)
df$with_columns(
  cut = pl$col("foo")$cut(c(-1, 1), labels = c("a", "b", "c"))
)

# Add both the category and the breakpoint.
df$with_columns(
  cut = pl$col("foo")$cut(c(-1, 1), include_breaks = TRUE)
)$unnest()

Convert from radians to degrees

Description

Convert from radians to degrees

Usage

expr__degrees()

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4) * pi)$
  with_columns(degrees = pl$col("a")$degrees())

Calculate the n-th discrete difference between elements

Description

Calculate the n-th discrete difference between elements

Usage

expr__diff(n = 1, null_behavior = c("ignore", "drop"))

Arguments

n

Integer indicating the number of slots to shift.

null_behavior

How to handle null values. Must be "ignore" (default), or "drop".

Value

A polars expression

Examples

pl$DataFrame(a = c(20, 10, 30, 25, 35))$with_columns(
  diff_default = pl$col("a")$diff(),
  diff_2_ignore = pl$col("a")$diff(2, "ignore")
)
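Base R's diff() computes the same differences but shortens the vector; polars instead keeps the original length, filling the first n slots with null. A sketch of the correspondence:

```r
# polars' $diff(2) on a column of length 5 corresponds to
# two leading NAs followed by base R's lagged differences.
a <- c(20, 10, 30, 25, 35)
c(NA, NA, diff(a, lag = 2))
# NA NA 10 15  5
```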

Compute the dot/inner product between two Expressions

Description

Compute the dot/inner product between two Expressions

Usage

expr__dot(other)

Arguments

other

Expression to compute dot product with.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 3, 5), b = c(2, 4, 6))
df$select(pl$col("a")$dot(pl$col("b")))

Drop all floating point NaN values

Description

The original order of the remaining elements is preserved. A NaN value is not the same as a null value. To drop null values, use $drop_nulls().

Usage

expr__drop_nans()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, 3, NaN))
df$select(pl$col("a")$drop_nans())

Drop all null values

Description

The original order of the remaining elements is preserved. A null value is not the same as a NaN value. To drop NaN values, use $drop_nans().

Usage

expr__drop_nulls()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, 3, NaN))
df$select(pl$col("a")$drop_nulls())

Compute entropy

Description

Uses the formula -sum(pk * log(pk)) where pk are discrete probabilities.

Usage

expr__entropy(base = exp(1), ..., normalize = TRUE)

Arguments

base

Numeric value used as base, defaults to exp(1).

...

These dots are for future extensions and must be empty.

normalize

Normalize pk if it doesn’t sum to 1.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$entropy(base = 2))
df$select(pl$col("a")$entropy(base = 2, normalize = FALSE))
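The first result above can be checked by hand in base R; this is a sketch of the formula, assuming normalize rescales the values to probabilities:

```r
# With normalize = TRUE, the values 1:3 are rescaled to probabilities
# before applying -sum(pk * log(pk, base)).
pk <- (1:3) / sum(1:3)
ent <- -sum(pk * log(pk, base = 2))
ent
# approximately 1.459148
```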

Check equality

Description

This propagates null values, i.e. any comparison involving null will return null. Use $eq_missing() to consider null values as equal.

Usage

expr__eq(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

See Also

expr__eq_missing

Examples

df <- pl$DataFrame(x = c(NA, FALSE, TRUE), y = c(TRUE, TRUE, TRUE))
df$with_columns(
  eq = pl$col("x")$eq(pl$col("y")),
  eq_missing = pl$col("x")$eq_missing(pl$col("y"))
)

Check equality without null propagation

Description

This considers that null values are equal. It differs from $eq() where null values are propagated.

Usage

expr__eq_missing(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

See Also

expr__eq

Examples

df <- pl$DataFrame(x = c(NA, FALSE, TRUE), y = c(TRUE, TRUE, TRUE))
df$with_columns(
  eq = pl$col("x")$eq(pl$col("y")),
  eq_missing = pl$col("x")$eq_missing(pl$col("y"))
)

Compute exponentially-weighted moving mean

Description

Compute exponentially-weighted moving mean

Usage

expr__ewm_mean(
  ...,
  com,
  span,
  half_life,
  alpha,
  adjust = TRUE,
  min_periods = 1,
  ignore_nulls = FALSE
)

Arguments

...

These dots are for future extensions and must be empty.

com

Specify decay in terms of center of mass, \gamma, with

\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0

span

Specify decay in terms of span, \theta, with

\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1

half_life

Specify decay in terms of half-life, \lambda, with

\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0

alpha

Specify smoothing factor alpha directly, 0 < \alpha \leq 1.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings:

  • when TRUE (default), the EW function is calculated using weights w_i = (1 - \alpha)^i;

  • when FALSE, the EW function is calculated recursively by

    y_0 = x_0

    y_t = (1 - \alpha) y_{t - 1} + \alpha x_t

min_periods

Minimum number of non-null values in the window required to compute a result; otherwise the result is null.

ignore_nulls

Ignore missing values when calculating weights.

  • when FALSE (default), weights are based on absolute positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are (1 - \alpha)^2 and 1 if adjust = TRUE, and (1 - \alpha)^2 and \alpha if adjust = FALSE.

  • when TRUE, weights are based on relative positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are 1 - \alpha and 1 if adjust = TRUE, and 1 - \alpha and \alpha if adjust = FALSE.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$ewm_mean(com = 1, ignore_nulls = FALSE))
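The adjust bullet points above can be made concrete in base R. For com = 1 (so alpha = 0.5) the two weighting schemes are as follows; this is an illustrative sketch, not the package's implementation:

```r
x <- c(1, 2, 3)
alpha <- 0.5  # com = 1 gives alpha = 1 / (1 + com)

# adjust = TRUE: explicit weights (1 - alpha)^i, where the oldest value
# gets the largest exponent, normalized by the sum of the weights.
ewm_adjust <- sapply(seq_along(x), function(t) {
  w <- (1 - alpha)^((t - 1):0)
  sum(w * x[1:t]) / sum(w)
})

# adjust = FALSE: the recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t
ewm_recursive <- Reduce(
  function(y, xt) (1 - alpha) * y + alpha * xt,
  x, accumulate = TRUE
)

ewm_adjust    # 1.000000 1.666667 2.428571
ewm_recursive # 1.00 1.50 2.25
```

The adjust = TRUE values should match the $ewm_mean(com = 1) example above.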

Compute time-based exponentially weighted moving average

Description

Given observations x_0, x_1, \ldots, x_{n-1} at times t_0, t_1, \ldots, t_{n-1}, the EWMA is calculated as

y_0 = x_0

\alpha_i = 1 - \exp \left\{ \frac{ -\ln(2)(t_i - t_{i-1}) }{ \tau } \right\}

y_i = \alpha_i x_i + (1 - \alpha_i) y_{i-1}; \quad i > 0

where \tau is the half_life.

Usage

expr__ewm_mean_by(by, ..., half_life)

Arguments

by

Times to calculate average by. Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type.

half_life

Unit over which observation decays to half its value. Can be created either from a timedelta, or by using the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

Value

A polars expression

Examples

df <- pl$DataFrame(
  values = c(0, 1, 2, NA, 4),
  times = as.Date(
    c("2020-01-01", "2020-01-03", "2020-01-10", "2020-01-15", "2020-01-17")
  )
)
df$with_columns(
  result = pl$col("values")$ewm_mean_by("times", half_life = "4d")
)

Compute exponentially-weighted moving standard deviation

Description

Compute exponentially-weighted moving standard deviation

Usage

expr__ewm_std(
  ...,
  com,
  span,
  half_life,
  alpha,
  adjust = TRUE,
  bias = FALSE,
  min_periods = 1,
  ignore_nulls = FALSE
)

Arguments

...

These dots are for future extensions and must be empty.

com

Specify decay in terms of center of mass, \gamma, with

\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0

span

Specify decay in terms of span, \theta, with

\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1

half_life

Specify decay in terms of half-life, \lambda, with

\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0

alpha

Specify smoothing factor alpha directly, 0 < \alpha \leq 1.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings:

  • when TRUE (default), the EW function is calculated using weights w_i = (1 - \alpha)^i;

  • when FALSE, the EW function is calculated recursively by

    y_0 = x_0

    y_t = (1 - \alpha) y_{t - 1} + \alpha x_t

bias

If FALSE (default), apply a correction to make the estimate statistically unbiased.

min_periods

Minimum number of non-null values in the window required to compute a result; otherwise the result is null.

ignore_nulls

Ignore missing values when calculating weights.

  • when FALSE (default), weights are based on absolute positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are (1 - \alpha)^2 and 1 if adjust = TRUE, and (1 - \alpha)^2 and \alpha if adjust = FALSE.

  • when TRUE, weights are based on relative positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are 1 - \alpha and 1 if adjust = TRUE, and 1 - \alpha and \alpha if adjust = FALSE.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$ewm_std(com = 1, ignore_nulls = FALSE))

Compute exponentially-weighted moving variance

Description

Compute exponentially-weighted moving variance

Usage

expr__ewm_var(
  ...,
  com,
  span,
  half_life,
  alpha,
  adjust = TRUE,
  bias = FALSE,
  min_periods = 1,
  ignore_nulls = FALSE
)

Arguments

...

These dots are for future extensions and must be empty.

com

Specify decay in terms of center of mass, \gamma, with

\alpha = \frac{1}{1 + \gamma} \; \forall \; \gamma \geq 0

span

Specify decay in terms of span, \theta, with

\alpha = \frac{2}{\theta + 1} \; \forall \; \theta \geq 1

half_life

Specify decay in terms of half-life, \lambda, with

\alpha = 1 - \exp \left\{ \frac{ -\ln(2) }{ \lambda } \right\} \; \forall \; \lambda > 0

alpha

Specify smoothing factor alpha directly, 0 < \alpha \leq 1.

adjust

Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings:

  • when TRUE (default), the EW function is calculated using weights w_i = (1 - \alpha)^i;

  • when FALSE, the EW function is calculated recursively by

    y_0 = x_0

    y_t = (1 - \alpha) y_{t - 1} + \alpha x_t

bias

If FALSE (default), apply a correction to make the estimate statistically unbiased.

min_periods

Minimum number of non-null values in the window required to compute a result; otherwise the result is null.

ignore_nulls

Ignore missing values when calculating weights.

  • when FALSE (default), weights are based on absolute positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are (1 - \alpha)^2 and 1 if adjust = TRUE, and (1 - \alpha)^2 and \alpha if adjust = FALSE.

  • when TRUE, weights are based on relative positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of (x_0, null, x_2) are 1 - \alpha and 1 if adjust = TRUE, and 1 - \alpha and \alpha if adjust = FALSE.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$ewm_var(com = 1, ignore_nulls = FALSE))

Exclude columns from a multi-column expression.

Description

Exclude columns from a multi-column expression.

Usage

expr__exclude(...)

Arguments

...

The name or datatype of the column(s) to exclude. Accepts regular expression input. Regular expressions should start with ^ and end with $.

Value

A polars expression

Examples

df <- pl$DataFrame(aa = 1:2, ba = c("a", NA), cc = c(NA, 2.5))
df

# Exclude by column name(s):
df$select(pl$all()$exclude("ba"))

# Exclude by regex, e.g. removing all columns whose names end with the
# letter "a":
df$select(pl$all()$exclude("^.*a$"))

# Exclude by dtype(s), e.g. removing all columns of type Int64 or Float64:
df$select(pl$all()$exclude(pl$Int64, pl$Float64))

Compute the exponential

Description

Compute the exponential

Usage

expr__exp()

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4))$
  with_columns(exp = pl$col("a")$exp())

Explode a list expression

Description

This means that every item is expanded to a new row.

Usage

expr__explode()

Value

A polars expression

Examples

df <- pl$DataFrame(
  groups = c("a", "b"),
  values = list(1:2, 3:4)
)

df$select(pl$col("values")$explode())

Extend the Series with n copies of a value

Description

Extend the Series with n copies of a value

Usage

expr__extend_constant(value, n)

Arguments

value

A constant literal value or a unit expression with which to extend the expression result Series. This can be NA to extend with nulls.

n

The number of additional values that will be added.

Value

A polars expression

Examples

df <- pl$DataFrame(values = 1:3)
df$select(pl$col("values")$extend_constant(99, n = 2))

Fill floating point NaN value with a fill value

Description

Fill floating point NaN value with a fill value

Usage

expr__fill_nan(value)

Arguments

value

Value used to fill NaN values.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, 2, NaN))
df$with_columns(
  filled_nan = pl$col("a")$fill_nan(99)
)

Fill null values using the specified value or strategy

Description

Fill null values using the specified value or strategy

Usage

expr__fill_null(value, strategy = NULL, limit = NULL)

Arguments

value

Value used to fill null values. Can be missing if strategy is specified. Accepts expression input, strings are parsed as column names.

strategy

Strategy used to fill null values. Must be one of "forward", "backward", "min", "max", "mean", "zero", "one".

limit

Number of consecutive null values to fill when using the "forward" or "backward" strategy.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, 2, NaN))
df$with_columns(
  filled_null_zero = pl$col("a")$fill_null(strategy = "zero"),
  filled_null_99 = pl$col("a")$fill_null(99),
  filled_null_forward = pl$col("a")$fill_null(strategy = "forward"),
  filled_null_expr = pl$col("a")$fill_null(pl$col("a")$median())
)

Filter the expression based on one or more predicate expressions

Description

Elements where the filter does not evaluate to TRUE are discarded, including nulls. This is mostly useful in an aggregation context. If you want to filter on a DataFrame level, use DataFrame$filter() or LazyFrame$filter().

Usage

expr__filter(...)

Arguments

...

<dynamic-dots> Expression(s) that evaluate to a boolean Series.

Value

A polars expression

Examples

df <- pl$DataFrame(
  group_col = c("g1", "g1", "g2"),
  b = c(1, 2, 3)
)
df

df$group_by("group_col")$agg(
  lt = pl$col("b")$filter(pl$col("b") < 2),
  gte = pl$col("b")$filter(pl$col("b") >= 2)
)

Get the first value

Description

Get the first value

Usage

expr__first()

Value

A polars expression

Examples

pl$DataFrame(x = 3:1)$with_columns(first = pl$col("x")$first())

Flatten a list or string column

Description

This is an alias for $explode().

Usage

expr__flatten()

Value

A polars expression

Examples

df <- pl$DataFrame(
  group = c("a", "b", "b"),
  values = list(1:2, 2:3, 4)
)

df$group_by("group")$agg(pl$col("values")$flatten())

Rounds down to the nearest integer value

Description

This only works on floating point Series.

Usage

expr__floor()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(0.3, 0.5, 1.0, 1.1))
df$with_columns(
  floor = pl$col("a")$floor()
)

Floor divide using two expressions

Description

Method equivalent of the floor division operator expr %/% other. $floordiv() is an alias for $floor_div(), which exists for compatibility with Python Polars.

Usage

expr__floor_div(other)

expr__floordiv(other)

Arguments

other

Numeric literal or expression value.

Value

A polars expression

Examples

df <- pl$DataFrame(x = 1:5)

df$with_columns(
  `x/2` = pl$col("x")$true_div(2),
  `x%/%2` = pl$col("x")$floor_div(2)
)

Fill missing values with the last non-null value

Description

Fill missing values with the last non-null value

Usage

expr__forward_fill(limit = NULL)

Arguments

limit

The number of consecutive null values to forward fill.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA),
  b = c(4, NA, 6),
  c = c(2, NA, NA)
)
df$select(pl$all()$forward_fill())
df$select(pl$all()$forward_fill(limit = 1))

Take values by index

Description

Take values by index

Usage

expr__gather(indices)

Arguments

indices

An expression that leads to a UInt32 dtyped Series.

Value

A polars expression

Examples

df <- pl$DataFrame(
  group = c("one", "one", "one", "two", "two", "two"),
  value = c(1, 98, 2, 3, 99, 4)
)
df$group_by("group", maintain_order = TRUE)$agg(
  pl$col("value")$gather(c(2, 1))
)

Take every n-th value in the Series and return as a new Series

Description

Take every n-th value in the Series and return as a new Series

Usage

expr__gather_every(n, offset = 0)

Arguments

n

Gather every n-th row.

offset

Starting index.

Value

A polars expression

Examples

df <- pl$DataFrame(foo = 1:9)
df$select(pl$col("foo")$gather_every(3))
df$select(pl$col("foo")$gather_every(3, offset = 1))
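The same selection can be expressed with base R indexing, remembering that polars' offset is 0-based; a sketch for intuition:

```r
foo <- 1:9
foo[seq(1, length(foo), by = 3)]  # like gather_every(3):             1 4 7
foo[seq(2, length(foo), by = 3)]  # like gather_every(3, offset = 1): 2 5 8
```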

Check greater or equal inequality

Description

Check greater or equal inequality

Usage

expr__ge(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

Examples

df <- pl$DataFrame(x = 1:3)
df$with_columns(
  with_ge = pl$col("x")$ge(pl$lit(2)),
  with_symbol = pl$col("x") >= pl$lit(2)
)

Return a single value by index

Description

Return a single value by index

Usage

expr__get(index)

Arguments

index

An expression that leads to a UInt32 dtyped Series.

Value

A polars expression

Examples

df <- pl$DataFrame(
  group = c("one", "one", "one", "two", "two", "two"),
  value = c(1, 98, 2, 3, 99, 4)
)
df$group_by("group", maintain_order = TRUE)$agg(
  pl$col("value")$get(1)
)

Check strictly greater inequality

Description

Check strictly greater inequality

Usage

expr__gt(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

Examples

df <- pl$DataFrame(x = 1:3)
df$with_columns(
  with_gt = pl$col("x")$gt(pl$lit(2)),
  with_symbol = pl$col("x") > pl$lit(2)
)

Check whether the expression contains one or more null values

Description

Check whether the expression contains one or more null values

Usage

expr__has_nulls()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(NA, 1, NA),
  b = c(10, NA, 300),
  c = c(350, 650, 850)
)
df$select(pl$all()$has_nulls())

Hash elements

Description

Hash elements

Usage

expr__hash(seed = 0, seed_1 = NULL, seed_2 = NULL, seed_3 = NULL)

Arguments

seed

Integer, random seed parameter. Defaults to 0.

seed_1, seed_2, seed_3

Integer, random seed parameters. Default to seed if not set.

Details

This implementation of hash does not guarantee stable results across different Polars versions. Its stability is only guaranteed within a single version.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 2, NA), b = c("x", NA, "z"))
df$with_columns(pl$all()$hash(10, 20, 30, 40))

Get the first n elements

Description

Get the first n elements

Usage

expr__head(n = 10)

Arguments

n

Number of elements to take.

Value

A polars expression

Examples

pl$DataFrame(x = 1:11)$select(pl$col("x")$head(3))

Bin values into buckets and count their occurrences

Description

[Experimental]

Usage

expr__hist(
  bins = NULL,
  ...,
  bin_count = NULL,
  include_category = FALSE,
  include_breakpoint = FALSE
)

Arguments

bins

Discretizations to make. If NULL (default), we determine the boundaries based on the data.

...

These dots are for future extensions and must be empty.

bin_count

If no bins provided, this will be used to determine the distance of the bins.

include_category

Include a column that shows the intervals as categories.

include_breakpoint

Include a column that indicates the upper breakpoint.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 3, 8, 8, 2, 1, 3))
df$select(pl$col("a")$hist(bins = 1:3))
df$select(
  pl$col("a")$hist(
    bins = 1:3, include_category = TRUE, include_breakpoint = TRUE
  )
)
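If no explicit bins are passed, bin_count can be used instead; a minimal sketch reusing the data frame above (the boundaries are then derived from the data):

```r
# Let polars derive 4 uniform bins from the range of the data
df$select(pl$col("a")$hist(bin_count = 4))
```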

Aggregate values into a list

Description

Aggregate values into a list

Usage

expr__implode()

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3, b = 4:6)
df$with_columns(pl$col("a")$implode())

Fill null values using interpolation

Description

Fill null values using interpolation

Usage

expr__interpolate(method = c("linear", "nearest"))

Arguments

method

Interpolation method. Must be one of "linear" or "nearest".

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, 3), b = c(1, NaN, 3))
df$with_columns(
  a_interpolated = pl$col("a")$interpolate(),
  b_interpolated = pl$col("b")$interpolate()
)
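The "nearest" method can be contrasted with the default linear interpolation; a hedged sketch with a wider run of nulls:

```r
# "nearest" copies the closest non-null neighbour instead of drawing a
# linear ramp between the endpoints
df <- pl$DataFrame(a = c(1, NA, NA, 10))
df$with_columns(
  nearest = pl$col("a")$interpolate(method = "nearest")
)
```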

Fill null values using interpolation based on another column

Description

Fill null values using interpolation based on another column

Usage

expr__interpolate_by(by)

Arguments

by

Column to interpolate values based on.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, NA, NA, 3), b = c(1, 2, 7, 8))
df$with_columns(
  a_interpolated = pl$col("a")$interpolate_by("b")
)

Check if an expression is between the given lower and upper bounds

Description

Check if an expression is between the given lower and upper bounds

Usage

expr__is_between(
  lower_bound,
  upper_bound,
  closed = c("both", "left", "right", "none")
)

Arguments

lower_bound

Lower bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

upper_bound

Upper bound value. Accepts expression input. Strings are parsed as column names, other non-expression inputs are parsed as literals.

closed

Define which sides of the interval are closed (inclusive). Must be one of "left", "right", "both" or "none".

Details

If the value of the lower_bound is greater than that of the upper_bound then the result will be FALSE, as no value can satisfy the condition.

Value

A polars expression

Examples

df <- pl$DataFrame(num = 1:5)
df$with_columns(
  is_between = pl$col("num")$is_between(2, 4)
)

# Use the closed argument to include or exclude the values at the bounds:
df$with_columns(
  is_between = pl$col("num")$is_between(2, 4, closed = "left")
)

# You can also use strings as well as numeric/temporal values (note: ensure
# that string literals are wrapped with lit so as not to conflate them with
# column names):
df <- pl$DataFrame(a = letters[1:5])
df$with_columns(
  is_between = pl$col("a")$is_between(pl$lit("a"), pl$lit("c"))
)

# Use column expressions as lower/upper bounds, comparing to a literal value:
df <- pl$DataFrame(a = 1:5, b = 5:1)
df$with_columns(
  between_ab = pl$lit(3)$is_between(pl$col("a"), pl$col("b"))
)
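As noted in Details, inverted bounds cannot be satisfied; a minimal sketch:

```r
# lower_bound (4) is greater than upper_bound (2), so every row is FALSE
df <- pl$DataFrame(num = 1:5)
df$with_columns(
  impossible = pl$col("num")$is_between(4, 2)
)
```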

Return a boolean mask indicating duplicated values

Description

Return a boolean mask indicating duplicated values

Usage

expr__is_duplicated()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 3, 2))
df$select(pl$col("a")$is_duplicated())

Check if elements are finite

Description

Check if elements are finite

Usage

expr__is_finite()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 2), b = c(3, Inf))
df$with_columns(
  a_finite = pl$col("a")$is_finite(),
  b_finite = pl$col("b")$is_finite()
)

Return a boolean mask indicating the first occurrence of each distinct value

Description

Return a boolean mask indicating the first occurrence of each distinct value

Usage

expr__is_first_distinct()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 3, 2))
df$with_columns(
  is_first_distinct = pl$col("a")$is_first_distinct()
)

Check if elements of an expression are present in another expression

Description

Check if elements of an expression are present in another expression

Usage

expr__is_in(other)

Arguments

other

Accepts expression input. Strings are parsed as column names.

Value

A polars expression

Examples

df <- pl$DataFrame(
  sets = list(1:3, 1:2, 9:10),
  optional_members = 1:3
)
df$with_columns(
  contains = pl$col("optional_members")$is_in("sets")
)
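Membership in a fixed set of values can be tested by wrapping the vector in pl$lit() so that it is not parsed as column names (a hedged sketch):

```r
# TRUE where the value occurs in the literal set c(1, 3)
df <- pl$DataFrame(a = c(1, 2, 3, 4))
df$with_columns(
  in_set = pl$col("a")$is_in(pl$lit(c(1, 3)))
)
```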

Check if elements are infinite

Description

Check if elements are infinite

Usage

expr__is_infinite()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 2), b = c(3, Inf))
df$with_columns(
  a_infinite = pl$col("a")$is_infinite(),
  b_infinite = pl$col("b")$is_infinite()
)

Return a boolean mask indicating the last occurrence of each distinct value

Description

Return a boolean mask indicating the last occurrence of each distinct value

Usage

expr__is_last_distinct()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 3, 2))
df$with_columns(
  is_last_distinct = pl$col("a")$is_last_distinct()
)

Check if elements are NaN

Description

Floating point NaN (Not A Number) should not be confused with missing data represented as NA (in R) or null (in Polars).

Usage

expr__is_nan()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA, 1, 5),
  b = c(1, 2, NaN, 1, 5)
)
df$with_columns(
  a_nan = pl$col("a")$is_nan(),
  b_nan = pl$col("b")$is_nan()
)

Check if elements are not NaN

Description

Floating point NaN (Not A Number) should not be confused with missing data represented as NA (in R) or null (in Polars).

Usage

expr__is_not_nan()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA, 1, 5),
  b = c(1, 2, NaN, 1, 5)
)
df$with_columns(
  a_not_nan = pl$col("a")$is_not_nan(),
  b_not_nan = pl$col("b")$is_not_nan()
)

Check if elements are not NULL

Description

Check if elements are not NULL

Usage

expr__is_not_null()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA, 1, 5),
  b = c(1, 2, NaN, 1, 5)
)
df$with_columns(
  a_not_null = pl$col("a")$is_not_null(),
  b_not_null = pl$col("b")$is_not_null()
)

Check if elements are NULL

Description

Check if elements are NULL

Usage

expr__is_null()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, NA, 1, 5),
  b = c(1, 2, NaN, 1, 5)
)
df$with_columns(
  a_null = pl$col("a")$is_null(),
  b_null = pl$col("b")$is_null()
)

Return a boolean mask indicating unique values

Description

Return a boolean mask indicating unique values

Usage

expr__is_unique()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 3, 2))
df$select(pl$col("a")$is_unique())

Compute the kurtosis (Fisher or Pearson)

Description

Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution. If bias is FALSE then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.

Usage

expr__kurtosis(..., fisher = TRUE, bias = TRUE)

Arguments

...

These dots are for future extensions and must be empty.

fisher

If TRUE (default), Fisher’s definition is used (normal ==> 0.0). If FALSE, Pearson’s definition is used (normal ==> 3.0).

bias

If FALSE, the calculations are corrected for statistical bias.

Value

A polars expression

Examples

df <- pl$DataFrame(x = c(1, 2, 3, 2, 1))
df$select(pl$col("x")$kurtosis())
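The fisher and bias switches can be compared side by side on the same column; a sketch reusing df from above:

```r
# Fisher's definition subtracts 3 (normal ==> 0); Pearson's does not
# (normal ==> 3); bias = FALSE applies the k-statistics correction
df$select(
  fisher = pl$col("x")$kurtosis(),
  pearson = pl$col("x")$kurtosis(fisher = FALSE),
  unbiased = pl$col("x")$kurtosis(bias = FALSE)
)
```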

Get the last value

Description

Get the last value

Usage

expr__last()

Value

A polars expression

Examples

pl$DataFrame(x = 3:1)$with_columns(last = pl$col("x")$last())

Check lower or equal inequality

Description

Check lower or equal inequality

Usage

expr__le(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

Examples

df <- pl$DataFrame(x = 1:3)
df$with_columns(
  with_le = pl$col("x")$le(pl$lit(2)),
  with_symbol = pl$col("x") <= pl$lit(2)
)

Return the number of elements in the column

Description

Null values are counted in the total.

Usage

expr__len()

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3, b = c(NA, 4, 4))
df$select(pl$all()$len())

Get the first n rows

Description

This is an alias for $head().

Usage

expr__limit(n = 10)

Arguments

n

Number of rows to return.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:9)
df$select(pl$col("a")$limit(3))

Compute the logarithm

Description

Compute the logarithm

Usage

expr__log(base = exp(1))

Arguments

base

Numeric value used as base, defaults to exp(1).

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4))$
  with_columns(
  log = pl$col("a")$log(),
  log_base_2 = pl$col("a")$log(base = 2)
)

Compute the base-10 logarithm

Description

Compute the base-10 logarithm

Usage

expr__log10()

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4))$
  with_columns(log10 = pl$col("a")$log10())

Compute the natural logarithm plus one

Description

This computes log(1 + x) but is more numerically stable for x close to zero.

Usage

expr__log1p()

Value

A polars expression

Examples

pl$DataFrame(a = c(1, 2, 4))$
  with_columns(log1p = pl$col("a")$log1p())

Calculate the lower bound

Description

Returns a unit Series with the lowest value possible for the dtype of this expression.

Usage

expr__lower_bound()

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$lower_bound())

Check strictly lower inequality

Description

Check strictly lower inequality

Usage

expr__lt(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

Examples

df <- pl$DataFrame(x = 1:3)
df$with_columns(
  with_lt = pl$col("x")$lt(pl$lit(2)),
  with_symbol = pl$col("x") < pl$lit(2)
)

Get the maximum value

Description

Get the maximum value

Usage

expr__max()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, NaN, 3))$
  with_columns(max = pl$col("x")$max())

Get mean value

Description

Get mean value

Usage

expr__mean()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, 3, 4, NA))$
  with_columns(mean = pl$col("x")$mean())

Get median value

Description

Get median value

Usage

expr__median()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, 3, 4, NA))$
  with_columns(median = pl$col("x")$median())

Get the minimum value

Description

Get the minimum value

Usage

expr__min()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, NaN, 3))$
  with_columns(min = pl$col("x")$min())

Modulo using two expressions

Description

Method equivalent of modulus operator expr %% other.

Usage

expr__mod(other)

Arguments

other

Numeric literal or expression value.

Value

A polars expression

See Also

  • Arithmetic operators

Examples

df <- pl$DataFrame(x = -5L:5L)

df$with_columns(
  `x%%2` = pl$col("x")$mod(2)
)

Compute the most occurring value(s)

Description

Compute the most occurring value(s)

Usage

expr__mode()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 3), b = c(1, 1, 2, 2))
df$select(pl$col("a")$mode())
df$select(pl$col("b")$mode())

Multiply two expressions

Description

Method equivalent of multiplication operator expr * other.

Usage

expr__mul(other)

Arguments

other

Numeric literal or expression value.

Value

A polars expression

See Also

  • Arithmetic operators

Examples

df <- pl$DataFrame(x = c(1, 2, 4, 8, 16))

df$with_columns(
  `x*2` = pl$col("x")$mul(2),
  `x * xlog2` = pl$col("x")$mul(pl$col("x")$log(2))
)

Count unique values

Description

null is considered to be a unique value for the purposes of this operation.

Usage

expr__n_unique()

Value

A polars expression

Examples

df <- pl$DataFrame(
  x = c(1, 1, 2, 2, 3),
  y = c(1, 1, 1, NA, NA)
)
df$select(
  x_unique = pl$col("x")$n_unique(),
  y_unique = pl$col("y")$n_unique()
)

Get the maximum value with NaN

Description

Get the maximum value, but propagate NaN: if the column contains any NaN values, the result is NaN.

Usage

expr__nan_max()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, NA, 3, NaN, Inf))$
  with_columns(nan_max = pl$col("x")$nan_max())

Get the minimum value with NaN

Description

Get the minimum value, but propagate NaN: if the column contains any NaN values, the result is NaN.

Usage

expr__nan_min()

Value

A polars expression

Examples

pl$DataFrame(x = c(1, NA, 3, NaN, Inf))$
  with_columns(nan_min = pl$col("x")$nan_min())

Check inequality

Description

This propagates null values, i.e. any comparison involving null will return null. Use $ne_missing() to consider null values as equal.

Usage

expr__ne(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

See Also

expr__ne_missing

Examples

df <- pl$DataFrame(x = c(NA, FALSE, TRUE), y = c(TRUE, TRUE, TRUE))
df$with_columns(
  ne = pl$col("x")$ne(pl$col("y")),
  ne_missing = pl$col("x")$ne_missing(pl$col("y"))
)

Check inequality without null propagation

Description

This considers null values as equal, so comparisons involving null return TRUE or FALSE instead of null. Use $ne() to propagate null values instead.

Usage

expr__ne_missing(other)

Arguments

other

A literal or expression value to compare with.

Value

A polars expression

See Also

expr__ne

Examples

df <- pl$DataFrame(x = c(NA, FALSE, TRUE), y = c(TRUE, TRUE, TRUE))
df$with_columns(
  ne = pl$col("x")$ne("y"),
  ne_missing = pl$col("x")$ne_missing("y")
)

Negate a boolean expression

Description

Negate a boolean expression

Usage

expr__not()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(TRUE, FALSE, FALSE, NA))

df$with_columns(a_not = pl$col("a")$not())

# Same result with "!"
df$with_columns(a_not = !pl$col("a"))

Count null values

Description

Count null values

Usage

expr__null_count()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(NA, 1, NA),
  b = c(10, NA, 300),
  c = c(1, 2, 2)
)
df$select(pl$all()$null_count())

Apply logical OR on two expressions

Description

Combine two boolean expressions with OR.

Usage

expr__or(other)

Arguments

other

A boolean literal or expression value to combine using OR.

Value

A polars expression

Examples

pl$lit(TRUE) | FALSE
pl$lit(TRUE)$or(pl$lit(TRUE))

Compute expressions over the given groups

Description

This expression is similar to performing a group by aggregation and joining the result back into the original DataFrame. The outcome is similar to how window functions work in PostgreSQL.

Usage

expr__over(
  ...,
  order_by = NULL,
  mapping_strategy = c("group_to_rows", "join", "explode")
)

Arguments

...

⁠<dynamic-dots>⁠ Column(s) to group by. Accepts expression input. Characters are parsed as column names.

order_by

Order the window functions/aggregations with the partitioned groups by the result of the expression passed to order_by. Accepts expression input. Strings are parsed as column names.

mapping_strategy

One of the following:

  • "group_to_rows" (default): if the aggregation results in multiple values, assign them back to their position in the DataFrame. This can only be done if the group yields the same elements before aggregation as after.

  • "join": join the groups as ⁠List<group_dtype>⁠ to the row positions. Note that this can be memory intensive.

  • "explode": don’t do any mapping, but simply flatten the group. This only makes sense if the input data is sorted.

Value

A polars expression

Examples

# Pass the name of a column to compute the expression over that column.
df <- pl$DataFrame(
  a = c("a", "a", "b", "b", "b"),
  b = c(1, 2, 3, 5, 3),
  c = c(5, 4, 2, 1, 3)
)

df$with_columns(
  pl$col("c")$max()$over("a")$name$suffix("_max")
)

# Expression input is supported.
df$with_columns(
  pl$col("c")$max()$over(pl$col("b") %/% 2)$name$suffix("_max")
)

# Group by multiple columns by passing several column names or a list of
# expressions.
df$with_columns(
  pl$col("c")$min()$over("a", "b")$name$suffix("_min")
)

group_vars <- list(pl$col("a"), pl$col("b"))
df$with_columns(
  pl$col("c")$min()$over(!!!group_vars)$name$suffix("_min")
)

# Or use positional arguments to group by multiple columns in the same way.
df$with_columns(
  pl$col("c")$min()$over("a", pl$col("b") %% 2)$name$suffix("_min")
)

# Alternative mapping strategy: join values in a list output
df$with_columns(
  top_2 = pl$col("c")$top_k(2)$over("a", mapping_strategy = "join")
)
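The remaining strategy, "explode", simply flattens the groups; a hedged sketch that sorts by the grouping column first, since this strategy assumes sorted input:

```r
# Flatten the group results back into rows; only sensible on sorted data
df$sort("a")$with_columns(
  c_sorted = pl$col("c")$sort()$over("a", mapping_strategy = "explode")
)
```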

# order_by specifies how values are sorted within a group, which is
# essential when the operation depends on the order of values
df <- pl$DataFrame(
  g = c(1, 1, 1, 1, 2, 2, 2, 2),
  t = c(1, 2, 3, 4, 4, 1, 2, 3),
  x = c(10, 20, 30, 40, 10, 20, 30, 40)
)

# without order_by, the first and second values in the second group would
# be inverted, which would be wrong
df$with_columns(
  x_lag = pl$col("x")$shift(1)$over("g", order_by = "t")
)

Computes percentage change between values

Description

Computes the percentage change (as fraction) between the current element and the most-recent non-null element at least n period(s) before it. By default it computes the change from the previous row.

Usage

expr__pct_change(n = 1)

Arguments

n

Integer or Expr indicating the number of periods to shift for forming percent change.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(10:12, NA, 12))
df$with_columns(
  pct_change = pl$col("a")$pct_change()
)
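The shift can span more than one row; a minimal sketch reusing df from above:

```r
# Compare each value with the one two periods earlier
df$with_columns(
  pct_change_2 = pl$col("a")$pct_change(n = 2)
)
```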

Get a boolean mask of the local maximum peaks

Description

Get a boolean mask of the local maximum peaks

Usage

expr__peak_max()

Value

A polars expression

Examples

df <- pl$DataFrame(x = c(1, 2, 3, 2, 3, 4, 5, 2))
df$with_columns(peak_max = pl$col("x")$peak_max())

Get a boolean mask of the local minimum peaks

Description

Get a boolean mask of the local minimum peaks

Usage

expr__peak_min()

Value

A polars expression

Examples

df <- pl$DataFrame(x = c(1, 2, 3, 2, 3, 4, 5, 2))
df$with_columns(peak_min = pl$col("x")$peak_min())

Exponentiation using two expressions

Description

Method equivalent of exponentiation operator expr ^ exponent.

Usage

expr__pow(exponent)

Arguments

exponent

Numeric literal or expression value.

Value

A polars expression

See Also

  • Arithmetic operators

Examples

df <- pl$DataFrame(x = c(1, 2, 4, 8))

df$with_columns(
  cube = pl$col("x")$pow(3),
  `x^xlog2` = pl$col("x")$pow(pl$col("x")$log(2))
)

Compute the product of an expression.

Description

Compute the product of an expression.

Usage

expr__product()

Value

A polars expression

Examples

pl$DataFrame(a = 1:3, b = c(NA, 4, 4))$
  select(pl$all()$product())

Bin continuous values into discrete categories based on their quantiles

Description

[Experimental]

Usage

expr__qcut(
  quantiles,
  ...,
  labels = NULL,
  left_closed = FALSE,
  allow_duplicates = FALSE,
  include_breaks = FALSE
)

Arguments

quantiles

Either a vector of quantile probabilities between 0 and 1 or a positive integer determining the number of bins with uniform probability.

...

These dots are for future extensions and must be empty.

labels

Names of the categories. The number of labels must be equal to the number of categories.

left_closed

Set the intervals to be left-closed instead of right-closed.

allow_duplicates

If TRUE, duplicates in the resulting quantiles are dropped, rather than raising an error. This can happen even with unique probabilities, depending on the data.

include_breaks

Include a column with the right endpoint of the bin each observation falls in. This will change the data type of the output from a Categorical to a Struct.

Value

A polars expression

Examples

# Divide a column into three categories according to pre-defined quantile
# probabilities.
df <- pl$DataFrame(foo = -2:2)
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), labels = c("a", "b", "c"))
)

# Divide a column into two categories using uniform quantile probabilities.
df$with_columns(
  qcut = pl$col("foo")$qcut(2, labels = c("low", "high"), left_closed = TRUE)
)

# Add both the category and the breakpoint.
df$with_columns(
  qcut = pl$col("foo")$qcut(c(0.25, 0.75), include_breaks = TRUE)
)$unnest()

Get quantile value(s)

Description

Get quantile value(s)

Usage

expr__quantile(
  quantile,
  interpolation = c("nearest", "higher", "lower", "midpoint", "linear")
)

Arguments

quantile

Quantile between 0.0 and 1.0.

interpolation

Interpolation method. Must be one of "nearest", "higher", "lower", "midpoint", "linear".

Value

A polars expression

Examples

df <- pl$DataFrame(a = 0:5)
df$select(pl$col("a")$quantile(0.3))
df$select(pl$col("a")$quantile(0.3, interpolation = "higher"))
df$select(pl$col("a")$quantile(0.3, interpolation = "lower"))
df$select(pl$col("a")$quantile(0.3, interpolation = "midpoint"))
df$select(pl$col("a")$quantile(0.3, interpolation = "linear"))

Convert from degrees to radians

Description

Convert from degrees to radians

Usage

expr__radians()

Value

A polars expression

Examples

pl$DataFrame(a = c(-720, -540, -360, -180, 0, 180, 360, 540, 720))$
  with_columns(radians = pl$col("a")$radians())

Assign ranks to data, dealing with ties appropriately

Description

Assign ranks to data, dealing with ties appropriately

Usage

expr__rank(
  method = c("average", "min", "max", "dense", "ordinal", "random"),
  ...,
  descending = FALSE,
  seed = NULL
)

Arguments

method

The method used to assign ranks to tied elements. Must be one of the following:

  • "average" (default): The average of the ranks that would have been assigned to all the tied values is assigned to each value.

  • "min": The minimum of the ranks that would have been assigned to all the tied values is assigned to each value. (This is also referred to as "competition" ranking.)

  • "max" : The maximum of the ranks that would have been assigned to all the tied values is assigned to each value.

  • "dense": Like 'min', but the rank of the next highest element is assigned the rank immediately after those assigned to the tied elements.

  • "ordinal" : All values are given a distinct rank, corresponding to the order that the values occur in the Series.

  • "random" : Like 'ordinal', but the rank for ties is not dependent on the order that the values occur in the Series.

...

These dots are for future extensions and must be empty.

descending

Rank in descending order.

seed

Integer. Only used if method = "random".

Value

A polars expression

Examples

# Default is to use the "average" method to break ties
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))
df$with_columns(rank = pl$col("a")$rank())

# Ordinal method
df$with_columns(rank = pl$col("a")$rank("ordinal"))

# Use "rank" with "over" to rank within groups:
df <- pl$DataFrame(
  a = c(1, 1, 2, 2, 2),
  b = c(6, 7, 5, 14, 11)
)
df$with_columns(
  rank = pl$col("b")$rank()$over("a")
)
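The remaining tie-breaking methods can be compared on one column (a minimal sketch):

```r
# "min", "max" and "dense" resolve the ties on the repeated 1s and 6s
# differently
df <- pl$DataFrame(a = c(3, 6, 1, 1, 6))
df$with_columns(
  min = pl$col("a")$rank("min"),
  max = pl$col("a")$rank("max"),
  dense = pl$col("a")$rank("dense")
)
```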

Create a single chunk of memory for this Series

Description

Create a single chunk of memory for this Series

Usage

expr__rechunk()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2))

# Create a Series with 3 nulls, append column a then rechunk
df$select(pl$`repeat`(NA, 3)$append(pl$col("a"))$rechunk())

Reinterpret the underlying bits as a signed/unsigned integer

Description

This operation is only allowed for 64-bit integers. For integers with fewer bits, you can safely use the $cast() operation.

Usage

expr__reinterpret(..., signed = TRUE)

Arguments

...

These dots are for future extensions and must be empty.

signed

If TRUE (default), reinterpret as pl$Int64. Otherwise, reinterpret as pl$UInt64.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2))$cast(pl$UInt64)

# Reinterpret the UInt64 column as signed Int64
df$with_columns(
  reinterpreted = pl$col("a")$reinterpret()
)

Repeat the elements in this Series as specified in the given expression

Description

The repeated elements are expanded into a List dtype.

Usage

expr__repeat_by(by)

Arguments

by

Numeric column that determines how often the values will be repeated. The column will be coerced to UInt32. Give this dtype to make the coercion a no-op. Accepts expression input, strings are parsed as column names.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c("x", "y", "z"), n = 1:3)

df$with_columns(
  repeated = pl$col("a")$repeat_by("n")
)

Replace the given values by different values of the same data type.

Description

This allows one to recode values in a column, leaving all other values unchanged. See $replace_strict() to give a default value to all other values and to specify the output datatype.

Usage

expr__replace(old, new)

Arguments

old

Value or vector of values to replace. Accepts expression input. Vectors are parsed as Series, other non-expression inputs are parsed as literals. Also accepts a list of values like list(old = new).

new

Value or vector of values to replace by. Accepts expression input. Vectors are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.

Details

The global string cache must be enabled when replacing categorical values.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 2, 2, 3))

# "old" and "new" can take vectors of length 1 or of same length
df$with_columns(replaced = pl$col("a")$replace(2, 100))
df$with_columns(replaced = pl$col("a")$replace(c(2, 3), c(100, 200)))

# "old" can be a named list where names are values to replace, and values are
# the replacements
mapping <- list(`2` = 100, `3` = 200)
df$with_columns(replaced = pl$col("a")$replace(mapping))

# The original data type is preserved when replacing by values of a
# different data type. Use $replace_strict() to replace and change the
# return data type.
df <- pl$DataFrame(a = c("x", "y", "z"))
mapping <- list(x = 1, y = 2, z = 3)
df$with_columns(replaced = pl$col("a")$replace(mapping))

# "old" and "new" can take Expr
df <- pl$DataFrame(a = c(1, 2, 2, 3), b = c(1.5, 2.5, 5, 1))
df$with_columns(
  replaced = pl$col("a")$replace(
    old = pl$col("a")$max(),
    new = pl$col("b")$sum()
  )
)

Replace all values by different values

Description

This changes all the values in a column, either using a specific replacement or a default one. See $replace() to replace only a subset of values.

Usage

expr__replace_strict(old, new, ..., default = NULL, return_dtype = NULL)

Arguments

old

Value or vector of values to replace. Accepts expression input. Vectors are parsed as Series, other non-expression inputs are parsed as literals. Also accepts a list of values like list(old = new).

new

Value or vector of values to replace by. Accepts expression input. Vectors are parsed as Series, other non-expression inputs are parsed as literals. Length must match the length of old or have length 1.

...

These dots are for future extensions and must be empty.

default

Set values that were not replaced to this value. If NULL (default), an error is raised if any values were not replaced. Accepts expression input. Non-expression inputs are parsed as literals.

return_dtype

The data type of the resulting expression. If NULL (default), the data type is determined automatically based on the other inputs.

Details

The global string cache must be enabled when replacing categorical values.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 2, 2, 3))

# "old" and "new" can take vectors of length 1 or of same length
df$with_columns(replaced = pl$col("a")$replace_strict(2, 100, default = 1))
df$with_columns(
  replaced = pl$col("a")$replace_strict(c(2, 3), c(100, 200), default = 1)
)

# "old" can be a named list where names are values to replace, and values are
# the replacements
mapping <- list(`2` = 100, `3` = 200)
df$with_columns(replaced = pl$col("a")$replace_strict(mapping, default = -1))

# By default, an error is raised if any non-null values were not replaced.
# Specify a default to set all values that were not matched.
tryCatch(
  df$with_columns(replaced = pl$col("a")$replace_strict(mapping)),
  error = function(e) print(e)
)

# one can specify the data type to return instead of automatically
# inferring it
df$with_columns(
  replaced = pl$col("a")$replace_strict(
    mapping, default = 1, return_dtype = pl$Int32
  )
)

# "old", "new", and "default" can take Expr
df <- pl$DataFrame(a = c(1, 2, 2, 3), b = c(1.5, 2.5, 5, 1))
df$with_columns(
  replaced = pl$col("a")$replace_strict(
    old = pl$col("a")$max(),
    new = pl$col("b")$sum(),
    default = pl$col("b")
  )
)

Reshape this Expr to a flat Series or a Series of Lists

Description

Reshape this Expr to a flat Series or a Series of Lists

Usage

expr__reshape(dimensions, nested_type = pl$List)

Arguments

dimensions

An integer vector giving the dimension sizes. If -1 is used in any of the dimensions, that dimension is inferred. More than two dimensions are only supported with the Array nested type.

nested_type

The nested data type to create. List only supports 2 dimensions, whereas Array supports an arbitrary number of dimensions.

Details

If a single dimension is given, results in an expression of the original data type. If multiple dimensions are given, results in an expression of data type List with shape equal to the dimensions.

Value

A polars expression

Examples

df <- pl$DataFrame(foo = 1:9)

df$select(pl$col("foo")$reshape(9))
df$select(pl$col("foo")$reshape(c(3, 3)))

# Use `-1` to infer the other dimension
df$select(pl$col("foo")$reshape(c(-1, 3)))
df$select(pl$col("foo")$reshape(c(3, -1)))

# One can specify more than 2 dimensions by using the Array type
df <- pl$DataFrame(foo = 1:12)
df$select(
  pl$col("foo")$reshape(c(3, 2, 2), nested_type = pl$Array(pl$Float32, 2))
)

Reverse an expression

Description

Reverse an expression

Usage

expr__reverse()

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = 1:5,
  fruits = c("banana", "banana", "apple", "apple", "banana"),
  b = 5:1
)

df$with_columns(
  pl$all()$reverse()$name$suffix("_reverse")
)

Compress the column data using run-length encoding

Description

Run-length encoding (RLE) encodes data by storing each run of identical values as a single value and its length.

Usage

expr__rle()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 1, 2, 1, NA, 1, 3, 3))

df$select(pl$col("a")$rle())$unnest("a")

Get a distinct integer ID for each run of identical values

Description

The ID starts at 0 and increases by one each time the value of the column changes.

Usage

expr__rle_id()

Details

This functionality is especially useful for defining a new group every time a column's value changes, rather than for every distinct value of that column.

Value

A polars expression

Examples

df <- pl$DataFrame(
  a = c(1, 2, 1, 1, 1),
  b = c("x", "x", NA, "y", "y")
)

df$with_columns(
  rle_id_a = pl$col("a")$rle_id(),
  rle_id_ab = pl$struct("a", "b")$rle_id()
)

Create rolling groups based on a temporal or integer column

Description

If you have a time series ⁠<t_0, t_1, ..., t_n>⁠, then by default the windows created will be:

  • ⁠(t_0 - period, t_0]⁠

  • ⁠(t_1 - period, t_1]⁠

  • …

  • ⁠(t_n - period, t_n]⁠

whereas if you pass a non-default offset, then the windows will be:

  • ⁠(t_0 + offset, t_0 + offset + period]⁠

  • ⁠(t_1 + offset, t_1 + offset + period]⁠

  • …

  • ⁠(t_n + offset, t_n + offset + period]⁠

Usage

expr__rolling(index_column, ..., period, offset = NULL, closed = "right")

Arguments

index_column

Character. Name of the column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order. In case of a rolling group by on indices, dtype needs to be one of UInt32, UInt64, Int32, Int64. Note that the first three get cast to Int64, so if performance matters use an Int64 column.

...

These dots are for future extensions and must be empty.

period

Length of the window - must be non-negative.

offset

Offset of the window. Default is -period.

closed

Define which sides of the range are closed (inclusive). One of the following: "right" (default), "both", "left", "none".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

dates <- as.POSIXct(
  c(
    "2020-01-01 13:45:48", "2020-01-01 16:42:13", "2020-01-01 16:45:09",
    "2020-01-02 18:12:48", "2020-01-03 19:45:32", "2020-01-08 23:16:43"
  )
)
df <- pl$DataFrame(dt = dates, a = c(3, 7, 5, 9, 2, 1))

df$with_columns(
  sum_a = pl$col("a")$sum()$rolling(index_column = "dt", period = "2d"),
  min_a = pl$col("a")$min()$rolling(index_column = "dt", period = "2d"),
  max_a = pl$col("a")$max()$rolling(index_column = "dt", period = "2d")
)

Apply a rolling max over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_max(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_max = pl$col("a")$rolling_max(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_max = pl$col("a")$rolling_max(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_max = pl$col("a")$rolling_max(window_size = 3, center = TRUE)
)
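
The effect of min_periods is easiest to see when the column contains missing values. The following sketch (assuming neopolars follows upstream polars semantics, where a window with fewer than min_periods non-null values yields null) contrasts the default with min_periods = 1:

```r
# Hypothetical illustration of `min_periods`; assumes windows with too few
# non-null values produce null, as in upstream polars.
df <- pl$DataFrame(a = c(NA, 2, 3, NA, 5, 6))

df$with_columns(
  # Default: min_periods equals window_size, so any window containing a
  # null (or too few leading values) produces null
  strict = pl$col("a")$rolling_max(window_size = 3),
  # min_periods = 1: a single non-null value in the window is enough
  lenient = pl$col("a")$rolling_max(window_size = 3, min_periods = 1)
)
```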

Apply a rolling max based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_max_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling max with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_max = pl$col("index")$rolling_max_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling max with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_max = pl$col("index")$rolling_max_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)
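
As noted under by, an integer index column uses an "i" suffix in window_size (a number of index values) instead of a temporal duration. A minimal sketch, assuming the "3i" row-index syntax from upstream polars applies here:

```r
# Hypothetical example of index-based windows; the "3i" syntax
# (a window spanning 3 index values) is assumed to match upstream polars.
df <- pl$DataFrame(index = c(0L, 1L, 2L, 5L, 6L), a = c(1, 3, 2, 8, 4))

df$with_columns(
  rolling_max = pl$col("a")$rolling_max_by("index", window_size = "3i")
)
```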

Apply a rolling mean over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_mean(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_mean = pl$col("a")$rolling_mean(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_mean = pl$col("a")$rolling_mean(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_mean = pl$col("a")$rolling_mean(window_size = 3, center = TRUE)
)

Apply a rolling mean based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_mean_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling mean with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_mean = pl$col("index")$rolling_mean_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling mean with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_mean = pl$col("index")$rolling_mean_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Apply a rolling median over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_median(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_median = pl$col("a")$rolling_median(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_median = pl$col("a")$rolling_median(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_median = pl$col("a")$rolling_median(window_size = 3, center = TRUE)
)

Apply a rolling median based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_median_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling median with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_median = pl$col("index")$rolling_median_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling median with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_median = pl$col("index")$rolling_median_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Apply a rolling min over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_min(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_min = pl$col("a")$rolling_min(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_min = pl$col("a")$rolling_min(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_min = pl$col("a")$rolling_min(window_size = 3, center = TRUE)
)

Apply a rolling min based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_min_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling min with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_min = pl$col("index")$rolling_min_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling min with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_min = pl$col("index")$rolling_min_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Apply a rolling quantile over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_quantile(
  quantile,
  interpolation = c("nearest", "higher", "lower", "midpoint", "linear"),
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

quantile

Quantile between 0.0 and 1.0.

interpolation

Interpolation method. Must be one of "nearest", "higher", "lower", "midpoint", "linear".

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_quantile = pl$col("a")$rolling_quantile(
    quantile = 0.25, window_size = 4
  )
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_quantile = pl$col("a")$rolling_quantile(
    quantile = 0.25, window_size = 4, weights = c(0.2, 0.4, 0.4, 0.2)
  )
)

# Specify weights and interpolation method:
df$with_columns(
  rolling_quantile = pl$col("a")$rolling_quantile(
    quantile = 0.25, window_size = 4, weights = c(0.2, 0.4, 0.4, 0.2),
    interpolation = "linear"
  )
)

# Center the values in the window
df$with_columns(
  rolling_quantile = pl$col("a")$rolling_quantile(
    quantile = 0.25, window_size = 5, center = TRUE
  )
)

Apply a rolling quantile based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_quantile_by(
  by,
  window_size,
  ...,
  quantile,
  interpolation = c("nearest", "higher", "lower", "midpoint", "linear"),
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

quantile

Quantile between 0.0 and 1.0.

interpolation

Interpolation method. Must be one of "nearest", "higher", "lower", "midpoint", "linear".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling quantile with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_quantile = pl$col("index")$rolling_quantile_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling quantile with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_quantile = pl$col("index")$rolling_quantile_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Apply a rolling skew over values

Description

[Experimental]

A window of length window_size will traverse the array, and the skewness will be computed over the values that fill each window.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_skew(window_size, ..., bias = TRUE)

Arguments

window_size

The length of the window in number of elements.

...

These dots are for future extensions and must be empty.

bias

If FALSE, the calculations are corrected for statistical bias.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(1, 4, 2, 9))
df$with_columns(
  rolling_skew = pl$col("a")$rolling_skew(3)
)

Apply a rolling standard deviation over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_std(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE,
  ddof = 1
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

ddof

"Delta Degrees of Freedom": the divisor used in the calculation is N - ddof, where N represents the number of elements.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_std = pl$col("a")$rolling_std(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_std = pl$col("a")$rolling_std(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_std = pl$col("a")$rolling_std(window_size = 3, center = TRUE)
)
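
The ddof argument appears in the usage but is easy to miss: it sets the divisor N - ddof, so ddof = 1 (the default) gives the sample standard deviation and ddof = 0 the population standard deviation. A sketch, assuming the usual polars semantics:

```r
df <- pl$DataFrame(a = c(1, 4, 2, 9, 3, 6))

df$with_columns(
  # Sample standard deviation (divisor N - 1, the default)
  sample_std = pl$col("a")$rolling_std(window_size = 3),
  # Population standard deviation (divisor N)
  population_std = pl$col("a")$rolling_std(window_size = 3, ddof = 0)
)
```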

Apply a rolling standard deviation based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_std_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none"),
  ddof = 1
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

ddof

"Delta Degrees of Freedom": the divisor used in the calculation is N - ddof, where N represents the number of elements.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling std with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_std = pl$col("index")$rolling_std_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling std with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_std = pl$col("index")$rolling_std_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Apply a rolling sum over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_sum(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_sum = pl$col("a")$rolling_sum(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_sum = pl$col("a")$rolling_sum(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_sum = pl$col("a")$rolling_sum(window_size = 3, center = TRUE)
)

Apply a rolling sum based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_sum_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none")
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" (3 days, 12 hours, 4 minutes, and 25 seconds).

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

min_periods

The number of values in the window that should be non-null before computing a result. Default is 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling sum with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_sum = pl$col("index")$rolling_sum_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling sum with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_sum = pl$col("index")$rolling_sum_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)
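
# The "by" column may also be an integer data type; in that case, the
# window size needs the "i" suffix (an illustrative sketch; "df_int" is
# not defined above):
df_int <- pl$DataFrame(index = 1:5, value = c(1, 4, 2, 8, 5))
df_int$with_columns(
  rolling_sum = pl$col("value")$rolling_sum_by("index", window_size = "2i")
)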

Apply a rolling variance over values

Description

[Experimental]

A window of length window_size will traverse the array. The values that fill this window will (optionally) be multiplied with the weights given by the weights vector. The resulting values will be aggregated.

The window at a given row will include the row itself, and the window_size - 1 elements before it.

Usage

expr__rolling_var(
  window_size,
  weights = NULL,
  ...,
  min_periods = NULL,
  center = FALSE,
  ddof = 1
)

Arguments

window_size

The length of the window in number of elements.

weights

An optional slice with the same length as the window that will be multiplied elementwise with the values in the window.

...

These dots are for future extensions and must be empty.

min_periods

The number of values in the window that should be non-null before computing a result. If NULL (default), it will be set equal to window_size.

center

If TRUE, set the labels at the center of the window.

ddof

"Delta Degrees of Freedom": the divisor used in the calculation is N - ddof, where N represents the number of elements.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:6)
df$with_columns(
  rolling_var = pl$col("a")$rolling_var(window_size = 2)
)

# Specify weights to multiply the values in the window with:
df$with_columns(
  rolling_var = pl$col("a")$rolling_var(
    window_size = 2, weights = c(0.25, 0.75)
  )
)

# Center the values in the window
df$with_columns(
  rolling_var = pl$col("a")$rolling_var(window_size = 3, center = TRUE)
)
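
# Use ddof = 0 for the population variance instead of the default sample
# variance (a sketch reusing "df" from above):
df$with_columns(
  rolling_var_pop = pl$col("a")$rolling_var(window_size = 2, ddof = 0)
)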

Apply a rolling variance based on another column

Description

[Experimental]

Given a by column ⁠<t_0, t_1, ..., t_n>⁠, then closed = "right" (the default) means the windows will be:

  • ⁠(t_0 - window_size, t_0]⁠

  • ⁠(t_1 - window_size, t_1]⁠

  • …

  • ⁠(t_n - window_size, t_n]⁠

Usage

expr__rolling_var_by(
  by,
  window_size,
  ...,
  min_periods = 1,
  closed = c("right", "both", "left", "none"),
  ddof = 1
)

Arguments

by

Should be DateTime, Date, UInt64, UInt32, Int64, or Int32 data type after conversion by as_polars_expr(). Note that the integer ones require using "i" in window_size. Accepts expression input. Strings are parsed as column names.

window_size

The length of the window. Can be a dynamic temporal size indicated by a timedelta or the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 calendar day)

  • 1w (1 calendar week)

  • 1mo (1 calendar month)

  • 1q (1 calendar quarter)

  • 1y (1 calendar year)

Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".

...

These dots are for future extensions and must be empty.

min_periods

The number of values in the window that should be non-null before computing a result. Defaults to 1.

closed

Define which sides of the interval are closed (inclusive). Default is "right".

ddof

"Delta Degrees of Freedom": the divisor used in the calculation is N - ddof, where N represents the number of elements.

Details

If you want to compute multiple aggregation statistics over the same dynamic window, consider using $rolling() - this method can cache the window size computation.

Value

A polars expression

Examples

df_temporal <- pl$select(
  index = 0:24,
  date = pl$datetime_range(
    as.POSIXct("2001-01-01"),
    as.POSIXct("2001-01-02"),
    "1h"
  )
)

# Compute the rolling var with the temporal windows closed on the right
# (default)
df_temporal$with_columns(
  rolling_row_var = pl$col("index")$rolling_var_by(
    "date",
    window_size = "2h"
  )
)

# Compute the rolling var with the closure of windows on both sides
df_temporal$with_columns(
  rolling_row_var = pl$col("index")$rolling_var_by(
    "date",
    window_size = "2h",
    closed = "both"
  )
)

Round underlying floating point data by decimals digits

Description

Round underlying floating point data by decimals digits

Usage

expr__round(decimals)

Arguments

decimals

Number of decimals to round by.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(0.33, 0.52, 1.02, 1.17))

df$with_columns(
  rounded = pl$col("a")$round(1)
)

Round to a number of significant figures

Description

Round to a number of significant figures

Usage

expr__round_sig_figs(digits)

Arguments

digits

Number of significant figures to round to.

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(0.01234, 3.333, 1234))

df$with_columns(
  rounded = pl$col("a")$round_sig_figs(2)
)

Sample from this expression

Description

Sample from this expression

Usage

expr__sample(
  n = NULL,
  ...,
  fraction = NULL,
  with_replacement = FALSE,
  shuffle = FALSE,
  seed = NULL
)

Arguments

n

Number of items to return. Cannot be used with fraction. Defaults to 1 if fraction is NULL.

...

These dots are for future extensions and must be empty.

fraction

Fraction of items to return. Cannot be used with n.

with_replacement

Allow values to be sampled more than once.

shuffle

Shuffle the order of sampled data points.

seed

Seed for the random number generator. If NULL (default), a random seed is generated for each sample operation.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$sample(
  fraction = 1, with_replacement = TRUE, seed = 1
))
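
# Sample a fixed number of rows instead of a fraction (illustrative):
df$select(pl$col("a")$sample(n = 2, seed = 1))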

Find indices where elements should be inserted to maintain order

Description

Find indices where elements should be inserted to maintain order.

Usage

expr__search_sorted(element, side = c("any", "left", "right"))

Arguments

element

Expression or scalar value.

side

Must be one of the following:

  • "any": the index of the first suitable location found is given;

  • "left": the index of the leftmost suitable location found is given;

  • "right": the index of the rightmost suitable location found is given.

Value

A polars expression

Examples

df <- pl$DataFrame(values = c(1, 2, 3, 5))
df$select(
  zero = pl$col("values")$search_sorted(0),
  three = pl$col("values")$search_sorted(3),
  six = pl$col("values")$search_sorted(6)
)
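
# For a value already present in the column, "left" and "right" can give
# different indices (illustrative):
df$select(
  left = pl$col("values")$search_sorted(2, side = "left"),
  right = pl$col("values")$search_sorted(2, side = "right")
)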

Flags the expression as "sorted"

Description

Enables downstream code to use fast paths for sorted arrays.

Warning: This can lead to incorrect results if the data is NOT sorted!! Use with care!

Usage

expr__set_sorted(..., descending = FALSE)

Arguments

...

These dots are for future extensions and must be empty.

descending

Whether the Series order is descending.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$select(pl$col("a")$set_sorted()$max())

Shift values by the given number of indices

Description

Shift values by the given number of indices

Usage

expr__shift(n = 1, ..., fill_value = NULL)

Arguments

n

Number of indices to shift forward. If a negative value is passed, values are shifted in the opposite direction instead.

...

These dots are for future extensions and must be empty.

fill_value

Fill the resulting null values with this value.

Value

A polars expression

Examples

# By default, values are shifted forward by one index.
df <- pl$DataFrame(a = 1:4)
df$with_columns(shift = pl$col("a")$shift())

# Pass a negative value to shift in the opposite direction instead.
df$with_columns(shift = pl$col("a")$shift(-2))

# Specify fill_value to fill the resulting null values.
df$with_columns(shift = pl$col("a")$shift(-2, fill_value = 100))

Shrink numeric columns to the minimal required datatype

Description

Shrink to the dtype needed to fit the extrema of this Series. This can be used to reduce memory pressure.

Usage

expr__shrink_dtype()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(-112, 2, 112))$cast(pl$Int64)
df$with_columns(
  shrunk = pl$col("a")$shrink_dtype()
)

Shuffle the contents of this expression

Description

Note that this is shuffled independently of any other column or expression. If you want each row to stay the same, use df$sample(shuffle = TRUE).

Usage

expr__shuffle(seed = NULL)

Arguments

seed

Integer indicating the seed for the random number generator. If NULL (default), a random seed is generated each time the shuffle is called.

Value

A polars expression

Examples

df <- pl$DataFrame(a = 1:3)
df$with_columns(
  shuffled = pl$col("a")$shuffle(seed = 1)
)

Compute the sign

Description

This returns -1 if x is lower than 0, 0 if x == 0, and 1 if x is greater than 0.

Usage

expr__sign()

Value

A polars expression

Examples

df <- pl$DataFrame(a = c(-9, 0, 0, 4, NA))
df$with_columns(sign = pl$col("a")$sign())

Compute sine

Description

Compute sine

Usage

expr__sin()

Value

A polars expression

Examples

pl$DataFrame(a = c(0, pi / 2))$with_columns(sine = pl$col("a")$sin())