The goal of tidyr is to help you create tidy data. Tidy data is data
where:
Each variable is a column; each column is a variable.
Each observation is a row; each row is an observation.
Each value is a cell; each cell is a single value.
Tidy data describes a standard way of storing data that is used wherever
possible throughout the tidyverse. If you
ensure that your data is tidy, you’ll spend less time fighting with the
tools and more time working on your analysis. Learn more about tidy data
in vignette("tidy-data").
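As a minimal sketch of the three rules above, consider a small made-up table in which one variable (the year) is hidden in the column names. The `scores` data and its column names are hypothetical and are used only for illustration:

```r
library(tidyr)

# Hypothetical example data: test scores per student, one column per year.
# The year variable lives in the column names, so this table is not tidy.
scores <- data.frame(
  student = c("a", "b"),
  `2023` = c(81, 74),
  `2024` = c(85, 79),
  check.names = FALSE
)

# Move the year column names into a `year` column and the cell values into
# a `score` column, so that each row is one (student, year) observation.
tidy_scores <- pivot_longer(
  scores,
  cols = c(`2023`, `2024`),
  names_to = "year",
  values_to = "score"
)

tidy_scores
```

After the reshape, `student`, `year`, and `score` are each a column, and every row holds exactly one observation.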
Installation
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, install just tidyr:
install.packages("tidyr")
# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/tidyr")
Getting started
library(tidyr)
tidyr functions fall into five main categories:
“Pivoting”, which converts between long and wide forms. tidyr 1.0.0
introduces pivot_longer() and pivot_wider(), replacing the older
gather() and spread() functions. See vignette("pivot") for more
details.
“Rectangling”, which turns deeply nested lists (as from JSON) into
tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and
vignette("rectangle") for more details.
Nesting converts grouped data to a form where each group becomes a
single row containing a nested data frame, and unnesting does the
opposite. See nest(), unnest(), and vignette("nest") for more
details.
Splitting and combining character columns. Use
separate_wider_delim(), separate_wider_position(), and
separate_wider_regex() to pull a single character column into
multiple columns; use unite() to combine multiple columns into a
single character column.
Make implicit missing values explicit with complete(); make explicit
missing values implicit with drop_na(); replace missing values with
the next/previous value using fill(), or with a known value using
replace_na().
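The five categories above can be sketched with one short call each. This is an illustrative sketch, not an exhaustive tour: all data frames and column names below are made up, and the separate_wider_delim() call assumes tidyr 1.3.0 or later.

```r
library(tidyr)

# All data below is hypothetical and exists only for illustration.

# 1. Pivoting: long -> wide, and back again.
long <- data.frame(
  id = c(1, 1, 2, 2),
  key = c("x", "y", "x", "y"),
  value = c(10, 20, 30, 40)
)
wide <- pivot_wider(long, names_from = key, values_from = value)
back <- pivot_longer(wide, cols = c(x, y), names_to = "key", values_to = "value")

# 2. Rectangling: a list column of named lists becomes regular columns.
users <- tibble::tibble(
  user = list(
    list(name = "ann", age = 31),
    list(name = "bob", age = 45)
  )
)
users_wide <- unnest_wider(users, user)

# 3. Nesting: one row per id, with the remaining columns in a list column.
nested <- nest(long, data = c(key, value))
unnested <- unnest(nested, data)

# 4. Splitting and combining character columns.
dates <- data.frame(date = c("2024-01-15", "2023-12-31"))
parts <- separate_wider_delim(
  dates, date,
  delim = "-",
  names = c("year", "month", "day")
)
joined <- unite(parts, "date", year, month, day, sep = "-")

# 5. Missing values: the (shop = "b", quarter = 2) row is implicitly missing.
sales <- data.frame(
  shop = c("a", "a", "b"),
  quarter = c(1, 2, 1),
  units = c(5, NA, 7)
)
completed <- complete(sales, shop, quarter)            # adds the missing row
filled <- fill(completed, units, .direction = "down")  # carry values downward
zeroed <- replace_na(completed, list(units = 0))       # known replacement
dropped <- drop_na(completed)                          # drop rows with NA
```

Each step starts from a plain data frame or tibble and returns a tibble, so the calls can also be chained with a pipe.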
Related work
tidyr supersedes reshape2 (2010-2014) and reshape (2005-2010). Somewhat
counterintuitively, each iteration of the package has done less. tidyr
is designed specifically for tidying data, not general reshaping
(reshape2) or general aggregation (reshape).
data.table provides high-performance implementations of melt() and
dcast().
If you’d like to read more about data reshaping from a CS perspective,
I’d recommend the following three papers: