We’re delighted to announce that tidyr 1.1.0 is now available from CRAN. tidyr provides a set of tools for transforming data frames to and from tidy data, where each variable is a column and each observation is a row. Tidy data is a convention for matching the semantics and structure of your data that makes using the rest of the tidyverse (and many other R packages) much easier.
You can install tidyr with:
install.packages("tidyr")
This release doesn’t include any major new excitement, but it includes a whole passel of minor improvements building on the major changes in tidyr 1.0.0, generally making everything easier to use and a bit more flexible. In this blog post, I’ll give a quick rundown of the new pivoting features; see the full release announcement for the details of other changes.
library(tidyr)
pivot_longer()
- pivot_longer() gains a new names_transform argument that allows you to transform column names before they turn into data. For example, you can use this new argument along with readr::parse_number() to parse column names that really should be numbers:

      df <- tibble(id = 1, wk1 = 0, wk2 = 4, wk3 = 9, wk4 = 25)
      df %>%
        pivot_longer(
          cols = starts_with("wk"),
          names_to = "week",
          names_transform = list(week = readr::parse_number)
        )
      #> # A tibble: 4 x 3
      #>      id  week value
      #>   <dbl> <dbl> <dbl>
      #> 1     1     1     0
      #> 2     1     2     4
      #> 3     1     3     9
      #> 4     1     4    25

- pivot_longer() can now discard uninformative column names by setting names_to = character(), thanks to an idea and implementation from Mitch O’Hara Wild:

      df <- tibble(id = 1:2, fruitful_panda = 3:4, angry_aardvark = 5:6)
      df %>% pivot_longer(-id, names_to = character())
      #> # A tibble: 4 x 2
      #>      id value
      #>   <int> <int>
      #> 1     1     3
      #> 2     1     5
      #> 3     2     4
      #> 4     2     6

- pivot_longer() no longer creates a .copy variable in the presence of duplicate column names. This makes it more consistent with the handling of non-unique pivot specifications.

      df <- tibble(id = 1:3, x = 1:3, x = 4:6, .name_repair = "minimal")
      df %>% pivot_longer(-id)
      #> # A tibble: 6 x 3
      #>      id name  value
      #>   <int> <chr> <int>
      #> 1     1 x         1
      #> 2     1 x         4
      #> 3     2 x         2
      #> 4     2 x         5
      #> 5     3 x         3
      #> 6     3 x         6

- pivot_longer() automatically disambiguates non-unique outputs, which can occur when the input variables include some additional component that you don’t care about and want to discard. You can discard parts of column names either with names_pattern or with NA in names_to.

      df <- tibble(id = 1:3, x_1 = 1:3, y_2 = 4:6, y_3 = 9:11)
      df %>% pivot_longer(-id, names_pattern = "(.)_.")
      #> # A tibble: 9 x 3
      #>      id name  value
      #>   <int> <chr> <int>
      #> 1     1 x         1
      #> 2     1 y         4
      #> 3     1 y         9
      #> 4     2 x         2
      #> 5     2 y         5
      #> 6     2 y        10
      #> 7     3 x         3
      #> 8     3 y         6
      #> 9     3 y        11

      df %>% pivot_longer(-id, names_sep = "_", names_to = c("name", NA))
      #> # A tibble: 9 x 3
      #>      id name  value
      #>   <int> <chr> <int>
      #> 1     1 x         1
      #> 2     1 y         4
      #> 3     1 y         9
      #> 4     2 x         2
      #> 5     2 y         5
      #> 6     2 y        10
      #> 7     3 x         3
      #> 8     3 y         6
      #> 9     3 y        11

      df %>% pivot_longer(-id, names_sep = "_", names_to = c(".value", NA))
      #> # A tibble: 6 x 3
      #>      id     x     y
      #>   <int> <int> <int>
      #> 1     1     1     4
      #> 2     1    NA     9
      #> 3     2     2     5
      #> 4     2    NA    10
      #> 5     3     3     6
      #> 6     3    NA    11

pivot_wider()
- pivot_wider() gains a names_sort argument which allows you to sort column names in order. The default, FALSE, orders columns by their first appearance. I’m considering changing the default value to TRUE in a future version.

      df <- tibble(
        day_int = c(4, 3, 5, 1, 2),
        day_fac = factor(day_int, labels = c("Mon", "Tue", "Wed", "Thu", "Fri"))
      )
      df %>%
        pivot_wider(
          names_from = day_fac,
          values_from = day_int
        )
      #> # A tibble: 1 x 5
      #>     Thu   Wed   Fri   Mon   Tue
      #>   <dbl> <dbl> <dbl> <dbl> <dbl>
      #> 1     4     3     5     1     2

      df %>%
        pivot_wider(
          names_from = day_fac,
          names_sort = TRUE,
          values_from = day_int
        )
      #> # A tibble: 1 x 5
      #>     Mon   Tue   Wed   Thu   Fri
      #>   <dbl> <dbl> <dbl> <dbl> <dbl>
      #> 1     1     2     3     4     5

- pivot_wider() gains a names_glue argument that allows you to construct output column names with a glue specification when names_from includes multiple columns.

      df <- tibble(
        first = "a",
        second = "1",
        third = "X",
        val = 1
      )
      df %>%
        pivot_wider(
          names_from = c(first, second, third),
          values_from = val,
          names_glue = "{first}.{second}_{third}"
        )
      #> # A tibble: 1 x 1
      #>   a.1_X
      #>   <dbl>
      #> 1     1

- pivot_wider() arguments values_fn and values_fill can now be single values; you now only need to use a named list if you want to use different values for different value columns. You’ll also get better errors if they’re not of the correct type.

- Finally, both pivot_wider() and pivot_longer() are considerably more performant, thanks largely to improvements in the underlying vctrs code by Davis Vaughan.
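
The values_fn and values_fill change has no example above, so here is a minimal sketch of what it might look like in practice. The data frame and its column names (id, key, val) are made up for illustration and are not taken from the release notes. The first call passes a single function and a single fill value; the second uses the older named-list form, which should give the same result here.

```r
library(tidyr)

# Hypothetical data: id 1 has two "a" readings, and id 2 has no "b" reading.
df <- tibble(
  id  = c(1, 1, 1, 2),
  key = c("a", "a", "b", "a"),
  val = c(1, 2, 3, 4)
)

# Single values: one summary function and one fill value for all value columns.
wide <- df %>%
  pivot_wider(
    names_from = key,
    values_from = val,
    values_fn = sum,
    values_fill = 0
  )

# Named-list form, still useful when different value columns need
# different functions or fill values.
wide_list <- df %>%
  pivot_wider(
    names_from = key,
    values_from = val,
    values_fn = list(val = sum),
    values_fill = list(val = 0)
  )

# id 1: a = 3 (1 + 2), b = 3; id 2: a = 4, b = 0 (filled)
wide
```

With only one value column the two forms are interchangeable; the named list starts to matter once values_from names several columns that need different treatment.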
Acknowledgements
Thanks to all 135 people who contributed to this version of tidyr by discussing issues and suggesting new code! @abichat, @abiyug, @adisarid, @ahmohamed, @akikirinrin, @albertotb, @alex-pax, @amirmazmi, @andtheWings, @ashiklom, @atusy, @batpigandme, @bertrandh, @BillBlanc, @billdenney, @BrianDiggs, @bushdanielkwajaffa, @cderv, @CGMossa, @cgoo4, @charliejhadley, @chester-gan, @cimentadaj, @cjvanlissa, @cloversleaves, @colearendt, @dah33, @DanOvando, @dapperjapper, @daranzolin, @davidhunterwalsh, @davisadamw, @DavisVaughan, @dchiu911, @dpastoor, @dpeterson71, @dpprdan, @eantworth, @earcanal, @echasnovski, @enixam, @ericgunnink, @florianm, @fmmattioni, @franzbischoff, @GegznaV, @geotheory, @ggrothendieck, @gregorp, @hadley, @HanOostdijk, @henry090, @iago-pssjd, @ifellows, @infotroph, @jam1015, @jannikbuhr, @jasonpcasey, @jeffreypullin, @jennybc, @jenren, @JenspederM, @jeonghyunwoo, @jjnote, @jmh530, @JohnCoene, @joshua-theisen, @JosiahParry, @jthomasmock, @jwilliman, @kaneplusplus, @kaybenleroll, @kent37, @kiernann, @krlmlr, @lionel-, @Ljupch0, @lymanmark, @maelle, @majazaloznik, @mattantaliss, @mattwarkentin, @maurolepore, @md0u80c9, @mgirlich, @MikeEdinger, @mikemahoney218, @mikmart, @mitchelloharawild, @moodymudskipper, @msberends, @msgoussi, @mstackhouse, @MyKo101, @nacnudus, @namelessjon, @ndrewGele, @Nicktz, @npjc, @osorensen, @PathosEthosLogos, @philipp-baumann, @PMSeitzer, @psychelzh, @randomgambit, @riinuots, @romagnolid, @romainfrancois, @rvino, @salim-b, @shanepiesik, @shannonpileggi, @sharleenw, @siddharthprabhu, @simazhi, @skr5k, @skydavis435, @smingerson, @smithjd, @srnnkls, @stragu, @stufield, @tangcxx, @tdhock, @the-Zian, @tomhopper, @topepo, @wgrundlingh, @wibeasley, @william3031, @wmoldham, @wolski, @xkdog, @xtimbeau, and @yusuzech.