R Language, Chapter 2: Data Processing ⑥ The dplyr Package (1): Selecting Columns
=========================================
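Note: all code is presented as part of a pipeline, even though none of the snippets is a complete pipeline on its own. In some cases I have added a glimpse() statement so that you can see which columns were selected in the output tibble without printing all the data every time.
The dataset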

    library(tidyverse)

    # msleep is a sample dataset that ships with ggplot2 (loaded by tidyverse)
    glimpse(msleep)
    ## Observations: 83
    ## Variables: 11
    ## $ name         <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Grea...
    ## $ genus        <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bo...
    ## $ vore         <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi...
    ## $ order        <chr> "Carnivora", "Primates", "Rodentia", "Soricomorph...
    ## $ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", N...
    ## $ sleep_total  <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1...
    ## $ sleep_rem    <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0....
    ## $ sleep_cycle  <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.38...
    ## $ awake        <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9,...
    ## $ brainwt      <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0....
    ## $ bodywt       <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.4...
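Selecting columns
Selecting columns: the basics
If you only want a few columns, list their names in the select() statement. The order in which you list them determines the order in which they appear in the output.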

    msleep %>%
      select(name, genus, sleep_total, awake) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 4
    ## $ name        <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Great...
    ## $ genus       <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos...
    ## $ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1,...
    ## $ awake       <dbl> 11.9, 7.0, 9.6, 9.1, 20.0, 9.6, 15.3, 17.0, 13.9, ...
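To make the point about ordering concrete, here is a minimal sketch. It assumes the dplyr and ggplot2 packages are installed (msleep ships with ggplot2); the variable name `reordered` is only illustrative.

```r
library(dplyr)

# msleep ships with ggplot2; referencing it directly avoids loading all of tidyverse
msleep <- ggplot2::msleep

# Columns come back in the order they are listed in select(),
# not in the order they have in the original data
reordered <- msleep %>% select(awake, name, sleep_total)

names(reordered)  # awake, name, sleep_total
```

Listing a column first is therefore a simple way to move it to the front of the output.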
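If you want to select many columns, you can save typing by selecting a range of columns, by deselecting columns, or even by deselecting a range and re-adding one of its columns. To select a contiguous range of columns, use the start_col:end_col syntax: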

    msleep %>%
      select(name:order, sleep_total:sleep_cycle) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 7
    ## $ name        <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Great...
    ## $ genus       <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos...
    ## $ vore        <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi"...
    ## $ order       <chr> "Carnivora", "Primates", "Rodentia", "Soricomorpha...
    ## $ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1,...
    ## $ sleep_rem   <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6...
    ## $ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.383...
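A range written with column names refers to column positions under the hood. The sketch below, which assumes dplyr and ggplot2 are installed, checks that name:order selects the same columns as positions 1 to 4 in msleep. The names `by_name` and `by_position` are illustrative.

```r
library(dplyr)

msleep <- ggplot2::msleep

# name, genus, vore and order are the first four columns of msleep
by_name     <- msleep %>% select(name:order)
by_position <- msleep %>% select(1:4)

# Both selections contain the same columns in the same order
identical(names(by_name), names(by_position))
```

Selecting by name is usually the safer choice, because it keeps working if the column positions change.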
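An alternative is to deselect columns by putting a minus sign in front of the column name. You can deselect a whole range of columns the same way.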

    msleep %>%
      select(-conservation, -(sleep_total:awake)) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 6
    ## $ name    <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Greater s...
    ## $ genus   <chr> "Acinonyx", "Aotus", "Aplodontia", "Blarina", "Bos", "...
    ## $ vore    <chr> "carni", "omni", "herbi", "omni", "herbi", "herbi", "c...
    ## $ order   <chr> "Carnivora", "Primates", "Rodentia", "Soricomorpha", "...
    ## $ brainwt <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.07000...
    ## $ bodywt  <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.490, 0...
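One way to convince yourself of what a deselection keeps is to compute the expected column names by hand. This is a small sketch assuming dplyr and ggplot2 are installed; `dropped` and `expected` are illustrative helper variables.

```r
library(dplyr)

msleep <- ggplot2::msleep

kept <- msleep %>% select(-conservation, -(sleep_total:awake))

# The same set of names worked out with base R:
# sleep_total:awake spans sleep_total, sleep_rem, sleep_cycle and awake
dropped  <- c("conservation", "sleep_total", "sleep_rem", "sleep_cycle", "awake")
expected <- setdiff(names(msleep), dropped)

identical(names(kept), expected)
```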
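You can even deselect a range of columns and then re-add one of them. The example below deselects every column from name to awake but re-adds conservation, even though it is part of the deselected range. This only works within a single select() statement.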

    msleep %>%
      select(-(name:awake), conservation) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 3
    ## $ brainwt      <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0....
    ## $ bodywt       <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.4...
    ## $ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", N...
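The restriction to a single select() statement can be seen by trying it both ways. The sketch below assumes dplyr and ggplot2 are installed. It wraps the two-step version in tryCatch() so that the error is caught and replaced with NULL instead of stopping the script; the variable names are illustrative.

```r
library(dplyr)

msleep <- ggplot2::msleep

# Same select(): conservation is dropped with the range, then added back at the end
same_call <- msleep %>% select(-(name:awake), conservation)
names(same_call)

# Separate select() calls: after the first call the conservation column
# no longer exists, so the second call raises an error (caught here as NULL)
separate_calls <- tryCatch(
  msleep %>% select(-(name:awake)) %>% select(conservation),
  error = function(e) NULL
)

is.null(separate_calls)
```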
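Selecting columns based on partial column names
If you have many columns with similar names, you can match on part of the name by using starts_with(), ends_with() or contains() inside the select() statement.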

    msleep %>%
      select(name, starts_with("sleep")) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 4
    ## $ name        <chr> "Cheetah", "Owl monkey", "Mountain beaver", "Great...
    ## $ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1,...
    ## $ sleep_rem   <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6...
    ## $ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.383...

    msleep %>%
      select(contains("eep"), ends_with("wt")) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 5
    ## $ sleep_total <dbl> 12.1, 17.0, 14.4, 14.9, 4.0, 14.4, 8.7, 7.0, 10.1,...
    ## $ sleep_rem   <dbl> NA, 1.8, 2.4, 2.3, 0.7, 2.2, 1.4, NA, 2.9, NA, 0.6...
    ## $ sleep_cycle <dbl> NA, NA, NA, 0.1333333, 0.6666667, 0.7666667, 0.383...
    ## $ brainwt     <dbl> NA, 0.01550, NA, 0.00029, 0.42300, NA, NA, NA, 0.0...
    ## $ bodywt      <dbl> 50.000, 0.480, 1.350, 0.019, 600.000, 3.850, 20.49...
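One detail worth knowing is that these helpers ignore case by default, through their ignore.case argument. The sketch below, assuming dplyr and ggplot2 are installed, contrasts the default with a case-sensitive match. The variable names are illustrative.

```r
library(dplyr)

msleep <- ggplot2::msleep

# ignore.case = TRUE is the default, so "SLEEP" still matches sleep_total etc.
default_match <- msleep %>% select(starts_with("SLEEP"))

# With ignore.case = FALSE the upper-case prefix matches no column,
# and select() returns a tibble with zero columns
strict_match <- msleep %>% select(starts_with("SLEEP", ignore.case = FALSE))

names(default_match)
ncol(strict_match)
```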
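Selecting columns based on regex
The helper functions above all match literal strings. If your column names follow a pattern rather than sharing an exact substring, you can use any regular expression inside matches(). The example below selects every column whose name contains an "o", followed by one or more other characters, followed by "er".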

    # selecting based on regex
    msleep %>%
      select(matches("o.+er")) %>%
      glimpse()
    ## Observations: 83
    ## Variables: 2
    ## $ order        <chr> "Carnivora", "Primates", "Rodentia", "Soricomorph...
    ## $ conservation <chr> "lc", NA, "nt", "lc", "domesticated", NA, "vu", N...
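If it is unclear which columns a pattern will pick up, you can test it against the column names with base R's grepl() first. This is a minimal sketch assuming dplyr and ggplot2 are installed; `pattern`, `via_matches` and `via_grepl` are illustrative names. Note that matches() ignores case by default while grepl() does not, which makes no difference here because every msleep column name is lower case.

```r
library(dplyr)

msleep <- ggplot2::msleep

pattern <- "o.+er"

# Column names selected by matches()
via_matches <- msleep %>% select(matches(pattern)) %>% names()

# The same regular expression applied directly to the column names
via_grepl <- names(msleep)[grepl(pattern, names(msleep))]

identical(via_matches, via_grepl)

# Anchors and alternation work as well: exactly sleep_total and sleep_rem
msleep %>% select(matches("^sleep_(total|rem)$")) %>% names()
```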
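Selecting columns based on pre-identified column names
There is another option that saves you from retyping column names over and over: one_of(). You can store the column names in a vector up front and then refer to them inside the select() statement, either by wrapping the vector in one_of() or by using the !! operator.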

    classification <- c("name", "genus", "vore", "order", "conservation")

    msleep %>% select(!!classification)
    ## # A tibble: 83 x 5
    ##    name                       genus       vore  order        conservation
    ##    <chr>                      <chr>       <chr> <chr>        <chr>
    ##  1 Cheetah                    Acinonyx    carni Carnivora    lc
    ##  2 Owl monkey                 Aotus       omni  Primates     <NA>
    ##  3 Mountain beaver            Aplodontia  herbi Rodentia     nt
    ##  4 Greater short-tailed shrew Blarina     omni  Soricomorpha lc
    ##  5 Cow                        Bos         herbi Artiodactyla domesticated
    ##  6 Three-toed sloth           Bradypus    herbi Pilosa       <NA>
    ##  7 Northern fur seal          Callorhinus carni Carnivora    vu
    ##  8 Vesper mouse               Calomys     <NA>  Rodentia     <NA>
    ##  9 Dog                        Canis       carni Carnivora    domesticated
    ## 10 Roe deer                   Capreolus   herbi Artiodactyla lc
    ## # ... with 73 more rows
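The example above only shows the !! form. The sketch below adds the one_of() form and, as an addition not covered in the original text, the newer all_of() and any_of() helpers, which supersede one_of() in recent releases. It assumes ggplot2 is installed along with dplyr 1.0.0 or later (tidyselect 1.0.0 or later). The column name "not_a_column" is hypothetical and is there only to show how a missing name is handled.

```r
library(dplyr)

msleep <- ggplot2::msleep

classification <- c("name", "genus", "vore", "order", "conservation")

# one_of(), as described in the text
with_one_of <- msleep %>% select(one_of(classification))

# all_of() is strict: it raises an error if any listed column is missing
with_all_of <- msleep %>% select(all_of(classification))

# any_of() is lenient: names that are not present are silently skipped
wanted <- c("name", "genus", "not_a_column")  # "not_a_column" is hypothetical
with_any_of <- msleep %>% select(any_of(wanted))

names(with_any_of)  # name, genus
```

Choosing between all_of() and any_of() comes down to whether a missing column should stop the pipeline or be ignored.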
Author: 赛乾
Link: https://www.jianshu.com/p/4b37580a6c31