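Summarise multiple columns with dplyr?

I'm struggling a bit with the dplyr syntax. I have a data frame with several variables and one grouping variable. Now I want to calculate the mean of each column within each group, using dplyr in R.

library(dplyr)
n <- 10  # sample size; not defined in the original post, 10 is assumed here
df <- data.frame(
    a = sample(1:5, n, replace = TRUE),
    b = sample(1:5, n, replace = TRUE),
    c = sample(1:5, n, replace = TRUE),
    d = sample(1:5, n, replace = TRUE),
    grp = sample(1:3, n, replace = TRUE))
df %>% group_by(grp) %>% summarise(mean(a))

This gives the mean of column "a" for each group indicated by "grp". My question is: is it possible to get the means of every column in each group at once? Or do I have to repeat df %>% group_by(grp) %>% summarise(mean(a)) for each column? What I would like is something like:

df %>% group_by(grp) %>% summarise(mean(a:d)) # "mean(a:d)" does not work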
3 Answers
蓝山帝景
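You can summarise several columns at once with summarize_at, summarize_all, and summarize_if in dplyr 0.7.4. The columns and the functions to apply are given through the vars and funs arguments (.vars and .funs), as in the code below. In dplyr 0.7.4, summarise_each (and mutate_each) are already deprecated, so they should no longer be used.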
options(scipen = 100, dplyr.width = Inf, dplyr.print_max = Inf)
library(dplyr)
packageVersion("dplyr")
# [1] ‘0.7.4’
set.seed(123)
df <- data_frame(
  a = sample(1:5, 10, replace=T),
  b = sample(1:5, 10, replace=T),
  c = sample(1:5, 10, replace=T),
  d = sample(1:5, 10, replace=T),
  grp = as.character(sample(1:3, 10, replace=T)) # For convenience, specify character type
)
df %>% group_by(grp) %>% summarise_each(.vars = letters[1:4], .funs = c(mean="mean"))
# `summarise_each()` is deprecated.
# Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
# To map `funs` over a selection of variables, use `summarise_at()`
# Error: Strings must match column names. Unknown columns: mean
# summarise_at
df %>% group_by(grp) %>% summarise_at(.vars = letters[1:4], .funs = c(mean="mean"))
df %>% group_by(grp) %>% summarise_at(.vars = names(.)[1:4], .funs = c(mean="mean"))
df %>% group_by(grp) %>% summarise_at(.vars = vars(a,b,c,d), .funs = c(mean="mean"))
# summarise_all
df %>% group_by(grp) %>% summarise_all(.funs = c(mean="mean"))
# summarise_if
df %>% group_by(grp) %>% summarise_if(.predicate = function(x) is.numeric(x), .funs = funs(mean="mean"))
# A tibble: 3 x 5
#   grp   a_mean b_mean c_mean d_mean
#   <chr>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 1       2.80   3.00    3.6   3.00
# 2 2       4.25   2.75    4.0   3.75
# 3 3       3.00   5.00    1.0   2.00
df %>% group_by(grp) %>%
summarise_at(.vars = letters[1:2],
.funs = c(Mean="mean", Sd="sd"))
# A tibble: 3 x 5
# grp a_Mean b_Mean a_Sd b_Sd
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 2.80 3.00 1.4832397 1.870829
# 2 2 4.25 2.75 0.9574271 1.258306
# 3 3 3.00 5.00 NA NA
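The answer above targets dplyr 0.7.4. As a hedged sketch that goes beyond what the answer covers: assuming dplyr 1.0.0 or later is installed, the same per-group summaries can be written with across() inside summarise(), which accepts the a:d range the question asked about. The variable names means and stats are illustrative only, and no output is shown because the sampled values depend on the R version.

```r
library(dplyr)

# Same shape of data as in the answer; tibble() replaces the older data_frame()
set.seed(123)
df <- tibble(
  a = sample(1:5, 10, replace = TRUE),
  b = sample(1:5, 10, replace = TRUE),
  c = sample(1:5, 10, replace = TRUE),
  d = sample(1:5, 10, replace = TRUE),
  grp = as.character(sample(1:3, 10, replace = TRUE))
)

# Mean of columns a through d within each group (assumes dplyr >= 1.0.0)
means <- df %>%
  group_by(grp) %>%
  summarise(across(a:d, mean))

# Several functions at once; .names controls the output column names
stats <- df %>%
  group_by(grp) %>%
  summarise(across(c(a, b), list(Mean = mean, Sd = sd), .names = "{.col}_{.fn}"))

print(means)
print(stats)
```

With across(), the output columns are ordered per input column (a_Mean, a_Sd, b_Mean, b_Sd) rather than per function as in the summarise_at output above.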