Contributor with 1831 experience points, more than 10 likes
Using aggregate:
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34
In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be combined via cbind:
aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...
(Incorporating @thelatemail's comment) aggregate also has a formula interface:
aggregate(Frequency ~ Category, x, sum)
Or, if you want to aggregate multiple columns, you can use the "." notation (it also works for a single column):
aggregate(. ~ Category, x, sum)
Or tapply:
tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third
    30      5     34
Using this data:
x <- data.frame(Category=factor(c("First", "First", "First", "Second",
"Third", "Third", "Second")),
Frequency=c(10,15,5,2,14,20,3))
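The cbind approach mentioned above is only shown in truncated form, so here is a minimal sketch of how it might look with the formula interface. The `Metric2` column and its values are assumptions added purely for illustration; they are not part of the original data.

```r
# Example data from this answer, plus a hypothetical second metric column.
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)
x$Metric2 <- c(1, 2, 3, 4, 5, 6, 7)  # assumed column, for illustration only

# cbind() on the left-hand side aggregates both numeric columns at once,
# grouped by Category.
res <- aggregate(cbind(Frequency, Metric2) ~ Category, data = x, FUN = sum)
print(res)
```

The result has one row per category and one summed column per metric, which is usually easier to work with than calling aggregate once per column.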
Contributor with 1780 experience points, more than 1 like
More recently, you can also use the dplyr package for this purpose:
library(dplyr)
x %>%
group_by(Category) %>%
summarise(Frequency = sum(Frequency))
#Source: local data frame [3 x 2]
#
# Category Frequency
#1 First 30
#2 Second 5
#3 Third 34
Or, for multiple summary columns (this also works with a single column):
x %>%
group_by(Category) %>%
summarise_each(funs(sum))
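Update for dplyr >= 0.5: summarise_each has been replaced by the summarise_all, summarise_at and summarise_if family of functions in dplyr.
Or, if you have multiple columns to group by, you can list all of them in group_by, separated by commas: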
mtcars %>%
group_by(cyl, gear) %>% # multiple group columns
summarise(max_hp = max(hp), mean_mpg = mean(mpg)) # multiple summary columns
For more information, including on the %>% operator, see the Introduction to dplyr.
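In dplyr 1.0.0 and later, `across()` is the recommended way to apply a function to several columns inside `summarise()`, superseding the scoped `summarise_*` variants. The following is a sketch under that assumption, reusing the example data from the first answer; it is not part of the original answer.

```r
library(dplyr)

# Same example data as in the first answer.
x <- data.frame(
  Category  = c("First", "First", "First", "Second",
                "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# across(everything(), sum) sums every non-grouping column within each group.
res <- x %>%
  group_by(Category) %>%
  summarise(across(everything(), sum))
print(res)
```

If the data frame also contains non-numeric columns, `across(where(is.numeric), sum)` restricts the summary to numeric columns.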
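Contributor with 1887 experience points, more than 5 likes
The answer provided by rcs works and is simple. However, if you are handling larger datasets and need a performance boost, there is a faster alternative: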
library(data.table)
data = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"),
Frequency=c(10,15,5,2,14,20,3))
data[, sum(Frequency), by = Category]
# Category V1
# 1: First 30
# 2: Second 5
# 3: Third 34
system.time(data[, sum(Frequency), by = Category] )
# user system elapsed
# 0.008 0.001 0.009
Let's compare that with the same operation using a data.frame and the aggregate approach above:
data = data.frame(Category=c("First","First","First","Second","Third", "Third", "Second"),
Frequency=c(10,15,5,2,14,20,3))
system.time(aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum))
# user system elapsed
# 0.008 0.000 0.015
If you want to keep the column name, this is the syntax:
data[,list(Frequency=sum(Frequency)),by=Category]
# Category Frequency
# 1: First 30
# 2: Second 5
# 3: Third 34
The difference becomes more noticeable with larger datasets, as the code below demonstrates:
data = data.table(Category=rep(c("First", "Second", "Third"), 100000),
Frequency=rnorm(100000))
system.time( data[,sum(Frequency),by=Category] )
# user system elapsed
# 0.055 0.004 0.059
data = data.frame(Category=rep(c("First", "Second", "Third"), 100000),
Frequency=rnorm(100000))
system.time( aggregate(data$Frequency, by=list(Category=data$Category), FUN=sum) )
# user system elapsed
# 0.287 0.010 0.296
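For multiple aggregations, you can combine lapply and .SD as follows (the output shown corresponds to the small data.table defined at the start of this answer):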
data[, lapply(.SD, sum), by = Category]
# Category Frequency
# 1: First 30
# 2: Second 5
# 3: Third 34
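With more than one numeric column, `.SDcols` can restrict which columns `.SD` contains. The sketch below assumes a hypothetical `Metric2` column, added only for illustration, and uses the name `dt` rather than `data` to avoid clashing with the objects above.

```r
library(data.table)

dt <- data.table(
  Category  = c("First", "First", "First", "Second",
                "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  Metric2   = c(1, 2, 3, 4, 5, 6, 7)  # assumed column, for illustration only
)

# .SDcols limits .SD to the named columns; each one is summed per Category.
res <- dt[, lapply(.SD, sum), by = Category,
          .SDcols = c("Frequency", "Metric2")]
print(res)
```

Listing the columns explicitly also avoids errors when the table holds character columns that `sum` cannot handle.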
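Contributor with 1828 experience points, more than 4 likes
Several years later, just to add another simple base R solution that for some reason is not present here: xtabs. (df below presumably refers to the example data frame, named x in the first answer.)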
xtabs(Frequency ~ Category, df)
# Category
# First Second Third
# 30 5 34
Or, if you want a data.frame back:
as.data.frame(xtabs(Frequency ~ Category, df))
# Category Freq
# 1 First 30
# 2 Second 5
# 3 Third 34
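The converted data frame names its value column Freq by default. As a small sketch, the `responseName` argument of `as.data.frame()` for tables can be used to keep the original column name; the data frame `x` here is the example data from the first answer.

```r
# Example data from the first answer.
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# responseName sets the name of the value column in the resulting data.frame.
res <- as.data.frame(xtabs(Frequency ~ Category, x),
                     responseName = "Frequency")
print(res)
```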