3 Answers
Contributed 1801 experience points · received over 16 likes
You can define a "cumulative paste" function using Reduce:
cumpaste = function(x, .sep = " ")
    Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)
cumpaste(letters[1:3], "; ")
#[1] "a" "a; b" "a; b; c"
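Reduce's loop avoids re-concatenating the elements from the start each time, because it extends the previous concatenation by the next element.
Applying it by group: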
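To make that point concrete, here is a minimal sketch of an explicit loop that does roughly what Reduce(..., accumulate = TRUE) does here. The name cumpaste_loop is hypothetical and used only for illustration; it is not part of the original answer.

```r
# Explicit-loop sketch of the accumulate behaviour.
# cumpaste_loop is a hypothetical name used only for this illustration.
cumpaste_loop <- function(x, .sep = " ") {
  out <- character(length(x))
  for (i in seq_along(x)) {
    # Each result extends the previous one by a single element,
    # so earlier elements are never pasted together again.
    out[i] <- if (i == 1L) as.character(x[i]) else paste(out[i - 1L], x[i], sep = .sep)
  }
  out
}

cumpaste_loop(letters[1:3], "; ")
#[1] "a" "a; b" "a; b; c"
```

Each step does one paste() call on the previous result, which is the saving described above compared with re-pasting x[1:i] for every i.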
ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
#[1] "A" "A B" "A B A" "B" "B C" "B C B"
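For readers who want to run the grouped version end to end, the following is a self-contained sketch. The definition of testdf is an assumption reconstructed from the outputs shown in this thread (the question's own data is not reproduced here), and the column name content_concatenated is borrowed from a later answer.

```r
# Assumed sample data, reconstructed from the outputs shown in this thread.
testdf <- data.frame(
  id      = c("a", "a", "a", "b", "b", "b"),
  content = c("A", "B", "A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

cumpaste <- function(x, .sep = " ")
  Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

# ave() splits the first argument by the grouping variable, applies FUN
# to each piece, and puts the results back in the original row order.
testdf$content_concatenated <- ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
testdf$content_concatenated
#[1] "A" "A B" "A B A" "B" "B C" "B C B"
```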
Another idea is to concatenate the whole vector once at the start and then take progressively longer substrings of it:
cumpaste2 = function(x, .sep = " ")
{
    concat = paste(x, collapse = .sep)
    substring(concat, 1L, cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep))))
}
cumpaste2(letters[1:3], " ;@-")
#[1] "a" "a ;@-b" "a ;@-b ;@-c"
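The end positions passed to substring can be worked through by hand. The sketch below uses made-up inputs (x and sep are illustrative values, not from the original answer) to show how the cumulative sum of element widths gives the cut points.

```r
x   <- c("ab", "c", "def")   # illustrative input
sep <- "; "                  # illustrative separator

# The first element contributes only its own characters; every later
# element contributes the separator plus its own characters.
widths <- c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(sep))
ends   <- cumsum(widths)
ends
#[1]  2  5 10

concat <- paste(x, collapse = sep)   # "ab; c; def"
substring(concat, 1L, ends)
#[1] "ab" "ab; c" "ab; c; def"
```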
This also appears to be somewhat faster:
set.seed(077)
X = replicate(1e3, paste(sample(letters, sample(0:5, 1), TRUE), collapse = ""))
identical(cumpaste(X, " --- "), cumpaste2(X, " --- "))
#[1] TRUE
microbenchmark::microbenchmark(cumpaste(X, " --- "), cumpaste2(X, " --- "), times = 30)
#Unit: milliseconds
#                  expr      min       lq     mean   median       uq      max neval cld
#  cumpaste(X, " --- ") 21.19967 21.82295 26.47899 24.83196 30.34068 39.86275    30   b
# cumpaste2(X, " --- ") 14.41291 14.92378 16.87865 16.03339 18.56703 23.22958    30  a
...making it cumpaste_faster.
Contributed 1780 experience points · received over 5 likes
You can also try dplyr:
library(dplyr)
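res <- testdf %>%
    mutate(n = row_number()) %>%      # overall row index
    group_by(id) %>%
    mutate(n1 = n[1L]) %>%            # row index where the current group starts
    rowwise() %>%
    # Index testdf$content explicitly so the code does not depend on a
    # separate global `content` vector being defined.
    do(data.frame(cont_concat = paste(testdf$content[.$n1:.$n], collapse = " "),
                  stringsAsFactors = FALSE))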
res$cont_concat
#[1] "A" "A B" "A B A" "B" "B C" "B C B"
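Contributed 2065 experience points · received over 14 likes
Here is a ddply approach that uses sapply and subsetting to paste the values together progressively: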
library(plyr)
ddply(testdf, .(id), mutate,
      content_concatenated = sapply(seq_along(content),
                                    function(x) paste(content[seq(x)], collapse = " ")))
#  id content content_concatenated
#1  a       A                    A
#2  a       B                  A B
#3  a       A                A B A
#4  b       B                    B
#5  b       C                  B C
#6  b       B                B C B