Cumulatively paste (concatenate) values grouped by another variable


R

大话西游666 2019-12-06 11:17:28

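I'm having trouble working with a data frame in R. I want to paste together the contents of cells in different rows, based on the value of a cell in another column. My problem is that I want the output to be built up progressively (cumulatively), and the output vector must be the same length as the input vector. Here is a sample table similar to the one I'm working with:

id <- c("a", "a", "a", "b", "b", "b")
content <- c("A", "B", "A", "B", "C", "B")
(testdf <- data.frame(id, content, stringsAsFactors = FALSE))
#   id content
# 1  a       A
# 2  a       B
# 3  a       A
# 4  b       B
# 5  b       C
# 6  b       B

This is what I want the result to look like:

result <- c("A", "A B", "A B A", "B", "B C", "B C B")
result
#[1] "A" "A B" "A B A" "B" "B C" "B C B"

I don't want something like this, which collapses each group into a single row:

library(plyr)
ddply(testdf, .(id), summarize, content_concatenated = paste(content, collapse = " "))
#   id content_concatenated
# 1  a                A B A
# 2  b                B C B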


3 Answers

侃侃尔雅


You can define a "cumulative paste" function with Reduce:

# accumulate = TRUE keeps every intermediate result, not only the final one
cumpaste = function(x, .sep = " ")
  Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

cumpaste(letters[1:3], "; ")
#[1] "a" "a; b" "a; b; c"

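Reduce's loop avoids re-concatenating the elements from the beginning at every step, because it extends the previous concatenation by the next element.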
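To make the Reduce step concrete, here is a minimal sketch of the same accumulation written as an explicit loop. The function name cumpaste_loop is hypothetical and not part of the answer; it only shows that each new result is the previous result with one more element pasted on:

```r
# Hypothetical loop version of cumpaste, for illustration only
cumpaste_loop <- function(x, .sep = " ") {
  out <- character(length(x))
  if (length(x) == 0L) return(out)
  out[1L] <- x[1L]                      # first prefix is just the first element
  for (i in seq_along(x)[-1L]) {
    # extend the previous concatenation by the next element
    out[i] <- paste(out[i - 1L], x[i], sep = .sep)
  }
  out
}

cumpaste_loop(letters[1:3], "; ")
# returns c("a", "a; b", "a; b; c")
```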

To apply it by group:

ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
#[1] "A" "A B" "A B A" "B" "B C" "B C B"
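Because ave() writes each group's result back into that group's original positions, its output has the same length and order as the input, which is what the question requires. The sketch below checks this on made-up data where the two ids are interleaved rather than sorted (id2 and content2 are hypothetical); cumpaste is repeated so the block runs on its own:

```r
cumpaste <- function(x, .sep = " ")
  Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

# Hypothetical data: rows of the two groups are interleaved
id2      <- c("a", "b", "a", "b", "a")
content2 <- c("A", "B", "C", "D", "E")

res2 <- ave(content2, id2, FUN = cumpaste)
res2
# returns c("A", "B", "A C", "B D", "A C E")
```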

Another idea is to concatenate the whole vector once at the start and then take progressively longer substrings of it:

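cumpaste2 = function(x, .sep = " ")
{
  # concatenate everything once
  concat = paste(x, collapse = .sep)
  # end position of each prefix: length of the first element, then
  # separator + element length for each following element
  substring(concat, 1L, cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep))))
}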

cumpaste2(letters[1:3], " ;@-")
#[1] "a" "a ;@-b" "a ;@-b ;@-c"
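A short sketch of where the substring end positions come from, using the same input as the example above (the intermediate names sep, steps and ends are only for illustration). The first prefix ends after the first element; each later prefix is longer by the separator plus the next element, so a cumulative sum of those lengths gives every end position:

```r
x   <- letters[1:3]
sep <- " ;@-"                            # 4 characters

concat <- paste(x, collapse = sep)       # "a ;@-b ;@-c", 11 characters
# first element's length, then separator + element length for the rest
steps  <- c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(sep))
ends   <- cumsum(steps)                  # end position of each prefix

steps  # 1 5 5
ends   # 1 6 11
substring(concat, 1L, ends)
# returns c("a", "a ;@-b", "a ;@-b ;@-c")
```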

This also appears to be somewhat faster:

set.seed(077)
X = replicate(1e3, paste(sample(letters, sample(0:5, 1), TRUE), collapse = ""))
identical(cumpaste(X, " --- "), cumpaste2(X, " --- "))
#[1] TRUE
microbenchmark::microbenchmark(cumpaste(X, " --- "), cumpaste2(X, " --- "), times = 30)
#Unit: milliseconds
#                  expr      min       lq     mean   median       uq      max neval cld
#  cumpaste(X, " --- ") 21.19967 21.82295 26.47899 24.83196 30.34068 39.86275    30   b
# cumpaste2(X, " --- ") 14.41291 14.92378 16.87865 16.03339 18.56703 23.22958    30  a

...which makes it a cumpaste_faster.

2019-12-06

翻阅古今


You could also try dplyr:

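library(dplyr)
res <- testdf %>%
  mutate(n = row_number()) %>%   # overall row number of each row
  group_by(id) %>%
  mutate(n1 = n[1L]) %>%         # row number of the first row in each id group
  rowwise() %>%
  # paste rows n1..n; take content from testdf rather than a global `content` vector
  do(data.frame(cont_concat = paste(testdf$content[.$n1:.$n], collapse = " "),
                stringsAsFactors = FALSE))
res$cont_concat
#[1] "A" "A B" "A B A" "B" "B C" "B C B"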
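The pipeline above works by recording, for each row, its overall row number n and the row number n1 of the first row in its id group, then pasting content[n1:n]. The same index logic can be sketched in base R as follows; this is an illustration, not the answer's code, and like the original it assumes the rows of each id are contiguous (otherwise n1:n would pick up rows from other groups):

```r
testdf <- data.frame(id      = c("a", "a", "a", "b", "b", "b"),
                     content = c("A", "B", "A", "B", "C", "B"),
                     stringsAsFactors = FALSE)

n  <- seq_len(nrow(testdf))                        # row number of each row
n1 <- ave(n, testdf$id, FUN = function(v) v[1L])   # first row number in each id group

# paste rows n1[i]..i for every row i
cont_concat <- vapply(n, function(i) paste(testdf$content[n1[i]:i], collapse = " "),
                      character(1))
cont_concat
# returns c("A", "A B", "A B A", "B", "B C", "B C B")
```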

2019-12-06

翻翻过去那场雪


Here is a ddply approach that uses sapply and subsetting to paste the values together progressively:

library(plyr)
ddply(testdf, .(id), mutate, content_concatenated = sapply(seq_along(content), function(x) paste(content[seq(x)], collapse = " ")))
#   id content content_concatenated
# 1  a       A                    A
# 2  a       B                  A B
# 3  a       A                A B A
# 4  b       B                    B
# 5  b       C                  B C
# 6  b       B                B C B
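If plyr is not available, the same sapply-and-subset idea can be applied per group with base R's split() and unsplit() instead; this is a substitution, not the answer's code, and paste_prefixes is a hypothetical helper name. unsplit() puts each group's results back in the original row order:

```r
testdf <- data.frame(id      = c("a", "a", "a", "b", "b", "b"),
                     content = c("A", "B", "A", "B", "C", "B"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: for every i, paste the first i elements of v
paste_prefixes <- function(v) {
  vapply(seq_along(v), function(i) paste(v[seq_len(i)], collapse = " "),
         character(1))
}

# apply per id group, then reassemble in the original row order
pieces <- lapply(split(testdf$content, testdf$id), paste_prefixes)
testdf$content_concatenated <- unsplit(pieces, testdf$id)
testdf
```

For the sample data this yields the same content_concatenated column as the ddply call above. One difference: ddply returns the rows ordered by id, whereas this version keeps the original row order, which only matters when the ids are not already sorted.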

2019-12-06
