2 Answers
Contributor with 1856 experience points, 17+ upvotes
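Here is a base R answer.
First, match the Commenid column against Parentid. Build a data set with the Author column and a Reply column holding the authors found by that match. Keep only the rows without NA values, then join the result with the original data (merge) to recover the other columns.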
i <- with(df1, match(Commenid, Parentid))
res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
res <- res[complete.cases(res), ]
merge(res, df1)
#   Author Reply Commenid Parentid Submissionid
# 1  User1 User2     333c     222b         111b
# 2  User3 User1     222b     555d         23er
# 3  User4 User3     555d     666f         111b
A dplyr solution could be:
library(dplyr)
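df1 %>%
  # index of the row whose Parentid equals this row's Commenid (NA if none)
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  # put Author and Reply first and drop the helper column i
  select(Author, Reply, everything(), -i)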
Data
df1 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
")
df1[] <- lapply(df1, trimws)
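Edit
With the new data and the problem described in the comments, here is a dplyr solution. After doing essentially the same as above, it joins the result with the original data set and reorders the columns.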
library(dplyr)
df2 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(-i) %>%
  select(Author, Score, Stance, Reply, everything()) %>%
  left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
  select(-matches("id$"), everything(), matches("id$"))
New data
df2 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid, Score, Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
")
names(df2) <- trimws(names(df2))
df2[] <- lapply(df2, trimws)
Contributor with 1719 experience points, 6+ upvotes
You can compare each user with every other user, and if one row's parentid equals another row's commentid you can print the pair. Here is how to do that in Python:
# dataset is assumed to be a list of dicts, one per comment row
for u1 in dataset:
    for u2 in dataset:
        if u1['parentid'] == u2['commentid']:
            print(u1['Author'], ' had comment of ', u2['Author'])
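The nested loop above compares every row with every other row. As a minimal sketch, the same pairs can be found in a single pass with a dictionary lookup, which mirrors what match() does in the R answer. The sample rows and the function name find_replies are assumptions made for illustration; the keys follow the column names of the sample data (Commenid, Parentid) rather than the keys used in the snippet above.

```python
# Hypothetical sample rows mirroring df1 from the first answer.
dataset = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f"},
]


def find_replies(rows):
    """Return (comment author, reply author) pairs."""
    # Map each Parentid to the first row that carries it,
    # similar to how R's match() returns the first match.
    by_parent = {}
    for row in rows:
        by_parent.setdefault(row["Parentid"], row)

    pairs = []
    for row in rows:
        # Look for a row whose Parentid equals this row's Commenid.
        reply = by_parent.get(row["Commenid"])
        if reply is not None:
            pairs.append((row["Author"], reply["Author"]))
    return pairs


for author, reply in find_replies(dataset):
    print(f"{reply} replied to {author}")
```

Like match(), this keeps only the first reply per comment. If a comment can have several replies, the dictionary would need to hold a list of rows per Parentid instead.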