Is there a function to filter data in R or Python?

Python

胡子哥哥 2023-01-04 13:33:51

I am new to R and cannot figure out how to filter my data the way I need. The data set has 326 rows and 6 columns. Here is a small example:

Author , Commenid , Parentid , Submissionid , Score , Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive

Here, User1 has replied to User2, User3 has replied to User1, and User4 has replied to User3.

I want to pair up users whose Commenid and Parentid match. For the example above, the filtered data would be:

Author Score Stance   Reply Score Stance
User2  15    Neutral  User1 10    Positive
User1  10    Positive User3 20    Negative
User3  20    Negative User4 11    Positive

I have tried a lot but cannot figure it out. Can anyone help me with how exactly to do this (in R or Python)?

2 Answers

慕慕森

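Here is a base R answer.

First, match() the Commenid column against Parentid. Then create a data set with the Author column and a Reply column holding the author found by that match. Keep only the rows without NA values, and join (merge) the result with the original data to get the other columns.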

i <- with(df1, match(Commenid, Parentid))
res <- data.frame(Author = df1$Author, Reply = df1$Author[i])
res <- res[complete.cases(res), ]
merge(res, df1)
#  Author Reply Commenid Parentid Submissionid
#1  User1 User2     333c     222b         111b
#2  User3 User1     222b     555d         23er
#3  User4 User3     555d     666f         111b
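For readers working in Python, the same match-then-filter steps can be sketched with the standard library alone. This is only a rough equivalent under assumptions, not something the answer itself provides: rows is a hypothetical list of dicts holding the sample data, and first_index is an illustrative helper that imitates the lookup done by R's match() (position of the first occurrence, with None standing in for NA).

```python
# Sample rows from the question, already trimmed of whitespace.
rows = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b", "Submissionid": "111b"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c", "Submissionid": "5hdc"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d", "Submissionid": "23er"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f", "Submissionid": "111b"},
]


def first_index(values):
    """Map each value to the position of its first occurrence (hypothetical helper)."""
    index = {}
    for pos, value in enumerate(values):
        index.setdefault(value, pos)
    return index


# Where does each Parentid first appear in the data?
parent_pos = first_index(row["Parentid"] for row in rows)

result = []
for row in rows:
    i = parent_pos.get(row["Commenid"])  # None plays the role of NA
    if i is not None:                    # drop rows without a match
        result.append({**row, "Reply": rows[i]["Author"]})

for r in result:
    print(r["Author"], r["Reply"], r["Commenid"], r["Parentid"], r["Submissionid"])
```

Building the lookup dict once keeps the work roughly linear in the number of rows, which matters little for 326 rows but avoids rescanning the Parentid column for every comment.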

A dplyr solution could be:

library(dplyr)

df1 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(Author, Reply, everything(), -i)

Data

df1 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
")
# Remove the padding spaces around every value
df1[] <- lapply(df1, trimws)
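The padding spaces around the values matter: without trimming, " 333c " and "333c" would not compare equal and no match would be found. As a hedged illustration of the same cleanup step in Python, the sketch below reads the sample text with the standard csv module and strips every header name and cell. The variable names raw and rows are assumptions made for this example.

```python
import csv
import io

# The sample data exactly as given, including the padding spaces.
raw = """Author,Commenid,Parentid,Submissionid
User1 , 333c , 222b , 111b
User2 , 444c , 333c , 5hdc
User3 , 222b , 555d , 23er
User4 , 555d , 666f , 111b
"""

reader = csv.DictReader(io.StringIO(raw))

# Strip whitespace from both the column names and the values,
# which is what trimws() does for the R data frame.
rows = [
    {key.strip(): value.strip() for key, value in record.items()}
    for record in reader
]

print(rows[0])
```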

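Edit

With the new data and the problem as described in the comments, here is a dplyr solution. After the same steps as above, it joins the result with the original data set to pick up the Score and Stance of the reply author, then reorders the columns so that the id columns come last.

library(dplyr)

df2 %>%
  mutate(i = match(Commenid, Parentid),
         Reply = Author[i]) %>%
  filter(!is.na(i)) %>%
  select(-i) %>%
  select(Author, Score, Stance, Reply, everything()) %>%
  left_join(df2 %>% select(Author, Score, Stance), by = c("Reply" = "Author")) %>%
  select(-matches("id$"), everything(), matches("id$"))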

New data

df2 <- read.csv(text = "
Author,Commenid,Parentid,Submissionid, Score, Stance
User1 , 333c , 222b , 111b , 10 , Positive
User2 , 444c , 333c , 5hdc , 15 , Neutral
User3 , 222b , 555d , 23er , 20 , Negative
User4 , 555d , 666f , 111b , 11 , Positive
")
# Remove the padding spaces from the column names and from every value
names(df2) <- trimws(names(df2))
df2[] <- lapply(df2, trimws)
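A possible Python counterpart of the match-and-join pipeline, using only the standard library, is sketched below. It is an approximation under stated assumptions rather than a port of the dplyr code: the column names Reply_Score and Reply_Stance are made up for this example, and the matched row is looked up directly instead of being joined back by author name, so it behaves differently from left_join if an author appears more than once.

```python
# Sample rows with the extra Score and Stance columns (ids kept, Submissionid omitted for brevity).
rows = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b", "Score": 10, "Stance": "Positive"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c", "Score": 15, "Stance": "Neutral"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d", "Score": 20, "Stance": "Negative"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f", "Score": 11, "Stance": "Positive"},
]

# Index rows by Parentid; the first occurrence wins, as with R's match().
by_parent = {}
for row in rows:
    by_parent.setdefault(row["Parentid"], row)

pairs = []
for row in rows:
    reply = by_parent.get(row["Commenid"])
    if reply is None:
        continue  # no row has this Commenid as its Parentid
    pairs.append({
        "Author": row["Author"],
        "Score": row["Score"],
        "Stance": row["Stance"],
        "Reply": reply["Author"],
        "Reply_Score": reply["Score"],    # hypothetical column name
        "Reply_Stance": reply["Stance"],  # hypothetical column name
    })

for p in pairs:
    print(p)
```

Swapping the roles of Commenid and Parentid in the two lookups reverses which user lands in the Author column and which in the Reply column.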

2023-01-04

慕侠2389804

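You can compare every user with every other user and print the pair whenever the Parentid of one equals the Commenid of the other. Here is how you could do this in Python:

# dataset is assumed to be a list of dicts, one per row, keyed by column name
for u1 in dataset:
    for u2 in dataset:
        # match when u1's parent id is u2's comment id
        if u1['Parentid'] == u2['Commenid']:
            print(u1['Author'], 'had comment of', u2['Author'])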
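The nested loop compares every pair of rows, which is fine for a few hundred rows but grows quadratically. One common alternative, sketched here as an assumption rather than as part of the answer, is to index the authors by Commenid first and then look up each Parentid. The names dataset, authors_by_comment and matches are illustrative.

```python
from collections import defaultdict

# Sample rows from the question, as a list of dicts.
dataset = [
    {"Author": "User1", "Commenid": "333c", "Parentid": "222b"},
    {"Author": "User2", "Commenid": "444c", "Parentid": "333c"},
    {"Author": "User3", "Commenid": "222b", "Parentid": "555d"},
    {"Author": "User4", "Commenid": "555d", "Parentid": "666f"},
]

# Group authors by the comment id of their row.
authors_by_comment = defaultdict(list)
for row in dataset:
    authors_by_comment[row["Commenid"]].append(row["Author"])

# For each row, find the authors whose Commenid equals this row's Parentid.
matches = []
for row in dataset:
    for other in authors_by_comment.get(row["Parentid"], []):
        matches.append((row["Author"], other))

for author, other in matches:
    print(author, "had comment of", other)
```

This produces the same pairs, in the same order, as the nested loop does on the sample data, while touching each row only twice.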

2023-01-04
