2 Answers
Contributed 1848 experience points · received over 10 likes
In your case, each named aggregation in agg takes a single column as its source, so you can create a helper column before the groupby:
import numpy as np

# Copy file_size for main-video rows only; all other rows contribute 0.
df['New'] = np.where(df['is_main_video'], df['file_size'], 0)
summary_df = df.groupby(['provider', 'id']).agg(
    title=('title', 'first'),
    file_size=('New', 'sum')
).reset_index()
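Update: the same result as a single chained expression, using assign so that df itself is not modified: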
summary_df = (
    df.assign(New=np.where(df['is_main_video'], df['file_size'], 0))
      .groupby(['provider', 'id'])
      .agg(title=('title', 'first'), file_size=('New', 'sum'))
      .reset_index()
)
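
To make the approach above easy to try, here is a minimal self-contained sketch. The sample data is hypothetical (in particular, the values in the `id` column are assumed for illustration, since the question's actual data is not shown here):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the "id" values are an assumption for illustration.
df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "id": [1, 1, 1, 2, 2],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

summary_df = (
    # Helper column: file_size where is_main_video is True, otherwise 0.
    df.assign(New=np.where(df["is_main_video"], df["file_size"], 0))
      .groupby(["provider", "id"])
      # Keep the first title per group and sum only the main-video sizes.
      .agg(title=("title", "first"), file_size=("New", "sum"))
      .reset_index()
)
print(summary_df)
```

With this sample data, the summed file_size is 30 for provider A and 19 for provider B, and the column stays an integer because the non-main rows are replaced by 0 rather than by a missing value.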
Contributed 1858 experience points · received over 8 likes
You can use Series.where to temporarily "ignore" the file_size values where is_main_video is False, then perform the groupby operation to sum what remains:
import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10]
})
print(df)
  provider    title  is_main_video  file_size
0        A    hello           True         10
1        A    world          False         12
2        A   pandas           True         20
3        B  example           True         19
4        B     here          False         10
aggregated_df = (
    df.assign(file_size=df["file_size"].where(df["is_main_video"]))
      .groupby("provider", as_index=False)
      .agg(
          title=("title", "first"),
          file_size=("file_size", "sum"),
      )
)
print(aggregated_df)
  provider    title  file_size
0        A    hello       30.0
1        B  example       19.0
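
The sums above print as floats (30.0, 19.0) because Series.where with no replacement value fills the masked rows with NaN, which turns the integer column into float64. If integer results are preferred, one possible variation (a sketch, not part of the original answer) is to pass 0 as the replacement value:

```python
import pandas as pd

df = pd.DataFrame({
    "provider": ["A", "A", "A", "B", "B"],
    "title": ["hello", "world", "pandas", "example", "here"],
    "is_main_video": [True, False, True, True, False],
    "file_size": [10, 12, 20, 19, 10],
})

aggregated_df = (
    # Replace non-main rows with 0 instead of NaN so the dtype stays integer.
    df.assign(file_size=df["file_size"].where(df["is_main_video"], 0))
      .groupby("provider", as_index=False)
      .agg(
          title=("title", "first"),
          file_size=("file_size", "sum"),
      )
)
print(aggregated_df)
```

The totals are the same (30 for A, 19 for B); only the dtype of file_size differs.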