
How to iterate over a Pandas DataFrame based on certain conditions to create a new DataFrame

Python

慕田峪4524236 2023-05-16 14:22:48

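I have imported a CSV file with sales pipeline data into a Pandas DataFrame. Each row represents an opportunity and includes the lead name, product information, pipeline stage, probability, expected deal size, expected closing date, duration, and so on. Now I want to turn this into a sales forecast: I want to calculate the average revenue per period by dividing the deal size by the duration and multiplying by the probability. Then I want to create one row for every period the deal covers, based on the expected closing date and the duration. I created a simplified example to illustrate my question:

import pandas as pd

pipeline_data = [{'Client': 'A', 'Stage': 'suspect', 'Probability': '0.25', 'Dealsize': '1200', 'Duration': 6, 'Start_period': '2020-08'},
                 {'Client': 'B', 'Stage': 'prospect', 'Probability': '0.60', 'Dealsize': '1000', 'Duration': 4, 'Start_period': '2020-10'}]
df = pd.DataFrame(pipeline_data)
df

Output:

   Client     Stage  Probability  Dealsize  Duration  Start_period
0       A   suspect         0.25      1200         6       2020-08
1       B  prospect         0.60      1000         4       2020-10

So for client A the average monthly revenue is 1200 / 6 * 0.25 = 50. This revenue falls in the periods 2020-08 through 2021-01 (i.e. from August 2020 to January 2021).

The preferred output would be:

   Client     Stage  Probability  Dealsize  Duration  Start_period  Weighted_revenue   Period
0       A   suspect         0.25      1200         6       2020-08                50  2020-08
1       A   suspect         0.25      1200         6       2020-08                50  2020-09
2       A   suspect         0.25      1200         6       2020-08                50  2020-10
3       A   suspect         0.25      1200         6       2020-08                50  2020-11
4       A   suspect         0.25      1200         6       2020-08                50  2020-12
5       A   suspect         0.25      1200         6       2020-08                50  2021-01
6       B  prospect         0.60      1000         4       2020-10               150  2020-10
7       B  prospect         0.60      1000         4       2020-10               150  2020-11
8       B  prospect         0.60      1000         4       2020-10               150  2020-12
9       B  prospect         0.60      1000         4       2020-10               150  2021-01

I have already converted Start_period to the Period type, so it can be used for calculations and iteration.

I am very new to coding. I have tried to find an answer on this site and others, but so far without success. I can imagine solving this with some kind of nested loop and an append function, but I don't know how to do that in Pandas... Any help would be greatly appreciated!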


1 Answer

慕尼黑5688855


You can try a list comprehension combined with pd.date_range and explode:

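# Weighted_revenue = Dealsize / Duration * Probability (Dealsize and Probability are strings, so cast them to float)
df['Weighted_revenue']=(df['Dealsize'].astype(float)/df['Duration'].astype(float))*df['Probability'].astype(float)

# One list of 'YYYY-MM' labels per row, Duration months long, starting at Start_period
# (freq="M" is month end; pandas 2.2+ prefers the alias "ME")
df['Period']=[pd.date_range(x, periods=y, freq="M").strftime('%Y-%m') for x,y in zip(df["Start_period"], df["Duration"])]

# One row per list element; the original index labels are repeated
df=df.explode('Period')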

Output:

   Client     Stage  Probability  Dealsize  Duration  Start_period  Weighted_revenue   Period
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-08
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-09
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-10
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-11
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-12
0       A   suspect         0.25      1200         6       2020-08              50.0  2021-01
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-10
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-11
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-12
1       B  prospect         0.60      1000         4       2020-10             150.0  2021-01
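For reference, here is a self-contained sketch of the same three steps that you can paste and run as-is. It rebuilds the sample data from the question and makes two small changes that are my own choices, not part of the answer above: it uses freq="MS" (month start), which gives the same year-month labels for these start values and is not deprecated in newer pandas, and it ends with reset_index(drop=True) so the index runs 0 to 9 like the preferred output.

```python
import pandas as pd

# Sample pipeline from the question (Probability and Dealsize are strings there)
pipeline_data = [
    {'Client': 'A', 'Stage': 'suspect', 'Probability': '0.25', 'Dealsize': '1200',
     'Duration': 6, 'Start_period': '2020-08'},
    {'Client': 'B', 'Stage': 'prospect', 'Probability': '0.60', 'Dealsize': '1000',
     'Duration': 4, 'Start_period': '2020-10'},
]
df = pd.DataFrame(pipeline_data)

# Step 1: expected revenue per month = Dealsize / Duration * Probability
df['Weighted_revenue'] = (
    df['Dealsize'].astype(float) / df['Duration'].astype(float)
) * df['Probability'].astype(float)

# Step 2: one list of 'YYYY-MM' labels per row.
# freq="MS" (month start) replaces "M" here; only the year-month label is kept.
df['Period'] = [
    list(pd.date_range(start, periods=int(n), freq="MS").strftime('%Y-%m'))
    for start, n in zip(df['Start_period'], df['Duration'])
]

# Step 3: one row per period, then renumber the index 0..N-1
df = df.explode('Period').reset_index(drop=True)
print(df)
```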

Details:

First, we create the 'Weighted_revenue' column using the formula you described:

df['Weighted_revenue']=(df['Dealsize'].astype(float)/df['Duration'].astype(float))*df['Probability'].astype(float)

   Client     Stage  Probability  Dealsize  Duration  Start_period  Weighted_revenue
0       A   suspect         0.25      1200         6       2020-08              50.0
1       B  prospect         0.60      1000         4       2020-10             150.0
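If you would rather have numeric columns from the start instead of casting inside the formula, one option (a sketch, assuming Probability and Dealsize should simply be floats everywhere) is to convert them once with DataFrame.astype and a column-to-dtype dict. The formula then reads exactly as described in the question.

```python
import pandas as pd

# Two rows shaped like the question's data (numbers stored as strings)
df = pd.DataFrame({
    'Probability': ['0.25', '0.60'],
    'Dealsize': ['1200', '1000'],
    'Duration': [6, 4],
})

# Convert the string columns once, up front, instead of inside every formula
df = df.astype({'Probability': float, 'Dealsize': float})

# Weighted_revenue = Dealsize / Duration * Probability
df['Weighted_revenue'] = df['Dealsize'] / df['Duration'] * df['Probability']
print(df['Weighted_revenue'].tolist())
```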

Then we use a list comprehension with zip to create a date range for each row, based on the 'Start_period' and 'Duration' columns:

df['Period']=[pd.date_range(x, periods=y, freq="M").strftime('%Y-%m') for x,y in zip(df["Start_period"], df["Duration"])]

   Client     Stage  Probability  Dealsize  Duration  Start_period  Weighted_revenue  Period
0       A   suspect         0.25      1200         6       2020-08              50.0  [2020-08, 2020-09, 2020-10, 2020-11, 2020-12, 2021-01]
1       B  prospect         0.60      1000         4       2020-10             150.0  [2020-10, 2020-11, 2020-12, 2021-01]
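Since the question mentions that Start_period was already converted to the Period type, pd.period_range is a possible alternative to pd.date_range for this step (a swapped-in technique, not what the answer uses). It returns consecutive monthly Period objects directly, so no month-start or month-end anchor is involved. The sketch below only covers the two columns this step needs.

```python
import pandas as pd

# Start periods and durations from the question's sample data
df = pd.DataFrame({'Start_period': ['2020-08', '2020-10'], 'Duration': [6, 4]})

# Store Start_period as monthly Periods (the question says this was already done)
df['Start_period'] = pd.to_datetime(df['Start_period']).dt.to_period('M')

# period_range returns consecutive monthly Periods starting at s, n of them
df['Period'] = [
    list(pd.period_range(start=s, periods=int(n), freq='M'))
    for s, n in zip(df['Start_period'], df['Duration'])
]
print(df)
```

The resulting lists of Period objects can be expanded with explode in the same way as the string labels.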

Finally, we use explode to expand each list into separate rows:

df=df.explode('Period')

   Client     Stage  Probability  Dealsize  Duration  Start_period  Weighted_revenue   Period
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-08
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-09
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-10
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-11
0       A   suspect         0.25      1200         6       2020-08              50.0  2020-12
0       A   suspect         0.25      1200         6       2020-08              50.0  2021-01
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-10
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-11
1       B  prospect         0.60      1000         4       2020-10             150.0  2020-12
1       B  prospect         0.60      1000         4       2020-10             150.0  2021-01
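For comparison, the nested-loop idea from the question can also work. A minimal sketch, assuming the same sample data: note that DataFrame.append was removed in pandas 2.0, so instead of appending to a DataFrame inside the loop, this collects plain dicts in a Python list and builds the DataFrame once at the end. Adding an integer to a monthly pd.Period shifts it by that many months.

```python
import pandas as pd

pipeline_data = [
    {'Client': 'A', 'Stage': 'suspect', 'Probability': '0.25', 'Dealsize': '1200',
     'Duration': 6, 'Start_period': '2020-08'},
    {'Client': 'B', 'Stage': 'prospect', 'Probability': '0.60', 'Dealsize': '1000',
     'Duration': 4, 'Start_period': '2020-10'},
]
df = pd.DataFrame(pipeline_data)

rows = []  # collect plain dicts; build the DataFrame once at the end
for record in df.to_dict('records'):
    # Expected revenue per month for this opportunity
    weighted = float(record['Dealsize']) / record['Duration'] * float(record['Probability'])
    start = pd.Period(record['Start_period'], freq='M')
    # Inner loop: one output row per month the deal covers
    for offset in range(int(record['Duration'])):
        rows.append({**record,
                     'Weighted_revenue': weighted,
                     'Period': str(start + offset)})

forecast = pd.DataFrame(rows)
print(forecast)
```

The vectorized version in the answer is usually shorter and faster on large frames; the loop mainly helps if the per-row rules get more complicated.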

2023-05-16
