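2 Answers
Contributor with 1775 experience points, 8+ likes
I would simply pass 3 parameters to your function, namely df, date1 and date2, and then call the function 3 times.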
def stat_generator(df, date1, date2):
    "..."
    return statement
Then pass your data as a list of lists or something similar. For example:
data = [[df, date1, date2], [df2, date3, date4], [df4, date5, date6]]
for lists in data:
    stat_generator(*lists)
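As a minimal runnable sketch of the call pattern above: the function body here is a hypothetical stand-in (it only counts rows inside the date range), and the two dataframes are toy data invented for illustration. The point is only that `*args` unpacks each group into the three positional parameters, and that the return values can be collected instead of discarded.

```python
import pandas as pd


def stat_generator(df, date1, date2):
    # Hypothetical stand-in body: count the rows whose date falls in the range.
    dates = pd.to_datetime(df["Announcement Date"])
    return int(dates.between(pd.Timestamp(date1), pd.Timestamp(date2)).sum())


# Toy data, invented for illustration only.
df_a = pd.DataFrame({"Announcement Date": ["2020-01-05", "2020-02-10", "2020-03-15"]})
df_b = pd.DataFrame({"Announcement Date": ["2021-06-01", "2021-07-01"]})

data = [
    (df_a, "2020-01-01", "2020-02-28"),
    (df_b, "2021-06-15", "2021-12-31"),
]

# *args unpacks each tuple into (df, date1, date2); results are kept in a list.
results = [stat_generator(*args) for args in data]
print(results)
```

Using tuples rather than lists for each group is a stylistic choice; either works with `*` unpacking.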
Contributor with 1788 experience points, 4+ likes
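Keeping the existing form
- Change the df parameter of stat_generator to df1, so that df can be used in the for-loop.
- Group the data for each dataframe together.
- Create a statements list to be returned.
- date1 and date2 become d1 and d2 inside the loop.
- Update statement1 to use an f-string, which is easier to read.
- I think these updates require the fewest changes to the overall code.
- Optional: change mask to mask = df['Announcement Date'].between(d1, d2), which includes both end dates by default.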
import pandas as pd

def stat_generator(df1, date1, date2, df2, date3, date4, df4, date5, date6):
    # create groups: one (dataframe, start date, end date) tuple per dataframe
    groups = [(df1, date1, date2), (df2, date3, date4), (df4, date5, date6)]
    # create a statements list for each statement
    statements = list()
    # iterate through each group
    for (df, d1, d2) in groups:
        df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
        # keep only the rows whose date lies in [d1, d2]
        mask = (df['Announcement Date'] >= d1) & (df['Announcement Date'] <= d2)
        df_new = df.loc[mask]
        total = len(df_new)
        better = df_new[df_new['performance'] == 'better']
        better_perc = round((len(better) / total) * 100, 2)
        worse = df_new[df_new['performance'] == 'worse']
        worse_perc = round((len(worse) / total) * 100, 2)
        statement1 = f"During the time period between {d1} and {d2}, {better_perc}% of the students performed better. {worse_perc}% of the students performed worse"
        # append the statement of the dataframe
        statements.append(statement1)
    # return a list of all the statements
    return statements
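The optional between suggestion can be checked against the two-comparison mask with a small sketch. The dates below are toy values invented for illustration; the sketch relies on `Series.between` including both bounds by default.

```python
import pandas as pd

# Toy dates, invented for illustration only.
dates = pd.Series(
    pd.to_datetime(["2020-01-01", "2020-01-15", "2020-01-31", "2020-02-01"])
)
d1, d2 = pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-31")

# The mask as written in the function above.
explicit_mask = (dates >= d1) & (dates <= d2)

# The same filter with Series.between; both bounds are inclusive by default.
between_mask = dates.between(d1, d2)

same = explicit_mask.equals(between_mask)
selected = between_mask.tolist()
print(same, selected)
```

Note that the function above divides by the number of filtered rows, so a date range that matches no rows would raise a ZeroDivisionError; checking for an empty selection first is one way to guard against that.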
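Full rewrite
- A function works best when it does only one thing, which here is to extract and return the data.
- Handle passing the multiple dataframes outside the function, and collect the results in a list or print them.
- Creating new dataframes for better and worse is not efficient.
- Use .value_counts() with normalize=True to get the percentages.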
import pandas as pd

def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    # create the mask (both end dates are included by default)
    mask = df['Announcement Date'].between(d1, d2)
    # apply the mask
    df_new = df.loc[mask]
    # calculate the percentage of each performance value
    per = (df_new.performance.value_counts(normalize=True) * 100).round(2)
    return f"During the time period between {d1} and {d2}, {per['better']}% of the students performed better. {per['worse']}% of the students performed worse"

# pass each dataframe with its date range, and collect the statements
groups = [(df1, date1, date2), (df2, date3, date4), (df4, date5, date6)]
statements = list()
for group in groups:
    statements.append(stat_generator(*group))
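To illustrate what value_counts with normalize=True returns, here is a small sketch on toy data invented for illustration. The second half is an assumption on my part rather than something from the answer: if a category such as worse never occurs in the filtered rows, indexing per['worse'] would raise a KeyError, and reindexing with a fill value is one possible way to avoid that.

```python
import pandas as pd

# Toy data, invented for illustration only.
performance = pd.Series(["better", "better", "worse", "better"])

# Fractions per category, scaled to percentages and rounded.
per = (performance.value_counts(normalize=True) * 100).round(2)
print(per["better"], per["worse"])

# Assumed edge case: a selection in which "worse" never occurs.
only_better = pd.Series(["better", "better"])

# reindex adds the missing category with 0.0 instead of raising KeyError later.
per_safe = (
    only_better.value_counts(normalize=True)
    .reindex(["better", "worse"], fill_value=0.0)
    * 100
).round(2)
print(per_safe["better"], per_safe["worse"])
```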