
A better way to run calculations on multiple dataframes and return statements?


Asked by 叮当猫咪, 2023-06-27 17:34:06
My function looks at 3 dataframes, filters each one between a different pair of dates, and creates a statement. As you can see, the function repeats the same steps over and over, and I'd like to reduce them. I believe a for-loop would help, but I'm not sure how to return the statements in a compact way like I do now.

import pandas as pd

def stat_generator(df,date1,date2,df2,date3,date4,df4,date5,date6):
    ##First Date Filter for First Dataframe, and calculations for first dataframe
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    mask = ((df['Announcement Date'] >= date1) & (df['Announcement Date'] <= date2))
    df_new = df.loc[mask]
    total = len(df_new)
    better = df_new[(df_new['performance'] == 'better')]
    better_perc = round(((len(better)/total)*100),2)
    worse = df_new[(df_new['performance'] == 'worse')]
    worse_perc = round(((len(worse)/total)*100),2)
    statement1 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date1,date2,better_perc,worse_perc)

    ##Second Date Filter for Second Dataframe, and calculations for second dataframe
    df2['Announcement Date'] = pd.to_datetime(df2['Announcement Date'])
    mask2 = ((df2['Announcement Date'] >= date3) & (df2['Announcement Date'] <= date4))
    df_new2 = df2.loc[mask2]
    total2 = len(df_new2)
    better2 = df_new2[(df_new2['performance'] == 'better')]
    better_perc2 = round(((len(better2)/total2)*100),2)
    worse2 = df_new2[(df_new2['performance'] == 'worse')]
    worse_perc2 = round(((len(worse2)/total2)*100),2)
    statement2 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date3,date4,better_perc2,worse_perc2)

    ##Third Date Filter for Third Dataframe, and calculations for third dataframe

2 Answers

Answer by www说


I would just pass 3 arguments to your function, namely df, date1 and date2, and then call the function 3 times.


def stat_generator(df, date1, date2):
    "..."
    return statement

Then pass your data as a list of lists or something similar. For example:


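data = [[df, date1, date2], [df2, date3, date4], [df4, date5, date6]]

statements = []
for lists in data:
    # collect each returned statement instead of discarding it
    statements.append(stat_generator(*lists))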
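To see the calling pattern in isolation, here is a minimal sketch in which a hypothetical make_statement function stands in for the elided body and plain lists stand in for the dataframes. The * operator unpacks each inner list into positional arguments, and a list comprehension keeps every returned string.

```python
# Hypothetical stand-in for the elided body of stat_generator:
# it only formats its three arguments so the calling pattern can run on its own.
def make_statement(rows, d1, d2):
    return f"{len(rows)} rows between {d1} and {d2}"

# Each inner list holds the positional arguments for one call.
data = [[[1, 2, 3], "2023-01-01", "2023-01-31"],
        [[4, 5], "2023-02-01", "2023-02-28"]]

# *args unpacks each inner list into (rows, d1, d2); the comprehension keeps every result.
statements = [make_statement(*args) for args in data]
print(statements)
```

A list comprehension is equivalent to the append loop; either way the results are returned to the caller rather than thrown away.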


Answered 2023-06-27
Answer by 尚方宝剑之说


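Keep the existing form

  • Change the parameter df in stat_generator to df1, so that df can be used in the for-loop.

  • Group the data for each dataframe together.

  • Create a statements list to be returned.

  • Change date1 and date2 to d1 and d2 inside the loop.

  • Update statement1 to use a more readable f-string.

  • I think these updates require the fewest changes to the overall code.

  • Optional:

    • Change mask to mask = df['Announcement Date'].between(d1, d2, inclusive="both")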
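A quick check that the optional between() mask selects the same rows as the explicit two-sided comparison. The dates below are made up for illustration. As far as I know, inclusive="both" is the spelling pandas accepts from 1.3 onward, while the boolean inclusive=True was deprecated in 1.3 and is rejected by pandas 2.x.

```python
import pandas as pd

# Made-up sample dates
dates = pd.to_datetime(pd.Series(["2023-01-01", "2023-01-15", "2023-02-01"]))

# Explicit two-sided comparison, as in the question's code
mask_manual = (dates >= "2023-01-01") & (dates <= "2023-01-31")

# Series.between with inclusive="both" keeps both endpoints
mask_between = dates.between("2023-01-01", "2023-01-31", inclusive="both")

print(mask_between.tolist())  # [True, True, False]
print(mask_manual.equals(mask_between))  # True
```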

import pandas as pd

# the third parameter is named df3 (the question used df4) so it matches the groups list
def stat_generator(df1, date1, date2, df2, date3, date4, df3, date5, date6):
    # create groups: one (dataframe, start date, end date) tuple per dataframe
    groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]

    # create a statements list to collect one statement per dataframe
    statements = list()

    # iterate through each group
    for (df, d1, d2) in groups:
        df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
        mask = ((df['Announcement Date'] >= d1) & (df['Announcement Date'] <= d2))
        df_new = df.loc[mask]
        total = len(df_new)
        better = df_new[(df_new['performance'] == 'better')]
        better_perc = round(((len(better)/total)*100),2)
        worse = df_new[(df_new['performance'] == 'worse')]
        worse_perc = round(((len(worse)/total)*100),2)
        statement1 = f"During the time period between {d1} and {d2}, {better_perc}% of the students performed better. {worse_perc}% of the students performed worse"

        # append the statement of the dataframe
        statements.append(statement1)

    # return a list of all the statements
    return statements
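In the loop above, better_perc and worse_perc divide by total, so a date range that matches no rows raises ZeroDivisionError. A minimal sketch of a guard, using a hypothetical helper named percentage that is not part of either answer:

```python
def percentage(count: int, total: int) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when no rows fall in the date range
    return round(count / total * 100, 2) if total else 0.0

print(percentage(3, 8))  # 37.5
print(percentage(0, 0))  # 0.0
```

Inside the loop this would be used as better_perc = percentage(len(better), total), and likewise for worse_perc.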

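Complete rewrite

  • The function should ideally do only one thing: extract and return the data.

  • Passing the multiple dataframes in should be handled outside the function, with the results collected in a list or printed.

  • Creating new dataframes for better and worse is inefficient.

    • Use .value_counts() with normalize=True to get the percentages.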
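A small illustration of what value_counts(normalize=True) returns, on a made-up performance column (the "same" category is hypothetical). Each value's share of the rows comes back as a fraction, which is then scaled to a percentage. A category that does not occur in the filtered rows is simply missing from the result, so Series.get with a default avoids a KeyError.

```python
import pandas as pd

# Made-up sample values
performance = pd.Series(["better", "worse", "better", "same"])

# normalize=True returns each value's share of the rows (fractions summing to 1)
per = (performance.value_counts(normalize=True) * 100).round(2)
print(per["better"])  # 50.0

# Absent categories are not in the index, so use .get() with a default
print(per.get("worse", 0.0))      # 25.0
print(per.get("excellent", 0.0))  # 0.0
```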

import pandas as pd

def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])

    # create the mask (inclusive="both" keeps both endpoints; boolean True is rejected by pandas 2.x)
    mask = df['Announcement Date'].between(d1, d2, inclusive="both")

    # apply the mask
    df_new = df.loc[mask]

    # calculate the percentage of each 'performance' value
    per = (df_new['performance'].value_counts(normalize=True) * 100).round(2)

    # .get(..., 0.0) avoids a KeyError when a category is absent in the date range
    return f"During the time period between {d1} and {d2}, {per.get('better', 0.0)}% of the students performed better. {per.get('worse', 0.0)}% of the students performed worse"



groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]

statements = list()
for group in groups:
    statements.append(stat_generator(*group))
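An end-to-end sketch of the rewrite on small made-up dataframes that use the question's column names. Unlike the answer's version, this sketch converts the dates into a local variable instead of overwriting the caller's 'Announcement Date' column, and it builds the statements with a list comprehension; both are optional design choices.

```python
import pandas as pd

def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    # Parse dates locally so the caller's dataframe is left unchanged
    dates = pd.to_datetime(df["Announcement Date"])
    df_new = df.loc[dates.between(d1, d2, inclusive="both")]
    per = (df_new["performance"].value_counts(normalize=True) * 100).round(2)
    return (f"During the time period between {d1} and {d2}, "
            f"{per.get('better', 0.0)}% of the students performed better. "
            f"{per.get('worse', 0.0)}% of the students performed worse")

# Made-up sample data
df1 = pd.DataFrame({"Announcement Date": ["2023-01-05", "2023-01-20", "2023-03-01"],
                    "performance": ["better", "worse", "better"]})
df2 = pd.DataFrame({"Announcement Date": ["2023-02-10", "2023-02-11"],
                    "performance": ["better", "better"]})

groups = [(df1, "2023-01-01", "2023-01-31"), (df2, "2023-02-01", "2023-02-28")]

# Equivalent to the append loop, in one expression
statements = [stat_generator(*group) for group in groups]
for s in statements:
    print(s)
```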


Answered 2023-06-27