
Groupby with 2 columns - "pandas.core.groupby.generic"


qq_花开花谢_0 2023-03-16 10:53:49
For a current project, I plan to group a Pandas DataFrame by stock_symbol as the first criterion and quarter as the second. From other threads, I have seen that a construct like group_data = df.groupby(['stock_symbol', 'quarter']) might be a solution. In my case, however, the only terminal output I get is <pandas.core.groupby.generic.DataFrameGroupBy object at 0x11fdcbf10>. Can anyone spot the error in my thinking? The relevant part of the code is shown below:

# Datetime conversion
df['date'] = pd.to_datetime(df['date'])
# Adding of 'Quarter' column
df['quarter'] = df['date'].dt.to_period('Q')
# Grouping both the Stock Symbol and the Quarter column
group_data = df.groupby(['stock_symbol', 'quarter'])
print(group_data)

The function to be called in the operation is shown below:

# Word frequency analysis
def get_top_n_bigram(corpus, n=None):
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]

1 Answer

慕斯王

Contributed 1864 experience points, received more than 2 upvotes

Here is one way to achieve what you are after:


The custom function:


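from sklearn.feature_extraction.text import CountVectorizer

def get_top_n_bigram(row):
    # Despite its name, `row` is the sub-DataFrame of one (stock_symbol, quarter) group;
    # adding the text columns concatenates them element-wise into one string per row.
    corpus = row['txt_main'] + row['txt_pro'] + row['txt_con'] + row['txt_adviceMgmt']
    n = 2  # the top n
    # Count bigrams only, ignoring English stop words
    vec = CountVectorizer(ngram_range=(2, 2), stop_words='english').fit(corpus)
    bag_of_words = vec.transform(corpus)
    # Total occurrences of each bigram across the group
    sum_words = bag_of_words.sum(axis=0)
    words_freq = [(word, sum_words[0, idx]) for word, idx in vec.vocabulary_.items()]
    words_freq = sorted(words_freq, key=lambda x: x[1], reverse=True)
    return words_freq[:n]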
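To make it concrete what the function counts, here is a rough, dependency-free sketch of bigram counting with collections.Counter. It is not what CountVectorizer does internally: it skips stop-word removal and uses plain whitespace splitting. The name top_n_bigrams and the sample texts are made up for illustration.

```python
from collections import Counter

def top_n_bigrams(texts, n=2):
    """Count adjacent word pairs across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        # zip pairs each word with its successor: (w0, w1), (w1, w2), ...
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts.most_common(n)

# Hypothetical sample texts, not taken from the asker's data
result = top_n_bigrams(["audit firm review", "audit firm staff", "employment agency"], n=1)
print(result)  # [('audit firm', 2)]
```

The result has the same shape as the return value of get_top_n_bigram: a list of (bigram, count) tuples, sorted by count in descending order.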

Call groupby with the function defined above, using apply:


import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df['quarter'] = df['date'].dt.to_period('Q')
# apply runs get_top_n_bigram once per (stock_symbol, quarter) group
newdf = df.groupby(['stock_symbol', 'quarter']).apply(get_top_n_bigram).to_frame(name='frequencies')

print(newdf)

                                                  frequencies
stock_symbol quarter                                             
AMG          2011Q3         [(smart driven, 2), (driven risk, 2)]
             2013Q1   [(asset management, 2), (smart working, 1)]
             2014Q1     [(audit firm, 3), (employment agency, 2)]
MMM          2017Q2               [(working 3m, 1), (3m time, 1)]
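As for why the original print(group_data) showed only the DataFrameGroupBy object: groupby by itself just describes how the rows are split, and nothing is computed until the groups are iterated, aggregated, or passed to apply. A minimal sketch of that behavior, assuming made-up toy data (only the column names come from the question):

```python
import pandas as pd

# Hypothetical toy data; column names follow the question, values are made up.
df = pd.DataFrame({
    "stock_symbol": ["AMG", "AMG", "MMM"],
    "date": ["2011-07-15", "2011-08-01", "2017-05-20"],
    "txt_main": ["smart driven team", "driven risk culture", "working 3m time"],
})
df["date"] = pd.to_datetime(df["date"])
df["quarter"] = df["date"].dt.to_period("Q")

grouped = df.groupby(["stock_symbol", "quarter"])

# Printing the GroupBy object only shows its repr; no result exists yet.
print(type(grouped).__name__)

# Iterating yields (key, sub-DataFrame) pairs, one per group.
for (symbol, quarter), group in grouped:
    print(symbol, quarter, len(group))

# An aggregation turns the groups into an actual result.
sizes = grouped.size()
print(sizes)
```

So the grouping in the question was not wrong; it simply had not been followed by an operation that produces a result.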


Answered 2023-03-16