I am parsing several CSV files with Pandas and concatenating them into one large DataFrame. Then I want to groupby and compute mean(). Here is a sample DataFrame:

    df1.head()
       Time  Node  Packets
    0     1     0        0
    2     1     1        0
    4     1     2        0
    6     1     3        0
    8     1     4        0

    df1.info(verbose=True)
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 27972 entries, 0 to 55942
    Data columns (total 3 columns):
    Time       27972 non-null int64
    Node       27972 non-null int64
    Packets    27972 non-null int64
    dtypes: int64(3)
    memory usage: 874.1 KB
    None

Then I concatenate them (three DataFrames, for simplicity):

    df_total = pd.concat([df1, df2, df3])
    df_total.info(verbose=True)

The result is:

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 83916 entries, 0 to 55942
    Data columns (total 3 columns):
    Time       83916 non-null object
    Node       83916 non-null object
    Packets    83916 non-null object
    dtypes: object(3)
    memory usage: 2.6+ MB
    None

Finally, I try:

    df_total = df_total.groupby(['Time'])['Packets'].mean()

This is where the error pandas.core.base.DataError: No numeric types to aggregate occurs.

Although I understand from other posts that Pandas changes the dtype because of the non-null values, I could not solve my problem with the solutions proposed there. How can I fix this?
2 answers
翻阅古今
1780 contribution points, 5+ upvotes
I found another post saying that a DataFrame has to be initialized with a dtype, otherwise its columns are of object type:
Did you initialize an empty DataFrame first and then filled it? If so that's probably
why it changed with the new version as before 0.9 empty DataFrames were initialized
to float type but now they are of object type. If so you can change the
initialization to DataFrame(dtype=float).
So I added df_total = pd.DataFrame(columns=['Time', 'Node', 'Packets'], dtype=int) to my code, and it worked.
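If the concatenated frame still ends up with object columns, another option (not what I did above, just a sketch) is to convert the columns back to a numeric dtype before grouping. The small frame below is made-up stand-in data, not the real CSV contents, and it assumes every value really is a number that has merely been stored as object.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame: the values are numbers,
# but every column is stored with object dtype, as in the question.
df_total = pd.DataFrame(
    {"Time": [1, 1, 2, 2], "Node": [0, 1, 0, 1], "Packets": [0, 4, 2, 6]},
    dtype=object,
)

# Convert each column to a numeric dtype; pd.to_numeric raises if a value
# cannot be parsed as a number, which also helps to spot stray strings.
df_numeric = df_total.apply(pd.to_numeric)

# With numeric columns the aggregation has something to work on.
mean_packets = df_numeric.groupby("Time")["Packets"].mean()
print(df_numeric.dtypes)
print(mean_packets)
```

Checking df_total.dtypes right after pd.concat is a quick way to see whether a conversion like this is needed at all.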