
How do I get a DataFrame back after using str(df)?

Python

红糖糍粑 2021-10-19 09:45:33

I think I messed up when trying to save a Pandas Series that contained a bunch of Pandas DataFrames. It turns out that each DataFrame was saved as if I had called df.to_string() on it. From what I have observed so far, my strings have extra spacing in some places, and an extra \ wherever the DataFrame has too many columns to be displayed on one line.

Here is a "more appropriate" DataFrame:

df = pd.DataFrame(columns=["really long name that goes on for a while", "another really long string", "c"]*6, data=[["some really long data",2,3]*6,[4,5,6]*6,[7,8,9]*6])

The strings that I have, and want to turn into DataFrames, look like this:

# str(df)
' really long name that goes on for a while another really long string c \\\n0 some really long data 2 3 \n1 4 5 6 \n2 7 8 9 \n\n really long name that goes on for a while another really long string c \\\n0 some really long data 2 3 \n1 4 5 6 \n2 7 8 9 \n\n really long name that goes on for a while another really long string c \\\n0 some really long data 2 3 \n1 4 5 6 \n2 7 8 9 \n\n really long name that goes on for a while another really long string c \\\n0 some really long data 2 3 \n1 4 5 6 \n2 7 8 9 \n\n really long name that goes on for a while another really long string c \\\n0 some really long data 2 3 \n1 

How do I revert a string like this back to a DataFrame? Thanks.


3 Answers

守着一只汪

Contributed 1872 experience points, received more than 3 likes

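Try this. I have updated it to work out the number of rows automatically. Basically, I extract the largest value of the original DataFrame's index (the row number), which is inside the big string.

Suppose we start from a DataFrame converted to a string, using the example you provided: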

import pandas as pd

df = pd.DataFrame(columns=["really long name that goes on for a while", "another really long string", "c"]*6,
                  data=[["some really long data",2,3]*6,[4,5,6]*6,[7,8,9]*6])
string = str(df)

First, let's extract the column names:

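import re
import numpy as np

lst = re.split('\n', string)
# Last row label of the first chunk + 1 gives the row count. This assumes the
# default 0..n-1 index and a wrapped string (at least one blank line).
num_rows = int(lst[lst.index('') - 1].split()[0]) + 1
lst = [i for i in lst if i != '']
# Every chunk is one header line followed by num_rows data lines
col_names = [lst[i] for i in range(0, len(lst), num_rows + 1)]
final_col_names = []
for i in col_names:
    # Split on runs of 2+ spaces so names containing single spaces stay whole
    final_col_names += re.split(r'\s{2,}', i.strip())
# Drop empty tokens and the trailing '\' that marks a wrapped header line
final_col_names = [i for i in final_col_names if i not in ('', '\\')]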
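Because the column names here contain single spaces, how the header line is split matters. A small sketch on a hand-written header line (the `header` string is made up for illustration, with two spaces between columns as `str(df)` produces for this example):

```python
import re

# Hypothetical header line of a wrapped chunk: two spaces between columns
# and a trailing backslash.
header = "  really long name that goes on for a while  another really long string  c  \\"

# Splitting on single spaces breaks the multi-word names into separate words.
by_single_space = [tok for tok in header.split(" ") if tok not in ("", "\\")]

# Splitting on runs of two or more spaces keeps each name whole.
by_double_space = [
    tok for tok in re.split(r"\s{2,}", header.strip()) if tok not in ("", "\\")
]

print(len(by_single_space))
print(by_double_space)
```

This relies on at least two spaces separating neighbouring headers. That holds for this example, but it may not hold for every frame: as far as I know, pandas can print adjacent text columns whose headers are wider than their values with only one space between the headers.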

Then, let's get the data:

# Remove the header lines so that only data rows remain
for i in col_names:
    lst.remove(i)
# Split each row on runs of 2+ spaces and drop the leading index label
new_lst = [re.split(r'\s{2,}', i.strip())[1:] for i in lst]
newer_lst = []
for i in range(num_rows):
    sub_lst = []
    # Row i appears once in every chunk, num_rows lines apart
    for j in range(i, len(new_lst), num_rows):
        sub_lst += new_lst[j]
    newer_lst.append(sub_lst)
reshaped = np.reshape(newer_lst, (num_rows, len(final_col_names)))
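The stitching step is easier to follow on a toy input. In this sketch the `parsed` list is hypothetical: it stands for the already split data lines of a 2-row frame whose 5 columns were wrapped into three chunks.

```python
num_rows = 2

# Hypothetical parsed data lines, in the order they appear in the string:
# all rows of chunk 1, then all rows of chunk 2, then all rows of chunk 3.
parsed = [
    ["a0", "b0"], ["a1", "b1"],   # chunk 1: columns a, b
    ["c0", "d0"], ["c1", "d1"],   # chunk 2: columns c, d
    ["e0"], ["e1"],               # chunk 3: column e
]

full_rows = []
for i in range(num_rows):
    row = []
    # Row i sits at positions i, i + num_rows, i + 2 * num_rows, ...
    for j in range(i, len(parsed), num_rows):
        row += parsed[j]
    full_rows.append(row)

print(full_rows)
# [['a0', 'b0', 'c0', 'd0', 'e0'], ['a1', 'b1', 'c1', 'd1', 'e1']]
```

Note that the inner loop has to stop at the number of parsed lines (6 here), not at the number of columns (5 here). The two only coincide by accident, as they do in the 3-row, 18-column example from the question.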

Finally, we can create the reconstructed DataFrame from the data and the column names:

fixed_df = pd.DataFrame(data=reshaped, columns = final_col_names)
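One thing to keep in mind is that a frame rebuilt from text holds strings in every cell. If numeric columns are needed again, something along these lines might help. It is a sketch on a small stand-in frame; `rebuilt` and `to_numeric_if_possible` are names I made up, and the columns are unique here to keep the example simple.

```python
import pandas as pd

# Hypothetical stand-in for a frame rebuilt from text: every cell is a string.
rebuilt = pd.DataFrame({"a": ["1", "2"], "b": ["x", "y"], "c": ["1.5", "2.5"]})


def to_numeric_if_possible(column):
    """Return the column as numbers if every value parses, else unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


converted = rebuilt.apply(to_numeric_if_possible)
print(converted.dtypes)
```

A column that mixes text and numbers, like the first column in the question, stays as strings, so values such as 4 and 7 come back as '4' and '7' rather than integers.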

My code does some looping, so this approach may take a while if your original DataFrame has hundreds of thousands of rows.

Answered 2021-10-19
