
How can I read a CSV file whose lines contain NUL ('\x00') characters into pandas?

Python

波斯汪 2023-09-05 17:28:40

I have a set of CSV files whose first two columns are a date and a time (the files have no header row). The files open fine in Excel, but when I read them into Python with pandas read_csv, only the first date comes through, whether or not I try any type conversion. When I open a file in Notepad, it is not simply comma-separated: every line after line 1 is preceded by a large amount of whitespace. I have tried skipinitialspace=True without success, and various type conversions, none of which worked. I am currently using parse_dates=[['Date','Time']], infer_datetime_format=True, dayfirst=True.

Sample output (no conversion):

```
              0         1    2    3      4  ...    12    13   14   15   16
0      02/03/20  15:13:39  5.5  5.8  42.84  ...  30.0  79.0  0.0  0.0  0.0
1           NaN  15:13:49  5.5  5.8  42.84  ...  30.0  79.0  0.0  0.0  0.0
2           NaN  15:13:59  5.5  5.7  34.26  ...  30.0  79.0  0.0  0.0  0.0
3           NaN  15:14:09  5.5  5.7  34.26  ...  30.0  79.0  0.0  0.0  0.0
4           NaN  15:14:19  5.5  5.4  17.10  ...  30.0  79.0  0.0  0.0  0.0
...         ...       ...  ...  ...    ...  ...   ...   ...  ...  ...  ...
39451       NaN  01:14:27  5.5  8.4  60.00  ...  30.0  68.0  0.0  0.0  0.0
39452       NaN  01:14:37  5.5  8.4  60.00  ...  30.0  68.0  0.0  0.0  0.0
39453       NaN  01:14:47  5.5  8.4  60.00  ...  30.0  68.0  0.0  0.0  0.0
39454       NaN  01:14:57  5.5  8.4  60.00  ...  30.0  68.0  0.0  0.0  0.0
39455       NaN       NaN  NaN  NaN    NaN  ...   NaN   NaN  NaN  NaN  NaN
```

And with parse_dates etc.:

```
           Date_Time  pH1 SP pH  Ph1 PV pH  ...    1    2    3
0  02/03/20 15:13:39        5.5        5.8  ...  0.0  0.0  0.0
1       nan 15:13:49        5.5        5.8  ...  0.0  0.0  0.0
2       nan 15:13:59        5.5        5.7  ...  0.0  0.0  0.0
3       nan 15:14:09        5.5        5.7  ...  0.0  0.0  0.0
4       nan 15:14:19        5.5        5.4  ...  0.0  0.0  0.0
```

Data copied from Notepad (in reality there is much more whitespace before each line, but it did not survive pasting here): data from 67.csv
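Padding that looks like blank space in Notepad is not necessarily whitespace, which would explain why skipinitialspace=True has no effect. A minimal sketch for checking, assuming the file can be read as raw bytes: count the b'\x00' bytes and look at repr() of the first lines. The helper name nul_report and the sample bytes are hypothetical; they are not taken from 67.csv.

```python
import os
import tempfile
from pathlib import Path


def nul_report(path, preview_lines=2):
    """Return the number of NUL bytes in a file and repr() of its first lines."""
    raw = Path(path).read_bytes()
    nul_count = raw.count(b"\x00")
    # repr() shows NUL as \x00, whereas Notepad draws it as blank space
    previews = [repr(line[:60]) for line in raw.splitlines()[:preview_lines]]
    return nul_count, previews


# Demo on a made-up sample (not the real 67.csv): line 2 starts with 4 NUL bytes
sample = b"02/03/20,15:13:39,5.5\r\n\x00\x00\x00\x0002/03/20,15:13:49,5.5\r\n"
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(sample)

count, lines = nul_report(tmp.name)
os.unlink(tmp.name)
print(count)  # 4
for line in lines:
    print(line)
```

On the real data, nul_report("67.csv") should return a large count if the "whitespace" is really NUL padding; a count of 0 would point to genuine spaces instead.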


1 Answer

慕勒3428872


The file is full of NUL characters ('\x00'), not whitespace, and these need to be removed.

Once the lines are cleaned, load the data in d with pandas.DataFrame.

```python
import string  # used to generate letter column names

import pandas as pd

# The problem is that the file is padded with NUL characters, not whitespace.


def import_file(filename):
    # open the file and read every line
    with open(filename) as f:
        d = f.readlines()

    # remove NUL, strip whitespace from both ends of each line, split on commas
    d = [v.replace('\x00', '').strip().split(',') for v in d]

    # drop blank or near-empty rows (two fields or fewer)
    d = [v for v in d if len(v) > 2]

    # load the cleaned rows with pandas
    df = pd.DataFrame(d)

    # combine column 0 (date) and column 1 (time) into a datetime;
    # add dayfirst=True if the dates are DD/MM/YY, as the question's settings suggest
    df['datetime'] = pd.to_datetime(df[0] + ' ' + df[1])

    # drop columns 0 and 1
    df.drop(columns=[0, 1], inplace=True)

    # set datetime as the index
    df.set_index('datetime', inplace=True)

    # convert the remaining columns to floats
    df = df.astype('float')

    # give the columns letter names (works for up to 26 columns)
    df.columns = list(string.ascii_uppercase)[:len(df.columns)]

    # to turn datetime back into a regular column instead of the index:
    # df.reset_index(inplace=True)

    return df.copy()


# call the function for each file
dfs = []
filenames = ['67.csv']
for filename in filenames:
    dfs.append(import_file(filename))

print(dfs[0])  # in Jupyter, display(dfs[0]) renders it as a table
```
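One thing worth checking: the code calls pd.to_datetime without dayfirst, so 02/03/20 is read month-first as 3 February 2020 (which is what the output shows), whereas the question passes dayfirst=True, i.e. 2 March 2020. The sketch below compares the options on a single timestamp; which one is correct depends on how the logger writes dates, which the post does not say, and the explicit format string is an assumption of DD/MM/YY.

```python
import pandas as pd

stamp = pd.Series(["02/03/20 15:13:39"])

# default parsing is month-first: 3 February 2020
month_first = pd.to_datetime(stamp)

# dayfirst=True, as used in the question: 2 March 2020
day_first = pd.to_datetime(stamp, dayfirst=True)

# an explicit format removes the ambiguity (assumes DD/MM/YY HH:MM:SS)
explicit = pd.to_datetime(stamp, format="%d/%m/%y %H:%M:%S")

print(month_first.iloc[0])  # 2020-02-03 15:13:39
print(day_first.iloc[0])    # 2020-03-02 15:13:39
print(explicit.iloc[0])     # 2020-03-02 15:13:39
```

An explicit format also fails loudly on a malformed row instead of silently guessing, which is useful with data that has already shown parsing surprises.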

```
                       A    B      C    D    E      F     G     H      I     J     K     L    M    N    O
datetime
2020-02-03 15:13:39  5.5  5.8  42.84  7.2  6.8  10.63  60.0   0.0  300.0   1.0  30.0  79.0  0.0  0.0  0.0
2020-02-03 15:13:49  5.5  5.8  42.84  7.2  6.8  10.63  60.0   0.0  300.0   1.0  30.0  79.0  0.0  0.0  0.0
2020-02-03 15:13:59  5.5  5.7  34.26  7.2  6.8  10.63  60.0  22.3  300.0   1.0  30.0  79.0  0.0  0.0  0.0
2020-02-03 15:14:09  5.5  5.7  34.26  7.2  6.8  10.63  60.0  15.3  300.0  45.0  30.0  79.0  0.0  0.0  0.0
2020-02-03 15:14:19  5.5  5.4  17.10  7.2  6.8  10.63  60.0  50.2  300.0  86.0  30.0  79.0  0.0  0.0  0.0
```

2023-09-05
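A variant of the same idea, swapping in pd.read_csv for the manual split so that pandas handles type inference: strip the NULs in memory, pass the cleaned text to read_csv through io.StringIO, and build the datetime index with an explicit format. This is a sketch, not the answerer's code; read_nul_csv is a hypothetical name, and it assumes ASCII/UTF-8 text with DD/MM/YY dates (per the question's dayfirst=True). If the NULs instead alternate with every character, the file may be UTF-16 encoded, in which case passing encoding="utf-16" is worth trying.

```python
import io
import os
import tempfile

import pandas as pd


def read_nul_csv(path, encoding="utf-8"):
    """Hypothetical helper: remove NUL characters, then parse with read_csv.

    Assumes a headerless file whose column 0 is a DD/MM/YY date and
    column 1 an HH:MM:SS time, as described in the question.
    """
    with open(path, encoding=encoding) as f:
        text = f.read().replace("\x00", "")
    # drop lines that are empty once the NULs are gone; trim stray spaces
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    df = pd.read_csv(io.StringIO("\n".join(lines)), header=None,
                     skipinitialspace=True)
    df = df.dropna(subset=[0, 1])  # rows without a date or time are unusable
    stamps = df[0].astype(str) + " " + df[1].astype(str)
    df.index = pd.to_datetime(stamps, format="%d/%m/%y %H:%M:%S")
    df.index.name = "datetime"
    return df.drop(columns=[0, 1])


# Demo on a made-up sample (not the real 67.csv) with NUL padding
sample = (
    "02/03/20,15:13:39,5.5,5.8\r\n"
    "\x00\x00\x00\x0002/03/20,15:13:49,5.5,5.8\r\n"
    "\x00\x00\x00\x00\r\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8", newline="") as tmp:
    tmp.write(sample)

result = read_nul_csv(tmp.name)
os.unlink(tmp.name)
print(result)
```

Reading the whole file into memory is fine for files of this size (about 40,000 rows); for much larger files, a line-by-line generator feeding a temporary cleaned file would avoid holding two copies of the text.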
