
Python/Pandas - Split text into columns by delimiter ; and create a csv file

Python

凤凰求蛊 2023-07-18 09:42:51

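I have a long text with the delimiter ";" inserted exactly where I want to split the text into separate columns. So far, whenever I try to split the text into "ID" and "ADText", I only get the first row. However, there should be 1439 rows in the two columns.

My text looks like this: 1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on

I want to use the ";" to split my text into two columns, one containing the ID and one containing the ad text.

import pandas as pd

# read the text file into python:
jobads = pd.read_csv("jobads.txt", header=None)
print(jobads)

# create dataframe
df = pd.DataFrame(jobads, index=None, columns=None)
type(df)
print(df)

# name column to target it for split
df = df.rename(columns={0: "Job"})
print(df)

# split it into two columns. Problem: I only get the first row.
print(pd.DataFrame(df.Job.str.split(';', n=1).tolist(), columns=['ID', 'AD']))

Unfortunately, this only works for the first entry and then stops. The output looks like this:

     ID                             AD
0  1234  text in written from with ...

Where did I go wrong? I would appreciate any advice =) Thanks!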


1 Answer

蝴蝶不菲


Sample text:

FullName;ISO3;ISO1;molecular_weight

Alanine;Ala;A;89.09

Arginine;Arg;R;174.20

Asparagine;Asn;N;132.12

Aspartic_Acid;Asp;D;133.10

Cysteine;Cys;C;121.16

Use ";" as the column separator:

import pandas as pd

f = "aminoacids"

df = pd.read_csv(f,sep=";")
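To try this without a file on disk, here is a minimal sketch that feeds the sample text above to `pd.read_csv` through `io.StringIO`. The variable name `sample` is only for illustration and is not part of the original answer.

```python
import io

import pandas as pd

# The sample text from above, held in a string instead of a file.
sample = """FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16
"""

# sep=";" splits every line on semicolons; the first line becomes the header.
df = pd.read_csv(io.StringIO(sample), sep=";")

print(df.shape)          # number of rows and columns
print(list(df.columns))  # column names taken from the header line
```

This only works because every record sits on its own line. It does not help when all records run together in one continuous text, as in the question.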


Edit: Considering the comments, I think the text looks more like this:

t = """1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on1234; text in written from with multiple """

In that case, a regular expression like this one splits your string into IDs and texts, which you can then use to build a pandas DataFrame.

import re

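# one or more digits followed by ";"; the parentheses keep the digits in the result
r = re.compile(r"([0-9]+);")

# store the result as `a` so it can be reused
a = re.split(r, t)

a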

Output:

['',

'1234',

' text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon ',

'2345',

' then the new Ad-Text begins until the next ID ',

'3456',

' and so on',

'1234',

' text in written from with multiple ']
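The IDs show up in the list because the pattern wraps the digits in a capturing group: `re.split` returns the text of any captured group along with the pieces between matches. A small sketch of the difference, using a shortened made-up string (`short` is a hypothetical name):

```python
import re

short = "1234; first ad 2345; second ad"

# With a capturing group, the matched IDs are kept in the result.
with_group = re.split(r"([0-9]+);", short)
print(with_group)     # ['', '1234', ' first ad ', '2345', ' second ad']

# Without the group, the IDs are treated as separators and are lost.
without_group = re.split(r"[0-9]+;", short)
print(without_group)  # ['', ' first ad ', ' second ad']
```

The leading empty string appears because the text starts with a match, so there is nothing before the first ID.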

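Edit 2: This is in response to the asker's follow-up question in the comments: how to convert this string into a pandas DataFrame with 2 columns, ID and text.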

import pandas as pd

# a is the output list from the previous part of this answer

# Create list of texts. a[::2] takes every other item, starting with the FIRST one; [1:] then drops the leading empty string.

texts = a[::2][1:]

print(texts)

# Create list of IDs. a[1::2] takes every other item, starting with the SECOND one.

ids = a[1::2]

print(ids)

df = pd.DataFrame({"IDs":ids,"Texts":texts})
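The question title also asks for a csv file, which the steps above stop short of. Here is a minimal end-to-end sketch, assuming the whole file can be read into one string and that no ad text itself contains digits directly followed by ";". The helper name `ads_to_frame` and the sample string `raw` are hypothetical; the column names "ID" and "ADText" are taken from the question.

```python
import re

import pandas as pd


def ads_to_frame(text):
    """Split 'ID; text ID; text ...' into a DataFrame with ID and ADText columns."""
    parts = re.split(r"([0-9]+);", text)
    ids = parts[1::2]                          # captured IDs
    texts = [s.strip() for s in parts[2::2]]   # text that follows each ID
    return pd.DataFrame({"ID": ids, "ADText": texts})


# Stand-in for the real file content, e.g. open("jobads.txt").read()
raw = "1234; first ad text 2345; second ad\ntext over two lines 3456; and so on"

df = ads_to_frame(raw)
print(df)

# Without a path, to_csv returns the CSV as a string;
# pass a file name such as "jobads.csv" to write it to disk instead.
csv_text = df.to_csv(index=False)
print(csv_text)
```

`index=False` keeps the row numbers out of the output. Texts that span several lines are quoted by `to_csv`, so they can be read back with `pd.read_csv`.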

2023-07-18
