Help splitting a DataFrame column into new columns

I'm having trouble splitting a DataFrame column on _ and creating new columns from the pieces.

The original string, stored in the section column:

AMAT_0000006951_10Q_20200726_Item1A_excerpt.txt

My current code:

df = pd.DataFrame(myList,columns=['section','text'])
#df['text'] = df['text'].str.replace('•','')
df['section'] = df['section'].str.replace('Item1A', 'Filing Section: Risk Factors')
df['section'] = df['section'].str.replace('Item2_', 'Filing Section: Management Discussion and Analysis')
df['section'] = df['section'].str.replace('excerpt.txt', '').str.replace(r'\d{10}_|\d{8}_', '')
df.to_csv("./SECParse.csv", encoding='utf-8-sig', sep=',',index=False)

Output:

section                                   text
AMAT_10Q_Filing Section: Risk Factors_    The COVID-19 pandemic and global measures taken in response thereto have adversely impacted, and may continue to adversely impact, Applied’s operations and financial results.
AMAT_10Q_Filing Section: Risk Factors_    The COVID-19 pandemic and measures taken in response by governments and businesses worldwide to contain its spread,
AMAT_10Q_Filing Section: Risk Factors_    The degree to which the pandemic ultimately impacts Applied’s financial condition and results of operations and the global economy will depend on future developments beyond our control

I'd really like to split "section" on "_" and put the pieces into new columns. I have tried many regex variations to split "section"; all of them either gave me headers with no values filled in, or added the new columns after section and text, which isn't useful. I should add that there are about 100,000 observations.

Desired result:

Ticker    Filing type    Section                         Text
AMAT      10Q            Filing Section: Risk Factors    The COVID-19 pandemic and global measures taken in response

Any guidance would be greatly appreciated.


1 Answer

jeck猫


If you always know how many pieces the split produces, you can do something like this:

import pandas as pd

df = pd.DataFrame({"a": ["test_a_b", "test2_c_d"]})

# Split the column on "_"; each row becomes a list of pieces
items = df["a"].str.split("_")

# list.pop removes and returns the last piece of each list (in place).
# First call: the last piece goes into "b"
df["b"] = items.apply(list.pop)

# Second call: the next-to-last piece goes into "c"
df["c"] = items.apply(list.pop)

# Third call: the remaining (first) piece goes into "d"
df["d"] = items.apply(list.pop)

With that, the DataFrame becomes:

           a  b  c      d
0   test_a_b  b  a   test
1  test2_c_d  d  c  test2
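As an alternative to popping the pieces off one at a time, pandas can also expand the split directly into columns. The following is a minimal sketch on the same toy data, assuming every row has exactly three pieces; the names parts and result are only illustrative, and the column names are chosen to match the layout used in this answer.

```python
import pandas as pd

df = pd.DataFrame({"a": ["test_a_b", "test2_c_d"]})

# expand=True returns a DataFrame with one column per piece
# instead of a Series of lists
parts = df["a"].str.split("_", expand=True)

# Name the pieces left to right (illustrative names matching this answer)
parts.columns = ["d", "c", "b"]

# Put the new columns first and keep the original column at the end
result = pd.concat([parts, df], axis=1)
print(result)
```

Because the original list objects are never mutated, this version can be re-run safely on the same frame.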

Since you want the columns in a particular order, you can reorder the DataFrame's columns like this:

>>> df = df[["d", "c", "b", "a"]]
>>> df
       d  c  b          a
0   test  a  b   test_a_b
1  test2  c  d  test2_c_d
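Applied to the file names from the question, the same idea might look like the sketch below. It rests on several assumptions: every name has the six-part shape of the one example given (ticker, ten-digit ID, filing type, date, item, "excerpt.txt"); the second row and its text are made-up placeholders; and section_names, parts and out are hypothetical names. Splitting the raw name before any replacements avoids the trailing "_" seen in the question's output.

```python
import pandas as pd

# Hypothetical rows shaped like the question's data; the second file name
# and its text are placeholders, not real filing content.
df = pd.DataFrame({
    "section": [
        "AMAT_0000006951_10Q_20200726_Item1A_excerpt.txt",
        "AMAT_0000006951_10Q_20200726_Item2_excerpt.txt",
    ],
    "text": [
        "The COVID-19 pandemic and global measures taken in response",
        "placeholder text",
    ],
})

# Item code -> readable label, taken from the replacements in the question
section_names = {
    "Item1A": "Filing Section: Risk Factors",
    "Item2": "Filing Section: Management Discussion and Analysis",
}

# One column per "_"-separated piece, labelled 0..5
parts = df["section"].str.split("_", expand=True)

# Keep only the pieces that are wanted, in the desired order
out = pd.DataFrame({
    "Ticker": parts[0],
    "Filing type": parts[2],
    "Section": parts[4].map(section_names),  # unknown item codes become NaN
    "Text": df["text"],
})
print(out)
```

The string methods are vectorized, so this should be workable for roughly 100,000 rows, though that has not been measured here.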

2023-07-27
