
Pandas: find the position of a keyword in a row, using another Series as the lookup value

Python

波斯汪 2022-07-05 17:52:13

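I'm trying to understand how to apply str.find() to find the index position of a keyword within the strings of a pandas Series. As the input to str.find(), I want to use another Series in the same DataFrame that contains the strings to search for. The output I'm trying to create is another Series holding the integer position of the keyword within the string. For example, I expect 1 for the first row and 2 for the second row. The goal is to take the keyword/key phrase in query, find an exact match for it in the string in 'Title', and return the position of the keyword within that string. If the keyword/phrase is not present, show 0.

Expected output

example_data = pd.DataFrame(([['key word1', 'key word1'], ['key word2', 'Find key word2, not key word1 or key word3 in title']]), columns=['query', 'Title'])

My attempt

example_data = pd.DataFrame(([['key word1', 'key word1'], ['key word2', 'Find key word2, not keyword1 or keyword3 in title']]), columns=['query', 'Title'])
example_data['query_position'] = example_data['Title'].str.find(example_data['query'])

The error I get is:

TypeError: expected a string object, not Series

I'm not entirely sure how to iterate over the Series and feed its string values into str.find(). Any help would be great!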


5 Answers

杨魅力

Contributed 1811 experience points, received more than 6 likes

You can also use series.str.split with expand=True to convert the titles into a DataFrame, then use df.eq to check where that DataFrame matches the other Series:

# axis=0 aligns the query Series with the rows rather than with the column labels
example_data['position'] = (example_data['Title'].str.split(expand=True)
                            .eq(example_data['query'], axis=0).idxmax(axis=1) + 1)
print(example_data)

      query                       Title  position
0  keyword1  keyword1 keyword2 keyword3         1
1  keyword1  keyword2 keyword1 keyword3         2

If a match may be missing, you can use:

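import numpy as np

m = example_data['Title'].str.split(expand=True)
c = m.eq(example_data['query'], axis=0)
# rows with no matching word get NaN instead of a position
example_data['position'] = np.where(c.any(axis=1), c.idxmax(axis=1) + 1, np.nan)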
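To make the split-and-compare approach above concrete, here is a minimal, self-contained sketch. The three-row frame is made-up sample data with single-word queries (the third row is added only to show the no-match case), and the names `df`, `words`, `hits`, and `positions` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data with single-word queries; the last row has no match.
df = pd.DataFrame(
    [["keyword1", "keyword1 keyword2 keyword3"],
     ["keyword1", "keyword2 keyword1 keyword3"],
     ["keyword9", "keyword2 keyword1 keyword3"]],
    columns=["query", "Title"],
)

# One column per whitespace-separated word of Title.
words = df["Title"].str.split(expand=True)

# axis=0 aligns the query Series with the rows, so each row's words
# are compared with that row's own query.
hits = words.eq(df["query"], axis=0)

# idxmax returns the label of the first True column. The column labels are
# 0-based word offsets, so add 1. Rows without any True become NaN.
df["position"] = np.where(hits.any(axis=1), hits.idxmax(axis=1) + 1, np.nan)

positions = df["position"].tolist()
print(positions)
```

This should give 1.0 and 2.0 for the first two rows and NaN for the third. Because the comparison is word by word, a multi-word query such as 'key word1' never equals a single split word, so this approach fits single-word queries only.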

2022-07-05

慕的地10843

Contributed 1785 experience points, received more than 8 likes

Use .index, but also check for a match, and return -1 if there is none:

out = [b.split().index(a) + 1
       if a in b.split()  # test whole words, so .index() cannot raise ValueError
       else -1
       for a, b in zip(example_data['query'], example_data['Title'])]
print(out)

[1, 2]

example_data['query_position'] = out
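The list comprehension above compares single words, while the question's data uses two-word phrases and a title with a trailing comma ("word2,"). As a hedged sketch of one way to extend the same idea, the hypothetical helper `phrase_word_position` below slides a window of words across the title. Stripping punctuation from each word is an assumption about what should count as an exact match, and the third row is added only to show the no-match case.

```python
import string


def phrase_word_position(query, title):
    """Return the 1-based word position where `query` starts in `title`, or -1."""
    # Strip surrounding punctuation so that "word2," still matches "word2".
    title_words = [w.strip(string.punctuation) for w in title.split()]
    query_words = query.split()
    n = len(query_words)
    if n == 0:
        return -1
    # Slide a window of n words across the title and compare whole words.
    for i in range(len(title_words) - n + 1):
        if title_words[i:i + n] == query_words:
            return i + 1
    return -1


# Plain lists standing in for the 'query' and 'Title' columns.
queries = ["key word1", "key word2", "key word9"]
titles = [
    "key word1",
    "Find key word2, not key word1 or key word3 in title",
    "Find key word2, not key word1 or key word3 in title",
]

out = [phrase_word_position(q, t) for q, t in zip(queries, titles)]
print(out)  # [1, 2, -1]
```

With a DataFrame, the same comprehension can iterate over zip(example_data['query'], example_data['Title']) and the result can be assigned to a new column.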

2022-07-05

慕田峪4524236

Contributed 1875 experience points, received more than 5 likes

The solution I found is less Pythonic, but it works.

str.find cannot help here, because the index it returns counts characters, not words.

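# Split the title on the query: the number of space-separated pieces before
# the first match is the 1-based word position; no split means no match (0).
example_data['query_position'] = [
    len(t.split(q)[0].split(' ')) if len(t.split(q)) > 1 else 0
    for t, q in zip(example_data['Title'].str.lower(),
                    example_data['query'].str.lower())
]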
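To see why the split trick above yields a word position, here is a small walk-through using the strings from the question. The function name `position_by_split` is chosen for illustration and wraps the same expression step by step.

```python
def position_by_split(title, query):
    """1-based word position of `query` in `title` via str.split, or 0 if absent."""
    title, query = title.lower(), query.lower()
    # Splitting on the query leaves everything before the first match in parts[0].
    parts = title.split(query)
    if len(parts) == 1:
        # No split happened, so the query does not occur in the title.
        return 0
    # "find " splits on ' ' into ['find', ''], i.e. two pieces, so the match
    # starts at word 2. An empty prefix gives [''], i.e. word 1.
    return len(parts[0].split(' '))


title = "Find key word2, not key word1 or key word3 in title"

first = position_by_split("key word1", "key word1")
second = position_by_split(title, "key word2")
missing = position_by_split(title, "key word9")
print(first, second, missing)  # 1 2 0
```

Note that this matches substrings rather than whole words: `position_by_split("my keyword1", "word1")` returns 2, the position of the word that contains the match.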

2022-07-05

浮云间

Contributed 1829 experience points, received more than 4 likes

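If I understand correctly, you are trying to create a new column, query_position, that checks whether the string in query appears in Title and then gives its position. The str.find() method returns -1 if the string being searched for is not present in the other string. You said you want 0 when the string is absent, but that can cause confusion when the string you are searching for is present and sits at index 0.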

If you really want it to be zero, here is how I would solve the problem with str.find():

# Quick custom function
def match_string(Title, query):
    s = Title.find(query)
    if s == -1:
        return 0
    else:
        return s

# Use the .apply() function to create a new column using the custom function
example_data['query_position'] = example_data.apply(
    lambda x: match_string(x['Title'], x['query']), axis=1)

If you want to keep -1 as it is, this is how to apply the str.find() function to your DataFrame:

example_data['query_position'] = example_data.apply(
    lambda x: str.find(x['Title'], x['query']), axis=1)
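The ambiguity described in this answer is easy to see on the question's strings. The sketch below uses plain lists in place of the DataFrame columns, adds a third row with a made-up missing phrase, and shows one possible workaround that is not part of the original answer: adding 1 to the result of str.find, so that 0 can only mean "absent". The positions are character offsets, not word counts.

```python
# Plain lists standing in for the 'query' and 'Title' columns.
queries = ["key word1", "key word2", "key word9"]
titles = [
    "key word1",
    "Find key word2, not key word1 or key word3 in title",
    "Find key word2, not key word1 or key word3 in title",
]

# Raw str.find results: 0-based character offsets, -1 when absent.
raw = [t.find(q) for q, t in zip(queries, titles)]
print(raw)        # [0, 5, -1]

# Replacing -1 with 0 makes "found at offset 0" and "absent" look the same.
zeroed = [0 if p == -1 else p for p in raw]
print(zeroed)     # [0, 5, 0]

# Adding 1 gives 1-based offsets, and -1 becomes 0, so 0 is unambiguous.
one_based = [p + 1 for p in raw]
print(one_based)  # [1, 6, 0]
```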

2022-07-05

30秒到达战场

Contributed 1828 experience points, received more than 6 likes

I think you want a column that simply enumerates the rows, like this:

example_data['enum'] = range(len(example_data))

Then, if you find the query string in the title string, just update the row_id like this:

example_data['query_position'] = example_data.apply(lambda x: x['enum'] if x['query'] in x['Title'] else 0, axis=1)

Let me know if this helps!

2022-07-05
