Python code to return elements from a Series


Python

DIEA 2023-05-09 09:45:47

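I'm putting together a script for topic modeling on scraped tweets, but I've run into a few problems. I want to search for every instance of a word and return each one together with the words before and after it, so I get better context for how the word is used. I've tokenized all the tweets and added them to a Series, and I use the relative index positions to identify the surrounding words. My current code is:

```python
import pandas as pd

myseries = pd.Series(["it", 'was', 'a', 'bright', 'cold', 'day', 'in', 'april'], index=[0,1,2,3,4,5,6,7])

def phrase(w):
    search_word = myseries[myseries == w].index[0]
    before = myseries[[search_word - 1]].index[0]
    after = myseries[[search_word + 1]].index[0]
    print(myseries[before], myseries[search_word], myseries[after])
```

This mostly works, but searching for the first or last word raises an error because the lookup falls outside the Series' index range. Is there a way to ignore out-of-range indices and just return whatever is in range?

The current code also returns only one word before and one word after the search word. I'd like to pass a number to the function and get back that many words on each side, but my code is hard-coded. Is there a way to make it return a specified range of elements?

I'm also having trouble writing a loop that searches the whole Series. Depending on what I write, it either returns the first match and nothing else, or prints the first match over and over instead of moving on. The code that keeps repeating the first match is:

```python
def ws(word):
    for element in tokened_df:
        if word == element:
            search_word = tokened_df[tokened_df == word].index[0]
            before = tokened_df[[search_word - 1]].index[0]
            after = tokened_df[[search_word + 1]].index[0]
            print(tokened_df[before], word, tokened_df[after])
```

I'm obviously missing something simple, but I can't for the life of me figure out what. How can I change the code so that, if the same word appears several times in the Series, it returns every instance of the word along with its surrounding words? I'd like it to follow the logic "if the condition is true, run the `phrase` function; if not, keep going through the Series."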
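The repetition in `ws` comes from `tokened_df[tokened_df == word].index[0]`: that expression always evaluates to the first match in the whole Series, no matter which `element` the loop is currently on, so every matching iteration prints the same context. A minimal sketch of one way around it, assuming the tokens sit in a pandas Series, tracks the position of each match with `enumerate` and clamps the window at both ends. The helper name `word_contexts` and its `n` parameter are hypothetical, not part of the question's code.

```python
import pandas as pd

def word_contexts(tokens: pd.Series, word: str, n: int = 1) -> list[str]:
    """Hypothetical helper: every occurrence of `word` with up to `n` tokens per side.

    Works on positions rather than index labels, so it does not assume the
    Series index is 0..len-1.
    """
    values = tokens.tolist()
    contexts = []
    # enumerate yields the position of *each* match, unlike `.index[0]`,
    # which always points at the first match
    for pos, token in enumerate(values):
        if token == word:
            start = max(pos - n, 0)               # clamp so the window never starts before 0
            end = min(pos + n + 1, len(values))   # clamp so the window never runs past the end
            contexts.append(" ".join(values[start:end]))
    return contexts

myseries = pd.Series(["it", "was", "a", "bright", "cold", "day", "in", "april"])
print(word_contexts(myseries, "it"))          # ['it was']
print(word_contexts(myseries, "april", n=2))  # ['day in april']
```

Clamping with `max` matters more than `min` here: a negative start would make Python slice from the end of the list, while an end past the length is simply truncated.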


2 Answers

红颜莎娜

Contributed 1842 experience points, received 12+ upvotes

Something like this? I added a repeated word ("bright") to your example, and also added `n_before` and `n_after` inputs for the number of surrounding words.

```python
import pandas as pd

myseries = pd.Series(["it", 'was', 'a', 'bright', 'bright', 'cold', 'day', 'in', 'april'],
                     index=[0,1,2,3,4,5,6,7,8])

def phrase(w, n_before=1, n_after=1):
    search_words = myseries[myseries == w].index
    for index in search_words:
        start_index = max(index - n_before, 0)
        end_index = min(index + n_after + 1, myseries.shape[0])
        print(myseries.iloc[start_index: end_index])

phrase("bright", n_before=2, n_after=3)
```

This gives:

```
1       was
2         a
3    bright
4    bright
5      cold
6       day
dtype: object
2         a
3    bright
4    bright
5      cold
6       day
7        in
dtype: object
```

2023-05-09
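One caveat about the answer above: `search_words` holds index *labels*, but `iloc` slices by *position*. The two coincide only because the example index is 0..8; with labels such as 10..15, `index - n_before` would be 12 and `iloc[12:...]` would come back empty. A minimal sketch, assuming the Series may carry arbitrary labels (for example after filtering), finds match positions with `numpy.flatnonzero` instead. `phrase_positions` is a hypothetical name, and it returns the windows rather than printing them.

```python
import numpy as np
import pandas as pd

def phrase_positions(series: pd.Series, w: str, n_before: int = 1, n_after: int = 1) -> list[pd.Series]:
    """Hypothetical variant: one window per match, independent of the index labels."""
    # 0-based positions of every element equal to w
    positions = np.flatnonzero(series.to_numpy() == w)
    windows = []
    for pos in positions:
        start = max(pos - n_before, 0)               # clamp at the first element
        end = min(pos + n_after + 1, len(series))    # clamp at the last element
        windows.append(series.iloc[start:end])       # positional slice; original labels kept
    return windows

s = pd.Series(["it", "was", "a", "bright", "bright", "cold"], index=[10, 11, 12, 13, 14, 15])
for window in phrase_positions(s, "bright"):
    print(" ".join(window))
```

Each returned window keeps its original labels (the first one here is labelled 12, 13, 14), so you can still trace a context back to its source row.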

茅侃侃

Contributed 1842 experience points, received 21+ upvotes

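This isn't very elegant, but you probably need some conditions to handle words that appear at the beginning or end of the phrase. To account for repeated words, find every instance of the word and loop over your print statement. In `myseries` I included the word `cold` twice, so there should be two print statements.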

```python
import pandas as pd

myseries = pd.Series(["it", 'was', 'a', 'cold', 'bright', 'cold', 'day', 'in', 'april'],
                     index=[0,1,2,3,4,5,6,7,8])

def phrase(w):
    for i in myseries[myseries == w].index.tolist():
        search_word = i
        if search_word == 0:
            print(myseries[search_word], myseries[i+1])
        elif search_word == len(myseries)-1:
            print(myseries[i-1], myseries[search_word])
        else:
            print(myseries[i-1], myseries[search_word], myseries[i+1])
```

Output:

```
>>> myseries
0        it
1       was
2         a
3      cold
4    bright
5      cold
6       day
7        in
8     april
dtype: object
>>> phrase("was")
it was a
>>> phrase("cold")
a cold bright
bright cold day
```

2023-05-09
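An alternative that avoids the explicit first-word and last-word branches in the answer above is to line up shifted copies of the Series with `Series.shift`, so that each row carries its neighbours and the out-of-range slots simply become missing values. This is a swapped-in technique, not either answerer's code; `phrase_shift` is a hypothetical helper, and it resets the index so that positions and labels agree.

```python
import pandas as pd

def phrase_shift(series: pd.Series, w: str, n: int = 1) -> list[str]:
    """Hypothetical helper: every occurrence of `w` with up to `n` neighbours per side."""
    s = series.reset_index(drop=True)
    # Column k holds the token k positions away from each row (negative k = before).
    # shift() leaves missing values where the window runs past either end.
    frame = pd.DataFrame({k: s.shift(-k) for k in range(-n, n + 1)})
    hits = frame[s == w]   # keep only the rows whose centre token is w
    # dropna() removes the missing edge values, so first/last words need no special case
    return [" ".join(row.dropna()) for _, row in hits.iterrows()]

myseries = pd.Series(["it", "was", "a", "cold", "bright", "cold", "day", "in", "april"])
for line in phrase_shift(myseries, "cold"):
    print(line)
```

For this input the loop prints `a cold bright` and `bright cold day`, matching the output above, and `n` widens the window without adding any branches. The trade-off is memory: the frame holds `2n + 1` copies of the tokens, which may matter for a large tweet corpus.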
