
Multiple inserts in numpy where paired elements do not contain a substring

Python

守着一只汪 2021-11-23 19:24:39

This question is a follow-up to a previous post answered by @ecortazar. This time, however, I want to insert a value between two adjacent elements of a pd.Series when neither element contains a specific string, using only pandas / NumPy. Note: the rows that contain href are all different from one another.

```python
import pandas as pd
import numpy as np

table = pd.Series(
    ["<td class='test'>AA</td>",                    # 0
     "<td class='test'>A</td>",                     # 1
     "<td class='test'><a class='test' href=...",   # 2
     "<td class='test'>B</td>",                     # 3
     "<td class='test'><a class='test' href=...",   # 4
     "<td class='test'>BB</td>",                    # 5
     "<td class='test'>C</td>",                     # 6
     "<td class='test'><a class='test' href=...",   # 7
     "<td class='test'>F</td>",                     # 8
     "<td class='test'>G</td>",                     # 9
     "<td class='test'><a class='test' href=...",   # 10
     "<td class='test'>X</td>"])                    # 11

dups = ~table.str.contains('href') & table.shift(-1).str.contains('href')
array = np.insert(table.values, dups[dups].index, "None")
pd.Series(array)

# OUTPUT:
# 0     <td class='test'>AA</td>
# 1     None
# 2     <td class='test'>A</td>
# 3     <td class='test'><a class='test' href=...
# 4     None    <- incorrect
# 5     <td class='test'>B</td>
# 6     <td class='test'><a class='test' href=...
# 7     <td class='test'>BB</td>
# 8     None
# 9     <td class='test'>C</td>
# 10    <td class='test'><a class='test' href=...
# 11    <td class='test'>F</td>
# 12    None
# 13    <td class='test'>G</td>
# 14    <td class='test'><a class='test' href=...
# 15    <td class='test'>X</td>
```
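To see where the stray "None" at output index 4 comes from, here is a minimal sketch that lists the original positions flagged by the question's mask. It follows the question's condition (no href here, href in the next element), with one assumption: the gap that `shift(-1)` leaves at the end is filled with False via `fill_value` instead of going through NaN. Position 3 (B) is flagged because element 4 contains href, even though B's predecessor (element 2) also contains href, so B is not part of a pair in which both elements lack it.

```python
import pandas as pd

table = pd.Series(
    ["<td class='test'>AA</td>", "<td class='test'>A</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>B</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>BB</td>",
     "<td class='test'>C</td>", "<td class='test'><a class='test' href=...",
     "<td class='test'>F</td>", "<td class='test'>G</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>X</td>"])

has_href = table.str.contains('href')

# The question's condition: no href here, href in the NEXT element
# (the missing value after the last element is filled with False)
next_has_href = has_href.shift(-1, fill_value=False)
dups = ~has_href & next_has_href

flagged = list(dups[dups].index)
print(flagged)  # [1, 3, 6, 9]

# Element 3 is flagged, yet its predecessor (element 2) contains href
print(has_href[2], has_href[3], has_href[4])  # True False True
```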


2 Answers

SMILET


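You can follow the same procedure as before.

The only thing to note is that you must apply the not (~) operator before the shift. The shift puts an np.nan at the first position of your series; NaN is a float, so the result is no longer a purely boolean series and the ~ operation fails on it.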
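A minimal sketch of the dtype problem described above, using three rows of the question's table. Shifting the strings first introduces a NaN, so `str.contains` returns an object-dtype Series and `~` raises `TypeError` on it; negating first works on a clean boolean Series. The explicit `dtype=object` is an assumption made so that `str.contains` reports the shifted-in gap as NaN.

```python
import pandas as pd

# Three rows taken from the question's table; dtype=object is an assumption
table = pd.Series(
    ["<td class='test'>AA</td>",
     "<td class='test'><a class='test' href=...",
     "<td class='test'>B</td>"],
    dtype=object)

# Shift first: position 0 becomes NaN, so str.contains returns NaN there and
# the result can no longer be a plain bool Series
shifted_first = table.shift(1).str.contains('href')
print(shifted_first.dtype)  # object

# ~ is then applied element by element, and ~ is undefined for a float NaN
try:
    ~shifted_first
    invert_failed = False
except TypeError:
    invert_failed = True
print(invert_failed)  # True

# Negate first: the Series is still bool when ~ runs
not_contain = ~table.str.contains('href')
print(not_contain.dtype)  # bool

# Shifting a bool Series also upcasts it to object unless the gap is filled
print(not_contain.shift(1).dtype)                    # object
print(not_contain.shift(1, fill_value=False).dtype)  # bool
```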

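```python
import pandas as pd
import numpy as np

table = pd.Series(
    ["<td class='test'>AA</td>",                    # 0
     "<td class='test'>A</td>",                     # 1
     "<td class='test'><a class='test' href=...",   # 2
     "<td class='test'>B</td>",                     # 3
     "<td class='test'><a class='test' href=...",   # 4
     "<td class='test'>BB</td>",                    # 5
     "<td class='test'>C</td>",                     # 6
     "<td class='test'><a class='test' href=...",   # 7
     "<td class='test'>F</td>",                     # 8
     "<td class='test'>G</td>",                     # 9
     "<td class='test'><a class='test' href=...",   # 10
     "<td class='test'>X</td>"])                    # 11

# Negate first, while the Series is still purely boolean
not_contain = ~table.str.contains('href')

# True where this element AND the previous one both lack 'href';
# fill_value=False keeps the shifted mask boolean at position 0
cond = not_contain & not_contain.shift(1, fill_value=False)

# Insert "None" before each flagged position (labels equal positions here
# because table has the default RangeIndex)
array = np.insert(table.values, cond[cond].index, "None")

pd.Series(array)
```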
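If the same operation is needed for other strings or fill values, the approach above can be wrapped in a small function. `insert_between_plain` is a hypothetical helper, not a pandas API. It passes `fill_value=False` to `shift` so the mask stays boolean, uses `regex=False` because the search string is a literal, and takes positions from `np.flatnonzero`, so it does not rely on the Series having the default RangeIndex (the `cond[cond].index` form assumes labels equal positions).

```python
import numpy as np
import pandas as pd


def insert_between_plain(series, substring, filler="None"):
    """Hypothetical helper: insert `filler` between every pair of adjacent
    elements of `series` in which neither element contains `substring`."""
    plain = ~series.str.contains(substring, regex=False)
    # True at position i when elements i-1 and i both lack `substring`
    cond = plain & plain.shift(1, fill_value=False)
    # Positional indices, so any index on `series` works
    positions = np.flatnonzero(cond.to_numpy())
    # np.insert places `filler` before each of those original positions
    values = np.insert(series.to_numpy(dtype=object), positions, filler)
    return pd.Series(values)


table = pd.Series(
    ["<td class='test'>AA</td>", "<td class='test'>A</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>B</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>BB</td>",
     "<td class='test'>C</td>", "<td class='test'><a class='test' href=...",
     "<td class='test'>F</td>", "<td class='test'>G</td>",
     "<td class='test'><a class='test' href=...", "<td class='test'>X</td>"])

result = insert_between_plain(table, 'href')
print(result)
```

With the question's data, "None" ends up between AA and A, between BB and C, and between F and G, and nowhere else.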

Answered 2021-11-23

眼眸繁星


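This solves the problem above, but it builds the mask with a plain Python loop instead of NumPy and pandas operations. If you can recreate it with them, I'll accept your answer as the correct one.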

```python
import pandas as pd
import numpy as np

table = pd.Series(
    ["<td class='test'>AA</td>",                    # 0
     "<td class='test'>A</td>",                     # 1
     "<td class='test'><a class='test' href=...",   # 2
     "<td class='test'>B</td>",                     # 3
     "<td class='test'><a class='test' href=...",   # 4
     "<td class='test'>BB</td>",                    # 5
     "<td class='test'>C</td>",                     # 6
     "<td class='test'><a class='test' href=...",   # 7
     "<td class='test'>F</td>",                     # 8
     "<td class='test'>G</td>",                     # 9
     "<td class='test'><a class='test' href=...",   # 10
     "<td class='test'>X</td>"])                    # 11

# Build the mask with a plain Python loop: insertAt[i] is True when
# neither table[i - 1] nor table[i] contains 'href'
insertAt = [False]  # element 0 has no predecessor
for i in range(1, len(table)):
    both_plain = 'href' not in table[i - 1] and 'href' not in table[i]
    print(i, ' is duplicated' if both_plain else ' is not duplicated')
    insertAt.append(both_plain)

insertAt = pd.Series(insertAt)

# Insert "None" before each flagged position
array = np.insert(table.values, insertAt[insertAt].index, "None")

pd.Series(array)  # back to series if necessary
```
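For comparison with the loop above, the same mask can be built without any Python-level loop and without pandas, by comparing a boolean array with itself offset by one element. This is a sketch using `np.char.find`, which returns -1 where the substring is absent; the object array is converted with `astype(str)` first so that `np.char.find` receives a string-dtype array.

```python
import numpy as np

values = np.array(
    ["<td class='test'>AA</td>",                    # 0
     "<td class='test'>A</td>",                     # 1
     "<td class='test'><a class='test' href=...",   # 2
     "<td class='test'>B</td>",                     # 3
     "<td class='test'><a class='test' href=...",   # 4
     "<td class='test'>BB</td>",                    # 5
     "<td class='test'>C</td>",                     # 6
     "<td class='test'><a class='test' href=...",   # 7
     "<td class='test'>F</td>",                     # 8
     "<td class='test'>G</td>",                     # 9
     "<td class='test'><a class='test' href=...",   # 10
     "<td class='test'>X</td>"],                    # 11
    dtype=object)

# True where an element does NOT contain 'href' (find returns -1)
plain = np.char.find(values.astype(str), 'href') == -1

# both_plain[k] is True when elements k and k+1 both lack 'href'
both_plain = plain[:-1] & plain[1:]

# The filler belongs between k and k+1, i.e. before original position k+1
positions = np.flatnonzero(both_plain) + 1

result = np.insert(values, positions, "None")
print(result)
```

The offset comparison `plain[:-1] & plain[1:]` plays the same role as `shift(1)` in pandas, but since the arrays are one element shorter there is no missing value to fill.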

Answered 2021-11-23
