How to replace a substring in an array inside a loop

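I have the following dataset:

test_column
AB124
3847937BB
HP111
PG999-HP222
1222HP
HP3333-22HP
111HP3939DN

I want to apply the following logic:

Find all the letters in the test column.
If the letter string is longer than 2 characters and contains an instance of "HP", remove "HP" from the rest of the string once.
If the letter string is longer than 2 characters and contains no instance of "HP", keep the whole string.
If the letter string is 2 characters long or shorter, keep the whole string.

So my desired output looks like this:

desired_column
AB
BB
HP
PG
HP
HP
DN

I am trying to do this with a loop, but have not managed to produce the desired result.

for index,row in df.iterrows():
    target_value = row['test_column'] #array
    predefined_code = ['HP'] #array
    for code in re.findall("[a-zA-Z]+", target_value): #find all alphabets in the target_column
        if (len(code)>2) and not (code in predefined_code):
            possible_code = code
        if (len(code)>2) and (code in predefined_code):
            possible_code = possible_code.Select(code.replace(predefined_code,'',1))
        if (len(code)<=2):
            possible_code = code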


1 Answer

斯蒂芬大帝


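Since the three cases are mutually exclusive and exhaustive, the logic can be simplified to:

"For a letter substring that is longer than 2 characters and contains 'HP', remove the first 'HP'; otherwise keep the substring as it is."

First use a regular expression to strip the non-letter characters from each string, then implement the logic with a simple if-else statement.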
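As a minimal sketch, the simplified rule can be written as a plain function that needs only the standard library. The names `extract_code`, `NON_LETTERS`, and `marker` are hypothetical and chosen for illustration. Note that, like the approach described above, it joins all letters of a value into one string before testing it, rather than testing each run of letters separately as the loop in the question does.

```python
import re

# Hypothetical helper names, introduced only for this illustration.
NON_LETTERS = re.compile(r"[^A-Za-z]")  # matches any character that is not a letter


def extract_code(value: str, marker: str = "HP") -> str:
    """Strip non-letters, then drop the first `marker` if more than 2 letters remain."""
    letters = NON_LETTERS.sub("", value)
    if len(letters) > 2 and marker in letters:
        return letters.replace(marker, "", 1)  # remove only the first occurrence
    return letters  # short strings and strings without the marker stay unchanged


samples = ["AB124", "3847937BB", "HP111", "PG999-HP222",
           "1222HP", "HP3333-22HP", "111HP3939DN"]
results = [extract_code(s) for s in samples]
print(results)
```

Keeping the rule in one function makes it easy to check against the sample values from the question before applying it to a whole column.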

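import re

import pandas as pd

df = pd.DataFrame({'test_column': ['AB124', '3847937BB', 'HP111', 'PG999-HP222', '1222HP', 'HP3333-22HP', '111HP3939DN']})

# Compile once, outside the loop. Matches every non-letter character;
# lowercase letters are kept too, as in the question's own pattern.
regex = re.compile("[^A-Za-z]")

for index, row in df.iterrows():
    target_value = row['test_column']  # a single string, e.g. 'PG999-HP222'
    code = regex.sub('', target_value)  # keep only the letters
    if len(code) > 2 and 'HP' in code:
        possible_code = code.replace('HP', '', 1)  # remove the first 'HP' only
    else:
        possible_code = code
    print(possible_code)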

This prints the desired values, one per line: AB, BB, HP, PG, HP, HP, DN.
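The same rule can also be expressed without an explicit loop, which additionally stores the result in a column instead of only printing it. This is a sketch under the assumption that the result should go into a new column called `desired_column` (the name used in the question); the intermediate names `letters` and `needs_strip` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"test_column": ["AB124", "3847937BB", "HP111", "PG999-HP222",
                                   "1222HP", "HP3333-22HP", "111HP3939DN"]})

# Step 1: strip every non-letter character from each value.
letters = df["test_column"].str.replace(r"[^A-Za-z]", "", regex=True)

# Step 2: flag rows with more than 2 letters that contain the literal 'HP'.
needs_strip = (letters.str.len() > 2) & letters.str.contains("HP", regex=False)

# Step 3: remove the first 'HP' on flagged rows and keep the other rows unchanged.
stripped = letters.str.replace("HP", "", n=1, regex=False)
df["desired_column"] = letters.where(~needs_strip, stripped)

print(df)
```

For a column of this size either version is fine; the string-accessor form mainly avoids `iterrows()`, which becomes slow on large frames.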

2021-07-13
