2 Answers
Answer 1
Your regex is almost correct, but you have to allow for the dot and the digits after it not being present. This can be done like this:
\s+(\d+(?:\.\d+)?)\s+
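The difference is that \.\d+ is wrapped in a non-capturing group, (?:xxxx), and the question mark placed after the group, (?:xxxx)?, makes the whole group optional, so it may or may not be present.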
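A minimal sketch of the effect, assuming Python's re module and two of the sample lines from the question. INTEGER_ONLY is a hypothetical pattern without the optional group, included only for contrast, and split_once is likewise just an illustrative helper.

```python
import re

# Pattern from this answer: digits, optionally followed by "." and more digits.
DECIMAL_OK = r'\s+(\d+(?:\.\d+)?)\s+'
# Hypothetical integer-only pattern, shown only for contrast.
INTEGER_ONLY = r'\s+(\d+)\s+'


def split_once(pattern, text):
    """Split text around the first whitespace-delimited number.

    Because the number is in a capturing group, re.split keeps it in the result.
    """
    return re.split(pattern, text, maxsplit=1)


line = 'Magnesium stearate 1.38 Lubricant Ph Eur'
print(split_once(DECIMAL_OK, line))
# ['Magnesium stearate', '1.38', 'Lubricant Ph Eur']
print(split_once(INTEGER_ONLY, line))
# ['Magnesium stearate 1.38 Lubricant Ph Eur']  (no match, nothing is split)
print(split_once(DECIMAL_OK, 'Naproxen 500 Active ingredient Ph Eur'))
# ['Naproxen', '500', 'Active ingredient Ph Eur']
```

With the optional group, one pattern covers both the integer line and the decimal line. Without it, the 1 in 1.38 is followed by a dot rather than whitespace, so the integer-only pattern finds no match and the line comes back unsplit.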
Answer 2
I would suggest using:
res = re.match(r'^(?:(?!.*\d\.\d)(.*?)\s*\b(\d+(?:\s*mg)?)\b\s*(.*)|((?:(?!\d+\.\d).)*?)\s*\b(\d+\.\d+(?:\s*mg)?)\b\s*(.*))$', i)
if res:
    all_extract.append(list(filter(None, res.groups())))
See the regex demo.
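A short sketch of why filter(None, res.groups()) is needed, using the pattern above with Python's re module (the names PATTERN, m1, m2 and m3 are only for illustration). The pattern has six capturing groups: groups 1 to 3 belong to the first alternative, which the lookahead (?!.*\d\.\d) restricts to lines without a decimal number, and groups 4 to 6 belong to the second. Only one alternative takes part in a match, so the other three groups come back as None.

```python
import re

# Same pattern as above, split into two adjacent raw strings for readability.
PATTERN = re.compile(
    r'^(?:(?!.*\d\.\d)(.*?)\s*\b(\d+(?:\s*mg)?)\b\s*(.*)'
    r'|((?:(?!\d+\.\d).)*?)\s*\b(\d+\.\d+(?:\s*mg)?)\b\s*(.*))$'
)

# Integer line: the first alternative matches, groups 4-6 are None.
m1 = PATTERN.match('Naproxen 500 Active ingredient Ph Eur')
print(m1.groups())
# ('Naproxen', '500', 'Active ingredient Ph Eur', None, None, None)

# Decimal line: the lookahead rejects the first alternative, groups 1-3 are None.
m2 = PATTERN.match('Croscarmellose sodium 22.0 mg Disintegrant Ph Eur')
print(m2.groups())
# (None, None, None, 'Croscarmellose sodium', '22.0 mg', 'Disintegrant Ph Eur')

# filter(None, ...) drops the None entries, leaving three fields.
print(list(filter(None, m2.groups())))
# ['Croscarmellose sodium', '22.0 mg', 'Disintegrant Ph Eur']

# Caveat: it also drops empty strings, so a line that ends right after
# the number yields only two fields.
m3 = PATTERN.match('Naproxen 500')
print(list(filter(None, m3.groups())))
# ['Naproxen', '500']
```

If every row must have exactly three fields, the empty-string case is worth keeping in mind, since filter(None, ...) treats '' the same way as None.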
Full Python demo, with the commented-out code removed:
import re

def show():
    newresult = ['Naproxen 500 Active ingredient Ph Eur','Croscarmellose sodium 22.0 mg Disintegrant Ph Eur','Povidone K90 11.0 Binder 56 Ph Eur','Water, purifieda','Silica, colloidal anhydrous 2.62 Glidant Ph Eur','Water purified 49 Solvent Ph Eur','Magnesium stearate 1.38 Lubricant Ph Eur']
    all_extract = []
    for i in newresult:
        # Alternative 1 handles lines without a decimal number,
        # alternative 2 handles lines that contain one.
        res = re.match(r'^(?:(?!.*\d\.\d)(.*?)\s*\b(\d+(?:\s*mg)?)\b\s*(.*)|((?:(?!\d+\.\d).)*?)\s*\b(\d+\.\d+(?:\s*mg)?)\b\s*(.*))$', i)
        if res:
            # Only one alternative participates, so drop the other one's None groups.
            all_extract.append(list(filter(None, res.groups())))
        else:
            # Reached when the pattern above does not match; 'Water, purifieda'
            # contains no number, so re.split returns the line unchanged as [i].
            print("ONLY INTEGER")
            regex_integer_part = re.split(r'\s+(\d+(?:\.\d+)?)\s+', i, maxsplit=1)
            all_extract.append(regex_integer_part)
    return all_extract

print(show())
Output:
[['Naproxen', '500', 'Active ingredient Ph Eur'], ['Croscarmellose sodium', '22.0 mg', 'Disintegrant Ph Eur'], ['Povidone K90', '11.0', 'Binder 56 Ph Eur'], ['Water, purifieda'], ['Silica, colloidal anhydrous', '2.62', 'Glidant Ph Eur'], ['Water purified', '49', 'Solvent Ph Eur'], ['Magnesium stearate', '1.38', 'Lubricant Ph Eur']]