
Extract the text between a number and a word


森栏 2021-06-07 16:01:21
I have a file with the following contents:

01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900
01009800  Motorola  Motorola T194 EOTD  GSM 1900
01009900  Option International  ,GSM 900
01009901  Option International  ,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900
01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900
01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190  0,GSM 900

Using a regular expression, I am trying to extract everything from the 8-digit number up to (but not including) the first occurrence of GSM, for example:

01009700  Samsung  Samsung SGH-N625
01009800  Motorola  Motorola T194 EOTD
01009900  Option International
01009902  Option International
01009919  Option International
01010000  Sierra Wireless Sierra Wireless Aircard
01010100  Sierra Wireless Sierra Wireless Aircard

I tried \d{8}.+(GSM)? but it does not seem to work. What is the correct regular expression?

1 Answer

暮色呼如


You can use

re.findall(r'\b(\d{8}.*?)\W*GSM', s)

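See the regex demo.

Details

  • \b - a word boundary

  • (\d{8}.*?) - Group 1: eight digits, then any zero or more characters other than line break characters, as few as possible

  • \W* - any zero or more non-word characters; this also matches line breaks, so a record whose ",GSM" continues on the next line is still captured

  • GSM - the literal substring GSM.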
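To see why the attempt in the question fails, here is a minimal sketch that runs both patterns against the first record only. The variable names (line, greedy, lazy) are just for illustration. The greedy .+ runs to the end of the line, and because (GSM)? is optional it is then allowed to match nothing, so nothing ever forces the match to stop before GSM.

```python
import re

# First record from the question, used as a single-line sample.
line = "01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900"

# The pattern from the question: greedy .+ consumes the rest of the line,
# and the optional group (GSM)? then matches the empty string.
greedy = re.search(r"\d{8}.+(GSM)?", line)
print(greedy.group(0))  # the whole line
print(greedy.group(1))  # None, the optional group did not participate

# The pattern from the answer: lazy .*? stops as soon as \W*GSM can match.
lazy = re.search(r"\b(\d{8}.*?)\W*GSM", line)
print(lazy.group(1))    # 01009700  Samsung  Samsung SGH-N625
```

Making the quantifier lazy and making GSM mandatory are both needed: a lazy .*? followed by an optional (GSM)? would stop right after the eight digits.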

Python demo

import re

s="""01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900  

01009800  Motorola  Motorola T194 EOTD  GSM 1900  


01009900  Option International  

,GSM 900  

01009901  Option International  


,GSM 1900,GSM 900 01009902 Option International ,GSM 1900,GSM 900 01009903 Option International ,GSM 1900,GSM 900 01009904 Option International ,GSM 1900,GSM 900 01009905 Option International ,GSM 1900,GSM 900 01009906 Option International ,GSM 1900,GSM 900 01009907 Option International ,GSM 1900,GSM 900 01009908 Option International ,GSM 1900,GSM 900 01009909 Option International ,GSM 1900,GSM 900 01009910 Option International ,GSM 1900,GSM 900 01009911 Option International ,GSM 1900,GSM 900 01009912 Option International ,GSM 1900,GSM 900 01009913 Option International ,GSM 1900,GSM 900 01009914 Option International ,GSM 1900,GSM 900 01009915 Option International ,GSM 1900,GSM 900 01009916 Option International ,GSM 1900,GSM 900 01009917 Option International ,GSM 1900,GSM 900 01009918 Option International ,GSM 1900,GSM 900 01009919 Option International ,GSM 1900,GSM 900 

Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 Option Internati. Globetrotter GSM 1800 

01010000  Sierra Wireless Sierra Wireless Aircard 710  GSM 1900  

01010100  Sierra Wireless Sierra Wireless Aircard 750  GSM 1800,GSM 190  

0,GSM 900 """

print(re.findall(r"\b(\d{8}.*?)\W*GSM", s))

Output:


['01009700  Samsung  Samsung SGH-N625', '01009800  Motorola  Motorola T194 EOTD', '01009900  Option International', '01009901  Option International', '01009902 Option International', '01009903 Option International', '01009904 Option International', '01009905 Option International', '01009906 Option International', '01009907 Option International', '01009908 Option International', '01009909 Option International', '01009910 Option International', '01009911 Option International', '01009912 Option International', '01009913 Option International', '01009914 Option International', '01009915 Option International', '01009916 Option International', '01009917 Option International', '01009918 Option International', '01009919 Option International', '01010000  Sierra Wireless Sierra Wireless Aircard 710', '01010100  Sierra Wireless Sierra Wireless Aircard 750']
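If the code and the description are needed separately, one possible variation (not part of the original answer) is to capture them in two groups. The sketch below assumes a hypothetical helper named extract_records and a shortened sample string; it also collapses runs of whitespace inside the description, which the original pattern does not do.

```python
import re

# Two capture groups: the 8-digit code, then the text up to the first GSM.
# \W* still absorbs trailing spaces, commas and line breaks before GSM.
RECORD = re.compile(r"\b(\d{8})\s+(.*?)\W*GSM")

def extract_records(text):
    """Return (code, description) pairs; hypothetical helper for illustration."""
    pairs = []
    for code, desc in RECORD.findall(text):
        # Collapse repeated spaces inside the description (an added assumption).
        pairs.append((code, " ".join(desc.split())))
    return pairs

# Shortened sample, including one record whose ",GSM" sits on the next line.
sample = (
    "01009700  Samsung  Samsung SGH-N625  GSM 1900,GSM 900\n"
    "01009900  Option International  \n"
    ",GSM 900\n"
)

print(extract_records(sample))
# [('01009700', 'Samsung Samsung SGH-N625'), ('01009900', 'Option International')]
```

With more than one group in the pattern, re.findall returns a list of tuples, which is why the loop can unpack code and desc directly.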



2021-06-09