3 回答
TA贡献1875条经验 获得超3个赞
您只是?在该(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)组之后遗漏了一个,以将其标记为不必要。另外,我+在最后两个组后面添加了一个后缀,以确保正则表达式不会将“ 2009年3月20日”之类的日期拆分为两个不同的日期。
完整代码:
import re
regEx = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)?[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)+(?:\d{2,4})+'
dateEntries = "04-20-2009; 04/20/09; 4/20/09; 4/3/09; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009; 20 Mar 2009; 20 March 2009; 2 Mar. 2009; 20 March, 2009; Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009; Feb 2009; Sep 2009; Oct 2010; 6/2008; 12/2009; 2009; 2010"
result = re.findall(regEx, dateEntries)
print(result)
如果您的日期有前导空格,则结果也将有前导空格。如果继续使用日期字符串,则可以使用以下方法将其删除:.strip()
TA贡献1982条经验 获得超2个赞
您的正则表达式模式是完全不可读的。请使用简单的构建块来构建您的正则表达式模式。这将使代码更具可读性
import re
import calendar
full_months = [month for month in calendar.month_name if month]
short_months = [d[:3] for d in full_months]
months = '|'.join(short_months + full_months)
sep = r'[.,]?\s+' # seperator
day = r'\d+'
year = r'\d+'
day_or_year = r'\d+(?:\w+)?'
r = re.compile(rf'(?:{day}{sep})?(?:{months}){sep}{day_or_year}(?:{sep}{year})?')
r.findall(dateEntries)
# ['Mar 20, 2009', 'March 20, 2009', 'Mar. 20, 2009', 'Mar 20 2009', '20 Mar 2009', '20 March 2009', '2 Mar. 2009', '20 March, 2009', 'Mar 20th, 2009', 'Mar 21st, 2009', 'Mar 22nd, 2009', 'Feb 2009', 'Sep 2009', 'Oct 2010']
TA贡献1799条经验 获得超9个赞
试试正则表达式:
^(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?(?:(?:(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)(?:(?:-|/)|(?:,|\.)?\s)?)?(?:\d{1,2}(?:(?:-|/)|(?:th|st|nd|rd)?\s))?)(?:\d{2,4})$
添加回答
举报