3 Answers
Contributor with 1868 experience points, more than 4 upvotes
You could try this:
^([a-z, \(\)-]*?)?\(?([\d,]+)?\)?\s*?\(?([\d,-]+)?\)?$
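Explanation
^ - anchors to the start of the string.
([a-z, \(\)-]*?)? - matches any character from a to z, a comma, a space, (, ) or -, zero or more times (lazy); the trailing ? makes the whole group optional.
\(? - matches a literal (; the ? makes it optional.
([\d,]+)? - matches any digit or comma, one or more times; the ? makes the group optional.
\)? - matches a literal ); the ? makes it optional.
\s*? - matches whitespace zero or more times (lazy).
\(?([\d,-]+)?\)? - matches any digit, comma or -, optionally wrapped in parentheses.
$ - end of the string.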
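As a rough sketch of how the pattern above could be applied in Python: the re.IGNORECASE flag is an assumption (the character class only lists a-z, while the sample descriptions are uppercase or mixed case), and split_line is a hypothetical helper name, not part of the answer.

```python
import re

# Pattern copied from the answer. re.IGNORECASE is an assumption so that
# [a-z] also matches uppercase descriptions.
PATTERN = re.compile(
    r'^([a-z, \(\)-]*?)?\(?([\d,]+)?\)?\s*?\(?([\d,-]+)?\)?$',
    re.IGNORECASE,
)


def split_line(line):
    """Return (description, first_amount, second_amount), or None if no match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    # Optional groups that did not take part in the match come back as None.
    description, first, second = (g or '' for g in match.groups())
    # The lazy description group may pick up the space before the first amount.
    return description.strip(), first, second


for sample in ("Subcontracts 8,058 2,655", "EXPENSES", "600,000 600,000"):
    print(split_line(sample))
```

Note that the parentheses around an amount sit outside the capture groups, so (900,000) is captured as 900,000, and a description containing a character outside the class (for example % or ;) does not match at all.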
Contributor with 1783 experience points, more than 4 upvotes
I think this regex will do what you want:
^([A-Z][A-Za-z0-9 (),%;-]+?[^(\d\s])? ?(?:(\(?[\d,]+\)?|-)\s+(\(?[\d,]+\)?|-))?$
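It looks for a run of letters and digits that starts with an uppercase letter, may include spaces and any of the characters ( ) , % ; - and does not end with a (, a digit, or whitespace. That description is followed by two groups, each of which is either digits and commas optionally wrapped in (), or a single -. All groups are optional, so that lines with no description or no numbers still match.
In Python: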
import re

# Sample financial-statement lines: a description followed by two amounts.
data = """LOSS BEFORE INCOME TAXES (900,000) (900,000)
INCOME TAXES (RECOVERED) (90,000) (90,000)
RETAINED EARNINGS - BEGINNING OF YEAR 9,999,999 9,999,999
EXPENSES
Subcontracts 8,058 2,655
Business taxes 116 -
600,000 600,000
GROSS PROFIT (50%; 2016 - 50%) 500,000 500,000
Bad debts - 50
Salaries, wages and benefits 400,000 400,000"""

# Raw string so the backslashes reach the regex engine unchanged;
# re.MULTILINE lets ^ and $ match at the start and end of every line.
regex = re.compile(r'^([A-Z][A-Za-z0-9 (),%;-]+?[^(\d\s])? ?(?:(\(?[\d,]+\)?|-)\s+(\(?[\d,]+\)?|-))?$', re.MULTILINE)
print(regex.findall(data))
Output:
[('LOSS BEFORE INCOME TAXES', '(900,000)', '(900,000)'),
('INCOME TAXES (RECOVERED)', '(90,000)', '(90,000)'),
('RETAINED EARNINGS - BEGINNING OF YEAR', '9,999,999', '9,999,999'),
('EXPENSES', '', ''),
('Subcontracts', '8,058', '2,655'),
('Business taxes', '116', '-'),
('', '600,000', '600,000'),
('GROSS PROFIT (50%; 2016 - 50%)', '500,000', '500,000'),
('Bad debts', '-', '50'),
('Salaries, wages and benefits', '400,000', '400,000')
]
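For readability, the same pattern can also be spelled out with re.VERBOSE so that each part carries its own comment. This is only a sketch of an equivalent layout, not something from the answer itself; the name LINE_RE and the shortened sample are illustrative assumptions.

```python
import re

# Same pattern as in the answer, laid out with re.VERBOSE. In verbose mode
# unescaped whitespace outside character classes is ignored, so the optional
# literal space is written as "[ ]?".
LINE_RE = re.compile(r"""
    ^
    (                            # group 1: optional description
        [A-Z]                    #   starts with an uppercase letter
        [A-Za-z0-9 (),%;-]+?     #   letters, digits, space, ( ) , % ; - (lazy)
        [^(\d\s]                 #   must not end with "(", a digit or whitespace
    )?
    [ ]?                         # optional single space before the amounts
    (?:
        ( \(? [\d,]+ \)? | - )   # group 2: amount, optionally in (), or "-"
        \s+
        ( \(? [\d,]+ \)? | - )   # group 3: same shape as group 2
    )?
    $
""", re.MULTILINE | re.VERBOSE)

sample = """INCOME TAXES (RECOVERED) (90,000) (90,000)
EXPENSES
Business taxes 116 -
600,000 600,000"""

for description, first, second in LINE_RE.findall(sample):
    print(repr(description), repr(first), repr(second))
```

Unlike the first answer's pattern, the parentheses here are inside the capture groups, so an amount such as (90,000) keeps its parentheses in the result.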