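I am trying to match strings in 59k rows with a regular expression. Naturally, I expected to get 59k results back, but only 10 rows come back. I feel this is a silly question, but I would still like to know what is going wrong here.
y = str(data[['geometry']])
z = re.findall(r"(?<=\()\d.*(?=\))", y)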
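The likely root cause is that str() on a DataFrame does not contain every row. It returns the same text pandas prints, which is truncated according to the display.max_rows and display.min_rows options, so re.findall only ever sees the handful of rows that survive truncation. A minimal sketch, assuming a hypothetical 59,000-row stand-in for data (the real frame is not shown in the question):

```python
import re
import pandas as pd

# Hypothetical stand-in for the question's 59k-row `data` frame.
data = pd.DataFrame({"geometry": [f"POINT ({i} {i + 1})" for i in range(59_000)]})
pattern = r"(?<=\()\d.*(?=\))"

# str() of a DataFrame is its printed representation; pandas truncates it
# once the row count exceeds pd.options.display.max_rows.
text = str(data[["geometry"]])
matches = re.findall(pattern, text)
print(len(matches))  # only the rows that survived truncation

# Applying the same pattern to each cell keeps every row.
per_row = data["geometry"].str.findall(pattern)
print(len(per_row))  # 59000
```

The exact number of surviving rows depends on the pandas version and display options, which is why the regex should be run on the column values rather than on a printed string.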
3 Answers
ibeautiful
You probably need str.findall together with tolist().
Example:
data['geometry'].str.findall(r"(?<=\()\d.*(?=\))").tolist()
Demo:
import pandas as pd

df = pd.DataFrame({'geometry': ['aa (123) bb (1.5)', 'aa (123) bb (1.5)', 'aa (123) bb (1.5)', 'aa (123) bb (1.5)']})
print(df['geometry'].str.findall(r"(?<=\()(\d.*?)(?=\))").tolist())
Output:
[['123', '1.5'], ['123', '1.5'], ['123', '1.5'], ['123', '1.5']]
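The one-liner above uses the greedy .* while the demo uses the lazy .*?. The difference only shows up when a cell contains more than one pair of parentheses, as a quick check with plain re on the demo string suggests:

```python
import re

text = "aa (123) bb (1.5)"

# Greedy: .* runs as far as it can, so the match spans both parenthesised parts.
greedy = re.findall(r"(?<=\()\d.*(?=\))", text)
print(greedy)  # ['123) bb (1.5']

# Lazy: .*? stops at the first ')', so each pair of parentheses is matched separately.
lazy = re.findall(r"(?<=\()(\d.*?)(?=\))", text)
print(lazy)  # ['123', '1.5']
```

If each geometry string holds exactly one pair of parentheses, both patterns give the same result.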
慕桂英3389331
I found my own solution using a loop:
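import re
from pandas import DataFrame

location = []
for geom in data['geometry']:
    # Run the regex on one cell at a time, not on the printed DataFrame
    latlon = re.findall(r"(?<=\()\d.*(?=\))", geom)
    location.append(latlon)
df_latlon = DataFrame(location)
df_latlon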
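For comparison, a self-contained sketch of the same loop next to the vectorised Series.str.findall call. It assumes a hypothetical two-row sample, since the real data is not shown; on that sample both approaches build the same DataFrame:

```python
import re
import pandas as pd

# Hypothetical sample; the real `data` frame is not shown in the thread.
data = pd.DataFrame({"geometry": ["POINT (10.5 20.25)", "POINT (30 40)"]})
pattern = r"(?<=\()\d.*(?=\))"

# Loop version: one re.findall call per cell.
location = [re.findall(pattern, geom) for geom in data["geometry"]]
df_loop = pd.DataFrame(location)

# Vectorised version: Series.str.findall applies the same pattern to every cell.
df_vec = pd.DataFrame(data["geometry"].str.findall(pattern).tolist())

print(df_loop.equals(df_vec))  # True
```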
繁花如伊
Use str.extract:
data['geometry'].str.extract(r'\((\d.*)\)', expand=False).tolist()
See the regex demo.
Explanation:
\(        a literal '('
(         group and capture to \1:
  \d      a digit (0-9)
  .*      any character except \n, 0 or more times (matching as many as possible)
)         end of \1
\)        a literal ')'
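A small sketch of what str.extract returns with this pattern, using hypothetical sample strings. With a single capture group and expand=False it yields one value per row (a Series), with NaN where the pattern does not match, so the result stays aligned with the original rows. Unlike str.findall, only the first match in each cell is kept:

```python
import pandas as pd

# Hypothetical sample values; the last one has no parentheses at all.
s = pd.Series(["POINT (10 20)", "POINT (1.5 2.5)", "EMPTY"])

# One capture group + expand=False -> a Series, one entry per input row.
out = s.str.extract(r"\((\d.*)\)", expand=False)
print(out.tolist())
```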