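How do I split a string on whitespace and any punctuation except the # character?
tweet = "I went on #Russia to see the world cup. We lost!"
I would like to split the string above like this:
["I", "went", "on", "#Russia", "to", "see", "the", "world", "cup", "We", "lost"]
My attempt:
p = re.compile(r"\w+|[^\w\s]", re.UNICODE)
This does not work because it produces "Russia" instead of "#Russia".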
3 Answers
守候你守候我
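Contributed 1802 experience points, received 10+ likes
Use the re.findall function with a character class that includes #:
import re
tweet = "I went on #Russia to see the world cup. We lost!"
# Match runs of word characters and '#'; everything else acts as a separator
words = re.findall(r'[\w#]+', tweet)
print(words)
Output: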
['I', 'went', 'on', '#Russia', 'to', 'see', 'the', 'world', 'cup', 'We', 'lost']
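The reason the pattern in the question fails, and one possible tightening of the pattern above, can be sketched as follows. The name HASHTAG_OR_WORD and the pattern `#?\w+` are illustrative assumptions, not part of the original answer; the variant only accepts a `#` directly in front of a word, whereas `[\w#]+` accepts `#` anywhere in a token.

```python
import re

tweet = "I went on #Russia to see the world cup. We lost!"

# Pattern from the question: '#' is not a word character, so it is
# matched by [^\w\s] as a separate one-character token.
original = re.findall(r"\w+|[^\w\s]", tweet)

# Hypothetical variant: an optional leading '#' followed by word characters.
HASHTAG_OR_WORD = re.compile(r"#?\w+")
words = HASHTAG_OR_WORD.findall(tweet)
print(words)
```

Which pattern fits better depends on the data: for input such as "a#b", `[\w#]+` returns a single token while `#?\w+` returns "a" and "#b".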
牧羊人nacy
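Contributed 1862 experience points, received 7+ likes
Use re.sub
Example:
import re
tweet="I went on #Russia to see the world cup. We lost!"
# Split on whitespace, then strip every character that is neither a word character nor '#'
res = list(map(lambda x: re.sub(r"[^\w#]", "", x), tweet.split()))
print(res)
Output: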
['I', 'went', 'on', '#Russia', 'to', 'see', 'the', 'world', 'cup', 'We', 'lost']
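The two approaches give the same result for this tweet, but they can differ on other input. The sketch below is only an illustration: the function names and the sample string are made up for the comparison. Splitting on whitespace first keeps a hyphenated word as one token and can leave empty strings behind for tokens that consist only of punctuation, so this version filters those out.

```python
import re

def tokens_findall(text):
    # Every run of word characters or '#' becomes a token.
    return re.findall(r"[\w#]+", text)

def tokens_split_sub(text):
    # Split on whitespace, strip unwanted characters from each piece,
    # then drop pieces that became empty (e.g. a lone "...").
    cleaned = (re.sub(r"[^\w#]", "", tok) for tok in text.split())
    return [tok for tok in cleaned if tok]

sample = "We lost the world-cup ... #sad"
print(tokens_findall(sample))    # ['We', 'lost', 'the', 'world', 'cup', '#sad']
print(tokens_split_sub(sample))  # ['We', 'lost', 'the', 'worldcup', '#sad']
```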