
How do I format tweets collected via the Twitter API in Python?


Python

幕布斯6054654 2021-03-19 14:15:10

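I collected some tweets through the Twitter API and then counted the words by splitting each tweet with split(' ') in Python. However, some of the words come out like this: correct! correct.,correctblah"... So how can I strip the punctuation from the tweets? Or should I try a different way of splitting them? Thanks.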


3 Answers

斯蒂芬大帝

1827 contribution points, more than 8 upvotes

You can use re.split...

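from string import punctuation
import re

# Match any single ASCII punctuation mark or whitespace character
puncrx = re.compile(r'[{}\s]'.format(re.escape(punctuation)))
# Split on those characters and drop the empty strings left between adjacent separators
print(list(filter(None, puncrx.split(your_tweet))))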
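As a rough illustration of the re.split approach, here is a minimal self-contained sketch. The helper name split_words and the sample string are made up for this example; the sample is loosely modelled on the words from the question.

```python
import re
from string import punctuation

# Character class matching any ASCII punctuation mark or whitespace character.
PUNCT_OR_SPACE = re.compile(r'[{}\s]'.format(re.escape(punctuation)))


def split_words(text):
    """Split text on punctuation/whitespace and drop the empty pieces."""
    return [piece for piece in PUNCT_OR_SPACE.split(text) if piece]


# Hypothetical sample input, not real API output.
sample = 'correct! correct., correct "blah"'
print(split_words(sample))
# ['correct', 'correct', 'correct', 'blah']
```

Note that string.punctuation includes # and @, so with this pattern hashtags and mentions lose their prefix: '#python' becomes 'python'.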

Alternatively, just find the words, treating a word as a run of certain consecutive characters:

print(re.findall(r'[\w#@]+', your_tweet))

For example:

print(re.findall(r'[\w@#]+', 'talking about #python with @someone is so much fun! Is there a 140 char limit? So not cool!'))

# ['talking', 'about', '#python', 'with', '@someone', 'is', 'so', 'much', 'fun', 'Is', 'there', 'a', '140', 'char', 'limit', 'So', 'not', 'cool']

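I originally had a smiley in the example, but of course smileys end up being filtered out by this approach, so that is something to watch out for.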
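Since the original goal was to count words, the findall approach could be combined with collections.Counter. The sketch below is one possible way to do that, not the answerer's code: the count_words helper, the sample tweets, and the extra pattern for a handful of simple smileys such as :) and :-( are all assumptions added for illustration.

```python
import re
from collections import Counter

# First alternative: a few simple ASCII smileys such as :) ;-) :D :-(
# Second alternative: runs of word characters, '@' and '#', as in the answer.
TOKEN = re.compile(r'[:;]-?[()DP]|[\w@#]+')


def count_words(tweets):
    """Count tokens across tweets, lowercasing words but leaving smileys alone."""
    counts = Counter()
    for text in tweets:
        for token in TOKEN.findall(text):
            if token[0] in ':;':
                counts[token] += 1          # smiley: keep exactly as written
            else:
                counts[token.lower()] += 1  # word, hashtag or mention
    return counts


# Hypothetical sample tweets, not real API output.
tweets = ['Python is fun! :)', 'is #python fun? :) So much FUN']
counts = count_words(tweets)
print(counts['fun'], counts[':)'], counts['#python'])
# 3 2 1
```

The smiley pattern is deliberately narrow; other emoticons and emoji would still be dropped and need their own handling.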

2021-03-23

江户川乱折腾

1851 contribution points, more than 5 upvotes

I suggest using the following code to strip special symbols from the text:

tweet_object["text"] = re.sub(u'[!?@#$.,#:\u2026]', '', tweet_object["text"])

You need to import re first, before you can call its sub function:

import re
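Putting the import and the substitution together, a minimal end-to-end sketch might look like this. The tweet_object dictionary is a hand-written stand-in for a parsed API response, and the sample text is invented for the example.

```python
import re

# Same characters as in the answer: ! ? @ # $ . , : and the ellipsis U+2026.
STRIP_CHARS = re.compile('[!?@#$.,:\u2026]')

# Hypothetical stand-in for a tweet parsed from the API response.
tweet_object = {"text": "Correct! #python is fun\u2026 right, @someone?"}

cleaned = STRIP_CHARS.sub('', tweet_object["text"])
# split() with no argument splits on any run of whitespace,
# so it never produces empty strings the way split(' ') can.
words = cleaned.split()
print(words)
# ['Correct', 'python', 'is', 'fun', 'right', 'someone']
```

Only the listed characters are removed, so anything else (quotation marks, parentheses, semicolons) stays attached to the words unless it is added to the character class.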

2021-03-23
