Please at least read the comments:
from collections import Counter
from nltk import word_tokenize, ngrams
text='''Joli appartement s3 aux jardins de carthage mz823
Villa 600m2 haut standing à hammamet
Hammem lif
S2 manzah 7
Terrain constructible de 252m2 clôturé
Terrain nu a gammarth
Terrain agrecole al fahes
Bureau 17 pièces
Usine 5000m2 mannouba'''
# Create a counter object to track ngrams and counts.
ngram_counters = Counter()

# Split the text into sentences.
# For now, assume '\n' delimits the sentences.
for line in text.split('\n'):
    # Update the counters with ngrams from each sentence.
    ngram_counters.update(ngrams(word_tokenize(line), n=3))

# Open a file to print out.
with open('ngram_counts.tsv', 'w') as fout:
    # Iterate through the counter object, like a dictionary.
    for ng, counts in ngram_counters.items():
        # Use space to join the tokens in the ngram before printing.
        # Print the counts in a separate column.
        print(' '.join(ng) + '\t' + str(counts), file=fout)
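If nltk is not installed, the same per-line trigram counting can be sketched with the standard library alone. This is an assumption-laden substitute, not the answer's method: plain whitespace `str.split` stands in for `word_tokenize`, so punctuation is not split off the way nltk would do it, and the sample text here is hypothetical.

```python
from collections import Counter

def word_trigrams(tokens):
    """Yield consecutive 3-token windows from a token list."""
    return zip(tokens, tokens[1:], tokens[2:])

# Hypothetical sample text standing in for the listings above.
text = "a b c d\na b c"

counts = Counter()
for line in text.split('\n'):
    # Whitespace tokenization instead of nltk's word_tokenize.
    counts.update(word_trigrams(line.split()))

# ('a', 'b', 'c') occurs on both lines, so its count is 2.
```

For n = 3, `zip(tokens, tokens[1:], tokens[2:])` produces the same tuples as `nltk.ngrams(tokens, n=3)`, so the rest of the answer (updating the `Counter` and writing a TSV) works unchanged on top of it.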