I want to list all the text files in a directory and then build a separate list of the contents of each file, for example document1 = [], document2 = [], and so on. Then, using the names document1 and document2, I want to compute word frequencies and do other processing. The code runs, but I cannot get it to give each list a different name such as document1.

    import glob
    import math
    import re

    a = 0
    flist = glob.glob(r'D:/Final Year Project/Development process/Text_data_extraction/MyFolder/*.txt')  # get all the files from the d

    # open each file >> tokenize the content >> and store it in a set
    for fname in flist:
        tfile = open(fname, "r")
        line = tfile.read()
        a += 1
        line = line.lower()  # lowercase
        line = re.sub("</?.*?>", " <> ", line)  # remove tags
        line = re.sub("(\\d|\\W)+", " ", line)  # remove special characters and digits
        l_ist = line.split("\n")
        print('document')
        print(l_ist)
        tfile.close()  # close the file

    print("Number of documents:")
    print(a)
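One possible approach, offered as a sketch rather than the definitive answer: instead of creating variables named document1, document2, and so on, store each list in a dictionary under those names. The example below assumes Python 3. The helper names `tokenize`, `load_documents`, and `word_frequencies` are made up for illustration, and the `MyFolder` path in the usage section is a placeholder for the real folder. Because the second substitution in the question already turns newlines into spaces, the sketch splits on whitespace to get individual words rather than splitting on "\n".

```python
import glob
import os
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip tags, and split it into words."""
    text = text.lower()
    text = re.sub(r"</?.*?>", " <> ", text)  # remove tags, as in the question
    text = re.sub(r"(\d|\W)+", " ", text)    # remove special characters and digits
    return text.split()                      # split on any whitespace


def load_documents(pattern):
    """Return a dict mapping 'document1', 'document2', ... to word lists."""
    documents = {}
    # sorted() makes the numbering stable across runs
    for index, fname in enumerate(sorted(glob.glob(pattern)), start=1):
        with open(fname, "r") as tfile:      # the with block closes the file
            documents["document%d" % index] = tokenize(tfile.read())
    return documents


def word_frequencies(documents):
    """Return a dict mapping each document name to a Counter of its words."""
    return {name: Counter(words) for name, words in documents.items()}


if __name__ == "__main__":
    # Hypothetical folder; substitute the real path to the .txt files.
    documents = load_documents(os.path.join("MyFolder", "*.txt"))
    print("Number of documents:", len(documents))
    for name, counts in word_frequencies(documents).items():
        print(name, counts.most_common(5))
```

With this layout, `documents["document1"]` plays the role of the `document1` list from the question, and `word_frequencies(documents)["document1"]["some_word"]` gives the count of a word in that file. A dictionary is used here because the number of files is only known at run time, so the names cannot be written out in advance as separate variables.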