I'm trying to run a classifier on some movie review data. The data has already been split into reviews_train.txt and reviews_test.txt. I load the data, split each dataset into reviews and labels (positive (0) or negative (1)), and then vectorize them. Here is my code:

```python
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.feature_extraction.text import TfidfVectorizer

#read the reviews and their polarities from a given file
def loadData(fname):
    reviews=[]
    labels=[]
    f=open(fname)
    for line in f:
        review,rating=line.strip().split('\t')
        reviews.append(review.lower())
        labels.append(int(rating))
    f.close()
    return reviews,labels

rev_train,labels_train=loadData('reviews_train.txt')
rev_test,labels_test=loadData('reviews_test.txt')

#vectorizing the input
vectorizer = TfidfVectorizer(ngram_range=(1,2))
vectors_train = vectorizer.fit_transform(rev_train)
vectors_test = vectorizer.fit_transform(rev_test)

clf = tree.DecisionTreeClassifier()
clf = clf.fit(vectors_train, labels_train)

#prediction
pred=clf.predict(vectors_test)
#print accuracy
print (accuracy_score(pred,labels_test))
```

But I keep getting this error:

```
ValueError: Number of features of the model must match the input. Model n_features is 118686 and input n_features is 34169
```

I'm very new to Python, so I apologize in advance if this is a simple fix.
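The mismatch in the error message lines up with how the vectorizer calls behave: `fit_transform` learns a new vocabulary from whatever text it is given, so calling it on the test reviews produces a matrix whose columns correspond to the test set's own n-grams, not the training set's. The sketch below is a minimal illustration under assumptions: the short in-memory lists are hypothetical stand-ins for reviews_train.txt and reviews_test.txt, and `random_state=0` is added only to make the tree deterministic. It contrasts refitting on the test text with calling `transform`, which reuses the vocabulary and idf weights learned from the training set.

```python
from sklearn import tree
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score

# Hypothetical toy data standing in for the contents of the two review files
rev_train = ["a great movie", "truly awful plot",
             "great acting and great story", "awful and boring"]
labels_train = [0, 1, 0, 1]
rev_test = ["great story", "boring plot"]
labels_test = [0, 1]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
vectors_train = vectorizer.fit_transform(rev_train)

# Refitting a vectorizer on the test text builds a second, smaller vocabulary,
# so the number of columns no longer matches what the model was trained on
refit_test = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(rev_test)
print("train columns:", vectors_train.shape[1], "refit test columns:", refit_test.shape[1])

# transform() maps the test text onto the training vocabulary;
# n-grams never seen in training are simply ignored
vectors_test = vectorizer.transform(rev_test)
print("transformed test columns:", vectors_test.shape[1])

clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(vectors_train, labels_train)
pred = clf.predict(vectors_test)
acc = accuracy_score(labels_test, pred)
print(acc)
```

Applied to the original code, this corresponds to keeping `vectorizer.fit_transform(rev_train)` and changing only the test line to `vectors_test = vectorizer.transform(rev_test)`.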