Bad input shape for logistic regression in sentiment analysis

I want to use logistic regression to predict the accuracy of a sentiment analysis model, but I get an error: bad input shape (edited to include the input).

Data frame `df`:

sentence | polarity_label
new release! | positive
buy | neutral
least good-looking | negative

Code:

```
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, ENGLISH_STOP_WORDS

# Define the set of stop words
my_stop_words = ENGLISH_STOP_WORDS
vect = CountVectorizer(max_features=5000,stop_words=my_stop_words)
vect.fit(df.sentence)
X = vect.transform(df.sentence)

y = df.polarity_label
encoder = OneHotEncoder()
encoder.fit_transform(y)

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=123)
LogisticRegression(penalty='l2',C=1.0)
log_reg = LogisticRegression().fit(X_train, y_train)
```

Error message:

```
ValueError: Expected 2D array, got 1D array instead:
array=['Neutral' 'Positive' 'Positive' ... 'Neutral' 'Neutral' 'Neutral'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```

How can I fix this?


2 Answers

子衿沉夜

Contributed 1828 experience points, received 3+ upvotes

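I think you need to convert your y labels to a one-hot encoding. Right now your label vector probably looks like [0,1,0,0,1,0], but for logistic regression you would convert it to the form [[1,0],[0,1],[1,0],[1,0],[0,1],[1,0]], because in logistic regression we tend to compute the probability/likelihood of every class.

You can use sklearn's OneHotEncoder to do this: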

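from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
# OneHotEncoder expects a 2D array, so reshape the 1D label column first
y_onehot = encoder.fit_transform(y.to_numpy().reshape(-1, 1))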
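The error in the question comes from the shape of the array handed to the encoder, so here is a minimal sketch of that behaviour. The label array is hypothetical and stands in for `df.polarity_label`; the names `raised` and `y_onehot` are introduced only for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical labels standing in for df.polarity_label
y = np.array(["positive", "neutral", "negative", "neutral"])

encoder = OneHotEncoder()

# Passing the 1D array directly raises
# "Expected 2D array, got 1D array instead"
try:
    encoder.fit_transform(y)
    raised = False
except ValueError:
    raised = True

# Reshaping to a single column of shape (n_samples, 1) satisfies the encoder
y_onehot = encoder.fit_transform(y.reshape(-1, 1))
print(raised, y_onehot.shape)
```

Note that scikit-learn's `LogisticRegression.fit` takes `y` as a 1D array of class labels (strings are accepted), so the one-hot step is only needed if some other part of the pipeline requires it.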

2023-07-27

大话西游666

Contributed 1817 experience points, received 14+ upvotes

Adjust your code like this:

y = df.polarity_label

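Currently, you are trying to transform y into a vector with the CountVectorizer, which was fitted on the sentence data.

So the CountVectorizer has this vocabulary (you can get it with vect.get_feature_names_out(), or vect.get_feature_names() in older scikit-learn versions):

['buy', 'good', 'looking', 'new', 'release']

and it converts any text containing these words into a vector.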

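However, when you use it on y, which contains only the words positive, neutral and negative, it does not find any "known" words, so your y is empty.

If you inspect y after the transformation, you can also see that it is empty: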

<3x5 sparse matrix of type '<class 'numpy.int64'>'
    with 0 stored elements in Compressed Sparse Row format>
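Putting the adjustment together, a corrected pipeline might look like the sketch below. The toy data frame is made up for illustration (only its first three rows come from the question), and `accuracy` is a name introduced here; treat it as one possible arrangement, not the asker's final code.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data shaped like the question's df
df = pd.DataFrame({
    "sentence": [
        "new release!", "buy", "least good-looking", "great product",
        "terrible quality", "arrived today", "love it",
        "awful experience", "ships monday", "excellent value",
    ],
    "polarity_label": [
        "positive", "neutral", "negative", "positive",
        "negative", "neutral", "positive",
        "negative", "neutral", "positive",
    ],
})

# Vectorize only the sentences; X is a 2D sparse matrix
vect = CountVectorizer(max_features=5000, stop_words="english")
X = vect.fit_transform(df.sentence)

# Leave the labels as a plain 1D column of strings
y = df.polarity_label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# The L2 penalty is scikit-learn's default, so only C is set explicitly
log_reg = LogisticRegression(C=1.0).fit(X_train, y_train)

# Mean accuracy on the held-out sentences
accuracy = log_reg.score(X_test, y_test)
print(accuracy)
```

With so few rows the accuracy value itself is meaningless; the point is only that `fit` accepts a 2D `X` together with 1D string labels.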

2023-07-27
