ML.NET Tutorial: Sentiment Analysis (a Binary Classification Problem)

Tags: machine learning

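Understanding the problem

The problem this tutorial addresses is how to take appropriate action based on the opinions expressed in comments on a website.

In the available training dataset, every website comment is labeled as either toxic (1) or not toxic (0). For this kind of scenario, the classification task in machine learning is the best fit.

A classification task separates data into categories, types, or classes. Common examples include:

Identifying whether a sentiment is positive or negative
Sorting email into spam and non-spam
Determining whether a patient's lab sample is cancerous
Grouping customers by their preferences in order to respond to a sales campaign

Classification can be binary or multiclass. The problem here is a binary classification problem.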

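Preparing the data

First, create a console application that targets .NET Core. Once the project is set up, add the Microsoft.ML NuGet package. Then create a folder named Data inside the project.

Next, download wikipedia-detox-250-line-data.tsv and wikipedia-detox-250-line-test.tsv and place them in the Data folder. Note that the Copy to Output Directory property of both files must be changed to Copy if newer.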

Loading the data

Add the following code to the Main method in Program.cs:

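// Create the ML.NET context; a fixed seed makes runs repeatable.
MLContext mlContext = new MLContext(seed: 0);

// Describe the layout of the tab-separated data files.
// _textLoader is the static TextLoader field declared in Program (see the complete listing).
_textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
{
    Separator = "tab",
    HasHeader = true,
    Column = new[]
    {
        new TextLoader.Column("Label", DataKind.Bool, 0),
        new TextLoader.Column("SentimentText", DataKind.Text, 1)
    }
});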

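This code uses the TextLoader class to prepare for loading the data.

The Column property holds two objects, one for each of the two columns in the dataset. Note that the first column must be named Label here, not Sentiment.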
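To make the two-column schema concrete, here is a minimal sketch in plain C# of how one line of such a file could map to a label and a text value. It assumes each data line has the form 1<TAB>comment text and that the caller has already skipped the header row. The names TsvRowSketch and ParseRow are hypothetical, and this is not how TextLoader itself is implemented.

```csharp
using System;

public static class TsvRowSketch
{
    // Maps one tab-separated line to (Label, SentimentText):
    // column 0 is the label, column 1 is the comment text.
    public static (bool Label, string SentimentText) ParseRow(string line)
    {
        // Split on the first tab only, so any later tabs stay in the text.
        string[] parts = line.Split(new[] { '\t' }, 2);
        if (parts.Length != 2)
            throw new FormatException("Expected two tab-separated columns.");

        // Assumption: "1" marks a toxic comment, anything else is not toxic.
        bool label = parts[0].Trim() == "1";
        return (label, parts[1]);
    }
}
```

Calling TsvRowSketch.ParseRow("1\tyou are rude") would yield a true label together with the text "you are rude".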

Extracting features

Create a new file named SentimentData.cs and add the SentimentData and SentimentPrediction classes to it.

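// Input schema: one row of the dataset.
public class SentimentData
{
    // Column 0 is the label; it is exposed to the pipeline under the name "Label".
    [Column(ordinal: "0", name: "Label")]
    public float Sentiment;

    // Column 1 is the comment text used as the feature.
    [Column(ordinal: "1")]
    public string SentimentText;
}

// Output schema: what the trained model returns for one input.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    [ColumnName("Probability")]
    public float Probability { get; set; }

    [ColumnName("Score")]
    public float Score { get; set; }
}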

In the SentimentData class, SentimentText is the feature of the input dataset and Sentiment is its label.

The SentimentPrediction class is used for predictions once the model has been trained.

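Training the model

Add a Train method to the Program class. It first reads the training dataset, then converts the text in the feature column into a floating-point array (the Features column) and specifies the trainer, a decision-tree-based binary classifier (FastTree). Finally, it fits the pipeline, which is the step that actually trains the model.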

public static ITransformer Train(MLContext mlContext, string dataPath)
{
    // Load the training data using the schema defined in Main.
    IDataView dataView = _textLoader.Read(dataPath);

    // Turn the text into a numeric Features column, then append the FastTree trainer.
    var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
        .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));

    Console.WriteLine("=============== Create and Train the Model ===============");
    var model = pipeline.Fit(dataView);
    Console.WriteLine("=============== End of training ===============");
    Console.WriteLine();

    return model;
}
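The idea of turning text into a floating-point array can be illustrated with a simple word-count (bag-of-words) sketch in plain C#. This is a deliberately simplified stand-in and does not claim to reproduce what FeaturizeText does internally. The names BagOfWordsSketch, Tokenize, BuildVocabulary, and Featurize are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BagOfWordsSketch
{
    static readonly char[] Separators = { ' ', '\t', ',', '.', '!', '?' };

    // Lower-cases the text and splits it into word tokens.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Assigns each distinct word in the corpus an index, in order of first appearance.
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> corpus)
    {
        var vocab = new Dictionary<string, int>();
        foreach (string text in corpus)
            foreach (string token in Tokenize(text))
                if (!vocab.ContainsKey(token))
                    vocab.Add(token, vocab.Count);
        return vocab;
    }

    // Counts how often each vocabulary word occurs in the text.
    // Words that are not in the vocabulary are ignored.
    public static float[] Featurize(string text, Dictionary<string, int> vocab)
    {
        var features = new float[vocab.Count];
        foreach (string token in Tokenize(text))
            if (vocab.TryGetValue(token, out int index))
                features[index] += 1f;
        return features;
    }
}
```

With a vocabulary built from "rude movie" and "nice movie", the text "Rude rude film" maps to the vector [2, 0, 0]: two occurrences of "rude", none of "movie" or "nice", and the unknown word "film" is dropped.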

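Evaluating the model

Add an Evaluate method. This step reads the test dataset, and the loaded data still has to be passed through the model so that it is transformed into the form the evaluator expects.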

public static void Evaluate(MLContext mlContext, ITransformer model)
{
    IDataView dataView = _textLoader.Read(_testDataPath);
    Console.WriteLine("=============== Evaluating Model accuracy with Test data===============");
    var predictions = model.Transform(dataView);
    var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
    Console.WriteLine();
    Console.WriteLine("Model quality metrics evaluation");
    Console.WriteLine("--------------------------------");
    Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
    Console.WriteLine($"Auc: {metrics.Auc:P2}");
    Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
    Console.WriteLine("=============== End of model evaluation ===============");
}
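For readers unfamiliar with the metrics printed above, the following plain C# sketch shows how accuracy and the F1 score follow from true labels and predicted labels, using the standard definitions. It is an illustration only and not the evaluator's implementation. AUC is left out because it needs the raw scores rather than the predicted labels. The names BinaryMetricsSketch and Compute are hypothetical.

```csharp
using System;

public static class BinaryMetricsSketch
{
    // Computes accuracy and F1 from true labels and predicted labels.
    public static (double Accuracy, double F1) Compute(bool[] labels, bool[] predictions)
    {
        if (labels.Length == 0 || labels.Length != predictions.Length)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");

        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < labels.Length; i++)
        {
            if (predictions[i] && labels[i]) tp++;        // true positive
            else if (predictions[i] && !labels[i]) fp++;  // false positive
            else if (!predictions[i] && labels[i]) fn++;  // false negative
            else tn++;                                    // true negative
        }

        double accuracy = (double)(tp + tn) / labels.Length;

        // Precision and recall default to 0 when their denominator is 0.
        double precision = tp + fp == 0 ? 0.0 : (double)tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double)tp / (tp + fn);

        // F1 is the harmonic mean of precision and recall.
        double f1 = precision + recall == 0 ? 0.0
            : 2 * precision * recall / (precision + recall);

        return (accuracy, f1);
    }
}
```

Accuracy treats every mistake alike, while F1 concentrates on the positive (toxic) class, which is why the two figures can differ on the same test set.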

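Using the model

Once the model has been trained and evaluated, it can be put to use. This requires creating a prediction object (PredictionFunction) whose Predict method takes a SentimentData instance as input and returns a SentimentPrediction.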

private static void Predict(MLContext mlContext, ITransformer model)
{
    var predictionFunction = model.MakePredictionFunction<SentimentData, SentimentPrediction>(mlContext);
    SentimentData sampleStatement = new SentimentData
    {
        SentimentText = "This is a very rude movie"
    };
    var resultprediction = predictionFunction.Predict(sampleStatement);

    Console.WriteLine();
    Console.WriteLine("=============== Prediction Test of model with a single sample and test dataset ===============");

    Console.WriteLine();
    Console.WriteLine($"Sentiment: {sampleStatement.SentimentText} | Prediction: {(Convert.ToBoolean(resultprediction.Prediction) ? "Toxic" : "Not Toxic")} | Probability: {resultprediction.Probability} ");
    Console.WriteLine("=============== End of Predictions ===============");
    Console.WriteLine();
}

Complete sample code

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Core.Data;
using Microsoft.ML.Runtime.Data;
using Microsoft.ML.Transforms.Text;

namespace SentimentAnalysis
{
    class Program
    {
        static readonly string _trainDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-data.tsv");
        static readonly string _testDataPath = Path.Combine(Environment.CurrentDirectory, "Data", "wikipedia-detox-250-line-test.tsv");
        static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");
        static TextLoader _textLoader;

        static void Main(string[] args)
        {
            MLContext mlContext = new MLContext(seed: 0);

            _textLoader = mlContext.Data.TextReader(new TextLoader.Arguments()
            {
                Separator = "tab",
                HasHeader = true,
                Column = new[]
                {
                    new TextLoader.Column("Label", DataKind.Bool, 0),
                    new TextLoader.Column("SentimentText", DataKind.Text, 1)
                }
            });

            var model = Train(mlContext, _trainDataPath);

            Evaluate(mlContext, model);

            Predict(mlContext, model);

            Console.Read();
        }

        public static ITransformer Train(MLContext mlContext, string dataPath)
        {
            IDataView dataView = _textLoader.Read(dataPath);
            var pipeline = mlContext.Transforms.Text.FeaturizeText("SentimentText", "Features")
                .Append(mlContext.BinaryClassification.Trainers.FastTree(numLeaves: 50, numTrees: 50, minDatapointsInLeaves: 20));

            Console.WriteLine("=============== Create and Train the Model ===============");
            var model = pipeline.Fit(dataView);
            Console.WriteLine("=============== End of training ===============");
            Console.WriteLine();

            return model;
        }

        public static void Evaluate(MLContext mlContext, ITransformer model)
        {
            IDataView dataView = _textLoader.Read(_testDataPath);
            Console.WriteLine("=============== Evaluating Model accuracy with Test data===============");
            var predictions = model.Transform(dataView);
            var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
            Console.WriteLine();
            Console.WriteLine("Model quality metrics evaluation");
            Console.WriteLine("--------------------------------");
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"Auc: {metrics.Auc:P2}");
            Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
            Console.WriteLine("=============== End of model evaluation ===============");
        }

        private static void Predict(MLContext mlContext, ITransformer model)
        {
            var predictionFunction = model.MakePredictionFunction<SentimentData, SentimentPrediction>(mlContext);
            SentimentData sampleStatement = new SentimentData
            {
                SentimentText = "This is a very rude movie"
            };
            var resultprediction = predictionFunction.Predict(sampleStatement);

            Console.WriteLine();
            Console.WriteLine("=============== Prediction Test of model with a single sample and test dataset ===============");

            Console.WriteLine();
            Console.WriteLine($"Sentiment: {sampleStatement.SentimentText} | Prediction: {(Convert.ToBoolean(resultprediction.Prediction) ? "Toxic" : "Not Toxic")} | Probability: {resultprediction.Probability} ");
            Console.WriteLine("=============== End of Predictions ===============");
            Console.WriteLine();
        }
    }
}

Output of the program when run:

=============== Create and Train the Model ===============
=============== End of training ===============

=============== Evaluating Model accuracy with Test data===============

Model quality metrics evaluation
--------------------------------
Accuracy: 83.33%
Auc: 98.77%
F1Score: 85.71%
=============== End of model evaluation ===============

=============== Prediction Test of model with a single sample and test dataset ===============

Sentiment: This is a very rude movie | Prediction: Toxic | Probability: 0.7387648
=============== End of Predictions ===============

As the output shows, when asked to predict the comment "This is a very rude movie", the model judges it to be toxic :-)

Original source: https://www.cnblogs.com/kenwoo/p/10093362.html
