Machine Learning: A Concise Tutorial
Machine Learning - Semi-supervised
Semi-supervised machine learning algorithms are neither fully supervised nor fully unsupervised. They fall between the two, combining elements of supervised and unsupervised learning methods.
Semi-supervised algorithms generally combine a small supervised learning component, i.e., a small amount of pre-labeled (annotated) data, with a large unsupervised learning component, i.e., a large amount of unlabeled data, for training.
We can follow either of the following approaches to implement semi-supervised learning:
- The first and simplest approach is to build a supervised model from the small set of labeled (annotated) data, then apply that model to the large amount of unlabeled data to obtain more labeled samples. Train the model again on the enlarged labeled set and repeat the process.
- The second approach takes some extra effort. Here we first use unsupervised methods to cluster similar data samples, annotate the resulting groups, and then use the combined information to train the model.
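The first approach is often called self-training. As a minimal sketch, assuming a nearest-centroid classifier as the supervised model and a distance margin as the confidence measure (neither is specified in this tutorial, and the names `centroids`, `predict`, `self_train`, and `min_margin` are hypothetical), the loop might look like this in plain Python:

```python
import math

def centroids(points, labels):
    """Fit the stand-in supervised model: the mean point of each class."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in groups.items()}

def predict(model, point):
    """Return (label, margin). The margin is the distance gap between the
    two nearest centroids and serves as an assumed confidence score."""
    dists = sorted((math.dist(point, c), lab) for lab, c in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly train, pseudo-label confident points, and retrain."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled, labels)
        confident, rest = [], []
        for p in pool:
            lab, margin = predict(model, p)
            (confident if margin >= min_margin else rest).append((p, lab))
        if not confident:          # nothing new to learn from, so stop
            break
        for p, lab in confident:   # promote pseudo-labeled points
            labeled.append(p)
            labels.append(lab)
        pool = [p for p, _ in rest]
    return centroids(labeled, labels), pool

# Two labeled points and five unlabeled ones (made-up toy data).
model, leftover = self_train(
    labeled=[(0, 0), (10, 10)],
    labels=["a", "b"],
    unlabeled=[(1, 1), (9, 9), (0, 2), (10, 8), (5, 5)],
)
print(leftover)  # points the model never became confident about
```

In this toy data the point `(5, 5)` sits equally far from both classes, so it is never pseudo-labeled. Leaving ambiguous points out is a deliberate choice in this sketch: a wrong pseudo-label would be fed back into training and could reinforce itself in later rounds.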
The algorithm is trained on a dataset that contains both labeled and unlabeled data. Semi-supervised learning is generally used when a large set of unlabeled data is available. In supervised learning, the training data has to be labeled manually, which can be quite an expensive process. In contrast, the unlabeled data used in unsupervised learning has limited applications on its own. Hence, semi-supervised learning algorithms were developed to strike a balance between the two.
Semi-supervised learning algorithms are applied in text classification, image classification, speech analysis, anomaly detection, and similar tasks, where the general goal is to classify an entity into a predefined category. A semi-supervised algorithm assumes that the data can be divided into discrete clusters and that data points close to each other are more likely to share the same output label.
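The cluster assumption described above is also what makes the cluster-then-annotate approach workable. The following is a rough sketch only, assuming k-means as the clustering method and a majority vote over the few known labels as a stand-in for manual annotation of each group. The tutorial names neither, and `kmeans`, `cluster_then_label`, and `known` are hypothetical names.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Centers start at the first k points,
    a simplification chosen to keep the sketch deterministic."""
    centers = [tuple(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return assign

def cluster_then_label(points, known, k):
    """known maps a point index to its label for the few annotated samples.
    Every point inherits the majority label of its cluster, or None if
    the cluster contains no annotated sample."""
    assign = kmeans(points, k)
    votes = {}
    for idx, lab in known.items():
        votes.setdefault(assign[idx], Counter())[lab] += 1
    cluster_label = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return [cluster_label.get(a) for a in assign]

# Six points in two obvious groups; only the first two are annotated.
points = [(0, 0), (10, 10), (1, 0), (0, 1), (9, 10), (10, 9)]
labels = cluster_then_label(points, known={0: "cat", 1: "dog"}, k=2)
print(labels)
```

The resulting labels can then be used to train any supervised model. If the data does not actually form well-separated clusters, the inherited labels will be unreliable, which is why the assumption matters.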