Machine Learning: A Concise Tutorial

Machine Learning - Centroid-Based Clustering

Centroid-based clustering is a class of machine learning algorithms that partition a dataset into groups, or clusters, based on how close each data point is to the centroid of each cluster.

The centroid of a cluster is the arithmetic mean of all the data points in that cluster, and it serves as the representative point for that cluster.
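As a small illustration of this definition, the sketch below computes a centroid as the per-dimension arithmetic mean. The function name `centroid` and the sample points are assumptions made for this example, and points are assumed to be equal-length tuples of numbers.

```python
def centroid(points):
    """Return the per-dimension arithmetic mean of a list of points.

    Assumes every point is a tuple of numbers with the same length.
    """
    if not points:
        raise ValueError("cannot compute the centroid of an empty cluster")
    n = len(points)
    dims = len(points[0])
    # Average each coordinate independently.
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


# Hypothetical sample cluster of three 2-D points.
cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]
print(centroid(cluster))  # (3.0, 2.0)
```

Note that the centroid does not have to coincide with any actual data point in the cluster.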

The two most popular centroid-based clustering algorithms are:

K-Means Clustering

K-Means clustering is a popular unsupervised machine learning algorithm for clustering data. It is simple and efficient, and it groups data points into K clusters based on their similarity. The algorithm starts by randomly selecting K centroids, which serve as the initial centers of the clusters. Each data point is then assigned to the cluster whose centroid is closest to it. Next, each centroid is updated to the mean of all the data points assigned to its cluster. The assignment and update steps are repeated until the centroids no longer move or the maximum number of iterations is reached.
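The steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation. It assumes Euclidean distance, picks the initial centroids by sampling K of the data points, and leaves the centroid of an empty cluster where it was. The function name `kmeans`, its parameters, and the sample data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups.

    Returns (centroids, clusters). Assumes max_iter >= 1 and k <= len(points).
    """
    rng = random.Random(seed)
    # Step 1: randomly choose K data points as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Step 2: assign each point to the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members)
                          for d in range(dims))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # Step 4: stop as soon as no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of 2-D points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final_centroids, final_clusters = kmeans(data, k=2)
for c, members in zip(final_centroids, final_clusters):
    print(c, members)
```

Because the starting centroids are random, different seeds can give different results on less clearly separated data, which is why practical implementations often run the algorithm several times and keep the best result.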

K-Medoids Clustering

K-medoids clustering is a partition-based clustering algorithm that groups a set of data points into K clusters. Unlike K-means clustering, which represents the center of each cluster by the mean of its data points, K-medoids clustering represents the center by an actual data point from the cluster, called a medoid. The medoid is the data point that minimizes the sum of the distances between it and all the other data points in the cluster. Because the medoid must be a real data point, a single extreme value cannot pull it far from the bulk of the cluster, which makes K-medoids clustering more robust to outliers and noise than K-means clustering.
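To make the difference between a medoid and a mean concrete, the sketch below computes both for a small cluster that contains one outlier. It covers only the medoid definition, not the full K-medoids algorithm, and it assumes Euclidean distance. The names `medoid` and `mean_point` and the sample data are assumptions made for this example.

```python
import math


def medoid(points):
    """Return the point whose summed distance to all other points is smallest."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


def mean_point(points):
    """Return the per-dimension arithmetic mean (the K-means style center)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


# Hypothetical cluster: four nearby points plus one far-away outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]

print(medoid(cluster))      # (2.0, 2.0): stays among the nearby points
print(mean_point(cluster))  # (21.2, 21.2): dragged toward the outlier
```

The mean lands in empty space between the main group and the outlier, while the medoid remains one of the original nearby points.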

We will discuss these two clustering methods in the next two chapters.