Machine Learning: A Concise Tutorial

Machine Learning - Dimensionality Reduction

Dimensionality reduction in machine learning is the process of reducing the number of features or variables in a dataset while retaining as much of the original information as possible. In other words, it is a way of simplifying the data by reducing its complexity.

The need for dimensionality reduction arises when a dataset has a large number of features or variables. Having too many features can lead to overfitting and increase the complexity of the model. It can also make it difficult to visualize the data and can slow down the training process.

There are two main approaches to dimensionality reduction:

Feature Selection

This involves selecting a subset of the original features based on certain criteria, such as their importance or relevance to the target variable.

The following are some commonly used feature selection techniques:

  1. Filter Methods

  2. Wrapper Methods

  3. Embedded Methods
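
As an illustration of the first technique, a filter method scores each feature on its own, without training a model, and keeps the highest-scoring ones. The sketch below is a minimal example under stated assumptions: it uses absolute Pearson correlation with the target as the score, and the helper names (pearson, select_top_k) and the toy data are hypothetical, chosen only for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_top_k(rows, target, k):
    """Return the indices of the k features with the highest |correlation| to target."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])

# Toy data (made up for illustration): feature 0 tracks the target closely,
# feature 1 is constant, feature 2 is only loosely related.
rows = [
    [1.0, 5.0, 3.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 4.0],
    [4.0, 5.0, 1.0],
    [5.0, 5.0, 5.0],
]
target = [1.1, 1.9, 3.2, 3.9, 5.1]

selected = select_top_k(rows, target, k=2)
reduced = [[row[j] for j in selected] for row in rows]
print("kept feature indices:", selected)
```

The reduced dataset keeps a subset of the original columns unchanged, which is what distinguishes feature selection from feature extraction. Correlation is only one possible score; other filter criteria follow the same score-then-rank pattern.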

Feature Extraction

Feature extraction is the process of transforming raw data into a new, smaller set of meaningful features that a machine learning model can use. Rather than keeping a subset of the original features, it reduces the dimensionality of the input data by combining or transforming the original features into new ones that are more useful for the model.
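
To make the idea of combining features concrete, the sketch below projects two correlated features onto a single new feature. It assumes principal component analysis (PCA) as the extraction technique, which the text above does not name, and approximates the first principal component with power iteration. The helper names and the toy data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def top_component(cov, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue.

    Assumes the start vector is not orthogonal to that eigenvector,
    which holds for the toy data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

# Toy data (made up for illustration): two features that rise together.
rows = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8], [10.0, 10.0]]

centered = center(rows)
cov = covariance(centered)
component = top_component(cov)

# Each sample is now described by one extracted feature instead of two.
projected = [sum(r[j] * component[j] for j in range(len(component))) for r in centered]
print("extracted feature:", [round(p, 3) for p in projected])
```

Unlike the selected columns in feature selection, the extracted feature is a weighted combination of both original features, so it does not correspond to any single input column.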

Dimensionality reduction can improve the accuracy and speed of machine learning models, reduce overfitting, and simplify data visualization.