Machine Learning: A Concise Tutorial
Machine Learning - Getting Started
Machine learning has become an increasingly important topic in recent years as the amount of data generated by businesses and individuals continues to grow at an exponential rate. From self-driving cars to personalized recommendations on streaming platforms, machine learning algorithms are now used in a wide range of applications.
Let’s explore what exactly machine learning is.
What is Machine Learning?
Machine learning is a subset of artificial intelligence. As the name suggests, it is the capability of a machine to learn to exhibit "intelligent behavior" in the way humans do. Machine learning uses algorithms trained on datasets to find patterns in the data and to create self-learning models that can predict outcomes.
Types of Machine Learning
Machine learning algorithms can be categorized into three types: supervised, unsupervised, and reinforcement learning. Let’s discuss these three types in detail −
Supervised Learning
Supervised learning uses labeled datasets to train algorithms to recognize patterns in the data and predict outcomes. For example, filtering an email into the inbox or the spam folder.
Supervised learning can be further divided into two types − classification and regression.
Different supervised learning algorithms are widely used −
- Linear Discriminant Analysis
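As a rough illustration of the mail-filtering idea, the sketch below trains nothing more than a 1-nearest-neighbor lookup on a handful of labeled examples. The two features (number of links and number of exclamation marks), the data values, and the function name `predict` are all assumptions made for this example and do not come from the tutorial.

```python
import math

# Hypothetical labeled training data: (num_links, num_exclamation_marks) -> label.
TRAIN = [
    ((0, 0), "inbox"),
    ((1, 0), "inbox"),
    ((0, 1), "inbox"),
    ((5, 4), "spam"),
    ((6, 3), "spam"),
    ((4, 5), "spam"),
]


def predict(features, train=TRAIN):
    """Return the label of the closest training example (1-nearest neighbor)."""
    _, nearest_label = min(
        train, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


print(predict((5, 5)))  # a message with many links and exclamation marks
print(predict((1, 1)))  # a message with few of either
```

The labels in `TRAIN` are what make this supervised: the prediction for a new message is copied from the most similar example whose answer is already known.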
Unsupervised Learning
Unsupervised learning is a type of machine learning that uses unlabeled datasets to discover patterns without any explicit guidance or instruction. An example is customer segmentation, i.e., dividing a company’s customers into groups based on their similarity.
Unsupervised learning algorithms can be further divided into three types − clustering, association, and dimensionality reduction.
The following are some commonly used unsupervised learning algorithms −
- Autoencoder
- Restricted Boltzmann machine (RBM)
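The customer segmentation example can be sketched with a very small k-means clustering routine. This is a simplified one-dimensional version written for illustration. The spend values, the starting centroids, and the function name `k_means_1d` are assumptions, and k-means is used here only because clustering is one of the three types named above.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from how close the values are to each other.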
Reinforcement Learning
Reinforcement learning algorithms learn to make decisions by interacting with an environment through trial and error, using reward feedback to work toward optimized results. For example, robotics.
The following are some common reinforcement learning algorithms, along with the Markov Decision Process formalism that they build on −
- Q-learning
- Markov Decision Process (MDP)
- SARSA
- DQN
- DDPG
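To make the first item in the list concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-cell corridor invented for this example: the agent starts in cell 0 and receives a reward of 1 only when it reaches cell 4. The learning rate, discount factor, episode count, and all names are assumptions.

```python
import random

N_STATES = 5          # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right
ALPHA, GAMMA = 0.5, 0.9


def step(state, action):
    """Hypothetical environment: reward 1 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore with uniformly random actions. Q-learning is off-policy,
            # so it still estimates the value of acting greedily.
            a = rng.randrange(len(ACTIONS))
            next_state, reward = step(state, ACTIONS[a])
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy: the best action index in every non-goal state.
policy = [
    max(range(len(ACTIONS)), key=lambda a: q_table[s][a])
    for s in range(N_STATES - 1)
]
print(policy)
```

After training, the greedy policy should prefer action index 1 (move right) in every non-goal state, since that is the shortest route to the reward.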
Use Cases of Machine Learning
Let’s discuss some important real-life use cases of different types of machine learning algorithms.
Supervised Learning
The following are some real-life use cases of supervised learning −
- Image classification
- Spam filtering
- House price prediction
- Signature recognition
- Weather forecasting
- Stock price prediction
Prerequisites to Get Started
To get started with machine learning, you should have a basic understanding of computer science fundamentals. In addition, you should be familiar with the following −
- Programming languages
- Libraries and packages
- Mathematics and statistics
Let’s discuss these three prerequisites one by one.
Programming Languages: Python or R
Many programming languages, such as C++, Java, Python, R, and Julia, are used for machine learning development. You can start with any language of your choice, though Python is widely used for machine learning and data science.
In this machine learning tutorial, we will use Python and/or R to implement the example programs.
The following are some basic topics to cover before starting this tutorial −
- Variables, basic data types
- Data structures: lists, sets, dictionaries
- Loops and conditional statements
- Functions
- String formatting
- Classes and objects
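The topics listed above can all appear together in even a very small program. The sketch below is one way to check your own familiarity with them. The class name `Dataset`, the function `describe`, and the sample values are invented for illustration.

```python
class Dataset:
    """A small class that bundles a name with numeric samples."""

    def __init__(self, name, samples):
        self.name = name              # variable holding a string
        self.samples = list(samples)  # list of floats

    def mean(self):
        return sum(self.samples) / len(self.samples)


def describe(dataset):
    """Function that uses a loop, a conditional, a set, and a dictionary."""
    counts = {"low": 0, "high": 0}          # dictionary
    for value in dataset.samples:           # loop
        if value < dataset.mean():          # conditional statement
            counts["low"] += 1
        else:
            counts["high"] += 1
    unique_values = set(dataset.samples)    # a set drops duplicate values
    # String formatting with an f-string.
    return (
        f"{dataset.name}: mean={dataset.mean():.2f}, "
        f"unique={len(unique_values)}, counts={counts}"
    )


print(describe(Dataset("heights", [1.5, 1.5, 1.75, 2.0])))
# prints: heights: mean=1.69, unique=3, counts={'low': 2, 'high': 2}
```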
Libraries and Packages
To get started with this machine learning tutorial, we recommend getting familiar with some libraries, packages, and modules such as NumPy, Pandas, and Matplotlib.
As we are using Python in this tutorial, you should have a basic understanding of the following libraries, packages, and modules −
- NumPy − for numeric computations.
- Pandas − for data manipulation and preprocessing.
- Scikit-learn − implements many widely used machine learning algorithms, such as linear regression, logistic regression, k-means clustering, and k-nearest neighbors.
- Matplotlib − for data visualization.
Mathematics and Statistics
Mathematics and statistics play an important role in developing machine learning and data science applications. Advanced mathematics is not required to get started, but it helps you understand machine learning concepts in greater detail.
We generally recommend getting familiar with the following topics before starting the machine learning tutorial −
- Variables, coefficients, functions
- Linear equations, logarithms and logarithmic equations, sigmoid function
- Vectors and matrices, matrix multiplication, dot product
- Tensors and tensor ranks
- Mean, median, mode, outliers, and standard deviation
- Ability to read a histogram
- Probability, conditional probability, Bayes’ rule
- Concept of a derivative, gradient, or slope
- Partial derivatives
- Chain rule
- Hyperbolic functions (especially tanh) used as activation functions
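Several of the topics above can be checked numerically with nothing but the Python standard library. The sketch below compares a chain-rule derivative of the sigmoid function against a finite-difference estimate, relates tanh to the sigmoid, summarizes a sample that contains an outlier, and applies Bayes' rule. All numbers, including the spam probabilities, are made up for illustration.

```python
import math
import statistics


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Chain rule: d/dx sigmoid(3x) = 3 * sigmoid'(3x),
# where sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)).
x = 0.2
analytic = 3 * sigmoid(3 * x) * (1 - sigmoid(3 * x))
numeric = numerical_derivative(lambda t: sigmoid(3 * t), x)

# tanh is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
tanh_gap = abs(math.tanh(x) - (2 * sigmoid(2 * x) - 1))

# Descriptive statistics on a small sample with one outlier (100).
sample = [2, 3, 3, 4, 100]
summary = (
    statistics.mean(sample),
    statistics.median(sample),
    statistics.mode(sample),
)

# Bayes' rule: P(spam | word) = P(word | spam) * P(spam) / P(word).
p_spam, p_word_given_spam, p_word_given_ham = 0.2, 0.6, 0.05
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(analytic, numeric, tanh_gap, summary, p_spam_given_word)
```

The outlier pulls the mean far above the median and mode, which is one reason to look at all three before trusting any single summary of a dataset.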
Getting Started with Machine Learning
You might wonder whether machine learning is hard to learn. It is not, provided you build a solid understanding of mathematics, computer science, and coding, and keep up with AI trends. Many technology enthusiasts want to excel at machine learning but do not know where to start, so here are a few steps to help you get started.
Step 1 − Learn Prerequisites
A few prerequisites lay the foundation for understanding how algorithms and machine learning models work. Start by learning the basics of:
- A programming language such as Python or R
- Libraries and packages
- Mathematics and statistics (such as calculus and linear algebra)
Step 2 − Learn Machine Learning Fundamentals
Before diving deeper into machine learning, it’s important to have a solid understanding of the fundamentals. This includes learning about different types of machine learning methods such as regression, classification, clustering, and dimensionality reduction.
This machine learning tutorial covers machine learning concepts from basic to advanced, along with their implementations. Work through the tutorial chapter by chapter and keep practicing the programming examples.
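As a small taste of the regression method mentioned in this step, the sketch below fits a straight line to a few points using ordinary least squares. The house areas and prices are hypothetical values generated from price = 2 * area + 50, and the function name `fit_line` is an assumption made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house area (square meters) and price (thousands).
areas = [50, 80, 100, 120]
prices = [150, 210, 250, 290]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90 + intercept  # predicted price for a 90 square meter house
print(slope, intercept, predicted)
```

Because the toy data lies exactly on a line, the fit recovers the generating slope and intercept. Real data is noisy, so the fitted line would only approximate it.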
Step 3 − Explore Machine Learning Algorithms
Algorithms form the foundation of machine learning, allowing computers to find patterns in data and predict outputs. Explore and understand essential algorithms such as Naive Bayes, Random Forest, and Decision Tree. This will help you understand how an algorithm works from input to prediction.
Step 4 − Choose a Machine Learning Framework/ Library
There are many tools, frameworks, software packages, and platforms for machine learning. The challenge is to select the tool that best fits your model. Mastering machine learning tools enables you to work with data, train your models, discover new methods, and create algorithms. Some commonly used machine learning tools are Scikit-learn, TensorFlow, and PyTorch.
In addition to tools and algorithms, a good grip on libraries such as NumPy, SciPy, and Matplotlib will serve you well in your machine learning journey.
Step 5 − Practice with Real Data
Datasets are the backbone of any machine learning algorithm. A dataset is a large amount of data grouped into a collection. Datasets are used to train and test algorithms, analyze patterns, and gain insights.
Many websites, such as Kaggle and Google Dataset Search, provide publicly available datasets.
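Using a dataset to both train and test an algorithm usually means holding some rows back so that the model is evaluated on data it has not seen. The sketch below shows one simple way to do that split with the standard library. The function name `train_test_split`, the 25% test fraction, and the toy rows are assumptions chosen for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset: 20 rows of (feature, label).
rows = [(i, i % 2) for i in range(20)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before splitting matters when a downloaded dataset is sorted, for example by label or by date, because an unshuffled split could leave the test set unrepresentative.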
Step 6 − Build Your Own Projects
After mastering the basics, it’s time to create your own project around a problem statement that you choose. This will help you apply what you have learned so far and develop your skills further.
You can start with simple tasks such as classification or recommendation systems using a pre-processed dataset, then move on to more complex problems once you are comfortable.
Step 7 − Participate in Machine Learning Communities
Join machine learning communities, such as those on GitHub, to connect with people who share your interests. Through these communities, you can learn from others, share experiences, and get feedback on your projects. This helps you stay motivated to learn and grow.