Machine Learning With Python: A Concise Tutorial
Machine Learning - Ecosystem
Python has become one of the most popular programming languages for machine learning thanks to its simplicity, versatility, and extensive ecosystem of libraries and tools. Many languages, such as Java, C++, Lisp, and Julia, can be used for machine learning, but Python has become especially popular among them.
Here, we will explore the Python ecosystem for machine learning and highlight some of the most popular libraries and frameworks.
Python Machine Learning Ecosystem
The machine learning ecosystem is the collection of tools and technologies used to develop machine learning applications. Python provides a variety of libraries and tools that make up the Python machine learning ecosystem, and these components make Python an important language for machine learning and data science. There are many such components; here we discuss some of the most important ones −
- Programming Language: Python
- Integrated Development Environment
- Python Libraries
Programming Language: Python
A programming language is a core component of any development ecosystem. Python is used extensively in machine learning and data science.
Let's discuss why Python is such a strong choice for machine learning.
Why Python for Machine Learning?
According to the Stack Overflow Developer Survey 2023, Python is the third most popular programming language, and it is widely regarded as the most popular language for machine learning and data science. The following features make Python the preferred language for data science −
Python has an extensive and powerful set of packages that are ready to use in many domains. These include packages required for machine learning and data science, such as numpy, scipy, pandas, and scikit-learn.
Another important feature that makes Python the language of choice for data science is easy and fast prototyping. This is especially useful when developing new algorithms.
Data science depends on good collaboration, and Python provides many useful tools that make collaboration much easier.
A typical data science project covers several areas, such as data extraction, data manipulation, data analysis, feature extraction, modelling, evaluation, deployment, and updating the solution. Because Python is a multi-purpose language, it lets the data scientist address all of these areas from a common platform.
Strengths and Weaknesses of Python
Every programming language has strengths as well as weaknesses, and Python is no exception.
Surveys consistently rank Python among the most widely used programming languages, and it is the most popular language for machine learning and data science. This is due to the following strengths −
Easy to learn and understand − Python's syntax is simple, so the language is relatively easy to learn and understand, even for beginners.
Multi-purpose language − Python is a multi-purpose programming language because it supports structured programming, object-oriented programming, and functional programming.
Huge number of modules − Python has a huge number of modules covering almost every aspect of programming. These modules are readily available, which makes Python an extensible language.
Support of open source community − As an open source programming language, Python is supported by a very large developer community. As a result, bugs tend to be found and fixed quickly. This makes Python robust and adaptive.
Scalability − Python is a scalable programming language because it offers better structure and support for large programs than shell scripts do.
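To make the "multi-purpose" point concrete, the following minimal sketch computes the same average in the three styles mentioned above. The function and class names (mean_structured, Sample, mean_functional) and the sample data are invented for illustration only.

```python
from functools import reduce


# Structured (procedural) style: explicit loop and accumulator.
def mean_structured(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Object-oriented style: data and behaviour bundled in a class.
class Sample:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


# Functional style: a fold over the sequence, with no mutation.
def mean_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0.0) / len(values)


scores = [2.0, 4.0, 6.0, 8.0]  # illustrative data
results = (mean_structured(scores), Sample(scores).mean(), mean_functional(scores))
print(results)  # all three styles give the same mean
```

All three versions return the same value; which style to use is largely a matter of what reads best for the problem at hand.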
Although Python is a popular and powerful programming language, it has one notable weakness: slow execution speed.
Python runs more slowly than compiled languages because it is an interpreted language. In practice, numerical libraries reduce this cost by running their heavy computations in compiled code, but raw execution speed remains a major area of improvement for the Python community.
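As a rough illustration of this overhead, the sketch below times an explicit Python-level loop against the built-in sum(), which in CPython (the reference implementation) runs its loop in C. The function names and the value of N are arbitrary choices for the example, and the timings will differ from machine to machine.

```python
import timeit


def loop_sum(n):
    """Add the integers 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(range(n))


N = 100_000  # arbitrary problem size for the demonstration
loop_time = timeit.timeit(lambda: loop_sum(N), number=20)
builtin_time = timeit.timeit(lambda: builtin_sum(N), number=20)

print(f"explicit loop: {loop_time:.4f} s")
print(f"built-in sum : {builtin_time:.4f} s")
```

On most systems the built-in version is noticeably faster, which is the same idea that libraries such as NumPy apply on a larger scale.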
Installing Python
To work in Python, we must first install it. You can install Python in either of the following two ways −
- Installing Python individually
- Using a pre-packaged Python distribution − Anaconda
Let us discuss each of these in detail.
If you want to install Python on its own, you only need to download the installer or source package for your platform. Python distributions are available for Windows, Linux, and Mac.
The following is a quick overview of installing Python on these platforms −
On Unix and Linux platform
We can install Python on Unix and Linux with the following steps −
- First, go to www.python.org/downloads/.
- Next, click the link to download the zipped source code for Unix/Linux.
- Now, download and extract the files.
- Optionally, edit the Modules/Setup file to customize some build options.
- Finally, run the following commands in order: ./configure, make, and make install.
On Windows platform
We can install Python on Windows with the following steps −
- First, go to www.python.org/downloads/.
- Next, click the link for the Windows installer. Older releases provide a python-XYZ.msi file, while recent releases provide an .exe installer; here XYZ is the version we wish to install.
- Now, run the downloaded file. It opens the Python install wizard, which is easy to use. Accept the default settings and wait until the installation finishes.
On Macintosh platform
On macOS (formerly Mac OS X), Homebrew, an easy-to-use package manager, is the recommended way to install Python 3. If you do not have Homebrew, you can install it with the following command −
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Homebrew can be updated with the command below −
$ brew update
Now, to install Python 3 on your system, run the following command −
$ brew install python3
Anaconda is a pre-packaged Python distribution that includes the libraries most widely used in data science. We can set up a Python environment with Anaconda using the following steps −
- Step 1 − First, download the required installation package from the Anaconda distribution page at www.anaconda.com/distribution/. You can choose Windows, Mac, or Linux as required.
- Step 2 − Next, select the Python version you want to install on your machine. This tutorial was written when 3.7 was the latest version; choose the most recent version offered. Both 64-bit and 32-bit graphical installers are available.
- Step 3 − After you select the OS and Python version, the Anaconda installer is downloaded to your computer. Double-click the file and the installer will install the Anaconda package.
- Step 4 − To check whether it is installed, open a command prompt and type python.
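Once the interpreter starts, a few lines of Python can confirm which installation is actually running. This is a small sketch using only the standard library; the variable names are arbitrary, and the exact output depends on your system.

```python
import platform
import sys

# Version of the interpreter that is running this code, e.g. "3.x.y".
python_version = platform.python_version()

# Full path of the interpreter binary; with Anaconda this usually points
# inside the Anaconda installation directory.
interpreter_path = sys.executable

print("Python version:", python_version)
print("Interpreter   :", interpreter_path)

# sys.version typically also shows the build and distributor details.
print("Build info    :", sys.version)
```

If the reported path is not the installation you expected, another Python on your PATH is probably taking precedence.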
Integrated Development Environment
An Integrated Development Environment (IDE) is a software tool that combines standard developer tools into a single user-friendly graphical interface. Many popular IDEs are used for machine learning and data science development. Some of them are as follows −
- Jupyter Notebook
- PyCharm
- Visual Studio Code
- Spyder
- Sublime Text
- Atom
- Thonny
- Google Colab Notebook
Here, we will discuss Jupyter Notebook in detail. For the other IDEs, visit their official websites for details on how to download, install, and use them.
Jupyter Notebook
Jupyter notebooks provide an interactive computational environment for developing Python-based data science applications. They were formerly known as IPython notebooks. The following features make Jupyter notebooks one of the best components of the Python ML ecosystem −
- Jupyter notebooks can illustrate an analysis step by step by arranging code, images, text, and output in sequence.
- They help a data scientist document the thought process while developing the analysis.
- The results can be captured as part of the notebook.
- Notebooks make it easy to share our work with peers.
If you are using the Anaconda distribution, you do not need to install Jupyter Notebook separately because it is already included. Just open the Anaconda Prompt and type the following command −
C:\>jupyter notebook
After you press Enter, a notebook server starts at localhost:8888 on your computer and the notebook dashboard opens in your browser.
Now click New to get a list of options. Select Python 3 and a new notebook opens, ready for you to start working.
On the other hand, if you are using a standard Python distribution, Jupyter Notebook can be installed with pip, the popular Python package installer.
pip install jupyter
The following are the three types of cells in a Jupyter notebook −
Code cells − As the name suggests, we use these cells to write code. When a code cell is run, its contents are sent to the kernel associated with the notebook.
Markdown cells − We use these cells to annotate the computation process. They can contain text, images, LaTeX equations, HTML tags, and so on.
Raw cells − The text written in them is displayed as it is. These cells are used for text that we do not want Jupyter's automatic conversion mechanism to convert.
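To see how the three cell types are stored, the sketch below builds a minimal notebook document as a Python dictionary in the version 4 notebook layout and serializes it to JSON. The cell contents are made up for illustration; a real notebook written by Jupyter will carry more metadata.

```python
import json

# Minimal notebook document with one cell of each type (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Demo\n", "Text, images and $\\LaTeX$ go here."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # the cell has not been run yet
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
        {
            "cell_type": "raw",
            "metadata": {},
            "source": ["Left exactly as written."],
        },
    ],
}

# An .ipynb file is simply this structure saved as JSON text.
notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

Saving notebook_json to a file with the .ipynb extension (the file name is up to you) should give a document that Jupyter can open.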
For a more detailed study of Jupyter Notebook, see www.tutorialspoint.com/jupyter/index.htm.
Python Libraries and Packages
The Python ecosystem has a huge collection of libraries and packages that help developers build machine learning models easily and quickly. Some of them are discussed below −
NumPy
NumPy is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on them.
NumPy is a critical component of the Python machine learning ecosystem because it provides the underlying data structures and numerical operations that many machine learning algorithms require. Below is the command to install NumPy −
pip install numpy
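As a brief illustration of the array operations described above, here is a minimal sketch assuming NumPy is installed. The data values and the weights vector are invented for the example.

```python
import numpy as np

# A 2-D array: 3 samples (rows) with 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Column-wise mean, computed without an explicit Python loop.
feature_means = X.mean(axis=0)

# Broadcasting subtracts the per-column mean from every row.
X_centered = X - feature_means

# Matrix product of the (3, 2) array with a length-2 weight vector.
weights = np.array([0.5, -1.0])  # illustrative values
predictions = X @ weights

print(X.shape)
print(feature_means)
print(predictions)
```

Centering features and multiplying by a weight vector are typical building blocks of the numerical work that machine learning algorithms perform.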
Pandas
Pandas is a powerful library for data manipulation and analysis. It provides a range of functions for importing, cleaning, and transforming data, along with powerful tools for grouping and aggregating data.
Pandas is particularly useful for data preprocessing in machine learning because it allows efficient data handling and manipulation. Below is the command to install Pandas −
pip install pandas
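The cleaning, transforming, and grouping steps mentioned above might look like the following sketch, assuming Pandas is installed. The small dataset and its column names are made up for illustration.

```python
import pandas as pd

# Small made-up dataset with one missing value.
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi", "Delhi"],
    "temperature": [31.0, None, 38.0, 40.0],
})

# Cleaning: fill the missing temperature with the column mean.
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Transforming: derive a new column from an existing one.
df["temperature_f"] = df["temperature"] * 9 / 5 + 32

# Grouping and aggregating: mean temperature per city.
mean_by_city = df.groupby("city")["temperature"].mean()
print(mean_by_city)
```

Filling with the column mean is only one possible strategy for missing values; the right choice depends on the data.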
Scikit-learn
Scikit-learn is a popular machine learning library for Python that provides a range of algorithms for classification, regression, clustering, and more. It also includes tools for data preprocessing, feature selection, and model evaluation. Scikit-learn is widely used in the machine learning community because of its ease of use, performance, and extensive documentation.
Below is the command to install Scikit-learn −
pip install scikit-learn
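The sketch below shows one possible end-to-end workflow with Scikit-learn, assuming it is installed: preprocessing, training a classifier, and evaluating it on held-out data. The choice of the bundled Iris dataset, logistic regression, and the split parameters are assumptions made for the example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in classification dataset (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Model evaluation on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Because every estimator shares the same fit and predict interface, swapping in a different algorithm usually means changing a single line.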
TensorFlow
TensorFlow is an open-source library for machine learning developed by Google. It provides support for building and training deep learning models, along with tools for distributed computing and deployment. TensorFlow is a powerful tool for building complex machine learning models, particularly in computer vision and natural language processing. Below is the command to install TensorFlow −
pip install tensorflow
PyTorch
PyTorch is another popular deep learning library for Python. Developed by Facebook (now Meta), it provides a range of tools for building and training neural networks, along with support for dynamic computation graphs and GPU acceleration.
PyTorch is particularly useful for researchers and developers who need a flexible and powerful deep learning framework. Below is the command to install PyTorch −
pip install torch
Keras
Keras is a high-level neural network library that runs on top of TensorFlow and other lower-level frameworks. It provides a simple and intuitive API for building and training deep learning models, making it an excellent choice for beginners and for researchers who need to prototype quickly and experiment with different models. Below is the command to install Keras −
pip install keras
OpenCV
OpenCV is a computer vision library that provides tools for image and video processing, along with support for machine learning algorithms. It is widely used in the computer vision community for tasks such as object detection, image segmentation, and facial recognition. Below is the command to install OpenCV −
pip install opencv-python
In addition to these libraries, the Python ecosystem offers many other tools and frameworks for machine learning, including XGBoost, LightGBM, spaCy, and NLTK.
The Python ecosystem for machine learning is constantly evolving, with new libraries and tools being developed all the time.
Whether you are a beginner or an experienced machine learning practitioner, Python provides a rich and flexible environment for developing and deploying machine learning models.
Note that some libraries may need additional dependencies or have system-specific requirements. In such cases, consult the library's documentation for installation instructions and requirements.
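Before troubleshooting an installation, it can help to see which of the libraries above are already present. The following sketch uses only the standard library (importlib.metadata, available in Python 3.8 and later). The helper name installed_versions is invented for this example, and the package list simply repeats the pip names used in this tutorial.

```python
from importlib import metadata

# Distribution names as used with pip in this tutorial.
PACKAGES = [
    "numpy", "pandas", "scikit-learn", "tensorflow",
    "torch", "keras", "opencv-python",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:15s} {version or 'not installed'}")
```

A missing entry only means the package is absent from the current environment; it may still be installed under a different Python interpreter.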