Python Web Scraping: A Concise Tutorial
Getting Started with Python
In the first chapter, we learned what web scraping is. In this chapter, let us see how to implement web scraping using Python.
Why Python for Web Scraping?
Python is a popular tool for implementing web scraping. The Python programming language is also used for other useful projects related to cyber security, penetration testing and digital forensics. Using only the modules that ship with Python, basic web scraping can be performed without any third-party tool.
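As a minimal sketch of scraping with only the standard library, the example below parses a small HTML string with html.parser and collects the link targets. The names LinkCollector and extract_links and the sample HTML are illustrative assumptions, not part of the original tutorial. For a live page, the HTML text could instead be fetched with urllib.request.urlopen.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Sample HTML used purely for illustration
sample_html = '<p><a href="/page1">One</a> and <a href="https://example.com/">Two</a></p>'
print(extract_links(sample_html))
```

The parser is event-driven: handle_starttag is called once per opening tag, so the collector only needs to filter for the tags and attributes of interest.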
The Python programming language is gaining huge popularity, and the reasons that make Python a good fit for web scraping projects are as follows:
Syntax Simplicity
Python has a simpler structure than many other programming languages. This makes testing easier and lets a developer focus more on programming.
Inbuilt Modules
Another reason for using Python for web scraping is its useful built-in and external libraries. With Python as the programming base, we can implement many tasks related to web scraping.
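To illustrate one of the built-in modules, the sketch below uses urllib.parse to turn relative links into absolute ones and to keep only links on the same host, a common housekeeping step when scraping. The base URL and the link list are made-up example values.

```python
from urllib.parse import urljoin, urlparse

base_url = "https://example.com/catalog/index.html"  # assumed example URL

# Resolve relative links against the page they were found on
relative_links = ["item1.html", "/about", "https://other.example.org/x"]
absolute_links = [urljoin(base_url, link) for link in relative_links]

# Keep only links that point to the same host as the base page
base_host = urlparse(base_url).netloc
same_site = [u for u in absolute_links if urlparse(u).netloc == base_host]

print(absolute_links)
print(same_site)
```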
Installation of Python
Python distributions are available for platforms such as Windows, macOS and Unix/Linux. To install Python, we only need to download the binary package for our platform. If no binary package is available for our platform, we need a C compiler so that the source code can be compiled manually.
We can install Python on various platforms as follows:
Installing Python on Unix and Linux
Follow the steps given below to install Python on Unix/Linux machines:
Step 1 − Go to the link https://www.python.org/downloads/
Step 2 − Download the zipped source code for Unix/Linux available at the above link.
Step 3 − Extract the files onto your computer.
Step 4 − Use the following commands to complete the installation:
$ ./configure
$ make
$ make install
You can find the installed Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
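The actual locations can differ from the defaults above, depending on how Python was configured. As a small sketch, the running interpreter can report where it and its standard library live; the variable names are illustrative.

```python
import sys
import sysconfig

# Path of the interpreter that is running this code
interpreter_path = sys.executable

# Major.minor version, e.g. the XX part of pythonXX
version = "{}.{}".format(*sys.version_info[:2])

# Directory that holds the standard library modules
stdlib_dir = sysconfig.get_paths()["stdlib"]

print("Interpreter:", interpreter_path)
print("Version:", version)
print("Standard library:", stdlib_dir)
```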
Installing Python on Windows
Follow the steps given below to install Python on Windows machines:
Step 1 − Go to the link https://www.python.org/downloads/
Step 2 − Download the Windows installer file python-XYZ.msi, where XYZ is the version we need to install.
Step 3 − Save the installer file to your local machine.
Step 4 − Finally, run the downloaded file to bring up the Python install wizard.
Installing Python on Macintosh
A convenient way to install Python 3 on macOS is Homebrew. Homebrew is easy to install and is a great package manager.
Homebrew can be installed by using the following command:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
To update the package manager, we can use the following command:
$ brew update
With the following command, we can install Python 3 on our Mac:
$ brew install python3
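Whichever platform was used, a quick way to confirm the result is to ask the interpreter about itself. The sketch below prints the operating system and Python version and checks that the interpreter is Python 3; the helper name is_python3 is a hypothetical one chosen for this example.

```python
import platform
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3


# Operating system name and full Python version string
print(platform.system(), platform.python_version())
print("Python 3 available:", is_python3())
```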
Setting Up the PATH
After installation, the directory containing the Python executable should be on the PATH so that the shell can find the interpreter by name. The way to set the path differs between environments.
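Whatever method is used to set the path, the result can be checked from Python itself. The sketch below lists how many directories are on PATH and looks up two common interpreter names with shutil.which; the helper name find_on_path and the names looked up are assumptions for illustration.

```python
import os
import shutil


def find_on_path(command):
    """Return the full path of a command found via PATH, or None."""
    return shutil.which(command)


# PATH is a single string; os.pathsep is ':' on Unix and ';' on Windows
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print("Directories on PATH:", len(path_dirs))

for name in ("python", "python3"):
    print(name, "->", find_on_path(name))
```

A result of None for a name means the shell would not find that command either, which usually indicates that the install directory is missing from PATH.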
Running Python
We can start Python in any of the following three ways:
Interactive Interpreter
Python can be started from any operating system that provides a command-line interpreter or shell, such as UNIX and DOS.
We can start coding in the interactive interpreter as follows:
Step 1 − Enter python at the command line.
Step 2 − Then, we can start coding right away in the interactive interpreter.
$ python # Unix/Linux
or
% python # Unix/Linux
or
C:\> python # Windows/DOS
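Once the >>> prompt appears, each statement runs as soon as it is entered. The lines below are a small illustrative sample of what might be typed there; the variable names are made up for this example.

```python
# Each line could be typed at the >>> prompt, one at a time
url = "https://www.python.org/downloads/"
parts = url.split("/")  # split the URL on every slash
host = parts[2]         # the host name follows the empty part after "https:"
print(host)
```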
Script from the Command-line
We can execute a Python script at the command line by invoking the interpreter on the script file, as follows:
$ python script.py # Unix/Linux
or
% python script.py # Unix/Linux
or
C:\> python script.py # Windows/DOS
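As a sketch of what script.py might contain, the example below reads optional names from the command line and prints a greeting for each. The function name build_greetings and the behaviour are assumptions chosen only to show the usual script layout.

```python
import sys


def build_greetings(names):
    """Return one greeting line per name, or a default greeting if none are given."""
    if not names:
        names = ["world"]
    return ["Hello, " + name for name in names]


if __name__ == "__main__":
    # sys.argv[0] is the script name; the remaining items are the arguments
    for line in build_greetings(sys.argv[1:]):
        print(line)
```

Running python script.py Alice Bob would then print one greeting per argument. The __name__ check keeps the printing code from running when the file is imported as a module.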
Integrated Development Environment
We can also run Python from a GUI environment if the system has a GUI application that supports Python. Some IDEs that support Python on various platforms are listed below:
IDE for UNIX − IDLE is available as a Python IDE on UNIX.
IDE for Windows − PythonWin is a Python IDE for Windows that also provides a GUI.
IDE for Macintosh − IDLE is available for Macintosh and can be downloaded from the main website as either MacBinary or BinHex’d files.