Requests - Quick Guide

Requests - Overview

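Requests is an HTTP library that makes it easy to work with HTTP requests and responses in web applications. The library is written in Python.

The official Python Requests website, available at https://2.python-requests.org/en/master/, defines Requests as follows −

Requests is an elegant and simple HTTP library for Python, built for human beings.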

Features of Requests

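The features of Requests are discussed below −

Request

The Python Requests library provides easy-to-use methods for handling HTTP requests. Passing parameters and handling request types such as GET, POST, PUT and DELETE is very simple.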

Response

You can get the response in the format you need. The supported formats are text, binary, JSON and raw.

Headers

The library allows you to read, update or send new headers as required.

Timeouts

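With the Python Requests library, a timeout can easily be added to the URL you are requesting. This matters when you are calling a third-party URL and waiting for a response.

It is always good practice to set a timeout, so that the request either gets a response or fails with an error within that time. Without a timeout, the request may wait indefinitely.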

Error handling

The requests module supports error handling for cases such as connection errors, timeouts, TooManyRedirects and the HTTPError raised by Response.raise_for_status().

Cookies

The library allows you to read, write and update cookies for the requested URL.

Sessions

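To maintain data between requests you need a session. A session also reuses the TCP connection when the same host is called repeatedly, which improves performance.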

SSL certificates

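An SSL certificate is a security feature that comes with secure URLs. When you use Requests, it verifies the SSL certificate of the given https URL. SSL verification is enabled by default in the Requests library, and an error is raised if the certificate is missing or cannot be verified.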

Authentication

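HTTP authentication happens on the server side: when a client requests a URL, the server asks for authentication information such as a username and password. It is an additional layer of security for the requests and responses exchanged between the client and the server.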

Advantages of using Python Requests Library

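The following are the advantages of using the Python Requests library −

  1. It is easy to use and fetches data from a given URL.

  2. The Requests library can be used to scrape data from websites.

  3. Using requests, you can get, post, delete and update data for a given URL.

  4. Handling cookies and sessions is very easy.

  5. Security is also taken care of with the help of the authentication support.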

Requests - Environment Setup

In this chapter, we will install Requests. To start working with the Requests module, we first need to install Python. So we will work through the following:

  1. Install Python

  2. Install Requests

Installing Python

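Go to the official Python site at https://www.python.org/downloads/ and click the latest version available for Windows, Linux/Unix or Mac OS. Download the 64-bit or 32-bit build that matches your operating system.

Once the download finishes, on Windows click the .exe file and follow the steps to install Python on your system.

The Python package manager, pip, is installed by default with the above installation. To make it work globally on your system, add the location of Python to the PATH variable. The installer offers this at the start, so remember to tick the "Add to PATH" checkbox. If you forget to tick it, follow the steps given below to add it to PATH.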

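To add to PATH, follow these steps −

Right-click the Computer icon and click Properties > Advanced system settings. This opens the System Properties dialog.

Click Environment Variables to open the Environment Variables dialog.

Select Path and click the Edit button, then add the location of your Python installation at the end. Now, let us check the Python version.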

Checking the python version

E:\prequests>python --version
Python 3.7.3

Install Requests

Now that Python is installed, we will install Requests.

Once Python is installed, the Python package manager, pip, is installed as well. The following command checks the pip version.

E:\prequests>pip --version
pip 19.1.1 from c:\users\xxxxx\appdata\local\programs\python\python37\lib\site-p
ackages\pip (python 3.7)

pip is installed and its version is 19.1.1. Now we will use pip to install the Requests module.

The command is given below −

pip install requests
E:\prequests>pip install requests
Requirement already satisfied: requests in c:\users\xxxx\appdata\local\programs
\python\python37\lib\site-packages (2.22.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\kamat\appdata\loca
l\programs\python\python37\lib\site-packages (from requests) (2019.3.9)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\use
rs\xxxxx\appdata\local\programs\python\python37\lib\site-packages (from requests
) (1.25.3)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\xxxxxxx\appdata\local\pr
ograms\python\python37\lib\site-packages (from requests) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\xxxxx\appdata\l
ocal\programs\python\python37\lib\site-packages (from requests) (3.0.4)

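The module is already installed here, so the command prompt reports Requirement already satisfied. If it were not installed, pip would download the required packages and install them.

To see the details of the installed requests module, you can use the following command −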

pip show requests
E:\prequests>pip show requests
Name: requests
Version: 2.22.0
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: me@kennethreitz.org
License: Apache 2.0
Location: c:\users\xxxxx\appdata\local\programs\python\python37\lib\site-package
s
Requires: certifi, idna, urllib3, chardet
Required-by:

The version of the Requests module is 2.22.0.
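As a cross-check from inside Python, the installed version can also be read from the package itself. This is a minimal sketch that assumes Requests is already installed for the interpreter you run it with; the variable name is only illustrative.

```python
import requests

# __version__ holds the version string of the installed package
installed_version = requests.__version__
print("Requests version:", installed_version)
```

If the import fails with ModuleNotFoundError, pip most likely installed the package for a different Python interpreter than the one running the script.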

Requests - How Http Requests Work?

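Python's Requests is an HTTP library that helps us exchange data between the client and the server. Suppose you have a UI with a form where you enter user details. Once you have entered them, you submit the data, which is simply an HTTP POST or PUT request from the client to the server to save the data.

When you want the data back, you fetch it from the server, which is an HTTP GET request. This exchange, in which the client requests data and the server responds with the required data, is the core of the relationship between client and server.

The request is made to a given URL, which can be a secure (https) or non-secure (http) URL.

A request to a URL can use GET, POST, PUT or DELETE. The most commonly used is GET, mainly when you want to fetch data from the server.

You can also send data to the URL as a query string, for example −

https://jsonplaceholder.typicode.com/users?id=9&username=Delphine

Here we are passing id=9 and username=Delphine to the URL. All the values are sent as key/value pairs after the question mark (?), and multiple parameters are separated by &.

With the Requests library, the same data is passed to the URL as a dictionary of strings. To pass id=9 and username=Delphine, you can do the following −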

payload = {'id': '9', 'username': 'Delphine'}

The requests library is called as follows −

res = requests.get('https://jsonplaceholder.typicode.com/users', params=payload)

Using POST, we can do as follows−

res = requests.post('https://jsonplaceholder.typicode.com/users', data = {'id':'9', 'username':'Delphine'})

Using PUT

res = requests.put('https://jsonplaceholder.typicode.com/users', data = {'id':'9', 'username':'Delphine'})

Using DELETE

res = requests.delete('https://jsonplaceholder.typicode.com/users')

The response to an HTTP request can be in text, binary, JSON or raw form. The details of requests and responses are explained in the following chapters.

Requests - Working with Requests

In this chapter, we will see how to work with the requests module. We will look at the following −

  1. Making HTTP Requests.

  2. Passing Parameters to HTTP Requests.

Making HTTP Requests

To make an HTTP request, we first need to import the requests module as shown below −

import requests

Let us now see how to call a URL using the requests module.

We will use the URL https://jsonplaceholder.typicode.com/users in the code to test the Requests module.

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.status_code)

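The URL https://jsonplaceholder.typicode.com/users is called using the requests.get() method. The response object for the URL is stored in the getdata variable. Printing its status_code gives 200, which means the response was received successfully.

Output

E:\prequests>python makeRequest.py
200

To get the content from the response, we can use getdata.content as follows −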

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.content)

getdata.content will print all the data available in the response.

Output

E:\prequests>python makeRequest.py
b'[\n {\n  "id": 1,\n  "name": "Leanne Graham",\n  "username": "Bret",\n
"email": "Sincere@april.biz",\n  "address": {\n  "street": "Kulas Light
",\n  "suite": "Apt. 556",\n  "city": "Gwenborough",\n  "zipcode": "
92998-3874",\n  "geo": {\n "lat": "-37.3159",\n  "lng": "81.149
6"\n }\n },\n  "phone": "1-770-736-8031 x56442",\n  "website": "hild
egard.org",\n  "company": {\n "name": "Romaguera-Crona",\n  "catchPhr
ase": "Multi-layered client-server neural-net",\n  "bs": "harness real-time
e-markets"\n }\n }

Passing Parameters to HTTP Requests

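Requesting the URL alone is not enough; we also need to pass parameters to it.

The params are mostly passed as key/value pairs, for example −

 https://jsonplaceholder.typicode.com/users?id=9&username=Delphine

So we have id=9 and username=Delphine. Now we will see how to pass such data with the requests module.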

Example

import requests
payload = {'id': 9, 'username': 'Delphine'}
getdata = requests.get('https://jsonplaceholder.typicode.com/users', params=payload)
print(getdata.content)

The details are stored in the payload object as key/value pairs and passed to params inside the get() method.

Output

E:\prequests>python makeRequest.py
b'[\n {\n "id": 9,\n "name": "Glenna Reichert",\n "username": "Delphin
e",\n "email": "Chaim_McDermott@dana.io",\n "address": {\n "street":
"Dayna Park",\n "suite": "Suite 449",\n "city": "Bartholomebury",\n
"zipcode": "76495-3109",\n "geo": {\n "lat": "24.6463",\n
"lng": "-168.8889"\n }\n },\n "phone": "(775)976-6794 x41206",\n "
website": "conrad.com",\n "company": {\n "name": "Yost and Sons",\n
"catchPhrase": "Switchable contextually-based project",\n "bs": "aggregate
real-time technologies"\n }\n }\n]'

We now get the details for id=9 and username=Delphine in the response.

To see how the URL looks after the parameters are added, you can read it from the response object.

Example

import requests
payload = {'id': 9, 'username': 'Delphine'}
getdata = requests.get('https://jsonplaceholder.typicode.com/users', params=payload)
print(getdata.url)

Output

E:\prequests>python makeRequest.py
https://jsonplaceholder.typicode.com/users?id=9&username=Delphine
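The same URL can be inspected without sending anything over the network. The sketch below builds the request with requests.Request and calls prepare(), which is the step where params are encoded into the query string. The variable name prepared is only illustrative.

```python
import requests

payload = {'id': 9, 'username': 'Delphine'}

# Build the request without sending it; prepare() encodes params into the URL
prepared = requests.Request(
    'GET', 'https://jsonplaceholder.typicode.com/users', params=payload
).prepare()

print(prepared.url)
```

This is handy for checking how values are encoded before a request goes out, for example when a parameter contains spaces or other reserved characters.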

Handling Response for HTTP Requests

In this chapter, we will look in more detail at the response received from the requests module. We will discuss the following:

  1. Getting Response

  2. JSON Response

  3. RAW Response

  4. Binary Response

Getting Response

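We will make a request to the URL using the requests.get() method.

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.text)

getdata holds the response object, which has all the details of the response. We can get the body in two ways, (.text) and (.content). Using response.text gives the data back in text format, as shown below:

Output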

E:\prequests>python makeRequest.py
[
 {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
   "bs": "harness real-time e-markets"
  }
},

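The response is the same as what you see when you view the source of the URL in a browser.

You can also try a .html URL and look at the content using response.text; it will be the same as the view-source content for that URL in the browser.

Now, let us try response.content for the same URL and look at the output.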

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.content)

Output

E:\prequests>python makeRequest.py
b'[\n {\n "id": 1,\n "name": "Leanne Graham",\n "username": "Bret",\n
"email": "Sincere@april.biz",\n "address": {\n "street": "Kulas Light
",\n "suite": "Apt. 556",\n "city": "Gwenborough",\n "zipcode": "
92998-3874",\n "geo": {\n "lat": "-37.3159",\n "lng": "81.149
6"\n }\n },\n "phone": "1-770-736-8031 x56442",\n "website": "hild
egard.org",\n "company": {\n "name": "Romaguera-Crona",\n "catchPhr
ase": "Multi-layered client-server neural-net",\n "bs": "harness real-time
e-markets"\n }\n },\n {\n "id": 2,\n "name": "Ervin Howell",\n "us
ername": "Antonette",\n "email": "Shanna@melissa.tv",\n "address": {\n
"street": "Victor Plains",\n "suite": "Suite 879",\n "city": "Wisoky
burgh",\n "zipcode": "90566-7771",\n "geo": {\n "lat": "-43.950
9",\n "lng": "-34.4618"\n }\n },\n "phone": "010-692-6593 x091
25",\n "website": "anastasia.net",\n "company": {\n "name": "Deckow-C
rist",\n "catchPhrase": "Proactive didactic contingency",\n "bs": "syn
ergize scalable supply-chains"\n }\n },\n {\n "id": 3,\n "name": "Cle
mentine Bauch",\n "username": "Samantha",\n "email": "Nathan@yesenia.net",
\n "address": {\n "street": "Douglas Extension",\n "suite": "Suite
847",\n "city": "McKenziehaven",\n "zipcode": "59590-4157",\n "ge
o": {\n "lat": "-68.6102",\n "lng": "-47.0653"\n }\n },\n

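The response is given in bytes; the letter b at the start of the output marks a bytes value. Using the requests module, you can read the encoding in use and change it if required. For example, to get the encoding you can use response.encoding.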

print(getdata.encoding)

Output

utf-8

You can change the encoding to one of your choice as follows:

getdata.encoding = 'ISO-8859-1'
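To make the effect of response.encoding concrete, here is a small offline sketch. It fills a Response object by hand, which is an assumption made only so the example runs without a network call: _content is a private attribute, and in real code the bytes come from the server. The sample word is arbitrary.

```python
import requests

# Hand-built response, used only so the sketch runs offline.
# _content is a private attribute; real code gets it from the server.
resp = requests.Response()
resp._content = 'café'.encode('utf-8')

resp.encoding = 'utf-8'
as_utf8 = resp.text        # bytes decoded with the matching encoding

resp.encoding = 'ISO-8859-1'
as_latin1 = resp.text      # the same bytes decoded with a different encoding

print(len(resp.content))   # number of bytes in the body
print(len(as_utf8))        # number of characters after decoding
print(as_utf8 == as_latin1)
```

response.content never changes, since it is the raw bytes. Only response.text depends on the encoding, so setting the wrong one produces garbled characters rather than an error.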

JSON Response

You can also get the response to the HTTP request as parsed JSON by using the response.json() method, as follows −

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.json())

Output

E:\prequests>python makeRequest.py
[{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.
biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenbor
ough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, '
phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name':
'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs
': 'harness real-time e-markets'}}]
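The value returned by response.json() is ordinary Python data (here a list of dictionaries), so it can be indexed and looped over directly. The sketch below uses a hand-built Response with a shortened body so that it runs offline; setting _content directly is an assumption for illustration and not something normal code needs to do.

```python
import requests

# Hand-built response so the sketch runs offline (_content is private API)
resp = requests.Response()
resp.status_code = 200
resp.encoding = 'utf-8'
resp._content = b'[{"id": 1, "username": "Bret", "address": {"city": "Gwenborough"}}]'

users = resp.json()                       # a Python list of dicts
first_city = users[0]['address']['city']  # nested keys work like any dict
usernames = [user['username'] for user in users]

print(type(users).__name__, first_city, usernames)
```

If the body is not valid JSON, json() raises an exception, so it is worth checking the status code or the Content-Type header first when the server may return an HTML error page.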

RAW Response

In case you need the raw response for the HTTP URL, you can use response.raw. You also need to pass stream=True to the get method, as shown below −

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users', stream=True)
print(getdata.raw)

Output

E:\prequests>python makeRequest.py
<urllib3.response.HTTPResponse object at 0x000000A8833D7B70>

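To read data from the raw response, you can do as follows. The bytes are returned exactly as the server sent them, so a compressed response (gzip here) is not decoded: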

print(getdata.raw.read(50))

Output

E:\prequests>python makeRequest.py
b'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\x95\x98[o\xe38\x12\x85\xdf\xe7W\x10y\
xda\x01F\x82.\xd4m\x9f\xdc\x9dd\xba\xb7\x93\xf4\x06q\xef4\x06\x83A@K\x15\x89m'

Binary Response

To get a binary response, we can use response.content.

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.content)

Output

E:\prequests>python makeRequest.py
b'[\n {\n "id": 1,\n "name": "Leanne Graham",\n "username": "Bret",\n
"email": "Sincere@april.biz",\n "address": {\n "street": "Kulas Light
",\n "suite": "Apt. 556",\n "city": "Gwenborough",\n "zipcode": "
92998-3874",\n "geo": {\n "lat": "-37.3159",\n "lng": "81.149
6"\n }\n },\n "phone": "1-770-736-8031 x56442",\n "website": "hild
egard.org",\n "company": {\n "name": "Romaguera-Crona",\n "catchPhr
ase": "Multi-layered client-server neural-net",\n "bs": "harness real-time
e-markets"\n }\n },\n {\n "id": 2,\n "name": "Ervin Howell",\n "us
ername": "Antonette",\n "email": "Shanna@melissa.tv",\n "address": {\n
"street": "Victor Plains",\n "suite": "Suite 879",\n "city": "Wisoky
burgh",\n "zipcode": "90566-7771",\n "geo": {\n "lat": "-43.950
9",\n "lng": "-34.4618"\n }\n },\n "phone": "010-692-6593 x091
25",\n "website": "anastasia.net",\n "company": {\n "name": "Deckow-C
rist",\n "catchPhrase": "Proactive didactic contingency",\n "bs": "syn
ergize scalable supply-chains"\n }\n },\n {\n "id": 3,\n "name": "Cle
mentine Bauch",\n "username": "Samantha",\n "email": "Nathan@yesenia.net",
\n "address": {\n "street": "Douglas Extension",\n "suite": "Suite
847",\n "city": "McKenziehaven",\n "zipcode": "59590-4157",\n "ge
o": {\n "lat": "-68.6102",\n "lng": "-47.0653"\n }\n },\n

The response is given in bytes; the letter b at the start marks a bytes value. The binary response is mostly used for non-text content.

Requests - HTTP Requests Headers

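In the previous chapter, we saw how to make a request and get the response. This chapter looks more closely at the headers that accompany a URL request. We will cover the following: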

  1. Understanding Request Headers

  2. Custom Headers

  3. Response Headers

Understanding Request Headers

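Open any URL in the browser, inspect it and look at the Network tab of the developer tools.

You will see the response headers, request headers, payload and so on.

You can get the header details with Requests as follows −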

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users', stream=True)
print(getdata.headers)

Output

E:\prequests>python makeRequest.py
{'Date': 'Sat, 30 Nov 2019 05:15:00 GMT', 'Content-Type': 'application/json; cha
rset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Co
okie': '__cfduid=d2b84ccf43c40e18b95122b0b49f5cf091575090900; expires=Mon, 30-De
c-19 05:15:00 GMT; path=/; domain=.typicode.com; HttpOnly', 'X-Powered-By': 'Exp
ress', 'Vary': 'Origin, Accept-Encoding', 'Access-Control-Allow-Credentials': 't
rue', 'Cache-Control': 'max-age=14400', 'Pragma': 'no-cache', 'Expires': '-1', '
X-Content-Type-Options': 'nosniff', 'Etag': 'W/"160d-1eMSsxeJRfnVLRBmYJSbCiJZ1qQ
"', 'Content-Encoding': 'gzip', 'Via': '1.1 vegur', 'CF-Cache-Status': 'HIT', 'A
ge': '2271', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudf
lare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '53da574f
f99fc331-SIN'}

To read any HTTP header, you can do as follows −

getdata.headers["Content-Encoding"]  # gzip

Custom Headers

You can also send headers to the URL being called, as shown below.

Example

import requests
headers = {'x-user': 'test123'}
getdata = requests.get('https://jsonplaceholder.typicode.com/users', headers=headers)

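The header values passed must be a string, bytestring or unicode. Custom headers do not change how Requests itself behaves; they are simply sent along with the request.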
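The rule about header value types can be checked without sending a request. In this sketch, prepare() builds the request locally; the x-count header and its integer value are made up to show what happens when a value is not a string.

```python
import requests

url = 'https://jsonplaceholder.typicode.com/users'

# A string value is accepted and attached to the outgoing request
prepared = requests.Request('GET', url, headers={'x-user': 'test123'}).prepare()
print(prepared.headers['x-user'])

# A non-string value (hypothetical x-count header) is rejected at prepare time
try:
    requests.Request('GET', url, headers={'x-count': 5}).prepare()
    rejected = False
except requests.exceptions.InvalidHeader:
    rejected = True
print(rejected)
```

Converting such values with str() before building the headers dictionary avoids the error.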

Response Headers

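When you look at the URL in the Network tab of the browser developer tools, you will see its response headers.

To get the header details from the requests module, use Response.headers as shown below −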

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.headers)

Output

E:\prequests>python makeRequest.py
{'Date': 'Sat, 30 Nov 2019 06:08:10 GMT', 'Content-Type': 'application/json; cha
rset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Co
okie': '__cfduid=de1158f1a5116f3754c2c353055694e0d1575094090; expires=Mon, 30-De
c-19 06:08:10 GMT; path=/; domain=.typicode.com; HttpOnly', 'X-Powered-By': 'Exp
ress', 'Vary': 'Origin, Accept-Encoding', 'Access-Control-Allow-Credentials': 't
rue', 'Cache-Control': 'max-age=14400', 'Pragma': 'no-cache', 'Expires': '-1', '
X-Content-Type-Options': 'nosniff', 'Etag': 'W/"160d-1eMSsxeJRfnVLRBmYJSbCiJZ1qQ
"', 'Content-Encoding': 'gzip', 'Via': '1.1 vegur', 'CF-Cache-Status': 'HIT', 'A
ge': '5461', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudf
lare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '53daa52f
3b7ec395-SIN'}

You can get any specific header you want as follows −

print(getdata.headers["Expect-CT"])

Output

max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/exp
ect-ct"

You can also get the header details by using the get() method.

print(getdata.headers.get("Expect-CT"))

Output

max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/exp
ect-ct"
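The two lookup styles differ when a header is absent: indexing raises KeyError, while get() returns a default. The headers object is a case-insensitive dictionary, which the sketch below imitates with requests.structures.CaseInsensitiveDict so that it runs offline. The X-Not-Sent header name is made up for illustration.

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for getdata.headers, which is a CaseInsensitiveDict
response_headers = CaseInsensitiveDict({'Content-Encoding': 'gzip'})

encoding = response_headers['content-encoding']       # any letter case works
missing = response_headers.get('X-Not-Sent', 'n/a')   # default instead of KeyError

print(encoding, missing)
```

Using get() is the safer choice for optional headers, since servers are free to omit them.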

Requests - Handling GET Requests

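This chapter concentrates on GET requests, which are the most common and most frequently used. Working with GET in the requests module is very easy. Here is a simple example that calls a URL using the GET method.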

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.content)

getdata.content will print all the data available in the response.

Output

E:\prequests>python makeRequest.py
b'[\n {\n "id": 1,\n "name": "Leanne Graham",\n "username": "Bret",\n
"email": "Sincere@april.biz",\n "address": {\n "street": "Kulas Light
",\n "suite": "Apt. 556",\n "city": "Gwenborough",\n "zipcode": "
92998-3874",\n "geo": {\n "lat": "-37.3159",\n "lng": "81.149
6"\n }\n },\n "phone": "1-770-736-8031 x56442",\n "website": "hild
egard.org",\n "company": {\n "name": "Romaguera-Crona",\n "catchPhr
ase": "Multi-layered client-server neural-net",\n "bs": "harness real-time
e-markets"\n }\n }

You can also pass parameters to the get method using the params argument, as shown below −

import requests
payload = {'id': 9, 'username': 'Delphine'}
getdata = requests.get('https://jsonplaceholder.typicode.com/users',
params=payload)
print(getdata.content)

The details are stored in the payload object as key/value pairs and passed to params inside the get() method.

Output

E:\prequests>python makeRequest.py
b'[\n {\n "id": 9,\n "name": "Glenna Reichert",\n "username": "Delphin
e",\n "email": "Chaim_McDermott@dana.io",\n "address": {\n "street":
"Dayna Park",\n "suite": "Suite 449",\n "city": "Bartholomebury",\n
"zipcode": "76495-3109",\n "geo": {\n "lat": "24.6463",\n
"lng": "-168.8889"\n }\n },\n "phone": "(775)976-6794 x41206",\n "
website": "conrad.com",\n "company": {\n "name": "Yost and Sons",\n
"catchPhrase": "Switchable contextually-based project",\n "bs": "aggregate
real-time technologies"\n }\n }\n]'

Handling POST, PUT, PATCH and DELETE Requests

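In this chapter, we will see how to use the POST method with the requests library and how to pass parameters to the URL, followed by PUT, PATCH and DELETE.

Using POST

For a POST request, the Requests library has the requests.post() method, shown in the example below: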

import requests

myurl = 'https://postman-echo.com/post'
myparams = {'name': 'ABC', 'email':'xyz@gmail.com'}
res = requests.post(myurl, data=myparams)
print(res.text)

Output

E:\prequests>python makeRequest.py
{"args":{},"data":"","files":{},"form":{"name":"ABC","email":"xyz@gmail.com"},"headers":{"x-forwarded-proto":"https","host":"postman-echo.com","content-length":"30","accept":"*/*","accept-encoding":"gzip,deflate","content-type":"application/x-www-form-urlencoded","user-agent":"python-requests/2.22.0","x-forwarded-port":"443"},"json":{"name":"ABC","email":"xyz@gmail.com"},"url":"https://postman-echo.com/post"}

In the example above, the form data is passed as key/value pairs to the data parameter of requests.post(). We will also see how to use PUT, PATCH and DELETE with the requests module.
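To see what the data parameter actually puts on the wire, the request can be prepared without being sent. The sketch below also prepares the same dictionary with the json parameter, which this guide does not otherwise cover and is included here only for comparison. The names form_req and json_req are illustrative.

```python
import json
import requests

myurl = 'https://postman-echo.com/post'
myparams = {'name': 'ABC', 'email': 'xyz@gmail.com'}

# Nothing is sent here; prepare() only builds the request
form_req = requests.Request('POST', myurl, data=myparams).prepare()
json_req = requests.Request('POST', myurl, json=myparams).prepare()

# data= form-encodes the dictionary
print(form_req.headers['Content-Type'], form_req.body)

# json= serialises the dictionary as a JSON document
print(json_req.headers['Content-Type'], json.loads(json_req.body))
```

Which of the two to use depends on what the server expects: HTML-style form handlers usually want the form encoding, while many APIs expect a JSON body.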

Using PUT

For a PUT request, the Requests library has the requests.put() method, shown in the example below.

import requests
myurl = 'https://postman-echo.com/put'
myparams = {'name': 'ABC', 'email':'xyz@gmail.com'}
res = requests.put(myurl, data=myparams)
print(res.text)

Output

E:\prequests>python makeRequest.py
{"args":{},"data":"","files":{},"form":{"name":"ABC","email":"xyz@gmail.com"},"h
eaders":{"x-forwarded-proto":"https","host":"postman-echo.com","content-length":
"30","accept":"*/*","accept-encoding":"gzip, deflate","content-type":"applicatio
n/x-www-form-urlencoded","user-agent":"python-requests/2.22.0","x-forwarded-port
":"443"},"json":{"name":"ABC","email":"xyz@gmail.com"},"url":"https://postman-ec
ho.com/put"}

Using PATCH

For a PATCH request, the Requests library has the requests.patch() method, shown in the example below.

import requests
myurl = 'https://postman-echo.com/patch'
res = requests.patch(myurl, data="testing patch")
print(res.text)

Output

E:\prequests>python makeRequest.py
{"args":{},"data":{},"files":{},"form":{},"headers":{"x-forwarded-proto":"https"
,"host":"postman-echo.com","content-length":"13","accept":"*/*","accept-encoding
":"gzip, deflate","user-agent":"python-requests/2.22.0","x-forwarded-port":"443"
},"json":null,"url":"https://postman-echo.com/patch"}

Using DELETE

For a DELETE request, the Requests library has the requests.delete() method, shown in the example below.

import requests
myurl = 'https://postman-echo.com/delete'
res = requests.delete(myurl, data="testing delete")
print(res.text)

Output

E:\prequests>python makeRequest.py
{"args":{},"data":{},"files":{},"form":{},"headers":{"x-forwarded-proto":"https"
,"host":"postman-echo.com","content-length":"14","accept":"*/*","accept-encoding
":"gzip, deflate","user-agent":"python-requests/2.22.0","x-forwarded-port":"443"
},"json":null,"url":"https://postman-echo.com/delete"}

Requests - File Upload

In this chapter, we will upload a file using Requests and read back the contents of the uploaded file. We can do it using the files param as shown in the example below.

We will use https://httpbin.org/post to upload the file.

Example

import requests
myurl = 'https://httpbin.org/post'
# Open in binary mode; the with block closes the file after the upload
with open('test.txt', 'rb') as upload:
    files = {'file': upload}
    getdata = requests.post(myurl, files=files)
print(getdata.text)

test.txt

File upload test using Requests

Output

E:\prequests>python makeRequest.py
{
  "args": {},
  "data": "",
  "files": {
   "file": "File upload test using Requests"
  },
  "form": {},
  "headers": {
   "Accept": "*/*",
   "Accept-Encoding": "gzip, deflate",
   "Content-Length": "175",
   "Content-Type": "multipart/form-data; boundary=28aee3a9d15a3571fb80d4d2a94bfd33",
   "Host": "httpbin.org",
   "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "117.223.63.135, 117.223.63.135",
  "url": "https://httpbin.org/post"
}

It is also possible to send the contents of the file as shown below−

Example

import requests
myurl = 'https://httpbin.org/post'
files = {'file': ('test1.txt', 'Welcome to TutorialsPoint')}
getdata = requests.post(myurl, files=files)
print(getdata.text)

Output

E:\prequests>python makeRequest.py
{
  "args": {},
  "data": "",
  "files": {
   "file": "Welcome to TutorialsPoint"
  },
  "form": {},
  "headers": {
   "Accept": "*/*",
   "Accept-Encoding": "gzip, deflate",
   "Content-Length": "170",
   "Content-Type": "multipart/form-data; boundary=f2837238286fe40e32080aa7e172be4f",
   "Host": "httpbin.org",
   "User-Agent": "python-requests/2.22.0"
  },
  "json": null,
  "origin": "117.223.63.135, 117.223.63.135",
  "url": "https://httpbin.org/post"
}
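The multipart/form-data header seen in the output is generated by Requests from the files argument. The sketch below prepares the same upload locally, without contacting httpbin, to show the body that would be sent. The variable names are illustrative.

```python
import requests

files = {'file': ('test1.txt', 'Welcome to TutorialsPoint')}

# Build the upload request without sending it
prepared = requests.Request(
    'POST', 'https://httpbin.org/post', files=files
).prepare()

content_type = prepared.headers['Content-Type']
print(content_type)                    # multipart/form-data with a random boundary
print(prepared.body.decode('utf-8'))   # the encoded form part, including the file name
```

The boundary value differs on every run, which is why it should not be set by hand in a custom Content-Type header when files is used.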

Requests - Working with Cookies

This chapter discusses how to work with cookies. With the requests library you can read the cookies a server sets and send your own cookies when calling a URL.

When the URL https://jsonplaceholder.typicode.com/users is opened in the browser, its cookies can be seen in the developer tools.

You can read the cookies as shown below. Indexing the cookie jar with a name the server did not set raises a KeyError, so the name must match a cookie actually present in the response −

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')
print(getdata.cookies["__cfduid"])

Output

E:\prequests>python makeRequest.py
d1733467caa1e3431fb7f768fa79ed3741575094848

You can also send cookies when you make a request.

Example

import requests
cookies = dict(test='test123')
getdata = requests.get('https://httpbin.org/cookies',cookies=cookies)
print(getdata.text)

Output

E:\prequests>python makeRequest.py
{
   "cookies": {
   "test": "test123"
}
}

Requests - Working with Errors

This chapter discusses how to deal with errors that occur when working with the Requests library. It is always good practice to handle errors for all possible cases.

Error Exception

The requests module raises the following types of exception −

ConnectionError − Raised if there is any connection problem, for example a network failure or a DNS error.

Response.raise_for_status() − Based on the status code, for example 401 or 404, this method raises HTTPError for the URL requested.

HTTPError − Raised for an invalid (error) response to the request made.

Timeout − Raised when the request to the URL times out.

TooManyRedirects − Raised if the request exceeds the maximum number of redirections.

Example

Here is an example of the error raised for a timeout −

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users',timeout=0.001)
print(getdata.text)

Output

raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='jsonplaceholder.ty
picode.com', port=443): Max retries exceeded with url: /users (Caused by Connect
TimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x000000B02AD
E76A0>, 'Connection to jsonplaceholder.typicode.com timed out. (connect timeout=
0.001)'))
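In a real script these exceptions are usually caught rather than left to end the program with a traceback. The sketch below shows one possible ordering of except clauses. It uses a hand-built 404 response so it runs offline; the /missing path is made up, and in practice the response would come from requests.get().

```python
import requests

# Hand-built 404 response so the sketch runs offline; normally this
# object is returned by requests.get(...). The URL path is hypothetical.
resp = requests.Response()
resp.status_code = 404
resp.url = 'https://jsonplaceholder.typicode.com/missing'

try:
    resp.raise_for_status()          # raises HTTPError for 4xx and 5xx codes
    result = 'ok'
except requests.exceptions.Timeout:
    result = 'timed out'
except requests.exceptions.ConnectionError:
    result = 'could not connect'
except requests.exceptions.HTTPError as err:
    result = 'bad status: %d' % err.response.status_code
except requests.exceptions.RequestException:
    result = 'other request failure'

print(result)
```

Timeout is listed before ConnectionError because ConnectTimeout inherits from both, and RequestException comes last because every exception above derives from it.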

Requests - Handling Timeouts

Timeouts can easily be added to the URL you are requesting. This matters when you are calling a third-party URL and waiting for a response. It is always good practice to set a timeout, so that the request either gets a response or fails with an error within that time. Without one, the request may wait indefinitely.

We can set a timeout using the timeout param; the value is given in seconds, as shown in the example below −

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users',timeout=0.001)
print(getdata.text)

Output

raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='jsonplaceholder.ty
picode.com', port=443): Max retries exceeded with url: /users (Caused by Connect
TimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x000000B02AD
E76A0>, 'Connection to jsonplaceholder.typicode.com timed out. (connect timeout=
0.001)'))

The timeout given is as follows−

getdata = requests.get('https://jsonplaceholder.typicode.com/users',timeout=0.001)

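The execution throws a connection timeout error, as shown in the output. The timeout of 0.001 seconds is too short for the request to get a response, so an error is raised. Now we will increase the timeout and check.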

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users',timeout=1.000)
print(getdata.text)

Output

E:\prequests>python makeRequest.py
[
 {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
   "bs": "harness real-time e-markets"
 }

With a timeout of 1 second, we get the response for the requested URL.
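The timeout can also be given as a (connect, read) tuple, so that connecting and waiting for data have separate limits. The sketch below is an illustration under stated assumptions: a local socket that accepts connections but never answers stands in for a slow third-party server, and trust_env is switched off only so that proxy settings in the environment do not interfere with the local test.

```python
import socket
import requests

# A local socket that accepts TCP connections but never replies,
# standing in for a slow server (illustrative setup only).
silent_server = socket.socket()
silent_server.bind(('127.0.0.1', 0))
silent_server.listen(1)
port = silent_server.getsockname()[1]

session = requests.Session()
session.trust_env = False   # ignore proxy environment variables for this local test

try:
    # timeout=(connect, read): up to 2 s to connect, 0.5 s to wait for data
    session.get('http://127.0.0.1:%d/' % port, timeout=(2, 0.5))
    outcome = 'response received'
except requests.exceptions.ConnectTimeout:
    outcome = 'connect timeout'
except requests.exceptions.ReadTimeout:
    outcome = 'read timeout'
finally:
    silent_server.close()

print(outcome)
```

Both ConnectTimeout and ReadTimeout are subclasses of Timeout, so a single except requests.exceptions.Timeout clause is enough when the distinction does not matter.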

Requests - Handling Redirection

This chapter looks at how the Requests library handles URL redirection.

Example

import requests
getdata = requests.get('http://google.com/')
print(getdata.status_code)
print(getdata.history)

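The URL http://google.com is redirected with status code 301 (Moved Permanently) to the www.google.com host. The redirection is saved in the history.

Output

When the above code is executed, we get the following result: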

E:\prequests>python makeRequest.py
200
[<Response [301]>]

You can stop the redirection of a URL using allow_redirects=False. It can be used with the GET, POST, OPTIONS, PUT, DELETE and PATCH methods.

Example

Here is an example.

import requests
getdata = requests.get('http://google.com/', allow_redirects=False)
print(getdata.status_code)
print(getdata.history)
print(getdata.text)

Now if you check the output, the redirection is not followed and you get status code 301.

Output

E:\prequests>python makeRequest.py
301
[]
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
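Because the behaviour of a public site can change, the same two cases can be reproduced against a throwaway local server. Everything about the server in this sketch is an assumption made for illustration: the /old and /new paths, the handler class and the response text are invented, and trust_env is disabled only to keep proxy settings out of the local test.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Illustrative handler: /old answers 301 to /new, anything else answers 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(301)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'final page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

session = requests.Session()
session.trust_env = False   # ignore proxy environment variables for this local test

followed = session.get(base + '/old')
not_followed = session.get(base + '/old', allow_redirects=False)

server.shutdown()
server.server_close()

print(followed.status_code, [r.status_code for r in followed.history], followed.url)
print(not_followed.status_code, not_followed.headers['Location'])
```

With redirects followed, the final response carries the 200 status and the 301 appears in history. With allow_redirects=False, the 301 itself is returned and the target is available in the Location header.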

Requests - Handling History

You can get the history of a given URL using response.history. If the URL has any redirects, they are stored in the history.

For history

import requests
getdata = requests.get('http://google.com/')
print(getdata.status_code)
print(getdata.history)

Output

E:\prequests>python makeRequest.py
200
[<Response [301]>]

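The response.history property holds the response objects that were created along the way to completing the request. The values are sorted from the oldest to the newest. It tracks all the redirections made for the requested URL.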

Requests - Handling Sessions

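To maintain data between requests you need a session. A session also reuses the TCP connection when the same host is called repeatedly, which improves performance. Let us now see how to maintain cookies across requests using a session.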

Adding cookies using session

import requests
req = requests.Session()
cookies = dict(test='test123')
getdata = req.get('https://httpbin.org/cookies',cookies=cookies)
print(getdata.text)

Output

E:\prequests>python makeRequest.py
{
   "cookies": {
   "test": "test123"
}
}

Using a session, you can preserve cookie data across requests. You can also pass header data using the session, as shown below:

Example

import requests
req = requests.Session()
req.headers.update({'x-user1': 'ABC'})
headers = {'x-user2': 'XYZ'}
getdata = req.get('https://httpbin.org/headers', headers=headers)
print(getdata.text)  # httpbin echoes back the request headers it received
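How a session combines its own data with per-request data can be checked without a network call. In this sketch, prepare_request() performs the merge that the session would do before sending. Note that values passed to a single call, such as cookies=cookies in the earlier example, apply to that call only; data stored on the session object itself is what carries over. Variable names are illustrative.

```python
import requests

session = requests.Session()
session.headers.update({'x-user1': 'ABC'})   # sent with every request
session.cookies.set('test', 'test123')       # stored on the session itself

request = requests.Request(
    'GET', 'https://httpbin.org/headers', headers={'x-user2': 'XYZ'}
)

# prepare_request() merges session-level and request-level settings
prepared = session.prepare_request(request)

print(prepared.headers['x-user1'], prepared.headers['x-user2'])
print(prepared.headers['Cookie'])
```

Sending the prepared request with session.send(prepared) would then deliver both headers and the cookie to the server.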

Requests - SSL Certification

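An SSL certificate is a security feature that comes with secure URLs. When you use the Requests library, it also verifies the SSL certificate of the given https URL. SSL verification is enabled by default in the requests module, and an error is raised if the certificate is missing or cannot be verified.

Working with secure URL

Following is the example of working with a secure URL −

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users')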
print(getdata.text)

Output

E:\prequests>python makeRequest.py
[
   {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
   "bs": "harness real-time e-markets"
   }
  }
]

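We get a response from the above https URL without any extra work, because the requests module verifies the SSL certificate.

You can disable SSL verification by simply adding verify=False, as shown in the example below.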

Example

import requests
getdata = requests.get('https://jsonplaceholder.typicode.com/users', verify=False)
print(getdata.text)

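You will get the output, but it also gives a warning that the SSL certificate is not verified and that adding certificate verification is advised.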

Output

E:\prequests>python makeRequest.py
connectionpool.py:851: InsecureRequestWarning: Unverified HTTPS request is being
made. Adding certificate verification is strongly advised. See: https://urllib3
.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
 InsecureRequestWarning)
[
 {
  "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered   client-server neural-net",
   "bs": "harness real-time e-markets"
  }
 }
]

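You can also verify the SSL certificate by hosting the certificate file on your side and giving its path with the verify param, as shown below.

Example

import requests
# Raw string so the backslashes in the Windows path are not treated as escapes
getdata = requests.get('https://jsonplaceholder.typicode.com/users', verify=r'C:\Users\AppData\Local\certificate.txt')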
print(getdata.text)

Output

E:\prequests>python makeRequest.py
[
  {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered   client-server neural-net",
   "bs": "harness real-time e-markets"
   }
  }
]
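If the same certificate file should apply to many requests, it can be set once on a session instead of on every call. The sketch below is offline and only inspects settings. It uses the CA bundle that ships with Requests as a stand-in for your own certificate file, which is an assumption for illustration.

```python
import os

import requests
from requests import certs

# Path of the CA bundle Requests uses by default; a stand-in here
# for the path to your own certificate file.
ca_bundle = certs.where()
print(ca_bundle, os.path.isfile(ca_bundle))

session = requests.Session()
print(session.verify)        # True: certificates are verified by default

# Every request made through this session now verifies against this file
session.verify = ca_bundle
```

Setting session.verify = False would disable verification for the whole session, with the same warning as shown above, so it is best kept to local testing.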

Requests - Authentication

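This chapter discusses the types of authentication available in the Requests module.

We are going to discuss the following −

  1. Working of Authentication in HTTP Requests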

  2. Basic Authentication

  3. Digest Authentication

  4. OAuth2 Authentication

Working of Authentication in HTTP Requests

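HTTP authentication happens on the server side: when a client requests a URL, the server asks for authentication information such as a username and password. It is an additional layer of security for the requests and responses exchanged between the client and the server.

From the client's side, this additional authentication information, that is the username and password, can be sent in the headers and is then validated on the server side. The server delivers the response only when the authentication is valid.

The Requests library provides the most commonly used authentication schemes in requests.auth: Basic Authentication (HTTPBasicAuth) and Digest Authentication (HTTPDigestAuth).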

Basic Authentication

这是提供服务器认证的最简单形式。为了使用基本认证,我们将使用 requests 库提供的 HTTPBasicAuth 类。

Example

这是一个如何使用它的工作示例。

import requests
from requests.auth import HTTPBasicAuth
response_data = requests.get('httpbin.org/basic-auth/admin/admin123', auth=HTTPDigestAuth('admin', 'admin123'))
print(response_data.text)

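We are calling the URL https://httpbin.org/basic-auth/admin/admin123 with the user admin and the password admin123.

This URL will not work without authentication, that is, without the username and password. The server returns the response only once you supply the credentials through the auth parameter.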

Output

E:\prequests>python makeRequest.py
{
   "authenticated": true,
   "user": "admin"
}
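
As a small sketch of the auth parameter, Requests also accepts a plain (username, password) tuple, which it treats as Basic authentication. The snippet below prepares both forms of the request without sending them, so no network access is needed, and compares the resulting Authorization headers.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'https://httpbin.org/basic-auth/admin/admin123'

# Build the requests locally; prepare() does not contact the server.
explicit = requests.Request('GET', url, auth=HTTPBasicAuth('admin', 'admin123')).prepare()
shorthand = requests.Request('GET', url, auth=('admin', 'admin123')).prepare()

# Both forms produce the same Authorization header.
print(explicit.headers['Authorization'] == shorthand.headers['Authorization'])
```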

Digest Authentication

This is another form of authentication available in requests. We use the HTTPDigestAuth class from requests.

Example

import requests
from requests.auth import HTTPDigestAuth
response_data = requests.get('https://httpbin.org/digest-auth/auth/admin/admin123', auth=HTTPDigestAuth('admin', 'admin123'))
print(response_data.text)

Output

E:\prequests>python makeRequest.py
{
   "authenticated": true,
   "user": "admin"
}
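
Unlike Basic authentication, Digest authentication does not send the password itself; the client answers a server challenge with a hash. The following is a rough sketch of the MD5 response computation with qop=auth as described in RFC 2617, using only the standard library. The function names and the sample values are illustrative assumptions; HTTPDigestAuth performs this exchange for you.

```python
import hashlib

def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value (MD5, qop=auth) for one request."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # hash of the user's credentials
    ha2 = md5_hex(f"{method}:{uri}")                  # hash of the request method and URI
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

# Illustrative values; a real realm and nonce come from the server's 401 challenge.
print(digest_response("admin", "example-realm", "admin123", "GET",
                      "/digest-auth/auth/admin/admin123",
                      nonce="abc123", nc="00000001", cnonce="0a4f113b"))
```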

OAuth2 Authentication

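To use OAuth2 authentication, we need the "requests_oauth2" library. To install "requests_oauth2", run the following −

pip install requests_oauth2

The output shown in the terminal during installation looks like this −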

E:\prequests>pip install requests_oauth2
Collecting requests_oauth2
Downloading https://files.pythonhosted.org/packages/52/dc/01c3c75e6e7341a2c7a9
71d111d7105df230ddb74b5d4e10a3dabb61750c/requests-oauth2-0.3.0.tar.gz
Requirement already satisfied: requests in c:\users\xyz\appdata\local\programs
\python\python37\lib\site-packages (from requests_oauth2) (2.22.0)
Requirement already satisfied: six in c:\users\xyz\appdata\local\programs\pyth
on\python37\lib\site-packages (from requests_oauth2) (1.12.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\use
rs\xyz\appdata\local\programs\python\python37\lib\site-packages (from requests
->requests_oauth2) (1.25.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\xyz\appdata\loca
l\programs\python\python37\lib\site-packages (from requests->requests_oauth2) (2
019.3.9)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\xyz\appdata\l
ocal\programs\python\python37\lib\site-packages (from requests->requests_oauth2)
(3.0.4)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\xyz\appdata\local\pr
ograms\python\python37\lib\site-packages (from requests->requests_oauth2) (2.8)
Building wheels for collected packages: requests-oauth2
Building wheel for requests-oauth2 (setup.py) ... done
Stored in directory: C:\Users\xyz\AppData\Local\pip\Cache\wheels\90\ef\b4\43
3743cbbc488463491da7df510d41c4e5aa28213caeedd586
Successfully built requests-oauth2

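We have finished installing "requests-oauth2". To use the APIs of Google or Twitter, we need their consent, which is obtained through OAuth2 authentication.

For OAuth2 authentication, we need a client ID and a secret key. Details on how to obtain them are available at https://developers.google.com/identity/protocols/OAuth2

Afterwards, log in to the Google API Console, available at https://console.developers.google.com/, and get the client ID and secret key.

Example

Here is an example of how to use "requests-oauth2".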

import requests
from requests_oauth2.services import GoogleClient
google_auth = GoogleClient(
   client_id="xxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com",
   redirect_uri="http://localhost/auth/success.html",
)
a = google_auth.authorize_url(
   scope=["profile", "email"],
   response_type="code",
)
res = requests.get(a)
print(res.url)

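We cannot follow the redirect to the given URL, since it requires logging in to a Gmail account. The example does show, however, that google_auth works and that the authorization URL is produced.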

Output

E:\prequests>python oauthRequest.py
https://accounts.google.com/o/oauth2/auth?redirect_uri=
http%3A%2F%2Flocalhost%2Fauth%2Fsuccess.html&
client_id=xxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com&
scope=profile+email&response_type=code
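
The authorization URL printed above is the provider's endpoint followed by URL-encoded query parameters. As a rough illustration (not the requests-oauth2 implementation), it can be assembled with the standard library. The helper name build_authorize_url is hypothetical and the client ID is a placeholder.

```python
from urllib.parse import urlencode

def build_authorize_url(endpoint, client_id, redirect_uri, scope, response_type="code"):
    """Assemble an OAuth2 authorization URL from its query parameters."""
    params = {
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "scope": " ".join(scope),        # scopes are space-separated, encoded as '+'
        "response_type": response_type,
    }
    return endpoint + "?" + urlencode(params)

url = build_authorize_url(
    "https://accounts.google.com/o/oauth2/auth",
    client_id="your-client-id.apps.googleusercontent.com",  # placeholder value
    redirect_uri="http://localhost/auth/success.html",
    scope=["profile", "email"],
)
print(url)
```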

Requests - Event Hooks

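We can use event hooks to attach callbacks to a request. In the example below, we add a callback function that is called when the response is available.

Example

To add a callback, we use the hooks parameter, as shown in the example below −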

import requests
def printData(r, *args, **kwargs):
   print(r.url)
   print(r.text)
getdata = requests.get('https://jsonplaceholder.typicode.com/users',
   hooks={'response': printData})

Output

E:\prequests>python makeRequest.py
https://jsonplaceholder.typicode.com/users
[
{
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
    "lat": "-37.3159",
    "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
   "bs": "harness real-time e-markets"
   }
  }
]

You can also call multiple callback functions, as shown below −

Example

import requests
def printRequestedUrl(r, *args, **kwargs):
   print(r.url)
def printData(r, *args, **kwargs):
   print(r.text)
getdata = requests.get('https://jsonplaceholder.typicode.com/users', hooks={'response': [printRequestedUrl, printData]})

Output

E:\prequests>python makeRequest.py
https://jsonplaceholder.typicode.com/users
[
  {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
 }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
    "bs": "harness real-time e-markets"
   }
  }
]
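
To see why both callbacks run in the order given, here is a simplified model of how a list of response hooks can be dispatched. It is a sketch, not Requests' internal code: the function run_hooks and the class FakeResponse are hypothetical stand-ins so that the snippet runs without a network connection. As in Requests, a hook that returns a value replaces the data passed on to the next hook.

```python
class FakeResponse:
    """Hypothetical stand-in for a response object."""
    def __init__(self, url, text):
        self.url = url
        self.text = text

def run_hooks(event, hooks, data, **kwargs):
    """Call every hook registered for an event, in order."""
    callbacks = hooks.get(event, [])
    if callable(callbacks):          # a single function is also accepted
        callbacks = [callbacks]
    for callback in callbacks:
        result = callback(data, **kwargs)
        if result is not None:       # a returned value replaces the data
            data = result
    return data

calls = []

def record_url(r, *args, **kwargs):
    calls.append(r.url)

def record_length(r, *args, **kwargs):
    calls.append(len(r.text))

response = FakeResponse('https://jsonplaceholder.typicode.com/users', '[]')
run_hooks('response', {'response': [record_url, record_length]}, response)
print(calls)
```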

You can also add a hook to a created session, as shown below −

Example

import requests
def printData(r, *args, **kwargs):
   print(r.text)
s = requests.Session()
s.hooks['response'].append(printData)
s.get('https://jsonplaceholder.typicode.com/users')

Output

E:\prequests>python makeRequest.py
[
 {
   "id": 1,
   "name": "Leanne Graham",
   "username": "Bret",
   "email": "Sincere@april.biz",
   "address": {
   "street": "Kulas Light",
   "suite": "Apt. 556",
   "city": "Gwenborough",
   "zipcode": "92998-3874",
   "geo": {
   "lat": "-37.3159",
   "lng": "81.1496"
   }
  },
   "phone": "1-770-736-8031 x56442",
   "website": "hildegard.org",
   "company": {
   "name": "Romaguera-Crona",
   "catchPhrase": "Multi-layered client-server neural-net",
   "bs": "harness real-time e-markets"
   }
  }
]

Requests - Proxy

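So far, we have seen clients connecting and talking to the server directly. With a proxy, the interaction happens as follows −

  1. The client sends a request to the proxy.

  2. The proxy sends the request to the server.

  3. The server sends the response back to the proxy.

  4. The proxy sends the response back to the client.

An HTTP proxy adds a layer of security by managing the data exchanged between the client and the server. The requests library supports proxies through the proxies parameter, as shown below −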

Example

import requests
proxies = {
   'http': 'http://localhost:8080'
}
res = requests.get('http://httpbin.org/', proxies=proxies)
print(res.status_code)

The request is routed through the proxy at http://localhost:8080.

Output

200
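
The proxies dictionary is keyed by URL scheme, so a proxy listed under 'http' applies only to http:// URLs. The sketch below models that lookup with the standard library. The function pick_proxy is a hypothetical simplification (Requests also supports more specific keys), and the https proxy address is an assumed example.

```python
from urllib.parse import urlparse

def pick_proxy(url, proxies):
    """Return the proxy configured for the URL's scheme, or None for a direct connection."""
    scheme = urlparse(url).scheme
    return proxies.get(scheme)

proxies = {
    'http': 'http://localhost:8080',
    'https': 'http://localhost:8081',   # assumed second proxy for illustration
}

print(pick_proxy('http://httpbin.org/', proxies))
print(pick_proxy('https://httpbin.org/', proxies))
print(pick_proxy('ftp://example.com/file', proxies))
```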

Requests - Web Scraping using Requests

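We have already seen how to fetch data from a given URL using the Python requests library. We will now try to extract data from the Tutorialspoint site, available at https://www.tutorialspoint.com/tutorialslibrary.htm, using the following −

  1. Requests Library

  2. Beautiful Soup library for Python

We have already installed the Requests library, so let us now install the Beautiful Soup package. If you want to explore more of its features, the official Beautiful Soup documentation is available at https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Installing Beautifulsoup

The installation of Beautiful Soup is shown below −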

E:\prequests>pip install beautifulsoup4
Collecting beautifulsoup4
Downloading https://files.pythonhosted.org/packages/3b/c8/a55eb6ea11cd7e5ac4ba
cdf92bac4693b90d3ba79268be16527555e186f0/beautifulsoup4-4.8.1-py3-none-any.whl (
101kB)
|████████████████████████████████| 102kB 22kB/s
Collecting soupsieve>=1.2 (from beautifulsoup4)
Downloading https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0
a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.8.1 soupsieve-1.9.5

We now have both the Python requests library and Beautiful Soup installed.

Let us now write the code that extracts data from the given URL.

Web scraping

import requests
from bs4 import BeautifulSoup
res = requests.get('https://www.tutorialspoint.com/tutorialslibrary.htm')
print("The status code is ", res.status_code)
print("\n")
soup_data = BeautifulSoup(res.text, 'html.parser')
print(soup_data.title)
print("\n")
print(soup_data.find_all('h4'))

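Using the requests library, we can fetch the content from the given URL, and the Beautiful Soup library helps to parse it and extract the details in the way we want.

With Beautiful Soup, you can extract data using HTML tags, classes, ids, CSS selectors, and many other ways. Below is the output we get, in which we print the title of the page and all the h4 tags on the page.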

Output

E:\prequests>python makeRequest.py
The status code is 200
<title>Free Online Tutorials and Courses</title>
[<h4>Academic</h4>, <h4>Computer Science</h4>, <h4>Digital Marketing</h4>, <h4>M
onuments</h4>,<h4>Machine Learning</h4>, <h4>Mathematics</h4>, <h4>Mobile Devel
opment</h4>,<h4>SAP</h4>, <h4>Software Quality</h4>, <h4>Big Data & Analyti
cs</h4>, <h4>Databases</h4>, <h4>Engineering Tutorials</h4>, <h4>Mainframe Devel
opment</h4>, <h4>Microsoft Technologies</h4>, <h4>Java Technologies</h4>,<h4>XM
L Technologies</h4>, <h4>Python Technologies</h4>, <h4>Sports</h4>, <h4>Computer
Programming</h4>,<h4>DevOps</h4>, <h4>Latest Technologies</h4>, <h4>Telecom</h4>, <h4>Exams Syllabus</h4>, <h4>UPSC IAS Exams</h4>, <h4>Web Development</h4>,
<h4>Scripts</h4>, <h4>Management</h4>,<h4>Soft Skills</h4>, <h4>Selected Readin
g</h4>, <h4>Misc</h4>]
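
As a small illustration of selecting by tag, class, id, and CSS selector, the snippet below parses an inline HTML string instead of a live page, so it runs without network access. The markup, class names, and ids are made up for the example.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for this illustration.
html = """
<html><head><title>Sample Library</title></head>
<body>
  <h4 class="topic">Python Technologies</h4>
  <h4 class="topic" id="featured">Machine Learning</h4>
  <ul><li><a href="/python">Python</a></li><li><a href="/ml">ML</a></li></ul>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

titles = [h4.get_text() for h4 in soup.find_all('h4')]            # by tag
topics = [h4.get_text() for h4 in soup.find_all(class_='topic')]  # by class
featured = soup.find(id='featured').get_text()                    # by id
links = [a['href'] for a in soup.select('ul li a')]               # by CSS selector

print(soup.title.string)
print(titles, topics, featured, links)
```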