Python Network Programming: A Concise Tutorial

Python - Building URLs

The requests module can help us build URLs and manipulate their values dynamically. Its requests.compat submodule re-exports the URL helpers from Python's standard library, so any part of a URL's path can be resolved programmatically and then replaced with new values to build new URLs.

Build the URL

The example below uses urljoin to resolve relative references such as '.' and '..' against a base URL, which yields the different parent folders in the URL path. urljoin can also attach new values, such as a query string or a fragment, to the base URL.

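from requests.compat import urljoin

base = 'https://stackoverflow.com/questions/3764291'

# Relative references are resolved against the base URL
print(urljoin(base, '.'))          # folder containing the last path segment
print(urljoin(base, '..'))         # one level above that folder
print(urljoin(base, '...'))        # not special: treated as an ordinary segment
print(urljoin(base, '/3764299/'))  # a leading slash replaces the whole path

# Attach a query string, then a fragment
url_query = urljoin(base, '?vers=1.0')
print(url_query)
url_sec = urljoin(url_query, '#section-5.4')
print(url_sec)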

When we run the above program, we get the following output:

https://stackoverflow.com/questions/
https://stackoverflow.com/
https://stackoverflow.com/questions/...
https://stackoverflow.com/3764299/
https://stackoverflow.com/questions/3764291?vers=1.0
https://stackoverflow.com/questions/3764291?vers=1.0#section-5.4
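
One detail worth illustrating is that urljoin treats the last path segment of the base as a file unless the base ends with a slash, so a relative name replaces that segment rather than being appended to it. The sketch below imports urljoin from the standard library's urllib.parse, which is the function requests.compat re-exports on Python 3. The helper join_under is a hypothetical name introduced here for illustration and is not part of requests or urllib.

```python
from urllib.parse import urljoin


def join_under(base, segment):
    """Hypothetical helper: append segment beneath base, treating base as a folder."""
    if not base.endswith('/'):
        base += '/'
    return urljoin(base, segment)


# Without a trailing slash, 'questions' is replaced by 'tagged'
no_slash = urljoin('https://stackoverflow.com/questions', 'tagged')

# With a trailing slash, 'tagged' is appended beneath 'questions/'
with_slash = urljoin('https://stackoverflow.com/questions/', 'tagged')

# The helper forces the trailing slash before joining
appended = join_under('https://stackoverflow.com/questions', 'tagged')

print(no_slash)
print(with_slash)
print(appended)
```

If the intent is always to add a segment under the base, normalizing the trailing slash first, as join_under does, is one way to avoid losing the last path segment by accident.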

Split the URLs

URLs can also be split into their component parts beyond the main address. The urlparse method separates the additional parameters used for a specific query, or the tags (fragments) attached to the URL, as shown below.

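from requests.compat import urlparse

url1 = 'https://docs.python.org/2/py-modindex.html#cap-f'   # URL with a fragment
url2 = 'https://docs.python.org/2/search.html?q=urlparse'   # URL with a query string

# urlparse returns a named tuple with six components
print(urlparse(url1))
print(urlparse(url2))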

When we run the above program, we get the following output:

ParseResult(scheme='https', netloc='docs.python.org', path='/2/py-modindex.html', params='', query='', fragment='cap-f')
ParseResult(scheme='https', netloc='docs.python.org', path='/2/search.html', params='', query='q=urlparse', fragment='')
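
Because the result of urlparse is a named tuple, its parts can be read by name and a modified copy can be turned back into a URL. The following is a minimal sketch of that round trip, assuming the standard library's urllib.parse (the source of the functions that requests.compat re-exports on Python 3). The replacement query value 'urljoin' is an arbitrary choice made for illustration.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

url = 'https://docs.python.org/2/search.html?q=urlparse'
parts = urlparse(url)

# Individual components are available as attributes
host = parts.netloc

# parse_qs turns the query string into a dict of lists
query = parse_qs(parts.query)

# Build a new query string and swap it into a copy of the parsed result
new_query = urlencode({'q': 'urljoin'})
new_url = urlunparse(parts._replace(query=new_query))

print(host)
print(query)
print(new_url)
```

Using _replace leaves the original parsed result untouched, which keeps the base URL reusable when several variants need to be built from it.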