Beautiful Soup Tutorial

In this tutorial, we show how to perform web scraping in Python using Beautiful Soup 4 to get data out of HTML, XML, and other markup languages. We scrape pages from several websites, including IMDb, and cover Beautiful Soup 4 together with the basic Python tools for navigating, searching, and parsing HTML pages efficiently and clearly.

This tutorial tries to cover almost all of the functionality of Beautiful Soup 4. You can combine several of the features introduced here into a larger program that captures multiple pieces of meaningful data from a website and passes them as input to other parts of your program.
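As a first taste of parsing, searching, and handing the result to another part of a program, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser` backend. The HTML snippet, the `movie` and `year` class names, and the `extract_movies` helper are hypothetical and only stand in for a real downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for a downloaded web page.
HTML = """
<html>
  <head><title>Sample movie list</title></head>
  <body>
    <ul id="movies">
      <li class="movie"><a href="/title/1">First Film</a> <span class="year">1999</span></li>
      <li class="movie"><a href="/title/2">Second Film</a> <span class="year">2004</span></li>
    </ul>
  </body>
</html>
"""


def extract_movies(html):
    """Return a list of dicts, one per <li class="movie"> element."""
    # Parse the markup into a navigable tree.
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    # Search the tree for every list item carrying the "movie" class.
    for item in soup.find_all("li", class_="movie"):
        link = item.find("a")
        year = item.find("span", class_="year")
        movies.append(
            {
                "title": link.get_text(strip=True),
                "url": link["href"],
                "year": int(year.get_text(strip=True)),
            }
        )
    return movies


# Navigate directly to the <title> tag and read its text.
page_title = BeautifulSoup(HTML, "html.parser").title.string

# The extracted records are plain Python data, ready for other code to use.
movies = extract_movies(HTML)
print(page_title)
for movie in movies:
    print(movie["year"], movie["title"], movie["url"])
```

Because `extract_movies` returns ordinary dictionaries, its output can be passed to any other function or library without that code needing to know about Beautiful Soup.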

Audience

This tutorial is designed to guide you through scraping a web page. The underlying goal is to extract meaningful data from a large, unorganized body of data. The target audience includes:

  1. Anyone who wants to learn how to scrape web pages in Python using Beautiful Soup.

  2. Data science developers, enthusiasts, or anyone else who wants to feed the scraped (meaningful) data into Python data science libraries to make better decisions.

Prerequisites

There are no mandatory prerequisites for this tutorial. However, prior knowledge of any or all of the following will be an advantage:

  1. Knowledge of web technologies (HTML, CSS, the Document Object Model, etc.).

  2. The Python language (Beautiful Soup is a Python package).

  3. Prior experience with web scraping in any language.

  4. Basic understanding of HTML tree structure.