Beautiful Soup: A Concise Tutorial
Beautiful Soup - web-scraping
Scraping is simply the process of extracting data (by whatever means), copying it, and filtering it.
When we scrape or extract data or feeds from the web (for example, from web pages or websites), it is called web scraping.
So web scraping (also known as web data extraction or web harvesting) is the extraction of data from the web. In short, it gives developers a way to collect and analyze data from the internet.
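To make "extraction of data from the web" concrete, here is a minimal sketch that pulls link targets out of an HTML string. It deliberately uses only Python's standard-library html.parser rather than Beautiful Soup, so it runs without installing anything. The sample HTML and the names SAMPLE_HTML, LinkCollector, and extract_links are assumptions made up for illustration; a real scraper would first download the page over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Example shop</h1>
  <a href="/products/1">Widget</a>
  <a href="/products/2">Gadget</a>
  <a>No link target</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)
print(links)
```

The same three steps appear in most scrapers: obtain the markup, parse it, and keep only the pieces of interest. Here the anchor tag without an href is filtered out.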
Why Web-scraping?
Web scraping is a powerful way to automate much of what a person does while browsing. Enterprises use it in a variety of ways:
Data for Research
Smart analysts (such as researchers and journalists) use web scrapers instead of manually collecting and cleaning data from websites.
Products, prices & popularity comparison
Several services currently use web scrapers to collect data from numerous online sites and use it to compare product popularity and prices.
SEO Monitoring
Numerous SEO tools, such as Ahrefs, Seobility, and SEMrush, are used for competitive analysis and for pulling data from clients' websites.
Search engines
Some large IT companies run businesses that depend entirely on web scraping. Search engines, which crawl and index web pages, are the typical example.
Sales and Marketing
Marketers can use data gathered through web scraping to analyze different niches and competitors, and sales specialists can use it to sell content marketing or social media promotion services.
Why Python for Web Scraping?
Python is one of the most popular languages for web scraping because it handles most crawling-related tasks with very little effort.
Below are some reasons to choose Python for web scraping:
Ease of Use
Most developers agree that Python is very easy to write. It needs no curly braces "{ }" to delimit blocks and no semicolons ";" to end statements, which makes web scrapers easier to read and to develop.
Huge Library Support
Python offers a huge set of libraries for different needs, so it suits web scraping as well as data visualization, machine learning, and more.
Easily Explicable Syntax
Python is a very readable programming language because its syntax is easy to understand. It is also very expressive, and code indentation helps readers tell the different blocks or scopes in the code apart.
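As a small illustration of how indentation marks blocks, the sketch below picks the cheapest in-stock item from a list of scraped-looking records. The records, the field names, and the function cheapest_in_stock are hypothetical and exist only for this example.

```python
# Hypothetical records, shaped like data a price-comparison scraper might collect.
SAMPLE_PRODUCTS = [
    {"name": "Widget", "price": 9.99, "in_stock": True},
    {"name": "Gadget", "price": 7.50, "in_stock": True},
    {"name": "Gizmo", "price": 5.00, "in_stock": False},
]


def cheapest_in_stock(products):
    """Return the name of the cheapest in-stock product, or None if there is none."""
    best = None
    for product in products:
        # Everything indented under the for statement belongs to the loop body.
        if product["in_stock"]:
            # One level deeper: runs only for products that are in stock.
            if best is None or product["price"] < best["price"]:
                best = product
    # Back at function level: runs once, after the loop has finished.
    return best["name"] if best else None


print(cheapest_in_stock(SAMPLE_PRODUCTS))
```

No braces or semicolons are needed: each deeper level of indentation opens a new block, and returning to a shallower level closes it.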