Scrapy: A Concise Tutorial
Scrapy - Overview
Scrapy is a fast, open-source web crawling framework written in Python. It extracts data from web pages using selectors based on XPath or CSS expressions.
Scrapy was first released on June 26, 2008, under the BSD license. The milestone 1.0 release followed in June 2015.
Why Use Scrapy?
- It makes large crawling projects easier to build and scale.
- It has a built-in mechanism, called Selectors, for extracting data from websites.
- It handles requests asynchronously, which makes it fast.
- It automatically adjusts crawling speed through its AutoThrottle mechanism.
- It is designed to be accessible to developers.
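The automatic speed adjustment listed above is controlled through project settings. The following is a minimal sketch of a settings.py fragment, assuming a standard Scrapy project layout. The setting names are Scrapy's AutoThrottle settings; the values are illustrative assumptions, not recommendations.

```python
# settings.py (fragment) -- illustrative values only

# Turn on the AutoThrottle extension.
AUTOTHROTTLE_ENABLED = True

# Initial download delay, in seconds.
AUTOTHROTTLE_START_DELAY = 5.0

# Upper bound on the delay, in seconds, applied when latency is high.
AUTOTHROTTLE_MAX_DELAY = 60.0

# Average number of requests Scrapy should send in parallel to each remote site.
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Set to True to log throttling statistics for every received response.
AUTOTHROTTLE_DEBUG = False
```

With these settings, Scrapy starts from the initial delay and then adjusts it based on observed response latency, never exceeding the maximum delay.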
Features of Scrapy
- Scrapy is a free, open-source web crawling framework.
- Scrapy generates feed exports in formats such as JSON, CSV, and XML.
- Scrapy has built-in support for selecting and extracting data using either XPath or CSS expressions.
- Scrapy is built around crawlers, which extract data from web pages automatically.
Advantages
- Scrapy is easily extensible, fast, and powerful.
- It is a cross-platform application framework (Windows, Linux, macOS, and BSD).
- Scrapy requests are scheduled and processed asynchronously.
- Scrapyd, a companion service distributed separately in current releases, lets you upload projects and control spiders through a JSON web service.
- It is possible to scrape websites even when they do not offer an API for raw data access.