Scrapy: Web Scraping at Scale with Python

Welcome to Scrapy: Web Scraping at Scale with Python.

This book is a comprehensive, self-contained guide to Scrapy, the Python framework for web scraping and crawling: extracting structured data from websites, at scale, reliably. Where a quick script with requests and BeautifulSoup scrapes one page, Scrapy is a full framework for crawling thousands or millions of pages. It provides an asynchronous engine, spiders that define what to crawl and extract, selectors (CSS and XPath) for pulling data out of HTML, item pipelines for cleaning and storing that data, and the middleware, politeness, and deployment machinery that real scraping needs. The book teaches Scrapy end to end (spiders, selectors, following links, items and pipelines, the architecture, middlewares, robustness, ethics and legality, and deployment), blending intuition, the concepts behind the framework, and runnable code.

This title is part of the ShriIra library and is free to read in full, right here — our small contribution to making world-class knowledge easy to reach.

A note on reading it: open the Contents menu at the top of the reader to jump between chapters, and use the Aa menu to set a comfortable text size, theme (light, sepia, or night), and single- or two-page layout. Your place is saved automatically, so you can always pick up where you left off.

We hope it serves you well.

— Shriira Press

Preface

Contents