Web · Ebook

Beautiful Soup: Parsing HTML and XML with Python

by Shriira Press

4.3 (6,128) · 86 pages · Published 2026

A comprehensive, self-contained guide to Beautiful Soup (the beautifulsoup4 / bs4 package), Python's friendliest library for parsing HTML and XML and pulling data out of it. Where Scrapy is a whole framework for crawling thousands of pages, Beautiful Soup is a focused library for the more common everyday case: you already have some HTML (a page you fetched, a file, an API response) and you need to navigate it, search it, and extract the data, even when the markup is messy or broken. This book teaches it end to end: the parse tree, parsers, the four object types, navigating and searching, CSS selectors, extracting and modifying, real-world patterns, robustness, and where it fits in the ecosystem. Throughout, it blends intuition, the concepts behind the library, and runnable code.

Contents

  1. Preface
  2. Chapter 1: What Is Beautiful Soup?
  3. Chapter 2: HTML, the Parse Tree, and Parsers
  4. Chapter 3: Making Soup: Your First Parse
  5. Chapter 4: The Four Objects: Tag, NavigableString, BeautifulSoup, Comment
  6. Chapter 5: Navigating the Tree
  7. Chapter 6: Searching the Tree: find and find_all
  8. Chapter 7: CSS Selectors with select()
  9. Chapter 8: Extracting Text and Attributes
  10. Chapter 9: Modifying the Tree
  11. Chapter 10: Real-World Parsing Patterns
  12. Chapter 11: Robustness and Common Pitfalls
  13. Chapter 12: Beautiful Soup vs. the Ecosystem
  14. Chapter 13: In Practice and the Profession
  15. Appendix A: Glossary and Quick Reference
  16. Appendix B: Further Reading and Resources