rvest

Simple web scraping for R

rvest is an R package for scraping data from HTML web pages. It is designed to be used with pipes, and its approach is inspired by libraries such as Beautiful Soup, so common scraping tasks are short and easy to read.
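The following is a minimal sketch of the pipe style. It assumes rvest 1.0 or later, which provides `minimal_html()` and `html_text2()`, and R 4.1 or later for the native `|>` pipe. The inline HTML and the `film` class are made up for illustration, so the example runs without network access.

```r
library(rvest)

# A small inline page (illustrative only), so no network request is needed
page <- minimal_html('
  <h1>Films</h1>
  <ul>
    <li class="film">Alien</li>
    <li class="film">Heat</li>
  </ul>
')

# Select every element matching the CSS selector, then extract its text
films <- page |>
  html_elements("li.film") |>
  html_text2()

print(films)
```

With a real site you would usually pass a URL to `read_html()` instead of building the page with `minimal_html()`.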

The package provides functions to parse HTML, select elements using CSS selectors or XPath, extract text and attributes, and convert HTML tables directly into data frames. It integrates well with tidyverse workflows and supports both single-element and multi-element extraction: `html_element()` returns at most one match per input, while `html_elements()` returns every match. For scraping multiple pages, it works well alongside the polite package, which respects robots.txt and avoids overwhelming servers with requests.
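The sketch below puts those functions together on a single page: a CSS selection, the same selection made with XPath, text and attribute extraction, and table conversion. It assumes rvest 1.0 or later and R 4.1 or later. The HTML, the URL, and the variable names are hypothetical and chosen only for illustration.

```r
library(rvest)

# Hypothetical inline page containing a link and a table
page <- minimal_html('
  <p>See the <a href="https://example.com/docs">docs</a>.</p>
  <table>
    <tr><th>name</th><th>year</th></tr>
    <tr><td>Alien</td><td>1979</td></tr>
    <tr><td>Heat</td><td>1995</td></tr>
  </table>
')

# CSS selector: html_element() keeps only the first match
link <- page |> html_element("a")
link_text <- html_text2(link)        # visible text of the link
link_href <- html_attr(link, "href") # value of the href attribute

# The same element selected with XPath instead of CSS
link_xpath <- page |>
  html_element(xpath = "//p/a") |>
  html_attr("href")

# Convert the table to a data frame (a tibble); because the first row
# consists of <th> cells, it is used as the column names
films <- page |>
  html_element("table") |>
  html_table()

print(films)
```

By default `html_table()` also tries to convert column types, so `year` should come back as a number instead of text.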


Contributors

Hadley Wickham

Jeroen Ooms

Charlie Gao

Charlotte Wickham