Screen scraping

From Helpful
{{scraping}}


<!--
Screen scraping means taking content that was intended to be shown on screen to a user
and consuming it with a program instead, often to find specific information or to automate something normally done through browser interaction.


Screen scraping is often specifically web scraping, where you take information from a served webpage.
This is often done with what we call a [[mechanical browser]]: something that isn't a full-blown graphical browser (sometimes much less).
Leveraging an ''actual'' browser may be most representative of what a real user would see, but it is also computationally more expensive and most of the time has no extra benefit.
The ''practice'' of scraping is mostly picking out the information you actually need, particularly when what you read is meant for human interpretation only (as most HTML is) and is often not as structured as data would be.
-->
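
As a rough illustration of picking out the information you need from HTML written for human readers, here is a minimal sketch using only Python's standard-library <code>html.parser</code> (not BeautifulSoup or lxml, which the pages linked below cover). The names <code>LinkCollector</code>, <code>extract_links</code> and <code>SAMPLE_HTML</code> are made up for this example, and the sample page is invented rather than fetched from anywhere.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Invented sample page: mostly presentation, with two links we care about.
SAMPLE_HTML = """
<html><body>
  <h1>Releases</h1>
  <p>Some text for human readers.</p>
  <ul>
    <li><a href="/files/tool-1.0.tar.gz">Version <b>1.0</b></a></li>
    <li><a href="/files/tool-1.1.tar.gz">Version 1.1</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return a list of (href, text) pairs found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)
```

In real use the HTML string would come from an HTTP response rather than a literal, and a more forgiving parser is usually worth having for messy real-world pages.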





Latest revision as of 15:11, 8 January 2024

Screen scraping (mostly HTML and XML parsing)

Python: BeautifulSoup · ElementTree / lxml scraping
Wrapping or controlling a browser



Is scraping legal?