Screen scraping: Difference between revisions

Latest revision as of 15:11, 8 January 2024

Screen scraping (mostly HTML and XML parsing)

@@ Line 1: / Line 1: @@
 {{scraping}}
+<!--
+Screen scraping means taking content intended for display to a human user
+and consuming it with a program instead, often to extract specific information or to automate something normally done through browser interaction.
+In practice, screen scraping usually means web scraping: extracting information from a served web page.
+Web scraping is often done with what we call a [[mechanical browser]]: a program that is not a full-blown graphical browser (and is sometimes much simpler).
+Driving an ''actual'' browser gives the most representative view of what a user would see, but it is computationally more expensive and usually offers no extra benefit.
+The ''practice'' of scraping mostly consists of picking out the information you actually need, particularly when the source is meant for human interpretation only (as most HTML is) and is less structured than data would be.
+-->