Text processing notes
✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
Command line
*nix-ly basics:
- cut, tr, wc, grep
- awk, sed
- but also things like recode
...see e.g. [1]
There are various interesting tricks you can do with grep, awk, and sed.
Variants on grep (see also [2]):
- sgrep (structured grep) interprets HTML and similar structures and has a GCL-like query syntax (verify)
- agrep (approximate grep) returns matches that differ from the pattern by up to a given number of characters
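To make the approximate-matching idea concrete, here is a minimal Python sketch. It is not how agrep itself is implemented; it is a plain dynamic-programming edit distance (the Sellers variant, where a match may start anywhere in the line), and the names min_edit_distance_in and approx_grep are made up for this illustration.

```python
def min_edit_distance_in(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # Row 0 is all zeros, so a match may begin at any position in text.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Column 0: matching i pattern characters against nothing costs i.
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            cur[j] = min(
                prev[j] + 1,         # pattern character missing from text
                cur[j - 1] + 1,      # extra character in text
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    # The match may also end anywhere, so take the best value in the last row.
    return min(prev)


def approx_grep(pattern, lines, max_errors=1):
    """Return the lines containing pattern with at most max_errors edits."""
    return [ln for ln in lines if min_edit_distance_in(pattern, ln) <= max_errors]


if __name__ == "__main__":
    sample = ["hello world", "help wanted", "goodbye"]
    print(approx_grep("hallo", sample, max_errors=1))
```

This runs in time proportional to pattern length times line length, which is fine for notes-sized input; real tools use much faster bit-parallel methods.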
Ideas
- finite state automata for string recognition, simple string translation
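A rough Python sketch of both ideas, under assumptions of my own: the recognizer is a small table-driven automaton for optionally signed decimal integers, and the translator is a two-state machine that squeezes runs of spaces (roughly what tr -s ' ' does). The state names and the functions is_integer and squeeze_spaces are hypothetical, chosen only for illustration.

```python
# Transition table for the recognizer: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "sign"): "signed",
    ("start", "digit"): "digits",
    ("signed", "digit"): "digits",
    ("digits", "digit"): "digits",
}
ACCEPTING = {"digits"}


def char_class(ch: str) -> str:
    """Map a character to the input class the automaton works with."""
    if ch in "0123456789":
        return "digit"
    if ch in "+-":
        return "sign"
    return "other"


def is_integer(s: str) -> bool:
    """Recognize strings such as '42', '-7', '+10' with a finite automaton."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


def squeeze_spaces(s: str) -> str:
    """Two-state translator: emit only the first space of each run."""
    out = []
    state = "text"
    for ch in s:
        if ch == " ":
            if state == "text":
                out.append(" ")
            state = "space"
        else:
            out.append(ch)
            state = "text"
    return "".join(out)


if __name__ == "__main__":
    print(is_integer("-7"), is_integer("4-2"))
    print(squeeze_spaces("a   b  c"))
```

Keeping the transitions in a table rather than in nested if-statements makes it easy to change the recognized language without touching the loop.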
Unsorted
Command line:
- cgrep (context grep)
- http://portal.acm.org/citation.cfm?id=962225&dl=GUIDE&coll=GUIDE
- http://www.site.uottawa.ca/~tcl/kbre/options/