Markup language notes
...that is, markup used more for documents -- for similar things mostly used for data, see e.g. Programming_notes/Communicated_state_and_calls#Data_and_serialization
Markdown
📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.
- Headings
# First-level heading
## Second-level heading
### Third-level heading
etc.
- links, images
[inline link](url.here)
![alt text](image.url.here)
- Stylizing
*emphasis (italic)*
_emphasis (italic)_
**strong emphasis (boldface)**
__strong emphasis (boldface)__
***very strong emphasis (italic and boldface)***
___very strong emphasis (italic and boldface)___
Text with `some_code()`
Longer code should be indented with four spaces
- Text layout
Paragraphs of natural text are separated by one or more empty lines.
Like this.
Within a paragraph, you sometimes want manual line breaks. You can get those by ending a line with two or more spaces.
> Blockquote
- Lists
Unordered bullet lists via +, - and *
Ordered lists via numbers followed by a period.
+ Thing
    1. Numbered subthing
    1. Another numbered subthing
+ Other thing

    If you need a paragraph to belong to an item, use four spaces (or a tab)
- Other
Horizontal rules can be had by having a line containing three or more asterisks or minus signs, e.g.
* * *
****
- - -
---------------------------------------
- Tables
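Not a thing in original Markdown. Some flavors (e.g. GitHub Flavored Markdown) add pipe-style tables as an extension.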
GitHub Flavored Markdown
Adds things like:
> [!NOTE]
> Useful information that users should know, even when skimming content.

> [!TIP]
> Helpful advice for doing things better or more easily.

> [!IMPORTANT]
> Key information users need to know to achieve their goal.

> [!WARNING]
> Urgent info that needs immediate user attention to avoid problems.

> [!CAUTION]
> Advises about risks or negative outcomes of certain actions.
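If you need to pick such alert blocks out of Markdown source yourself (say, to convert them for a renderer that does not know them), a line-based scan is often enough. The following is a minimal sketch, not how GitHub does it: `find_alerts` and its return shape are made up for illustration, it only matches the uppercase markers shown above, and it ignores nesting and other blockquote subtleties.

```python
import re

# The five alert types listed above; anything else is treated as a plain blockquote.
ALERT_TYPES = {"NOTE", "TIP", "IMPORTANT", "WARNING", "CAUTION"}

# A marker line is a blockquote line containing only "[!TYPE]".
MARKER = re.compile(r"^>\s*\[!([A-Z]+)\]\s*$")


def find_alerts(markdown):
    """Return (type, body) pairs for alert-style blockquotes (hypothetical helper)."""
    alerts = []
    current = None  # [type, body_lines] while inside an alert, else None

    def flush():
        nonlocal current
        if current is not None:
            alerts.append((current[0], " ".join(current[1])))
        current = None

    for line in markdown.splitlines():
        m = MARKER.match(line)
        if m and m.group(1) in ALERT_TYPES:
            flush()                      # a new marker ends any previous alert
            current = [m.group(1), []]
        elif current is not None and line.startswith(">"):
            current[1].append(line[1:].strip())  # continuation of the alert body
        else:
            flush()                      # any non-quote line ends the alert
    flush()
    return alerts
```

For example, `find_alerts("> [!NOTE]\n> Useful information.")` gives `[("NOTE", "Useful information.")]`, while an ordinary `> Blockquote` yields nothing.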
<!-- avoid rendering -->
reStructuredText
Abbreviated rst, or reST (not to be confused with REST)
https://docutils.sourceforge.net/rst.html
https://docutils.sourceforge.io/docs/user/rst/cheatsheet.txt
epytext
Made for epydoc, a Python documentation generator, but also used by others, e.g. pydoctor
https://epydoc.sourceforge.net/epytext.html
BBCode
BBCode ('bulletin board code') allows a simple alternative to HTML, for users in forums, etc.
It is a little simpler to type but, perhaps more importantly, it makes sanitizing input easier: it guards both against invalid/unbalanced HTML that could disturb the page and against things like XSS script injection, and it does so with a "whitelist, don't blacklist" approach.
Generally, you would remove all HTML, then parse the BBCode and convert it to HTML, though the removal of HTML is sometimes itself done unsafely. (One alternative is having your BBCode parser escape all HTML, so that exploit code is simply displayed verbatim.)
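The escape-everything alternative can be sketched in a few lines of Python. This is an illustration under assumptions rather than a reference implementation: `bbcode_to_html`, the `SIMPLE_TAGS` whitelist and the tag-to-element mapping are invented for the example, and only argument-less paired tags are handled.

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element to emit.
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u", "s": "s"}


def bbcode_to_html(text):
    """Escape all HTML first, then convert a small whitelist of BBCode tags."""
    # Step 1: neutralize any HTML in the input so it is displayed verbatim.
    out = html.escape(text, quote=True)
    # Step 2: convert only whitelisted tag pairs (case-insensitive);
    # anything not on the whitelist stays as literal text.
    for bb, element in SIMPLE_TAGS.items():
        pattern = re.compile(
            r"\[%s\](.*?)\[/%s\]" % (bb, bb), re.IGNORECASE | re.DOTALL
        )
        out = pattern.sub(r"<%s>\1</%s>" % (element, element), out)
    return out
```

Because the only angle brackets in the output are the ones the converter itself writes, user-supplied `<script>` ends up as `&lt;script&gt;`. Note that this regexp-per-tag approach does not guarantee well-nested output: `[b][i]x[/b][/i]` comes out as `<strong><em>x</strong></em>`.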
BBCode is not a standard, so there is variation in what tags parsers will accept, and in what form they will or won't accept them. Consider:
- capitalizing
- nesting
- spacing
- unbalanced bbtags
- unknown arguments, usage of arguments at all (see the various [url] styles)
- how they transform it, and whether they guarantee correct HTML output (regexp-based implementations regularly do not)
- whether they actually live up to the mentioned safety.
This depends mainly on how well the implementer understands the intricacies.
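Several of the points above (capitalization, nesting, unbalanced tags, guaranteed-correct output) are easier to get right with a small tokenizer and a stack than with per-tag regular expressions. A rough sketch, assuming a hypothetical `render` function and an invented `ALLOWED` whitelist, and handling only argument-less tags:

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element to emit.
ALLOWED = {"b": "strong", "i": "em", "u": "u", "s": "s"}

# Matches things that look like [tag] or [/tag].
TOKEN = re.compile(r"\[(/?)([a-zA-Z]+)\]")


def render(text):
    """Convert whitelisted BBCode to HTML that is always properly nested."""
    out = []    # output fragments
    stack = []  # names of currently open tags, lowercased
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(html.escape(text[pos:m.start()]))  # plain text, escaped
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()                      # tags are case-insensitive
        if name not in ALLOWED:
            out.append(html.escape(m.group(0)))        # unknown tag: show verbatim
        elif not closing:
            stack.append(name)
            out.append("<%s>" % ALLOWED[name])
        elif name in stack:
            # Close any inner tags first so the output nests correctly.
            while True:
                top = stack.pop()
                out.append("</%s>" % ALLOWED[top])
                if top == name:
                    break
        else:
            out.append(html.escape(m.group(0)))        # stray closer: show verbatim
    out.append(html.escape(text[pos:]))
    # Close whatever the user left open.
    while stack:
        out.append("</%s>" % ALLOWED[stack.pop()])
    return "".join(out)
```

With this, `render("[B]a[i]b[/b]c")` yields `<strong>a<em>b</em></strong>c`, and `render("[u]open")` yields `<u>open</u>`. Whether to auto-close, drop, or display unbalanced tags is a design choice; real parsers differ on exactly this.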
For example, the core tags seem to consist of roughly:
[b]bolded text[/b]
[i]italicized text[/i]
[u]underlined text[/u]
[s]strikethrough[/s]
[img]http://example.com/pic.png[/img]
[url]http://example.com[/url]
and sometimes also: [url=http://example.com]Link name[/url]
It's not uncommon to see:
[email]foo@example.com[/email]
[color=red]Red Text[/color]
[size=15]Large Text[/size]
[center]horizontal centering[/center]
[pre]preformatted text[/pre]
[quote]quoted text[/quote]
also: [quote=Will]quoted text[/quote]
[quote Will said]quoted text[/quote]
[code]monospaced text[/code]
[:-)]
And I've seen mention of:
[link] (same functionality as url)
[list]
* Item
* Item
[/list]
[google], [wiki] (search link, by term)
[spoiler]Dumbledore likes to boogie.[/spoiler]
[whisper=username]Psst.[/whisper] (private message to specific user on bboard)
[html]Freeform HTML. If available at all, only admins should ever get to use this.[/html]
[flash], [audio] (embedding, with various options)
MediaWiki
No real reference; authors say "see the parser code".
For information extraction, it may be simpler to parse the resulting HTML, partly because the parser code does a little correction and normalization.
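As an illustration of working from the rendered HTML rather than the wikitext, the sketch below collects link targets and link text using only Python's standard `html.parser`. The `LinkCollector` class and `extract_links` helper are made up for this example, and how you obtain the rendered HTML is left open.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def extract_links(rendered_html):
    """Return all (href, text) pairs found in the given HTML (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(rendered_html)
    parser.close()
    return parser.links
```

For a fragment such as `<p>See <a href="/wiki/Foo" title="Foo">Foo</a>.</p>` this returns `[("/wiki/Foo", "Foo")]`. Anchors without an `href` are skipped.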