Markup language notes

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

...that is, markup used more for documents -- for similar things mostly used for data, see e.g. Programming_notes/Communicated_state_and_calls#Data_and_serialization

Markdown

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.

Headings

# First-level heading
## Second-level heading
### Third-level heading
etc.
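
If you want to pull these headings out of a file (say, for a quick table of contents), a regular expression gets you most of the way. This is a minimal sketch, not a full Markdown parser: `extract_headings` and `HEADING_RE` are names made up here, and it will also match `#` lines inside code blocks, which a real parser would skip.

```python
import re

# '#'-style heading: one to six '#' characters, whitespace, then the title.
# An optional trailing run of '#' characters is dropped from the title.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")


def extract_headings(markdown_text):
    """Return a list of (level, title) tuples for '#'-style headings."""
    headings = []
    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

The level is simply the number of `#` characters, so `## Second-level heading` comes back as `(2, "Second-level heading")`.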

links, images

[inline link](url.here)
![alt text](image.url.here)

Stylizing

*emphasis (italic)*
_emphasis (italic)_
**strong emphasis (boldface)**
__strong emphasis (boldface)__
***very strong emphasis (italic and boldface)***
___very strong emphasis (italic and boldface)___

Text with `some_code()`

    Longer code 
    should be indented with
    four spaces

Text layout

Paragraphs of natural text are separated by one or more empty lines.

Like this.

Within a paragraph, you sometimes want manual line breaks. You can get those
by ending a line with two or more spaces.

> Blockquote

Lists

Unordered bullet lists via +, - and *

Ordered lists via numbers.

+ Thing
  1. Numbered subthing
  1. Another numbered subthing
+ Other thing
    If you need a paragraph to belong to an item, use four spaces (or a tab)

Other

Horizontal rules can be had by having a line containing three or more asterisks or minus signs, e.g.

* * *
****
- - -
---------------------------------------
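
The rule as described (three or more of the same character, spaces in between allowed) can be written down as a regular expression. A minimal sketch, with `is_horizontal_rule` being a made-up name, and allowing up to three leading spaces as an assumption about how lenient a parser is:

```python
import re

# Up to three leading spaces, then three or more of the same character
# ('*' or '-'), optionally separated by spaces or tabs, and nothing else.
RULE_RE = re.compile(r"^ {0,3}([*-])[ \t]*(?:\1[ \t]*){2,}$")


def is_horizontal_rule(line):
    """Return True if the line would be read as a horizontal rule."""
    return RULE_RE.match(line) is not None
```

Note that a line of minus signs directly below a line of text is usually read as a heading underline rather than a rule, so it is safest to put an empty line before it.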

Tables

Not a thing in the original Markdown. Flavours such as GitHub's add a pipe-based table syntax.

GitHub flavoured markdown

Adds things like:

> [!NOTE]
> Useful information that users should know, even when skimming content.

> [!TIP]
> Helpful advice for doing things better or more easily.

> [!IMPORTANT]
> Key information users need to know to achieve their goal.

> [!WARNING]
> Urgent info that needs immediate user attention to avoid problems.

> [!CAUTION]
> Advises about risks or negative outcomes of certain actions.
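
If you process such files yourself, these alerts are just blockquotes whose first line is a marker. A rough sketch of picking them out, assuming the marker sits alone on the first line of the quote; `find_alerts` is a made-up name and this is not how GitHub itself implements it:

```python
import re

ALERT_TYPES = ("NOTE", "TIP", "IMPORTANT", "WARNING", "CAUTION")
ALERT_RE = re.compile(r"^>\s*\[!(%s)\]\s*$" % "|".join(ALERT_TYPES))


def find_alerts(markdown_text):
    """Return a list of (alert_type, body_text) tuples."""
    alerts = []
    current = None  # [type, list of body lines] while inside an alert
    for line in markdown_text.splitlines():
        marker = ALERT_RE.match(line)
        if marker:
            current = [marker.group(1), []]
            alerts.append(current)
        elif current is not None and line.startswith(">"):
            current[1].append(line[1:].strip())
        else:
            current = None  # any non-quote line ends the alert
    return [(kind, " ".join(body)) for kind, body in alerts]
```

Ordinary blockquotes without a marker are ignored, so the function returns only the five alert kinds listed above.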

<!-- avoid rendering -->

reStructuredText

Abbreviated rst, or reST (not to be confused with REST)

https://docutils.sourceforge.net/rst.html

https://docutils.sourceforge.io/docs/user/rst/cheatsheet.txt

epytext

Made for epydoc, a Python documentation generator.

https://epydoc.sourceforge.net/epytext.html

BBCode

BBCode ('bulletin board code') allows a simple alternative to HTML, for users in forums, etc.

It is a little simpler to type, but perhaps more importantly, it makes sanitizing your input easier. Both invalid or unbalanced HTML that could disturb the page and things like XSS script inserts can be dealt with in a "whitelist, don't blacklist" approach, where only tags you know about are ever turned into HTML.

Generally, you would remove all HTML, then parse and convert BBCode to HTML, though the removal of HTML is itself sometimes done unsafely. (One alternative is having your BBCode parser escape all HTML, so that exploit code is simply displayed verbatim.)
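
The escape-first alternative can be sketched in a few lines. This is only an illustration under assumptions: the whitelist `SIMPLE_TAGS` and the name `bbcode_to_html_naive` are made up here, and only the four simplest tags are handled.

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element name.
SIMPLE_TAGS = {"b": "b", "i": "i", "u": "u", "s": "s"}
TAG_RE = re.compile(r"\[(/?)(%s)\]" % "|".join(SIMPLE_TAGS), re.IGNORECASE)


def bbcode_to_html_naive(text):
    """Escape all HTML first, then turn whitelisted BBCode tags into HTML."""
    escaped = html.escape(text)  # '<script>' becomes '&lt;script&gt;'

    def replace(match):
        slash, name = match.group(1), match.group(2).lower()
        return "<%s%s>" % (slash, SIMPLE_TAGS[name])

    return TAG_RE.sub(replace, escaped)
```

Because everything is escaped before any tag is converted, the only HTML in the output is what the whitelist produced. Tag-by-tag substitution like this does pass unbalanced input straight through, though: `[b]unclosed` yields an unclosed `<b>`.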

BBCode is not a standard, so there is variation in what tags parsers will accept, and in what form they will or won't accept them. Consider:

capitalizing
nesting
spacing
unbalanced bbtags
unknown arguments, usage of arguments at all (see the various [url] styles)

how they transform it, and whether they guarantee correct HTML output (regexp-based implementations regularly do not)

whether they actually live up to the mentioned safety.

This depends mainly on how well the implementer understands the intricacies.
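
To make some of these points concrete, here is a sketch of a converter that keeps a stack of open tags instead of doing plain substitution, so its output is always balanced. It accepts any capitalization, closes tags the user forgot, drops stray closing tags, and shows unknown tags verbatim. The names (`bbcode_to_html`, `SIMPLE_TAGS`) and the choices made for each case are assumptions for illustration; real parsers differ on all of them.

```python
import html
import re

# Hypothetical whitelist: BBCode tag name -> HTML element name.
SIMPLE_TAGS = {"b": "b", "i": "i", "u": "u", "s": "s"}
TOKEN_RE = re.compile(r"\[(/?)([a-z]+)\]", re.IGNORECASE)


def bbcode_to_html(text):
    """Convert simple BBCode tags to HTML that is always balanced."""
    out = []
    open_tags = []  # stack of HTML element names currently open
    pos = 0
    for match in TOKEN_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        closing, name = match.group(1) == "/", match.group(2).lower()
        element = SIMPLE_TAGS.get(name)
        if element is None:
            # Unknown tag: show it verbatim rather than guess.
            out.append(html.escape(match.group(0)))
        elif not closing:
            open_tags.append(element)
            out.append("<%s>" % element)
        elif element in open_tags:
            # Close everything opened after this tag, then the tag itself.
            while True:
                top = open_tags.pop()
                out.append("</%s>" % top)
                if top == element:
                    break
        # A closing tag that was never opened is dropped.
    out.append(html.escape(text[pos:]))
    # Close whatever the user left open.
    out.extend("</%s>" % element for element in reversed(open_tags))
    return "".join(out)
```

For mis-nested input such as `[b]bold [i]both[/b] rest[/i]`, this closes the inner tag early and ignores the late `[/i]`, which is one of several defensible choices.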

For example, the core tags seem to consist of roughly:

[b]bolded text[/b]

[i]italicized text[/i]

[u]underlined text[/u]

[s]strikethrough[/s]

[img]http://example.com/pic.png[/img]

[url]http://example.com[/url]
    and sometimes also:
[url=http://example.com]Link name[/url]
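
Both [url] forms can be handled by one pattern with an optional argument. Links are also where the safety question bites, since a `javascript:` target would run script when clicked, so this sketch only accepts a whitelist of schemes. The names (`render_url_tags`, `ALLOWED_SCHEMES`) and the decision to show rejected tags verbatim are assumptions, not any particular forum's behaviour.

```python
import html
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: web links only
URL_RE = re.compile(r"\[url(?:=([^\]]+))?\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)


def is_allowed(target):
    """Return True if the link target uses a whitelisted scheme."""
    try:
        scheme = urlsplit(target).scheme
    except ValueError:  # urlsplit rejects some malformed URLs
        return False
    return scheme.lower() in ALLOWED_SCHEMES


def render_url_tags(text):
    """Convert [url]...[/url] and [url=...]...[/url] into <a> elements."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        argument, body = match.group(1), match.group(2)
        # Without an argument, the tag body itself is the link target.
        target = (argument if argument is not None else body).strip()
        if is_allowed(target):
            out.append('<a href="%s">%s</a>' % (html.escape(target), html.escape(body)))
        else:
            # Not an allowed scheme (e.g. javascript:): show the tag verbatim.
            out.append(html.escape(match.group(0)))
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

Escaping the target before it goes into the `href` attribute matters as much as the scheme check, since a quote character in the URL could otherwise end the attribute early.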

It's not uncommon to see:

[email]foo@example.com[/email]

[color=red]Red Text[/color]

[size=15]Large Text[/size]

[center]horizontal centering[/center]

[pre]preformatted text[/pre]

[quote]quoted text[/quote]
 also:
   [quote=Will]quoted text[/quote]
   [quote Will said]quoted text[/quote]
   
[code]monospaced text[/code]

[:-)]

And I've seen mention of:

[link]  (same functionality as url)

[list]
* Item
* Item
[/list]

[google], [wiki]  (search link, by term)

[spoiler]Dumbledore likes to boogie.[/spoiler]

[whisper=username]Psst.[/whisper] (private message to specific user on bboard)

[html]Freeform HTML. If available at all, only admins should ever get to use this.[/html]

[flash], [audio] (embedding, with various options)

MediaWiki

No real reference; authors say "see the parser code".

For information extraction, it may be simpler to parse the resulting HTML, partly because the parser code does a little correction and normalization.
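
As a sketch of that approach, the following collects links from rendered HTML using only Python's standard library. The HTML fragment is a made-up sample rather than real MediaWiki output, and `LinkCollector` is a name invented here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# Hypothetical fragment of rendered page HTML, for illustration only.
SAMPLE_HTML = (
    '<p>See <a href="/wiki/Markdown" title="Markdown">Markdown</a> '
    'and <a href="/wiki/BBCode"><i>BBCode</i></a>.</p>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
collector.close()
print(collector.links)
```

The same pattern extends to headings, tables and so on by watching for other tag names in `handle_starttag`.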