Markup language notes

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


That is, markup used mostly for documents. For similar formats used mostly for data, see e.g. Programming_notes/Communicated_state_and_calls#Data_and_serialization


Markdown

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.


Headings
# First-level heading
## Second-level heading
### Third-level heading
etc.


Links, images
[inline link](url.here)
![alt text](image.url.here)


Stylizing
*emphasis (italic)*
_emphasis (italic)_
**strong emphasis (boldface)**
__strong emphasis (boldface)__
***very strong emphasis (italic and boldface)***
___very strong emphasis (italic and boldface)___ 


Text with `some_code()`
    Longer code 
    should be indented with
    four spaces


Text layout
Paragraphs of natural text are separated by one or more empty lines.
Like this.
Sometimes you want manual line breaks within a paragraph. You can get those
by ending a line with two or more spaces.
> Blockquote



Lists

Unordered bullet lists via +, - and *

Ordered lists via numbers.

+ Thing
  1. Numbered subthing
  1. Another numbered subthing
+ Other thing
    If you need a paragraph to belong to an item, use four spaces (or a tab)


Other

A line containing three or more asterisks or minus signs, optionally separated by spaces, gives a horizontal rule, e.g.

* * *
****
- - -
---------------------------------------
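
As a rough check of the rule above, here is a small Python sketch that tests whether a single line would count as a horizontal rule. The function name `is_horizontal_rule` and the regular expression are my own illustration, not taken from any Markdown implementation, and it looks at one line in isolation (real parsers also consider the surrounding lines).

```python
import re

# Three or more of the SAME character (* or -), optionally separated by
# spaces or tabs, with nothing else on the line. Up to three leading
# spaces are tolerated, since four would start an indented code block.
HR_LINE = re.compile(r"^ {0,3}([-*])(?:[ \t]*\1){2,}[ \t]*$")

def is_horizontal_rule(line: str) -> bool:
    """Return True if the line alone looks like a horizontal rule."""
    return HR_LINE.match(line) is not None

for sample in ["* * *", "****", "- - -", "-" * 39, "**", "* - *"]:
    print(repr(sample), is_horizontal_rule(sample))
```

The backreference `\1` is what rejects mixed lines such as `* - *`.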


Tables

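Not a thing in the original Markdown. Various extensions (GitHub Flavored Markdown among them) add their own table syntax.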






GitHub Flavored Markdown


Adds things like alerts, which are blockquotes whose first line names the alert type. Each alert is its own blockquote, so separate them with an empty line:

> [!NOTE]
> Useful information that users should know, even when skimming content.

> [!TIP]
> Helpful advice for doing things better or more easily.

> [!IMPORTANT]
> Key information users need to know to achieve their goal.

> [!WARNING]
> Urgent info that needs immediate user attention to avoid problems.

> [!CAUTION]
> Advises about risks or negative outcomes of certain actions.


<!-- avoid rendering -->






reStructuredText

Abbreviated rst, or reST (not to be confused with REST)



https://docutils.sourceforge.net/rst.html

https://docutils.sourceforge.io/docs/user/rst/cheatsheet.txt

epytext

Made for epydoc, a Python documentation generator

https://epydoc.sourceforge.net/epytext.html

BBCode

BBCode ('bulletin board code') allows a simple alternative to HTML, for users in forums, etc.


It is a little simpler to type. Perhaps more importantly, it makes sanitizing input easier, both against invalid or unbalanced HTML that could disturb the page and against things like XSS script injection, and it lets you do so with a "whitelist, don't blacklist" approach.

Generally, you would remove all HTML, then parse the BBCode and convert it to HTML, though the HTML removal is itself sometimes done unsafely. (One alternative is to have your BBCode parser escape all HTML, so that exploit code is simply displayed verbatim.)
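
A minimal Python sketch of the escape-first alternative, to make the order of operations concrete. The whitelist `SIMPLE_TAGS` and the function name `bbcode_to_html` are assumptions for illustration only, and the choice of HTML elements is arbitrary. It is not how any particular forum package does it.

```python
import html
import re

# Whitelist: BBCode tag name -> HTML element name (illustrative subset).
SIMPLE_TAGS = {"b": "strong", "i": "em", "u": "u", "s": "s"}

def bbcode_to_html(text: str) -> str:
    """Escape all HTML first, then convert only whitelisted BBCode pairs."""
    # After escaping, user input contains no live '<', '>', '&' or quotes,
    # so any markup in the result was put there by the code below.
    out = html.escape(text, quote=True)
    for bb, element in SIMPLE_TAGS.items():
        # Only a complete [x]...[/x] pair is converted; a stray opener or
        # closer is left as literal text.
        pattern = re.compile(r"\[%s\](.*?)\[/%s\]" % (bb, bb),
                             re.IGNORECASE | re.DOTALL)
        out = pattern.sub(r"<%s>\1</%s>" % (element, element), out)
    return out

print(bbcode_to_html("[b]hi[/b] <script>alert(1)</script>"))
# prints: <strong>hi</strong> &lt;script&gt;alert(1)&lt;/script&gt;
```

Note that this is a regexp-based converter, so overlapping input such as [b][i]x[/b][/i] still comes out as badly nested HTML. It only demonstrates that escaping before converting keeps user-supplied HTML inert.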


BBCode is not a standard, so there is variation in what tags parsers will accept, and in what form they will or won't accept them. Consider:

  • capitalizing
  • nesting
  • spacing
  • unbalanced bbtags
  • unknown arguments, usage of arguments at all (see the various [url] styles)
  • how they transform it, and whether they guarantee correct HTML output (regexp-based implementations regularly do not)
  • whether they actually live up to the mentioned safety.

This depends mainly on how well the implementer understands the intricacies.
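
To illustrate a few of the points above (capitalization, nesting, unbalanced tags, guaranteed well-formed output), here is a hedged Python sketch that keeps a stack of open tags instead of doing regexp substitution. The names `TAGS`, `TOKEN` and `render`, and the policy choices (unknown tags shown literally, unclosed tags closed automatically), are my own assumptions. Real parsers make different choices.

```python
import html
import re

# Illustrative whitelist: BBCode name -> HTML element.
TAGS = {"b": "strong", "i": "em", "u": "u", "s": "s"}
# Matches argument-less tags only, e.g. [b] or [/B].
TOKEN = re.compile(r"\[(/?)([a-zA-Z]+)\]")

def render(text: str) -> str:
    """Convert BBCode to HTML in which every opened element is closed."""
    out = []    # output fragments
    stack = []  # names of currently open BBCode tags, lowercased
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(html.escape(text[pos:m.start()]))  # text before the tag
        pos = m.end()
        closing = m.group(1) == "/"
        name = m.group(2).lower()  # accept any capitalization
        if name not in TAGS:
            out.append(html.escape(m.group(0)))  # unknown tag: literal text
        elif not closing:
            stack.append(name)
            out.append("<%s>" % TAGS[name])
        elif name in stack:
            # Close everything opened after the matching opener, innermost
            # first, so the output stays properly nested.
            while True:
                top = stack.pop()
                out.append("</%s>" % TAGS[top])
                if top == name:
                    break
        else:
            out.append(html.escape(m.group(0)))  # closer without opener
    out.append(html.escape(text[pos:]))
    while stack:  # close whatever is still open at the end of input
        out.append("</%s>" % TAGS[stack.pop()])
    return "".join(out)

print(render("[B]bold [i]both[/b] tail[/i]"))
# prints: <strong>bold <em>both</em></strong> tail[/i]
```

The trade-off is that overlapping input gets silently re-nested rather than rejected, which may or may not be what users expect.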


For example, the core tags seem to consist of roughly:

[b]bolded text[/b]

[i]italicized text[/i]

[u]underlined text[/u]

[s]strikethrough[/s]

[img]http://example.com/pic.png[/img]

[url]http://example.com[/url]
    and sometimes also:
[url=http://example.com]Link name[/url]


It's not uncommon to see:

[email]foo@example.com[/email]

[color=red]Red Text[/color]

[size=15]Large Text[/size]

[center]horizontal centering[/center]

[pre]preformatted text[/pre]

[quote]quoted text[/quote]
 also:
   [quote=Will]quoted text[/quote]
   [quote Will said]quoted text[/quote]
   
[code]monospaced text[/code]

[:-)]


And I've seen mention of:

[link]  (same functionality as url)

[list]
* Item
* Item
[/list]

[google], [wiki]  (search link, by term)

[spoiler]Dumbledore likes to boogie.[/spoiler]

[whisper=username]Psst.[/whisper] (private message to specific user on bboard)

[html]Freeform HTML. If available at all, only admins should ever get to use this.[/html]

[flash], [audio] (embedding, with various options)


mediawiki

No real reference; authors say "see the parser code".

For information extraction, it may be simpler to parse the resulting HTML, partly because the parser code does a little correction and normalization.
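
A small Python sketch of that idea, using only the standard library's html.parser to pull visible text and link targets out of already-rendered HTML. The class name `TextAndLinks`, the helper `extract` and the sample HTML are hypothetical. Actual MediaWiki output is considerably more involved (navigation, edit links, templates), so treat this as a starting point rather than a working extractor.

```python
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible text and link targets from rendered HTML."""
    SKIP = {"script", "style"}  # element content that is not visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

def extract(rendered_html):
    """Return (text, links) for a fragment of rendered HTML."""
    parser = TextAndLinks()
    parser.feed(rendered_html)
    parser.close()
    return " ".join(parser.text), parser.links

sample = '<p>See <a href="/wiki/Markdown">Markdown</a> notes.</p><style>p{}</style>'
print(extract(sample))
```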

