Knowledge representation / Semantic annotation / structured data / linked data on the web




Coding metadata (not tied to libraries and fixed metadata models)

See also

For the metadata-only, library-related and academia-related side of things (and some "knowledge representations" that are glorified metadata), see also Metadata models and standards



URN

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

See Uniform_Resource_Somethings#Basic_concepts

CURIE

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A Compact URI (CURIE) amounts to a way of abbreviating a URI within some context, usually because you want to abbreviate many that share a prefix.

They are seen more around XML and XHTML, because they were modelled on XML namespaces, and are used directly in things like RDF(verify) and SPARQL, though they could be used elsewhere fairly easily.

However, they are only parsed by CURIE-aware things, which isn't many(verify).


Consider:

<html xmlns:wikipedia="http://en.wikipedia.org/wiki/">
  <head></head>
  <body>
      <p>Find out more about <a href="[wikipedia:Biome]">biomes</a>.</p>
  </body>
</html>

or

PREFIX foaf:   <http://xmlns.com/foaf/0.1/> 
SELECT ?x ?name 
WHERE  { ?x foaf:name ?name }


Notes:

  • not all uses look the same
  • It ends up looking like a URN
    • but you can't really assume the prefix is fixed, as it is in URNs
    • also, a CURIE expands into a URL/URI, while a URN is and stays a string that just happens to be an identifier
  • It is reminiscent of a <base>,
    • but that sets the base for all relative URLs in an HTML document, and there can only be one
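
The expansion itself is little more than a prefix lookup. A minimal sketch in Python, assuming a prefix map that you maintain yourself (the function name and the map are made up for illustration, and real CURIE processing has more rules than this):

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' (or a safe '[prefix:reference]') into a full URI."""
    # Safe CURIEs are wrapped in square brackets to tell them apart from plain URIs
    if curie.startswith("[") and curie.endswith("]"):
        curie = curie[1:-1]
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no known prefix in {curie!r}")
    return prefixes[prefix] + reference

# Prefix map taken from the two examples above
PREFIXES = {
    "wikipedia": "http://en.wikipedia.org/wiki/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

print(expand_curie("[wikipedia:Biome]", PREFIXES))
print(expand_curie("foaf:name", PREFIXES))
```

Note that the same CURIE expands differently under a different prefix map, which is the "prefix is not fixed" point above.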



Microformats

Microformats add semantic markers to HTML.

Emerged at the time of HTML4, and (probably mainly for validation reasons) chose to reuse existing attributes - seemingly mostly class and rel.


For example, showing personal information in HTML and also telling a potential microformat parser what's what according to hCard:

<ul class="vcard">
  <li class="fn">Joe Doe</li>
  <li class="org">The Example Company</li>
  <li class="tel">604-555-1234</li>
  <li><a class="url" href="http://example.com/">http://example.com/</a></li>
</ul>
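
What "a potential microformat parser" does with that can be sketched with Python's standard library. This is a toy under stated assumptions: it only knows the four class names used above, handles a single card, and is nowhere near a full hCard implementation (the class name HCardParser is made up here):

```python
from html.parser import HTMLParser

HCARD_PROPERTIES = {"fn", "org", "tel", "url"}  # only the classes used above

class HCardParser(HTMLParser):
    """Collect hCard property -> value for one flat card."""
    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        found = classes & HCARD_PROPERTIES
        if not found:
            return
        prop = found.pop()
        if prop == "url" and attrs.get("href"):
            self.card[prop] = attrs["href"]  # for links, the href is the value
        else:
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

HTML = """<ul class="vcard">
  <li class="fn">Joe Doe</li>
  <li class="org">The Example Company</li>
  <li class="tel">604-555-1234</li>
  <li><a class="url" href="http://example.com/">http://example.com/</a></li>
</ul>"""

parser = HCardParser()
parser.feed(HTML)
print(parser.card)
```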



Microdata

Microdata marks up HTML with the data it is also displaying, e.g.

<div itemscope>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.</p>
  <p>I am <span itemprop="nationality">British</span>.</p>
</div>


...yes, it serves similar goals to microformats.


Microdata seems to be explained as a slightly-more-expressive successor, in that...

Microformats reuse existing parts of HTML4 (mostly class and rel), whereas microdata extends HTML5 with specific custom attributes (e.g. itemscope, itemtype, itemprop).
Also, microformats have sort of a fixed existing set, while microdata points at 'use any schema.org thing', allowing community extension.


(side note: places that sanitize user-sourced HTML are more likely to remove microdata than microformats, because its attributes are unknown to them)
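
Reading microdata back out is a matter of looking at those attributes rather than at class names. A rough sketch using only Python's standard library, assuming a single flat item (it ignores itemtype, nested itemscope, and everything else the spec allows; the class name is made up):

```python
from html.parser import HTMLParser

class ItemPropParser(HTMLParser):
    """Collect itemprop -> text for a single flat item."""
    def __init__(self):
        super().__init__()
        self.item = {}
        self._prop = None  # itemprop of the element we are currently inside

    def handle_starttag(self, tag, attrs):
        self._prop = dict(attrs).get("itemprop")

    def handle_data(self, data):
        if self._prop and data.strip():
            self.item[self._prop] = data.strip()

    def handle_endtag(self, tag):
        self._prop = None

HTML = """<div itemscope>
  <p>My name is <span itemprop="name">Neil</span>.</p>
  <p>My band is called <span itemprop="band">Four Parts Water</span>.</p>
  <p>I am <span itemprop="nationality">British</span>.</p>
</div>"""

parser = ItemPropParser()
parser.feed(HTML)
print(parser.item)
```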


JSON-LD

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Linked data idea, with keys according to schema.org, which happens to be embedded in a page as JSON. (Not to be confused with LDJSON / JSONL / NDJSON, which are serialization formats)


Consider the recipe example from here :

 <script type="application/ld+json">
    {
      "@context": "https://schema.org/",
      "@type": "Recipe",
      "name": "Party Coffee Cake",
      "author": {
        "@type": "Person",
        "name": "Mary Stone"
      },
      "datePublished": "2018-03-10",
      "description": "This coffee cake is awesome and perfect for parties.",
      "prepTime": "PT20M"
    }
 </script>

It's a machine-parseable form of the page's main content, assuming that main content is relatively bite-sized.

Seems to be developed for and targeted primarily at web crawlers.
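
What a crawler does with such a block can be sketched as: find the script element, parse its contents as plain JSON, and look at @type. A minimal standard-library sketch (the class name and the trimmed-down page are made up for illustration, and no JSON-LD processing such as context expansion is done):

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json">."""
    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False

PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Recipe",
 "name": "Party Coffee Cake", "prepTime": "PT20M"}
</script>
</head><body></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(PAGE)
recipe = extractor.documents[0]
print(recipe["@type"], "-", recipe["name"])
```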


Note that this omits the contentful sections like "ingredient", "yield", "instructions", which seems to indicate this was aimed less at knowledge, and more at crawlers caring about the type of page.

So this feels more SEO-adjacent, more so if you consider statements like "All annotated information must be on the page; adding information that is not on the page will likely not show in search results and is against Google guidelines".


Yet it's an open-ish mechanism, and due to the varied existing schemas you can encode whatever nontrivial thing you want to.



Arguably primarily a way to get away from doing the same in XML.


Ontologies and knowledge representation

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.




RDF
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


More practically

RDF describes and communicates directed graph data, stored as (subject, predicate, object) triples.


Out of the box there aren't many suggested constructs, or constraints.

People often roll their own, and often largely for their own consumption.


There are multiple syntax notations to store and communicate RDF. Turtle seems pretty common(verify).



RDF/XML

https://en.wikipedia.org/wiki/RDF/XML

Notation3 (Notation 3, N3)

File extensions used: n3

(not to be confused with N-triples)


Looks like:

@prefix : <http://example.org/people#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Alice foaf:knows :Bob .

http://en.wikipedia.org/wiki/Notation_3

Turtle

File extensions used: ttl


Turtle is a subset of N3, so Turtle is valid N3 (and the following example is actually exactly the same)

Looks like:

@prefix : <http://example.org/people#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Alice foaf:knows :Bob .


N-Triples
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

(not to be confused with N3)

N-Triples is an even simpler subset of N3 and Turtle, arguably focused less on things like abbreviation and more on being a basic form that is easy to read, write, and parse.

Looks like:

<http://example.org/people#Alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/people#Bob> .
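
How the abbreviated Turtle form relates to that line, and why N-Triples is easy to parse, can be sketched in a few lines of Python. This is a toy that only handles triples made of three IRIs (no literals, blank nodes, comments, or escapes), and the function names are made up:

```python
import re

PREFIXES = {
    "": "http://example.org/people#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def to_ntriples(subject, predicate, obj, prefixes):
    """Write one triple of prefixed names as an N-Triples line (IRIs only)."""
    def expand(name):
        prefix, _, local = name.partition(":")
        return "<" + prefixes[prefix] + local + ">"
    return " ".join(expand(term) for term in (subject, predicate, obj)) + " ."

# Matches only the simplest case: three absolute IRIs and a final dot
NTRIPLE_IRIS = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

def parse_ntriple(line):
    """Split an IRI-only N-Triples line into (subject, predicate, object)."""
    match = NTRIPLE_IRIS.match(line)
    if match is None:
        raise ValueError("not an IRI-only N-Triples line: " + line)
    return match.groups()

line = to_ntriples(":Alice", "foaf:knows", ":Bob", PREFIXES)
print(line)
print(parse_ntriple(line))
```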


RDFa Lite

RDFa Lite intends to be a simpler subset of RDFa that allows most things people want to do, and is easier to deal with. It consists of five attributes:

  • vocab
  • property
  • resource
  • typeof
  • prefix



https://www.w3.org/TR/rdfa-lite/#introduction

https://www.w3.org/TR/rdfa-lite/



TriX

TriX stores RDF triples in XML, looking something like:

<TriX>
  <graph>
    <triple>
      <uri>https://example.org/Mary</uri>
      <uri>https://example.org/age</uri>
      <typedLiteral datatype="http://www.w3.org/2001/XMLSchema#integer">32</typedLiteral>
    </triple>
  </graph>
</TriX>
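
Since it is plain XML, getting the triples back out takes only a generic XML parser. A sketch with Python's xml.etree, assuming un-namespaced markup like the example above (real TriX files normally declare an XML namespace, in which case the tag names below would need to be qualified); the function name is made up:

```python
import xml.etree.ElementTree as ET

TRIX = """<TriX>
  <graph>
    <triple>
      <uri>https://example.org/Mary</uri>
      <uri>https://example.org/age</uri>
      <typedLiteral datatype="http://www.w3.org/2001/XMLSchema#integer">32</typedLiteral>
    </triple>
  </graph>
</TriX>"""

def trix_triples(xml_text):
    """Yield (subject, predicate, object) tuples from un-namespaced TriX."""
    root = ET.fromstring(xml_text)
    for triple in root.iter("triple"):
        terms = []
        for term in triple:
            if term.tag == "typedLiteral":
                # keep the literal together with its datatype IRI
                terms.append((term.text, term.get("datatype")))
            else:
                terms.append(term.text)
        yield tuple(terms)

for subject, predicate, obj in trix_triples(TRIX):
    print(subject, predicate, obj)
```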



Embedded RDF (eRDF)

Embedded RDF (eRDF) places RDF inside HTML.

It was apparently inspired by microformats.

It seems effectively obsolete, as people seem to prefer things like RDFa, Microdata, JSON-LD.




N-Quads
TriG
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Seems to be an extension of Turtle that allows grouping triples into named graphs (giving each triple a context, like quads do)?

Looks like:

@prefix : <http://example.org/people#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Neighbours {
  :Alice foaf:knows :Bob .
}
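
The "like quads" part can be made concrete: each triple inside a named graph is the same thing as a quad whose fourth term is the graph name, which is what N-Quads writes out one per line. A small sketch, assuming IRI-only terms and a plain dict as the in-memory dataset (both are simplifications made up for illustration):

```python
PEOPLE = "http://example.org/people#"
FOAF = "http://xmlns.com/foaf/0.1/"

# graph name -> list of triples, mirroring the TriG example above
dataset = {
    PEOPLE + "Neighbours": [
        (PEOPLE + "Alice", FOAF + "knows", PEOPLE + "Bob"),
    ],
}

def to_nquads(dataset):
    """Serialize IRI-only triples in named graphs as N-Quads lines."""
    lines = []
    for graph, triples in dataset.items():
        for triple in triples:
            terms = list(triple) + [graph]  # the graph name is the fourth term
            lines.append(" ".join("<" + term + ">" for term in terms) + " .")
    return lines

for line in to_nquads(dataset):
    print(line)
```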


https://www.w3.org/TR/2014/REC-trig-20140225/#sec-trig-intro

RDFS

RDF Schema (RDFS) builds on top of RDF.


You could say that with RDF your predicates can only add detail to 'instances'.

RDFS gives you ways to create classes, properties, and hierarchies.


Whether that difference means anything to you depends on whether your tools adhere to RDF and RDFS specs when interpreting the data -- but that would be the point of using them.
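
What "adhering to RDFS when interpreting" buys you can be sketched with two of its inference rules: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. The triples below are made up, prefixed names are kept as plain strings, and a real RDFS reasoner implements more rules than these two:

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    (":Alice", RDF_TYPE, ":Student"),
    (":Student", SUBCLASS_OF, ":Person"),
    (":Person", SUBCLASS_OF, ":Agent"),
}

def rdfs_subclass_closure(triples):
    """Apply the two rules repeatedly until nothing new is inferred."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != SUBCLASS_OF:
                    continue
                if p1 == SUBCLASS_OF:   # subClassOf is transitive
                    new.add((a, SUBCLASS_OF, d))
                elif p1 == RDF_TYPE:    # instances also belong to superclasses
                    new.add((a, RDF_TYPE, d))
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_subclass_closure(triples)
print((":Alice", RDF_TYPE, ":Agent") in closure)
```

A tool that treats the same data as plain RDF would only ever see the three stated triples.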

https://en.wikipedia.org/wiki/RDF_Schema


RDFa
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

RDFa is a way to express RDF-like content within HTML attributes (hence the a).


The idea is that you could do a bunch of that inline in an (X)HTML document, which may be more natural than shoving triples into a data-like section.


For example, you might start with:

<h2>My page title</h2>

and make that:

<h2 property="http://purl.org/dc/terms/title">My page title</h2>

Note that what this encodes, in terms of triples, would be something like

<http://example.org/thepage/> <http://purl.org/dc/terms/title> "My page title" .


Being aimed at (X)HTML, half of its point is trying to mark up the document in a way that will map to triples about that document, which is also why there are a handful of attributes - they are trying to make a few distinct uses easier.

...but if you want you could also dump triples as-is.


http://en.wikipedia.org/wiki/RDFa

https://www.w3.org/TR/rdfa-core/


One real-world example I found was EUR-Lex encoding a bunch of metadata in their HTML pages, e.g. 31965L0001 contains 200+ lines like:

<meta about="http://data.europa.eu/eli/dir/1965/1/oj" typeof="eli:LegalResource"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:uri_schema" resource="http://data.europa.eu/eli/%7Btypedoc%7D/%7Byear%7D/%7Bnatural_number%7D/oj"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" content="31965L0001" lang="" property="eli:id_local"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:type_document" resource="http://publications.europa.eu/resource/authority/resource-type/DIR"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:passed_by" resource="http://publications.europa.eu/resource/authority/corporate-body/CONSIL"/>

..which seems to come more from a "We had RDF triples and put them in here for you to parse" angle.
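
Parsing lines of that particular shape back into triples is straightforward. A rough standard-library sketch that only understands this one pattern (meta elements with about plus typeof, or property plus resource/content); it is not a general RDFa processor, it does not resolve the eli: prefix, and the class name is made up:

```python
from html.parser import HTMLParser

class MetaTripleParser(HTMLParser):
    """Read <meta about=... property=... resource=|content=...> as triples."""
    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or "about" not in attrs:
            return
        subject = attrs["about"]
        if "typeof" in attrs:
            self.triples.append((subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            # resource= points at another node, content= carries a literal
            if "resource" in attrs:
                obj = attrs["resource"]
            else:
                obj = attrs.get("content")
            self.triples.append((subject, attrs["property"], obj))

HTML = """
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" typeof="eli:LegalResource"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" content="31965L0001" lang="" property="eli:id_local"/>
<meta about="http://data.europa.eu/eli/dir/1965/1/oj" property="eli:type_document" resource="http://publications.europa.eu/resource/authority/resource-type/DIR"/>
"""

parser = MetaTripleParser()
parser.feed(HTML)
for triple in parser.triples:
    print(triple)
```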



There is more than one specced way to do this, and there notably is XHTML+RDFa that bakes it into a doctype.

SKOS
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

SKOS is a data model that helps with controlled vocabularies, e.g. taxonomies, thesauri, and subject headings.


Note that it is only the model for a knowledge representation language to use, and is not a knowledge representation language itself.

That is, it describes classes and properties, but does not assert that objects have them.


Its intent seems to be interoperability, to have a shared basis for modelling done in a semantic-web context, ...instead of each rolling their own (as you could with OWL properties and classes, and RDFS).


You could call it a linked open vocabulary - though note those vary wildly in how widely applicable they are https://lov.linkeddata.es/dataset/lov/


SKOS's model is itself defined as an ontology expressed in OWL, but that's a bit of a technicality.


https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System

https://www.w3.org/TR/2009/REC-skos-reference-20090818/

OWL

Web Ontology Language (OWL) is a family of languages for ontologies


https://en.wikipedia.org/wiki/Web_Ontology_Language

Isn't there real overlap between SKOS, OWL, and RDFS?
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Yes.

Welcome to a world of abstract meta-concept confusion, to each thing within that world having an interesting history (and to writing sentences and then wondering what they even mean).


For example, SKOS focuses on classes and properties, and interdependencies between them. This makes it

  • more than just a controlled vocabulary (because of the relations)
  • but less than a knowledge representation language, or something useful for inference (e.g. because your modeling is mostly limited to hierarchy)


If you need those things, look towards other acronyms, including but certainly not limited to OWL and RDFS.


More potentially relevant standards

BS 8723

ISO 5964

ISO 2788 - Guidelines for the establishment and development of monolingual thesauri

ISO 25964 - international standard for thesauri and interoperability with other vocabularies[1]

apparently came from (and is based on) BS 8723 and ISO 2788?

Querying

SPARQL
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


An RDF-querying language

that wants to look vaguely like SQL,
and e.g. allows filtering on specific predicate types, specific subjects, etc.


Has roughly four query types:

  • SELECT - fetches values as they are stored, in table form
  • CONSTRUCT - extract and transform into valid RDF
  • ASK - ask a yes/no question
  • DESCRIBE - doesn't fetch resources, but describes them in a way that the database maintainer rather than you decides (verify) (which might actually just be a fetch sometimes?(verify))


The wikipedia example for SELECT

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name 
       ?email
WHERE
  {
    ?person  a          foaf:Person .
    ?person  foaf:name  ?name .
    ?person  foaf:mbox  ?email .
  }

...though the more complex the query, the more details you need to know about the underlying data model (at least look up its constants).
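
What a SELECT with several triple patterns does can be sketched as: match each pattern against every triple, and keep only the variable bindings that stay consistent across all patterns. A toy in-memory version in Python, with made-up data and prefixed names kept as plain strings (real engines index and optimise this rather than looping over everything):

```python
def match_pattern(pattern, triple, binding):
    """Try to extend a binding so that pattern equals triple; None if impossible."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant does not match
    return binding

def select(patterns, triples):
    """Return all variable bindings that satisfy every pattern (a join)."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions

DATA = [
    (":alice", "a", "foaf:Person"),
    (":alice", "foaf:name", "Alice"),
    (":alice", "foaf:mbox", "mailto:alice@example.org"),
    (":bob", "a", "foaf:Person"),
    (":bob", "foaf:name", "Bob"),  # no mbox, so :bob is not a solution
]
QUERY = [
    ("?person", "a", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:mbox", "?email"),
]

rows = [(s["?name"], s["?email"]) for s in select(QUERY, DATA)]
print(rows)
```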


For example, here is a query towards EUR-Lex (you can try it here)

PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
SELECT DISTINCT ?work ?type ?celex ?date ?force
WHERE {
    ?work cdm:work_has_resource-type ?type .
    FILTER(?type = <http://publications.europa.eu/resource/authority/resource-type/JUDG>)
    FILTER NOT EXISTS { ?work cdm:work_has_resource-type <http://publications.europa.eu/resource/authority/resource-type/CORRIGENDUM> }
    OPTIONAL { ?work cdm:resource_legal_id_celex ?celex . }
    OPTIONAL { ?work cdm:work_date_document      ?date .  }
    OPTIONAL { ?work cdm:resource_legal_in-force ?force . }
    FILTER NOT EXISTS { ?work cdm:do_not_index "true"^^<http://www.w3.org/2001/XMLSchema#boolean> }
}


Note:

  • only the cdm namespace is used here
  • Amounts to
    • get all things of type JUDG,
    • except those that are also typed CORRIGENDUM,
    • except if marked do_not_index,
    • and add fields 'celex', 'date' and 'force' if they're there ('work' and 'type' are always present)
    • OPTIONAL amounts to "add field if it's there, but don't require it"
      • stripping it of OPTIONAL { } means only solutions with values will be returned
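
The difference between an OPTIONAL pattern and a required one is that of a left join versus an inner join. A sketch in Python over made-up rows (the work identifiers and the value are placeholders, not real EUR-Lex data, and the function names are hypothetical):

```python
def optional_join(solutions, values_by_work, field):
    """Like OPTIONAL: add the field where a value exists, keep the row regardless."""
    rows = []
    for solution in solutions:
        row = dict(solution)
        if row["work"] in values_by_work:
            row[field] = values_by_work[row["work"]]
        rows.append(row)  # kept even when the field is missing
    return rows

def required_join(solutions, values_by_work, field):
    """Like the same pattern without OPTIONAL: rows lacking a value drop out."""
    return [row for row in optional_join(solutions, values_by_work, field)
            if field in row]

works = [{"work": "work-1"}, {"work": "work-2"}]  # placeholder identifiers
celex_by_work = {"work-1": "celex-of-work-1"}     # only work-1 has a value

print(optional_join(works, celex_by_work, "celex"))
print(required_join(works, celex_by_work, "celex"))
```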


https://en.wikipedia.org/wiki/SPARQL

More narrow-purpose

FOAF


OGP

Facebook's Open Graph protocol lets you describe your page, and controls e.g. how it appears when linked from sites like Facebook, Twitter, and the like.

So it is arguably structured data, but arguably used largely as an SEO thing and/or just for a nicer preview of links.


https://ogp.me/

Unsorted

  • GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a way to extract RDF from XML, such as XHTML, microformat data in XHTML, and such. [2] [3]



  • SKOS [Simple Knowledge Organization System] [5] [6]
  • SIOC [Semantically-Interlinked Online Communities] [7]
  • 'Common Vocabularies' (and vocabulary mapping) refer to settling things between systems so that they can make inferences more easily [8]
  • Semantic Annotation
  • Rules, specifically in the sense used e.g. in...
  • RuleML [9]
  • Rule Interchange Format (RIF) [10]
  • Semantic Web Rule Language (SWRL) (is OWL+RuleML)

And also various document/item metadata formats, in this context most often

  • Dublin Core
  • DOAP (Description Of A Project) [11]
  • Internet Content Description Language


Software

  • Ontology editors
    • Protégé
    • GATE
    • KAON
    • Hozo



http://en.wikipedia.org/wiki/Template:Semantic_Web