Knowledge representation / Semantic annotation / structured data / linked data on the web



Generic, not related to library metadata

URN

CURIE

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


A Compact URI (CURIE) amounts to a way to abbreviate a URL.

They are seen mostly around XML and XHTML because they are modelled on XML namespaces, but could be used elsewhere fairly easily. However, they are only parsed by CURIE-aware things, of which there aren't many (verify).

Consider:

<html xmlns:wikipedia="http://en.wikipedia.org/wiki/">
   <head></head>
   <body>
       <p>Find out more about <a href="[wikipedia:Biome]">biomes</a>.</p>
   </body>
</html>


It looks like, but is not the same as, an HTML <base> URL

bases are explicit, and have some more path-related rules

It looks like, but is not, a URN

e.g.
[isbn:0393315703]
looks a lot like a URN with scoping, just with some brackets
a CURIE expands into a URL/URI, while a URN is, and stays, a string that just happens to be an identifier
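
The expansion itself is little more than a prefix lookup plus string concatenation. A minimal sketch, assuming the bracketed ("safe") form used in the example above; the function name and the prefix table are made up for illustration, and a real processor would take the prefixes from the document's xmlns declarations:

```python
import re

# Prefix table, standing in for xmlns:wikipedia="..." in the example above.
PREFIXES = {"wikipedia": "http://en.wikipedia.org/wiki/"}

# A bracketed CURIE: [prefix:reference]
SAFE_CURIE = re.compile(r"^\[([^:\]]*):([^\]]*)\]$")


def expand_curie(value, prefixes):
    """Expand a bracketed CURIE; return anything else unchanged."""
    match = SAFE_CURIE.match(value)
    if match is None:
        return value  # an ordinary URL/URI, not a bracketed CURIE
    prefix, reference = match.groups()
    if prefix not in prefixes:
        raise KeyError(f"undeclared CURIE prefix: {prefix!r}")
    return prefixes[prefix] + reference


print(expand_curie("[wikipedia:Biome]", PREFIXES))
# http://en.wikipedia.org/wiki/Biome
```

Note that this also shows the difference with a URN: without a declared prefix, [isbn:0393315703] cannot be expanded at all.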



Microformats

Microformats add semantic markers to HTML.

They emerged in the HTML4 era, and chose to reuse existing attributes (mostly class and rel (verify)) so that pages would still validate.


For example, showing personal information in HTML and also telling a potential microformat parser what's what according to hCard:

<ul class="vcard">
  <li class="fn">Joe Doe</li>
  <li class="org">The Example Company</li>
  <li class="tel">604-555-1234</li>
  <li><a class="url" href="http://example.com/">http://example.com/</a></li>
</ul>
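
To give an idea of what a microformat parser does with that, here is a rough sketch using only Python's standard library. It only knows the four hCard class names used above and assumes flat, unnested markup; the class name HCardParser is made up here, and real microformat parsers handle far more cases.

```python
from html.parser import HTMLParser

# Only the hCard property names that appear in the example above.
HCARD_PROPERTIES = {"fn", "org", "tel", "url"}


class HCardParser(HTMLParser):
    """Collect hCard properties from class attributes (flat markup only)."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        found = sorted(classes & HCARD_PROPERTIES)
        if not found:
            return
        prop = found[0]
        if prop == "url" and "href" in attrs:
            self.card[prop] = attrs["href"]  # take the link target, not the text
        else:
            self._current = prop

    def handle_data(self, data):
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None


HTML = """
<ul class="vcard">
  <li class="fn">Joe Doe</li>
  <li class="org">The Example Company</li>
  <li class="tel">604-555-1234</li>
  <li><a class="url" href="http://example.com/">http://example.com/</a></li>
</ul>
"""

parser = HCardParser()
parser.feed(HTML)
print(parser.card)
```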



Microdata

Microdata annotates the HTML that is already displaying the data, e.g.

<div itemscope>
 <p>My name is <span itemprop="name">Neil</span>.</p>
 <p>My band is called <span itemprop="band">Four Parts Water</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>


...yes, it serves similar goals to microformats.

Microdata can be seen as a slightly more expressive successor, in that:

  • microformats reuse existing parts of HTML4 (mostly class and rel), whereas microdata extends HTML5 with its own attributes (e.g. itemscope, itemtype, itemprop)
  • microformats have a more or less fixed set of formats, while microdata points at 'use any schema.org thing', allowing community extension


(side note: places that sanitize user-sourced HTML are more likely to remove microdata than microformats, because the attributes are unknown to them)
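
Reading the itemprop values back out of the microdata example is similarly simple. A sketch with Python's standard library, assuming a single flat item (no nested itemscope, no itemtype); the class name ItemPropParser is made up for illustration:

```python
from html.parser import HTMLParser


class ItemPropParser(HTMLParser):
    """Collect the text content of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name of the element we are inside

    def handle_starttag(self, tag, attrs):
        itemprop = dict(attrs).get("itemprop")
        if itemprop:
            self._current = itemprop
            self.properties[itemprop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Flat markup assumed: any closing tag ends the current property.
        self._current = None


HTML = """
<div itemscope>
 <p>My name is <span itemprop="name">Neil</span>.</p>
 <p>My band is called <span itemprop="band">Four Parts Water</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>
"""

parser = ItemPropParser()
parser.feed(HTML)
print(parser.properties)
```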


JSON-LD

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

It seems to

  • be developed for and targeted primarily at web crawlers
  • mean "structured data according to schema.org, stored in a block of JSON"

It's a machine-parseable form of the page's main content, assuming that main content is relatively bite-sized.


Consider this recipe example:

<script type="application/ld+json">
   {
     "@context": "https://schema.org/",
     "@type": "Recipe",
     "name": "Party Coffee Cake",
     "author": {
       "@type": "Person",
       "name": "Mary Stone"
     },
     "datePublished": "2018-03-10",
     "description": "This coffee cake is awesome and perfect for parties.",
     "prepTime": "PT20M"
   }
</script>


Yes, this omits the actually useful sections like "ingredient", "yield", "instructions".

Maybe that was just for brevity, maybe it's because that is all that Google would really get out of it.

So yeah, this sometimes feels more SEO-adjacent (consider statements like "All annotated information must be on the page; adding information that is not on the page will likely not show in search results and is against Google guidelines") than about structured or linked data.

Yet thanks to the variety of existing schemas, you can encode a lot of nontrivial things that you can then use.
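
Since the data sits in its own script block, a consumer does not need to understand the rest of the page. A sketch of how a crawler-like thing might pull such blocks out, using only Python's standard library; the class name JsonLdParser and the shortened sample page are made up for illustration, and this does no JSON-LD processing (context expansion and such), it only finds and decodes the JSON:

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect and decode every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.documents = []
        self._buffer = None  # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.documents.append(json.loads("".join(self._buffer)))
            self._buffer = None


PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Recipe",
  "name": "Party Coffee Cake",
  "author": {"@type": "Person", "name": "Mary Stone"},
  "prepTime": "PT20M"
}
</script>
</head><body><p>The human-readable recipe goes here.</p></body></html>
"""

parser = JsonLdParser()
parser.feed(PAGE)
recipe = parser.documents[0]
print(recipe["@type"], "-", recipe["name"], "by", recipe["author"]["name"])
```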




Ontologies and knowledge representation

RDF
RDFS
Trix
Embedded RDF


RDFa, RDFa Lite
Notation3 (Notation 3, N3), Turtle, N-Triples


SKOS

https://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System

https://www.w3.org/TR/2009/REC-skos-reference-20090818/

Isn't there real overlap between SKOS, OWL, and RDFS?

https://en.wikipedia.org/wiki/Web_Ontology_Language


BS 8723
ISO 2788

More specific-purpose

FOAF


OGP

Facebook's Open Graph protocol lets you describe your page, and controls e.g. how it appears when linked from sites like Facebook, Twitter, and the like. So it is arguably structured data, but it is arguably used largely as an SEO thing and/or just for nicer link previews.


https://ogp.me/
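
In practice this comes down to meta tags in the page's head whose property starts with og: (og:title, og:type, og:url and og:image being the basic ones). A sketch of what a link-preview generator might read, using only Python's standard library; the class name OpenGraphParser and the sample page are made up for illustration:

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:") and "content" in attrs:
            self.properties[prop] = attrs["content"]


# Hypothetical page, not taken from any real site.
PAGE = """
<html><head>
<title>Example page</title>
<meta property="og:title" content="An example page" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://example.com/page" />
<meta property="og:image" content="http://example.com/preview.png" />
</head><body></body></html>
"""

parser = OpenGraphParser()
parser.feed(PAGE)
print(parser.properties)
```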

More library geared

FRBR

FRBR ('Functional Requirements for Bibliographic Records') refers to a model of the semantic interrelations between works, their realizations, items, responsible entities, and more, focused on bibliographic information.


It makes it easier to automate meaningful operations around bibliographic information.

However, there is some question whether it is fit as a reference model for bibliographic information: some types of information found in libraries do not fit the model well, which may lead to different people using different models, which may in turn undermine the meaningful-automation goal somewhat.

There are also limits to applying it to existing information, as various relations are hard to extract exactly and authoritatively. Practice shows that automatically dealing with even just sets of works can be quite a bit of work to do accurately.

One has to wonder whether it is just something hip to play with, though it clearly has merits, and has usefully driven people to think about normalized forms of data, collocation-based browsing of materials, and more.


People may use the term loosely,

  • as a reference to services built on this (e.g. 'we have a variant of that book you may be interested in')
  • for the idea of, or specific datasets (e.g. LibraryThing's) of, mappings between a particular book as a work and its various realizations (e.g. revisions)
  • for some other types of semantic cross-relations



Its design separates things into three groups: works and such, people, and relationships for those two (subject/event/place).


Group 1 means to refer to the products of intellectual or artistic endeavour:

  • A Work, a distinct intellectual or artistic creation.
  • An Expression is the specific intellectual/artistic form that a work takes each time it is realized.
  • A Manifestation is the physical embodiment of an expression.
  • An Item is a single concrete exemplar of a manifestation.

An example using books:

  • a book someone writes is a work
  • ...with at least one and usually just the one expression (the original text)
  • Different releases/publications are different manifestations. Consider paperback and hardcover variations, and those from different countries. In this case, a manifestation correlates strongly with 'something that gets a unique ISBN'
  • each physical book is an item
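
The book example above can be written down as a small containment hierarchy. This is only a sketch to make the four levels concrete: the field names are made up, and FRBR itself describes entities and relations, not a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One physical copy, e.g. the book on a particular shelf."""
    copy_label: str


@dataclass
class Manifestation:
    """One release/publication, roughly 'something with its own ISBN'."""
    description: str
    items: list = field(default_factory=list)


@dataclass
class Expression:
    """One realization of the work, e.g. the original text."""
    description: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    """The distinct intellectual or artistic creation."""
    title: str
    expressions: list = field(default_factory=list)


def count_items(work):
    """Count every physical copy of any manifestation of any expression."""
    return sum(
        len(manifestation.items)
        for expression in work.expressions
        for manifestation in expression.manifestations
    )


paperback = Manifestation("paperback", [Item("copy 1"), Item("copy 2")])
hardcover = Manifestation("hardcover", [Item("copy 1")])
original_text = Expression("original text", [paperback, hardcover])
book = Work("Some Book", [original_text])

print(count_items(book))  # 3
```

Questions like 'do we have any copy of this work, in whatever edition?' then become a walk down from the work, which is the sort of meaningful automation mentioned earlier.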

Another example:

  • 'any edition of Alice in Wonderland' is a work
  • 'The Annotated Alice' is an expression


Possible problem cases include translations, annotated editions and such. These can be considered a different expression of the same work, or a different work, and which one applies arguably depends on the specific case: translations that require considerable creativity may well be considered a new work.

These sorts of cases can break FRBR's potential simplicity (which is not altogether surprising, as it is a fairly simple model of a fuzzy domain).


Group 2 deals with those responsible for the content, production, or custodianship of Group 1 entities, mainly:

  • Persons
  • Corporate bodies


Group 3 deals with subject/event/place relations for Group 1 and 2

  • Concepts
  • Objects
  • Events
  • Places


Further notes:

  • Various bibliographical identifiers implicitly do some FRBRization. For example, XISBN can be seen as a manifestation identifier.

COinS

COinS (ContextObjects in Spans) describes how to embed OpenURL-style citation data in HTML.

It's similar to microformats.


It allows browser plugins to do things including:

  • link you to full-text via your own institution's OpenURL resolver (e.g. using OpenURL Referrer)
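
The convention is an empty span with class Z3988 whose title attribute holds the OpenURL ContextObject as a URL-encoded key-value string. A sketch of what such a plugin has to do to find and decode these, using only Python's standard library; the class name CoinsParser and the citation in the sample are made up for illustration:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the decoded key/value pairs of every COinS span."""

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attrs.get("title"):
            # The parser has already turned &amp; back into &.
            self.citations.append(parse_qs(attrs["title"]))


# Hypothetical citation, for illustration only.
HTML = (
    '<span class="Z3988" title="ctx_ver=Z39.88-2004'
    "&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal"
    "&amp;rft.atitle=An+Example+Article"
    "&amp;rft.jtitle=Journal+of+Examples"
    '&amp;rft.date=2006"></span>'
)

parser = CoinsParser()
parser.feed(HTML)
citation = parser.citations[0]
print(citation["rft.atitle"][0], "/", citation["rft.jtitle"][0])
```

A plugin would then append those same key-value pairs to the base URL of the institution's resolver.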



Persistent and open identifiers

https://en.wikipedia.org/wiki/Persistent_identifier


DOI

DOI is meant to identify pieces of intellectual property, mostly journal articles, research reports, data sets, and official publications.


In a DOI code, for example 10.1000/abc...

  • the 10 designates a DOI
    (DOI is technically just one implementation of the CNRI Handle system)
  • the rest of the code before the slash is the DOI registrant prefix, 1000 in the example above.
    You can buy such a prefix from a Registration Agency (or possibly an experimental one from the International DOI Foundation (verify)).
  • ...followed by item identifiers assigned by that registrant. These can be anything.
    (The combination will always be unique, as the registrant prefix acts as a namespace.)


The combination is like a URN in that it identifies but does not locate. You need a DOI resolver to look up what it refers to, which gets you to the actual content (usually a landing page first).
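
A sketch of taking a DOI apart along those lines, and of building the URL you would hand to a resolver. The function names and the example suffix are made up; https://doi.org/ is the commonly used public resolver, and this does no network access or real validation.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI into its '10', registrant, and suffix parts."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError(f"no suffix in DOI: {doi!r}")
    directory, dot, registrant = prefix.partition(".")
    if directory != "10" or not registrant:
        raise ValueError(f"not a DOI prefix: {prefix!r}")
    return {"directory": directory, "registrant": registrant, "suffix": suffix}


def resolver_url(doi, resolver="https://doi.org/"):
    """Build the URL that asks a resolver where the DOI currently points."""
    return resolver + quote(doi, safe="/")


parts = split_doi("10.1000/example")  # hypothetical suffix
print(parts)
print(resolver_url("10.1000/example"))
# https://doi.org/10.1000/example
```

The suffix is deliberately left alone apart from URL-escaping, since the registrant can put anything there.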

DOI syntax is defined in ANSI/NISO Z39.84 [1].



ORCID

ORCID is a persistent identifier for individual researchers, based on the idea that an assigned identifier is more unique than a name, and that names themselves can change.





Older / unsorted

Some theory

Digital - Ontologies, semantic web stuff

Note that some of this is glorified metadata, so see also (Library) metadata notes


Query

SPARQL

An RDF-querying language with a design rooted somewhat in SQL


https://en.wikipedia.org/wiki/SPARQL

Unsorted

  • GRDDL (Gleaning Resource Descriptions from Dialects of Languages) is a markup for declaring how to extract RDF from XML, such as XHTML, microformat data in XHTML, and such. [2] [3]



  • SKOS [Simple Knowledge Organization System] [5] [6]
  • SIOC [Semantically-Interlinked Online Communities] [7]
  • 'Common Vocabularies' (and vocabulary mapping) refer to settling things between systems so that they can make inferences more easily [8]
  • Semantic Annotation
  • Rules, specifically in the sense used e.g. in...
  • RuleML [9]
  • Rule Interchange Format (RIF) [10]
  • Semantic Web Rule Language (SWRL) (is OWL+RuleML)

And also various document/item metadata formats, in this context most often

  • Dublin Core
  • DOAP (Description Of A Project) [11]
  • Internet Content Description Language


Software

  • Ontology editors
    • Protégé
    • GATE
    • KAON
    • Hozo



http://en.wikipedia.org/wiki/Template:Semantic_Web
