OpenURL notes

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Introduction

In practice, the term OpenURL can refer to a few different things:

  • The NISO Z39.88-2004 standard, also known as OpenURL 1.0 [1] (or the earlier draft, 0.1)
  • A ContextObject (containing information much like a citation)
  • an OpenURL resolver system (a setup, or the software)
  • a URL that transports/contains a ContextObject, and/or looks it up for a specific institution (A base URL plus a request)


In the widest sense, OpenURL eases communication and mediation of resolvable descriptions.

The current most common application for OpenURL is probably that of article and book lookups in libraries and academic (federated) search. In fact, the latter is probably the largest current use because OpenURL's history lies largely with SFX (which has been owned by Ex Libris for a while now), and as such the SFX implementation acts as something of a de facto standard beside the real standard.


OpenURL is useful because it usually implies a resolver - but also ensures that the specific resolver stays separate from the transported metadata.

It can be particularly useful around federated search because records are themselves only metadata, and the services that libraries may present are sensitive to the institution you are accessing from and can be made pluggable (the resolver can e.g. ask the local OPAC and repositories).


Users will often see OpenURL as a way to get directly to full text, though this currently (~2009) still depends on how much an article provider makes or breaks its users' ability to (deep)link to articles.


Most current OpenURL resolvers are actually fairly simple, but in theory they may have fuzzy smartness specific to their area of expertise. For example, they may be able to correct and augment the metadata you give them, such as filling in an ISSN based on a journal name, getting well-formatted details via a DOI lookup, etc.



Other uses

OpenURL's model technically supports a wider set of use cases than the book/article use it currently sees, since it doesn't constrain the type of resource that can be referenced. Some real-world cases could benefit from OpenURL, while in other cases it is too complex (and sometimes too restrictive) to be the best option.

(My personal impression is that it looks like a standard that was engineered for much more than it was originally intended for, but is not used for such, so that it appears overly complex. It doesn't help that not all of its parts are documented well enough to match its model's complexity.)


Rough Contents of an OpenURL

An OpenURL mostly contains a ContextObject (abbreviated as CTX, CO, or CTXO), which itself contains a number of Entity objects, which are separate but related descriptions.

Entities themselves can be thought of as having mild hierarchies, given the way descriptors work (see below), and (may) contain metadata.


The entities that can appear in a ContextObject:

  • Referent - what resource are we referencing / requesting services about?
    • ...for example the citation to look up
    • the only required Entity in a ContextObject
    • prefix: rft
  • ServiceType - how do we want it - what sort of service are we requesting?
    • allows you to specifically ask for 'full text', an abstract, etc. Not always required, as you can often also figure out the response on the client (requesting) side
    • prefix: svc


  • ReferringEntity - where did we find this / what entity is referencing that Referent?
    • ...allows you to mention the article that cited this one.
    • prefix: rfe


  • Referrer - what/who created this ContextObject?
    • The site or service/application that generated the request
    • prefix: rfr
  • Requester - who is requesting this?
    • ...allows identification of the end user who is placing the request
    • prefix: req


  • Resolver
    • which resolver is this directed at
    • in some ways just the base URL to prepend.
    • ...often hardcoded for users
    • prefix: res


The mentions of 'prefix' above refer to the string that is prepended to the various variables related to the entity. For example, 'rft' is used as a prefix for referent-related things (e.g. rft_val_fmt) and as a base for metadata items (e.g. rft.issn=something).
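To make the prefix convention concrete, here is a minimal Python sketch that takes a KEV-style query string and groups its keys by the entity their prefix names. The function name and the "other" bucket are my own inventions for illustration; the prefix table is just the list above.

```python
from urllib.parse import parse_qsl

# Entity prefixes as listed above.
ENTITY_PREFIXES = {
    "rft": "Referent",
    "svc": "ServiceType",
    "rfe": "ReferringEntity",
    "rfr": "Referrer",
    "req": "Requester",
    "res": "Resolver",
}

def group_by_entity(query):
    """Group the key/value pairs of a KEV query string by entity."""
    grouped = {}
    for key, value in parse_qsl(query, keep_blank_values=True):
        prefix = key[:3]
        # A prefix is followed by '.' (metadata item, e.g. rft.issn)
        # or '_' (descriptor variable, e.g. rft_val_fmt, rft_id).
        if prefix in ENTITY_PREFIXES and key[3:4] in (".", "_"):
            entity = ENTITY_PREFIXES[prefix]
        else:
            entity = "other"  # administrative keys such as ctx_ver
        grouped.setdefault(entity, []).append((key, value))
    return grouped

grouped = group_by_entity(
    "ctx_ver=Z39.88-2004"
    "&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal"
    "&rft.issn=1234-5678"
)
```

Here grouped["Referent"] holds the rft_val_fmt and rft.issn pairs (with values URL-decoded), and ctx_ver ends up under "other".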


Serialization

A ContextObject is serialized in one of two ways:

  • KEV (OpenURL standard, Part 2)
    • simpler if all you need is a citation lookup (Referent entity)
    • Puts (short) Keys and (URL-)Encoded Values in a URL
    • See e.g. [2] or [3]
  • XML of the ContextObject (OpenURL standard, Part 3)
    • more verbose/complex, and more powerful in a few cases

Both have to be URL-encoded when handed to a (HTTP) resolver.
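As a rough illustration of the KEV form, the following Python sketch builds a referent-only OpenURL from a dictionary of metadata. The resolver address (resolver.example.org) and the function name are placeholders I made up; url_ver and ctx_ver carry the standard's version string.

```python
from urllib.parse import urlencode

def kev_openurl(base_url, referent, fmt="info:ofi/fmt:kev:mtx:journal"):
    """Build a KEV OpenURL that describes only a Referent, by value."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", fmt),
    ]
    # Each metadata item becomes an rft.-prefixed key.
    params += [("rft." + key, value) for key, value in referent.items()]
    # urlencode takes care of the URL-encoding of the values.
    return base_url + "?" + urlencode(params)

url = kev_openurl(
    "http://resolver.example.org/openurl",  # hypothetical resolver base URL
    {"issn": "1234-5678", "volume": "8", "issue": "3", "date": "1999"},
)
```

Using a list of pairs rather than a dict for the fixed parameters keeps their order stable, which makes the resulting URLs easier to compare and log.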


Additionally, there are Community Profiles[4], most interestingly:

  • SAP1 (San Antonio Level 1) is built on KEV [5]
  • SAP2 (San Antonio Level 2) is built on XML [6]
  • DCCP (Dublin Core Community Profile) [7]


Also related is COinS [8], a (KEV-based) way of embedding a ContextObject in a web page, for other utilities to extract.
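A sketch of what generating such an embedded ContextObject could look like in Python. As far as I know the convention is an empty span with class Z3988 whose title attribute holds the KEV string (HTML-escaped); the function name is mine.

```python
from html import escape
from urllib.parse import urlencode

def coins_span(referent, fmt="info:ofi/fmt:kev:mtx:journal"):
    """Return an HTML span embedding a KEV ContextObject for a referent."""
    params = [("ctx_ver", "Z39.88-2004"), ("rft_val_fmt", fmt)]
    params += [("rft." + key, value) for key, value in referent.items()]
    kev = urlencode(params)
    # The KEV string goes into an attribute, so '&' must become '&amp;'.
    return '<span class="Z3988" title="%s"></span>' % escape(kev, quote=True)

span = coins_span({"issn": "1234-5678", "volume": "8"})
```

Note that no resolver base URL appears here: the page only carries the description, and whatever tool extracts it decides which resolver to send it to.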

Informal explanation

Formats and Metadata/Identifiers

In OpenURL talk, Entities are realized using Descriptors, which are one of:

  • Metadata (inline) (verify)
  • Metadata (Reference to a ContextObject served elsewhere)
  • Identifiers (inline)
  • Private data (inline)

Things may be described by multiple descriptors, though there is the (pragmatic) restriction that they should refer to the same thing. This comes down to the ability to use/add (multiple) identifiers (consider that an article may have both a PMID and a DOI).


OpenURL does not itself restrict the format of the transported metadata. OpenURL itself expects you to mention what sort of identifier / metadata you are handing along/referring to, and expects implementations to understand anything they wish to support. The OpenURL Registry mostly acts as a standardizing mediator for existing and new formats.

The currently registered metadata formats are mostly scholarly ones, and defined so that you can use them in KEV and in XML form:

info:ofi/fmt:kev:mtx:book
info:ofi/fmt:xml:xsd:book

info:ofi/fmt:kev:mtx:journal
info:ofi/fmt:xml:xsd:journal		

info:ofi/fmt:kev:mtx:dissertation
info:ofi/fmt:xml:xsd:dissertation	

info:ofi/fmt:kev:mtx:patent
info:ofi/fmt:xml:xsd:patent

Other formats include:

info:ofi/fmt:kev:mtx:dc		Dublin Core
info:ofi/fmt:xml:xsd:oai_dc		(OAI Unqualified Dublin Core version 2.1)

info:ofi/fmt:xml:xsd:MARC21		MARCXML

And in one case more or less to hand along options:

info:ofi/fmt:kev:mtx:sch_svc		Scholarly ServiceTypes

(There are more formats, see the registry)




The value/reference difference makes it easier to do things like creating and serving ContextObjects in a centralized way, although this is probably not currently of that much use.


In an OpenURL, the difference is specified via the use of a *_val_fmt versus a *_ref_fmt (the latter implying a *_ref is present).

For a referent you would use rft_val_fmt when you include the metadata in the same OpenURL, or rft_ref_fmt and also add rft_ref, containing a URL where the ContextObject should be fetched from.

The value for the _fmt argument is the same in both cases, describing the metadata format of the ContextObject (info:ofi/fmt:something).
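The by-value and by-reference variants side by side, as a small Python sketch. The function names and the example.org location are made up for illustration; only the key names come from the text above.

```python
from urllib.parse import urlencode

JOURNAL_FMT = "info:ofi/fmt:kev:mtx:journal"

def referent_by_value(fmt, metadata):
    """Metadata travels inside the OpenURL itself: rft_val_fmt plus rft.* keys."""
    params = [("rft_val_fmt", fmt)]
    params += [("rft." + key, value) for key, value in metadata.items()]
    return urlencode(params)

def referent_by_reference(fmt, location):
    """Metadata is served elsewhere: rft_ref_fmt plus rft_ref pointing at it."""
    return urlencode([("rft_ref_fmt", fmt), ("rft_ref", location)])

by_value = referent_by_value(JOURNAL_FMT, {"issn": "1234-5678"})
by_reference = referent_by_reference(
    JOURNAL_FMT, "http://example.org/ctx/42"  # hypothetical location
)
```

In both calls the format identifier is identical; only the key it is attached to changes.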

Book and Article referents

This section goes into more detail, since these lookups are currently a common application.

A referent has a genre, which is optional.

Genres include:

  • article
  • proceeding
  • bookitem
  • book
  • conference
  • journal
  • issue
  • preprint
  • unknown

Journal, book, and conference are considered bundles. Article, bookitem, proceeding, and preprint are considered items. However, there is no hard separation (item lookups will often, necessarily, supply bundle-related details as well), so this is not usually very useful information.


You can generally guess which metadata details are useful on a lookup for a specific genre, and resolvers should probably not trip over redundant values. However, there are specific ways of specifying authors depending on the metadata types. You can look these up in the registry, for example for info:ofi/fmt:kev:mtx:journal and info:ofi/fmt:kev:mtx:book.

Of course, if you used DC (e.g. info:ofi/fmt:kev:mtx:dc for KEV-based DC), the variables would be different (For more detail, see the Dublin Core Community Profile).



The following is meant as a quick overview of the parameters you can hand along in book and/or journal article lookups. Note that not all resolvers listen to all parameters, often because they do not identify content, or because of other implementation details.


Metadata details (mostly for referents and referring entities) include:

  • (First) Author:
    • aulast, aufirst (full first name), auinit (first, middle initials), auinit1 (first initial), auinitm (middle initials), au, ausuffix
    • aucorp
  • (Title:)
    • atitle: item title (article, part of a book, preprint, conference, proceeding)
    • title: bundle title (journal, book, conference)
    • stitle: (abbreviated) bundle title
    • btitle: book title (in info:ofi/fmt:kev:mtx:book)
  • date (publication date, ISO 8601 format: YYYY-MM-DD, YYYY-MM, or YYYY)

Bundle identifiers: (see also Library notes for non-librarians)

  • issn
  • eissn
  • sici
  • coden

Item identifiers:

  • isbn
  • bici

Bundle details:

  • volume
  • issue
  • spage, epage
  • pages: page range ('spage-epage')
  • tpages: total pages (in info:ofi/fmt:kev:mtx:book)

And rarer bundle details:

  • part (bundle part)
  • place (in info:ofi/fmt:kev:mtx:book)
  • pub (in info:ofi/fmt:kev:mtx:book)
  • artnum (item number, a fall back when there is no page range)
  • ssn (season: winter, spring, summer, or fall)
  • quarter (1, 2, 3, or 4)
  • chron (non-normalized parts of the chronology, e.g. "1st quarter"; use date, ssn and quarter instead where possible)
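Two of these parameters have a fixed shape that is easy to get wrong by hand: date and the pages/spage/epage trio. A minimal Python sketch of helpers for both (the helper names are mine, and the page pattern is deliberately simple):

```python
import re

def openurl_date(year, month=None, day=None):
    """Format a date as YYYY, YYYY-MM or YYYY-MM-DD, depending on what is known."""
    if month is None:
        return "%04d" % year
    if day is None:
        return "%04d-%02d" % (year, month)
    return "%04d-%02d-%02d" % (year, month, day)

def page_keys(pages):
    """Return the pages key, plus spage and epage when pages is a simple range."""
    keys = {"pages": pages}
    match = re.fullmatch(r"\s*(\w+)\s*-\s*(\w+)\s*", pages)
    if match:
        keys["spage"], keys["epage"] = match.groups()
    return keys

article = {"date": openurl_date(1999, 3)}
article.update(page_keys("123-145"))
```

Anything that is not a plain 'start-end' range (say, an article number) is passed along as pages only, leaving it to the resolver to make sense of it.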

On namespaces, identifier zones, and such

URI, ORI and XRI are prepended to OpenURL namespace references; they are effectively meta-namespaces:

URIs encompass official IANA namespaces (URI Schemes, URN Namespaces).

ORIs encompass namespaces from the OpenURL Registry (which is extensible in that you can have things registered there)

XRIs are for application-local and non-standardized namespaces/identifiers used for internal communication. (For example Amazon's ASIN)


info: is a URI scheme that is used within OpenURL (and other library and publishing areas). It seems to mostly imitate URN.

info:ofi/ is how registry identifiers start. These can be looked up in the OpenURL Registry for more information.



Example

You can specify:

  • that you want to look up Pubmed Identifier 7654321 (Referent)
rft_id       = info:pmid/7654321
  • ...which you found while reading an article in the journal with ISSN 1234-5678, specifically volume 8, issue 3, issued in 1999 (ReferringEntity)
rfe_val_fmt  = info:ofi/fmt:opl:bnf:journal
rfe.issn     = 1234-5678
rfe.date     = 1999
rfe.volume   = 8
rfe.issue    = 3
  • Who you are (The identity of the end user (Requester))
req.id       = uri:mailto:bob.doe@example.com

(These key/value lists are not a standard serialization; I am using them for readability.)
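For comparison, a hedged Python sketch of how the same three entities might be flattened into an actual KEV query string. Two things here are my assumptions rather than part of the example above: the referring entity is given the registered KEV journal format (info:ofi/fmt:kev:mtx:journal), and the requester is identified with a req_id key holding a plain mailto: URI. The nested-dictionary layout and function name are also just for illustration.

```python
from urllib.parse import urlencode

def context_object_kev(entities):
    """Flatten {prefix: {"id": ..., "fmt": ..., "metadata": {...}}} into KEV."""
    params = [("ctx_ver", "Z39.88-2004")]
    for prefix, descriptor in entities.items():
        if "id" in descriptor:
            params.append((prefix + "_id", descriptor["id"]))
        if "metadata" in descriptor:
            params.append((prefix + "_val_fmt", descriptor["fmt"]))
            params += [(prefix + "." + key, value)
                       for key, value in descriptor["metadata"].items()]
    return urlencode(params)

example = {
    "rft": {"id": "info:pmid/7654321"},
    "rfe": {
        "fmt": "info:ofi/fmt:kev:mtx:journal",  # assumed format identifier
        "metadata": {"issn": "1234-5678", "date": "1999",
                     "volume": "8", "issue": "3"},
    },
    "req": {"id": "mailto:bob.doe@example.com"},  # assumed req_id form
}
query = context_object_kev(example)
```

The result is what you would append (after a '?') to a resolver's base URL.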


See also

Further OpenURL details and supporting information:

SFX and Metalib

Metalib

Metalib can cooperate with SFX (I'm not sure exactly how, I've not looked at Metalib itself much) to have it use citation information for records from a search. It uses this for the full-text popup (linked to by Metalib, actually generated by SFX) that users of the vanilla Metalib/SFX combination will be familiar with.

We wanted to interface with SFX directly instead of relying on its pop-up, which requires extracting a citation independently of these packages. While the X-Server can expose SFX's results via an undocumented hack (at least in Metalib 3), this seemed too fragile and version-specific.


Metalib can create fields for the MARC, and you can configure it to extract details and place them in fields perhaps named YR, PAG, VOL, ISS, ASU, and others. Parsing these has to be configured inside Metalib and can seem like a lot of complex, thankless work, as it has to be done per source. While useful, Metalib's manual extraction is not always powerful enough. It cannot use any conditional logic, and some things are hard to express exactly in regular expressions without very elaborate constructions.

PurpleSearch has its own extraction based on regexps and a good deal of added logic, taking its information mostly from the 773 field, which it uses to augment Metalib's values if they are missing or seem wrong (e.g. if a page range used the numbers around the ISSN's dash). Ideally, this should work as a good catch-all for sources that do not use 773's subfields correctly and have not been polished up via extensive Metalib configuration. In fact, the code allows us to create an OpenURL based entirely on the record itself, and while it can be assisted by SFX and Metalib (which is sometimes a good idea and quite convenient), it generally does not require it.
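To give an idea of the kind of regexp-plus-logic extraction meant here, a small Python sketch. This is not PurpleSearch's code: the patterns, the function name and the sample citation string are all invented, and real 773 content is far messier. It does show the ISSN-dash problem mentioned above and one way around it.

```python
import re

ISSN_RE = re.compile(r"\b\d{4}-\d{3}[\dXx]\b")
VOLUME_RE = re.compile(r"\bvol(?:ume)?\.?\s*(\d+)", re.IGNORECASE)
ISSUE_RE = re.compile(r"\b(?:no|issue|iss)\.?\s*(\d+)", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")
# Deliberately loose: any 'number-number' looks like a page range.
PAGES_RE = re.compile(r"\b(\d+)\s*-\s*(\d+)\b")

def extract_citation(text):
    """Pull issn, volume, issue, date, spage and epage out of free text."""
    found = {}
    issn = ISSN_RE.search(text)
    if issn:
        found["issn"] = issn.group(0)
        # Blank out the ISSN so its dash cannot be taken for a page range.
        text = text[:issn.start()] + " " + text[issn.end():]
    for key, pattern in (("volume", VOLUME_RE), ("issue", ISSUE_RE),
                         ("date", YEAR_RE)):
        match = pattern.search(text)
        if match:
            found[key] = match.group(1)
    pages = PAGES_RE.search(text)
    if pages:
        found["spage"], found["epage"] = pages.groups()
    return found

raw = "ISSN 1234-5678, vol. 8, no. 3 (1999), 123-145"  # made-up sample
citation = extract_citation(raw)
```

Applied to the raw string directly, PAGES_RE would pick up 1234 and 5678 as a page range, which is exactly the sort of wrong value that the extra logic is there to catch.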

SFX

SFX is an OpenURL resolver, now a product developed by Ex Libris.

Aside from XML and KEV, you can also send in data using very minimal sets of just a few URL parameters, without any OpenURL definitions or rft. prefixes. A simple example would be:

http://sfx.ub.rug.nl:9003/sfx_local?isbn=9780000000002



SFX adds functionality beyond OpenURL. Perhaps most interestingly, it allows one to query its institution's subscriptions. This allows a fairly educated guess of whether a full-text lookup on a record is likely to yield results before asking SFX to do the actual (potentially slow) lookup.


Popup versus XML result data

Sending in only OpenURL data will result in an HTML page in a popup. Some tweaked Metalib interfaces give more user-friendly presentations, and more control over the values sent to SFX (a past version of PurpleSearch had a form that controlled all values sent), but most still give that same popup.


In this interface we interact with SFX via its XML interface and present the results in our own way. This should make these lookups a little less bothersome, since it allows SFX results to be embedded in the page (while generating it or via AJAX), and also makes other features easier to build, such as the ability for users to create tinyurls to the article full-text links SFX generates -- or to the lookup itself.


SFX takes a sfx.response_type variable, which takes values including:

  • html (the default when omitted), producing the HTML you see in the popup
  • simplexml, produces an XML response
  • multi_obj_xml, like simplexml, but allows multiple query/response values
  • (an OpenURL 0.1 compatibility response type)
  • service_exists: does a basic check of whether services exist for the ContextObject, but does not do any (potentially slow) calculation of the full service list.
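Requesting one of these response types comes down to adding one more parameter to the lookup URL. A minimal Python sketch, using the SFX base URL that appears elsewhere in these notes (the function name is mine, and the request is only built, not sent):

```python
from urllib.parse import urlencode

SFX_BASE = "http://sfx.ub.rug.nl:9003/sfx_local"

def sfx_lookup_url(kev_params, response_type="multi_obj_xml", base=SFX_BASE):
    """Append sfx.response_type to a list of (key, value) OpenURL parameters."""
    params = list(kev_params) + [("sfx.response_type", response_type)]
    return base + "?" + urlencode(params)

xml_url = sfx_lookup_url([("rft.issn", "1234-5678"), ("rft.volume", "8")])
quick_check_url = sfx_lookup_url([("rft.issn", "1234-5678")],
                                 response_type="service_exists")
```

The second URL would be the cheap pre-check; the first the full lookup that you would only do when the check suggests there is something to find.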


The XML data contains blocks with results for different services (full text, catalogue lookup, reference, citation file download for a citation app) that your application can choose to use or ignore as it wishes. All types of blocks can presumably appear multiple times, except probably negative messages like getMessageNoFullTxt. These blocks look something like:

<target>
   <target_name>MESSAGE_NO_FULLTXT</target_name>
   <target_public_name>No full text available for this journal/article <i>online</i></target_public_name>
   <target_service_id>111026921949001</target_service_id>
   <service_type>getMessageNoFullTxt</service_type>
   <parser></parser>
   <parse_param></parse_param>
   <proxy>no</proxy>
   <crossref>no</crossref>
   <note></note>
   <authentication></authentication>
   <char_set></char_set>
   <displayer></displayer>
   <target_url></target_url>
</target>

The most interesting information in each is probably target_public_name, target_url, and service_type. Values for the last include:

  • getFullTxt: full text, or best guess
  • getMessageNoFullTxt: message to note no full text sources were found
  • getHolding: catalogue lookup, of journal (or book by ISBN)
  • getTOC: table of contents, sometimes more
  • getAbstract: abstract
  • getReference: formatted citation
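A sketch of pulling those three fields out of such blocks in Python. The surrounding targets wrapper element in the sample is an assumption on my part (I am not reproducing the real response envelope here), and the sample is a trimmed version of the block shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <targets> wrapper is hypothetical.
SAMPLE = """<targets>
  <target>
    <target_name>MESSAGE_NO_FULLTXT</target_name>
    <target_public_name>No full text available for this journal/article <i>online</i></target_public_name>
    <service_type>getMessageNoFullTxt</service_type>
    <target_url></target_url>
  </target>
</targets>"""

def summarize_targets(xml_text):
    """Return service_type, name and url for every target block found."""
    root = ET.fromstring(xml_text)
    results = []
    for target in root.iter("target"):
        def text_of(tag):
            node = target.find(tag)
            # itertext() also collects text inside inline markup such as <i>.
            return "".join(node.itertext()).strip() if node is not None else ""
        results.append({
            "service_type": text_of("service_type"),
            "name": text_of("target_public_name"),
            "url": text_of("target_url"),
        })
    return results

def full_text_links(xml_text):
    """Keep only the URLs of getFullTxt targets."""
    return [t["url"] for t in summarize_targets(xml_text)
            if t["service_type"] == "getFullTxt" and t["url"]]

targets = summarize_targets(SAMPLE)
```

Filtering on service_type like this is what lets an application show full-text links inline and quietly drop, or restyle, the 'no full text' message.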

On POST and GET

While SFX's documentation mentions that the HTTP interface takes XML via POST requests (and the test forms that SFX itself serves use POST), GET works as well (also allowing it to be used from the POST-handicapped PHP).

Note that when you hand in XML, POST may be a better idea, since with GET you would have to put the URL-encoded XML into url_ctx_val.

KEV may generally be easier, since it avoids the need for POST, and is easier to encode.
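The difference between the two, sketched in Python without actually sending anything. The url_ctx_val key is the one mentioned above; the url_ctx_fmt value (info:ofi/fmt:xml:xsd:ctx) is the XML ContextObject format identifier as far as I know, and the assumption that the POST body is ordinary form-encoded data is mine. The resolver address and the ctx_xml placeholder are made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

def xml_request(base_url, ctx_xml, use_post=True):
    """Build (but do not send) a request carrying an XML ContextObject."""
    fields = urlencode([
        ("url_ver", "Z39.88-2004"),
        ("url_ctx_fmt", "info:ofi/fmt:xml:xsd:ctx"),  # assumed identifier
        ("url_ctx_val", ctx_xml),
    ])
    if use_post:
        # Form-encoded body; the URL itself stays short.
        return Request(
            base_url,
            data=fields.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    # GET: the whole URL-encoded XML ends up in the query string.
    return Request(base_url + "?" + fields)

base = "http://resolver.example.org/sfx_local"  # hypothetical resolver
post_request = xml_request(base, "<ctx/>")
get_request = xml_request(base, "<ctx/>", use_post=False)
```

With a realistic ContextObject the GET variant quickly produces very long URLs, which is the practical argument for POST when using XML.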

SFX URLs


A 'This is SFX' page:

http://sfx.ub.rug.nl:9003/


The sfx_local directory is the main interface instance (see SFX User Guide at the documentation center).

The base URL for lookups:

/sfx_local?aquery

or

/sfx_local/?aquery


It is also the base path for various other things: images, CGI, and such. The most interesting things there are probably the forms that are effectively the basic and the more complex OpenURL lookup forms:

http://sfx.ub.rug.nl:9003/sfx_local/cgi/core/citation-linker.cgi
http://sfx.ub.rug.nl:9003/sfx_local/cgi/core/openurl-generator.cgi


You can also use the XML interface (see SFX documentation for details) to hook into the journal subscription test, at:

http://sfx.ub.rug.nl:9003/sfx_local/cgi/core/journal_subscription.cgi

Note that this service is relatively slow. If you rely on this information to enrich record content, you may want to cache and possibly pre-fetch it.
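A minimal sketch of such a cache in Python. The class and its names are invented for illustration; the function that actually asks SFX is passed in, so nothing here assumes how the subscription test is called or what it returns.

```python
import time

class TTLCache:
    """Remember results of a slow lookup for a limited time."""

    def __init__(self, fetch, ttl_seconds=86400, clock=time.time):
        self._fetch = fetch      # the slow lookup, e.g. a subscription check
        self._ttl = ttl_seconds
        self._clock = clock      # injectable, which makes testing easy
        self._store = {}         # key -> (time stored, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value

    def prefetch(self, keys):
        """Warm the cache, e.g. for all ISSNs on a result page."""
        for key in keys:
            self.get(key)

calls = []

def fake_subscription_check(issn):
    # Stand-in for the real (slow) request.
    calls.append(issn)
    return True

cache = TTLCache(fake_subscription_check, ttl_seconds=3600)
cache.prefetch(["1234-5678", "2345-6789"])
subscribed = cache.get("1234-5678")  # served from the cache
```

Subscriptions change rarely, so a long time-to-live (a day, as in the default here) is probably fine; a persistent store instead of an in-memory dictionary would survive restarts.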