Identifiers, classifiers, and other codes

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)


Call numbers and more



A call number is basically the code on the spine of a book.

  • It is a code assigned to a book to show its location in a library's shelving system.
  • It is usually based mostly on the classification used in the library, because call numbers are closely tied to the book/room classification system and are also used to reshelve returned books. It works somewhat like an address within that particular library (...or set of libraries, when libraries join, grow, have per-department libraries in universities, or such).
  • Call numbers often consist of
    • a categorization code (e.g. Dewey, LCC, or something more local), combined with
    • something that makes the number unique. Cutter numbers are fairly common.



The numbering/identifying in a library may be more complex than just call numbers. Consider that:

  • A call number usually refers to a location, rather than to an exemplar (a specific physical copy).
    • A library may have multiple copies of something (e.g. for popular things),
    • and it may be present in various locations (e.g. in multiple department libraries).
    • As such, there may be multiple call numbers for a single title, and multiple exemplars for a single call number.
  • A loaning system mostly cares about keeping track of specific exemplars (which is more specific than a call number), while a search system may also wish to reason at the level of work or expression, grouping different publications, releases, edits, and (arguably) translations of the same work.
    • It can for example be nice to see book hits grouped as "8 versions in 3 languages published between 1960 and 1976", or hits from the same journal grouped (although you need nontrivial and correct metadata to do better than just guess at this).


  • A specific item may be grouped in one of various ways (which may be reflected in the call number),
    • particularly if that group shouldn't be separated: consider the parts of an encyclopedia, all issues of a journal from, say, one year, and such.
    • Such groups may also be considered items in a looser sense, particularly in catalogs / search systems.
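
The distinction between titles, call numbers, and exemplars can be sketched as a small data model. This is only an illustration: the class and field names (Work, Holding, Exemplar, barcode) and the second call number are made up here and do not come from any particular library system.

```python
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    """One physical copy; this is what a loan system tracks."""
    barcode: str          # hypothetical per-copy identifier
    on_loan: bool = False


@dataclass
class Holding:
    """A shelf location (call number) and the copies stored there."""
    call_number: str
    location: str
    exemplars: list = field(default_factory=list)


@dataclass
class Work:
    """A title, as a search system might group it."""
    title: str
    holdings: list = field(default_factory=list)

    def call_numbers(self):
        # One title can have several call numbers (one per location).
        return [h.call_number for h in self.holdings]

    def available(self):
        # One call number can cover several exemplars.
        return [e for h in self.holdings for e in h.exemplars if not e.on_loan]


book = Work("Example title", [
    Holding("QH316.5 .B56", "main library",
            [Exemplar("c1"), Exemplar("c2", on_loan=True)]),
    Holding("BIO 12 .B56", "department library",   # invented local call number
            [Exemplar("c3")]),
])
```

A loan system would work on Exemplar objects, a shelving list on Holding, and a search result page on Work.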



Subject headings; classification

Classification and subject headings are both about grouping, finding, and similarity.


From a distance, subject headings seem the same thing as classification (and in an abstract sense are classifiers),

...but in the library world they are different, mostly through not-so-subtle differences in their goals, and the rules we follow because of those varying goals.


Many classification systems apply just one class to an item (partly to keep call numbers useful), while many subject headings can apply to a single item, so that there are multiple access points for it in a catalog.

This was useful even in library-card systems, but arguably more so in digital systems (where it doesn't cause a problem through an explosive number of cards).

Subject headings also do not need to be strictly mutually exclusive in meaning (whereas that is generally quite preferable in classification systems).


Of course, digital approaches have changed and in particular blurred the difference, and have e.g. made it less important for people to understand how a (classical) catalog works.

Cutter numbers, Cutter Classification



Cutter numbers/codes (part of the wider Cutter classification) are usually mentioned in the context of call numbers, and there usually mean "short alphanumeric (letter-number) codes used for anything other than classification"(verify).


They are most often used to code author names (in a soundex-like way(verify)), and can also be used for subjects, places, titles, indication of things like translations, and more.

They are often simple transforms. While reshelvers may understand them intuitively in everyday use, creating them may require lookup tables, as classification rules often do.

For example, look at the details of LoC's use of Cutter numbers. Say, in the LC call number QH316.5 .B56:

  • QH316.5 is an LCC, the classification for Biology - General
  • B56 is a Cutter number (apparently for the author)


Cutter Expansive Classification is a classification system that, at the time, was both simpler and more complete than various other systems.

It looks like a simpler, more general form of LCC, which is mostly because LCC's design was inspired by Cutter's(verify).

There are also translation tables that map between Cutter and Dewey, sometimes at the level of the Dewey tens and sometimes down to specific Dewey codes (e.g. 730 is WC, 812 is YD).



Library of Congress something somethings

The most interesting ones are:

  • LCCN, Library of Congress Control Number
  • Library of Congress call numbers (no abbreviation? Though I've seen LC call number)
  • LCC, Library of Congress Classification
  • LCSH, Library of Congress Subject Headings


LCCN


LCCN, Library of Congress Control Number, is used to identify:

  • bibliographic records
  • authority records (authority records are those that show the established or preferred form of names, titles, regions, terms, subjects, etc.)
  • classification records (verify)



The identifier mainly consists of a prefix (if applicable), year digits, and a serial number assigned within that year.

The format changed in 2001.


Up to 2000 (now known as Structure A) the codes could include:

  • Alphabetic Prefix
    • up to three characters (in records: often three reserved positions, with the prefix left-justified and padded with blanks, or all blanks if there is no prefix. Blanks may be shown as spaces or as #)
  • Year (two digits. If you want to extract the real year from these digits, see e.g. [1])
  • Serial Number (six-digit)
  • Supplement Number - a single character. Was never used.
  • Suffix / Alphabetic Identifier (optional, variable length, and apparently now deleted)
  • Revision Date (optional, variable length)


Since Jan 2001 (Structure B) codes are only:

  • Alphabetic Prefix (now two characters)
  • Year (now four-digit)
  • Serial Number (six-digit, as in A)
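
As a rough sketch of telling the two structures apart, the following splits an LCCN into prefix, year, and serial. It assumes the input is already in a compact form (lowercase prefix letters followed by only digits, with no spaces, hyphens, or suffixes), and it ignores the Structure A supplement, suffix, and revision parts. The names Lccn and split_lccn are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class Lccn(NamedTuple):
    prefix: str      # "" when there is no prefix
    year: str        # two digits (Structure A) or four digits (Structure B)
    serial: str      # six digits
    structure: str   # "A" or "B"


# Optional prefix letters, then 8 digits (2+6) or 10 digits (4+6).
_COMPACT = re.compile(r"([a-z]{0,3})(\d{2}|\d{4})(\d{6})")


def split_lccn(compact: str) -> Optional[Lccn]:
    """Split a compact LCCN; return None if it does not fit either shape."""
    m = _COMPACT.fullmatch(compact)
    if m is None:
        return None
    prefix, year, serial = m.groups()
    return Lccn(prefix, year, serial, "A" if len(year) == 2 else "B")
```

Because the serial is always six digits, the total digit count alone decides whether the year part is two or four digits long.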



Records, screen formatting and canonicalization

Records may apply specific rules about encoding LCCNs, such as reserving fixed-length space for the prefix and filling any of those not used by prefix characters with spaces.


Screen-formatted LCCNs may

  • add (or leave) a space between prefix and number
  • add a hyphen between year and serial
  • abbreviate by stripping zeroes from the left of the serial number.

The following are all structurally valid enough LCCNs:

nb 71-005810/AC/r86
79310919//r86
94-14580/AC/r95
nb71-5810
85-2

The most canonical form seems to be prefix, two- or four-digit year, and unabbreviated serial (no zeroes stripped), with no hyphens or spaces, and with all suffix information stripped (the suffix doesn't affect uniqueness anyway; it is just extra information that is also fairly rarely used). For the above examples:

nb71005810
79310919
94014580
nb71005810
85000002

See also http://www.loc.gov/marc/lccn-namespace.html

This is probably the handiest form to pass around - this unabbreviated-number form is what various services expect (and dashing and abbreviating should not be used in records).

Not all LCCN-related services are clever enough to (re-)canonicalize the input they get, so they may fail to match anything you have not already put in this form yourself.
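
A minimal sketch of such canonicalization, covering only the screen-formatting variations listed above (spaces, a hyphen before an abbreviated serial, and slash-separated suffixes). I believe this matches the normalization steps on the LoC page linked above, but treat that as an assumption. The function name normalize_lccn is made up here.

```python
def normalize_lccn(raw: str) -> str:
    """Return the compact form: prefix, year, six-digit serial."""
    s = "".join(raw.split())       # remove all whitespace
    s = s.split("/", 1)[0]         # drop suffix / revision information
    if "-" in s:
        left, right = s.split("-", 1)
        if not right.isdigit() or len(right) > 6:
            raise ValueError(f"unexpected serial part in {raw!r}")
        s = left + right.zfill(6)  # restore stripped leading zeroes
    return s


for example in ["nb 71-005810/AC/r86", "79310919//r86",
                "94-14580/AC/r95", "nb71-5810", "85-2"]:
    print(example, "->", normalize_lccn(example))
```

Input without a hyphen is assumed to already carry the full six-digit serial, since there is no reliable way to tell where an abbreviated serial would start.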



Notes on prefixes


The alphabetic prefix consists of one, two, or three characters (though three-character prefixes only appear in pre-2001 LCCNs) and indicates a specific series of LCCNs and/or the type of record.


For example, the gm in gm71002450 ([2]) indicates a series of maps ('...Cataloged by LC, 1968-1972'), while 71002450 does not exist ([3]). LCCNs 68001993, sa68001993, j68001993, and he68001993 point to four different records (sa: Southeast Asian 1961-, j: Japanese 1949-, he: Hebrew 1964-).


Some prefixes indicate name or subject authorities, a few are classifications. Pre-2001, most prefixes are for items/bibliographic records. Post-2001, these same prefixes are probably not used anymore, so you now see prefixes mainly on non-bibliographic records(verify). A lack of prefix usually points to item/bibliographic entries (but also holdings, community records). (verify)


Prefixes include: (verify)

  • Authority record prefixes
    • n: name/subject authorities (Library of Congress)
    • no: name/subject authorities (OCLC)
    • nr: name/subject authorities (RLIN)
    • nb: name/subject authorities (British Library)
    • sh: subject headings (Library of Congress)
    • sp: proposed subject (moved to sh if/when approved)
    • sj: "Juvenile subject authority keyed by LC and distributed in the LC Annotated Children's Cataloging Program"
  • Classification record prefixes
    • cf: classification record (Library of Congress)
    • ct: table record (Library of Congress)
  • Holdings and Community Information Record Prefixes



It seems that searches can be finicky/fragile in the face of prefixes - even LoC's own. For example, in my tests:

  • A Z39.50 search for lccn="unk81005124" yields record unk81005124 ([4]) - works
  • A Z39.50 search for lccn="81005124" yields record 81005124 ([5]) - works
  • Attempt to find he68001993 ([6])
    • A Z39.50 search for lccn="he68001993" gave no hits
    • A search for lccn="68001993" returned three records, those identified by sa68001993 ([7]), j68001993 ([8]), and he68001993 ([9])
      • ...and while the prefix-less 68001993 exists ([10]), it doesn't appear in this search

While there is probably a technical explanation for this, it seems to mean you need some logic, and possibly two searches, to find what you want.
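
That two-step logic might look roughly like the sketch below. The search argument stands in for whatever Z39.50 (or other) client you use. It is a hypothetical callable that takes an LCCN string and returns records as dicts with an "lccn" key. The fake_search function only mimics the behaviour observed above so that the example runs without a server.

```python
import re
from typing import Callable, Iterable, Optional


def find_by_lccn(lccn: str,
                 search: Callable[[str], Iterable[dict]]) -> Optional[dict]:
    """Look up one record, retrying without the prefix if needed."""
    def exact(hits):
        # Keep only the record whose LCCN matches what was asked for.
        return next((r for r in hits if r.get("lccn") == lccn), None)

    found = exact(search(lccn))
    if found is not None:
        return found
    digits = re.sub(r"^[a-z]+", "", lccn)   # strip the alphabetic prefix
    if digits != lccn:
        return exact(search(digits))        # second search, filtered locally
    return None


def fake_search(query: str):
    # Stand-in data that imitates the test results described above.
    results = {
        "unk81005124": [{"lccn": "unk81005124"}],
        "68001993": [{"lccn": "sa68001993"}, {"lccn": "j68001993"},
                     {"lccn": "he68001993"}],
    }
    return results.get(query, [])


print(find_by_lccn("he68001993", fake_search))
```

Filtering the hits locally matters, because the prefix-less search returns records from several prefix series at once.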




LC call numbers

A Library of Congress call number (no abbreviation beyond "LC call number"(verify)) is a call number for use in the actual library.

It uses LoC classification (LCC), combined with some sort of item-narrowing addition, usually a cutter number, and sometimes a date, sometimes a copy number.

Because these are on the spines of books, they are often formatted vertically, e.g. [11]

BF
1078
.S5
1978
c.1

There may be no single rule for writing this on one line, but separating the parts with spaces seems common, in this case BF1078 .S5 1978 c.1
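
A small sketch of that conversion from the vertical spine layout to a single line. It assumes the first spine line holds only the class letters and the second the class number, which are then written together. That is a guess based on the example above rather than a formal rule. The name spine_to_line is made up.

```python
def spine_to_line(spine: str) -> str:
    """Join a vertically formatted call number into one line."""
    parts = [p.strip() for p in spine.splitlines() if p.strip()]
    if len(parts) >= 2 and parts[0].isalpha() and parts[1][0].isdigit():
        # Class letters and class number are written together: BF + 1078
        parts = [parts[0] + parts[1]] + parts[2:]
    return " ".join(parts)


print(spine_to_line("BF\n1078\n.S5\n1978\nc.1"))
```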


For example, QE534.2.B64 (see also [12], where the example was taken from)

  • QE534.2 is the LCC category, apparently Earthquakes, Seismology - General Works - 1970 to Present. For details, see the notes on LCC below.
  • A following letter-number combination, if present, is a listed journal or a Cutter number (verify)




LCC (LoC Classification)

LCC, Library of Congress Classification, is a detailed and layered system that narrows ranges into more specific subjects.

More details for the QE534.2 example above:

  • Q is science
  • QB through QE are physical sciences
  • QE is Geology
  • the QE1-QE996.5 range is Geology
  • the QE500-QE639.5 range is Dynamic and Structural Geology
  • the QE521-QE545 range is Volcanoes and Earthquakes
  • the QE531-QE545 range is Earthquakes, Seismology
  • Apparently QE534.2 lies in a range for Earthquakes, Seismology - General Works - 1970 to Present
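
The narrowing can be illustrated with a lookup over nested ranges. The table below contains only the QE ranges listed above, so it is a toy. A real LCC schedule is far larger, and the name qe_path is made up for this sketch.

```python
import re

# (low, high, label), copied from the QE example; not a complete schedule.
QE_RANGES = [
    (1.0, 996.5, "Geology"),
    (500.0, 639.5, "Dynamic and Structural Geology"),
    (521.0, 545.0, "Volcanoes and Earthquakes"),
    (531.0, 545.0, "Earthquakes, Seismology"),
]


def qe_path(call_number: str) -> list:
    """Return the labels of all listed QE ranges containing the class number."""
    m = re.match(r"QE\s*(\d+(?:\.\d+)?)", call_number)
    if m is None:
        return []          # not a QE call number
    number = float(m.group(1))
    return [label for low, high, label in QE_RANGES if low <= number <= high]


print(qe_path("QE534.2.B64"))
```

The result is ordered from widest to narrowest range only because the table itself is listed in that order.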



Note that there is an extension to basic LCC by NLM, mostly in QS, QT, QU, QV, QW, QX, QY, QZ, and WA through WZ.

LCSH

LCSH, Library of Congress Subject Headings, is a large, fairly wide-coverage, and structured set of subject headings: a controlled keyword set applied to items in a catalog, which is often also searchable.

It is in relatively wide use, and has influenced other controlled keyword sets.





Semi-sorted

ISBN

ISBNs are assigned to books, but note that paper and electronic forms of the same book may be assigned separate ISBNs.


Hyphenating

A 10-digit ISBN is made of:

  • The Group Identifier (correlated to countries)
  • The Publisher Identifier / Registrant (ranges registered per group identifier)
  • The Title Identifier (that which is leftover, minus the check digit)
  • The Check Digit.

It is however nontrivial to correctly hyphenate an ISBN: the group identifier, publisher identifier, and title identifier all vary in length depending on the preceding parts, and on knowledge external to the number rather than on calculation alone.

See e.g. the references below.


A 13-digit ISBN is an EAN. Those starting with 978 are ISBN10s placed in a single pseudo-country (978, 'Bookland'). By implication, the check digit is calculated in a different way.


EANs are not hyphenated(verify), but ISBN13s typically are, by adding a hyphen after the three-digit country code and following the ISBN10 rules for the rest.

...at least in theory; I've seen books hyphenate the two differently.



ISBN13

As mentioned above, thirteen-digit ISBNs are EAN codes.

The EAN form has been used for a while in barcodes.

Since 2007, ISBNs have been required to be in ISBN13 form.

An EAN containing an ISBN is occasionally referred to as an ISBN13, while ten-digit ISBNs are occasionally retro-named ISBN10s for contrast.


EAN ISBNs are said to be in 'Bookland' (978), as EANs must contain a country reference (GS1 prefix). This means that an ISBN13 in the 978 block is simply the ISBN10 with 978 prepended and with its check digit (which is also in the last position in EANs) recalculated using the EAN method instead of the ISBN method.

This also implies that ISBN13s starting with 978 also have an equivalent ISBN10 form that you can easily convert between - add/chop off the three bookland digits, recalculating the check digit in the proper way.
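
A sketch of that conversion in both directions. The check digit rules used are the usual ones: ISBN10 weights the first nine digits 10 down to 2 and works modulo 11 (with X standing for 10), while EAN-13 weights the first twelve digits alternately 1 and 3 and works modulo 10. The function names are made up for this example, and input check digits are not validated.

```python
def isbn10_check(first9: str) -> str:
    """Check digit for the first nine digits of an ISBN10."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean13_check(first12: str) -> str:
    """Check digit for the first twelve digits of an EAN-13 / ISBN13."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def _strip(isbn: str) -> str:
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_to_isbn13(isbn10: str) -> str:
    digits = _strip(isbn10)
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    body = "978" + digits[:9]          # drop old check digit, add Bookland
    return body + ean13_check(body)


def isbn13_to_isbn10(isbn13: str) -> str:
    digits = _strip(isbn13)
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN13s have an ISBN10 form")
    body = digits[3:12]                # drop Bookland and old check digit
    return body + isbn10_check(body)


print(isbn10_to_isbn13("0-306-40615-2"))
```

Both functions return unhyphenated strings, since correct hyphenation needs the external range data mentioned earlier.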

The move to EAN is also an expansion: the 979 block has also been allocated to ISBNs (ISBN13s in that block have no ISBN10 equivalent), and others may follow (though 979 will probably be enough for a while).


See also

Introduction:

References:

Data for hyphenation:

Further details and links:



There are a few potentially interesting sources to look up ISBNs for shopping, reviews, information and more:

Reviews:

Shop search:

Information and 2.0ness:

Other suggestions:


  • Ottobib: formats citations in MLA, APA, etc.




ISSN



Internationally standardized as ISO 3297.


Used for periodicals, whether print or electronic, and note that any periodical may exist in either or both forms.

If both exist, they often have separate ISSNs for the version in print and the electronic version, and 'eISSN' (and sometimes 'ESSN') is sometimes used to refer to the electronic version.

In these cases, the pair of ISSNs should perhaps be seen as equivalent in many practical contexts.


Abbreviations

Abbreviations of serial/journal names vary a bit in practice.

That is to say, different institutions abbreviate in different ways, mainly in the specific abbreviations chosen for words/phrases and, when being picky, also in the use of punctuation. Different methods/sources of abbreviations are trusted to different degrees in different areas.

The standard way of abbreviating seems to be defined in ISO 4.



Unsorted

Note that this mixes general, wider-use, union-system, and fairly specific stuff, and it is a list that can never be complete. I just wanted to give myself a start at an idea of the kind of identifiers out there.

TODO: start separating these into their own sections

Name, abbreviation | Identifies/enumerates what? | Local or generic? | Further notes | Interesting links
ARK (Archival Resource Key) | 'Objects' | | Persistent identifiers, URN-like in setup | [13] [14]
arXiv identifier | Articles (verify) | mostly system-local | (narrower use) | [15]
Astrophysics Bibcode | Articles | mostly system-local | Used in the Astrophysics Data System and some others. | [16]
BICI, Book Item and Contribution Identifier | Monographs | | (In development) apparently an ISBN-compatible, SICI-style system for books | [17]
Bliss Classification | classification/subjects | ? | | [18]
Canadiana number (narrower use) | Records/items | system-local | Used by the National Library of Canada |
CODEN | Serials | | Six-character alphanumeric code. Not used that much. | [19]
CNRI Handle | 'Objects' | | DOI is one system based on Handle, there are others. | [20]
DAI, Digital Author Identification | identities | | Also known as Digitale Auteur Identifier, and as NTA number (Nationaal Thesaurus voor Auteurs) (verify) |
Dewey | classification/subjects | fairly wide-spread | Known as Dewey Decimal System, Dewey Decimal Classification (DDC), and more. A three-digit, tiered classification system | [21]
Document Object Identifier, DOI | 'Objects', often articles | theoretically general-purpose | Persistent object identifier system, usable (resolvable) on the internet. In the context of libraries, DOIs usually refer specifically to articles. | DOI
EAN | products/objects | widespread | In the library context, EAN usually refers to Bookland (978) EANs, which encapsulate ISBNs | [22]
ERIC number (narrower use) | Records/items | Narrow; to/in system | |
ISAN | 'audiovisual works' | narrow? | 'International Standard Audiovisual Number' | [23]
ISBN | Monographs | | Note that thirteen-digit ISBNs (required of publishers since January 2007) are actually part of EAN-13 barcodes [24], which absorbed ISBNs into Bookland (978 prefix). (Within the 978 prefix there is a 1:1 mapping between ISBN10 and ISBN13) | [25]
ISM (ISM Library Information Services, formerly Utlas) | | | |
ISMN (International Standard Music Number) | Printed music | narrow? | | [26]
ISNI, International Standard Name Identifier | Identities | | Under development | [27]
ISRC (International Standard Recording Code) | Sound recordings, music video recordings | | | [28]
ISSN (International Standard Serial Number) | serials | | Note: an ISSN that is known to refer to an electronic form (particularly amongst alternatives) is sometimes called an eISSN (sometimes ESSN) | [29]
ISWC (International Standard Musical Work Code) | 'musical works' | narrow | Works, not recordings. Mostly used by copyright collectives? | [30]
JACS (Joint Academic Classification of Subjects) | | academic | | [31] [32]
LCC: Library of Congress Classification | classification/subjects | Various other libraries use it | (see also below) | [33]
LCCN: Library of Congress Control Number | bibliographic (most), also (name?) authority records, classification(?) | in LoC, but also beyond | (see also below) Interesting as a wider identifier, mostly simply because LCCNs are a large set | [34]
LCSH (Library of Congress Subject Headings) | subjects | | (see also below) | [35]
MeSH: Medical Subject Headings | classification/subjects | | A detailed chemical/medical subject system, from NLM | [36]
NAL number (National Agricultural Library) | | Mostly local | |
NBC, BCL, Basisclassificatie | classification | mostly used in Dutch university libraries, KB.nl | | [37]
NBN (National Bibliography Number) | Records/items | a few systems (mostly in Scandinavia?) (narrower use) | A URN-like item identifier | [38] [39]
NLM: 'National Library of Medicine' | | mostly local | Can also refer to the NLM classification system, which is an extension to LCC. | [40]
OAI identifiers | | mostly local | URNs that unambiguously identify a resource within a repository, and typically contain/imply the maintaining/source organization for a record (so can be used in a globally unique way), roughly in the form of oai:domainname:itslocalidentifier | [41]
OCLC number (OCLC control number) | records/items | In WorldCat, and beyond | An item identification code used in OCLC WorldCat to refer to books, articles, journals, CDs, video, computer files, and more. A pretty large set (union of many libraries), so useful as a switchboard sort of identifier (like LCCN) | [42]
PMID, PubMed ID | Articles | System-local | A record identifier used in PubMed and some derived sources [43]. Works as a PURL-type service through http://www.ncbi.nlm.nih.gov/pubmed/pmid | [44]
PPN (Pica Production Number) | Articles, books, identities, subjects (more?) | System-local, a few union catalogs (narrower use) | An identifier for (OCLC) Pica catalogues, used to refer to books as well as serials, people, and subjects (more?). In some areas it has been adopted to be unique throughout specific union catalogs (e.g. in the Netherlands, Germany), but not between them (as such union catalogues assign numbers separately from each other) | [45]
SICI | Serials | ? | (NISO Z39.56): Serial identifier that is variable-length, contains information on whether the serial is electronic/paper/microformat, and allows identification of derivatives. | [46]
SISO | classification | ? | Dutch equivalent for DDC (Dewey) (derivative or equivalent? (verify)) | [47]
ResearcherId | Identities | | | [48]
SuDocs, SUperintendent of DOCumentS classification system | classification/subjects | | Tries to group government documents by authors/organization/agency/department. | [49] [50] [51]
UDC (Universal Decimal Classification) | classification/subjects | | A system that uses symbols to combine and relate concepts from Dewey. | [52]