(Library) search software

Software doing, supporting, or related to searching.


For more articles related to library systems, see the Library related category. Some of the main articles:

Search engines, catalogues & ILS, supporting libraries

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Some do search, some do search federation, and some are 'integrated library systems' (ILS), meaning they also handle loaning and such.

  • BiblioteQ [1] - an ILS
  • Blacklight OPAC[2] (open source) - catalogue, built around Solr
  • DRIVER, DRIVER II [5]
  • Emilda[6] - an ILS
  • Evergreen[7] (open source) - an ILS
  • Ex Libris[8] offers (among others)
    • SFX[9]: An OpenURL resolver
    • Metalib[10]: a federated search aggregator
    • Aleph [11]
    • Voyager [12] - an ILS
    • Primo [13]
      • Primo Central [14]
  • Fedora Commons[16] (unrelated to the linux distro), Java-based
  • Fedora Learning Objects Repository Interface (Flori) [17]
  • Hyper Estraier[18] - full-text search (written in C)
  • JAFER[19] - base for Z39.50 clients and servers (written in Java)
  • Java Z3950 Toolkit[20] (Z39.50, Java)
  • Koha[21] (open source) - an ILS
  • LibLime[22] supports/delivers Koha, Evergreen
  • LibraryFind[23] (open source) - federated search(?)
  • mnoGoSearch[24] (GPL)
  • MasterKey[25] (open source) - search aggregating system. Can search Z39.50, SRU, PazPar2, a local Zebra index, and via HTML scraping. Harvests and presents it.
  • Meresco[26] is mostly an SRU(/SRW) interface around an OAI-PMH fetcher (also has a web crawler, OAI export of its data, RSS import/export, etc., but they seem of secondary importance)
  • Metaproxy [27] - Frontend/switchboard that makes it easier to search Z39.50, SRU, SRW, and Solr (via webservice). Does result merging, filtering, caching. Exposed as Z39.50, SRU, and SRW.
  • Net::Z3950::ZOOM[28] (Perl)
  • OAI toolkit[29] (Perl)
  • OCLC-Pica, an apparently now deprecated OPAC
  • OpenER[31] (EduCommons RSS → OAI)
  • OpenSiteSearch[32]
  • Pazpar2[33], a Z39.50-based federated search server (YAZ-based, but no SRU/SRW)
  • PhpMyBibli (PMB)[34]
  • Proai[37] (Java-based)
  • PyZ3950[40] - Z39.50
  • QtZ39.50[42] (uses PyZ3950)
  • VB ZOOM[47] (Z39.50)
  • WebFeat
  • Xapian[49], a C++ search engine. (Has wrappers, like xappy[50])
  • YAZ[51]: C(/C++) toolkit for Z39.50, and also SRU and SRW
  • Zebra[53], an indexer/search/retrieve server (OAI, with a Z39.50 interface)
  • Zedkit
  • ZMARCO[54] (Z39.50, MARC, OAI)
  • Zoom .NET[55] (Z39.50)



Unsorted

E-learning:

  • Sites like Moodle, Teletop, StudieWeb, Sharepoint
  • May use some specific formats (e.g. the Dutch CZP)


Interoperability policies often include

  • use of open standards
  • harvesting+indexing, not federated search
  • no supplier-specific / software-specific (proprietary) features


Notes on (non-)centralization of search

On effective limitations of remote search

Whenever you have a data source of which any part can be described as remote, there are limitations involved.

The most obvious case is that where you just consume the result of searches that happen remotely, for example in web search and aggregated / federated search, but you can see more distinctions if you wish:

  • access to the search system and its backing data (technical and legal access)
  • access to remote index (allowing custom search, but no control of the indexing process)
  • remote search, with some controllable features
  • remote search, with basic retrieval and some analysis on that
  • remote search, just presentation of results
  • no real means of result presentation, but the ability to link to a search page (sometimes used in federated search when databases are deemed important but have no search interface)

(These distinctions are fairly fuzzy, as the qualities lie in something of a continuum)


In many of these cases, you mostly just send a query to a system that does most of the work for you, including many of the decisions about which parts of a record to search, how to do so (in terms of synsets, fuzziness, and such), scoring, and sometimes up to and including search presentation.

As such, things happen without much control, without the ability to verify, and without knowledge of what is happening in the process. You have to trust that these searches will

  • return relevant results
  • have a useful relevance rating
  • not leave out relevant hits because of things like spelling variations
  • not return too many hits, for example because they search the entire book


Details like these are hard enough to manage in a single system, although you can reach an acceptable level of quality there.

Most of these problems are much stronger when searched systems have no relation to each other, such as when aggregating searches, because you usually have no choice but to assume that none of those problems will happen, and/or that behaviour won't be significantly different between sources (if they're regular enough, the results may be convincing enough).

Even when systems try to allow for things, such controls are far from uniform in presence, nature, or means of control. All the interfacing between systems means it may be hard enough to express/convey what you want well enough in the query.


It matters that scoring and relevancy are not defined consistently between systems, because as a result neither is merging results while maintaining relevance ordering. Systems that merge aggregated results (often on the theory that it's better than nothing) place blind trust in each source's relevance rating (and its scale). Unless the scoring methods are compatible (very hard to compare, harder to ensure), merging tends to work out little better than a riffle. In fact, interleaving results may be the best-defined method that covers the most cases, but it's hard to really argue for any strategy.
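
To make the difference concrete, here is a minimal sketch comparing a merge that trusts raw scores with a plain round-robin interleave. Everything in it is made up for illustration: the two sources, their document ids, their score scales (0..1 and 0..100), and the function names `merge_by_score` and `interleave` are assumptions, not taken from any of the systems listed above.

```python
from itertools import zip_longest


def merge_by_score(*result_lists):
    """Naive merge: trust each source's raw score and sort on it."""
    combined = [hit for results in result_lists for hit in results]
    return sorted(combined, key=lambda hit: hit[1], reverse=True)


def interleave(*result_lists):
    """Round-robin merge: rank 1 from every source, then rank 2, and so on."""
    missing = object()  # sentinel for sources that have run out of hits
    merged = []
    for tier in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(hit for hit in tier if hit is not missing)
    return merged


# Hypothetical sources, each already sorted by its own relevance score.
# Source A scores on a 0..1 scale, source B on a 0..100 scale.
source_a = [("a1", 0.95), ("a2", 0.80), ("a3", 0.10)]
source_b = [("b1", 72.0), ("b2", 35.0)]

by_score = [doc for doc, _ in merge_by_score(source_a, source_b)]
by_rank = [doc for doc, _ in interleave(source_a, source_b)]

print(by_score)  # everything from B sorts above A, purely because of scale
print(by_rank)   # per-source ranking is kept, sources alternate
```

The score-based merge puts a mediocre hit from B above the best hit from A only because the scales differ. Interleaving avoids that, at the cost of ignoring scores entirely, which is roughly the "little better than a riffle" situation described above.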

More on the problems in merging



Union systems and data warehousing

To do better you would want a union system, because you would get control over various aspects of behaviour and uniformity you may want to guarantee.

There have been moves to warehouse data for groups, with licensing pretty much as it was when the same data was federated, meaning the warehouse mediates licenses to members of a group (sometimes licensing to the whole group).

From a search process view, this means you have more data sources in raw form (sometimes, it seems, in indexed form for a specific search system; I can imagine the wish for businesses to tie customers into their products). It does not guarantee you will search it significantly better than the data providers will, but chances are merging and sorting are better if only because they're handled automatically.
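
As a rough sketch of why a union system makes merging easier: once records from several providers sit in one collection, a single scoring function is applied to all of them, so scores are comparable and one sort is enough. The records, the source labels, and the toy term-count scoring below are all assumptions for illustration; a real system would use a proper index and ranking function.

```python
from collections import Counter


def score(query, text):
    """Toy relevance: how often the query terms occur in the text."""
    words = Counter(text.lower().split())
    return sum(words[term] for term in query.lower().split())


def search_union(query, records):
    """Score every record with the same function, then sort once."""
    scored = [(score(query, text), source, rec_id)
              for source, rec_id, text in records]
    hits = [hit for hit in scored if hit[0] > 0]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical warehoused records: (source, record id, text).
records = [
    ("A", "a1", "library search systems and library catalogues"),
    ("A", "a2", "loan administration"),
    ("B", "b1", "federated search"),
    ("B", "b2", "search search search library"),
]

results = search_union("library search", records)
for hit_score, source, rec_id in results:
    print(hit_score, source, rec_id)
```

Whether the single scoring function is any good is a separate question, but at least the ordering across sources follows from one definition of relevance rather than from several incompatible ones.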