Usage of the OAI interface
OAI services overview
GEI-Digital enables data to be requested using the OAI-PMH protocol via a web interface.
The aim of OAI (Open Archives Initiative) is to define an open interface for the exchange of metadata. Communication over such an interface takes place between GEI-Digital as the data provider and a service provider that retrieves the data. Data acquisition takes place automatically through an ‘OAI harvester’.
The protocol used for this communication is OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). It allows large amounts of data to be kept continuously synchronised, which involves importing data from the current inventory into a separate database.
OAI-interface standard
Protocol: OAI-PMH Version 2.0
OAI-PMH Protocol
OAI-PMH is a web-based protocol. The OAI harvester sends simple requests via HTTP GET or HTTP POST and receives an HTTP response from the data provider. This response contains the requested metadata, embedded in an XML structure. One advantage of this approach is that requests can also be sent to an OAI repository from a web browser.
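The request and response cycle described above can be sketched in Python using only the standard library. This is a minimal illustration, not the way any particular harvester works: the helper names `fetch` and `parse_identify` are hypothetical, and `SAMPLE_RESPONSE` is a made-up, heavily shortened Identify reply rather than real output from GEI-Digital. The XML namespace is the one defined by OAI-PMH 2.0.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, heavily shortened Identify response (not real repository output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-07-08T12:00:00Z</responseDate>
  <request verb="Identify">https://gei-digital.gei.de/viewer/oai</request>
  <Identify>
    <repositoryName>Example repository</repositoryName>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def fetch(url):
    """Send an HTTP GET request and return the XML reply as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_identify(xml_text):
    """Return the child elements of <Identify> as a dict of tag name -> text."""
    root = ET.fromstring(xml_text)
    # Protocol-level problems are reported in an <error> element.
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError(f"OAI error {error.get('code')}: {error.text}")
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an Identify response")
    # Strip the "{namespace}" prefix that ElementTree adds to tag names.
    return {child.tag.split("}", 1)[-1]: child.text for child in identify}
```

Against the live interface, `parse_identify(fetch("https://gei-digital.gei.de/viewer/oai?verb=Identify"))` would be the corresponding call; the fields actually returned depend on the repository.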
OAI-Harvester
To use OAI to synchronise data between GEI-Digital and a service provider, the service provider must have implemented an OAI harvester (e.g. OAI-PMH Harvester Manager). The harvester runs repeatedly in a continuous loop. On each run it issues a ListRecords request that is restricted to the dataset (catalogue) relevant to that service provider. The request also carries the time of the previous call as a timestamp. This process ensures that:
- no changes are missed
- changes are sent to the service provider’s database promptly
- no irrelevant data is passed to the service provider.
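The harvesting loop described above might look roughly like the following sketch. All function names are hypothetical, and `fetch_records` and `store` stand in for whatever the service provider uses to send the request and to write to its own database. The sketch assumes timezone-aware datetimes and uses day granularity for the `from` argument, as in the example requests of this document. The start time of each run is recorded before the request is sent, so changes made during a run are picked up by the next one.

```python
import time
from datetime import datetime, timezone


def list_records_params(set_spec, last_harvest, metadata_prefix="marcxml"):
    """Build the arguments for a ListRecords request restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
    if last_harvest is not None:
        # Only ask for records created or changed since the previous call (UTC).
        params["from"] = last_harvest.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return params


def harvest_once(fetch_records, store, set_spec, last_harvest, now=None):
    """Run one harvest and return the timestamp to use for the next run."""
    started = now or datetime.now(timezone.utc)
    for record in fetch_records(list_records_params(set_spec, last_harvest)):
        store(record)
    return started


def run_forever(fetch_records, store, set_spec, interval_seconds=3600):
    """Repeat the harvest indefinitely, carrying the timestamp forward."""
    last_harvest = None  # first run: no 'from' argument, so everything is harvested
    while True:
        last_harvest = harvest_once(fetch_records, store, set_spec, last_harvest)
        time.sleep(interval_seconds)
```

Because the `from` date is truncated to a whole day, records changed on the day of the previous run are delivered again; the service provider's `store` step therefore needs to tolerate duplicates.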
OAI-Functions
The OAI-PMH protocol defines six basic functions (verbs), which are appended to the base URL (https://gei-digital.gei.de/viewer/oai) with "?verb=".
- Identify: displays general information about the OAI repository
- ListSets: lists all datasets (catalogues) available in the OAI repository
- ListMetadataFormats: lists all data formats available in the OAI repository
- ListIdentifiers: returns only the headers (identifier, timestamp, set membership) of the records matching a timeframe and/or dataset, without the metadata itself
- GetRecord: retrieves an individual data record by its ID. This is only possible if the identification number (PPN, URN) of the required record is known.
- ListRecords: harvests records selected by timeframe (from/until) and/or dataset (set).
  - This is the core command of OAI-PMH. It enables selective harvesting: the harvester can limit its request to records from a particular catalogue that were created or changed within a certain timeframe.
  - Times are given in Coordinated Universal Time (UTC).
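As a rough illustration of how a verb and its arguments are attached to the base URL, the following sketch builds request URLs and converts timestamps to UTC. The helper names `utc_date` and `build_request_url` are hypothetical. The set of verbs is taken from the OAI-PMH 2.0 specification, and day granularity (YYYY-MM-DD) is assumed for dates. Because `from` is a reserved word in Python, the sketch accepts it as `from_`.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "https://gei-digital.gei.de/viewer/oai"

# The six verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListSets", "ListMetadataFormats",
         "GetRecord", "ListIdentifiers", "ListRecords"}


def utc_date(moment):
    """Convert a timezone-aware datetime to a UTC date string (day granularity)."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d")


def build_request_url(verb, **arguments):
    """Attach the verb and its arguments to the base URL as a query string."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for name, value in arguments.items():
        if value is None:
            continue  # skip arguments that were not supplied
        if isinstance(value, datetime):
            value = utc_date(value)
        query[name.rstrip("_")] = value  # 'from_' becomes 'from'
    return BASE_URL + "?" + urlencode(query)
```

For example, a local time of 00:30 on 1 April 2015 in a UTC+2 zone, `datetime(2015, 4, 1, 0, 30, tzinfo=timezone(timedelta(hours=2)))`, falls on 31 March in UTC, so `utc_date` returns `"2015-03-31"` for it.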
Access requirements
No registration or authorisation is necessary to access the GEI-Digital OAI interface.
Examples of OAI requests
- Repository information: https://gei-digital.gei.de/viewer/oai?verb=Identify
- Request for the ID number PPN65627140X (PPN) in the format MARC21/XML: https://gei-digital.gei.de/viewer/oai?verb=GetRecord&metadataPrefix=marcxml&identifier=PPN65627140X
- Request for the ID number PPN65627140X (PPN) in the format METS: https://gei-digital.gei.de/viewer/oai?verb=GetRecord&metadataPrefix=mets&identifier=PPN65627140X
  - The URLs in the section mets:fileGrp USE="ABBYYXML" link to the OCR full texts of the digitised material (page by page). They must be copied into a new browser window.
- Request for all titles in the collection 'Geschichtsbücher Kaiserreich' in the timeframe 01.04.2015–08.07.2015 in the format MARC21/XML: https://gei-digital.gei.de/viewer/oai?verb=ListRecords&from=2015-04-01&until=2015-07-08&metadataPrefix=marcxml&set=kaiserreichgeschichtsschulbuecher
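Instead of copying the OCR links from a METS response by hand, they could be collected with a few lines of Python. The following is a minimal sketch: the function name `ocr_urls` is hypothetical, and `SAMPLE_METS` is an invented fragment with placeholder example.org URLs, not real GEI-Digital data. It assumes the usual METS layout, in which each mets:file inside a mets:fileGrp holds a mets:FLocat element with an xlink:href attribute.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented METS fragment with placeholder URLs, for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="IMG_0001">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/image/00000001.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ABBYYXML">
      <mets:file ID="OCR_0001">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/ocr/00000001.xml"/>
      </mets:file>
      <mets:file ID="OCR_0002">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/ocr/00000002.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def ocr_urls(root, use="ABBYYXML"):
    """Return the xlink:href of every file in the fileGrp with the given USE."""
    # './/' also finds the fileGrp when the METS document is nested inside
    # an OAI-PMH GetRecord response.
    path = f".//mets:fileGrp[@USE='{use}']/mets:file/mets:FLocat"
    return [locat.get(XLINK_HREF) for locat in root.iterfind(path, NS)]
```

Calling `ocr_urls(ET.fromstring(SAMPLE_METS))` on the fragment above yields the two OCR URLs in page order and skips the image file group.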