Retrieving BibTeX Entries for ISBNs

#!/bin/bash

Overview

One of the primary purposes of this site is to act as a place to collect sources of information. BibTeX has been adopted as a fairly ubiquitous and lightweight format for storing such sources, but some of the more sophisticated BibTeX functionality is built into tools that carry other weight I’d rather avoid.

One such feature is the ability to retrieve a usable BibTeX entry for a book. The relevant information can be retrieved from a number of sources in which the ISBN acts as a usable identifier for specific material. While I don’t have a lot of domain knowledge, two seemingly good sources of such information are the Google Books API(1) and more direct operations on MARC records. Converting a selected record from a MARC file to BibTeX may be the safer route, as it is more closely aligned with canonical sources such as the Library of Congress. In the short term, though, I’ll retrieve the information through Google Books, as that approach is likely to be far closer to what will be used for other types of sources.


set -euo pipefail

Utilities

Some commonly used functionality isn’t provided by bash and so needs to be defined or sourced.

die

Output the provided message on stderr and then exit with an error status. For now the status is hardcoded, and callers shouldn’t read anything into it beyond “not successful”.


my::die() {
    echo "${@}"
    exit 1
} >&2
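
A quick way to check this behavior without ending the current shell is to call the function in a subshell. The sketch below repeats the definition so that it stands alone; the message text and the status and message variable names are purely illustrative.

```bash
#!/bin/bash
# Same definition as above, repeated so the sketch is self-contained.
my::die() {
    echo "${@}"
    exit 1
} >&2

# The inner subshell absorbs the exit; 2>&1 lets the command substitution
# capture what the function wrote to stderr.
message=$( (my::die 'something went wrong') 2>&1 ) && status=0 || status=$?

echo "status=${status} message=${message}"
# prints: status=1 message=something went wrong
```

The redirection after the closing brace applies to the whole function body, which is why the plain echo ends up on stderr.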

Dependencies

Initially this will be written using bash, curl, and jq(2). It will be revisited over time as I tinker with technologies and as patterns emerge across source types.

Defining variables for the dependencies provides a means of validating that they are available and also allows the caller to override what is used.


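# Prefer a value supplied by the caller, otherwise look the tool up on the PATH.
# command -v is a shell builtin, so this doesn't depend on which being installed.
declare -r CURL="${CURL:-$(command -v curl)}"
[[ -x "${CURL}" ]] || my::die 'curl is not found or cannot be executed, aborting!'

declare -r JQ="${JQ:-$(command -v jq)}"
[[ -x "${JQ}" ]] || my::die 'jq is not found or cannot be executed, aborting!'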
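
If the list of dependencies grows, the lookup and the check could be folded into one helper. This is only a sketch: my::resolve is a hypothetical name that isn't part of the script, and it assumes command -v is an acceptable way to find the default.

```bash
#!/bin/bash
# Hypothetical helper: print the path for a command, preferring a
# caller-supplied override (second argument) over the PATH lookup.
my::resolve() {
    local -r name="${1}" override="${2:-}"
    local path
    path="${override:-$(command -v "${name}" || true)}"
    # Fail unless something was found and it is executable.
    [[ -n "${path}" && -x "${path}" ]] || return 1
    echo "${path}"
}

# Example: look up sh on the PATH and report if it is missing.
my::resolve sh || echo 'sh is not available' >&2
```

With either form, a caller can swap in a different binary for a single run by setting the variable in the environment, for example JQ=/path/to/other/jq ahead of the script invocation.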

Volume with ISBN

The call to Google Books will make use of the volume search behavior provided by the API(3), passing in the single provided parameter as the isbn keyword.

Since this only accesses public information, it currently doesn’t need to provide any form of credentials.

my::volume_with_isbn() {
    # Quote the command so an overridden path containing spaces still works.
    "${CURL}" -s "https://www.googleapis.com/books/v1/volumes?q=isbn:${1}"
}
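
Because the argument is interpolated straight into the URL, it may be worth tidying it first. The following is a sketch rather than part of the script: my::normalize_isbn is a hypothetical helper, and it assumes the query is best given a bare 10- or 13-character ISBN without hyphens or spaces, which is not verified here against the API documentation. It checks the shape only, not the check digit.

```bash
#!/bin/bash
# Hypothetical helper: strip hyphens and spaces, then accept only
# ISBN-10 (nine digits plus a digit or X) or ISBN-13 (thirteen digits).
my::normalize_isbn() {
    local -r isbn="${1//[- ]/}"
    local -r pattern='^([0-9]{9}[0-9Xx]|[0-9]{13})$'
    [[ "${isbn}" =~ ${pattern} ]] || return 1
    echo "${isbn}"
}

my::normalize_isbn '978-0-13-110362-7'
# prints: 9780131103627
```

Rejecting anything else up front also keeps stray characters out of the query string.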

Enforce Single Result

The logic expects a single matching result so here we fail if that’s not the case and unwrap the result if it is.

This is expected to be called as a stage in a pipeline. It stores the contents of stdin in a variable so that they can be operated on multiple times.


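my::enforce_single_result() {
    # Buffer stdin once; it is read twice below.
    local -r in=$(</dev/stdin)
    # Use ${JQ} rather than a bare jq so a caller's override is honored.
    local -r count=$(echo "${in}" | "${JQ}" '.totalItems')
    [[ "${count}" == '1' ]] || my::die "Received ${count} results, cannot continue!"
    echo "${in}" | "${JQ}" '.items[0].volumeInfo'
}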
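
The buffer-then-reuse pattern can be tried without jq or a network call. The sketch below is a stand-in, not the real check: my::enforce_single_line is a hypothetical function that counts lines instead of inspecting totalItems, and it returns a status rather than exiting so that it is easy to experiment with.

```bash
#!/bin/bash
# Hypothetical stand-in: succeed only when stdin holds exactly one line.
my::enforce_single_line() {
    local input count
    input=$(</dev/stdin)                        # read the stream once
    count=$(grep -c '' <<<"${input}" || true)   # first use: count the lines
    [[ -n "${input}" && "${count}" == '1' ]] || return 1
    echo "${input}"                             # second use: pass it along
}

echo 'only one' | my::enforce_single_line
# prints: only one
```

A stream can only be consumed once, so without the variable the second read would see nothing.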

To BibTeX

The API call will return a JSON formatted document which then needs to be converted to BibTeX format. The interpolation functionality provided by jq can handle that.

The key will use the title transformed by lowercasing letters and replacing whitespace with hyphens. This is done inline for simplicity though it would be slightly more expressive to split this out into a jq or bash function.

The raw publishedDate field will be returned rather than parsed, as it can apparently be returned in different precisions and therefore needs logic beyond what is readily available. Supporting this is potentially the point at which at least this transformation should be replaced by other technologies. I’ll likely look at splitting the transformation from the rendering to explore the options.


my::to_bibtex() {
    # BibTeX separates multiple authors with " and " rather than commas.
    # ${JQ} is used instead of a bare jq so a caller's override is honored.
    "${JQ}" -r '"
@book{\(.title | ascii_downcase | gsub("\\s+";"-")),
  title         = {\(.title)},
  author        = {\(.authors | join(" and "))},
  publisher     = {\(.publisher)},
  publishedDate = {\(.publishedDate)},
  isbn13        = {\(.industryIdentifiers[] | select(.type == "ISBN_13") | .identifier)},
  isbn10        = {\(.industryIdentifiers[] | select(.type == "ISBN_10") | .identifier)},
  url           = {\(.canonicalVolumeLink)},
}
"'
}
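
If the date does get parsed at some point, the splitting itself could stay in bash. The sketch below assumes the field only ever arrives as YYYY, YYYY-MM, or YYYY-MM-DD, which is an assumption and not something confirmed above. my::date_fields is a hypothetical helper that isn't wired into the pipeline; it emits the conventional BibTeX year and month fields.

```bash
#!/bin/bash
# Hypothetical helper: turn a date of varying precision into BibTeX fields.
my::date_fields() {
    local -r date="${1}"
    local -r pattern='^([0-9]{4})(-([0-9]{2}))?(-([0-9]{2}))?$'
    [[ "${date}" =~ ${pattern} ]] || return 1
    echo "  year          = {${BASH_REMATCH[1]}},"
    if [[ -n "${BASH_REMATCH[3]:-}" ]]; then
        # 10# forces base 10 so that "08" and "09" aren't read as octal.
        echo "  month         = {$((10#${BASH_REMATCH[3]}))},"
    fi
}

my::date_fields '2004-05-17'
# prints:
#   year          = {2004},
#   month         = {5},
```

Anything that doesn't match the assumed shapes makes the function return a non-zero status, so a caller could fall back to emitting the raw value.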

Main

The final pipeline consists of retrieving the volume information, verifying and unwrapping the single result, and then transforming it to BibTeX format.

main acts as the entrypoint so it will be called with the required single parameter.


my::main() {
    my::volume_with_isbn "${1}" | my::enforce_single_result | my::to_bibtex
}
my::main "${1:?Must provide the ISBN number as an argument!}"

1. Overview | Google Books APIs | Google Developers [online]. 22 May 2022. Available from: https://developers.google.com/books/docs/overview
2. jq Manual (development version) [online]. 4 June 2022. Available from: https://stedolan.github.io/jq/manual/
3. Using the API | Google Books APIs | Google Developers [online]. 23 May 2022. Available from: https://developers.google.com/books/docs/v1/using