3. Connecting the APIs together
View on GitHub | Run in Google Colab
So far, we've only looked at the /works
API, but Wellcome Collection has a few more which we can make use of. As well as /works
, we can also use /images
and /concepts
.
In this notebook, we'll look at how we can use these APIs together to get more complete picture of the data in the catalogue.
base_url = "https://api.wellcomecollection.org/catalogue/v2/"
We've already seen what the works API can do - let's fetch a work and have a look at the images and concepts which are linked to it.
import requests
work = requests.get(
base_url + "works/zfhdzwm2",
params={"include": "subjects,images"},
).json()
list(work)
3.2 Concepts
We can scan through the list of subjects on the work and see which concepts they're composed of.
for subject in work["subjects"]:
print("Subject:", subject["label"])
print("Concepts:")
for concept in subject['concepts']:
print("-", concept['label'])
print()
unique_concepts = set()
for subject in work["subjects"]:
for concept in subject['concepts']:
unique_concepts.add(concept['label'])
unique_concepts
Each of these concepts has a unique identifier, which we can use to look up the concept in the /concepts
API.
concept_ids = [
concept['id']
for subject in work["subjects"]
for concept in subject['concepts']
]
first_concept_id = concept_ids[0]
response = requests.get(
base_url + "concepts/" + first_concept_id
).json()
response
This tells us what Wellcome Collection knows about that concept, and where it appears in other controlled vocabularies. We now know that Materia medica
has the ID k6zqasmn
in Wellcome Collection's APIs, and is known by sh85082055
in the Library of Congress Subject Headings (LCSH) scheme. Some concepts will also include alternative names (alternativeLabels
) and equivalent concepts (sameAs
), which can be useful for searching.
3.3 Images
Now, let's have a look at the images on the work.
work['images']
Again, each one has an ID which corresponds to a document in the /images
API. Let's fetch one of these images and see what we can find out about it.
response = requests.get(
base_url + "images/" + work['images'][0]['id']
).json()
response
We're looking at data about the image in the context of Wellcome Collection here - the title of the work it's from (source.title
), the rights statements associated with it (locations[0].license.label
), its average colour (averageColor
) and aspect ratio (aspectRatio
).
Let's look at the average colour of the images which are associated with this work.
for image in work['images']:
response = requests.get(
base_url + "images/" + image['id']
).json()
print(image['id'], response['averageColor'])
3.4 Fetching actual images
In addition to the first-class APIs for /works
, /images
and /concepts
, the Wellcome Collection site use a few auxiliary APIs for different purposes.
For example, the /images
API returns a list of image metadata, but not the actual images themselves. To get the images, we need to use the IIIF (that's International Image Interoperability Framework) API. The IIIF specification is a standardised way of fetching images from a server, which is used by many cultural institutions.
Let's use one of our images from the last section to see how this works.
image_id = work['images'][0]['id']
response = requests.get(base_url + "images/" + image_id).json()
response
As well as the metadata we saw in the last section, we can also see a URL which will lead us to the image itself (thumbnail.url
).
iiif_url = response["thumbnail"]["url"]
iiif_url
response = requests.get(iiif_url).json()
response
Again, this gives us some more metadata about the image, but not the image itself! This time, the metadata is about the specific digital image (eg. the size of the image, the format, etc.) rather than the work that the image is from.
We can augment our IIIF URL using a structured set of parameters (documented here) to get the image in the format we want.
The following line assembles a URL which requests:
- the full image (
full
), rather than a specific region - 640 pixels wide, and at the corresponding height which preserves its aspect ratio (
640,
) - without rotation (
0
) - in colour (
default
), rather than greyscale, bitonal, etc. - in
.jpg
format (jpg
)
thumbnail_url = iiif_url.replace("info.json", "full/640,/0/default.jpg")
response = requests.get(thumbnail_url)
We can use a couple of Python libraries to display the image in our notebook.
from PIL import Image
from io import BytesIO
image = Image.open(BytesIO(response.content))
image
image.size
3.5 Visually similar images
The images API also allows us to specify some extra parameters. One of them return images which are visually similar to the one we've just fetched.
Let's use our image from the last section as an example.
image_id = work['images'][0]['id']
response = requests.get(
base_url + "images/" + image_id,
params={"include": "visuallySimilar"},
).json()
Each of the results in the response's visuallySimilar
field is another image, with the same structure as our source image. We can use the same IIIF API to fetch the images themselves.
for image in response['visuallySimilar']:
thumbnail_url = image['thumbnail']['url'].replace(
"info.json", "full/640,/0/default.jpg"
)
thumbnail_response = requests.get(thumbnail_url).content
image = Image.open(BytesIO(thumbnail_response))
display(image)
3.6 Getting IIIF images for digitised works
We can use a similar approach to fetch images for digitised works (eg individual pages of a fully digitised book). Works which have been digitised will all have an items
field, which contains a URL for a IIIF presentation of the work.
We can filter the works API for works which have a workType
of a
(aka "Books") and items.locations.locationType
of iiif-presentation
.
response = requests.get(
base_url + "works",
params={
"query": "woodblock",
"workType": "a",
"items.locations.locationType": "iiif-presentation",
"include": "items",
},
).json()
response['totalResults']
digitised_work = response['results'][0]
digitised_work['id']
list(digitised_work)
Let's get the IIIF presentation for the digitised work, and have a look at the IIIF response.
for item in digitised_work["items"]:
for location in item["locations"]:
if location["locationType"]["id"] == "iiif-presentation":
presentation_url = location["url"]
break
presentation_url
presentation_response = requests.get(presentation_url).json()
We want the canvases
from this response, which contain the images for each page.
canvases = presentation_response["sequences"][0]["canvases"]
len(canvases)
Each canvas contains an image resource, which we can use to get the IIIF image for that page, as we did in the last section.
iiif_image_urls = [
canvas['images'][0]['resource']['@id']
for canvas in canvases
]
iiif_image_urls[:5]
Let's display the first few images
for iiif_image_url in iiif_image_urls[:5]:
image_bytes = requests.get(iiif_image_url).content
image = Image.open(BytesIO(image_bytes))
display(image)
Exercises
- Display the next 5 pages of our digitised work.
- Have a look at the developers documentation and figure out how to filter an image search by colour. See if you can find some pink elephants (hint:
#b23f72
is the hex code for a nice bright pink). - Find an image's visually similar images, and then find the visually similar images for all of those images.
- Find a concept which includes some
alternativeLabels
. See whether you can find any works which have been tagged with those alternative labels. - Find another work which has a
workType
ofa
(aka "Books") anditems.locations.locationType
ofiiif-presentation
. Fetch the IIIF presentation for the digitised work, and explore its images.