
Datasets

Two datasets are available to download for use in your research.

Catalogue snapshot

This dataset provides a daily snapshot of the catalogue that describes our museum and library collections. Downloads are line-delimited JSON, with each line providing one resource in the same serialisation format as the Catalogue API.

Description          Size      Download
All works as JSON    1.3 GB    works.json.gz
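
Because the snapshot is gzipped, line-delimited JSON, it can be read one record at a time without decompressing or loading the whole file first. The following is a minimal sketch in Python, assuming the file has been downloaded locally as works.json.gz and is UTF-8 encoded; the function names iter_works and count_works are illustrative and not part of any official tooling.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_works(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a gzipped JSON Lines file."""
    # Text mode ("rt") decompresses and decodes incrementally, so memory use
    # stays small even for a multi-gigabyte snapshot.
    # Assumption: the file is UTF-8 encoded.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_works(path: str) -> int:
    """Count records without holding the whole file in memory."""
    return sum(1 for _ in iter_works(path))
```

Calling count_works("works.json.gz") would stream through the file once. The fields available on each record follow the Catalogue API serialisation, so it is worth inspecting the keys of the first record before relying on any particular field name.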

London MOH reports

This dataset brings together around 5,800 Medical Officer of Health (MOH) reports from the Greater London area. It covers the present-day City of London, the 32 London boroughs and the predecessor local authorities for these boroughs, including urban and rural district councils and sanitary districts. The full text of the reports is included, along with around 275,000 tables that have been extracted as individual files. The extracted tables have undergone extensive quality assurance checks but, given the volume of data, we cannot promise 100% accuracy. The data is licensed under a Creative Commons Attribution 4.0 International Licence.

Description                     Size      Download
Full text corpus (raw text)     215 MB    Fulltext.zip
All report tables as CSV        340 MB    All_report_tables.csv.zip
All report tables as HTML       412 MB    All_report_tables.html.zip
All report tables as XML        536 MB    All_report_tables.xml.zip
All report tables as TXT        422 MB    All_report_tables.txt.zip
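
Since the extracted tables are distributed as individual files inside a zip archive, they can be read directly from the archive without unpacking hundreds of thousands of files to disk. The sketch below is one possible approach in Python, under two assumptions that are not confirmed by this page: that the members of All_report_tables.csv.zip have names ending in .csv, and that they are UTF-8 encoded. The function name iter_csv_tables is illustrative.

```python
import csv
import io
import zipfile
from typing import Iterator, List, Tuple


def iter_csv_tables(
    zip_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (member name, rows) for each CSV file stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Assumption: table files carry a .csv extension; skip anything else.
            if not name.lower().endswith(".csv"):
                continue
            # Wrap the binary member stream so csv.reader receives text.
            # Assumption: the default encoding matches the files; pass another
            # value via the `encoding` parameter if decoding fails.
            with io.TextIOWrapper(
                archive.open(name), encoding=encoding, newline=""
            ) as text:
                yield name, list(csv.reader(text))
```

Each table is small enough to hold as a list of rows, while the generator keeps only one table in memory at a time. Given that the tables are not guaranteed to be fully accurate, it may be sensible to validate row lengths or numeric cells before analysis.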