Datasets
Two datasets are available to download for use in your research.
Catalogue snapshot
This dataset is a daily snapshot of the catalogue that describes our museum and library collections. Downloads are gzip-compressed, line-delimited JSON files: each line holds one resource, in the same serialisation format as the Catalogue API.
Description | Size | Download |
---|---|---|
Works | 1.5 GB | works.json.gz |
Images | 32 MB | images.json.gz |
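The works file is large, so it is usually better to stream it one line at a time than to load it all into memory. Below is a minimal Python sketch. It assumes the file has already been downloaded locally and uses only the standard `gzip` and `json` modules. `iter_records` is a hypothetical helper name, not part of any published tooling.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per line of a gzip-compressed JSON Lines file.

    Hypothetical helper: assumes the snapshot has been downloaded to `path`.
    """
    # "rt" opens the gzip stream in text mode, so lines are decoded lazily
    # and the whole multi-gigabyte file is never held in memory at once.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

For example, `next(iter_records("works.json.gz"))` should return the first work as a Python dictionary, and the same helper should also work for `images.json.gz`. The keys inside each record follow the Catalogue API serialisation, so check the API documentation before you rely on any particular field.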
London MOH reports
This dataset brings together around 5,800 Medical Officer of Health (MOH) reports from the Greater London area. It covers the present-day City of London, the 32 London boroughs and the predecessor local authorities of these boroughs, including urban and rural district councils and sanitary districts. The full text of the reports is included, along with around 275,000 tables that have been extracted as individual files. The extracted tables have undergone extensive quality assurance checks, but given the volume of data, we cannot guarantee that they are 100% accurate. The data is licensed under a Creative Commons Attribution 4.0 International Licence.
Description | Size | Download |
---|---|---|
Full text corpus (raw text) | 215 MB | Fulltext.zip |
All report tables as CSV | 340 MB | All_report_tables.csv.zip |
All report tables as HTML | 412 MB | All_report_tables.html.zip |
All report tables as XML | 536 MB | All_report_tables.xml.zip |
All report tables as TXT | 422 MB | All_report_tables.txt.zip |
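The table archives can be read without unpacking them to disk. Below is a minimal Python sketch for the CSV archive. It rests on two assumptions that are not documented here: each extracted table is stored as a separate `.csv` file inside `All_report_tables.csv.zip`, and the files are UTF-8 encoded. `iter_csv_tables` is a hypothetical helper name.

```python
import csv
import io
import zipfile
from typing import Iterator, List, Tuple


def iter_csv_tables(zip_path: str) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (member name, rows) for every .csv file inside a zip archive.

    Hypothetical helper: assumes one table per .csv member (layout not documented).
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and any non-CSV files in the archive.
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Wrap the binary member so csv.reader receives text.
                # Assumed encoding: UTF-8; "utf-8-sig" also drops a leading BOM,
                # and errors="replace" substitutes undecodable bytes instead of failing.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", errors="replace", newline="")
                yield name, list(csv.reader(text))
```

Undecodable bytes are replaced rather than raising an error, so one badly encoded file does not stop a full pass over roughly 275,000 tables. If the archive turns out to use a different encoding or folder layout, adjust the `encoding` argument and the file-name filter.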