Pre-Digitisation Curation Case Studies

Overview

The case studies on this page give examples of some of the steps of pre-digitisation curation. They link closely to the Pre-Digitisation Curation checklist.

Get in touch with us on our GitHub if you would like to contribute a case study to this page.

Pre-Digitisation Curation Checklist

Setting Natural Science Data Free: Scoping UK Collections (DiSSCoUK)

Summary of work
As a first step towards improving natural science digitisation, we sought to gain insight into the breadth and depth of UK natural science collections, and the extent to which these collections have been digitised. The initial challenge of this scoping exercise was identifying all natural science collections in the UK. Using regional museum development groups, existing contact lists, and online searches, we collated a list of over 150 institutions with public natural science collections, consisting of museums, herbaria, university collections, and research societies. While not every natural science collection was accounted for, and some did not respond to our request for collections data, we received survey responses from 84 institutions.

Inventory of Collections
The scoping exercise was based on the SYNTHESYS+ survey to maintain standardisation across similar DiSSCo-led projects. The key difference in our national survey was the range of participating organisations, which differed in their capacity to complete the survey. Large institutions with dedicated natural science curators and a digitisation team have greater capacity to provide a detailed summary of their collections than small institutions with no dedicated digitisation team or scientific expertise. To obtain as much detail as possible without deterring participation from smaller collections, we graded the survey to allow different levels of granularity. All institutions were required to complete the collection overview, which asked for specimen count and digitisation level estimates for 9 key natural science disciplines (Anthropology, Botany, Extraterrestrial Objects, Geology, Microorganisms, Palaeontology, Zoology Invertebrates, Zoology Vertebrates, and Other Geo/Biodiversity). Where possible, we also asked institutions to provide a finer level of detail for their collections: there were options to provide specimen quantity and digitisation level estimates broken down by taxonomic group (45 taxonomic groups listed), preservation type (57 preservation types listed), and stratigraphy. This finer detail is particularly useful for identifying where to focus resources and which training materials to create.

Estimation of your collections
The accuracy of an estimate of the number of specimens within a collection varies considerably across institutions, depending on the size of the collection, staff experience and expertise, and the digital infrastructure available to the institution. For instance, some institutions surveyed had no natural history curator, no online database, and described their collection estimates as ‘best guesses’. To account for estimate uncertainty, all institutions were asked to provide a confidence interval for every estimate. This was recorded as a percentage either side of the estimate within which the true number of specimens is expected to lie. For example, a 10% confidence interval for a 1000 specimen estimate indicates that the true number of specimens lies between 900 and 1100.
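As a minimal sketch of the arithmetic in the example above, the percentage can be converted into lower and upper bounds as follows. The function name `estimate_bounds` is hypothetical and is not part of the survey itself.

```python
def estimate_bounds(estimate: int, confidence_pct: float) -> tuple[int, int]:
    """Return (lower, upper) specimen counts for an estimate with a +/- percentage."""
    margin = estimate * confidence_pct / 100
    return round(estimate - margin), round(estimate + margin)


# The example from the text: a 1000 specimen estimate with a 10% interval.
print(estimate_bounds(1000, 10))  # (900, 1100)
```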

Conclusion
The scoping survey produced our most up-to-date and accurate understanding of what UK collections hold. It revealed that most organisations lack support in digitising their natural science collections and are unable to mobilise their data to be utilised by the scientific community. The results of the survey have been used to create a blueprint for a national digitisation programme, to improve national digitisation and unlock the full scientific potential of UK natural science collections.

Detailed inventory of the collections for DiSSCo Flanders

This case study can be found in this journal article

Estimation of the numbers of the African and Belgian Herbarium Collection at Meise Botanic Garden

In 2015, Meise Botanic Garden received a grant from the Flemish Government to digitise all the central African (Congo DR, Rwanda and Burundi) and Belgian herbarium specimens within 3 years. The first step in this mass digitisation project, called DOE! (Digitally unlocking the heritage collection), was a 10% count of the whole African vascular plant collection, which is kept as a separate subcollection. In the African herbarium of BR, brown folders are used to mark the specimens collected in Congo DR, Rwanda and Burundi. All specimens collected in other African countries are stored in green folders. We wanted to know the percentage of central African collections and to see whether it was worthwhile to digitise only these specimens or to digitise the whole African herbarium at once. For every row of cupboards in the herbarium, the first cabinet was completely counted. Specimens kept in green folders and in brown folders were counted separately, and the number of type specimens for each folder colour was noted as well. This was necessary because all the types had been digitised in previous projects and it was decided not to digitise them again.

When we extrapolated the numbers to the whole African collection, we arrived at around 900 000 sheets, 100 000 fewer than presumed before the count. We also found that 57% of all the specimens were collected in central Africa and 43% were non-central African material (note that in BR the African herbarium only holds specimens collected south of the Sahara). Based on these results, we decided to digitise the whole African collection, because extracting only the central African specimens would have cost us too much time compared with digitising them all. The number of digitised specimens at the end of the project was very close to the estimated 900 000.

As the Belgian herbarium is kept separately and was almost completely barcoded with a numbering system that allowed us to know how many holdings we have, we didn’t have to conduct a 10% count.
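The extrapolation from the 10% count of the African herbarium can be sketched as below. The per-colour sample counts are hypothetical, chosen only so that the result matches the reported figures (around 900 000 sheets, 57% central African), and the sketch assumes the counted cabinets are representative of the whole collection.

```python
def extrapolate(sample_counts: dict[str, int], sampled_fraction: float) -> dict[str, float]:
    """Scale a partial count up to the whole collection and report each group's share."""
    sample_total = sum(sample_counts.values())
    result: dict[str, float] = {"total": round(sample_total / sampled_fraction)}
    for group, count in sample_counts.items():
        result[f"{group}_share"] = count / sample_total
    return result


# Hypothetical counts from the sampled cabinets, split by folder colour.
sample = {"brown": 51_300, "green": 38_700}
estimate = extrapolate(sample, sampled_fraction=0.10)
print(estimate["total"])                  # estimated sheets in the whole collection
print(f"{estimate['brown_share']:.0%}")   # share of central African material
```

Type specimens, which were counted separately because they did not need to be digitised again, could be handled the same way by adding them as further groups.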

Estimation of collection size at the NHM London

It is quite hard to accurately estimate the size of a collection. The success of estimation depends on many factors, including:

  • Is there previous experience digitising part of that collection or a collection that is similar?
  • Age of the collection
  • Origin of the collection
  • Type of the collection

For example, in the NHM, we have had several projects digitising entomological slide collections, so we have good estimates of how many slides fit into a full drawer. Knowing this, before each new slide digitisation project we audit the collections: we go in, look at the drawers and estimate by eye how full they are. Using that information and the existing data, we can then make quite accurate estimates of the size of the collection.

It is more difficult if we have no pre-existing experience working with the collection, and basing the estimate on a similar collection does not always work.

Estimating the size of the herbarium for digitisation purposes is a more complex task. We can start building the estimate from how many cabinets we have, how many sheets fit into a cabinet and how full the cabinets are. But this disregards factors that affect the estimate, such as (a) the bulkiness of the specimens and (b) multi-specimen sheets.

Multi-specimen sheets are quite tricky because, without looking at the actual specimens, we can’t determine how many specimens are on a single sheet: it can be one, two, three or even twenty. If there are many of these sheets in the collection and we estimate the numbers based on the number of sheets, we will underestimate the size of the collection (and therefore the time taken to digitise it).

The first mass digitisation project in the NHM herbarium was digitising the Brassicales order. The actual size of the collection was twice the original estimate. An accurate estimate requires a good knowledge of the collection. Certain factors that can help us in the estimation process come from understanding the history of the collection. It is useful to know when the collection was acquired, where it was collected and in what era. If we have a huge collection from relatively recent times, e.g. the 1980s, we can safely estimate from the number of sheets, as the multi-specimen sheet practice was not in use at this time. Other information that can help includes the collector (e.g. are their collections often mounted together with someone else’s specimens?), the region collected, and whether paper was in short supply or expensive.

Knowing the collection, its history and origins can help us estimate the size better. But it is also good practice to allow a margin of around 20% for a project larger than 40-50,000 estimated records.
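The cabinet-based estimate and the roughly 20% margin described above can be sketched as follows. All input values are hypothetical, and `specimens_per_sheet` is an assumed adjustment for multi-specimen sheets that would have to come from knowledge of the collection.

```python
def estimate_records(cabinets: int, sheets_per_full_cabinet: int,
                     mean_fullness: float, specimens_per_sheet: float = 1.0) -> int:
    """Estimate records from cabinet counts, fullness and an assumed specimens-per-sheet ratio."""
    sheets = cabinets * sheets_per_full_cabinet * mean_fullness
    return round(sheets * specimens_per_sheet)


def planning_range(estimate: int, margin: float = 0.20) -> tuple[int, int]:
    """Return the (low, high) range to plan for, given a relative margin."""
    return round(estimate * (1 - margin)), round(estimate * (1 + margin))


# Hypothetical herbarium: 100 cabinets, 1000 sheets per full cabinet, 80% full.
estimate = estimate_records(100, 1000, 0.8)
print(estimate, planning_range(estimate))
```

Raising `specimens_per_sheet` above 1 shows how quickly multi-specimen sheets inflate the number of records relative to a sheet-based count.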

Decontamination of parts of the Herbarium at Meise Botanic Garden

The vascular plant collection of the herbarium BR at Meise Botanic Garden was treated in the past with mercury. The AWH collection, incorporated in the BR collection in 2006, is poisoned with nitrobenzene.

For the second mass digitisation project, DOE!2, in which these collections were going to be digitised, a risk analysis on the use of these chemicals was added to the tender to make sure that the external company was aware of the risks and could take the necessary precautions.

Before we outsourced the digitisation of the AWH collection, we removed the jars of nitrobenzene from the metal boxes that contained the specimens. A protocol was written for this as well:

Removing jars of nitrobenzene and airing the Van Heurck collection (AWH)

  • Install ventilation and make it operational
  • Safety Precautions
    • Full face mask with filter A2B2P3
    • Yellow disposable pack (pesticide)
    • Disposable gloves (polyvinyl alcohol)
  • Supplies
    • Safety clothing (see point two)
    • Jar with double closure
    • Container for chemical waste
    • Mobile scaffold
    • Free workbench
  • Working method
    • Requires a minimum of 2 persons who can pass along boxes that are at a higher height.
    • Take box by box off the rack. Use mobile scaffolding for boxes at a higher height.
    • Open boxes and place the lid under the box (for faster ventilation).
    • Remove the jar of nitrobenzene from the box and place it in the double-closing jar.
    • Replace boxes in the same order. Slide the bottom box all the way to the back and place the box on top slightly slanted and offset, so that both boxes can air sufficiently.
    • Dispose of the closed jar containing the jars of nitrobenzene in the chemical waste container.
    • The safety officer will take care of the disposal of this container of chemical waste.

The following text was added to the tender:
The following measures should be applied when working with herbarium specimens inside and outside the collection areas:

  • Wear a lab coat and gloves (polyvinyl alcohol)
  • Wash your hands after working with herbarium specimens
  • Do not eat or drink in the collection
  • Keep the doors of the collection areas closed
  • Walk away from the cabinet doors after opening and wait at least 1 minute before starting to work in the cabinet.

Pregnant women and women who wish to become pregnant are advised NOT to enter the collection areas and to avoid contact with herbarium specimens. Breastfeeding women are also NOT allowed to enter the collection areas and must avoid contact with the herbarium specimens.

Measures to work in the collection:
Herbarium material is susceptible to an attack by pests, especially various species of beetles and silverfish. Today, pest damage is prevented by regular freezing.

In order to keep the risk of ‘contamination’ (= damage by pests) as small as possible, a number of measures should be taken with regard to the collection areas.

In the collection areas (storage and working spaces) it is prohibited to:

  • Eat or drink (only a bottle of water with a ‘drinking cap’ is allowed)
  • Bring food
  • Bring in objects or persons without the prior consent of the collection manager or their replacement
  • Open windows without consulting the collection manager or their replacement
  • Leave herbarium specimens unprotected (put them back in the herbarium cabinets or in closed boxes as soon as possible)
  • Leave room doors, cupboard doors and boxes open unnecessarily
  • Move herbarium material between the different collection rooms.

Exposure risk by room | Exposure risk by operation: Low (non-contaminated material) | Exposure risk by operation: High (material contaminated / in possible contact with contaminated material) | Exposure risk by operation: Very High (heavily contaminated or suspected heavy contamination)
Low (no use of chemicals in the room): Offices | Mounting new incoming material | Re-mounting old material, imaging, intercalation, collection consultation, transcription | All actions
High (use of chemicals in the room): Herbarium rooms | Mounting new incoming material | Re-mounting old material, imaging, intercalation, collection consultation, transcription | All actions

Table Key: Wear a lab coat and gloves:

  • Blue: Not necessary
  • Grey: Recommended
  • Orange: Obliged

Staff List for Mass Digitisation Project DOE! at Meise Botanic Garden

  • Project manager (0.8 Full-Time Equivalent (FTE))
  • IT specialists (hardware, software, storage) (1.5 FTE)
  • Collection manager (daily management and follow up of the restoration/preparation) (0.5 FTE)
  • Collection technicians (restoration, preparation, in-house imaging, in-house transcription, quality control of externally transcribed label data, pest management) (8 FTE)
  • Database manager (for daily management and QC) (0.5 FTE)
  • QC manager images (for automated and visual checks) (0.6 FTE)
  • Data publisher (publishing images and data to different portals, maintenance) (0.5 FTE)
  • IT Developer (external)

Quality Control Procedure of Meise Botanic Garden for the Mass Digitisation Project DOE!

To determine the extent to which label transcription meets quality requirements the following will be examined:

  • The method that will be used for quality control
  • The common mistakes, to which an error weight is assigned, ranging from 0.1 to 0.5 penalty points (error calculation)
  • The measuring standards that reflect the acceptance levels.

Method
The quality will be measured using a sub-sample of the data file. The sub-sample size depends on the size of the data file. The sub-sample size is determined using the table under point 3.

Types of Errors
Two types of errors are distinguished:

  • Identification errors occur when:
    • Data is entered into the wrong field or incorrect data is entered in a field
    • Data has not been entered despite it being present on the label.
  • Transcription errors occur when data have not been correctly transcribed from the label (typos).

Error Calculation method
An overview of the penalty calculation for each field is given in the tables below. The calculation is based on the retrievability of the collections.

The Locality field has been further divided to distinguish a number of errors.

Certain input errors will be dealt with using the following rules:

  • When the incorrect date is selected, the fields COLL_DT_DY, COLL_DT_MN and COLL_DT_YR will be counted as a single error with a maximum penalty of 0.5 points.
  • When data has been entered in the wrong column, resulting in another field being left blank, this will be counted as a single error with a maximum penalty of 0.5 points.
  • A maximum of 1.0 penalty points applies per herbarium sheet. When this maximum penalty has been reached, checking will cease for that herbarium sheet and the following sheet will be checked.
  • When the Comments field has been justifiably completed (more likely with handwritten labels than with typed labels), we will not award penalty points for said error(s).

Filing name on herbarium covers

Field | Transcription Error | Identification Error
FILING_NAME | 1 | 1

Minimal and additional label information

Field | Transcription Error | Identification Error: Wrong data/field | Identification Error: Not entered data
BARCODE | N/A | N/A | N/A
COLL_ID | 0.5 | 0.5 | 0.5
COLLECTOR | 0.5 | 0.5 | 0.5
COLL_NUM | 0.5 | 0.3 | 0.5
COUNTRY_AS_GIVEN | 0.5 | 0.5 | 0.5
COUNTRY_CODE | 0.5 | 0.5 | 0.5
PHYTOREGION | 0.5 | 0.5 | 0.5
COLL_DT_DY | 0.3 | 0.3 | 0.3
COLL_DT_MN | 0.3 | 0.3 | 0.3
COLL_DT_YR | 0.5 | 0.5 | 0.5
DATE_AS_GIVEN | 0.3 | 0.3 | 0.3
LOCALITY | 0.1-0.3 | 0.3 | 0.1-0.3
ALTITUDE | 0.3 | 0.3 | 0.3
ALTITUDE_UNIT | 0.3 | 0.3 | 0.3
LAT_DEG | 0.3 | 0.3 | 0.3
LAT_MIN | 0.3 | 0.3 | 0.3
LAT_SEC | 0.3 | 0.3 | 0.3
LAT_DIR | 0.3 | 0.3 | 0.3
LONG_DEG | 0.3 | 0.3 | 0.3
LONG_MIN | 0.3 | 0.3 | 0.3
LONG_SEC | 0.3 | 0.3 | 0.3
LONG_DIR | 0.3 | 0.3 | 0.3
COORDINATES_AS_GIVEN | 0.5 | 0.5 | 0.5
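The penalty rules can be illustrated with a short sketch that scores one herbarium sheet. The weights are copied from a few rows of the table above; the function and the error-type labels are hypothetical. The sketch assumes that the "single error with a maximum penalty of 0.5" rule for the date fields means their combined penalty is capped at 0.5, which is one possible reading of the rule.

```python
# Weights per field: (transcription, wrong data/field, data not entered).
WEIGHTS = {
    "COLLECTOR": (0.5, 0.5, 0.5),
    "COLL_NUM": (0.5, 0.3, 0.5),
    "COLL_DT_DY": (0.3, 0.3, 0.3),
    "COLL_DT_MN": (0.3, 0.3, 0.3),
    "COLL_DT_YR": (0.5, 0.5, 0.5),
    "ALTITUDE": (0.3, 0.3, 0.3),
}
ERROR_TYPES = {"transcription": 0, "wrong_field": 1, "not_entered": 2}
DATE_FIELDS = {"COLL_DT_DY", "COLL_DT_MN", "COLL_DT_YR"}


def sheet_penalty(errors: list[tuple[str, str]]) -> float:
    """Sum the penalty points for one sheet, applying the date cap and the per-sheet cap."""
    date_penalty = 0.0
    other_penalty = 0.0
    for field, error_type in errors:
        weight = WEIGHTS[field][ERROR_TYPES[error_type]]
        if field in DATE_FIELDS:
            date_penalty += weight
        else:
            other_penalty += weight
    # Assumption: all date-field errors together count as one error of at most 0.5.
    total = other_penalty + min(date_penalty, 0.5)
    # No sheet can receive more than 1.0 penalty points.
    return round(min(total, 1.0), 2)


print(sheet_penalty([("COLL_NUM", "wrong_field")]))
print(sheet_penalty([("COLL_DT_DY", "transcription"), ("COLL_DT_MN", "transcription")]))
```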

Measuring standards that reflect the acceptance levels (ISO 2859)

The acceptance or rejection of a file is determined with reference to the table below. For example, when a file has a batch size of 450 records, the batch-size band of up to 500 records is used. We use the test level II-Normal, which for this band has the identification letter H. This code letter gives a sample size of 50, where penalty points < 2 are approved (G1) and ≥ 2 are rejected (A1). In other words, a sample with 1.9 penalty points is approved and one with 2 penalty points is rejected.

Suppose a batch comprises 500 records, giving a sub-sample of 50 records. A single record cannot contribute more than 1 penalty point. For example, if a single record has 10 mistakes totalling 3.8 penalty points, it still counts as 1 penalty point. If a record has only one error, weighted at 0.5 penalty points, it counts as 0.5. The sum of all penalty points in the sub-sample determines whether a batch is accepted or rejected.

No. records | N/A | ≤ 150 | ≤ 280 | ≤ 500 | ≤ 1200 | ≤ 3100 | ≤ 10000
Sub-sample size (see table below) | | 20 | 32 | 50 | 80 | 125 | 200
Accepted when errors | | < 1 | < 1 | < 2 | < 3 | < 4 | < 6

Table image: Sample Size Code Letters for Normal and Special Inspection Levels by Batch/Lot Size
Table image: Acceptable Quality Levels for a Normal Inspection by Sample Size Code Letter
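The acceptance decision can be sketched as a lookup followed by a capped sum. The thresholds below are copied from the acceptance table above (the N/A column is ignored); the function names are hypothetical, and the sketch is an illustration rather than the script used in the project.

```python
# (largest batch size in the band, sub-sample size, rejected when penalty points >= this value)
ACCEPTANCE_TABLE = [
    (150, 20, 1),
    (280, 32, 1),
    (500, 50, 2),
    (1200, 80, 3),
    (3100, 125, 4),
    (10000, 200, 6),
]


def plan_for_batch(n_records: int) -> tuple[int, int]:
    """Return (sub-sample size, rejection threshold) for a batch of n_records."""
    for max_records, sample_size, reject_at in ACCEPTANCE_TABLE:
        if n_records <= max_records:
            return sample_size, reject_at
    raise ValueError("batch is larger than the table covers")


def batch_accepted(n_records: int, record_penalties: list[float]) -> bool:
    """Decide acceptance from the penalty points of each record in the sub-sample."""
    sample_size, reject_at = plan_for_batch(n_records)
    if len(record_penalties) != sample_size:
        raise ValueError(f"expected {sample_size} checked records")
    # Each record contributes at most 1 penalty point; round to avoid float artefacts.
    total = round(sum(min(p, 1.0) for p in record_penalties), 2)
    return total < reject_at


# A batch of 450 records: one record with 3.8 points (counted as 1) and one with 0.5.
penalties = [3.8, 0.5] + [0.0] * 48
print(plan_for_batch(450), batch_accepted(450, penalties))
```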

Ordering Supplies for Pinned Insect Digitisation: Natural History Museum, London

A wide variety of supplies is required to ensure the smooth running of any pinned insect digitisation project. These range from more substantial items (cabinetry, drawers, cameras etc.) to consumables (pins, UID barcodes, EVA foam etc.). The ability to plan so that these are available for any project is contingent on several factors:

  • What is currently already available to be utilised?
  • What is the accuracy of the estimate of specimen numbers for a specific project?

It is useful to build contingency into any order of regularly used materials, but the more confident you are in an estimate, the more potential future issues and delays can be alleviated.

  • Is there a budget available for required items? Is this ring-fenced for the project or more general?
  • Are certain items known to have long lead times?
  • Are there any items that are difficult to source/no longer available and will suitable substitutes need to be found?

A recent pinned insect digitisation project at the NHMUK shows a variety of issues that may be encountered when ordering supplies.

One large digitisation project involved rehousing the collection from old, cork-lined drawers to unit trays in new drawers prior to imaging. At the beginning of the project, there was a supply of both unit trays and new drawers to be used and it was known that these would likely need to be reordered before the culmination of the work.

This project was externally funded, but the terms of the funding did not extend to consumables, so provision for these became the responsibility of the collections budget. New drawers and unit trays are regularly ordered for the entomology collections to allow for rehousing/recuration and expansion, and this is normally done in bulk to benefit from the savings on orders at scale.

Unfortunately, this bulk ordering meant that there was a period when suitable, new collection drawers ran out as the latest outstanding order was yet to be fulfilled (it appears that the drawer manufacturer had scaled back their workforce due to a downturn in business during the pandemic causing increased lead times).

In order to continue with the project, it was necessary to source a temporary storage alternative until the new drawers arrived. Fortunately, there were drawers of a different size available that could be used as a stopgap to store the newly rehoused specimens, in unit trays, in the collections.

Tracking System: Collection Move, Naturalis

Labelling containers with future storage location

  • Efficient tracking system for objects and containers of objects (location, condition, quantity)
  • Use of barcodes or RFID tags. Barcodes don’t need to be physically attached to the objects themselves but can be placed in move trays and supports. Rolls of double barcodes were produced: one to put on the worksheet and another to place on the container.
  • Items/crates can be scanned at a number of points, e.g. when an item is taken off a shelf, when it is packed, when it is placed in a crate, when the crate is put into and taken out of a lorry, and when the item is placed in store or at its final destination.
  • Barcodes may be stuck directly onto boxes or packing materials, or onto slips of paper which can be inserted into collection items.
  • Knowledge of drawer contents
  • Current image
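Scanning at successive checkpoints, as described in the list above, amounts to keeping a log of scan events per barcode. The sketch below is a hypothetical illustration of that idea; the class names, barcode format and checkpoint labels are assumptions and do not describe the system used at Naturalis.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ScanEvent:
    barcode: str        # barcode on the container or move tray
    checkpoint: str     # e.g. "taken off shelf", "packed", "loaded on lorry"
    scanned_at: datetime


class MoveLog:
    """Append-only record of scans, used to find where a container was last seen."""

    def __init__(self) -> None:
        self._events: list[ScanEvent] = []

    def scan(self, barcode: str, checkpoint: str,
             scanned_at: Optional[datetime] = None) -> ScanEvent:
        event = ScanEvent(barcode, checkpoint, scanned_at or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def last_checkpoint(self, barcode: str) -> Optional[str]:
        # Walk backwards so the most recent scan for this barcode wins.
        for event in reversed(self._events):
            if event.barcode == barcode:
                return event.checkpoint
        return None


log = MoveLog()
log.scan("CRATE-0001", "packed")
log.scan("CRATE-0001", "loaded on lorry")
print(log.last_checkpoint("CRATE-0001"))
```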

Authors

Sofie De Smedt, Ann Bogaerts, Krisztina Lohonya, Tara Wainwright, Peter Wing
Meise Botanic Garden, Natural History Museum London.

Contributors

Lisa French, Frederik Berger, Rob Cubey, Helen Hardy, Anne Koivunen, Sabine von Mering, Laurence Livermore

References

Hardisty A, Livermore L, Walton S, Woodburn M, Hardy H (2020) Costbook of the digitisation infrastructure of DiSSCo. Research Ideas and Outcomes 6: e58915. https://doi.org/10.3897/rio.6.e58915
Nieva de la Hidalga A, Rosin PL, Sun X, Bogaerts A, De Meeter N, De Smedt S, Strack van Schijndel M, Van Wambeke P, Groom Q (2020) Designing an Herbarium Digitisation Workflow with Built-In Image Quality Management. Biodiversity Data Journal 8: e47051. https://doi.org/10.3897/BDJ.8.e47051

Content Last Updated

30 August 2022