RBGE Herbarium Sheet Mass Digitisation

Table of contents

Overview
Workflow
Requirements
- Hardware
- Software
Authors
Contributors
References
Citation
Licence
Document Control

Overview

This workflow is designed for the digitisation of flat herbarium sheets, undertaken as part of an in-house mass digitisation programme at the Royal Botanic Garden Edinburgh (RBGE). It is based on the concepts outlined in earlier publications for creating minimal data specimen records (see References). In this first stage of digitisation, the data element of the workflow produces minimal data records, equivalent to MIDS-1. These records will be enhanced as part of subsequent digitisation workflows. The physical curation element includes a level of specimen curation and conservation chosen to balance high throughput rates against best-practice curation standards.

The workflow includes a level of automation, both to create the data records and to process the image files, each with associated metadata. The image processing pipeline includes Optical Character Recognition (OCR), which is carried out on all images.

Herbarium flat sheets

Workflow

Pre-Digitisation Curation

There are three key tasks that form the pre-digitisation task cluster at RBGE:

Genus and Species Covers
Taxonomy and Names
Conservation Assessment

Genus and Species Covers

A key part of the pre-digitisation curation is to ensure that the information needed to find specimens is present and correct on both the genus and species covers. The collections are organised systematically, first by family number. Within families the specimens are ordered by genus number. Genera are then organised by the geographic regions in which the specimens were collected. Within each genus cover the specimens are in species covers, with the species epithet on the bottom right-hand corner of the cover.

Genus cover overview

Species cover overview

If either genus or species folders are too full, they are split into additional folders so that the specimens are stored well, supporting their long-term preservation.

An overfull genus cover

If any specimens are particularly large or bulky, they should be placed at the top of the specimen cover to reduce the risk of specimens warping and becoming damaged.

A bulky specimen causing warping of surrounding specimens

Taxonomy and Names

For this project the names are entered as they are found in the cabinets. A later project will assess all of the names in use and determine whether they should be used. The key principle is that the name used should allow the specimens to be found.

If a name is not present in the database it needs to be added. We recommend the use of certain online resources for this purpose:

Other resources if the name cannot be found above include:

Conservation Assessment

Whilst this step is included as part of the Pre-digitisation curation task cluster, it is generally carried out after the Electronic Data Capture and prior to Specimen Image Capture.

Where possible the repair work is carried out prior to the specimen being imaged.

Examples of conservation interventions:

Replacing non-brass paperclips or pins - non-brass materials can become rusty and cause damage to the specimen
Placing a fragile specimen in a protective cover - if a specimen is on fragile or brittle paper it can be placed in a protective cover to prevent further damage

Examples of repairs:

Specimens mounted on very small sheets of paper, or attached with pins - these specimens can cause damage to other specimens within the species cover and need to be mounted on to a larger sheet.

A very small specimen that needs to be remounted
A small specimen that has been attached to a larger sheet with a pin

Specimens with loose or broken pieces - if a specimen is fragile or mounted on too flexible a mounting board, it can break. Depending on the extent of the damage, small pieces of loose material can be added to a capsule if one is present on the sheet. If the damage is more extensive it needs to be repaired by the conservator.
Specimens with cellophane covering material or fixed with sellotape - where possible these are imaged following the repair.
Specimens with insect damage - this can be either recent or historic. If the damage appears to be recent, specimens are treated to ensure that there is no continuing activity. In either case the specimen is cleaned prior to imaging.

Electronic Data Capture

The electronic data capture task results in minimal data records equivalent to MIDS-1. The workflow captures information on the specimen folders, shared by all specimens within, along with the barcode for each specimen. This results in a record for each specimen which contains a unique identifier and the specimen filing location represented by the folder information.

A barcode is applied to each specimen. Digitisers are provided with guidance on the best placement of the barcode. Barcodes are assigned to specimens using the procedure below:

Each separate, loanable item will be assigned a unique barcode. This includes each individual sheet, where a single specimen has been mounted over several herbarium sheets.
Each specimen with different collecting event information or a different identification will be assigned a unique barcode. This includes each specimen where multiple specimens have been mounted on a single herbarium sheet, and each element of a mixed gathering where a single collection has been later split into two different species.

A software process for the electronic data capture was developed and implemented externally as a standalone application and then internally in the institutional collection management system.

The user enters the following data:

Taxon filing name (Higher taxon, Family, Genus, Species)
Geographical filing region
Barcode

The application enters the following metadata:

User name
Record created date
Institution code
Type of specimen (e.g. Herbarium sheet)

The application checks the barcode to ensure that a record does not already exist, and then creates a specimen record for each new barcode, entering the data and metadata.
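
The duplicate check and record creation described above can be sketched as follows. This is a minimal illustration only, not the RBGE application: the `SpecimenRecord` fields mirror the data and metadata listed above, while the in-memory `store` dictionary, the `create_record` function, and the `DuplicateBarcodeError` exception are hypothetical names standing in for the collection management system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SpecimenRecord:
    # Data entered by the user (folder information plus barcode).
    barcode: str
    higher_taxon: str
    family: str
    genus: str
    species: str
    filing_region: str
    # Metadata entered by the application.
    user_name: str
    institution_code: str
    specimen_type: str = "Herbarium sheet"
    created: date = field(default_factory=date.today)


class DuplicateBarcodeError(ValueError):
    """Raised when a barcode already has a specimen record (hypothetical)."""


def create_record(store, folder, barcode, user_name, institution_code):
    """Create a minimal record for a new barcode, refusing duplicates.

    `store` is a dict keyed by barcode standing in for the database;
    `folder` holds the filing information shared by every specimen in a cover.
    """
    if barcode in store:
        raise DuplicateBarcodeError(barcode)
    record = SpecimenRecord(
        barcode=barcode,
        user_name=user_name,
        institution_code=institution_code,
        **folder,
    )
    store[barcode] = record
    return record
```

Because the folder information is passed once and reused for every barcode scanned within that cover, the digitiser only needs to scan barcodes until the cover changes.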

More complex specimens, including multiple specimens mounted on a single sheet, or when a single specimen is mounted across multiple sheets, have additional information entered manually into the specimen records.

Multiple specimens mounted on a single sheet
A single specimen mounted on multiple sheets

Image Capture

At present there are three different imaging stations being used for the imaging of flat herbarium sheets. Whilst each has some slight differences, the overall principles of the image capture workflow remain the same.

An example of an imaging station

General principles of the image capture workflow:

Specimens are placed on a backboard which has a fixed colour chart and scale bar.
The barcode is used as the filename for the image, which allows later linking to the data record for display on our online catalogue and export to data partners.
Barcodes are scanned manually using a barcode scanner.
The processing of RAW files to TIFFs is done automatically by the software, with a small number of software-based modifications, e.g. the application of sharpening.

Image Processing

The image processing workflow has been designed to use a folder structure to derive metadata for user and equipment, with image metadata being derived from the image files themselves. This has created a workflow where there is a minimal amount of input needed by the digitisers to process their specimen images.

An initial check is run by the digitiser to ensure that the specimens do not already have an image, using a website created in-house to query the image database. Any duplicate images can be removed from the batch prior to processing.

Files are copied to a watched folder, the folder structure of which provides the operator and equipment metadata.

Image processing file structure
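
As a rough illustration of deriving metadata from a folder structure, the sketch below assumes a layout of `<watched root>/<operator>/<equipment>/<file>`. That layout, the `folder_metadata` function, and the example names are assumptions made for illustration; the actual RBGE folder structure is not specified here.

```python
from pathlib import PurePosixPath


def folder_metadata(image_path, watched_root):
    """Read operator and equipment from the path below the watched root.

    Assumes (hypothetically) exactly two folder levels under the root:
    the first names the operator, the second names the imaging equipment.
    """
    relative = PurePosixPath(image_path).relative_to(watched_root)
    if len(relative.parts) != 3:
        raise ValueError(f"expected <operator>/<equipment>/<file>, got {relative}")
    operator, equipment, filename = relative.parts
    return {"operator": operator, "equipment": equipment, "filename": filename}
```

For example, `folder_metadata("/watched/operator_a/station_1/E00123456.tif", "/watched")` would yield the operator `operator_a` and the equipment `station_1` without the digitiser typing either value.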

The image processing script takes image pairs (a RAW file and a TIFF) and processes them to create a JPG and a tiled image for use on the online catalogue. A copy of the TIFF file is sent to an OCR pipeline, and the results are written to an OCR output database. The RAW and TIFF files are archived.
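
One way to match RAW and TIFF files into pairs is by their shared filename stem, as in the sketch below. The `pair_images` function and the list of RAW extensions are hypothetical; the RAW format produced by the imaging stations is not stated in this guide, so the extensions should be treated as placeholders.

```python
from pathlib import Path

# Placeholder extensions: the actual RAW format is an assumption.
RAW_EXTENSIONS = {".iiq", ".cr2", ".nef"}
TIFF_EXTENSIONS = {".tif", ".tiff"}


def pair_images(filenames):
    """Group filenames into (RAW, TIFF) pairs keyed by their shared stem.

    Returns the pairs plus a sorted list of stems missing one half,
    so that incomplete pairs can be reported rather than processed.
    """
    raws, tiffs = {}, {}
    for name in filenames:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in RAW_EXTENSIONS:
            raws[path.stem] = name
        elif suffix in TIFF_EXTENSIONS:
            tiffs[path.stem] = name
    paired = {
        stem: (raws[stem], tiffs[stem])
        for stem in sorted(raws.keys() & tiffs.keys())
    }
    unpaired = sorted(raws.keys() ^ tiffs.keys())
    return paired, unpaired
```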

As part of this process some basic QC checks are performed on the image files:

Filename: the name should be an E followed by 8 digits (for example, E00123456). If it is too long, too short, or does not follow this pattern, the file is returned to an errors folder. A suffix can be appended to the filename after an underscore (_).
File size: if the file is too large, it is returned to an errors folder.
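
The two checks above can be sketched with a regular expression and a size comparison. This is an illustrative sketch rather than the RBGE script: the `qc_problems` function is hypothetical, the suffix is assumed to contain only letters and digits, and the size limit is a placeholder because the real threshold is not given.

```python
import re
from pathlib import Path

# "E" followed by exactly 8 digits, then an optional "_suffix".
# Restricting the suffix to letters and digits is an assumption.
FILENAME_PATTERN = re.compile(r"E[0-9]{8}(_[A-Za-z0-9]+)?")

# Placeholder limit: the actual maximum file size is not specified.
MAX_BYTES = 500 * 1024 * 1024


def qc_problems(filename, size_bytes, max_bytes=MAX_BYTES):
    """Return a list of failed checks; an empty list means the file passes."""
    problems = []
    # fullmatch rejects names that are too long as well as too short.
    if not FILENAME_PATTERN.fullmatch(Path(filename).stem):
        problems.append("filename")
    if size_bytes > max_bytes:
        problems.append("size")
    return problems
```

A file with a non-empty result would be moved to the errors folder instead of continuing through the pipeline.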

Once the images have been processed the digitisers carry out a second check using the online tool to ensure that they have all been successfully processed prior to deletion of the files.

Preserving and Publishing Data

The data is automatically harvested from the collection management system (CMS) once every 24 hours (usually at 4am) and written to a holding database, using an export template and code to recombine the data into multiple formats appropriate for downstream uses. One format is a set of linked Darwin Core (DwC) files, which are automatically copied to a public-facing server. These DwC files can be harvested (automatically or manually) by our data partners, who aggregate the data appropriately.
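
To illustrate what a Darwin Core export of the minimal records might look like, the sketch below writes a single CSV using standard Darwin Core term names. The mapping from record fields to terms, the `to_dwc_csv` function, and the input dictionary keys are assumptions for illustration; the actual RBGE export template, and how its files are linked, are not described here.

```python
import csv
import io

# Standard Darwin Core term names; which terms RBGE exports is an assumption.
DWC_COLUMNS = [
    "catalogNumber",
    "institutionCode",
    "basisOfRecord",
    "family",
    "genus",
    "specificEpithet",
]


def to_dwc_csv(records):
    """Serialise minimal specimen records (dicts) as Darwin Core CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "catalogNumber": record["barcode"],
            "institutionCode": record["institution_code"],
            "basisOfRecord": "PreservedSpecimen",
            "family": record["family"],
            "genus": record["genus"],
            "specificEpithet": record["species"],
        })
    return buffer.getvalue()
```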

Requirements

Hardware

Software

BG-BASE
Capture One photo editing software
Adobe Photoshop Elements

We also use a number of applications and tools that have been built in-house.

Authors

Elspeth Haston, Robert Cubey, Robyn Drinkwater, Sally King

Contributors

This work is built upon the working practices of the RBGE staff.

References

Haston, E., Cubey, R. & Harris, D.J. (2012). Data concepts and their relevance for data capture in large scale digitisation of biological collections. International Journal of Humanities and Arts Computing, 6(1-2), 111-119. DOI: https://doi.org/10.3366/ijhac.2012.0042

Haston, E., Cubey, R., Pullan, M., Atkins, H. & Harris, D. (2012). Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys, 209, 93-102. DOI: https://doi.org/10.3897/zookeys.209.3121

Drinkwater, R., Cubey, R. & Haston, E. (2014). The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys, 38, 15-30. DOI: https://doi.org/10.3897/phytokeys.38.7168

Nelson, G., Paul, D., Riccardi, G. & Mast, A.R. (2012). Five task clusters that enable efficient and effective digitization of biological collections. ZooKeys, 209, 19-45. DOI: https://doi.org/10.3897/zookeys.209.3135

Citation

Haston, E., Cubey, R., Drinkwater, R. & King, S. (2022). DiSSCo Digitisation Guides: Royal Botanic Garden Edinburgh Herbarium Sheet Mass Digitisation workflow. Version 1.0. Available at: https://dissco.github.io/HerbariumSheets/RBGEHerbariumSheet.html

Licence

CC-BY 4.0

Document Control

Version: 1.0
Changes since last version: N/A
Last Updated: 13 January 2022
