
Software

Software is needed at many stages of the digitisation process. Free software is often available alongside paid software. This page lists examples of software actively used by institutions; it is not an exhaustive list.

Pre-Digitisation Curation

Barcode Label Creation

Barcode labels can be used to encode information in a machine-readable format, thus enabling automation. They can be generated in-house or using external companies, and can be permanent or temporary, for example:

  1. Permanent - encoding specimen/object unique identifiers
  2. Temporary (for digitisation purposes) - encoding information relating to the specimen/object in the collection, such as location in collection, taxonomy, type status etc. See Slides - Mass Digitisation for a workflow example.

| Software | Cost | Barcode Label Creation | Notes |
| --- | --- | --- | --- |
| BarTender | paid | X | |
| Collection Management System* | free & paid | X | Most robust way of generating unique identifiers as it ensures a 1:1 match with the database |
| EntomoLabels | free | X | Can be used for all collection and preservation label types |
| Label Designer | free | X | Tool used in Kotka and Notebook. A guide for Label Designer is available |

* various collection management systems are available
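
Where label software accepts a spreadsheet or CSV file as input, a short script can prepare a batch of identifiers to encode. The following is a minimal sketch under stated assumptions, not a method described in this guide: the prefix `XX-`, the zero-padded width and the column name `identifier` are all hypothetical, and identifiers issued by a collection management system remain the more robust option.

```python
import csv
import io


def make_identifiers(prefix, start, count, width=8):
    """Return `count` sequential identifiers such as 'XX-00000001'."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


def write_label_csv(identifiers, handle):
    """Write one identifier per row for import into label software."""
    writer = csv.writer(handle)
    writer.writerow(["identifier"])  # hypothetical column name
    for identifier in identifiers:
        writer.writerow([identifier])


if __name__ == "__main__":
    ids = make_identifiers("XX-", start=1, count=3)  # 'XX-' is a placeholder prefix
    buffer = io.StringIO()
    write_label_csv(ids, buffer)
    print(buffer.getvalue())
```

Recording the last number issued, or generating the range from the database itself, helps to avoid printing the same identifier twice.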

Camera Control

Mass digitisation of specimens/objects is generally carried out using a camera or flat-bed scanner, while specialised imaging systems are used for high resolution/diagnostic imaging. More information about cameras, imaging setups and requirements can be found in Image Capture.

Tethered Camera Control

Tethered photography for specimen/object image capture has many benefits, such as live view on a monitor, controlling camera settings from the connected device and remote trigger.

| Software | Cost | Single Camera Control | Multiple Camera Control | Basic Metadata Writing | Image Processing |
| --- | --- | --- | --- | --- | --- |
| EOS Utility (Canon only) | free | X | | X | |
| Helicon Remote | paid | X | | X | |
| LUMIX Tether (Panasonic only) | free | X | | X | |
| Capture One | paid | X | | X | X |
| LeafCapture* (Leaf only) | free | X | | X | X |
| digiCamControl | free | X | X | X | |
| Breeze Multi-Camera Array (DSLR Remote Pro Multi-Camera) | paid | | X | X | |

* available through third party websites

Focus Stacking (Extended Depth of Field Imaging)

Certain specimens/objects or imaging approaches may require a deeper depth of field than a single image can provide. This can be achieved by capturing a series of images at different focal planes and combining them, either using an automated focus rail that moves the camera/specimen or by altering the focus of the lens. Some cameras can do in-camera stacking and/or focus bracketing.

| Software | Cost | Lens/Focus Rail Control |
| --- | --- | --- |
| digiCamControl | free | X |
| Helicon Remote | paid | X |

Image Processing

Images may require some form of processing, such as renaming, basic/advanced editing, stacking, photogrammetry etc. The use of hot folders and automated actions can enable activities, such as image capture and processing, to happen consecutively, significantly reducing post-processing time. More information can be found in Automating Image Processing and a workflow example can be found in Slides - Mass Digitisation.
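
The hot folder idea can be illustrated with a short polling script. This is a minimal sketch rather than the behaviour of any particular product: the folder names, the file suffixes and the `action` callback are hypothetical, and dedicated software typically offers more robust file watching.

```python
import shutil
import time
from pathlib import Path


def process_new_files(hot_folder, done_folder, action, suffixes=(".jpg", ".tif")):
    """Apply `action` to each image in hot_folder, then move it to done_folder."""
    hot, done = Path(hot_folder), Path(done_folder)
    done.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(hot.iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            action(path)  # hypothetical step, e.g. rename, convert or add metadata
            target = done / path.name
            shutil.move(str(path), str(target))
            processed.append(target)
    return processed


def watch(hot_folder, done_folder, action, interval=2.0, cycles=None):
    """Poll the hot folder every `interval` seconds; cycles=None runs until stopped."""
    count = 0
    while cycles is None or count < cycles:
        process_new_files(hot_folder, done_folder, action)
        time.sleep(interval)
        count += 1
```

A call such as `watch("capture", "processed", print)` would print each new image path and move the file. A file that is still being written by the capture software could be picked up too early, so a real workflow may need to wait until the file size stops changing.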

File Renaming

Batch and automated file renaming can increase the efficiency of digitisation workflows as well as reduce the potential for human error. One example is to use software to rename image files by extracting information from barcode(s) present in the images.

| Software | Cost | Automated Renaming | Batch Renaming* | Notes |
| --- | --- | --- | --- | --- |
| BardecodeFiler | paid | X | X | Reads multiple barcodes in image and renames file. Guidance is available |
| Gouda (decodes_barcodes.exe) | free | X | X | Reads single barcode in image and renames file |
| Bulk Rename Utility | free | | X | |
| PowerRename (Microsoft PowerToys) | free | | X | |

* scripts can also be used for batch renaming
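
As an illustration of script-based batch renaming, the sketch below renames image files from a mapping of original filename to barcode value. It assumes the barcodes have already been decoded by a separate tool; the function names and the `_2`, `_3` suffix rule for images sharing a barcode are hypothetical choices, not part of any software listed above.

```python
from pathlib import Path


def plan_renames(folder, barcode_by_file):
    """Return (old_path, new_path) pairs, keeping each file's extension."""
    folder = Path(folder)
    plan, used = [], set()
    for old_name, barcode in sorted(barcode_by_file.items()):
        old = folder / old_name
        stem, n = barcode, 1
        # Add _2, _3 ... when several images share a barcode (assumed convention).
        while (stem + old.suffix).lower() in used:
            n += 1
            stem = f"{barcode}_{n}"
        used.add((stem + old.suffix).lower())
        plan.append((old, folder / (stem + old.suffix)))
    return plan


def apply_renames(plan):
    """Rename files, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

Building the plan first and applying it second makes it possible to review the proposed names before any file is changed.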

Post-Processing

Post-processing of image files can be carried out either in bulk using batch processing or consecutively using hot folders.

| Software | Cost | Basic Editing | Advanced Editing | Focus Stacking* | Notes |
| --- | --- | --- | --- | --- | --- |
| XnConvert | free | X | | | |
| Adobe Lightroom | paid | X | X | | See guidance |
| Adobe Photoshop | paid | X | X | | |
| GIMP | free | X | X | | |
| ImageMagick | free | X | X | | Stitch multiple images ** |
| ImageJ | free | X | X | X | Focus stacking available as a plugin (Stack Focuser) |
| Helicon Focus | paid | | | X | |
| Zerene Stacker | paid | | | X | |
| Inselect | free | | | | Automated image segmentation |

* some cameras can do in-camera stacking and/or focus bracketing
** scripts can also be used for bulk stitching of images
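
A bulk stitching script could, for example, build one ImageMagick command per specimen. The sketch below is illustrative only: it assumes ImageMagick 7 is installed (the `magick` command), and the naming convention `<identifier>_<part>.jpg` and the `_stitched.jpg` suffix are hypothetical. The `+append` operator joins images left to right; `-append` would join them top to bottom.

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def group_parts(filenames):
    """Group names like 'BC001_1.jpg' by the text before the last underscore."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        stem = Path(name).stem
        if "_" in stem:
            groups[stem.rsplit("_", 1)[0]].append(name)
    # Only identifiers with more than one part need stitching.
    return {key: parts for key, parts in groups.items() if len(parts) > 1}


def stitch_commands(filenames, out_suffix="_stitched.jpg"):
    """Build one ImageMagick command per group of parts."""
    return [
        ["magick", *parts, "+append", key + out_suffix]
        for key, parts in group_parts(filenames).items()
    ]


def run_all(commands):
    """Run each command; requires 'magick' to be available on the PATH."""
    for command in commands:
        subprocess.run(command, check=True)
```

Parts are ordered alphabetically, so numbering above nine would need zero padding (`_01`, `_02`) to stitch in the intended order.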

Electronic Data Capture

Information about the specimen/object on handwritten or typed labels can be captured manually or using optical character recognition (OCR). More information about transcription project planning, impact versus effort, and the challenges can be found in Manual Transcription. The use of data standards such as Darwin Core (DwC), which is primarily based on taxa and their occurrence, will provide a stable, standard format for sharing information (Wieczorek et al., 2012).

Data Capture and Extraction

| Software | Cost | Manual Data Capture | OCR | Notes |
| --- | --- | --- | --- | --- |
| Crowdsourcing Platforms* | free & paid | X | | Web applications |
| ABBYY FineReader Server (ABBYY Recognition Server) | paid | | X | |
| Amazon Textract | free & paid | | X | Cloud service |
| Azure OCR (Microsoft) | paid | | X | Cloud service |
| Cloud Vision API (Google) | free & paid | | X | Cloud service |
| Tesseract OCR | free | | X | See NYU Libraries ‘Tesseract OCR Software Tutorial’ |
| Text Extractor (Microsoft PowerToys) | free | | X | |
| Voyant Tools | free | | X | ‘Reading and analysis environment for digital texts’. Web application |
| ChatGPT | free | | | Parsing OCR output text into Darwin Core fields |

* various online platforms are available
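
To show what parsing OCR output into Darwin Core fields can involve, the sketch below uses simple regular expressions rather than a language model. The label conventions it looks for (a collector introduced by "Leg." or "Coll.", a numeric day-month-year date, a letter code followed by digits) are assumptions for illustration; real labels vary widely and any parsed values should be checked. The keys `catalogNumber`, `recordedBy` and `verbatimEventDate` are Darwin Core terms.

```python
import re

# Hypothetical label conventions; adjust the patterns to the collection in hand.
PATTERNS = {
    "catalogNumber": re.compile(r"\b([A-Z]{2,6}[- ]?\d{4,})\b"),
    "recordedBy": re.compile(r"(?:leg\.|coll\.|collector:)\s*([^\n,;]+)", re.IGNORECASE),
    "verbatimEventDate": re.compile(r"\b(\d{1,2}[./-]\d{1,2}[./-]\d{4})\b"),
}


def parse_label(ocr_text):
    """Return a dict of the Darwin Core terms recognised in the OCR text."""
    record = {}
    for term, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            record[term] = match.group(1).strip()
    return record


if __name__ == "__main__":
    # 'ABC 00012345' and 'J. Smith' are made-up example values.
    text = "ABC 00012345\nLeg. J. Smith, 12.05.1931\nSurrey"
    print(parse_label(text))
```

Terms that are not recognised are simply left out of the result, which makes it easy to route incomplete records to manual transcription.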

Georeferencing

Georeferencing is a key process that enables specimens to be used in a variety of ways from geospatial analyses to understanding the history of collections. More information about the importance, pitfalls and recommendations can be found in Georeferencing Checklist.

| Software | Cost | Georeferencing | Notes |
| --- | --- | --- | --- |
| Ali-Bey | free | X | ‘An open collaborative georeferencing web application’ (Marcer et al., 2022). Web application, API and dockerized version |
| GEOLocate | free | X | ‘Platform for georeferencing Natural History Collections data’. Web application |
| GeoPick | free | X | ‘Online companion tool for easy georeferencing following best practices’. Web application - Marcer A, Escobar E, Uribe F, Chapman AD, Wieczorek JR (in development) |
| Georeferencing Calculator | free | X | Web or desktop application - Wieczorek C, Wieczorek J (2021). Georeferencing calculator manual (Bloom et al., 2020) |
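
One small step that often precedes georeferencing is converting coordinates written in degrees, minutes and seconds into decimal degrees. The sketch below shows that arithmetic only; it is not taken from any of the tools above and does not estimate uncertainty. The accepted input format and the function names are assumptions.

```python
import re

# Accepts forms such as 51°30'26"N or 0°7'W (assumed format).
DMS_PATTERN = re.compile(
    r"(\d{1,3})\s*°\s*(?:(\d{1,2}(?:\.\d+)?)\s*['′]\s*)?"
    r"(?:(\d{1,2}(?:\.\d+)?)\s*[\"″]\s*)?([NSEW])"
)


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to signed decimal degrees."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be at least 0 and below 60")
    value = degrees + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in ("N", "S") else 180
    if value > limit:
        raise ValueError(f"{value} exceeds {limit} degrees")
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def parse_dms(text):
    """Parse a coordinate string and return decimal degrees rounded to 6 places."""
    match = DMS_PATTERN.fullmatch(text.strip())
    if not match:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    deg, minutes, seconds, hem = match.groups()
    return round(dms_to_decimal(int(deg), float(minutes or 0), float(seconds or 0), hem), 6)
```

For example, `parse_dms("51°30'26\"N")` computes 51 + 30/60 + 26/3600. Keeping the original string alongside the converted value preserves the verbatim record.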

Data Quality

As there are many methods and resources (from established to new/evolving) for identifying errors and cleaning data, we will only highlight a few examples here. More information about data quality and methods of data cleaning can be found in the following GBIF reports: ‘Principles of Data Quality’ (Chapman, 2005a) and ‘Principles and Methods of Data Cleaning’ (Chapman, 2005b).

Data Cleaning and Quality Checks

| Software* | Cost | Data Cleaning | Data Quality Checks | Notes |
| --- | --- | --- | --- | --- |
| OpenRefine | free | X | | Desktop application. Cleaning data, transformation of data, parsing data, reconciliation etc. |
| Global Names Verifier | free | | X | Verifies scientific names against biodiversity data sources. Web application, command line or through an API (Application Programming Interface) |

* scripts can be used alongside manual data checks and data cleaning processes.
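
A script-based quality check might look like the sketch below, which flags a few common problems in a single record. The choice of checks is an assumption made for illustration, and the date check only handles complete ISO 8601 dates, whereas Darwin Core `eventDate` values can also be partial dates or ranges. The keys are Darwin Core terms.

```python
from datetime import date


def check_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for term, limit in (("decimalLatitude", 90), ("decimalLongitude", 180)):
        raw = record.get(term)
        if raw in ("", None):
            continue  # missing coordinates are not treated as an error here
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{term} is not a number: {raw!r}")
            continue
        if abs(value) > limit:
            problems.append(f"{term} out of range: {value}")
    raw_date = record.get("eventDate")
    if raw_date:
        try:
            if date.fromisoformat(raw_date) > date.today():
                problems.append(f"eventDate is in the future: {raw_date}")
        except ValueError:
            problems.append(f"eventDate is not a full ISO 8601 date: {raw_date!r}")
    if not (record.get("scientificName") or "").strip():
        problems.append("scientificName is missing")
    return problems
```

Running `check_record` over every row of an export and reviewing the flagged rows manually keeps the script as an aid to, rather than a replacement for, manual checks.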

References and Further Reading

Bloom DA, Wieczorek JR, Zermoglio PF (2020). Georeferencing Calculator Manual. Copenhagen: GBIF Secretariat. https://docs.gbif.org/georeferencing-calculator-manual/1.0/en/

Chapman AD (2005a). Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. https://assets.ctfassets.net/uo17ejk9rkwj/2gupj7dJIw62UeOUYiqSsm/0a4bb732bd7fd8cf28f7703dc20a43ba/Data_Quality_-_ENGLISH.pdf

Chapman AD (2005b). Principles and Methods of Data Cleaning – Primary Species and SpeciesOccurrence Data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. https://assets.ctfassets.net/uo17ejk9rkwj/46SfGRfOesU0IagMMAOIkk/1c03ea3e21fcd9025cc800d786890e72/Principles_20and_20Methods_20of_20Data_20Cleaning_20-_20ENGLISH.pdf

Marcer A, Escobar A, Garcia-Font V, Uribe F (2022). Ali-Bey - an open collaborative georeferencing web application. Biodiversity Data Journal 10: e81282. http://doi.org/10.3897/bdj.10.e81282

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, Giovanni R, Robertson T, Vieglais D (2012). Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. https://doi.org/10.1371/journal.pone.0029715

Software Citations

Wieczorek C, Wieczorek J (2021). Georeferencing Calculator. Available: http://georeferencing.org/georefcalculator/gc.html. Accessed [2023-01-13].

Marcer A, Escobar E, Uribe F, Chapman AD and Wieczorek JR (v.1.0.2023-Beta, in development). GeoPick: an online companion tool for easy georeferencing following best practices [Web application].

Authors

Louise Allan

Contributors

Anne Koivunen, Laurence Livermore, Arnald Marcer, Deb Paul

Licence

CC-BY-4.0

Citation

Allan, L. (2022) DiSSCo Digitisation Guide: Software. Version 1.0 Available at: https://dissco.github.io/DataManagement/Software/Software.html

Document Control

Version: 1.0
Changes since last version: N/A
Last Updated: 20 March 2023
