The QMEE CDT Project proposal database

Welcome to the QMEE CDT Project proposal database. This is a live list of project proposals put forward by PIs across the CDT partner institutions.

PIs/Supervisors will continue to add projects to this list over the next few months, so do keep checking back! You can search the projects using the box below: simply enter some text and press Search to do a text search across all the database fields. If you want to search more finely, the search tool also allows you to search on particular details of the project descriptions: you will see these finer search options appear if you click on the search box.

Click on the view button next to a project to get the full proposal description. If you want to download project details, either for all projects, or for a subset you have searched for, then click on the 'Download details' button.

Predictive text mining for global biodiversity indicators and models
Biodiversity indicators and models are important for policy decisions, scientific understanding and public engagement. Although advances in remote sensing are revolutionising the production of indicators of ecosystem structure and function, biodiversity indicators still rely on bottom-up aggregation and synthesis of data from many local surveys. Current ways of doing this are very time-consuming, not because relevant data are scarce - they are not - but because finding and extracting them from the burgeoning literature is still a slow, manual process. This project would use new developments in text mining to greatly increase the rate of data flow into some of the leading biodiversity indicators and models - and see how this changes the picture they give. The Living Planet Index (LPI), based at ZSL, synthesises population trend data from thousands of vertebrate species, across terrestrial, freshwater and marine habitats worldwide, into one of the most widely used and reported indicators of biodiversity. LPI has been adopted by the Convention on Biological Diversity (CBD) as an indicator of progress towards its 2020 target to ‘take effective and urgent action to halt the loss of biodiversity.’ The PREDICTS project, based at the NHM, collates ecological assemblage data from sites worldwide that face different pressures relating to land use - the dominant current driver of terrestrial biodiversity loss - and estimates from them the current global status of a range of indicators. Among these, the Biodiversity Intactness Index (BII) has been adopted by IPBES as a core indicator of biodiversity. Both the LPI and PREDICTS databases - despite being global - have key spatial and thematic gaps. These gaps undermine attempts to produce indicators that are representative of biodiversity, rather than reflecting the data’s geographic, taxonomic or ecological biases.
Text mining can not only greatly expand both databases, but also do so in a way that adds data where they are most valuable. Crucially, both databases are substantial annotated datasets, making them excellent candidates for the development and assessment of supervised classification methods. This project will aim to develop models that identify and extract biodiversity data, validate them on these large existing databases, then deploy them on large repositories such as NCBI/PubMed, PLoS, and the holdings at Imperial/NHM/ZSL to identify new data. Recent pilot work at ZSL has suggested that high accuracy (precision and recall > 0.9) may be possible in identifying articles containing abundance trends. Many refinements are possible, meaning the studentship can develop in any of several directions. Possibilities include extracting taxonomic and geographical information to enable targeted data extraction that fills gaps in the biodiversity databases; and automated or semi-automated extraction of trend data, leveraging a range of existing tools (e.g. Golden Gate Imagine) and crowd-sourcing (e.g. through a Zooniverse project). By expanding and improving the databases from which LPI and BII are estimated, this project has the potential to change how biodiversity researchers - and policymakers - perceive the natural world.
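As a minimal illustration of the two accuracy measures quoted above, the sketch below computes precision and recall for a binary classifier that flags articles as containing abundance trends (label 1) or not (label 0). The function name `precision_recall` and the example labels are hypothetical and chosen only for illustration; they are not taken from the ZSL pilot work.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: share of flagged articles that really contain trend data.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of articles with trend data that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = article contains abundance trends, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this setting, high precision means little curator time is wasted on irrelevant articles, while high recall means few relevant articles are missed; both matter when the aim is to fill geographic and taxonomic gaps.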
Andy Purvis
Robin Freeman
Computing, Quantitative data analysis, Ecological observations / data collection
Andy Purvis
• Text mining/natural language processing • Machine learning (supervised/unsupervised classification) • SQL, Python • GAMs, time-series analysis, GLMMs (when data go into the indicator frameworks)
LPI and PREDICTS are already field-leading approaches to understanding the status and trends of biodiversity, but are strongly limited by the rates of finding and extracting suitable data and suffer from geographic & thematic data biases; text-mining and machine-learning have the potential to massively speed up data integration while improving taxonomic and geographical representativeness.
Many possible questions including: How can the ecological forecast horizon be applied to indicators? When can spatial and temporal data be used interchangeably? How does spatial scale affect population and assemblage trends? Are land-use impacts on aboveground and belowground biodiversity linked? How strong is phylogenetic signal in species responses?
All global biodiversity datasets have strong geographical biases at both large and small scales; this project would strengthen them all, improving conservation monitoring and planning. WWF have offered to be a CASE partner on this project because of its real-world usefulness.
LPI and PREDICTS feed into international policy processes such as IPBES and CBD. Improving the way they reflect the true state of nature will greatly enhance both approaches and will raise the bar for biodiversity indicators more generally. The approach will also make it easier to build ‘pinpoint indicators’ to meet new needs.
This project will take approaches developed within the field of artificial intelligence and apply them to find and extract biodiversity data to strengthen biodiversity models and indicators that inform management and policy decisions.
Conservation ecology, Population ecology, Ecosystem-scale processes and land use
Text mining/machine learning (IoZ/NHM/Imperial); working with library holdings (NHM/Imperial); statistical modelling (Imperial/NHM/ZSL); indicators/models (NHM/ZSL); transferable skills (through Imperial)
Imperial, NHM and ZSL
2017-10-02 20:11:41