Invited sessions bring together experts over a 90-minute period to present cutting-edge research, discuss emerging trends, and engage in meaningful discussion of specialised topics of interest to the biometrics community.
Statistics for Biosecurity Surveillance
Organiser and Chair: A/Prof Robert Clark
Prof Andrew Robinson
University of Melbourne, CEO of CEBRA
“Surveillance for counterfactual scenarios of invasive species - why it’s useful and a convenient way to do it”
The impact of invasive species is affected by a range of factors, many of which can be anticipated in advance – for example, the prevalence of host material, climate suitability, the size of affected agricultural resources, and so on. One factor that cannot be anticipated in advance is the size of the incursion at the time of its detection. Unfortunately, the impact of an incursion is tightly tied to its maturity at detection, which can range from a single seed to a 50,000-hectare infestation of plants.
We propose a simple probability model for the detection of an invasive species that can either capture or integrate out the consequent uncertainty of the maturity of the incursion. We represent the relationship between surveillance and the detection of the organism using survival analysis: the detection of the incursion is analogous to the survival event; it is a binary occurrence that happens at some point in time, and once it has happened it does not happen again.
Under such a model, we can connect the distribution of the size of the infestation at the time of detection to the probability of detecting the incursion given that it has not already been detected, namely, the hazard function. For example, a popular model for the detection of an incursion of size \(x\), with \(t\) traps and probability \(p\) of detecting a single pest, is:
\[ h(x, t, p) = 1 - (1 - p)^{tx} \]
Algebra leads us to a size-at-detection pdf. Other corrections are also applied as needed. The outcome is a pdf that is a function of process parameters, enabling straightforward assessment of different surveillance choices. Parameter estimates for the distribution can be derived from first principles, field experiments, or expert elicitation.
In this presentation we will derive and demonstrate the use of the survival-based size-at-detection pdf and discuss its implications and challenges.
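As a rough numerical illustration of this construction, the sketch below builds a size-at-detection pdf from the hazard above. It is a minimal sketch only: it assumes the incursion grows at unit rate so that size can stand in for time, treats \(h\) as a continuous-time hazard rate, and uses hypothetical values for \(t\) and \(p\).

```python
import numpy as np

# Hypothetical surveillance design: t traps, per-pest detection
# probability p. Values are illustrative, not from the talk.
t, p = 20, 0.001

def hazard(x):
    """Detection hazard for an incursion of size x: h = 1 - (1-p)^(t*x)."""
    return 1.0 - (1.0 - p) ** (t * x)

def trapezoid(y, x):
    """Simple trapezoidal integration."""
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

# Size-at-detection pdf: f(x) = h(x) * exp(-integral_0^x h(u) du),
# i.e. hazard times survival, evaluated on a grid of sizes.
x = np.linspace(0.0, 500.0, 5001)
h = hazard(x)
increments = (h[1:] + h[:-1]) / 2 * np.diff(x)
cum_hazard = np.concatenate([[0.0], np.cumsum(increments)])
f = h * np.exp(-cum_hazard)
f /= trapezoid(f, x)  # renormalise away discretisation error

print("E[size at detection] =", trapezoid(x * f, x))
```

Varying `t` and `p` in such a sketch shows directly how surveillance effort shifts the distribution toward smaller, cheaper detections.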
Dr Mahdi Parsa and Dr Belinda Barnes
Department of Agriculture, Fisheries and Forestry
Effective eradication of invasive species over large areas requires strategic allocation of resources between control measures and surveillance activities. This study presents an analytical Bayesian framework that integrates stochastic modelling and explicit measures of uncertainty to guide decisions in complex eradication scenarios. By applying Shannon entropy to quantify uncertainty and incorporating the expected value of perfect information (EVPI), the framework identifies conditions under which investment in control or surveillance becomes worthwhile. Findings show that strategies that hedge against uncertainty can markedly improve the robustness of eradication outcomes with only marginal increases in expected costs. This approach offers practical tools for designing more cost-effective and reliable eradication programs and for prioritising data collection to reduce uncertainty where it has the greatest impact.
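As a toy illustration of the two quantities driving this framework, the sketch below computes Shannon entropy and EVPI for a two-state, two-action eradication decision; the states, probabilities, and payoffs are hypothetical and not taken from the study.

```python
import numpy as np

# Illustrative two-action decision under an uncertain incursion state.
states = ["small incursion", "large incursion"]
probs = np.array([0.7, 0.3])          # current beliefs
# Net benefit of each action in each state (rows: actions, cols: states).
payoff = np.array([[100.0, -50.0],    # action 0: control only
                   [ 60.0,  40.0]])   # action 1: control + surveillance

# Shannon entropy quantifies how uncertain the current beliefs are.
entropy = -np.sum(probs * np.log2(probs))

# Expected value of perfect information (EVPI): what a clairvoyant
# resolution of the state uncertainty would be worth.
ev_current = payoff @ probs                    # expected payoff per action
ev_with_info = np.sum(probs * payoff.max(axis=0))
evpi = ev_with_info - ev_current.max()

print(f"entropy = {entropy:.3f} bits, EVPI = {evpi:.1f}")
# If EVPI exceeds the cost of extra surveillance, gathering the
# information is worthwhile before committing to a strategy.
```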
Dr Sumon Das and A/Prof Robert Clark
Australian National University
Group testing plays a vital role in biosecurity operations worldwide, particularly in minimising the risk of introducing exotic pests, contaminants, and pathogens through imported agricultural products. A common screening strategy involves pooling items from consignments and testing each group for the presence of contamination, with consignments typically rejected if any group tests positive. Although screening designs often target a high probability of detection assuming a fixed minimum prevalence, analysing the historical results of these tests to infer the extent of contamination in non-rejected consignments (referred to as leakage) is less common.
This study advances censored beta-binomial (BB) models to address contamination risk in frozen seafood imports into Australia, incorporating imperfect tests. Motivated by the characteristics of our case study, we develop a new class of BB models that impose a minimum positive consignment propensity threshold, capturing scenarios where contamination is either absent or exceeds a known minimum level. To fit these models, we propose a Metropolis-Hastings (MH) algorithm conditioned on prior distributions for sensitivity and specificity, allowing efficient estimation of quantities related to contamination levels. We analyse historical testing data under multiple scenarios using the proposed MH algorithm, yielding novel insights into both contamination risk and leakage.
Finally, we use model-based simulations to communicate risk levels, providing key insights into potential undetected contamination.
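The group-testing arithmetic underlying these models can be sketched as follows, using a plain binomial model (no beta-binomial overdispersion and no propensity threshold) and hypothetical values for pool size, sensitivity, and specificity:

```python
# Hypothetical group-testing design (values are illustrative only):
# g pools of k items each, item-level contamination probability theta,
# test sensitivity Se and specificity Sp.
k, g = 5, 13
theta = 0.02
Se, Sp = 0.90, 0.98

# A pool is truly contaminated if at least one of its k items is.
p_pool = 1 - (1 - theta) ** k

# Probability a single pooled test reads negative, with imperfect tests.
p_neg = (1 - Se) * p_pool + Sp * (1 - p_pool)

# Consignments are rejected if any pool tests positive, so acceptance
# requires all g pools to test negative.
p_accept = p_neg ** g

# Leakage: the consignment is accepted yet contains contamination.
p_accept_and_clean = (Sp * (1 - theta) ** k) ** g
leakage = p_accept - p_accept_and_clean

print(f"P(accept) = {p_accept:.3f}, P(leakage) = {leakage:.3f}")
```

The censored beta-binomial models in the talk replace the fixed theta with a consignment-level propensity distribution and estimate sensitivity, specificity, and leakage from historical data via Metropolis-Hastings.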
Dr Raphael Trouvé
University of Melbourne, Senior Research Fellow in Forest Dynamics
“Optimal sampling in border biosecurity”
Border biosecurity faces mounting pressure from increasing global trade, requiring cost-effective inspection strategies to reduce the risk of importing pests and diseases. Current international standards recommend inspecting all incoming consignments (full census) with fixed sample sizes (e.g., 600 units) for high-risk pathways, but this may be overkill for lower-risk pathways with established compliance records. When should agencies use skip-lot sampling (SLS, sometimes called a continuous sampling plan), which adaptively reduces inspections based on recent compliance history, over full census inspection?
We developed a propagule pressure equation for SLS in overdispersed pathways and used Lagrange multipliers to derive a solution. Results show the choice depends on pathway overdispersion, sampling costs, and budget constraints. Optimal sample sizes are typically smaller than current recommendations, with better returns from inspecting a larger proportion of consignments rather than larger samples per consignment. This framework provides biosecurity agencies with data-driven guidance for implementing adaptive sampling strategies.
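A minimal simulation of the skip-lot idea (a CSP-1-style plan with hypothetical parameters, assuming inspection detects infestation with certainty and ignoring the overdispersion that the talk's framework handles) illustrates the effort/leakage trade-off:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_csp1(n_lots=100_000, p_infested=0.02, clearance=10, f=0.2):
    """Skip-lot (CSP-1-style) inspection: full inspection until
    `clearance` consecutive lots pass, then inspect a random fraction
    f of lots; any detection reverts to full inspection."""
    infested = rng.random(n_lots) < p_infested
    consecutive_clear, skipping = 0, False
    inspected = leaked = 0
    for lot_infested in infested:
        inspect = (not skipping) or (rng.random() < f)
        if inspect:
            inspected += 1
            if lot_infested:                  # detected: back to census
                skipping, consecutive_clear = False, 0
            else:
                consecutive_clear += 1
                if consecutive_clear >= clearance:
                    skipping = True
        elif lot_infested:
            leaked += 1                       # skipped infested lot enters
    return inspected / n_lots, leaked / n_lots

effort, leakage = simulate_csp1()
print(f"inspection effort = {effort:.2%}, leakage = {leakage:.2%}")
```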
A cluster of modern clustering methods for Biometrics
Organiser and Chair: A/Prof Francis Hui
Dr Skipton Woolley
CSIRO Data61
“Species archetype models for presence-only data”
Joint species distribution modelling has recently emerged as a potentially powerful statistical method for analysing biodiversity data. Despite the plethora of presence-only occurrence data available for biodiversity analysis, there remain few examples of presence-only multiple-species modelling approaches. We present a mixture-of-regressions model for understanding how groups of species are distributed based on presence-only data. Our approach extends Species Archetype Models using a point process framework and incorporates joint estimation of sighting bias based on all species occurrences included in the model. We demonstrate that our method can accurately recapture the mean and variance of parameters from simulated data sets and provide better estimates than those generated from multiple single species presence-only species distribution models. We apply our approach to a Myrtaceae presence-only occurrence dataset from New South Wales, Australia. We show that presence-only Species Archetype Models allow for the propagation of variance and uncertainty from the data through predictions, improving inference made on presence-only biodiversity data for multiple species.
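The point-process backbone of this approach can be sketched for a single species with one covariate; the mixture over archetypes and the shared sighting-bias term are the talk's extensions. A minimal sketch (synthetic 1-D landscape, hypothetical parameters) that fits an inhomogeneous Poisson process by maximising a quadrature approximation of its likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Synthetic 1-D landscape on [-2, 2] with covariate z equal to location;
# true log-intensity is beta0 + beta1 * z (values are hypothetical).
beta_true = np.array([1.0, 1.5])
lam = lambda z, b: np.exp(b[0] + b[1] * z)

# Simulate presence points from the IPP by thinning a homogeneous process.
lam_max = lam(2.0, beta_true)
n_cand = rng.poisson(lam_max * 4.0)          # 4.0 = length of the region
z_cand = rng.uniform(-2, 2, n_cand)
z_pres = z_cand[rng.random(n_cand) < lam(z_cand, beta_true) / lam_max]

# Quadrature points carry the integral term of the IPP log-likelihood:
#   sum_i log lam(z_i) - integral lam(z) dz.
z_quad = np.linspace(-2, 2, 1000)
w = 4.0 / len(z_quad)

def nll(b):
    return -(np.sum(b[0] + b[1] * z_pres) - w * np.sum(lam(z_quad, b)))

fit = minimize(nll, x0=np.zeros(2))
print("true:", beta_true, "estimated:", np.round(fit.x, 2))
```

Species Archetype Models place a finite mixture over such intensities so that many species share a few archetype responses, with sighting bias estimated jointly from all occurrences.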
Dr Louise McMillan
School of Mathematics and Statistics, Victoria University of Wellington
“The difficulties of clustering categorical or mixed data”
I will discuss techniques for clustering categorical data that go beyond treating the data as numerical or converting the categories to dummy variables. I will survey a range of approaches, including an R package for clustering ordinal data that uses model-based clustering via likelihoods and the EM algorithm. Then I will discuss a Bayesian approach from population genetics that may be extendable to general mixed datasets and is the subject of my latest research.
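To make the model-based alternative concrete, here is a minimal EM sketch for a latent class (finite multinomial mixture) model on synthetic categorical data; it illustrates the likelihood-plus-EM recipe generically rather than any particular package's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n items, J categorical questions with C categories each,
# generated from K latent classes (all sizes are hypothetical).
n, J, C, K = 300, 5, 3, 2
true_theta = rng.dirichlet(np.ones(C), size=(K, J))     # K x J x C
z = rng.integers(0, K, n)
X = np.array([[rng.choice(C, p=true_theta[z[i], j]) for j in range(J)]
              for i in range(n)])

pi = np.full(K, 1.0 / K)                    # mixing proportions
theta = rng.dirichlet(np.ones(C), size=(K, J))

for _ in range(200):
    # E-step: responsibilities of each class for each item.
    log_r = np.log(pi) + np.zeros((n, K))
    for j in range(J):
        log_r += np.log(theta[:, j, X[:, j]]).T        # n x K
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update mixing proportions and category probabilities.
    pi = r.mean(axis=0)
    for j in range(J):
        counts = np.array([r[X[:, j] == c].sum(axis=0)
                           for c in range(C)])          # C x K
        theta[:, j, :] = (counts / counts.sum(axis=0)).T

print("estimated mixing proportions:", np.round(pi, 2))
```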
Dr Shonosuke Sugasawa
Faculty of Economics, Keio University
“Bayesian clustered ensemble prediction for multivariate time series”
We propose a novel methodology called the mixture of Bayesian predictive syntheses (MBPS) for multiple time series of count data and apply it to predict the numbers of COVID-19 inpatients and isolated cases in Japan and Korea at the subnational level. MBPS combines a set of predictive models and partitions the multiple time series into clusters based on their contribution to predicting the outcome. In this way, MBPS leverages the shared information within each cluster and avoids using a multivariate count model, which is generally cumbersome to develop and implement. Our data analyses demonstrate that the proposed MBPS methodology provides improved predictive accuracy and uncertainty quantification.
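MBPS itself is beyond a short snippet, but the basic ensemble ingredient it refines, weighting candidate predictive models by how well they have predicted the counts so far, can be sketched as follows (synthetic data; pseudo-BMA-style weights rather than the paper's Bayesian synthesis):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)

# Synthetic count series and two candidate "agent" forecasts of its mean.
T = 100
mu = 20 + 5 * np.sin(np.arange(T) / 6)
y = rng.poisson(mu)

f1 = mu                        # a well-specified agent
f2 = np.full(T, y.mean())      # a crude constant agent

# Score each agent by its Poisson log predictive density, then convert
# the scores to normalised weights (a pseudo-BMA-style combination).
scores = np.array([poisson.logpmf(y, f1).sum(),
                   poisson.logpmf(y, f2).sum()])
w = np.exp(scores - scores.max())
w /= w.sum()

ensemble = w[0] * f1 + w[1] * f2
print("weights:", np.round(w, 3))
```

MBPS goes further by making such weights cluster-specific within a Bayesian synthesis model, so that series with similar predictability share information while still avoiding a full multivariate count model.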
Methods and Practice in Agricultural Analytics
Organiser and Chair: Dr Emi Tanaka
Dr Shuwen Hu
RMIT University
“Leveraging Statistical Modelling and Machine Learning in Animal Science”
Live body weight gain is an important measure of animal performance. In this study, we predicted cattle daily weight gain from five core cattle behaviours and a measure of total daily activity based on accelerometer data. To collect data, we equipped a herd of 60 Brahman steers with research collars containing triaxial accelerometers for nearly one month in Australia. We used the accelerometer data, which reflect the intensity of animal movement, to compute an activity metric within five-minute windows. In addition, we used pre-trained accelerometer-based machine learning models to classify cattle behaviour into grazing, ruminating, resting, walking, drinking or other classes over five-second windows. Daily behaviour profiles were constructed for each animal and experiment day by aggregating the behaviour predictions over each calendar day. Our objective was to explore how behaviours and activity metrics can be used to predict daily weight gain. Daily activity values ranged from 5.44 g to 23.69 g. The average daily time spent grazing, ruminating, resting, walking and drinking was 8.97±1.12, 7.78±1.03, 5.83±1.05, 1.00±0.40, and 0.18±0.12 hours, respectively. Weather data were also included in the model to predict live-weight gain. The best result, from the linear regression model, was an R-squared of 0.467 with a minimum root mean square error (RMSE) of 0.867. Live-weight gain could not be fully explained by the measurements taken in this study, but we showed how these factors can influence the variability in cattle performance.
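A sketch of the two computational steps described above, on synthetic numbers (the 10 Hz sampling rate, the vector-magnitude activity metric, and the regression coefficients are all assumptions for illustration, not the study's specification):

```python
import numpy as np

rng = np.random.default_rng(5)

# Step 1: turn raw tri-axial accelerometer readings into a windowed
# activity metric. Assumed 10 Hz sampling over a five-minute window.
fs = 10
window = 5 * 60 * fs
acc = rng.normal(0, 0.3, size=(window, 3)) + np.array([0, 0, 1])  # x,y,z in g

# One common activity metric: mean absolute deviation of the
# acceleration vector magnitude from 1 g (gravity).
vm = np.sqrt((acc ** 2).sum(axis=1))
activity = np.abs(vm - 1).mean()
print(f"five-minute activity = {activity:.3f} g")

# Step 2: regress daily weight gain on daily behaviour profiles
# (hours grazing, ruminating, resting, walking, drinking) via OLS.
n_days = 60
Xb = rng.normal([9.0, 7.8, 5.8, 1.0, 0.2], [1.1, 1.0, 1.0, 0.4, 0.1],
                size=(n_days, 5))
gain = 0.8 + 0.05 * Xb[:, 0] - 0.02 * Xb[:, 2] + rng.normal(0, 0.3, n_days)

X = np.column_stack([np.ones(n_days), Xb])
beta, *_ = np.linalg.lstsq(X, gain, rcond=None)
resid = gain - X @ beta
r2 = 1 - resid.var() / gain.var()
rmse = np.sqrt((resid ** 2).mean())
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```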
A/Prof Gota Morota
University of Tokyo
“Evaluating the impact of trait measurement error on genetic analysis of computer vision-based phenotypes”
Quantitative genetic analysis of image- or video-derived phenotypes is increasingly being performed for a wide range of traits. Pig body weight values estimated by a conventional approach or a computer vision system can be considered as two different measurements of the same trait, but with different sources of phenotyping error. Previous studies have shown that trait measurement error, defined as the difference between manually collected phenotypes and image-derived phenotypes, can be influenced by genetics, suggesting that the error is systematic rather than random and is more likely to lead to misleading quantitative genetic analysis results. Therefore, we investigated the effect of trait measurement error on genetic analysis of pig body weight (BW). Calibrated scale-based and image-based BW showed high coefficients of determination and goodness of fit. Genomic heritability estimates for scale-based and image-based BW were mostly identical across growth periods. Genomic heritability estimates for trait measurement error were consistently negligible, regardless of the choice of computer vision algorithm. In addition, genome-wide association analysis revealed no overlap between the top markers identified for scale-based BW and those associated with trait measurement error. Overall, the deep learning-based regressions outperformed the adaptive thresholding segmentation methods. This study showed that manually measured scale-based and image-based BW phenotypes yielded the same quantitative genetic results. We found no evidence that BW trait measurement error could be influenced, at least in part, by genetic factors. This suggests that trait measurement error in pig BW does not contain systematic errors that could bias downstream genetic analysis.
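As a toy version of the heritability question posed here, the sketch below simulates a purely non-genetic measurement error and profiles its likelihood over heritability values (synthetic marker data and a grid-search MLE, not the study's REML machinery); because the simulated error has no genetic component, the estimate should sit near zero, mirroring the study's finding.

```python
import numpy as np

rng = np.random.default_rng(11)

# Synthetic genotypes and a genomic relationship matrix G.
n, m = 200, 500
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
M -= M.mean(axis=0)
G = M @ M.T / m

# Trait measurement error: scale-based minus image-based body weight,
# constructed here with purely non-genetic noise.
scale_bw = rng.normal(60, 8, n)
image_bw = scale_bw + rng.normal(0, 2, n)
error = scale_bw - image_bw

# Profile likelihood over h2 for y ~ N(0, s2 * (h2*G + (1-h2)*I)),
# using the eigendecomposition of G to diagonalise the covariance.
y = error - error.mean()
eigval, eigvec = np.linalg.eigh(G)
ty = eigvec.T @ y
h2_grid = np.linspace(0.01, 0.99, 99)
ll = []
for h2 in h2_grid:
    v = h2 * eigval + (1 - h2)        # eigenvalues of the covariance / s2
    s2 = np.mean(ty ** 2 / v)         # MLE of the scale for this h2
    ll.append(-0.5 * (np.log(v).sum() + n * np.log(s2)))

print("h2_hat for measurement error:", h2_grid[int(np.argmax(ll))])
```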
Dr Elle Saber
Australian National University
“Fishing for Heritability in the Gill Microbiome: Why Statisticians Should Get Out into the Field”
Host-associated microbiomes are increasingly recognised as integral to health, yet the extent to which host genetics shapes these communities remains unclear. While heritable components of gut and skin microbiomes have been documented in several vertebrates, evidence for the gill microbiome of fish is scarce. This exploratory study sampled the gill microbiome of Atlantic salmon within a Tasmanian breeding program to investigate potential genetic influences. Despite careful planning, study participants did not always behave, and the practical constraints of a commercial operation prevented complete data capture. Having participated in the data collection, I was better prepared to understand the limitations of the dataset when doing the downstream analysis. The take-home message is that, even if we feel like a fish out of water, time spent among the data can be just as valuable as time spent at our desks.
Host‐associated microbiomes are increasingly recognised as integral to health, yet the extent to which host genetics shapes these communities remains unclear. While heritable components of gut and skin microbiomes have been documented in several vertebrates, evidence for the gill microbiome of fish is scarce. This exploratory study sampled the gill microbiome of Atlantic salmon within a Tasmanian breeding program to investigate potential genetic influences. Despite careful planning, study participants did not always behave, and the practical constraints of a commercial operation prevented complete data capture. Having participated in the data collection I was better prepared to understand the limitations of the dataset when doing the downstream analysis. The take home message is that, even if we feel like a fish out of water, time spent among the data can be just as valuable as time spent our desks.