Dr Beattie was the lead researcher on a large, multi-faculty, interdisciplinary consortium investigating potential uses of optical spectroscopy in clinical settings. A consistent, recurring ‘feature’ of those datasets was an intense and highly variable background unrelated to the portion of the signal that was of interest.
A wide range of cutting-edge baseline correction methods was investigated, but none could reliably and consistently eliminate the background signal. This prevented the extraction of clinically usable information from the data.
While it was clear that the intense backgrounds were highly variable and were severely limiting the utility of the data, multivariate analysis also made it clear that the variations were not random.
Since there was clearly some consistency within the dataset, it should be possible to correct the problem more consistently. Dr Beattie could see the solution in front of his eyes: calculate the baseline correction on the multivariate results. But the software packages he was using did not allow manipulation of the kind he needed.
Need: reliable processing of noisy data
Solving this conundrum required a multi-pronged approach:
Understand the mathematical basis of multivariate analysis, especially principal component analysis (PCA)
Learn how to manipulate data more flexibly
Learn how to manipulate a PCA model and apply it to new data (sketched in code below)
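As an illustration of the kind of PCA manipulation involved, the minimal Matlab sketch below builds a PCA model of a spectral dataset via singular value decomposition and projects a new spectrum onto it. The matrix sizes and variable names are placeholders for illustration, not those used in the original project.

% Minimal sketch of a PCA model of a spectral dataset, built via SVD.
% Sizes and variable names are illustrative placeholders.
X  = rand(50, 1000);          % 50 spectra, 1000 spectral points (rows = samples)
mu = mean(X, 1);              % mean spectrum
Xc = X - mu;                  % mean-centre (implicit expansion, R2016b or later)
[U, S, V] = svd(Xc, 'econ');  % singular value decomposition
scores   = U * S;             % one row of scores per spectrum
loadings = V;                 % one column of loadings per component

% Applying the same model to a new spectrum:
xnew = rand(1, 1000);
tnew = (xnew - mu) * loadings;   % scores of the new spectrum
xfit = tnew * loadings' + mu;    % reconstruction from the model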
A strategy was also devised for testing the approach, which required a number of different experimental designs to control critical aspects of the baseline and signal. It was essential to understand when the approach was needed and what benefits it might impart.
All this had to be implemented in parallel with a full programme of research whose emphasis was clinical utility, not interesting and novel methodologies. Each stage of development therefore had to be applied incrementally and iteratively, so that small gains in signal processing could be exploited to advance the clinical understanding and continuously demonstrate progress and relevance.
At the time of starting this project I was processing my data in several different software platforms. To make demonstrable progress and justify further development, I had to find a way of proving the concept using those same programs, fitting into a similar workflow.
The data would be recorded in one program, processed in another, collated into a spreadsheet, imported into two different analysis packages, and the results then imported back into the processing software for interpretation.
Aim: separate signal and background
To achieve a basic proof of concept, the following steps had to be implemented (sketched in code after the list):
extract the relevant results (‘loadings’) from the PCA software
import them into spectral processing software to correct their baselines
import the corrected loadings into spreadsheets
calculate the contribution of each background to new data
reconstruct the background for each sample
subtract the background from the original data
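The core of that correction step can be sketched as follows, assuming the loadings judged to describe background (rather than signal) have been baseline-corrected and gathered column-wise in a matrix. This is a hedged illustration of the principle rather than the project's actual code; all names and sizes are placeholders.

% Sketch of the proof-of-concept correction: project each spectrum onto the
% background loadings, rebuild the background and subtract it. Placeholder data.
X  = rand(30, 1000);          % raw spectra, one per row
B  = orth(rand(1000, 3));     % stand-in for three background loadings (columns)
mu = mean(X, 1);              % mean spectrum
Xc = X - mu;                  % centre the data
C  = Xc * B;                  % contribution of each background component to each spectrum
bg = C * B';                  % reconstructed background for every sample
Xcorr = Xc - bg;              % background-subtracted spectra
% (the mean spectrum mu can be added back, or treated as a further background term)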
This replaced just one step in the workflow, but it transformed the results obtained from the data. It made it possible to handle data from a very complex tissue in the human eye and to measure a range of critical protein modifications implicated in pathologies associated with the ageing process, as detailed here.
The proof of concept was not elegant, but it had minimal impact on the main project because it exploited existing workflows as far as possible. It was, however, clear that the procedures developed could not scale up to meet the demands of six projects across five departments. The success of the proof of concept made it possible to justify the resources needed to develop a workable solution.
The development needed to streamline the entire data handling workflow, and it was quickly recognised that a programmatic approach was required. Dr Beattie undertook extensive training in the Matlab platform to learn how to exploit his data more flexibly.
A single, unified workflow was gradually built up in order to minimise disruption to the main projects. Time devoted to understanding the underlying mathematics of PCA allowed more efficient ways of implementing the proof-of-concept solution to be identified.
Multiple software packages into one platform
The next step was to automate the importing of the data into Matlab, removing the need to build datasets manually in a spreadsheet. These two steps alone reduced the typical time required to collate and process a dataset from 8 days to 2 hours.
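A batch import routine of the following kind replaces the manual spreadsheet step. The folder layout and the two-column file format are assumptions made purely for illustration.

% Illustrative batch import of exported spectra into one matrix (samples x points).
% Assumes each spectrum is a two-column text file (x-axis, intensity) in 'data'.
files = dir(fullfile('data', '*.txt'));
for i = 1:numel(files)
    raw = readmatrix(fullfile(files(i).folder, files(i).name));  % R2019a+; dlmread on older releases
    if i == 1
        xaxis = raw(:, 1).';
        X = zeros(numel(files), numel(xaxis));
    end
    X(i, :) = raw(:, 2).';    % collect intensities row by row
end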
The next phase was to build further tools allowing improved control of critical options, an enhanced range of background correction methods and new tools for interpreting results.
The final phase was to build tools that incorporated analysis options for the processed data, creating a complete package that simplified and streamlined the workflow.
The end result of this phase was a tool that allowed entire datasets collected over months (or, in some cases, years) to be analysed within a day. This proved critical in extracting clinically useful information across a range of projects, including oncology, lung function, ophthalmic disorders and pharmaceutical formulation.
While the new workflow delivered improved results, it could not be used for scientific publications until it had been rigorously validated.
A set of experiments was devised to probe the utility of the approach compared with a range of common methods. These tests were designed to stretch the method to breaking point to better understand its true utility and relevance. Without understanding where a method breaks down, it is impossible to truly optimise it.
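The flavour of such a stress test can be sketched with synthetic data: spectra with a known, highly variable broad background and a fixed narrow band are corrupted with increasing noise, and the error in the reconstructed background is tracked. The shapes, sizes and noise levels below are arbitrary illustrations, and learning the background model from background-only measurements is a simplifying assumption, not a description of the project's actual experiments.

% Synthetic stress test: how well is a known background recovered as noise rises?
rng(1);
w     = 1:800;                                        % arbitrary spectral axis
peak  = exp(-((w - 400) / 10).^2);                    % fixed narrow 'signal' band
bases = [exp(-w / 250); exp(-w / 600); ones(1, 800)]; % broad background shapes
mkBg  = @(n) (0.5 + rand(n, 3)) * bases;              % variable mixtures of them

Xbg = mkBg(100);                                      % background-only training set
mu  = mean(Xbg, 1);
[~, ~, V] = svd(Xbg - mu, 'econ');
L   = V(:, 1:3);                                      % background loadings

for sigma = [0.01 0.1 1]                              % increasing noise levels
    Btrue = mkBg(20);                                 % known test backgrounds
    Xtest = Btrue + 0.3 * peak + sigma * randn(20, 800);
    Bhat  = (Xtest - mu) * (L * L') + mu;             % reconstructed backgrounds
    err   = Bhat - Btrue;
    fprintf('noise %.2f: RMS background error %.3f\n', sigma, sqrt(mean(err(:).^2)));
end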
Result: insensitive to noise
Taken together, the tests revealed a number of key facets of the approach:
It is insensitive to noise - it reliably corrects the baseline even when the signal-to-noise ratio is below the detection threshold for a signal
It is reproducible - it always returns the same estimate for repeated measures of the same fixed, stable sample
It handles very complex situations - it teases apart all sources of variation, allowing each to be handled in isolation
It allows the correction to be improved by increasing the dataset size rather than the signal quality - critical in situations where power must be kept low or exposure times short
Performance can be maintained while reducing technical or methodological specifications, saving time and money
The approach offers unparalleled reliability and reproducibility, opening up new possibilities in signal-based analytics.
Osentia materials, including the test kit and its contents, plus typical reports adapted to the customer’s results
Shortly after completing my postdoctoral research I was approached by an analytics company with a problem: their test didn’t work well enough for the market to accept it. This is discussed in more detail in the Osentia tab above, but one of several underlying problems was huge variability in the broad background underlying the signal of interest (as shown in the figure above and in this paper).
The advanced signal processing workflow was a critical stepping stone in improving the test's performance. The variability in the background made the data unreliable for testing, introducing too much uncertainty into the predicted outcome. Traditional methods of correcting the background did not improve matters much, largely because of the number of different background shapes and the extreme form of some of them.
The advanced signal processing was able to tailor a background correction to each of the sources and therefore reliably and accurately correct for the broad background. This meant that variation from the signal of interest dominated the final data, allowing reliable, accurate results.
“Dr Beattie was the lead postdoc in our consortium project focused on developing clinical and pharmacological analyses. The initial results were slower than anticipated, as many of the projects had technical difficulties. Dr Beattie approached solving these difficulties in a rigorous and thorough manner. He juggled the demands of showing progress in the projects while simultaneously solving critical technical problems. The consortium proved to be a huge success, generating many novel applications, publications and follow-on grants.
The advanced signal processing workflows that Dr Beattie developed have been applied to a wide range of biomedical solutions, repeatedly demonstrating considerable improvements in performance.”
Extracting meaningful information from complex, real-world data.
Clarity - cuts through the clutter
Reliability - consistent and traceable
Reproducible - same results every time
Allowing new insight into critical pathologies.
Analyse lipid distribution and metabolism in the lung
Analyse the biochemical distribution of the retina and how it is perturbed by retinitis pigmentosa
Analyse the degradation of critical structural proteins in the eye implicated in ageing
Enabling a commercial health-check test
Achieving statistically significant improvements over competitor products