Throughout my research career I have had to work with difficult samples in which the raw signal is dominated by a broad, non-Raman background. Traditionally this is corrected by acquiring each signal to a high quality, performing a background estimation on that signal, and subtracting the estimated background from the raw data to leave baseline-corrected data (1,2). However, it is not always realistic to accumulate signals of sufficiently high quality to reliably correct the background on each one. In high-volume or high-speed mapping experiments a small increase in acquisition time produces a very large increase in the total accumulation time; in kinetic experiments the acquisition time is fixed by the required time resolution; and in clinical in-vivo analysis a considerable range of limitations come into play, including low maximum permitted laser powers and consideration of patient comfort.

I was tasked with developing the use of Raman spectroscopy for the detection of ocular protein modifications, with the ultimate aim of developing an in-vivo diagnostic. This requires very restrictive laser powers and short accumulation times, with the consequence that the analysis must make use of very poor quality signals. Figure 1 shows some raw data from donor samples, and it is clear that the raw signal is dominated by a broad non-Raman background; in some samples the background was even higher, accounting for 99% of the signal. I have developed a method of correcting the non-Raman background that can consistently and reliably correct the baseline of the data, as shown in Figure 2 (3). Details of the approach can be read in full in that publication; here I will discuss its implications.
Noise Insensitive Background Signal Subtraction (NIBSS) is not a new method of estimating the uninformative background signal; rather, it is a new way of applying existing estimation methods. Put simply, NIBSS estimates the background on the results of a multivariate analysis of a dataset, rather than following the traditional approach of estimating it on the individual signals. This has two crucial implications:
1) The background shape is estimated only a limited number of times, determined by the complexity of the dataset rather than by the number of signals recorded.
2) The accuracy of the background correction improves as the dataset grows, in contrast to the conventional paradigm, whereby the accuracy is solely dependent on the individual signal quality.
The consequences of these facts are:
1) The estimation step is where uncertainty and variability enter the processing, so fewer estimates mean less induced variation.
2) The background correction of low quality signals can be dramatically improved by gathering more data rather than by improving the quality of the individual signals. This becomes critical in situations where power or time limitations prevent the acquisition of good quality signals. As alluded to above, the application I am developing fits this scenario precisely: measuring from patients' eyes with a laser. Here we are very limited in both the maximum laser power and the amount of time a patient can reasonably be expected to tolerate a laser being shone on their eye. By gathering a larger dataset of low quality signals it is possible to perform the background correction with high accuracy and reproducibility.
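The core idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the published implementation: the synthetic dataset, the crude polynomial baseline estimator, and the choice of three components are all assumptions made for the sake of the example. The point is structural: the baseline estimator runs on a handful of SVD loadings rather than on each of the hundreds of noisy spectra.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)

# Synthetic stand-in dataset: 500 weak Raman-like peaks sitting on a broad
# non-Raman background, buried in heavy noise (all values assumed/illustrative).
peak = np.exp(-((x - 0.4) / 0.01) ** 2) + 0.6 * np.exp(-((x - 0.7) / 0.015) ** 2)
background = 5.0 * np.exp(-((x - 0.2) / 0.5) ** 2)
conc = rng.uniform(0.5, 1.5, size=500)                 # per-spectrum intensity
data = np.outer(conc, peak + background) + rng.normal(0, 0.5, (500, 200))

def poly_baseline(y, order=5):
    """Crude baseline estimate: a low-order polynomial fit, standing in for
    whichever estimation method the analyst prefers."""
    return np.polyval(np.polyfit(x, y, order), x)

# NIBSS-style correction: estimate the background on a few SVD loadings
# (set by dataset complexity), not on each of the 500 noisy spectra.
k = 3
U, s, Vt = np.linalg.svd(data, full_matrices=False)
loadings = Vt[:k]                                      # low-noise components
corrected_loadings = loadings - np.array([poly_baseline(v) for v in loadings])
corrected = (U[:, :k] * s[:k]) @ corrected_loadings    # reconstruct corrected data
```

Because the loadings pool information from the whole dataset, they are far less noisy than any individual spectrum, so only k baseline estimates are ever made regardless of how many signals are recorded.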
Figure 3 shows the variation in the prediction of heme content from unnormalized data corrected using the traditional per-signal approach and from the equivalent unnormalized data corrected using NIBSS. Unnormalized data are shown so that the effects of the normalisation step do not confuse the trends discussed. The most striking difference between the two approaches, apart from the scaling factor of the best-fit trend (5.5 times higher for traditionally corrected data), is the power of the trendline. The prediction variance would be expected to be directly proportional to the noise level of the signal, i.e. inversely proportional to the signal-to-noise ratio (a power of -1). The traditionally corrected data, however, vary with a power of -0.83, indicating that the prediction variance does not track the noise level alone: the per-signal correction contributes additional variation that distorts the trend. In contrast, the power of the best-fit line for NIBSS is -0.99, not significantly different from -1, demonstrating that the reproducibility of the background correction is insensitive to noise: the prediction variance responds only to the shot noise of the signal, not to any variation induced by the processing.
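The trendline powers quoted above come from fitting a power law to prediction variance against SNR. One standard way to extract such an exponent, shown here with made-up, ideally shot-noise-limited numbers rather than the actual Figure 3 data, is a linear fit in log-log space:

```python
import numpy as np

# Made-up, ideal data: variance exactly inversely proportional to SNR.
snr = np.array([0.8, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
variance = 2.0 / snr

# For variance = a * snr**p, p is the slope of log(variance) vs log(snr).
power, log_a = np.polyfit(np.log(snr), np.log(variance), 1)
print(round(power, 2))  # -1.0 for this ideal data
```

Applied to real data, a recovered power near -1 (as for NIBSS, -0.99) indicates purely shot-noise-limited behaviour, while a departure from -1 (as for the per-signal correction, -0.83) flags an extra source of variation.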
Background correction is a subtractive process, while normalisation involves dividing the signal by a normalisation factor. This means normalisation has the potential to induce multiplicative effects on the prediction variance. Figure 4 compares the prediction variance against signal-to-noise ratio for the Raman data processed using the traditional approaches and using SVD-based approaches (NIBSS and SVD-based normalisation). Upon normalisation the prediction variance decreases by a factor of 2 for the low quality signals (SNR 0.8) normalised using an SVD-based approach after NIBSS. The power of the trend is now slightly steeper than -1, suggesting a slight interaction between the shot noise and the SVD-based normalisation, but the overall magnitude of the prediction variation is decreased substantially throughout the range studied (SNR down to 0.8). In contrast, the trend for traditionally processed data (linear interpolation background correction and band area normalisation) is much steeper than -1, at -1.6. The prediction variance is greater for these traditionally corrected signals throughout the range studied, and the difference becomes considerably more magnified at lower SNR. Indeed, the ratio of the variances from the two methods follows almost a square relationship with SNR (power 1.82), demonstrating that the variance induced by the normalisation step multiplies with the variance induced by the baseline correction. NIBSS induces no additional variation (a multiplicative factor of 1 on the variance), so there is no induced variation to magnify during the normalisation step.
In terms of implications for experimental methodology the results are striking. Table 2 lays out the effort required to achieve a benchmark confidence in the data of 3%, a figure widely considered an ideal target for laboratory analysis. A confidence interval depends on the standard deviation (times 1.96 for 95% intervals) divided by the square root of the number of samples measured. Consequently any difference in reproducibility between two methods is magnified to the power of two in the effort required to achieve parity. With very poor quality signals (SNR = 1) the prediction standard deviation from traditionally applied background correction is a factor of 17 higher than from NIBSS. Just 61 signals are required to be 95% sure the true mean is within 3% of the reported prediction when using NIBSS; to be equally confident about traditionally processed data the experimenter would need to acquire around 17,000 spectra, 290 times more effort. An alternative would be to improve the signal quality to reduce the number of replicates required (not always an option due to experimental, kinetic, safety or comfort reasons). For a single measurement to produce a prediction within 3% of the true value requires a signal-to-noise ratio of just 10 for NIBSS but 21 for traditionally processed data, i.e. 4.4 times more acquisition effort to achieve the same reliability per signal. It is not until the signal-to-noise ratio reaches 50 that the traditional and NIBSS approaches give approximately the same level of prediction variance.
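The arithmetic behind figures like those in Table 2 follows from the standard sample-size formula n = (1.96 x RSD / 0.03)^2, where RSD is the relative standard deviation of a single prediction. The 12 % RSD below is a hypothetical value chosen for illustration, not a measured one; the point is that a factor-of-17 difference in standard deviation becomes a factor of 17^2 = 289 in the number of signals required:

```python
import math

def n_required(rsd, target=0.03, z=1.96):
    """Signals needed for a 95 % confidence interval within +/-target of the mean."""
    return math.ceil((z * rsd / target) ** 2)

rsd = 0.12                       # assumed per-signal relative standard deviation
print(n_required(rsd))           # 62 signals at this assumed RSD
print(n_required(17 * rsd))      # ~289 times as many for a 17x larger deviation
```

The squared dependence on RSD is what makes differences in processing reproducibility so expensive to buy back with replicates.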
Because NIBSS reduces the effort required to achieve a given level of accuracy, these gains can be exploited in a number of ways. My original aim was to use the efficiency gains to make a cutting-edge application feasible, and the use of NIBSS has brought it within a clinically feasible realm: the measurements can be carried out at a power (1 mW) below the maximum permissible exposure (1.4 mW) and still achieve a reliable prediction in under 15 s (few patients would accept laser probing for several minutes, even at near-infrared wavelengths).
Anyone interested in using the algorithm can read up on the prototype here
1. Beattie, J. R., Bell, S. E. J., Borgaard, C., Fearon, A. and Moss, B. W. Prediction of adipose tissue composition using Raman spectroscopy: average properties and individual fatty acids. Lipids (2006) 41, 287-294
2. Beattie, R. J., Bell, S. J., Farmer, L. J., Moss, B. W. and Desmond, P. D. Preliminary investigation of the application of Raman spectroscopy to the prediction of the sensory quality of beef silverside. Meat Science (2004) 66, 903-913
3. Beattie, J. R. and McGarvey, J. J. Estimation of signal backgrounds on multivariate loadings improves model generation in face of complex variation in backgrounds and constituents. Journal of Raman Spectroscopy (2013) 44, 329-338
4. Beattie, J. R. Multivariate Analysis for the Processing of Signal. OGST (2014), DOI: 10.2516/ogst/2013185