## Abstract

The computational recognition and resolution of spectra is usually preceded by Pre-Processing (PP) operations that improve the signal quality and highlight the information of interest. However, little systematic study has been carried out on how the combined use of different PP methods affects the result, so the researcher often has to rely on common sense when choosing a computational strategy.

This work addresses the issue through a simulation experiment. Fictitious spectra of mixtures of five components at varying concentrations, corrupted by different types of noise, were processed by various combinations of PP techniques: smoothing, baseline correction, normalization and reduction to Principal Components (PC). The original mixtures were then recognized by k-means Cluster Analysis, and the quality of this recognition as a function of the PP procedure and the distance metric was quantified in terms of the Rand and silhouette coefficients. The simulated spectra were designed to emulate data commonly encountered in Raman imaging, but the results are applicable to other types of spectroscopy as well. Among the PP combinations considered, the one that yielded the best Rand coefficient employed a polynomial baseline correction method [1], a Whittaker smoother, Manhattan normalization, a PC transformation explaining 80% of the variance, and clustering using either Euclidean or city-block distance. A few other combinations had a very similar outcome, whereas certain PP sequences produced a clearly incorrect clustering. The robustness of each PP combination with respect to particular types of noise is also discussed.

[1] Carlo G. Bertinetto, Tapani Vuorinen. “Automatic Baseline Recognition for Fast Correction Using Continuous Wavelet Transform (CWT)”. Applied Spectroscopy. Submitted.
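
Two of the quantities named in the abstract can be illustrated with a minimal Python sketch. It assumes that "Manhattan normalization" means dividing each spectrum by the sum of its absolute intensities and that the "Rand coefficient" is the plain (unadjusted) Rand index; the abstract does not state either detail, and the function names `l1_normalize` and `rand_index` are hypothetical, not taken from the authors' work.

```python
from itertools import combinations


def l1_normalize(spectrum):
    """Scale a spectrum so its absolute intensities sum to 1 (assumed meaning
    of Manhattan normalization)."""
    total = sum(abs(value) for value in spectrum)
    if total == 0:
        # An all-zero spectrum cannot be scaled; return it unchanged.
        return list(spectrum)
    return [value / total for value in spectrum]


def rand_index(true_labels, predicted_labels):
    """Unadjusted Rand index: the fraction of sample pairs on which two
    partitions agree (both group the pair together, or both separate it)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        raise ValueError("at least two samples are required")
    agreements = sum(
        (true_labels[i] == true_labels[j])
        == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Cluster labels are arbitrary names, so a relabelled but otherwise
    # identical partition still scores as perfect agreement.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(l1_normalize([1.0, 2.0, 1.0]))
```

Because the index compares pairs of samples rather than label values, it can score a k-means result against the known mixture identities without first matching cluster numbers to mixtures.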

Original language | English |
---|---|
Status | Published - 2013 |
Event | Scandinavian Symposium on Chemometrics - Stockholm, Sweden. Duration: 17 June 2013 → 20 June 2013. Conference number: 13 |

### Conference

Conference | Scandinavian Symposium on Chemometrics |
---|---|
Abbreviated title | SSC13 |
Country | Sweden |
City | Stockholm |
Period | 17/06/2013 → 20/06/2013 |