Advances in Methods of Anomaly Detection and Visualization of Multivariate Data

Jaakko Talonen

Research output: Thesis › Doctoral Thesis › Collection of Articles

Abstract

Successful machine learning applications have been developed in almost all fields where measurable data exists. For example, computers can learn the best treatment for a particular disease from medical records, and self-customized programs can recommend different products to customers. In this thesis, statistical and machine learning methods are applied to both time series and static multivariate data sets that contain unknown and potentially useful information. Because a large number of data samples and variables makes the research material difficult to interpret, new methods are developed to make the data easier to understand. The research material for developing the anomaly detection methods and presenting the results consisted of process signal data from the Olkiluoto nuclear power plant, the results of the Parliamentary elections together with the answers to the voting advice application, and aggregated car inspection data. Process state changes can be detected with the procedures and visualization techniques developed in this research. Such potential anomalies should be detected from the signal measurements at as early a stage as possible. Challenges related to stochastic processes have been solved using recursive models and neural networks. The results for the static multivariate data demonstrate that combining principal component analysis with probability distributions makes it possible to estimate missing values and to understand the dependencies between the observations. A significantly larger amount of missing data can be estimated with the recommender system, and the resulting complete data can then be explored with other machine learning methods, e.g. a self-organizing map. These methods make it possible to analyze the missing value dependencies of multivariate data sets and thus improve the detection of anomalous observations.
By applying the machine learning methods discussed in this thesis, the dramatically increasing amount of information can be utilized more effectively. Data can be transformed into an understandable form, existing anomalies in it can be detected, and the results can thus be used as decision support regardless of the research area.
Translated title of the contribution: Edistysaskeleet monimuuttujadatan poikkeavuuksien tunnistamis- ja visualisointimenetelmissä
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Simula, Olli, Supervising Professor
  • Sirola, Miki, Thesis Advisor
  • Sulkava, Mika, Thesis Advisor
Publisher
Print ISBNs: 978-952-60-6111-5
Electronic ISBNs: 978-952-60-6112-2
Publication status: Published - 2015
MoE publication type: G5 Doctoral dissertation (article)

Keywords

  • machine learning
  • data mining
  • process monitoring
  • anomaly detection
  • multivariate data
  • variable selection
  • dimensionality reduction
  • self-organizing map
  • modeling
  • visualization
  • nuclear power plant
  • political data
  • car inspection data
