Finding Value in Big Data - Statistical Analysis of Large Data Sets with Applications in Electric Power Systems

Matti Koivisto

Research output: Thesis, Doctoral Thesis, Collection of Articles


A growing volume of data is becoming available in the field of electric power systems. The hourly automatic meter reading (AMR) electricity consumption data available from small customers, such as households and small businesses, is a significant new data source. Other sources, for example geographic data, wind speed data and phasor measurement unit data, add to both the quantity and the variety of the available data. This thesis presents how these large data sets can be utilized in power system studies using statistical methodology. A visualization and clustering of a large AMR data set is presented, and consumption models are then estimated for the discovered clusters, i.e., consumer groups. Statistical modelling is applied to wind speed and wind generation data from multiple locations, with the emphasis on understanding the effect of the geographical distribution of wind power. In addition, combined statistical modelling of stochastic distributed generation (e.g., wind and solar power) and electricity consumption is presented, which allows the effects of stochastic generation to be analysed at the distribution system level. System operating conditions (e.g., power flows, consumption, wind generation) that affect the expected damping of the 0.35 Hz inter-area oscillation in the Nordic power system are identified, and their use in the short-term prediction of damping is demonstrated using statistical methods. Several geographically varying risk factors affecting the expected fault rates in power distribution systems are also identified, and the use of the estimated fault rates in automatic network planning is presented. It is argued that the statistical analysis of electricity consumption and generation can also be used in automatic network planning.
Although the volume and variety of data are important in enabling data analyses, the value that can be extracted from the data using appropriate data analysis methods is fundamentally the most important aspect. In this thesis, multiple data visualization techniques are presented for finding patterns in the large data sets. The discovered patterns are then modelled using statistical data models. The need to model the probability distributions of the relevant random variables in detail is emphasized. This is especially important in wind power modelling, and was achieved using Monte Carlo simulation.
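As an illustration of the kind of Monte Carlo simulation mentioned above, the following is a minimal sketch, not the method used in the thesis. It assumes two wind sites with Weibull-distributed wind speeds linked by a Gaussian copula, and a simplified per-unit power curve. All names and parameter values (`simulate_aggregate_wind`, `power_curve`, the Weibull shape and scale, the cut-in, rated and cut-out speeds, the correlation `rho`) are hypothetical choices made for this example only.

```python
import math

import numpy as np


def power_curve(v, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Simplified per-unit turbine power curve (hypothetical parameters)."""
    # Cubic ramp between cut-in and rated speed, clipped to [0, 1].
    ramp = (v**3 - cut_in**3) / (rated**3 - cut_in**3)
    p = np.clip(ramp, 0.0, 1.0)
    # No output at or above the cut-out speed.
    return np.where(v >= cut_out, 0.0, p)


def simulate_aggregate_wind(n_samples=100_000, rho=0.6,
                            shape=2.0, scale=8.0, seed=0):
    """Monte Carlo samples of per-unit wind generation at two sites.

    Assumptions (illustrative only): Weibull marginals for wind speed,
    dependence between the sites given by a Gaussian copula with
    correlation rho, and equal installed capacity at both sites.
    """
    rng = np.random.default_rng(seed)

    # Step 1: correlated standard normal samples.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)

    # Step 2: map to uniforms with the standard normal CDF.
    normal_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    u = np.clip(normal_cdf(z), 1e-12, 1.0 - 1e-12)

    # Step 3: map uniforms to wind speeds with the Weibull inverse CDF.
    v = scale * (-np.log(1.0 - u)) ** (1.0 / shape)

    # Step 4: convert wind speed to generation and average over the sites.
    p = power_curve(v)
    return p, p.mean(axis=1)


if __name__ == "__main__":
    per_site, aggregate = simulate_aggregate_wind()
    print("std of one site:      ", per_site[:, 0].std())
    print("std of the aggregate: ", aggregate.std())
    print("5 % / 95 % quantiles: ", np.quantile(aggregate, [0.05, 0.95]))
```

Because the result is a full sample of the aggregate output rather than only its mean, quantiles and tail probabilities can be read directly from it. With a correlation below one between the sites, the aggregate should vary less than a single site, which is one simple way to see the effect of geographical distribution.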
Translated title of the contribution: Suurten data-aineistojen tilastollinen analyysi ja soveltaminen sähkövoimajärjestelmissä
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
  • Lehtonen, Matti, Supervisor
  • Lehtonen, Matti, Advisor
  • Mellin, Ilkka, Advisor
Print ISBNs: 978-952-60-6609-7
Electronic ISBNs: 978-952-60-6610-3
Publication status: Published - 2015
MoE publication type: G5 Doctoral dissertation (article)


Keywords
  • copula
  • damping forecast
  • data analysis
  • electricity consumption
  • fault statistics
  • Monte Carlo simulation
  • multiple regression model
  • statistical modelling
  • vector autoregressive model
  • wind power
