Abstract
A growing volume of data is becoming available in the field of electric power systems. The hourly automatic meter reading (AMR) electricity consumption data available from small customers, such as households and small businesses, is a significant new data source. Other sources, such as geographic data, wind speed data and phasor measurement unit data, add to both the quantity and the variety of the available data. This thesis presents how these large data sets can be utilized in power system studies using statistical methodology. A large AMR data set is visualized and clustered, and consumption models are then estimated for the discovered clusters, i.e., consumer groups. Statistical modelling is applied to wind speed and wind generation data from multiple locations, with an emphasis on understanding the effect of the geographical distribution of wind power. In addition, combined statistical modelling of stochastic distributed generation (e.g., wind and solar power) and electricity consumption is presented, which allows the effects of stochastic generation to be analysed at the distribution system level. System operating conditions (e.g., power flows, consumption, wind generation) that affect the expected damping of the 0.35 Hz interarea oscillation in the Nordic power system are identified, and their use in the short-term prediction of damping is demonstrated using statistical methods. Several geographically varying risk factors that affect the expected fault rates in power distribution systems are also identified, and the use of the estimated fault rates in automatic network planning is presented. It is argued that the statistical analysis of electricity consumption and generation can also be used in automatic network planning.
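The clustering of AMR data into consumer groups can be illustrated with a small sketch. The abstract does not name the clustering algorithm, so the following assumes plain k-means on 24-hour consumption profiles normalized to unit daily energy (so that clusters reflect the shape of consumption rather than its level), with a deterministic farthest-first initialization. The function names and the synthetic "household" and "business" profiles are hypothetical and only stand in for real meter data.

```python
import random

def normalize(profile):
    """Scale a 24-hour profile so it sums to 1 (keeps the shape, drops the level)."""
    total = sum(profile)
    return [x / total for x in profile]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]

def farthest_first(profiles, k):
    """Initialization: start from the first profile, then repeatedly add the
    profile farthest from all centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        nxt = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(profiles, k, iters=50):
    """Plain k-means (assumed method): assign each profile to the nearest
    centroid, recompute centroids as cluster means, stop when stable."""
    centroids = farthest_first(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in profiles]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            new.append(mean_profile(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

def synthetic_profile(kind, rng):
    """Hypothetical test data: evening peak for households, daytime for businesses."""
    base = [0.3] * 24
    hours = range(17, 22) if kind == "household" else range(8, 17)
    for h in hours:
        base[h] = 1.0
    return [v + rng.uniform(0.0, 0.05) for v in base]

rng = random.Random(1)
data = ([synthetic_profile("household", rng) for _ in range(20)]
        + [synthetic_profile("business", rng) for _ in range(20)])
labels, centroids = kmeans([normalize(p) for p in data], k=2)
print(labels)
```

In this setting, a consumption model (for example, hourly mean and variance per cluster) would then be estimated from each cluster's members. Real AMR data would also call for a principled choice of the number of clusters, which this sketch simply fixes at two.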
Although the volume and variety of data are important in enabling data analyses, the value that can be extracted from the data using appropriate analysis methods is fundamentally the most important aspect. In this thesis, multiple data visualization techniques are presented for finding patterns in the large data sets. The discovered patterns are then modelled using statistical data models. The need to model the probability distributions of the relevant random variables in detail is emphasized. This is especially important in wind power modelling, where it was achieved using Monte Carlo simulation.
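The effect of geographical distribution on the distribution of aggregate wind generation can be illustrated with a Monte Carlo sketch. The abstract does not give the model details; the following assumes a Gaussian copula with equal pairwise correlation `rho` between sites, Weibull wind speed marginals and a hypothetical normalized turbine power curve. The thesis keywords mention copulas and vector autoregressive models, but time dependence (the autoregressive part) is deliberately left out here. Lower correlation stands in for a more geographically dispersed fleet.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

STD_NORMAL = NormalDist()

def weibull_ppf(u, scale, shape):
    """Inverse CDF of the Weibull distribution (wind speed in m/s)."""
    u = min(u, 1.0 - 1e-12)  # guard against log(0) in the far tail
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

def power_curve(v, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Hypothetical normalized power curve: output in [0, 1]."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated:
        return 1.0
    return (v ** 3 - cut_in ** 3) / (rated ** 3 - cut_in ** 3)

def simulate_aggregate(n_sites, rho, n_samples, scale=8.0, shape=2.0, seed=0):
    """Monte Carlo samples of the mean normalized output over n_sites.

    Spatial dependence (assumed): Gaussian copula with equal correlation rho,
    built from a common factor z_i = sqrt(rho)*z0 + sqrt(1 - rho)*e_i.
    """
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    samples = []
    for _ in range(n_samples):
        z0 = rng.gauss(0.0, 1.0)  # factor shared by all sites
        total = 0.0
        for _ in range(n_sites):
            z = a * z0 + b * rng.gauss(0.0, 1.0)
            v = weibull_ppf(STD_NORMAL.cdf(z), scale, shape)  # copula -> marginal
            total += power_curve(v)
        samples.append(total / n_sites)
    return samples

clustered = simulate_aggregate(n_sites=5, rho=0.9, n_samples=20000)
dispersed = simulate_aggregate(n_sites=5, rho=0.3, n_samples=20000)
print(fmean(clustered), pstdev(clustered))
print(fmean(dispersed), pstdev(dispersed))
```

Both cases share the same per-site marginal distribution, so the expected aggregate output is about the same. The dispersed case has a visibly smaller spread, which is why the full simulated distribution, and not only its mean, matters in wind power modelling.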
Translated title of the contribution: Suurten data-aineistojen tilastollinen analyysi ja soveltaminen sähkövoimajärjestelmissä

Original language: English
Qualification: Doctor's degree
Print ISBN: 9789526066097
Electronic ISBN: 9789526066103
Publication status: Published, 2015
MoE publication type: G5 Doctoral dissertation (article)
Keywords
 copula
 damping forecast
 data analysis
 electricity consumption
 fault statistics
 Monte Carlo simulation
 multiple regression model
 statistical modelling
 vector autoregressive model
 wind power
Cite this
Koivisto, M. (2015). Finding Value in Big Data - Statistical Analysis of Large Data Sets with Applications in Electric Power Systems. Aalto University. http://urn.fi/URN:ISBN:9789526066103