Data science for sociotechnical systems - from computational sociolinguistics to the smart grid

Sanja Šćepanović

Research output: Thesis › Doctoral Thesis › Collection of Articles


We live in the Information Age, characterized by exponential growth in the technological capacity to produce and store data (big data) and to process them into information and knowledge (data science). In particular, large amounts of data are produced during the interaction between people and technology in diverse sociotechnical systems. Data science, as a set of theories and techniques for distilling knowledge from data, is recognized as an effective tool to support sociotechnical systems. This dissertation consists of four projects in which we apply data science for monitoring and interventions in concrete sociotechnical systems: human dynamics, social networks, smart grid, and Web cybersecurity. By analyzing mobile phone communication from a developing country, we show how people's socio-economic factors correlate with their dynamics inferred from the data. Consequently, we demonstrate how monitoring a mobile phone network can serve as a proxy for census statistics. In developing countries, where censuses are infrequent, this can prove important. Using Twitter data, we investigate two social phenomena: homophily and the happiness paradox. In addition to finding evidence for the respective sociological theories, we also provide hypotheses for further investigation. In another, theoretical study, we propose an epidemic spreading model for multiplex networks (representing, for instance, user engagement in several social networks). The simulations reveal when the spreading dynamics of the whole system are slower than those of any individual layer. Our model can be employed by governments, companies, and others who aim to spread information using several social media. In the project on the residential smart grid, we design an intervention targeting improved sustainability. We develop a social energy app to teach and engage people in efficient practices.
In data centers, a better understanding of the interplay between computation and energy consumption is needed before interventions can be proposed. Our results are a step towards such an understanding. In the final project, given a Web crawl, we first show how the underlying distributions in this complex system differ between malicious and clean websites. Then we demonstrate how such knowledge can support detecting malware-affected websites. We conclude this dissertation by presenting a systematic overview and lessons learned from the data science process undertaken in each project.
Translated title of the contribution: Data science for sociotechnical systems - from computational sociolinguistics to the smart grid
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
  • Gionis, Aristides, Supervisor
  • Hui, Pan, Advisor, External person
  • Nurminen, Jukka, Advisor
Print ISBNs: 978-952-60-7834-2
Electronic ISBNs: 978-952-60-7835-9
Publication status: Published - 2018
MoE publication type: G5 Doctoral dissertation (article)


  • data science
  • human mobility
  • smart grid
  • computational sociolinguistics
  • Web cybersecurity
