Materials Informatics: Augmenting Materials Research with Data-driven Design and Machine Learning

Lauri Himanen

Research output: Thesis › Doctoral Thesis › Collection of Articles


Materials science is the systematic study and development of materials and their properties. Materials informatics and data-driven materials science are umbrella terms for the scientific practice of systematically extracting knowledge from data produced in materials science. This practice differs from traditional approaches in materials research in the volume of data processed and in the more automated way information is extracted. The data-driven approach, sometimes referred to as the fourth paradigm of science, is largely driven by modern hardware and software for data production and storage, the Open Science movement, and methodological developments in data mining and machine learning. This dissertation reviews how materials informatics can be applied effectively to accelerate materials science, focusing on computational, atomistic materials modelling. The topic is divided into two areas: how data-driven design and tools are being used to re-imagine the life cycle of materials data, and how machine learning in particular can complement existing research methodologies in materials science. Both areas are explored by tracing the historical development of materials informatics and by highlighting modern tools and techniques. The discussion serves as a guide for anyone interested in deploying these methods in their own research and also covers some of the key challenges that the field still faces. After this overview, the original materials informatics research performed during the doctoral studies is summarized. First, the open-source software libraries developed for materials informatics are introduced. These libraries address the automated structural classification of complex atomistic geometries and the efficient description of materials for machine learning. Next, the studies on materials discovery using data mining and machine learning are discussed. The first study leverages materials databases to search for optimal coating materials for perovskite-based photovoltaics, while the second uses machine learning to identify catalytically active sites on nanoclusters.
Translated title of the contribution: Materiaali-informatiikka: datalähtöinen suunnittelu ja koneoppiminen materiaalitieteen tukena
Original language: English
Qualification: Doctor's degree
Awarding Institution
  • Aalto University
Supervisors/Advisors
  • Rinke, Patrick, Supervising Professor
  • Foster, Adam, Thesis Advisor
Print ISBNs: 978-952-60-8950-8
Electronic ISBNs: 978-952-60-8951-5
Publication status: Published - 2020
MoE publication type: G5 Doctoral dissertation (article)


Keywords
  • materials informatics
  • materials science
  • machine learning
  • data-driven science

