DScribe: Library of descriptors for machine learning in materials science

Tutkimustuotos: Lehtiartikkelivertaisarvioitu



  • Nanolayers Research Computing Ltd
  • Norwegian University of Science and Technology
  • Technische Universität München
  • Graduate School Materials Science in Mainz
  • Kanazawa University


DScribe is a software package for machine learning that provides popular feature transformations (“descriptors”) for atomistic materials simulations. DScribe accelerates the application of machine learning for atomistic property prediction by providing user-friendly, off-the-shelf descriptor implementations. The package currently contains implementations for Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Function (ACSF) and Smooth Overlap of Atomic Positions (SOAP). Usage of the package is illustrated for two different applications: formation energy prediction for solids and ionic charge prediction for atoms in organic molecules. The package is freely available under the open-source Apache License 2.0. Program summary: Program Title: DScribe Program Files doi: http://dx.doi.org/10.17632/vzrs8n8pk6.1 Licensing provisions: Apache-2.0 Programming language: Python/C/C++ Supplementary material: Supplementary Information as PDF Nature of problem: The application of machine learning for materials science is hindered by the lack of consistent software implementations for feature transformations. These feature transformations, also called descriptors, are a key step in building machine learning models for property prediction in materials science. Solution method: We have developed a library for creating common descriptors used in machine learning applied to materials science. We provide an implementation the following descriptors: Coulomb matrix, Ewald sum matrix, sine matrix, Many-body Tensor Representation (MBTR), Atom-centered Symmetry Functions (ACSF) and Smooth Overlap of Atomic Positions (SOAP). The library has a python interface with computationally intensive routines written in C or C++. The source code, tutorials and documentation are provided online. A continuous integration mechanism is set up to automatically run a series of regression tests and check code coverage when the codebase is updated.


JulkaisuComputer Physics Communications
TilaJulkaistu - 2019
OKM-julkaisutyyppiA1 Julkaistu artikkeli, soviteltu

Lataa tilasto

Ei tietoja saatavilla

ID: 37482191