Post-hoc modification of linear models: combining machine learning with domain information to make solid inferences from noisy data

Research output: Journal article

BibTeX - Download

@article{445c8bdb18e4472eb503328865165efe,
title = "Post-hoc modification of linear models: combining machine learning with domain information to make solid inferences from noisy data",
abstract = "Linear machine learning models “learn” a data transformation by being exposed to examples of input with the desired output, forming the basis for a variety of powerful techniques for analyzing neuroimaging data. However, their ability to learn the desired transformation is limited by the quality and size of the example dataset, which in neuroimaging studies is often notoriously noisy and small. In these cases, it is desirable to fine-tune the learned linear model using domain information beyond the example dataset. To this end, we present a framework that decomposes the weight matrix of a fitted linear model into three subcomponents: the data covariance, the identified signal of interest, and a normalizer. Inspecting these subcomponents in isolation provides an intuitive way to inspect the inner workings of the model and assess its strengths and weaknesses. Furthermore, the three subcomponents may be altered, which provides a straightforward way to inject prior information and impose additional constraints. We refer to this process as “post-hoc modification” of a model and demonstrate how it can be used to achieve precise control over which aspects of the model are fitted to the data through machine learning and which are determined through domain information. As an example use case, we decode the associative strength between words from electroencephalography (EEG) reading data. Our results show how the decoding accuracy of two example linear models (ridge regression and logistic regression) can be boosted by incorporating information about the spatio-temporal nature of the data, domain information about the N400 evoked potential and data from other participants.",
keywords = "multivariate analysis, linear model, prior knowledge, event-related potentials, N400, EEG",
author = "{van Vliet}, Marijn and Riitta Salmelin",
year = "2019",
month = "9",
day = "26",
doi = "10.1016/j.neuroimage.2019.116221",
language = "English",
volume = "204",
pages = "1--14",
journal = "NeuroImage",
issn = "1053-8119",
}

RIS - Download

TY - JOUR

T1 - Post-hoc modification of linear models

T2 - combining machine learning with domain information to make solid inferences from noisy data

AU - van Vliet, Marijn

AU - Salmelin, Riitta

PY - 2019/9/26

Y1 - 2019/9/26

N2 - Linear machine learning models “learn” a data transformation by being exposed to examples of input with the desired output, forming the basis for a variety of powerful techniques for analyzing neuroimaging data. However, their ability to learn the desired transformation is limited by the quality and size of the example dataset, which in neuroimaging studies is often notoriously noisy and small. In these cases, it is desirable to fine-tune the learned linear model using domain information beyond the example dataset. To this end, we present a framework that decomposes the weight matrix of a fitted linear model into three subcomponents: the data covariance, the identified signal of interest, and a normalizer. Inspecting these subcomponents in isolation provides an intuitive way to inspect the inner workings of the model and assess its strengths and weaknesses. Furthermore, the three subcomponents may be altered, which provides a straightforward way to inject prior information and impose additional constraints. We refer to this process as “post-hoc modification” of a model and demonstrate how it can be used to achieve precise control over which aspects of the model are fitted to the data through machine learning and which are determined through domain information. As an example use case, we decode the associative strength between words from electroencephalography (EEG) reading data. Our results show how the decoding accuracy of two example linear models (ridge regression and logistic regression) can be boosted by incorporating information about the spatio-temporal nature of the data, domain information about the N400 evoked potential and data from other participants.

AB - Linear machine learning models “learn” a data transformation by being exposed to examples of input with the desired output, forming the basis for a variety of powerful techniques for analyzing neuroimaging data. However, their ability to learn the desired transformation is limited by the quality and size of the example dataset, which in neuroimaging studies is often notoriously noisy and small. In these cases, it is desirable to fine-tune the learned linear model using domain information beyond the example dataset. To this end, we present a framework that decomposes the weight matrix of a fitted linear model into three subcomponents: the data covariance, the identified signal of interest, and a normalizer. Inspecting these subcomponents in isolation provides an intuitive way to inspect the inner workings of the model and assess its strengths and weaknesses. Furthermore, the three subcomponents may be altered, which provides a straightforward way to inject prior information and impose additional constraints. We refer to this process as “post-hoc modification” of a model and demonstrate how it can be used to achieve precise control over which aspects of the model are fitted to the data through machine learning and which are determined through domain information. As an example use case, we decode the associative strength between words from electroencephalography (EEG) reading data. Our results show how the decoding accuracy of two example linear models (ridge regression and logistic regression) can be boosted by incorporating information about the spatio-temporal nature of the data, domain information about the N400 evoked potential and data from other participants.

KW - multivariate analysis

KW - linear model

KW - prior knowledge

KW - event-related potentials

KW - N400

KW - EEG

UR - https://aaltoimaginglanguage.github.io/posthoc

U2 - 10.1016/j.neuroimage.2019.116221

DO - 10.1016/j.neuroimage.2019.116221

M3 - Article

VL - 204

SP - 1

EP - 14

JO - NeuroImage

JF - NeuroImage

SN - 1053-8119

M1 - 116221

ER -

ID: 34657509
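
The abstract above describes decomposing the weight matrix of a fitted linear model into a data covariance, a signal of interest (pattern), and a normalizer, and then altering these subcomponents before re-assembling the model. The following NumPy sketch illustrates that idea for a single-output ridge regression; the variable names, the scalar normalizer convention and the covariance-shrinkage example are illustrative assumptions and do not reproduce the API of the authors' posthoc package (linked above).

# Minimal sketch of the decomposition and "post-hoc modification" idea, assuming
# a single-output linear model; not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 32))              # 200 trials x 32 features (e.g. EEG channels/time points)
y = X @ rng.standard_normal(32) + rng.standard_normal(200)

# Fit a plain ridge regression model: w = (X'X + alpha I)^-1 X'y
alpha = 1.0
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Decompose the weights into the three subcomponents named in the abstract,
# so that w == inv(cov) @ pattern * normalizer holds exactly.
cov = np.cov(X, rowvar=False)                   # data covariance
normalizer = w @ cov @ w                        # scalar normalizer for a single-output model
pattern = cov @ w / normalizer                  # signal of interest (Haufe-style pattern)

# Post-hoc modification: alter one subcomponent using domain information, here
# shrinking the covariance toward a scaled identity (assuming spatially
# uncorrelated noise), then re-assemble the weights.
shrinkage = 0.5
target = np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
cov_mod = (1 - shrinkage) * cov + shrinkage * target
w_mod = np.linalg.solve(cov_mod, pattern) * normalizer

With shrinkage set to 0 the re-assembled weights w_mod equal the original w, which is a quick check that the decomposition is consistent; the same recomposition step applies when the pattern or normalizer is modified instead of the covariance.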