Efficient predictive models and data analysis techniques for the analysis of photometric and spectroscopic observations of galaxies are not only desirable but required, in view of the overwhelming quantities of data becoming available. We present the results of a novel application of Bayesian latent variable modelling techniques. We have formulated a data-driven algorithm that allows one to explore the stellar populations of a large sample of galaxies from their spectra, without applying detailed physical models. Our only assumption is that the galaxy spectrum can be expressed as a linear superposition of a small number of independent factors, each a spectrum of a stellar subpopulation that cannot be individually observed. We then formulate a probabilistic latent variable architecture that explicitly encodes this assumption, and employ a rigorous Bayesian methodology to solve the inverse modelling problem from the available data. A powerful aspect of this method is that it builds a density model of the spectra, which allows us to handle observational errors. Further, we can recover missing data either from the original set of spectra, which may have incomplete spectral coverage for some galaxies, or from previously unseen spectra of the same kind. We apply this method to a sample of 21 ultraviolet-optical spectra of well-studied early-type galaxies, for which we also derive detailed physical models of star formation history (i.e. age, metallicity and relative mass fraction of the component stellar populations). We also apply it to synthetic spectra made up of two stellar populations, spanning a large range of parameters. We use four different data models, starting with a formulation of the widely used principal component analysis (PCA).
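The linear-superposition assumption and the recovery of missing spectral coverage can be illustrated with the PCA baseline mentioned above. The sketch below is a minimal, non-Bayesian illustration under stated assumptions: the two "subpopulation" templates, the 200-bin wavelength grid and the helper `reconstruct_missing` are all hypothetical, and missing bins are filled by an ordinary least-squares fit of the component weights rather than by the paper's Bayesian density model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: each "galaxy spectrum" is a non-negative mix of two
# invented subpopulation spectra over 200 wavelength bins, plus small noise.
n_bins, n_gal = 200, 21
wave = np.linspace(0.0, 1.0, n_bins)
old = 1.0 + 0.8 * wave        # invented red-sloped "old population" template
young = 2.0 - 1.5 * wave      # invented blue-sloped "young population" template
templates = np.vstack([old, young])                 # shape (2, n_bins)
weights = rng.uniform(0.1, 1.0, size=(n_gal, 2))
X = weights @ templates + 0.01 * rng.standard_normal((n_gal, n_bins))

# PCA via SVD of the mean-centred data matrix; keep k leading components.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 2
components = Vt[:k]                                 # shape (k, n_bins)

def reconstruct_missing(spectrum, observed_mask, mean, components):
    """Hypothetical helper: fit the k component weights by least squares on
    the observed bins only, then evaluate the model at every bin."""
    A = components[:, observed_mask].T             # (n_observed, k)
    b = spectrum[observed_mask] - mean[observed_mask]
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mean + coef @ components

# A previously unseen spectrum whose blue end (first 80 bins) is missing.
true_new = np.array([0.5, 0.7]) @ templates
mask = np.ones(n_bins, dtype=bool)
mask[:80] = False
filled = reconstruct_missing(true_new, mask, mean, components)
max_err = np.abs(filled[~mask] - true_new[~mask]).max()
print(f"max error in recovered bins: {max_err:.4f}")
```

Because these toy spectra are exact two-component mixtures plus weak noise, two principal components span them and the missing blue bins can be predicted from the red ones. Real spectra with more populations, stronger noise or non-linear effects would not be recovered this cleanly.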
We explore alternative factor models that relax the physically unrealistic assumption of Gaussian factors and exclude the negative flux values that PCA allows, and show that these models perform equally well or better while yielding more physically acceptable results. In particular, the more physically motivated assumptions of our rectified factor analysis enable it to outperform PCA and to recover physically meaningful results. We find that our data-driven Bayesian modelling allows us to identify those early-type galaxies that contain a significant stellar population ≲1 Gyr old. The experiment also shows that our sample of early-type spectra contains no evidence of more than two major stellar populations differing significantly in age and metallicity. This method will help us to search for such young populations in large ensembles of early-type galaxy spectra without fitting detailed models, and thereby to study the physical processes governing the formation and evolution of early-type galaxies, particularly those leading to the suppression of star formation in dense environments. The method would also be a very useful tool for automatically discovering interesting subclasses of galaxies, for example post-starburst or E+A galaxies. © 2005 RAS.
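The contrast between PCA and factor models that forbid negative flux can be sketched in a few lines. The paper's rectified factor analysis is a Bayesian model and is not reproduced here; as a much simpler, swapped-in stand-in that illustrates only the non-negativity idea, the sketch uses non-negative matrix factorisation (NMF) with Lee and Seung's multiplicative updates. The templates, the helper `nmf` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented toy spectra: non-negative mixtures of two non-negative templates
# (a red continuum ramp and a blue bump), so the true factors have no negative flux.
n_bins, n_gal = 150, 21
wave = np.linspace(0.0, 1.0, n_bins)
templates = np.vstack([1.0 + 0.8 * wave, np.exp(-((wave - 0.2) / 0.1) ** 2)])
X = rng.uniform(0.1, 1.0, size=(n_gal, 2)) @ templates

# PCA basis vectors are orthonormal, so for these strictly positive templates
# they cannot all be non-negative: some entries must be negative.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pca_factors = Vt[:2]

def nmf(X, k, n_iter=2000, eps=1e-12, seed=0):
    """Hypothetical helper: factorise X ~ A @ F with multiplicative updates.
    Starting from positive values, A (weights) and F (factor spectra) stay
    non-negative because every update multiplies by a non-negative ratio."""
    r = np.random.default_rng(seed)
    A = r.uniform(0.1, 1.0, size=(X.shape[0], k))
    F = r.uniform(0.1, 1.0, size=(k, X.shape[1]))
    for _ in range(n_iter):
        F *= (A.T @ X) / (A.T @ A @ F + eps)
        A *= (X @ F.T) / (A @ F @ F.T + eps)
    return A, F

A, F = nmf(X, k=2)
rel_err = np.linalg.norm(X - A @ F) / np.linalg.norm(X)
print("PCA factors contain negative flux:", bool((pca_factors < 0).any()))
print("NMF factors contain negative flux:", bool((F < 0).any()))
print(f"NMF relative reconstruction error: {rel_err:.2e}")
```

NMF keeps both weights and factor spectra non-negative, mirroring the physical requirement that flux cannot be negative, but unlike the Bayesian approach described above it neither models observational noise nor provides posterior uncertainties.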
- galaxies: elliptical and lenticular
- galaxies: stellar content
- methods: data analysis