Abstract:
In many audio processing tasks, such as source separation, denoising or compression, it is crucial to construct realistic and flexible models that capture the physical properties of audio signals. In the Bayesian framework, this can be accomplished through the use of appropriate prior distributions. In this thesis, we describe two prior models, Gamma Markov chains (GMCs) and Gamma Markov random fields (GMRFs), to model the sparsity and the local dependency of the energies of time-frequency expansion coefficients. We build two audio models in which the variances of the source coefficients are modelled with GMCs and GMRFs, and the source coefficients are Gaussian conditioned on the variances. The application of these models is not limited to variance modelling of audio sources: they can be used in other problems where there is dependency between variables, such as Poisson observation models. In single-channel source separation using non-negative matrix factorisation (NMF), we make use of GMCs to model the dependencies in frequency templates and excitation vectors.

A GMC defines a prior distribution over the variance variables such that they are correlated along the time or frequency axis, while a GMRF describes a non-normalised joint distribution in which each variance variable depends on all the adjoining variance variables. In our audio models, the source coefficients are independent conditional on the variances and distributed as zero-mean Gaussians. Our construction ensures a positive coupling between the variance variables, so that the signal energy changes smoothly along both axes, capturing temporal and/or spectral continuity. The coupling strength is controlled by a set of hyperparameters.
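The hierarchical structure described above can be sketched as follows; this is a simplified illustration with our own notation ($s_k$ for a source coefficient, $v_k$ for its variance), not the exact parameterisation used in the thesis:

\[
s_k \mid v_k \sim \mathcal{N}(0, v_k),
\qquad
p(v_{1:K}) = p(v_1) \prod_{k=2}^{K} p(v_k \mid v_{k-1}),
\]

where each transition kernel $p(v_k \mid v_{k-1})$ is a Gamma-type density centred near $v_{k-1}$, so large variances tend to be followed by large variances and small by small; the transitions are chosen so that the full conditionals stay in the Gamma family, which is what makes inference convenient.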
Inference in the overall model, i.e., a GMC or GMRF coupled with a Gaussian or Poisson observation model, is convenient because of the conditional conjugacy of all the variables in the model, but automatic optimisation of the hyperparameters is crucial to obtain better fits. In GMCs, hyperparameter optimisation can be carried out using the Expectation-Maximisation (EM) algorithm, with the E-step approximated by the posterior distribution estimated by the inference algorithm. In this optimisation, it is important for the inference algorithm to estimate the covariances between the inferred variables, because the hyperparameter updates depend on them.

The marginal likelihood of the GMRF model is not available because of the intractable normalising constant; thus, the hyperparameters of a GMRF cannot be optimised using maximum likelihood estimation. There are methods to estimate the optimal hyperparameters in such cases, including pseudolikelihood, contrastive divergence and score matching. However, only contrastive divergence is readily applicable to models with latent variables, so we optimised the hyperparameters of our GMRF-based audio model using contrastive divergence.

We tested our GMC- and GMRF-based audio models on denoising and single-channel source separation problems in which all the hyperparameters are jointly estimated given only the audio data. Both models provided promising results, but the signals reconstructed by the GMRF model were slightly better and more natural sounding.

Our third model makes use of Gamma and GMC prior distributions in an NMF setting for single-channel source separation. The hyperparameters are again optimised during the inference phase, and the model needs almost no other design decisions. This model performs substantially better than the previous two, and in addition it is less demanding in terms of computational power.
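As a concrete illustration of the positive coupling between variance variables, the following is a minimal, hypothetical sketch: a chain of Gamma-distributed variances, each centred on its predecessor, with zero-mean Gaussian coefficients drawn conditionally. This is not the thesis's actual GMC construction (which is parameterised to preserve full conditional conjugacy); it only shows how a large shape parameter yields smooth energy envelopes.

```python
import numpy as np

def sample_gmc_variances(K, a=100.0, v0=1.0, seed=None):
    """Sample a simplified Gamma-chain of variances.

    Each v[k] is Gamma(shape=a, scale=v[k-1]/a), so its mean equals
    v[k-1]; a large shape `a` gives strong positive coupling, i.e. a
    smoothly varying energy envelope. (Simplified illustration only.)
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    v[0] = v0
    for k in range(1, K):
        v[k] = rng.gamma(a, v[k - 1] / a)
    return v

def sample_coefficients(v, seed=None):
    """Source coefficients: zero-mean Gaussian given their variances."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, np.sqrt(v))

v = sample_gmc_variances(200, a=100.0, seed=0)
s = sample_coefficients(v, seed=1)
```

In a denoising or separation setting the roles would be reversed: the coefficients are observed (noisily), and the chain of variances is inferred from them rather than sampled forwards.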
However, it is designed only for source separation; unlike the previous two models, it is not a general audio model.