Abstract:
It is increasingly common for an event to be recorded by multiple individuals, owing to the proliferation of recording devices such as smartphones. When properly aligned, these recordings provide several audio and visual perspectives on a scene, which enables applications in restoration, remastering, and remixing in various fields. In this study, we formulate the problem of aligning multiple unsynchronized audio sequences in a probabilistic framework. We propose a novel, model-based approach built on a template generative model. From this template we define six generative models that together cover the main feature types (real-valued, positive, binary, and categorical). From each model we derive a scoring function that evaluates the quality of an alignment; these functions allow us to penalize non-overlapping alignments and to align a single sequence against a set of pre-aligned sequences. Given such a cost or score function, we propose a heuristic sequential search algorithm and a Gibbs sampler to find the optimal alignment of the sequences on the surfaces defined by the derived score functions. In addition, we propose a multiresolution alignment algorithm that combines Sequential Monte Carlo (SMC) samplers with the proposed sequential search method. The models and appropriate features are extensively evaluated on artificial and real-life data sets. The simulation results suggest that the approach can handle difficult, ambiguous scenarios and partial matchings where simple baseline methods such as correlation fail.
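For orientation, the simple correlation baseline mentioned above can be sketched as follows. This is a minimal illustration of the baseline only, not of the proposed generative models or score functions. It assumes two one-dimensional feature sequences sampled at the same rate; the function name `best_lag_by_correlation` is hypothetical, and the sketch relies on NumPy's `np.correlate`.

```python
import numpy as np


def best_lag_by_correlation(x, y):
    """Return the lag k (in samples) that maximizes sum_n x[n + k] * y[n].

    A positive k means y starts k samples after the start of x;
    a negative k means y starts before x. (Hypothetical helper for
    illustration; not taken from the paper.)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove the mean so a constant offset does not dominate the score.
    x = x - x.mean()
    y = y - y.mean()
    # In 'full' mode the output covers lags -(len(y) - 1) .. len(x) - 1.
    corr = np.correlate(x, y, mode="full")
    lags = np.arange(-(len(y) - 1), len(x))
    return int(lags[np.argmax(corr)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(200)
    y = x[50:120]  # y is an excerpt of x that starts at sample 50
    print(best_lag_by_correlation(x, y))
```

A baseline of this kind picks a single peak of the correlation surface, which is one reason it can be unreliable when the overlap is short or the content is repetitive.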