Abstract:
In this thesis, we study probabilistic methods for tracking the pitch of a musical instrument in real time. We treat pitch as a physical attribute of a musical sound that is closely related to the frequency structure of the sound. Pitch tracking is the task of detecting the pitch of a note in an online fashion. Our motivation was to develop an accurate, low-latency monophonic pitch tracking method, which would be particularly useful for musicians who play low-pitched instruments. However, accuracy and latency are conflicting objectives, so maximizing accuracy while minimizing latency is difficult. In this study, we propose and compare two probabilistic models for online pitch tracking: a Hidden Markov Model (HMM) and a Change Point Model (CPM). In contrast to past research, which has mainly focused on generic, instrument-independent pitch tracking methods, our models are instrument-specific and can be optimized to fit a particular musical instrument. In our models, each note is assumed to have a characteristic spectral shape, which we call its spectral template. The generative models are constructed so that each time slice of the audio spectrum is generated from one of these spectral templates multiplied by a volume factor. From this point of view, we treat pitch tracking as a template matching problem in which the aim is to infer the active template and its volume as the audio data are observed. In the HMM, we assume that the pitch labels have a temporal structure in which the current pitch label depends on the previous pitch label. The volume variables are independent in time, which is not a realistic assumption for musical audio. In this model, the inference scheme is standard, straightforward, and fast. In the CPM, we also introduce a temporal structure for the volume variables.
In this way, the CPM enables explicit modeling of the damping structure of an instrument. As a trade-off, the inference scheme of the CPM is much more complex than that of the HMM. Beyond a certain point, exact inference becomes impractical. For this reason, we developed an approximate inference scheme for this model. The main goal of this work is to investigate the trade-off between the latency and the accuracy of the pitch tracking system. We conducted several experiments with an implementation developed in C++. We evaluated the performance of our models by computing the most likely paths obtained from the filtering or fixed-lag smoothing distributions. The evaluation was carried out on monophonic bass guitar and tuba recordings with respect to four evaluation metrics. We also compared the results with a standard monophonic pitch tracking algorithm (YIN). Both the HMM and the CPM performed better than the YIN algorithm. The highest accuracy was obtained with the CPM, whereas the HMM was the fastest in terms of running time.
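
The template matching view of the HMM described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the C++ implementation used in the experiments. Several details are assumptions made only for illustration: a Gaussian error model with a hypothetical noise level `sigma`, a per-frame least-squares volume estimate in place of proper marginalization over the volume, and a simple transition matrix controlled by a hypothetical `stay_prob` parameter. The function names `best_volume`, `log_likelihood`, and `hmm_filter` are likewise illustrative.

```python
import math


def best_volume(frame, template):
    """Non-negative least-squares volume v minimising ||frame - v * template||^2."""
    num = sum(x * t for x, t in zip(frame, template))
    den = sum(t * t for t in template)
    return max(num / den, 0.0) if den > 0 else 0.0


def log_likelihood(frame, template, sigma=1.0):
    """Gaussian log-likelihood (up to a constant) of a frame under one template.

    Assumption: the volume is plugged in at its best-fit value for this frame,
    which treats volumes as independent in time.
    """
    v = best_volume(frame, template)
    sq_err = sum((x - v * t) ** 2 for x, t in zip(frame, template))
    return -0.5 * sq_err / sigma ** 2


def hmm_filter(frames, templates, stay_prob=0.9, sigma=1.0):
    """Online (zero-lag) filtering: return the most probable template per frame."""
    n = len(templates)
    # Assumed transition model: stay on the same pitch with stay_prob,
    # otherwise move to any other pitch with equal probability.
    switch = (1.0 - stay_prob) / (n - 1) if n > 1 else 0.0
    belief = [1.0 / n] * n  # uniform prior over pitch labels
    path = []
    for frame in frames:
        # Prediction step: propagate the previous belief through the transitions.
        predicted = [
            sum(belief[j] * (stay_prob if j == k else switch) for j in range(n))
            for k in range(n)
        ]
        # Update step: weight by the observation likelihood, computed in the
        # log domain and shifted by the maximum for numerical stability.
        loglik = [log_likelihood(frame, t, sigma) for t in templates]
        shift = max(loglik)
        unnorm = [p * math.exp(l - shift) for p, l in zip(predicted, loglik)]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        path.append(max(range(n), key=belief.__getitem__))
    return path


# Two toy "spectral templates" and three frames at different volumes.
templates = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 1.0, 0.5]]
frames = [[2.0, 1.0, 0.0, 0.0], [4.0, 2.0, 0.1, 0.0], [0.0, 0.1, 3.0, 1.5]]
print(hmm_filter(frames, templates))
```

Because each decision uses only the frames seen so far, this corresponds to the filtering case. A fixed-lag smoother would delay each decision by a few frames in exchange for accuracy, which is the latency and accuracy trade-off examined in this work.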