Abstract:
This study describes and evaluates the techniques we developed for the question analysis and information retrieval (IR) module of a closed-domain Turkish factoid Question Answering (QA) system intended to support high-school students in their education. Question analysis, which extracts from a question the information needed to determine what is being asked and how to approach answering it, is one of the most crucial components of a QA system. We therefore propose novel methods for two major problems in question analysis, namely focus extraction and question classification, based on integrating a rule-based approach with a Hidden Markov Model (HMM) based sequence classification approach; both make use of the dependency relations among the words in the question. We also investigate the IR module, another critical component of a QA system, and introduce an IR module that efficiently gathers the information relevant to a given question, from which the answer is then determined. The IR module searches for relevant documents and passages through the combined use of the search engines Indri and Apache Lucene. The solutions to these problems constitute a framework on top of which a complete QA system can easily be built by adding only an answering module. All solutions are compared against baseline models. This study also offers a manually collected and annotated gold-standard data set for further research in this area.
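To make the idea of HMM-based sequence classification for focus extraction more concrete, the sketch below runs Viterbi decoding over a question, labeling each word as part of the focus or not. It is a minimal illustration under stated assumptions, not the method used in this study: the two-state tagset (`FOCUS`/`OTHER`), the use of dependency relation labels as observations, the function name `viterbi`, and all probabilities are hypothetical toy values chosen only for demonstration.

```python
import math

# Hypothetical two-state tagset: is a word part of the question focus or not?
STATES = ("FOCUS", "OTHER")
# Floor probability for observations never seen with a given state (assumption).
FLOOR = 1e-6


def _log(p):
    """Log of p, substituting the floor for zero probabilities."""
    return math.log(p if p > 0 else FLOOR)


def viterbi(observations, start_p, trans_p, emit_p):
    """Return the most likely state sequence for `observations` under an HMM.

    Scores are accumulated in log space to avoid underflow on long questions.
    """
    # best[t][s] = (log-prob of the best path ending in state s at step t, previous state)
    first = observations[0]
    best = [{s: (_log(start_p[s]) + _log(emit_p[s].get(first, FLOOR)), None) for s in STATES}]
    for obs in observations[1:]:
        column = {}
        for s in STATES:
            # Best predecessor state r for reaching s at this step.
            score, prev = max((best[-1][r][0] + _log(trans_p[r][s]), r) for r in STATES)
            column[s] = (score + _log(emit_p[s].get(obs, FLOOR)), prev)
        best.append(column)
    # Backtrack from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(best) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return list(reversed(path))


# Toy parameters (hypothetical); observations are dependency relation labels.
start_p = {"FOCUS": 0.3, "OTHER": 0.7}
trans_p = {
    "FOCUS": {"FOCUS": 0.6, "OTHER": 0.4},
    "OTHER": {"FOCUS": 0.3, "OTHER": 0.7},
}
emit_p = {
    "FOCUS": {"SUBJECT": 0.5, "MODIFIER": 0.4, "SENTENCE": 0.1},
    "OTHER": {"SUBJECT": 0.1, "MODIFIER": 0.3, "SENTENCE": 0.6},
}

question_relations = ["MODIFIER", "SUBJECT", "SENTENCE"]
tags = viterbi(question_relations, start_p, trans_p, emit_p)
print(tags)  # ['FOCUS', 'FOCUS', 'OTHER']
```

In a hybrid setup such as the one described above, a rule-based component could handle question patterns it recognizes with certainty and defer the remaining cases to a decoder like this one; how the two are actually combined in this study is not specified here.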