dc.description.abstract |
Web mining is defined as the process of using data mining techniques to automatically discover and extract information from semi-structured or unstructured Web documents and services. This study on Web mining consists of two parts, covering Web structure mining and Web content mining. In the first part, the most widely accepted focused crawling algorithms and simple tree traversal algorithms are compared on three criteria: page relevance, keyword predicate satisfaction, and hit ratio. Using URL tokens as input resulted in higher performance on all three criteria. In the second part, a methodology for automatically finding topics in Web pages is proposed. By processing only the list items in HTML pages returned by a search engine, the methodology aims to find key concepts related to a user-defined topic. It is tested with different parameters, such as the number of pages, different keywords, and different stemming implementations. The candidate concepts, ranked by relevance score, show high precision on the user-defined topic. |
|