WEB mining issues: topic finding and focused crawling evaluation

Uluhan, Eray.

dc.contributor	Graduate Program in Management Information Systems.
dc.contributor.advisor	Badur, Bertan Yılmaz.
dc.contributor.author	Uluhan, Eray.
dc.date.accessioned	2023-03-16T12:52:00Z
dc.date.available	2023-03-16T12:52:00Z
dc.date.issued	2006.
dc.identifier.other	MIS 2006 U48
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/18197
dc.description.abstract	Web mining is defined as the process of using data mining techniques to automatically discover and extract information from semi-structured or unstructured Web documents and services. This study on Web mining consists of two parts, covering Web structure mining and Web content mining. In the first part, the most widely accepted focused crawling algorithms and simple tree traversal algorithms are compared on three criteria: page relevance, keyword predicate satisfaction, and hit ratio. Using URL tokens as input resulted in higher performance on all criteria. In the second part, a methodology for automatically finding topics in Web pages is proposed. By processing only the list items on HTML pages returned from a search engine, it aims to find key concepts related to a user-defined topic. The methodology is tested with varying parameters, such as the number of pages, different keywords, and stemming implementations. The candidate concepts, ordered by relevance score, show high precision on the user-defined topic.
dc.format.extent	30cm.
dc.publisher	Thesis (M.A.)-Bogazici University. Institute for Graduate Studies in Social Sciences, 2006.
dc.relation	Includes appendices.
dc.subject.lcsh	Data mining.
dc.subject.lcsh	Web databases.
dc.subject.lcsh	Management information systems.
dc.title	WEB mining issues: topic finding and focused crawling evaluation
dc.format.pages	x, 70 leaves;