Abstract:
A web crawler can be defined as automated software that extracts website maps by visiting all the links in a website. Website map extraction process can be used to build a basis for a web attack. Hence, crawling plays an important role in automated attacks. The most automated vulnerability scanners perform crawling before vulnerability tests in order to determine overall map and attack surface. Besides automated scanning features, crawlers can also be used for content theft. By utilising a crawler, one can copy all the pages and content of a website by visiting all pages in an orderly manner. Anti-crawling can be defined as a set of mechanisms that prevents websites from being crawled by automated crawlers. In this thesis, a set of anti-crawling mechanisms are combined into an Apache web server module called mod_antiCrawl. mod_antiCrawl is developed in C language by using Apache API and it has crawler detection and inhibition capabilities to protect servers from malicious crawlers. The performance of mod_antiCrawl has also been studied and our results show that website map discovery by crawlers decreases at least 70% after mod_antiCrawl is activated. This ratio increases to 90% by enabling different functionalities of the module.