Abstract
Web robots are now used to pursue a number of useful navigational goals, such as statistical analysis, link checking, and resource collection. However, Web crawlers form a distinct group of users whose traversals should not be counted in regular traffic analysis. Left in, they distort site decisions of many kinds, including marketing campaigns, site restructuring, site personalization, and server balancing. It is therefore necessary to detect robots correctly and as early as possible, so that they operate only within the site's security policy. In this paper, we propose a crawler guard that detects and blocks unauthorized robots under the security policy. It quickly differentiates robots by function (navigational goal), ensuring that only welcome robots that obey the security policy are allowed to view protected Web pages. Our experiment examines how precisely the crawler guard can identify a robot's viewing goal within a limited number of Web page hits. The experimental results show that the crawler guard achieves 100% detection accuracy with a request count smaller than 8.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, JM. (2013). A Crawler Guard for Quickly Blocking Unauthorized Web Robot. In: Wang, G., Ray, I., Feng, D., Rajarajan, M. (eds) Cyberspace Safety and Security. CSS 2013. Lecture Notes in Computer Science, vol 8300. Springer, Cham. https://doi.org/10.1007/978-3-319-03584-0_1
DOI: https://doi.org/10.1007/978-3-319-03584-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03583-3
Online ISBN: 978-3-319-03584-0
eBook Packages: Computer Science (R0)