Volume 4, Issue 4 (12-2012)                   itrc 2012, 4(4): 33-42

Golshani M A, Zareh Bidoki A M. IECA: Intelligent Effective Crawling Algorithm for Web Pages. itrc 2012; 4(4): 33-42
URL: http://journal.itrc.ac.ir/article-1-171-en.html
Abstract:

Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. Several crawling algorithms, such as Partial PageRank, Batch PageRank, OPIC, and FICA, have been proposed, but they have high time complexity or low throughput. To overcome these problems, we propose a new crawling algorithm called IECA, which is easy to implement and has low time complexity, O(E*logV), and low memory complexity, O(V), where V and E are the number of nodes and edges in the Web graph, respectively. Unlike the algorithms mentioned above, IECA traverses the Web graph only once, and the importance of each Web page is determined from the logarithmic distance and the weight of its incoming links. To evaluate IECA, we use three different Web graphs: UK-2005, the Web graph of the University of California, Berkeley (2008), and Iran-2010. Experimental results show that our algorithm outperforms the other crawling algorithms in discovering highly important pages.
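
The abstract does not give IECA's scoring formula, so the following is only a rough sketch of the general idea of a single-pass, priority-driven crawl, not the authors' algorithm. It assumes that following a link from a page costs log10 of that page's out-degree (an assumed form of "logarithmic distance"), ignores the weight of incoming links entirely, and uses a hypothetical in-memory `graph` dictionary and function name `crawl_order`. Pages with the smallest accumulated distance are fetched first.

```python
import heapq
import math


def crawl_order(graph, seed):
    """Return pages in the order a distance-prioritized crawler would fetch them.

    graph: dict mapping a page to the list of pages it links to
           (hypothetical stand-in for the Web graph).
    seed:  the page the crawl starts from.
    """
    dist = {seed: 0.0}          # best known distance per discovered page
    frontier = [(0.0, seed)]    # min-heap of (distance, page)
    visited = set()
    order = []

    while frontier:
        d, page = heapq.heappop(frontier)
        if page in visited:
            continue            # stale heap entry; page was already fetched
        visited.add(page)
        order.append(page)

        out_links = graph.get(page, [])
        if not out_links:
            continue
        # Assumed link cost: log10 of the source page's out-degree, so a
        # page with many out-links passes less priority along each link.
        cost = math.log10(len(out_links))
        for target in out_links:
            new_d = d + cost
            if target not in visited and new_d < dist.get(target, math.inf):
                dist[target] = new_d
                heapq.heappush(frontier, (new_d, target))

    return order
```

Each page is fetched once and each link is examined once, with a heap operation per improvement, which is consistent with an O(E*logV) running time. The lazy-deletion heap used here can hold more than V entries, so this sketch does not by itself demonstrate the O(V) memory bound stated in the abstract.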

Type of Study: Research | Subject: Information Technology

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.