Automatic Detection of Illegitimate Websites with Mutual Clustering
Abstract
Content on the web is often similar or identical across websites. Checking websites and their content for such similarity creates overhead for a search engine and severely affects its performance and quality. The web crawling research community has implemented several techniques to detect similar or identical web content and documents. Providing relevant data to users on the first results page is a major concern for search engines. To address these issues, we propose a methodology called Automatic Detection of Illegitimate Websites with Mutual Clustering (ADIWMC). In this paper we present a novel and effective approach to detecting similarities between web pages in web clustering. Identical and similar web pages and web content are detected by storing the crawled web pages in a repository. First, the adwords are extracted from the crawled pages, and the similarity between two pages is checked based on their usage of adwords. A threshold value is set for this check: if the similarity percentage is greater than the threshold, the similar content is reduced, which improves the repository and the quality of the search engine. The sections on the existing analysis and the proposed analysis explain how the method works.
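The threshold check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure, so Jaccard overlap is assumed here, a plain word set stands in for the extracted adwords, and the function names and the 80% threshold are hypothetical.

```python
import re


def extract_terms(text):
    """Return the set of lowercase words in a page (assumed stand-in for adwords)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity_percent(terms_a, terms_b):
    """Jaccard overlap of two term sets, as a percentage (assumed measure)."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return 100.0 * len(terms_a & terms_b) / len(union)


def deduplicate(pages, threshold=80.0):
    """Keep a page only if it is not too similar to any page already kept.

    `pages` maps a URL to its crawled text; `threshold` is a hypothetical
    cut-off in percent. Returns the URLs retained in the repository.
    """
    kept = {}  # url -> term set of each retained page
    for url, text in pages.items():
        terms = extract_terms(text)
        if all(similarity_percent(terms, other) <= threshold
               for other in kept.values()):
            kept[url] = terms
    return list(kept)


if __name__ == "__main__":
    crawled = {
        "a": "cheap flights to paris book now",
        "b": "cheap flights to paris book now today",
        "c": "python tutorial for beginners",
    }
    print(deduplicate(crawled))
```

Comparing each new page against every retained page is quadratic in the number of pages, so this only shows the decision rule and not a scalable crawler component.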
Keywords
Illegitimate; mutual clustering; web crawling; phishing
DOI: http://doi.org/10.11591/ijece.v6i3.pp995-1001
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).