Algorithm of Web Page Similarity Comparison Based on Visual Block

Xingchen Li (1), Weizhe Zhang (1, 2), Desheng Wang (1), Bin Zhang (2) and Hui He (1)

  1. School of Computer Science and Technology, Harbin Institute of Technology
    Harbin, China
    16s003084@stu.hit.edu.cn, (wzzhang,wangdesheng0821,hehui)@hit.edu.cn
  2. Cyberspace Security Research Center
    Peng Cheng Laboratory, Shenzhen, China
    bin.zhang@pcl.ac.cn

Abstract

Phishing pages often deceive users because their layout closely resembles that of the genuine pages, and they cause considerable losses to society. Consequently, detecting phishing sites has become an urgent task. By studying phishing web pages built from web page screenshots, we find that such pages use numerous screenshots to achieve close similarity to the genuine page while evading text and structure similarity detection. This study introduces a new similarity matching algorithm based on visual blocks. First, the RenderLayer tree of the web page is obtained to extract the visual blocks. Second, an algorithm is designed to tidy up the jumbled visual blocks, including the deletion of small visual blocks and the handling of overlapping visual blocks. Finally, the similarity between the two web pages is assessed. The proposed algorithm is run with different thresholds to obtain the optimal miss and false alarm rates.
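
The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the `Block` type, the `min_area` and `overlap_ratio` parameters, the rule of dropping a block that is mostly covered by a larger one, and the area-weighted intersection-over-union score are all assumptions chosen for illustration. Extraction of blocks from the RenderLayer tree is not shown; the blocks are assumed to arrive as rectangles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Axis-aligned visual block in page pixels (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def intersection_area(a: Block, b: Block) -> float:
    """Area shared by two blocks, 0.0 if they do not overlap."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def iou(a: Block, b: Block) -> float:
    """Intersection over union of two blocks."""
    inter = intersection_area(a, b)
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


def clean_blocks(blocks, min_area=100.0, overlap_ratio=0.8):
    """Assumed clean-up rule: drop small blocks, then drop any block
    that is mostly covered by a larger block already kept."""
    large = sorted((b for b in blocks if b.area >= min_area),
                   key=lambda b: b.area, reverse=True)
    kept = []
    for b in large:
        covered = any(intersection_area(b, k) / b.area >= overlap_ratio
                      for k in kept)
        if not covered:
            kept.append(b)
    return kept


def _directed(src, dst) -> float:
    """Area-weighted best-match IoU of each block in src against dst."""
    total = sum(b.area for b in src)
    return sum(b.area * max(iou(b, c) for c in dst) for b in src) / total


def page_similarity(blocks_a, blocks_b) -> float:
    """Assumed score in [0, 1]: mean of the two directed scores."""
    if not blocks_a or not blocks_b:
        return 0.0
    return (_directed(blocks_a, blocks_b) + _directed(blocks_b, blocks_a)) / 2


# Usage with made-up rectangles; the 0.9 threshold is a placeholder.
raw = [Block(0, 0, 800, 100), Block(0, 100, 800, 500),
       Block(10, 10, 5, 5), Block(0, 100, 790, 490)]
genuine = clean_blocks(raw)
suspect = [Block(0, 0, 400, 300)]
is_similar = page_similarity(genuine, suspect) >= 0.9
```

Raising or lowering the decision threshold in this sketch trades missed phishing pages against false alarms, which is the kind of tuning the abstract describes.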

Key words

phishing, similarity comparison, visual block, web rendering

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS180915028L

Publication information

Volume 16, Issue 3 (October 2019)
Recent Advances in Information Processing and Security
Year of Publication: 2019
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

How to cite

Li, X., Zhang, W., Wang, D., Zhang, B., He, H.: Algorithm of Web Page Similarity Comparison Based on Visual Block. Computer Science and Information Systems, Vol. 16, No. 3, 815–830. (2019), https://doi.org/10.2298/CSIS180915028L