Machine Learning for Phishing Website Detection

Yuan, Ying

Phishing attacks are on the rise and phishing websites are everywhere, exposing the brittleness of security mechanisms that rely on blocklists. Prior work proposed enhancing Phishing Website Detectors (PWD) to mitigate this threat with data-driven techniques powered by Machine Learning (ML). The main advantage of ML models is their intrinsic ability to notice weak patterns in the data that a human would overlook, and then to leverage such patterns to devise "flexible" detectors that can counter even adaptive attackers.

This dissertation addresses three significant aspects arising from the interaction between machine learning and phishing website detection: (i) adversarial attacks against machine learning-based phishing website detection (ML-PWD), (ii) user perceptions of phishing webpages, and (iii) phishing website detection in multi-language environments (i.e., Chinese and Western).

The first part examines the security of ML-based phishing website detection. Existing literature on adversarial ML focuses either on showing attacks that break every ML model, or on defenses that withstand most attacks. Unfortunately, little consideration is given to the actual cost of the attack or the defense. We formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD, and propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage. Our contribution paves the way for a much-needed re-assessment of adversarial attacks against ML systems for cybersecurity.

The second part of the dissertation presents a study to understand user perceptions of phishing and adversarial phishing webpages. Adversarial phishing webpages containing perturbations can easily fool ML-based PWD, but it remains uncertain whether these perturbations enhance individuals' ability to identify phishing webpages. Our study indicates that adversarial phishing webpages containing typos are more likely to be noticed by users.

The third and last part of the dissertation reveals the gap between Chinese and Western ML-based PWD. It urges future work on PWD to take applicability to multilingual environments into account, and paves the way for PWD systems that can protect users with different backgrounds.

Machine Learning for Phishing Website Detection / Yuan, Ying. - (2024 Mar 07).



	Title in English
	
				Machine Learning for Phishing Website Detection
			
	Year of defense
	
				7 Mar 2024
			
	Citation
	
				Machine Learning for Phishing Website Detection / Yuan, Ying. - (2024 Mar 07).
			
	Appears in collections:
	
				08.01 - Tesi di Dottorato UNIPD (Deposito Legale)

Files in this item:

File	Size	Format
tesi_Ying_Yuan.pdf (open access; description: tesi_Ying_Yuan; type: doctoral thesis; license: other)	7.21 MB	Adobe PDF