From fdd21069351628d32682e319a2139ae112a1568d Mon Sep 17 00:00:00 2001
From: Pekka Helenius
Date: Fri, 12 Mar 2021 00:56:26 +0200
Subject: [PATCH] Update README

---
 README.md | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7bb6fea..af72acf 100644
--- a/README.md
+++ b/README.md
@@ -64,13 +64,53 @@ The following screenshots are generated with `matplotlib`
 
 ![](screenshots/domain_figure_tsfi.png)
 
+**Purpose - WHOIS query lookup**:
+- Phishing campaigns register domains of websites from the same registrar
+
+### Other analysis would reveal more
+
+Other analysis may give better insights such as:
+
+- **Initial and final URL**:
+  - "Even if victim realizes he/she is visiting phishing website, he/she will be likely to report the randomly-generated URL of the visited website, and not that of the redirecting one, which makes blacklisting unable to stop the scam"
+  - Phishing URLs may use multiple redirections to avoid blacklist detection
+
+- **Domain timestamps**:
+  - Domains bought for short period of time (i.e. only one year) to avoid blacklisting
+  - Domains are created/updated just before URL creation
+
+- **Domain name & local URL usage consistency**
+
+- **Domain name registration**:
+- "Legitimate websites are likely to register a domain name reflecting the brand or the service they represent."
+
+- **Domain name length**:
+  - In phishing websites, URL tends to be much longer than legitimate websites. However, domains themselves tend to be much shorter (without TLD)
+
+- **URL analysis**
+  - Phishing URLs often contain more number of dots and subdomains than legitimate URLs
+  - "Researchers have observed that more than half of the phishing URLs are shortened to obfuscate the target URL and to hide malignant intentions rather than to gain character space"
+
+- **Robots.txt analysis**:
+  - Legitimate robots.txt redirects bots to a legitimate domain rather than to the original phishing domain
+
+**HTML data keyterms identification**
+- Analysis of
+  - Starting URL
+  - Landing URL
+  - title
+  - text content
+  - copyright marks
+  - number of `iframe`s and `input` fields
+  - Reference links (`href`, `src`, etc.)
+
 ## Known bugs issues and missing features
 
 - Non-UTF-8 character decoding not implemented
 
 - If multiple JSON data files exist, a wrong JSON data file is likely selected
 
-- Get URLs and other parameters from command line
+- Get URLs and other parameters from command line and/or associated `.conf` file
 
 - More data visualization and compherensive analysis
 
@@ -78,6 +118,8 @@ The following screenshots are generated with `matplotlib`
 
 - Add (unit) tests
 
+- Improve modularity of the codebase
+
 ## License
 
 N/A