@ -64,13 +64,53 @@ The following screenshots are generated with `matplotlib`
![](screenshots/domain_figure_tsfi.png)
**Purpose of the WHOIS query lookup**:
- Phishing campaigns tend to register the domains of their websites through the same registrar
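The registrar check can be sketched as a simple frequency count over WHOIS results. This minimal Python sketch assumes the WHOIS records have already been fetched and parsed into dictionaries with a `registrar` key. The record layout, the helper names `registrar_counts` and `shared_registrars`, and the `min_domains` threshold are all hypothetical and not taken from this project.

```python
from collections import Counter


def registrar_counts(whois_records):
    """Count how many domains were registered through each registrar.

    Assumption: each record is a dict with a "registrar" key (hypothetical
    layout; real WHOIS output varies by registry and by library).
    """
    counts = Counter()
    for record in whois_records:
        registrar = record.get("registrar")
        if registrar:  # skip records with no registrar information
            # Normalise case and whitespace so "Registrar A" == "registrar a "
            counts[registrar.strip().lower()] += 1
    return counts


def shared_registrars(whois_records, min_domains=2):
    """Return registrars used by at least `min_domains` of the domains."""
    return {
        registrar: n
        for registrar, n in registrar_counts(whois_records).items()
        if n >= min_domains
    }


records = [
    {"domain": "secure-login.test", "registrar": "Registrar A"},
    {"domain": "account-verify.test", "registrar": "registrar a "},
    {"domain": "news-site.test", "registrar": "Registrar B"},
]
print(shared_registrars(records))  # {'registrar a': 2}
```

A registrar shared by many suspicious domains is only a weak signal on its own, since large registrars also serve most legitimate sites.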
### Other analyses could reveal more
Other analyses may give better insights, such as:
- **Initial and final URL**:
  - "Even if victim realizes he/she is visiting phishing website, he/she will be likely to report the randomly-generated URL of the visited website, and not that of the redirecting one, which makes blacklisting unable to stop the scam"
  - Phishing URLs may use multiple redirections to avoid blacklist detection
- **Domain timestamps**:
  - Domains are bought for a short period of time (e.g. only one year) to avoid blacklisting
  - Domains are created or updated shortly before the phishing URL is created
- **Domain name & local URL usage consistency**
- **Domain name registration**:
  - "Legitimate websites are likely to register a domain name reflecting the brand or the service they represent."
- **Domain name length**:
  - Phishing URLs tend to be much longer than those of legitimate websites. However, the domain names themselves (excluding the TLD) tend to be much shorter
- **URL analysis**:
  - Phishing URLs often contain more dots and subdomains than legitimate URLs
  - "Researchers have observed that more than half of the phishing URLs are shortened to obfuscate the target URL and to hide malignant intentions rather than to gain character space"
- **Robots.txt analysis**:
  - A robots.txt copied from the legitimate website points bots to the legitimate domain rather than to the phishing domain that hosts it
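The **Initial and final URL** analysis can be approached by recording the whole redirect chain and deriving a few features from it. With the `requests` library, a GET request follows redirects by default, and the chain can be built as `[r.url for r in response.history] + [response.url]`. The sketch below takes that list as input so it runs without network access; the function name `redirect_features` and its feature names are hypothetical.

```python
from urllib.parse import urlsplit


def redirect_features(chain):
    """Summarise a redirect chain given as a list of URLs, first to last.

    chain[0] is the initial (e.g. emailed or shortened) URL and chain[-1]
    is the final landing URL that the victim actually sees.
    """
    if not chain:
        raise ValueError("redirect chain must contain at least one URL")
    hosts = [urlsplit(url).hostname for url in chain]
    return {
        "initial_url": chain[0],
        "final_url": chain[-1],
        "num_redirects": len(chain) - 1,
        "distinct_hosts": len(set(hosts)),
        # True when the landing page lives on a different host than the link
        "host_changed": hosts[0] != hosts[-1],
    }


chain = [
    "http://short.test/abc",
    "http://redirect.test/r?id=1",
    "https://xk3j9q.test/login",
]
print(redirect_features(chain)["num_redirects"])  # 2
```

Keeping both ends of the chain matters because, as quoted above, victims tend to report only the final URL.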
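The **Domain timestamps** heuristics reduce to date arithmetic once the WHOIS dates are available. This sketch assumes the creation, expiration and update dates have already been parsed into `datetime` objects, and that the time the URL was first observed is known. The function names and the thresholds in `looks_short_lived` are hypothetical illustrations, not values from this project.

```python
from datetime import datetime


def domain_timestamp_features(created, expires, updated, url_first_seen):
    """Derive timing features (in whole days) from parsed WHOIS dates.

    Assumption: all arguments are datetime objects already extracted from
    the WHOIS record and from wherever the URL was first observed.
    """
    return {
        # Length of the paid registration, e.g. ~365 for a one-year purchase
        "registration_days": (expires - created).days,
        # How old the domain was when the URL first appeared
        "age_at_url_days": (url_first_seen - created).days,
        # How long before the URL appeared the WHOIS record was last updated
        "days_since_update": (url_first_seen - updated).days,
    }


def looks_short_lived(features, max_registration_days=366, max_age_days=30):
    """Hypothetical rule: short registration AND URL seen soon after creation."""
    return (
        features["registration_days"] <= max_registration_days
        and features["age_at_url_days"] <= max_age_days
    )


features = domain_timestamp_features(
    created=datetime(2023, 1, 10),
    expires=datetime(2024, 1, 10),
    updated=datetime(2023, 1, 12),
    url_first_seen=datetime(2023, 1, 15),
)
print(features)
# {'registration_days': 365, 'age_at_url_days': 5, 'days_since_update': 3}
print(looks_short_lived(features))  # True
```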
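The **Domain name length** and **URL analysis** items describe lexical features that can be computed from the URL string alone. This is a minimal sketch using only `urllib.parse.urlsplit`. It naively treats the last hostname label as the TLD (so suffixes such as `co.uk` are split incorrectly), and the `SHORTENER_HOSTS` set is a hypothetical, non-exhaustive example rather than a curated list.

```python
from urllib.parse import urlsplit

# Hypothetical, non-exhaustive set of URL-shortening hosts.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "ow.ly", "is.gd"}


def url_lexical_features(url):
    """Compute simple lexical features of a URL.

    Naive assumption: the TLD is only the last hostname label, so
    multi-label suffixes such as "co.uk" are split incorrectly.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".") if host else []
    # Second-level label, i.e. the domain name without its TLD
    domain = labels[-2] if len(labels) >= 2 else host
    # Everything left of the domain counts as a subdomain (including "www")
    subdomains = labels[:-2]
    return {
        "url_length": len(url),
        "domain_length": len(domain),
        "num_dots": url.count("."),
        "num_subdomains": len(subdomains),
        "is_shortened": host in SHORTENER_HOSTS,
    }


print(url_lexical_features(
    "http://login.secure.account.example.test/verify/index.php?id=42"
))
# {'url_length': 63, 'domain_length': 7, 'num_dots': 5,
#  'num_subdomains': 3, 'is_shortened': False}
```

A public-suffix-aware splitter would be needed for reliable domain lengths on real data.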
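The **Robots.txt analysis** idea can be sketched by collecting every absolute URL mentioned in the file (for example in `Sitemap:` lines) and flagging hosts that do not belong to the domain serving it. This sketch assumes the robots.txt body has already been downloaded as text; the function name, the regular expression and the "subdomains count as the same site" rule are hypothetical choices.

```python
import re
from urllib.parse import urlsplit

# Absolute http(s) URLs, e.g. in "Sitemap:" lines.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def foreign_hosts_in_robots(robots_txt, hosting_domain):
    """Return hostnames referenced in robots.txt that are not the hosting domain.

    A robots.txt copied from a legitimate site may still point bots
    (e.g. via its Sitemap URL) at the legitimate domain.
    """
    hosting_domain = hosting_domain.lower()
    foreign = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        for url in URL_RE.findall(line):
            host = (urlsplit(url).hostname or "").lower()
            # Treat the hosting domain and its subdomains as the same site
            same_site = host == hosting_domain or host.endswith("." + hosting_domain)
            if host and not same_site:
                foreign.add(host)
    return foreign


robots = (
    "User-agent: *\n"
    "Disallow: /admin/\n"
    "Sitemap: https://www.bank.example/sitemap.xml\n"
)
print(foreign_hosts_in_robots(robots, "phish-site.test"))  # {'www.bank.example'}
```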
**HTML data key term identification**:
- Analysis of:
  - Starting URL
  - Landing URL
  - Title
  - Text content
  - Copyright marks
  - Number of `iframe`s and `input` fields
  - Reference links (`href`, `src`, etc.)
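Most of the HTML key terms listed above (all except the starting and landing URLs) can be extracted in one pass with the standard library's `html.parser.HTMLParser`. This is a minimal sketch, not the project's parser. The class and function names, the choice of `href`/`src`/`action` as link attributes, and the naive copyright-mark pattern are assumptions for illustration.

```python
import re
from html.parser import HTMLParser


class KeyTermExtractor(HTMLParser):
    """Collect title, visible text, link targets and iframe/input counts."""

    LINK_ATTRS = {"href", "src", "action"}  # attributes treated as links

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&copy;" arrives as "©"
        self.title = ""
        self.text_parts = []
        self.links = []
        self.counts = {"iframe": 0, "input": 0}
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        for name, value in attrs:
            if name in self.LINK_ATTRS and value:
                self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract_key_terms(html):
    """Return a dict of HTML key terms for one page."""
    parser = KeyTermExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    return {
        "title": parser.title.strip(),
        "text": text,
        # Naive: a copyright mark runs until the next "." or "|"
        "copyright_marks": re.findall(r"(?:©|copyright)[^.|]*", text, re.IGNORECASE),
        "num_iframes": parser.counts["iframe"],
        "num_inputs": parser.counts["input"],
        "links": parser.links,
    }


page = """<html><head><title>Sign in - Example Bank</title>
<script>var x = 1;</script></head>
<body><form action="https://collector.test/post">
<input name="user"><input type="password" name="pass">
</form><iframe src="https://ads.test/frame"></iframe>
<a href="/help">Help</a><p>&copy; 2024 Example Bank</p></body></html>"""

terms = extract_key_terms(page)
print(terms["title"], terms["num_inputs"], terms["num_iframes"])
# Sign in - Example Bank 2 1
```

Comparing the extracted links and copyright marks against the domain hosting the page is one way to spot a cloned brand page.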
## Known bugs, issues and missing features
- Decoding of non-UTF-8 characters is not implemented
- If multiple JSON data files exist, a wrong JSON data file is likely selected
- Get URLs and other parameters from the command line and/or an associated `.conf` file
- More data visualization and comprehensive analysis
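The missing command-line and `.conf` support could be covered with the standard library's `argparse` and `configparser`. This is only a sketch of one possible design. The `[analysis]` section, the whitespace-separated `urls` option, the `timeout` option and its default are hypothetical names, not existing settings of this project. Command-line values override the file.

```python
import argparse
import configparser


def load_settings(argv=None):
    """Merge settings from an optional .conf file and the command line.

    Hypothetical layout: an [analysis] section with whitespace-separated
    "urls" and a numeric "timeout"; command-line values win over the file.
    """
    parser = argparse.ArgumentParser(description="Phishing URL analysis (sketch)")
    parser.add_argument("urls", nargs="*", help="URLs to analyse")
    parser.add_argument("--config", help="path to an optional .conf file")
    parser.add_argument("--timeout", type=float, help="request timeout in seconds")
    args = parser.parse_args(argv)

    settings = {"urls": [], "timeout": 10.0}  # hypothetical defaults
    if args.config:
        conf = configparser.ConfigParser()
        conf.read(args.config)  # note: a missing file is silently ignored
        if conf.has_section("analysis"):
            section = conf["analysis"]
            settings["urls"] = section.get("urls", fallback="").split()
            settings["timeout"] = section.getfloat("timeout", fallback=settings["timeout"])

    # Command-line values take precedence over the .conf file
    if args.urls:
        settings["urls"] = args.urls
    if args.timeout is not None:
        settings["timeout"] = args.timeout
    return settings
```

For example, `load_settings(["--config", "analysis.conf", "https://example.test/"])` would read the file first and then replace its URL list with the one given on the command line.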
@ -78,6 +118,8 @@ The following screenshots are generated with `matplotlib`