# URL Analyzer

URL data analyzer and extractor. Detects malicious indicators and other useful data associated with URLs.

## About

This program extracts various pieces of website information based on URL addresses. This data can be used to assess whether a given URL is malicious.

### Features

The program performs the following steps:

- Gets the domain registrar
- Gets the webpage title and automatically compares it to the domain registrar name
- Gets the initial and final destination of a given URL
- Analyzes whether the final destination domain is the same as the initial one
- Gets URL redirects and HTTP response status codes
- Fetches WHOIS data
- Gets domain timestamps such as creation, update, and expiration dates
  - Exact dates and days relative to the current day
- Gets the content and number of iframes (for detecting possible XSS, i.e. Cross-Site Scripting)
- Gets URL references on a webpage
  - **Local** domain referrals
  - **External** URL referrals
  - **Multidot** URLs (ones with `../` in the URL path)
  - Gets domain registrars for each URL

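The URL reference categories listed above can be illustrated with a minimal sketch. This is not the program's own code: the names `LinkCollector` and `classify_references` are hypothetical, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup4 so that it runs without extra packages. It assumes a reference is "local" when its hostname equals the hostname of the page, and that a multidot reference is also counted as local or external.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw href/src attribute values from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def classify_references(page_url, html):
    """Group URL references found in html into local, external and multidot."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    result = {"local": [], "external": [], "multidot": []}
    for raw in collector.links:
        # Inspect the raw value, because urljoin() resolves "../" away.
        if "../" in raw:
            result["multidot"].append(raw)
        absolute = urljoin(page_url, raw)
        host = urlsplit(absolute).hostname
        if host is None:
            continue  # no hostname, e.g. mailto: or javascript: links
        key = "local" if host == page_host else "external"
        result[key].append(absolute)
    return result
```

Calling `classify_references("https://example.com/a/b/index.html", html)` returns a dictionary with three lists. Local and external entries are absolute URLs, while multidot entries keep the raw value as written in the page.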
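The "days relative to the current day" feature can be sketched in the same spirit. The function name `relative_days` and the dictionary labels are assumptions made for illustration. The WHOIS lookup itself is not shown, and the dates are assumed to be already parsed into timezone-aware `datetime` objects.

```python
from datetime import datetime, timezone


def relative_days(timestamps, now=None):
    """Map each label to whole days from now (negative = past, positive = future)."""
    now = now or datetime.now(timezone.utc)
    # timedelta.days rounds toward negative infinity for partial days.
    return {label: (moment - now).days for label, moment in timestamps.items()}


# Hypothetical domain dates; real values would come from WHOIS data.
domain_dates = {
    "created": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "expires": datetime(2025, 1, 10, tzinfo=timezone.utc),
}
reference_day = datetime(2024, 1, 11, tzinfo=timezone.utc)
print(relative_days(domain_dates, now=reference_day))
```

Passing `now` explicitly keeps the calculation reproducible. Omitting it compares against the current UTC time.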
## Requirements

```
Python 3
Python 3 BeautifulSoup4    python-beautifulsoup4
Python 3 whois <= 0.7.3    python-whois; PyPI
Python 3 JSON Schema       python-jsonschema
Python 3 Numpy             python-numpy
Python 3 matplotlib        python-matplotlib
```

**NOTE**: Some Linux distributions use the `python3` executable instead of `python` for Python 3.

### Other requirements

- Jupyter (recommended)
- Working DNS name resolution
- Internet connection

## Code

- `jupyter notebook (python 3)`: [Get file](code/url-analyzer.ipynb)
- `python 3`: [Get file](code/url-analyzer.py)

## Screenshots

The following screenshots were generated with `matplotlib`.

### Domains associated with HTML URL data

![](screenshots/domain_figure_hsfi.png)

![](screenshots/domain_figure_tsfi.png)

## Sample data

- `JSON sample data`: [Get file](sample_dataset.json)

## License

N/A