diff --git a/README.md b/README.md
index 2647c76..5b6f3bb 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,71 @@
-# url-analyzer
+# URL Analyzer
 
-URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.
\ No newline at end of file
+URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.
+
+## About
+
+This program extracts various website information based on URL addresses. This data can be used to analyze the maliciousness of a given URL.
+
+### Features
+
+The program performs the following steps:
+
+- Gets the domain registrar
+- Gets the webpage title and automatically compares it to the domain registrar name
+- Gets the initial and final destination of a given URL
+    - Analyzes whether the final destination domain is the same as the initial one
+- Gets URL redirects and HTTP response status codes
+- Fetches WHOIS data
+    - Gets domain timestamps such as creation, update and expiration dates
+        - Exact dates and days relative to the current day
+
+- Gets the content and number of iframes (for detecting possible XSS, i.e. Cross-Site Scripting)
+
+- Gets URL references on a webpage
+    - **Local** domain referrals
+    - **External** URL referrals
+    - **Multidot** URLs (ones with `../` in the URL path)
+    - Gets domain registrars for each URL
+
+## Requirements
+
+```
+Python 3
+Python 3 BeautifulSoup4      python-beautifulsoup4
+Python 3 whois <= 0.7.3      python-whois; PyPI
+Python 3 JSON Schema         python-jsonschema
+Python 3 Numpy               python-numpy
+Python 3 matplotlib          python-matplotlib
+```
+
+**NOTE**: Some Linux distributions may use the `python3` executable instead of `python` for Python 3.
+
+### Other requirements
+
+- Jupyter (recommended)
+- Working DNS name resolution
+- Internet connection
+
+## Code
+
+- `jupyter notebook (python 3)`: [Get file](code/url-analyzer.ipynb)
+
+- `python 3`: [Get file](code/url-analyzer.py)
+
+## Screenshots
+
+The following screenshots were generated with `matplotlib`.
+
+### Domains associated with HTML URL data
+
+![](screenshots/domain_figure_hsfi.png)
+
+![](screenshots/domain_figure_tsfi.png)
+
+## Sample data
+
+- `JSON sample data`: [Get file](sample_dataset.json)
+
+## License
+
+N/A