# URL Analyzer
URL data analyzer and extractor. Detects signs of malicious activity and extracts other useful data associated with URLs.
## About
This program extracts various website information based on URL addresses. This data can be used to assess whether a given URL is malicious.
### Features
The program performs the following steps:
- Gets the domain registrar
- Gets the webpage title and automatically compares it to the domain registrar name
- Gets the initial and final destination of a given URL
- Analyzes whether the final destination domain is the same as the initial one
- Gets URL redirects and HTTP response status codes
- Fetches WHOIS data
- Gets domain timestamps such as creation, update and expiry dates
  - Exact dates and days relative to the current day
- Gets the content and number of iframes (for detecting possible XSS, Cross-Site Scripting)
- Gets URL references on a webpage
  - **Local** domain referrals
  - **External** URL referrals
  - **Multidot** URLs (ones with `../` in the URL path)
- Gets domain registrars for each URL
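
As a rough illustration of the destination comparison and the URL reference categories listed above, here is a minimal sketch using only the Python standard library. It is not taken from `url-analyzer.py`: the function names `same_domain` and `classify_reference` are hypothetical, and treating **Multidot** as its own exclusive category (checked before local/external) is an assumption.

```python
from urllib.parse import urljoin, urlsplit


def same_domain(initial_url, final_url):
    """Return True when both URLs have the same host name.

    urlsplit().hostname is already lower-cased, so the comparison
    is case-insensitive.
    """
    return urlsplit(initial_url).hostname == urlsplit(final_url).hostname


def classify_reference(page_url, href):
    """Label one href found on page_url as 'multidot', 'local' or 'external'."""
    # Inspect the raw href first: urljoin() resolves "../" segments,
    # so they would no longer be visible afterwards.
    if "../" in urlsplit(href).path:
        return "multidot"
    # Resolve relative references against the page they were found on.
    target = urljoin(page_url, href)
    return "local" if same_domain(page_url, target) else "external"


if __name__ == "__main__":
    page = "https://example.com/a/b.html"
    for href in ("/about", "https://cdn.example.net/x.js", "../img/x.png"):
        print(href, "->", classify_reference(page, href))
```

Note that this sketch compares full host names, so `www.example.com` and `example.com` count as different domains. A stricter or looser rule may be more appropriate depending on what is being analyzed.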
## Requirements
```
Python 3
Python 3 BeautifulSoup4      python-beautifulsoup4
Python 3 whois <= 0.7.3      python-whois; PyPI
Python 3 JSON Schema         python-jsonschema
Python 3 NumPy               python-numpy
Python 3 matplotlib          python-matplotlib
```
**NOTE**: Some Linux distributions may use the `python3` executable instead of `python` for Python 3.
### Other requirements
- Jupyter (recommended)
- Working DNS name resolution
- Internet connection
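
If you want to confirm that DNS name resolution works before running the analyzer, a quick preflight check could look like the following. This is only a sketch and not part of the program: the helper name `can_resolve` and its `resolver` parameter are hypothetical, and the parameter exists only so the check can be tried without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.gethostbyname):
    """Return True if resolver can turn hostname into an address."""
    try:
        resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return True


if __name__ == "__main__":
    # The result depends on the local network and DNS configuration.
    print("DNS resolution works:", can_resolve("example.com"))
```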
## Code
- `jupyter notebook (python 3)`: [Get file](code/url-analyzer.ipynb)
- `python 3`: [Get file](code/url-analyzer.py)
## Screenshots
The following screenshots were generated with `matplotlib`.
### Domains associated with HTML URL data
![](screenshots/domain_figure_hsfi.png)
![](screenshots/domain_figure_tsfi.png)
## Sample data
- `JSON sample data`: [Get file](sample_dataset.json)
## License
N/A