URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.
Pekka Helenius e31765edf9 Prettify sample JSON data 3 years ago


URL Analyzer

URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.

About

This program extracts various website information based on URL addresses. The extracted data can be used to assess whether a given URL is malicious.

Features

The program performs the following steps:

  • Gets domain registrar

  • Gets webpage title and automatically compares it to the domain registrar name

  • Gets initial and final destination of a given URL

    • Analyzes whether the final destination domain is the same as the initial one
  • Gets URL redirects and HTTP response status codes

  • Fetches WHOIS data

    • Gets domain timestamps such as creation, update and expiration dates
      • Exact dates and days relative to the current date
  • Gets the content and number of iframes (to help detect possible XSS, Cross-Site Scripting)

  • Gets URL references on a webpage

    • Local domain referrals
    • External URL referrals
    • Multidot URLs (ones with ../ in the URL path)
      • Gets domain registrars for each URL
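A few of the checks listed above can be illustrated with a minimal sketch. It is not taken from url-analyzer.py: it uses only the Python standard library instead of BeautifulSoup4, and every name in it (LinkCollector, same_domain, classify_links, days_relative) is hypothetical. It also assumes that a multidot URL is recorded in its own list in addition to being classified as local or external.

```python
from datetime import date
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect anchor hrefs and iframe sources from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            # Keep an entry even for iframes without a src attribute.
            self.iframes.append(attrs.get("src") or "")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def same_domain(url_a, url_b):
    """Return True if both URLs point to the same host name."""
    return urlsplit(url_a).hostname == urlsplit(url_b).hostname


def classify_links(page_url, html):
    """Split the links found in html into local, external and multidot."""
    collector = LinkCollector()
    collector.feed(html)
    result = {"local": [], "external": [], "multidot": [],
              "iframes": collector.iframes}
    for raw in collector.links:
        # Resolve relative references against the page URL.
        absolute = urljoin(page_url, raw)
        if "../" in raw:
            # Assumption: multidot URLs are tracked as written in the page.
            result["multidot"].append(raw)
        key = "local" if same_domain(page_url, absolute) else "external"
        result[key].append(absolute)
    return result


def days_relative(event_day, today):
    """Days from today to event_day (negative if event_day is in the past)."""
    return (event_day - today).days


if __name__ == "__main__":
    sample = ('<a href="/about">About</a>'
              '<a href="../up/page.html">Up</a>'
              '<iframe src="https://ads.example.net/f"></iframe>')
    print(classify_links("https://example.com/dir/index.html", sample))
    print(days_relative(date(2020, 1, 1), date(2020, 1, 11)))
```

The same same_domain helper could be used to compare the initial and final destination of a redirect chain, and days_relative could be applied to WHOIS creation, update and expiration dates once they are parsed into date objects.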

Requirements

Python 3
Python 3 BeautifulSoup4   python-beautifulsoup4
Python 3 whois <= 0.7.3   python-whois; PyPI
Python 3 JSON Schema      python-jsonschema
Python 3 Numpy            python-numpy
Python 3 matplotlib       python-matplotlib

NOTE: On some Linux distributions, the Python 3 executable is named python3 instead of python.
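To confirm that the listed libraries are available to the interpreter, a small check like the following may help. It is a sketch and not part of the project: the function name missing_modules is hypothetical, and the list uses the import names of the packages (bs4 for BeautifulSoup4, whois for python-whois). It does not verify the whois <= 0.7.3 version constraint.

```python
import importlib.util
import sys

# Import names (not distribution package names) of the listed requirements.
REQUIRED_MODULES = ["bs4", "whois", "jsonschema", "numpy", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3,):
        sys.exit("Python 3 is required")
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) or "none")
```

Run it with the same executable you intend to use for the program (python or python3), since each interpreter has its own set of installed packages.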

Other requirements

  • Jupyter (recommended)
  • Working DNS name resolution
  • Internet connection

Code

Screenshots

The following screenshots were generated with matplotlib.

Domains associated with HTML URL data

Sample data

License

N/A