Pekka Helenius c2d0eca47a | 3 years ago
URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.
This program extracts various website information based on URL addresses. This data can be used to analyze the maliciousness of a given URL.
NOTE: See the sample JSON data in sample_dataset.json.
To summarize, the program performs the following steps for the listed URLs:
Gets domain registrar
Gets webpage title and automatically compares it to the domain registrar name
Gets initial and final destination of a given URL
Gets URL redirects and HTTP response status codes
Fetches WHOIS data
Gets content and number of iframes (for detecting possible XSS; Cross-Site Scripting)
Gets URL references on a webpage
(../ in the URL path)
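The iframe, URL reference and ../ checks from the list above can be sketched roughly as follows. This is a minimal illustration, not the program's own code: the program depends on BeautifulSoup4, while this sketch assumes only the standard library html.parser so that it runs without extra packages. The names PageScanner, scan and has_parent_dir_reference are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PageScanner(HTMLParser):
    """Collect iframe sources and URL references (href/src) from HTML."""

    def __init__(self):
        super().__init__()
        self.iframes = []     # src value of each <iframe> (None if missing)
        self.references = []  # href/src values found on all other tags

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "iframe":
            self.iframes.append(attr_map.get("src"))
            return
        for name in ("href", "src"):
            value = attr_map.get(name)
            if value:
                self.references.append(value)


def has_parent_dir_reference(url):
    """Return True if the path component of the URL contains '../'."""
    return "../" in urlsplit(url).path


def scan(html):
    """Parse an HTML string and summarize iframes and URL references."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "iframe_count": len(scanner.iframes),
        "iframes": scanner.iframes,
        "references": scanner.references,
    }
```

Calling scan() on the body of a fetched page returns a dictionary that could be stored alongside the other per-URL data. Only the URL path is checked for ../, so a query string containing the same characters is not flagged.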
Python 3
Python 3 BeautifulSoup4 (python-beautifulsoup4)
Python 3 whois <= 0.7.3 (python-whois; PyPI)
Python 3 JSON Schema (python-jsonschema)
Python 3 NumPy (python-numpy)
Python 3 matplotlib (python-matplotlib)
NOTE: Some Linux distributions use the python3 executable instead of python for Python 3.
The following screenshots are generated with matplotlib
Purpose - WHOIS query lookup:
Other analysis may give better insights such as:
Initial and final URL:
Domain timestamps:
Domain name & local URL usage consistency
Domain name registration:
"Legitimate websites are likely to register a domain name reflecting the brand or the service they represent."
Domain name length:
URL analysis
Robots.txt analysis:
HTML data keyterms identification
iframes and input fields
(href, src, etc.)
Non-UTF-8 character decoding not implemented
If multiple JSON data files exist, the wrong JSON data file is likely to be selected
Get URLs and other parameters from the command line and/or an associated .conf file
More data visualization and comprehensive analysis
Null data may be generated in some cases
Add (unit) tests
Improve modularity of the codebase
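As an illustration of the item about reading URLs from the command line and/or a .conf file, one possible approach is sketched below with the standard library argparse and configparser modules. Everything here is an assumption made for the example: the load_urls function, the --conf option, the [analyzer] section and its urls key do not exist in the program.

```python
import argparse
import configparser


def load_urls(argv, conf_text=None):
    """Collect URLs from CLI arguments and, optionally, from .conf content.

    conf_text lets callers pass the configuration as a string; otherwise
    the file named by the hypothetical --conf option is read.
    """
    parser = argparse.ArgumentParser(description="URL analyzer (sketch)")
    parser.add_argument("urls", nargs="*", help="URLs to analyze")
    parser.add_argument("--conf", help="path to a .conf file")
    args = parser.parse_args(argv)

    urls = list(args.urls)

    # Interpolation is disabled so that '%' in URLs is taken literally.
    config = configparser.ConfigParser(interpolation=None)
    if conf_text is not None:
        config.read_string(conf_text)
    elif args.conf:
        config.read(args.conf)

    # Hypothetical layout: one URL per indented line under "urls =".
    if config.has_option("analyzer", "urls"):
        for line in config.get("analyzer", "urls").splitlines():
            line = line.strip()
            if line and line not in urls:
                urls.append(line)
    return urls
```

Command-line URLs come first and duplicates from the configuration are skipped, so the same URL is not analyzed twice.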
N/A