Pekka Helenius 3df3cb660d | 3 years ago | |
URL data analyzer and extractor. Detect malicious signs and other useful data associated with URLs.
This program extracts various website information based on URL addresses. The data can be used to assess how likely a given URL is to be malicious.
NOTE: For sample JSON data, see sample_dataset.json.
In summary, the program performs the following steps for each listed URL:
Gets domain registrar
Gets webpage title and automatically compares it to the domain registrar name
Gets initial and final destination of a given URL
Gets URL redirects and HTTP response status codes
Fetches WHOIS data
Gets content and number of iframes (for detecting possible XSS; Cross-Site Scripting)
Gets URL references on a webpage
(../ in the URL path)
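A few of the steps above (page title, iframe count, URL references, and the `../` path check) can be sketched as follows. This is a minimal illustration and not the project's actual code: it uses the standard library `html.parser` instead of BeautifulSoup4 so that it runs without extra packages, and the names `PageScanner`, `scan_html`, and `has_dot_dot_segment` are hypothetical. Fetching the page, following redirects, and the WHOIS lookup are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class PageScanner(HTMLParser):
    """Collect the page title, iframe sources, and link targets (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.iframes = []     # src attribute of each <iframe> (None if absent)
        self.references = []  # href attribute of each <a> that has one
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "iframe":
            self.iframes.append(attrs.get("src"))
        elif tag == "a" and attrs.get("href"):
            self.references.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is accumulated.
        if self._in_title:
            self.title += data


def has_dot_dot_segment(url):
    """Return True if the percent-decoded URL path contains a '..' segment."""
    path = unquote(urlsplit(url).path)
    return ".." in path.split("/")


def scan_html(html):
    """Parse an HTML string and summarize title, iframes, and references."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    return {
        "title": scanner.title.strip(),
        "iframe_count": len(scanner.iframes),
        "iframes": scanner.iframes,
        "references": scanner.references,
    }
```

The result of `scan_html` is a plain dictionary, so it could be serialized to JSON alongside the other collected data. The `..` check only looks at whole path segments, which means a path such as `/a..b/c` is not flagged.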
Python 3
Dependency | Package
---|---
Python 3 BeautifulSoup4 | python-beautifulsoup4
Python 3 whois <= 0.7.3 | python-whois; PyPI
Python 3 JSON Schema | python-jsonschema
Python 3 NumPy | python-numpy
Python 3 matplotlib | python-matplotlib
NOTE: Some Linux distributions may use the python3 executable instead of python for Python 3.
The screenshots (see the screenshots directory) were generated with matplotlib.
Decoding of non-UTF-8 characters is not implemented
If multiple JSON data files exist, the wrong JSON data file is likely to be selected
Get URLs and other parameters from command line
More data visualization and comprehensive analysis
Null data may be generated in some cases
Add (unit) tests
N/A