Browse Source

Add support for city information, add 'geodir' parameter

master
Pekka Helenius 2 years ago
parent
commit
2d2979b07e
2 changed files with 90 additions and 9 deletions
  1. +73
    -3
      README.md
  2. +17
    -6
      apache-logparser/logparser.py

+ 73
- 3
README.md View File

@ -42,6 +42,74 @@ run `updpkgsums && makepkg -Cfi` in [apache-logparser](apache-logparser/) direct
## Examples ## Examples
**Q: Can you list me unique connections (IP addresses) associated with country and city location data, using the last Apache log file?**
```
httpd-logparser --outfields time remote_host country city -d /var/log/httpd/ -f access_log$ -np --stats | sort -k 3 -u | sort -k 4
Processed files: access_log
Matched log entries: 724
Processed log entries: 724
2021-06-06 10:00:57 135.23.195.XXX Canada Quebec
2021-06-06 04:58:58 8.210.233.XXX China Guangzhou
2021-06-06 05:01:37 23.228.109.XXX China Shanghai
2021-06-06 04:49:57 8.210.71.XXX China Unknown: 34.772499, 113.726601
2021-06-06 09:47:32 92.151.100.XXX France Boulogne-Billancourt
2021-06-06 02:05:38 195.154.122.XXX France Ivry-sur-Seine
2021-06-06 03:24:22 92.116.45.XXX Germany Bielefeld
2021-06-06 06:06:58 207.154.218.XXX Germany Frankfurt am Main
2021-06-06 10:45:40 172.105.77.XXX Germany Frankfurt am Main
2021-06-06 00:25:20 92.116.52.XXX Germany Hamm
2021-06-06 05:02:54 159.69.10.XXX Germany Mannheim
2021-06-06 06:24:55 89.246.127.XXX Germany Schloss Holte-Stukenbrock
2021-06-06 10:08:21 138.201.56.XXX Germany Unknown: 51.299301, 9.490900
2021-06-06 03:42:02 47.31.198.XXX India Delhi
2021-06-06 00:15:16 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 02:10:21 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 02:32:48 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 03:26:22 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 06:52:23 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 07:00:48 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 11:10:59 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 00:23:05 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 02:46:33 92.118.160.XXX Lithuania Unknown: 56.000000, 24.000000
2021-06-06 05:11:20 45.131.212.XXX Netherlands Amsterdam
2021-06-06 05:12:40 185.180.143.XXX Portugal Unknown: 38.705700, -9.135900
2021-06-06 07:55:47 89.137.179.XXX Romania Timisoara
2021-06-06 06:10:46 91.243.100.XXX Russian Federation Novocherkassk
2021-06-06 11:30:51 213.177.208.XXX Spain Palencia
2021-06-06 01:41:48 184.22.158.XXX Thailand Thalang
2021-06-06 08:14:41 176.88.78.XXX Turkey Ankara
2021-06-06 08:32:04 212.82.66.XXX United Kingdom Burnham
2021-06-06 03:53:41 45.146.164.XXX United Kingdom London
2021-06-06 04:33:42 185.158.250.XXX United Kingdom Manchester
2021-06-06 10:16:19 82.10.88.XXX United Kingdom Shrewsbury
2021-06-06 10:14:28 40.77.189.XXX United States Chicago
2021-06-06 08:16:07 69.170.221.XXX United States Colorado Springs
2021-06-06 10:57:25 192.241.206.XXX United States San Francisco
2021-06-06 01:09:16 128.14.209.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 06:44:49 47.243.113.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 06:45:48 47.243.116.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 08:00:40 162.244.34.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 10:30:53 47.242.214.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 04:22:27 162.244.33.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 04:34:47 47.243.48.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 06:37:16 47.243.109.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 06:42:37 162.244.33.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 06:44:49 47.243.109.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 07:04:20 47.243.113.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 07:44:23 47.243.110.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 08:29:33 47.242.12.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 10:38:15 128.14.133.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 03:18:25 23.95.132.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 04:13:55 128.1.248.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 08:21:11 64.62.197.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 11:17:33 47.243.95.XXX United States Unknown: 37.750999, -97.821999
2021-06-06 08:03:24 167.56.236.XXX Uruguay Castillos
```
NOTE: The last numerical part of all ip addresses are anonymized with `XXX` string.
**Q: How many valid requests from Finland and Sweden occured between 15th - 24th April 2020?** **Q: How many valid requests from Finland and Sweden occured between 15th - 24th April 2020?**
``` ```
@ -258,8 +326,8 @@ httpd-logparser --outfields http_status time time_diff country -d /var/log/httpd
## Usage ## Usage
``` ```
usage: httpd-logparser [-h] -d [LOG_DIR] -f LOG_FILE [LOG_FILE ...] [-s [LOG_SYNTAX]] [-c STATUS_CODE [STATUS_CODE ...]] [-cf COUNTRY [COUNTRY ...]] [-ot [OUT_TIMEFORMAT]]
[-of OUT_FIELD [OUT_FIELD ...]] [-ng] [-dl [DAY_LOWER]] [-du [DAY_UPPER]] [-sb [SORTBY_FIELD]] [-sbr [SORTBY_FIELD_REVERSE]] [-st] [-np]
usage: httpd-logparser [-h] -d [LOG_DIR] -f LOG_FILE [LOG_FILE ...] [-s [LOG_SYNTAX]] [-c STATUS_CODE [STATUS_CODE ...]] [-cf COUNTRY [COUNTRY ...]] [-ot [OUT_TIMEFORMAT]] [-of OUT_FIELD [OUT_FIELD ...]] [-ng] [-gd [GEODB]] [-dl [DAY_LOWER]] [-du [DAY_UPPER]]
[-sb [SORTBY_FIELD]] [-sbr [SORTBY_FIELD_REVERSE]] [-st] [-np]
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
@ -276,8 +344,10 @@ optional arguments:
-ot [OUT_TIMEFORMAT], --outtimeformat [OUT_TIMEFORMAT] -ot [OUT_TIMEFORMAT], --outtimeformat [OUT_TIMEFORMAT]
Output time format. Default: "%d-%m-%Y %H:%M:%S" Output time format. Default: "%d-%m-%Y %H:%M:%S"
-of OUT_FIELD [OUT_FIELD ...], --outfields OUT_FIELD [OUT_FIELD ...] -of OUT_FIELD [OUT_FIELD ...], --outfields OUT_FIELD [OUT_FIELD ...]
Output fields. Default: log_file_name, http_status, remote_host, country, time, time_diff, user_agent, http_request
Output fields. Default: log_file_name, http_status, remote_host, country, city, time, time_diff, user_agent, http_request
-ng, --nogeo Skip country check with external "geoiplookup" tool. -ng, --nogeo Skip country check with external "geoiplookup" tool.
-gd [GEODB], --geodir [GEODB]
Database file directory for "geoiplookup" tool. Default: /usr/share/GeoIP/
-dl [DAY_LOWER], --daylower [DAY_LOWER] -dl [DAY_LOWER], --daylower [DAY_LOWER]
Do not check log entries older than this day. Day syntax: 31-12-2020 Do not check log entries older than this day. Day syntax: 31-12-2020
-du [DAY_UPPER], --dayupper [DAY_UPPER] -du [DAY_UPPER], --dayupper [DAY_UPPER]


+ 17
- 6
apache-logparser/logparser.py View File

@ -25,18 +25,19 @@ import argparse
import os import os
import re import re
import subprocess import subprocess
import sys
from datetime import datetime from datetime import datetime
from apachelogs import LogParser from apachelogs import LogParser
out_fields_list = ['log_file_name', 'http_status', 'remote_host', 'country', 'time', 'time_diff', 'user_agent', 'http_request']
out_fields_list = ['log_file_name', 'http_status', 'remote_host', 'country', 'city', 'time', 'time_diff', 'user_agent', 'http_request']
out_timeformat = "%d-%m-%Y %H:%M:%S" out_timeformat = "%d-%m-%Y %H:%M:%S"
dayformat = "%d-%m-%Y" dayformat = "%d-%m-%Y"
ot = '"' + re.sub(r'%', '%%', out_timeformat) + '"' ot = '"' + re.sub(r'%', '%%', out_timeformat) + '"'
geotool = "geoiplookup" geotool = "geoiplookup"
geodb = "/usr/share/GeoIP/GeoIP.dat"
geodb = "/usr/share/GeoIP/"
# Log format as defined in Apache/HTTPD configuration file (LogFormat directive) # Log format as defined in Apache/HTTPD configuration file (LogFormat directive)
in_log_syntax = "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{cache-status}e\""
in_log_syntax = "%h %u %t %T \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{cache-status}e\" %I %O"
argparser = argparse.ArgumentParser() argparser = argparse.ArgumentParser()
@ -48,6 +49,7 @@ argparser.add_argument('-cf', '--countryfilter', help = 'Include only these cou
argparser.add_argument('-ot', '--outtimeformat', help = 'Output time format.\nDefault: ' + ot, nargs = '?', dest = 'out_timeformat') argparser.add_argument('-ot', '--outtimeformat', help = 'Output time format.\nDefault: ' + ot, nargs = '?', dest = 'out_timeformat')
argparser.add_argument('-of', '--outfields', help = 'Output fields.\nDefault: ' + ', '.join(out_fields_list), nargs = '+', dest = 'out_field') argparser.add_argument('-of', '--outfields', help = 'Output fields.\nDefault: ' + ', '.join(out_fields_list), nargs = '+', dest = 'out_field')
argparser.add_argument('-ng', '--nogeo', help = 'Skip country check with external "geoiplookup" tool.', action='store_true', dest = 'no_geo') argparser.add_argument('-ng', '--nogeo', help = 'Skip country check with external "geoiplookup" tool.', action='store_true', dest = 'no_geo')
argparser.add_argument('-gd', '--geodir', help = 'Database file directory for "geoiplookup" tool.\nDefault: ' + geodb, nargs = '?', dest = 'geodb')
argparser.add_argument('-dl', '--daylower', help = 'Do not check log entries older than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_lower') argparser.add_argument('-dl', '--daylower', help = 'Do not check log entries older than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_lower')
argparser.add_argument('-du', '--dayupper', help = 'Do not check log entries newer than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_upper') argparser.add_argument('-du', '--dayupper', help = 'Do not check log entries newer than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_upper')
argparser.add_argument('-sb', '--sortby', help = 'Sort by an output field.', nargs = '?', dest = 'sortby_field') argparser.add_argument('-sb', '--sortby', help = 'Sort by an output field.', nargs = '?', dest = 'sortby_field')
@ -233,6 +235,7 @@ i = 0
country_seen = False country_seen = False
prev_host = "" prev_host = ""
host_country = "" host_country = ""
host_city = ""
log_entries = [] log_entries = []
for file in files: for file in files:
@ -296,12 +299,19 @@ for file in files:
country_seen = False country_seen = False
if not country_seen: if not country_seen:
host_country = subprocess.check_output([geotool, entry_remote_host]).rstrip().decode()
host_country = re.sub(r"^.*, (.*)", r'\1', host_country)
host_country_main = subprocess.check_output([geotool,'-d',geodb,entry_remote_host]).rstrip().decode()
host_country_main = host_country_main.split('\n')
host_country = re.sub(r"^.*, (.*)", r'\1', host_country_main[0])
if re.search("Address not found", host_country): if re.search("Address not found", host_country):
host_country = "Unknown" host_country = "Unknown"
else:
if len(host_country_main) > 1:
host_city = host_country_main[1].split(', ')[4]
if re.search("N/A", host_city):
host_city = "Unknown: " + host_country_main[1].split(', ')[6] + ', ' + host_country_main[1].split(', ')[7]
if country_filter: if country_filter:
for country in countries_filter_list: for country in countries_filter_list:
if country[1] == "!": if country[1] == "!":
@ -339,6 +349,7 @@ for file in files:
('http_status', entry_http_status, '{:3s}' ), ('http_status', entry_http_status, '{:3s}' ),
('remote_host', entry_remote_host, '{:15s}'), ('remote_host', entry_remote_host, '{:15s}'),
('country', host_country, '{:20s}'), ('country', host_country, '{:20s}'),
('city', host_city, '{:15s}'),
('time', entry_time, '{:8s}' ), ('time', entry_time, '{:8s}' ),
('time_diff', time_diff, '{:8s}' ), ('time_diff', time_diff, '{:8s}' ),
('user_agent', entry_user_agent, '{:s}' ), ('user_agent', entry_user_agent, '{:s}' ),
@ -353,7 +364,7 @@ for file in files:
for out_field in out_fields: for out_field in out_fields:
entry, data, striformat = out_field entry, data, striformat = out_field
if args.no_geo and entry == "country":
if args.no_geo and (entry == "country" or entry == "city"):
continue continue
if out_fields_list[t] == entry: if out_fields_list[t] == entry:


Loading…
Cancel
Save