@ -1,3 +1,295 @@ | |||
# apache-logparser | |||
# Apache log parser | |||
Simple Apache/HTTPD log parser for short analysis | |||
Simple Apache/HTTPD command-line log parser for short analysis, aimed at web server administration tasks.
Unix-like systems only.
## Motivation | |||
Keep it simple. Very simple. | |||
Although advanced and nice-looking log analytics tools such as [Elastic Stack](https://www.elastic.co/products/) exist (I have used it), I wanted something far simpler and with far less overhead for weekly tasks and for configuring an Apache web server. Therefore, I wrote this simple Python script to parse Apache web server logs.
The **advantages** of this tool are low overhead, output that can be piped to other Unix tools, and quick log checks. The main idea is to produce the desired output for short analysis so that you can configure your web server protection mechanisms and network environment based on actual server data.
This tool is not meant for intrusion detection or prevention, and it does not alert administrators about hostile penetration attempts. However, it may reveal simple underlying misconfigurations such as invalid URL references on your site.
## Requirements | |||
The following Arch Linux packages are required. If you use another distribution, refer to the corresponding packages:
``` | |||
python | |||
python-apachelogs | |||
``` | |||
[python-apachelogs](https://github.com/jwodder/apachelogs/) is available neither in the official Arch Linux repositories nor in the AUR. Therefore, I provide a PKGBUILD file to install it: [python-apachelogs - PKGBUILD](python-apachelogs/PKGBUILD)
`python-apachelogs` depends in turn on the [python-pydicti](python-apachelogs/python-pydicti/PKGBUILD) package.
Recommended packages for IP address geo-location: | |||
``` | |||
geoip | |||
geoip-database | |||
``` | |||
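As a rough illustration of what the geo-location step involves, here is a minimal sketch that extracts a country name from one line of `geoiplookup` output. It assumes the usual output form `GeoIP Country Edition: FI, Finland`; the helper name `country_from_geoiplookup` is hypothetical and is not part of `httpd-logparser`.

```python
def country_from_geoiplookup(output: str) -> str:
    """Return the country name from geoiplookup output, or "Unknown".

    Assumed input form: "GeoIP Country Edition: FI, Finland".
    """
    stripped = output.strip()
    if not stripped:
        return "Unknown"
    # Only the first line (country edition) is considered in this sketch
    line = stripped.splitlines()[0]
    if "Address not found" in line:
        return "Unknown"
    # Drop the "GeoIP Country Edition" label, then split "FI, Finland"
    _, _, rest = line.partition(": ")
    _code, sep, name = rest.partition(", ")
    return name if sep else "Unknown"
```

Splitting on the first `", "` keeps names that themselves contain a comma, such as `Korea, Republic of`, intact.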
## Installation | |||
Arch Linux: | |||
Run `updpkgsums && makepkg -Cfi` in the [apache-logparser](apache-logparser/) directory. This installs the `httpd-logparser` executable in `/usr/bin/`.
## Examples | |||
**Q: How many valid requests from Finland and Sweden occurred between 15th and 24th April 2020?**
``` | |||
httpd-logparser --outfields time http_status country -d /var/log/httpd/ -c ^20* -f access_log* -cf Finland Sweden -dl "15-04-2020" -du "24-04-2020" --sortby time --stats | |||
Processing file: access_log | |||
Processing file: access_log.1 | |||
Processing file: access_log.2 | |||
Processing file: access_log.3 | |||
Processing file: access_log.4 | |||
Processing log entry: 883 | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
2020-04-17 08:47:05 200 Finland | |||
... | |||
... | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:07 200 Finland | |||
2020-04-23 18:04:08 200 Finland | |||
Processed files: access_log, access_log.1, access_log.2, access_log.3, access_log.4
Processed log entries: 883 | |||
Matched log entries: 211 | |||
``` | |||
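The country filter used above (`-cf Finland Sweden`) can be pictured with the following minimal sketch. It is one possible reading of the include/exclude semantics, not the script's exact control flow: names prefixed with `\!` are treated as exclusions, everything else as inclusions, and matching is case-insensitive. The function name `country_allowed` is hypothetical.

```python
def country_allowed(host_country, filters):
    """Decide whether a log entry from host_country passes the filters.

    Filters prefixed with a backslash and "!" (as typed on the shell
    command line) exclude a country; all other filters include one.
    """
    name = host_country.lower()
    include, exclude = set(), set()
    for item in filters:
        if item.startswith("\\!"):
            exclude.add(item[2:].lower())
        else:
            include.add(item.lower())
    # Exclusions win; with no inclusions given, everything else passes
    if name in exclude:
        return False
    return not include or name in include
```

For example, `country_allowed("Italy", ["Finland", "Sweden"])` is false, while `country_allowed("Italy", ["\\!Finland"])` is true.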
**Q: How many redirects have occurred since 1st April 2020?**
``` | |||
httpd-logparser --outfields time http_status country -d /var/log/httpd/ -c ^30* -f access_log* -dl "01-04-2020" --sortby time --stats | |||
Processing file: access_log | |||
Processing file: access_log.1 | |||
Processing file: access_log.2 | |||
Processing file: access_log.3 | |||
Processing file: access_log.4 | |||
Processing log entry: 8993 | |||
2020-04-01 02:13:12 302 United States | |||
2020-04-01 02:13:12 302 United States | |||
2020-04-01 02:13:13 301 United States | |||
2020-04-01 02:13:13 302 United States | |||
2020-04-01 02:13:14 302 United States | |||
2020-04-01 02:13:14 302 United States | |||
2020-04-01 02:13:14 302 United States | |||
2020-04-01 02:13:15 302 United States | |||
2020-04-01 02:13:15 302 United States | |||
2020-04-01 03:25:06 302 United States | |||
2020-04-01 04:03:39 302 Russian Federation | |||
2020-04-01 04:03:44 302 Russian Federation | |||
... | |||
... | |||
2020-05-01 18:53:05 302 Italy | |||
2020-05-01 18:53:21 301 Italy | |||
2020-05-01 18:53:22 301 Italy | |||
2020-05-01 18:53:24 302 Italy | |||
2020-05-01 18:53:25 302 Italy | |||
2020-05-01 18:53:26 302 Italy | |||
2020-05-01 18:53:26 302 Italy | |||
2020-05-01 18:54:20 302 Italy | |||
2020-05-01 19:18:15 301 Russian Federation | |||
2020-05-01 19:18:15 301 Russian Federation | |||
2020-05-01 19:18:15 301 Russian Federation | |||
2020-05-01 19:18:17 301 Russian Federation | |||
2020-05-01 19:21:19 302 France | |||
Processed files: access_log, access_log.1, access_log.2, access_log.3, access_log.4 | |||
Processed log entries: 8994 | |||
Matched log entries: 3207 | |||
``` | |||
**Q: How many `4XX` codes have clients from China and the United States produced over all time?**
``` | |||
httpd-logparser --outfields time country http_status http_request -d /var/log/httpd/ -c ^4 -f access_log* -cf "United States" China --sortby time --stats | |||
Processing file: access_log | |||
Processing file: access_log.1 | |||
Processing file: access_log.2 | |||
Processing file: access_log.3 | |||
Processing file: access_log.4 | |||
Processing log entry: 10221 | |||
2020-03-29 18:49:34 United States 408 None | |||
2020-03-29 18:49:34 United States 408 None | |||
2020-03-29 19:28:02 China 408 None | |||
2020-04-08 06:14:48 China 400 GET /phpMyAdmin/scripts/setup.php HTTP/1.1 | |||
2020-04-08 06:14:53 China 400 GET /horde/imp/test.php HTTP/1.1 | |||
2020-04-08 06:14:54 China 400 GET /login?from=0.000000 HTTP/1.1 | |||
... | |||
... | |||
2020-04-24 10:40:16 United States 403 GET /MAPI/API HTTP/1.1 | |||
2020-04-24 11:33:16 United States 403 GET /owa/auth/logon.aspx?url=https%3a%2f%2f1%2fecp%2f HTTP/1.1 | |||
2020-04-24 13:00:12 United States 403 GET /cgi-bin/luci HTTP/1.1 | |||
2020-04-24 13:00:13 United States 403 GET /dana-na/auth/url_default/welcome.cgi HTTP/1.1 | |||
2020-04-24 13:00:15 United States 403 GET /remote/login?lang=en HTTP/1.1 | |||
2020-04-24 13:00:17 United States 403 GET /index.asp HTTP/1.1 | |||
2020-04-24 13:00:18 United States 403 GET /htmlV/welcomeMain.htm HTTP/1.1 | |||
2020-04-24 20:08:20 United States 403 GET /dana-na/auth/url_default/welcome.cgi HTTP/1.1 | |||
2020-04-24 20:08:22 United States 403 GET /remote/login?lang=en HTTP/1.1 | |||
2020-04-25 03:57:39 United States 403 GET /home.asp HTTP/1.1 | |||
2020-04-25 03:57:39 United States 403 GET /login.cgi?uri= HTTP/1.1 | |||
2020-04-25 03:57:39 United States 403 GET /vpn/index.html HTTP/1.1 | |||
2020-04-25 03:57:39 United States 403 GET /cgi-bin/luci HTTP/1.1 | |||
2020-04-25 03:57:40 United States 403 GET /dana-na/auth/url_default/welcome.cgi HTTP/1.1 | |||
2020-04-25 03:57:40 United States 403 GET /remote/login?lang=en HTTP/1.1 | |||
2020-04-25 03:57:40 United States 403 GET /index.asp HTTP/1.1 | |||
2020-04-25 03:57:40 United States 403 GET /htmlV/welcomeMain.htm HTTP/1.1 | |||
2020-04-25 11:56:32 United States 403 GET /owa/auth/logon.aspx?url=https%3a%2f%2f1%2fecp%2f HTTP/1.1 | |||
2020-04-25 21:29:50 United States 403 GET /images/favicon-32x32.png HTTP/1.1 | |||
2020-04-25 21:30:08 United States 408 None | |||
Processed files: access_log, access_log.1, access_log.2, access_log.3, access_log.4 | |||
Processed log entries: 10222 | |||
Matched log entries: 90 | |||
``` | |||
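Status codes given with `-c` are regular expressions, not shell globs, so `^4` matches every `4XX` code and `^20*` matches a `2` followed by zero or more zeros, which is any `2XX` code. A minimal sketch of how such patterns can be expanded against a list of known codes is shown below. The function name `expand_status_patterns` is hypothetical, and the short code list in the usage note is only an example.

```python
import re


def expand_status_patterns(patterns, valid_codes):
    """Return the valid status codes matched by any of the regex patterns.

    Raises ValueError when a pattern matches no known status code.
    """
    matched = []
    for pattern in patterns:
        hits = [code for code in valid_codes if re.search(pattern, code)]
        if not hits:
            raise ValueError(f"Invalid status code '{pattern}' supplied")
        for code in hits:
            # Keep first-seen order and avoid duplicates across patterns
            if code not in matched:
                matched.append(code)
    return matched
```

With the codes `["200", "226", "301", "403", "404", "500"]`, the pattern `^4` selects `403` and `404`, and `^20*` selects `200` and `226`.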
**Q: Which user agents have clients used over all time?**
``` | |||
httpd-logparser --outfields user_agent -d /var/log/httpd/ -f access_log* --noprogress | sort -u | |||
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) | |||
fasthttp | |||
Go-http-client/1.1 | |||
HTTP Banner Detection (https://security.ipip.net) | |||
kubectl/v1.12.0 (linux/amd64) kubernetes/0ed3388 | |||
libwww-perl/5.833 | |||
libwww-perl/6.06 | |||
libwww-perl/6.43 | |||
Microsoft Office Word 2014 | |||
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) | |||
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) | |||
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728) | |||
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Tablet PC 2.0) | |||
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2) | |||
... | |||
... | |||
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0 | |||
Mozilla/5.0 (X11; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0 | |||
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0 | |||
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0 | |||
Mozilla/5.0 zgrab/0.x | |||
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12sns; +http://researchscan.comsys.rwth-aachen.de) | |||
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13rl; +http://researchscan.comsys.rwth-aachen.de) | |||
NetSystemsResearch studies the availability of various services across the internet. Our website is netsystemsresearch.com | |||
None | |||
python-requests/1.2.3 CPython/2.7.16 Linux/4.14.165-102.185.amzn1.x86_64 | |||
python-requests/2.10.0 | |||
python-requests/2.19.1 | |||
python-requests/2.22.0 | |||
python-requests/2.23.0 | |||
python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1062.12.1.el7.x86_64 | |||
python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1062.18.1.el7.x86_64 | |||
Python-urllib/3.7 | |||
Ruby | |||
Wget/1.19.4 (linux-gnu) | |||
WinHTTP/1.1 | |||
``` | |||
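If you want counts instead of a unique list, the piped output can also be post-processed in Python. The following is a minimal sketch, assuming one output field per line as in the command above; the helper name `count_values` is hypothetical.

```python
from collections import Counter


def count_values(lines):
    """Count identical non-empty lines, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common()
```

In a small script, calling `count_values(sys.stdin)` on the piped `httpd-logparser ... --noprogress` output would give each user agent with its number of occurrences.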
**Q: What is the time difference between consecutive requests from a single client? Exclude Finland and include only the most recent access_log file.**
``` | |||
httpd-logparser --outfields http_status time time_diff country -d /var/log/httpd/ -cf "\!Finland" -f access_log$ | |||
200 2020-05-01 18:53:07 +2.0 Italy | |||
200 2020-05-01 18:53:19 +12.0 Italy | |||
200 2020-05-01 18:53:20 +1.0 Italy | |||
200 2020-05-01 18:53:20 0.0 Italy | |||
200 2020-05-01 18:53:21 +1.0 Italy | |||
200 2020-05-01 18:53:20 -1.0 Italy | |||
200 2020-05-01 18:53:21 +1.0 Italy | |||
200 2020-05-01 18:53:21 0.0 Italy | |||
301 2020-05-01 18:53:21 0.0 Italy | |||
301 2020-05-01 18:53:22 +1.0 Italy | |||
200 2020-05-01 18:53:22 0.0 Italy | |||
200 2020-05-01 18:53:22 0.0 Italy | |||
200 2020-05-01 18:53:23 +1.0 Italy | |||
200 2020-05-01 18:53:23 0.0 Italy | |||
302 2020-05-01 18:53:24 +1.0 Italy | |||
200 2020-05-01 18:53:24 0.0 Italy | |||
200 2020-05-01 18:53:25 +1.0 Italy | |||
302 2020-05-01 18:53:25 0.0 Italy | |||
302 2020-05-01 18:53:26 +1.0 Italy | |||
302 2020-05-01 18:53:26 0.0 Italy | |||
200 2020-05-01 18:53:26 0.0 Italy | |||
200 2020-05-01 18:53:27 +1.0 Italy | |||
200 2020-05-01 18:53:32 +5.0 Italy | |||
302 2020-05-01 18:54:20 +48.0 Italy | |||
408 2020-05-01 18:54:40 +20.0 Italy | |||
... | |||
... | |||
200 2020-05-01 22:14:36 NEW_CONN Russian Federation | |||
200 2020-05-01 22:30:40 +964.0 Russian Federation | |||
500 2020-05-01 22:35:01 NEW_CONN Singapore | |||
500 2020-05-01 22:35:06 +5.0 Singapore | |||
500 2020-05-01 22:35:09 +3.0 Singapore | |||
500 2020-05-01 22:35:14 +5.0 Singapore | |||
200 2020-05-01 22:37:47 NEW_CONN Russian Federation | |||
... | |||
... | |||
``` | |||
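The `time_diff` column above compares each entry with the previous one: the same host yields a signed difference in seconds, and a different host yields `NEW_CONN`. A minimal sketch of that idea follows. The function name `time_diffs` and the `(host, time)` tuple layout are assumptions made for illustration.

```python
from datetime import datetime


def time_diffs(entries):
    """Label each (host, time) entry relative to the previous entry."""
    labels = []
    prev_host = prev_time = None
    for host, stamp in entries:
        if prev_host is None:
            # Very first entry: nothing to compare against
            label = "0.0"
        elif host != prev_host:
            label = "NEW_CONN"
        else:
            delta = (stamp - prev_time).total_seconds()
            # Positive differences get an explicit plus sign
            label = f"+{delta}" if delta > 0 else str(delta)
        labels.append(label)
        prev_host, prev_time = host, stamp
    return labels
```

Negative values such as `-1.0` appear when log lines from the same host are not written in strict time order.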
## Usage | |||
``` | |||
usage: httpd-logparser [-h] -d [LOG_DIR] -f LOG_FILE [LOG_FILE ...] [-s [LOG_SYNTAX]] [-c STATUS_CODE [STATUS_CODE ...]] [-cf COUNTRY [COUNTRY ...]] [-ot [OUT_TIMEFORMAT]] | |||
[-of OUT_FIELD [OUT_FIELD ...]] [-ng] [-dl [DAY_LOWER]] [-du [DAY_UPPER]] [-sb [SORTBY_FIELD]] [-sbr [SORTBY_FIELD_REVERSE]] [-st] [-np] | |||
optional arguments: | |||
-h, --help show this help message and exit | |||
-d [LOG_DIR], --dir [LOG_DIR] | |||
Apache log file directory. | |||
-f LOG_FILE [LOG_FILE ...], --files LOG_FILE [LOG_FILE ...] | |||
Apache log files. Regular expressions supported. | |||
-s [LOG_SYNTAX], --logsyntax [LOG_SYNTAX] | |||
Apache log files syntax, defined as "LogFormat" directive in Apache configuration. | |||
-c STATUS_CODE [STATUS_CODE ...], --statuscodes STATUS_CODE [STATUS_CODE ...] | |||
Print only these status codes. Regular expressions supported. | |||
-cf COUNTRY [COUNTRY ...], --countryfilter COUNTRY [COUNTRY ...] | |||
Include only these countries. Negative match (exclude): "\!Country" | |||
-ot [OUT_TIMEFORMAT], --outtimeformat [OUT_TIMEFORMAT] | |||
Output time format. Default: "%d-%m-%Y %H:%M:%S" | |||
-of OUT_FIELD [OUT_FIELD ...], --outfields OUT_FIELD [OUT_FIELD ...] | |||
Output fields. Default: log_file_name, http_status, remote_host, country, time, time_diff, user_agent, http_request | |||
-ng, --nogeo Skip country check with external "geoiplookup" tool. | |||
-dl [DAY_LOWER], --daylower [DAY_LOWER] | |||
Do not check log entries older than this day. Day syntax: 31-12-2020 | |||
-du [DAY_UPPER], --dayupper [DAY_UPPER] | |||
Do not check log entries newer than this day. Day syntax: 31-12-2020 | |||
-sb [SORTBY_FIELD], --sortby [SORTBY_FIELD] | |||
Sort by an output field. | |||
-sbr [SORTBY_FIELD_REVERSE], --sortbyreverse [SORTBY_FIELD_REVERSE] | |||
Sort by an output field, reverse order. | |||
-st, --stats Show short statistics at the end. | |||
-np, --noprogress Do not show progress information. | |||
``` | |||
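The `-dl` and `-du` limits use the day syntax `31-12-2020` and are parsed as midnight of that day. A minimal sketch of the resulting range check is shown below; the function name `in_day_range` is hypothetical, and the comparisons follow the script's exclusive limits.

```python
from datetime import datetime

DAY_FORMAT = "%d-%m-%Y"  # same day syntax as -dl / -du


def in_day_range(entry_time, day_lower=None, day_upper=None):
    """Return True if entry_time lies strictly between the given days."""
    lower = datetime.strptime(day_lower, DAY_FORMAT) if day_lower else None
    upper = datetime.strptime(day_upper, DAY_FORMAT) if day_upper else None
    if lower is not None and entry_time <= lower:
        return False
    if upper is not None and entry_time >= upper:
        return False
    return True
```

Because the upper limit is midnight at the start of that day, entries logged during the upper day itself are not included.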
## License | |||
GPLv3. |
@ -0,0 +1,21 @@ | |||
# Maintainer: Pekka Helenius <fincer89 [at] hotmail [dot] com> | |||
pkgname=apache-logparser | |||
pkgver=1 | |||
pkgrel=1 | |||
pkgdesc='Apache log parser' | |||
arch=('any') | |||
url='https://github.com/Fincer/apache-logparser' | |||
license=('GPL3')
depends=('python' 'python-apachelogs') | |||
optdepends=( | |||
'geoip: Non-DNS IP-to-country resolver C library & utils' | |||
'geoip-database: GeoLite country geolocation database compiled by MaxMind' | |||
) | |||
makedepends=() | |||
source=('logparser.py') | |||
md5sums=('9a11feac97bffa1d8aadc9e91fee49eb') | |||
package() { | |||
install -Dm755 ${srcdir}/logparser.py ${pkgdir}/usr/bin/httpd-logparser | |||
} |
@ -0,0 +1,402 @@ | |||
#!/usr/bin/env python
# Simple Apache log parser | |||
# Copyright (C) 2020 Pekka Helenius | |||
# | |||
# This program is free software: you can redistribute it and/or modify | |||
# it under the terms of the GNU General Public License as published by | |||
# the Free Software Foundation, either version 3 of the License, or | |||
# (at your option) any later version. | |||
# | |||
# This program is distributed in the hope that it will be useful, | |||
# but WITHOUT ANY WARRANTY; without even the implied warranty of | |||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |||
# GNU General Public License for more details. | |||
# | |||
# You should have received a copy of the GNU General Public License | |||
# along with this program. If not, see <https://www.gnu.org/licenses/>. | |||
################################################################ | |||
# TODO prev_host: instead of comparing to previous entry, check if such IP has been seen in XXX seconds | |||
# store IP values for temporary list for XXX seconds, and check list values | |||
import argparse | |||
import os | |||
import re | |||
import subprocess | |||
from datetime import datetime | |||
from apachelogs import LogParser | |||
out_fields_list = ['log_file_name', 'http_status', 'remote_host', 'country', 'time', 'time_diff', 'user_agent', 'http_request'] | |||
out_timeformat = "%d-%m-%Y %H:%M:%S" | |||
dayformat = "%d-%m-%Y" | |||
ot = '"' + re.sub(r'%', '%%', out_timeformat) + '"' | |||
geotool = "geoiplookup" | |||
geodb = "/usr/share/GeoIP/GeoIP.dat" | |||
# Log format as defined in Apache/HTTPD configuration file (LogFormat directive) | |||
in_log_syntax = "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" \"%{cache-status}e\"" | |||
argparser = argparse.ArgumentParser() | |||
argparser.add_argument('-d', '--dir', help = 'Apache log file directory.', nargs = '?', dest = 'log_dir', required = True) | |||
argparser.add_argument('-f', '--files', help = 'Apache log files. Regular expressions supported.', nargs = '+', dest = 'log_file', required = True) | |||
argparser.add_argument('-s', '--logsyntax', help = 'Apache log files syntax, defined as "LogFormat" directive in Apache configuration.', nargs = '?', dest = 'log_syntax') | |||
argparser.add_argument('-c', '--statuscodes', help = 'Print only these status codes. Regular expressions supported.', nargs = '+', dest = 'status_code') | |||
argparser.add_argument('-cf', '--countryfilter', help = 'Include only these countries. Negative match (exclude): "\\!Country"', nargs = '+', dest = 'country')
argparser.add_argument('-ot', '--outtimeformat', help = 'Output time format.\nDefault: ' + ot, nargs = '?', dest = 'out_timeformat') | |||
argparser.add_argument('-of', '--outfields', help = 'Output fields.\nDefault: ' + ', '.join(out_fields_list), nargs = '+', dest = 'out_field') | |||
argparser.add_argument('-ng', '--nogeo', help = 'Skip country check with external "geoiplookup" tool.', action='store_true', dest = 'no_geo') | |||
argparser.add_argument('-dl', '--daylower', help = 'Do not check log entries older than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_lower') | |||
argparser.add_argument('-du', '--dayupper', help = 'Do not check log entries newer than this day.\nDay syntax: 31-12-2020', nargs = '?', dest = 'day_upper') | |||
argparser.add_argument('-sb', '--sortby', help = 'Sort by an output field.', nargs = '?', dest = 'sortby_field') | |||
argparser.add_argument('-sbr', '--sortbyreverse', help = 'Sort by an output field, reverse order.', nargs = '?', dest = 'sortby_field_reverse') | |||
argparser.add_argument('-st', '--stats', help = 'Show short statistics at the end.', action='store_true', dest = 'show_count') | |||
argparser.add_argument('-np', '--noprogress', help = 'Do not show progress information.', action='store_true', dest = 'no_progress') | |||
args = argparser.parse_args() | |||
if args.status_code is None: | |||
status_filter = False | |||
skip_line_1 = False | |||
else: | |||
status_filter = True | |||
skip_line_1 = True | |||
status_codes = args.status_code | |||
http_valid_codes = [ | |||
'100', | |||
'101', | |||
'102', | |||
'103', | |||
'200', | |||
'201', | |||
'202', | |||
'203', | |||
'204', | |||
'205', | |||
'206', | |||
'207', | |||
'208', | |||
'226', | |||
'300', | |||
'301', | |||
'302', | |||
'303', | |||
'304', | |||
'305', | |||
'306', | |||
'307', | |||
'308', | |||
'400', | |||
'401', | |||
'402', | |||
'403', | |||
'404', | |||
'405', | |||
'406', | |||
'407', | |||
'408', | |||
'409', | |||
'410', | |||
'411', | |||
'412', | |||
'413', | |||
'414', | |||
'415', | |||
'416', | |||
'417', | |||
'418', | |||
'421', | |||
'422', | |||
'423', | |||
'424', | |||
'425', | |||
'426', | |||
'428', | |||
'429', | |||
'431', | |||
'451', | |||
'500', | |||
'501', | |||
'502', | |||
'503', | |||
'504', | |||
'505', | |||
'506', | |||
'507', | |||
'508', | |||
'510', | |||
'511', | |||
'218' | |||
] | |||
code_statuses = [] | |||
for status_input in status_codes: | |||
init_status = False | |||
status_append = status_input | |||
status_appended = False | |||
for status_valid in http_valid_codes: | |||
if re.search(status_input, status_valid): | |||
status_append = status_valid | |||
init_status = True | |||
status_appended = True | |||
code_statuses.append((status_append, init_status)) | |||
else: | |||
init_status = False | |||
if not status_appended: | |||
code_statuses.append((status_append, init_status)) | |||
error_msg = "" | |||
for vl in code_statuses: | |||
status, init_status = vl | |||
if not init_status: | |||
error_msg += "Invalid status code '" + status + "' supplied\n" | |||
if error_msg != "": | |||
raise Exception("\n" + error_msg) | |||
if args.country is None: | |||
country_filter = False | |||
skip_line_2 = False | |||
else: | |||
country_filter = True | |||
countries_filter_list = args.country | |||
skip_line_2 = True | |||
if args.out_timeformat is not None: | |||
out_timeformat = args.out_timeformat | |||
if args.out_field is not None: | |||
out_fields_list = args.out_field | |||
if args.day_lower is not None: | |||
day_lower = datetime.strptime(args.day_lower, dayformat) | |||
else: | |||
day_lower = None | |||
if args.day_upper is not None: | |||
day_upper = datetime.strptime(args.day_upper, dayformat) | |||
else: | |||
day_upper = None | |||
if args.log_syntax is None: | |||
log_syntax = in_log_syntax | |||
else: | |||
log_syntax = args.log_syntax | |||
log_dir = args.log_dir | |||
files = args.log_file | |||
no_progress = args.no_progress | |||
files_tmp = [] | |||
parser = LogParser(log_syntax) | |||
for file_regex in files:
    for file in os.listdir(log_dir):
        # os.path.join works whether or not log_dir ends with a slash
        fullpath = os.path.join(log_dir, file)
        if os.path.isfile(fullpath):
            # Skip files already matched by an earlier pattern
            if re.search(file_regex, file) and file not in files_tmp:
                files_tmp.append(file)
files_tmp.sort()
files = files_tmp
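def fileCheck(file, flag, env=None):
    # Map the flag name (e.g. "os.X_OK") to the os constant without eval()
    mode = getattr(os, flag.split('.')[-1])
    if env is None:
        candidates = [file]
    else:
        # Look for the file in every directory listed in the environment variable
        candidates = [os.path.join(path, file) for path in os.environ.get(env, '').split(os.pathsep)]
    # True only if some candidate exists as a file and is accessible with the requested mode
    return any(os.path.isfile(p) and os.access(p, mode) for p in candidates)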
# TODO Really exclude, when no additional args are passed to either of both | |||
if args.sortby_field is not None and args.sortby_field_reverse is not None: | |||
raise Exception("Use either normal or reverse sorting.") | |||
sortby_field = None | |||
if args.sortby_field is not None: | |||
sortby_field = args.sortby_field | |||
reverse_order = False | |||
elif args.sortby_field_reverse is not None: | |||
sortby_field = args.sortby_field_reverse | |||
reverse_order = True | |||
i = 0 | |||
country_seen = False | |||
prev_host = "" | |||
host_country = "" | |||
log_entries = [] | |||
for file in files: | |||
if not no_progress: | |||
print("Processing file: " + file) | |||
with open(os.path.join(log_dir, file), 'r') as f:
for line in f: | |||
if not no_progress: | |||
print("Processing log entry: " + str(i), end = "\r") | |||
if i != 0 and not (skip_line_1 or skip_line_2): | |||
prev_host = entry_remote_host | |||
prev_host_time = entry_time | |||
entry = parser.parse(line) | |||
entry_time = entry.request_time.replace(tzinfo=None) | |||
# TODO Handle situations where date_upper & date_lower are equal | |||
if day_upper is not None and day_lower is not None: | |||
if day_lower > day_upper: | |||
raise Exception("Lower day limit can't be later than upper day limit")
if day_upper is not None: | |||
if day_upper > datetime.now(): | |||
raise Exception("Day can't be in the future") | |||
if day_lower is not None: | |||
if day_lower > datetime.now(): | |||
raise Exception("Day can't be in the future") | |||
if day_lower is not None: | |||
if entry_time <= day_lower: continue | |||
if day_upper is not None: | |||
if entry_time >= day_upper: continue | |||
entry_remote_host = entry.remote_host | |||
entry_http_status = entry.final_status | |||
entry_user_agent = entry.headers_in["User-Agent"] | |||
# In case where request has newline or other similar chars. Tell Python interpreter to escape them | |||
entry_http_request = str(entry.request_line).encode('unicode_escape').decode() | |||
if status_filter: | |||
for status in code_statuses: | |||
num, num_ok = status | |||
status = int(num) | |||
if status != entry_http_status: | |||
skip_line_1 = True | |||
else: | |||
skip_line_1 = False | |||
break | |||
if not args.no_geo and fileCheck(geotool, "os.X_OK", "PATH") and fileCheck(geodb, "os.R_OK"): | |||
if prev_host == entry.remote_host: | |||
country_seen = True | |||
else: | |||
country_seen = False | |||
if not country_seen: | |||
host_country = subprocess.check_output([geotool, entry_remote_host]).rstrip().decode() | |||
host_country = re.sub(r"^.*, (.*)", r'\1', host_country) | |||
if re.search("Address not found", host_country): | |||
host_country = "Unknown" | |||
if country_filter: | |||
for country in countries_filter_list: | |||
if country[1] == "!": | |||
country = country[2:] | |||
if country.lower() == host_country.lower(): | |||
skip_line_2 = True | |||
break | |||
else: | |||
skip_line_2 = False | |||
elif country.lower() != host_country.lower(): | |||
skip_line_2 = True | |||
else: | |||
skip_line_2 = False | |||
break | |||
else: | |||
skip_line_2 = False | |||
if skip_line_1 or skip_line_2: | |||
i += 1 | |||
continue | |||
time_diff = str("NEW_CONN") | |||
if prev_host == entry_remote_host: | |||
time_diff = ( entry_time - prev_host_time ).total_seconds() | |||
if time_diff > 0: | |||
time_diff = "+" + str(time_diff) | |||
if i == 0: | |||
time_diff = float(0.0) | |||
# TODO: Optimize stri generation logic, avoid generating multiple times since it's really not necessary | |||
out_fields = [ | |||
('log_file_name', file, '{:s}' ), | |||
('http_status', entry_http_status, '{:3s}' ), | |||
('remote_host', entry_remote_host, '{:15s}'), | |||
('country', host_country, '{:20s}'), | |||
('time', entry_time, '{:8s}' ), | |||
('time_diff', time_diff, '{:8s}' ), | |||
('user_agent', entry_user_agent, '{:s}' ), | |||
('http_request', entry_http_request, '{:s}' ) | |||
] | |||
stri = "" | |||
printargs = [] | |||
t = 0 | |||
while t <= len(out_fields_list) - 1: | |||
for out_field in out_fields: | |||
entry, data, striformat = out_field | |||
if args.no_geo and entry == "country": | |||
continue | |||
if out_fields_list[t] == entry: | |||
stri += "\t" + striformat | |||
printargs.append(data) | |||
break | |||
t += 1 | |||
log_entries.append(printargs) | |||
i += 1 | |||
if sortby_field is not None: | |||
sort_field_found = False | |||
d = 0 | |||
for field in out_fields_list: | |||
if field == sortby_field: | |||
sort_field_index = d | |||
sort_field_found = True | |||
break | |||
d += 1 | |||
if sort_field_found: | |||
log_entries.sort(key = lambda log_entries: log_entries[sort_field_index], reverse=reverse_order) | |||
if not no_progress: | |||
print("\n") | |||
for entry in log_entries: | |||
c = 0 | |||
entry_tmp = [] | |||
while c <= len(entry) - 1: | |||
entry_tmp.append(str(entry[c])) | |||
c += 1 | |||
print(stri.format(*entry_tmp).lstrip()) | |||
if args.show_count: | |||
print(("\n" + | |||
"Processed files: {:s}\n" + | |||
"Processed log entries: {:d}\n" + | |||
"Matched log entries: {:d}\n").format( | |||
', '.join(files), | |||
i, | |||
len(log_entries) | |||
) | |||
) |
@ -0,0 +1,26 @@ | |||
# Maintainer: Pekka Helenius <fincer89 [at] hotmail [dot] com> | |||
pkgname=python-apachelogs | |||
_pkgname=apachelogs | |||
pkgver=v0.5.0.r4.g7ee86af | |||
pkgrel=1 | |||
pkgdesc='Python Apache logs parser' | |||
arch=('any') | |||
url='https://github.com/jwodder/apachelogs' | |||
license=('MIT') | |||
depends=('python' 'python-pydicti' 'python-attrs') | |||
makedepends=('git' 'python') | |||
source=("$pkgname::git+https://github.com/jwodder/${_pkgname}.git") | |||
sha256sums=('SKIP') | |||
pkgver() { | |||
cd $pkgname | |||
git describe --long | sed 's/\([^-]*-g\)/r\1/;s/-/./g' | |||
} | |||
package() { | |||
cd $pkgname | |||
python setup.py install --root="$pkgdir/" | |||
} | |||
@ -0,0 +1,24 @@ | |||
# Maintainer: Pekka Helenius <fincer89 [at] hotmail [dot] com> | |||
pkgname=python-pydicti | |||
_pkgname=pydicti | |||
pkgver=127.fa414fd | |||
pkgrel=1 | |||
pkgdesc='Case insensitive dictionary with user-defined underlying dictionary for Python' | |||
arch=('any') | |||
url='https://github.com/coldfix/pydicti' | |||
license=('GPLv2') | |||
depends=('python') | |||
makedepends=('git' 'python') | |||
source=("$pkgname::git+https://github.com/coldfix/${_pkgname}.git") | |||
sha256sums=('SKIP') | |||
pkgver() { | |||
cd $pkgname | |||
echo $(git rev-list --count HEAD).$(git rev-parse --short HEAD) | |||
} | |||
package() { | |||
cd $pkgname | |||
python setup.py install --root="$pkgdir/" | |||
} |