Simple Apache/HTTPD log parser for administrative analysis
# Apache HTTPD log parser

Apache/HTTPD command-line log parser for Linux web server administrators.

## Motivation

Keep it simple. Very simple.

Although advanced and polished log analytics tools such as the [Elastic Stack](https://www.elastic.co/products/) exist, I wanted something far simpler, with far less overhead, for weekly tasks and for configuring an Apache web server. Therefore, I wrote this simple Python script to parse Apache web server logs.

The main **advantages** of this tool are low overhead, output that can be piped to other Unix tools, and quick log checks. The main idea is to produce just the output needed for a short analysis, so that you can configure your web server protection mechanisms and network environment based on actual server data.

This tool is not meant for intrusion detection or prevention, and it does not alert administrators about hostile penetration attempts. However, it may reveal simple underlying misconfigurations, such as invalid URL references on your website.

## Requirements

The following Python packages are required (Arch Linux package names):

```
python
python-apachelogs
```

[python-apachelogs](https://github.com/jwodder/apachelogs/) is not available in the official Arch Linux repositories or in the AUR. Therefore, I provide a PKGBUILD file to install it: [python-apachelogs - PKGBUILD](python-apachelogs/PKGBUILD)

`python-apachelogs` in turn depends on the [python-pydicti](python-apachelogs/python-pydicti/PKGBUILD) package.

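The actual parsing in this tool is done by `python-apachelogs`. To illustrate which fields a single log line yields, here is a standard-library-only sketch that parses one line in Apache's Combined Log Format with a regular expression. It is not the tool's code and not the `apachelogs` API: the regex, the `parse_combined` name and the returned keys (chosen to mirror the `--included-fields` names) are assumptions, and it ignores edge cases such as escaped quotes inside fields. The default `combinedio` LogFormat appends `%I %O` byte counters, which the unanchored pattern simply skips.

```python
import re
from datetime import datetime

# Combined Log Format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Anything after the user agent (e.g. the %I %O byte counters added by the
# "combinedio" format) is ignored, because the pattern is not anchored at
# the end of the line.
COMBINED_RE = re.compile(
    r'(?P<remote_host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<http_request>[^"]*)" (?P<http_status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None  # a real parser would report this as an invalid line
    fields = m.groupdict()
    # Apache timestamps look like 12/Jun/2022:10:33:58 +0300
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["http_status"] = int(fields["http_status"])
    return fields


line = ('203.0.113.7 - - [12/Jun/2022:10:33:58 +0300] "GET / HTTP/1.1" 200 512 '
        '"-" "Wget/1.19.4 (linux-gnu)" 380 790')
entry = parse_combined(line)
print(entry["remote_host"], entry["http_status"], entry["user_agent"])
# 203.0.113.7 200 Wget/1.19.4 (linux-gnu)
```
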
Recommended packages for IP address geolocation:

```
geoip
geoip-database
```

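The geolocation fields come from the external `geoiplookup` program (see `--geotool-exec` and `--geo-database-dir` under Usage). The sketch below shows one way to turn its output into `country` and `city` values, including the `Unknown: lat, lon` fallback seen in the examples. It is an illustration under assumptions, not the tool's implementation: it assumes a City database is installed and that `geoiplookup` prints lines such as `GeoIP Country Edition: ID, Indonesia` and `GeoIP City Edition, Rev 1: ID, N/A, N/A, N/A, N/A, -6.175000, 106.828598, 0, 0`. Both function names are hypothetical.

```python
import subprocess


def parse_geoiplookup(output):
    """Extract (country, city) from geoiplookup output text.

    Assumes lines like:
      GeoIP Country Edition: ID, Indonesia
      GeoIP City Edition, Rev 1: ID, N/A, N/A, N/A, N/A, -6.175000, 106.828598, 0, 0
    """
    country, city = "Unknown", "Unknown"
    for line in output.splitlines():
        header, _, data = line.partition(": ")
        if header.startswith("GeoIP Country Edition"):
            _, _, name = data.partition(", ")  # "ID, Indonesia" -> "Indonesia"
            if name:
                country = name
        elif header.startswith("GeoIP City Edition"):
            # country code, region code, region, city, postal code, lat, lon, ...
            values = [v.strip() for v in data.split(",")]
            if len(values) >= 7:
                city = values[3]
                if city == "N/A":
                    # Fall back to coordinates, as in the Examples section
                    city = f"Unknown: {values[5]}, {values[6]}"
    return country, city


def geoiplookup(ip, exe="geoiplookup", db_dir="/usr/share/GeoIP/"):
    """Run the external tool (-d selects the database directory) and parse it."""
    result = subprocess.run([exe, "-d", db_dir, ip],
                            capture_output=True, text=True, check=False)
    return parse_geoiplookup(result.stdout)


sample = ("GeoIP Country Edition: ID, Indonesia\n"
          "GeoIP City Edition, Rev 1: ID, N/A, N/A, N/A, N/A, "
          "-6.175000, 106.828598, 0, 0\n")
print(parse_geoiplookup(sample))  # ('Indonesia', 'Unknown: -6.175000, 106.828598')
```

Parsing is kept separate from the `subprocess` call so that it can be checked without `geoiplookup` installed.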
## Installation

Arch Linux: run `updpkgsums && makepkg -Cfi` in the [apache-logparser](apache-logparser/) directory. This installs the `httpd-logparser` executable in `/usr/bin/`.

## Features

- Multiple Linux distributions supported
- Output formats: `table` and `csv`
- Sort output by a log entry field
- Include and exclude log entry fields
- Date ranges
- Geo IP lookup for log entries
  - Get countries and cities of origin
  - For unknown cities, give coordinates instead
  - See also: [MaxMind DB Apache Module](https://github.com/maxmind/mod_maxminddb)
- Output field filters
  - Limit processed log entries with the `--head` and `--tail` parameters
  - Get only the HTTP response codes you are interested in
  - Get only the countries of origin you are interested in
- Process multiple log files at once, either by providing a list of files or a matching regular expression
- Show processing status
- Show processing summary
- List invalid log entries that could not be processed

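The status code and country filters can be pictured as two small predicates. This is a minimal sketch under assumptions, not the tool's code: it treats `--status-codes` values as Python regular expressions matched from the start of the status code, and `--countries` as a comma-separated list in which a leading `!` excludes a country. In bash, a backslash before `!` inside double quotes is kept in the argument, so a program given `"\!Finland"` actually receives `\!Finland`; the sketch accepts both forms. The function names are hypothetical.

```python
import re


def status_matches(status, patterns):
    """True if no patterns were given or any pattern matches the status code."""
    if not patterns:
        return True
    return any(re.match(p, str(status)) for p in patterns)


def country_matches(country, spec):
    """spec is a comma-separated list; "!Name" (or "\\!Name") excludes a country."""
    if not spec:
        return True
    include, exclude = set(), set()
    for item in spec.split(","):
        item = item.strip().lstrip("\\")  # drop the shell-escape backslash
        if item.startswith("!"):
            exclude.add(item[1:])
        elif item:
            include.add(item)
    if country in exclude:
        return False
    # With only exclusions given, every other country passes.
    return not include or country in include


print(status_matches(307, ["^30*"]), country_matches("Finland", "\\!Finland"))  # True False
```
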
## Examples

**Q: How can I list unique connections (IP addresses) with their country and city location data, using the latest Apache log file?**

```
httpd-logparser --files-list /var/log/httpd/access_log --included-fields time,remote_host,country,city | sort -k 2 -u | sort -k 3
103.102.153.XXX Indonesia Unknown: -6.175000, 106.828598 2022-06-12 10:33:58
103.102.153.XXX Indonesia Unknown: -6.175000, 106.828598 2022-06-12 10:33:59
103.102.153.XXX Indonesia Unknown: -6.175000, 106.828598 2022-06-12 10:34:00
103.102.153.XXX Indonesia Unknown: -6.175000, 106.828598 2022-06-12 10:34:01
103.144.178.XXX Indonesia Unknown: -6.175000, 106.828598 2022-06-16 06:34:19
62.214.113.XXX Germany Unterhaching 2022-06-10 14:39:16
62.214.113.XXX Germany Unterhaching 2022-06-10 16:34:15
62.214.113.XXX Germany Unterhaching 2022-06-10 16:40:03
62.214.113.XXX Germany Unterhaching 2022-06-10 16:40:04
62.214.113.XXX Germany Unterhaching 2022-06-10 16:40:05
84.234.169.XXX Norway Valderoy 2022-06-06 00:20:18
194.137.241.XXX Finland Vantaa 2022-06-07 12:20:42
194.137.241.XXX Finland Vantaa 2022-06-07 12:20:43
194.137.241.XXX Finland Vantaa 2022-06-07 12:20:44
...
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-07 21:25:38
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-07 21:25:39
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-07 21:25:40
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-07 21:25:41
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-07 21:25:42
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-08 23:35:25
176.108.111.XXX Ukraine Vyzhnytsya 2022-06-11 19:52:42
82.207.245.XXX Germany Wachtberg 2022-06-03 02:26:58
82.207.245.XXX Germany Wachtberg 2022-06-03 02:27:08
82.207.245.XXX Germany Wachtberg 2022-06-03 02:27:09
82.207.245.XXX Germany Wachtberg 2022-06-03 02:27:10
79.191.159.XXX Poland Warsaw 2022-06-11 18:05:13
49.7.20.XXX China Wenzhou 2022-06-09 15:26:26
49.7.21.XXX China Wenzhou 2022-06-09 23:25:57
49.7.20.XXX China Wenzhou 2022-06-19 01:41:41
81.82.244.XXX Belgium Wetteren 2022-06-13 13:45:21
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:10
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:11
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:12
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:13
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:14
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:41
81.82.244.XXX Belgium Wetteren 2022-06-13 13:49:46
95.223.231.XXX Germany Wiesbaden 2022-06-04 21:42:20
95.223.231.XXX Germany Wiesbaden 2022-06-04 21:42:21
95.223.231.XXX Germany Wiesbaden 2022-06-04 21:42:23
95.223.231.XXX Germany Wiesbaden 2022-06-04 21:42:28
37.201.116.XXX Germany Wiesbaden 2022-06-10 19:55:50
113.57.152.XXX China Wuhan 2022-06-14 15:51:21
113.57.152.XXX China Wuhan 2022-06-14 15:51:22
113.57.152.XXX China Wuhan 2022-06-14 15:51:23
113.57.152.XXX China Wuhan 2022-06-14 15:51:25
113.57.152.XXX China Wuhan 2022-06-14 15:51:26
113.57.152.XXX China Wuhan 2022-06-14 15:51:57
113.57.152.XXX China Wuhan 2022-06-14 15:51:58
113.57.152.XXX China Wuhan 2022-06-14 15:52:01
89.164.183.XXX Croatia Zagreb 2022-06-04 11:44:22
89.164.183.XXX Croatia Zagreb 2022-06-04 11:44:23
89.164.183.XXX Croatia Zagreb 2022-06-04 11:44:24
89.164.183.XXX Croatia Zagreb 2022-06-04 11:44:25
89.164.183.XXX Croatia Zagreb 2022-06-04 11:44:26
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:49
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:51
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:52
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:53
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:55
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:56
86.32.46.XXX Croatia Zagreb 2022-06-04 16:45:59
86.32.46.XXX Croatia Zagreb 2022-06-04 16:46:00
85.10.56.XXX Croatia Zagreb 2022-06-09 19:39:55
85.10.56.XXX Croatia Zagreb 2022-06-17 19:57:56
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:41
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:42
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:43
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:44
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:45
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:46
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:47
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:48
122.56.232.XXX New Zealand Auckland 2022-06-02 08:46:49
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:22
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:23
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:24
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:25
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:26
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:27
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:28
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:29
121.98.28.XXX New Zealand Dunedin 2022-06-08 14:32:30
185.113.213.XXX Netherlands Zennewijnen 2022-06-15 11:54:36
185.113.213.XXX Netherlands Zennewijnen 2022-06-15 11:54:37
185.113.213.XXX Netherlands Zennewijnen 2022-06-15 11:54:39
```

NOTE: The last octet of every IP address has been anonymized with the string `XXX`.

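If you want to share output in the same way, the last octet can be masked before publishing. A minimal sketch using the standard `ipaddress` module; `mask_last_octet` is a hypothetical helper, and IPv6 addresses and hostnames are returned unchanged because the note only concerns the dotted IPv4 form.

```python
import ipaddress


def mask_last_octet(addr, mask="XXX"):
    """Replace the last octet of an IPv4 address; return other input unchanged."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # not an IP address (e.g. a hostname); leave as-is
    if ip.version != 4:
        return addr  # IPv6: no dotted octets to mask
    return addr.rsplit(".", 1)[0] + "." + mask


print(mask_last_octet("103.102.153.17"))  # 103.102.153.XXX
```
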
**Q: How many successful (`2XX`) requests from Finland and Sweden occurred between 15 and 24 April 2022?**

```
httpd-logparser --files-regex /var/log/httpd/access_log --included-fields time,http_status,country --sort-by time --status-codes ^20* --day-lower "15-04-2022" --day-upper "24-04-2022" --countries Finland,Sweden --show-stats --show-progress
File count: 5
Lines in total: 86876
Processing file: /var/log/httpd/access_log (lines: 23116)
Processing file: /var/log/httpd/access_log.1 (lines: 21566)
Processing file: /var/log/httpd/access_log.2 (lines: 13490)
Processing file: /var/log/httpd/access_log.3 (lines: 13822)
Processing file: /var/log/httpd/access_log.4 (lines: 14882)
Processing log entry: 81924 (94.30%)
...
200 Sweden 2022-04-17 21:51:09
200 Sweden 2022-04-17 21:51:10
200 Sweden 2022-04-17 21:51:10
200 Sweden 2022-04-17 23:41:35
200 Sweden 2022-04-17 23:41:36
200 Sweden 2022-04-17 23:41:36
200 Sweden 2022-04-17 23:41:39
200 Sweden 2022-04-18 11:23:18
200 Sweden 2022-04-19 07:16:25
200 Sweden 2022-04-19 07:16:34
200 Finland 2022-04-19 11:47:51
200 Finland 2022-04-19 11:47:52
200 Finland 2022-04-19 11:47:52
200 Finland 2022-04-19 11:47:52
...
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 09:51:16
200 Finland 2022-04-22 12:38:49
200 Finland 2022-04-22 16:53:11
...
Processed files: /var/log/httpd/access_log, /var/log/httpd/access_log.1, /var/log/httpd/access_log.2, /var/log/httpd/access_log.3, /var/log/httpd/access_log.4
Processed log entries: 86876
Matched log entries: 533
```

Answer: 533

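`--day-lower` and `--day-upper` take days in the `DD-MM-YYYY` syntax given in the usage text (`31-12-2020`). A sketch of the range check, assuming both bounds are inclusive whole days; this README does not state exactly how the tool treats the bounds, so take it as an illustration. `parse_day` and `in_day_range` are hypothetical names.

```python
from datetime import datetime


def parse_day(text):
    """Parse the DD-MM-YYYY day syntax used by --day-lower / --day-upper."""
    return datetime.strptime(text, "%d-%m-%Y").date()


def in_day_range(timestamp, lower=None, upper=None):
    """True if the timestamp's calendar day lies within [lower, upper] (inclusive)."""
    day = timestamp.date()
    if lower is not None and day < parse_day(lower):
        return False
    if upper is not None and day > parse_day(upper):
        return False
    return True


print(in_day_range(datetime(2022, 4, 24, 23, 59, 59), "15-04-2022", "24-04-2022"))  # True
```
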
**Q: How many redirects have occurred since 1 April 2022, according to two selected log files?**

```
httpd-logparser --files-regex /var/log/httpd/access_log.\[2-3\] --included-fields time,http_status,country --sort-by time --status-codes ^30* --day-lower "01-04-2022" --show-stats
...
304 Canada 2022-05-23 01:52:45
302 Canada 2022-05-23 01:53:33
302 Europe 2022-05-23 01:56:03
302 Poland 2022-05-23 02:00:31
302 Russian Federation 2022-05-23 02:52:50
302 United States 2022-05-23 04:34:30
302 France 2022-05-23 04:51:31
302 Germany 2022-05-23 05:02:16
302 Russian Federation 2022-05-23 05:04:13
302 Russian Federation 2022-05-23 05:04:14
302 Russian Federation 2022-05-23 05:04:14
302 United States 2022-05-23 05:11:10
302 United States 2022-05-23 05:11:11
302 Russian Federation 2022-05-23 05:23:09
302 China 2022-05-23 05:54:41
...
302 Germany 2022-05-31 19:53:18
302 Germany 2022-05-31 19:53:18
302 Germany 2022-05-31 19:53:18
302 Germany 2022-05-31 19:53:19
302 Germany 2022-05-31 19:53:19
304 Finland 2022-05-31 20:06:55
304 Finland 2022-05-31 20:16:02
304 Finland 2022-05-31 20:16:03
304 Finland 2022-05-31 20:16:06
302 Russian Federation 2022-05-31 20:40:33
302 United Kingdom 2022-05-31 21:09:32
302 China 2022-05-31 21:13:38
302 Russian Federation 2022-05-31 21:20:09
302 Romania 2022-05-31 22:01:31
304 United States 2022-05-31 22:11:30
302 Russian Federation 2022-05-31 22:59:23
302 United States 2022-05-31 23:16:52
304 Ukraine 2022-05-31 23:22:50
302 Russian Federation 2022-05-31 23:30:51
302 Netherlands 2022-05-31 23:37:10
302 Netherlands 2022-05-31 23:37:11
302 Netherlands 2022-05-31 23:37:12
Processed files: /var/log/httpd/access_log.2, /var/log/httpd/access_log.3
Processed log entries: 77730
Matched log entries: 6788
Invalid lines:
File: /var/log/httpd/access_log.2, line: 24668
```

Answer: 6788

You should also inspect any invalid log lines reported by the tool.

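Note that `^30*` is a regular expression, not a shell glob: it means "`3` followed by zero or more `0`s" at the start of the code, so it matches any status that begins with `3` (301, 302, 304, 307, ...). The sketch below demonstrates this and tallies matched entries and invalid lines in the spirit of `--show-stats`. The `(file, line number, status)` input tuples, with `None` marking an unparsable line, and the `summarize` name are assumptions, not the tool's internals.

```python
import re


def summarize(entries, pattern):
    """entries: (file_name, line_no, status), status None for an unparsable line."""
    matched, invalid = 0, []
    for file_name, line_no, status in entries:
        if status is None:
            invalid.append(f"File: {file_name}, line: {line_no}")
        elif re.match(pattern, str(status)):
            matched += 1
    return matched, invalid


entries = [
    ("access_log.2", 1, 302),
    ("access_log.2", 2, 200),
    ("access_log.2", 3, None),
    ("access_log.3", 1, 304),
    ("access_log.3", 2, 307),
]
matched, invalid = summarize(entries, "^30*")
print(matched, invalid)  # 3 ['File: access_log.2, line: 3']
```
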
**Q: How many `4XX` status codes have clients from China and the United States produced?**

```
httpd-logparser --files-regex /var/log/httpd/access_log --included-fields time,country,http_status,http_request --countries "United States",China --sort-by time --status-codes ^4 --show-progress --show-stats
File count: 2
Lines in total: 23614
Processing file: /var/log/httpd/access_log (lines: 12021)
Processing file: /var/log/httpd/access_log.1 (lines: 11593)
Processing log entry: 18423 (78.01%)
...
408 United States 2022-06-01 03:45:18 None
408 United States 2022-06-01 03:45:18 None
408 United States 2022-06-01 09:11:15 None
408 United States 2022-06-01 11:36:05 None
408 United States 2022-06-01 11:36:05 None
421 United States 2022-06-01 13:08:29 GET / HTTP/1.1
408 United States 2022-06-01 19:44:42 None
408 United States 2022-06-01 19:44:42 None
408 China 2022-06-02 06:30:51 None
408 China 2022-06-02 06:30:51 None
408 China 2022-06-02 06:30:51 None
408 United States 2022-06-02 11:45:57 None
408 United States 2022-06-02 11:46:05 None
408 United States 2022-06-02 11:46:18 None
408 United States 2022-06-02 20:53:49 None
408 United States 2022-06-02 20:53:49 None
408 United States 2022-06-03 00:01:39 None
408 United States 2022-06-03 00:02:04 None
408 United States 2022-06-03 00:02:37 None
408 United States 2022-06-03 00:21:26 None
408 China 2022-06-03 11:39:22 None
408 United States 2022-06-03 15:41:34 None
408 United States 2022-06-04 01:28:08 None
408 United States 2022-06-04 07:29:53 None
408 United States 2022-06-04 07:29:56 None
408 United States 2022-06-04 07:29:56 None
408 United States 2022-06-04 11:25:10 None
408 United States 2022-06-04 11:25:10 None
408 China 2022-06-04 11:37:11 None
408 United States 2022-06-04 17:36:35 None
408 China 2022-06-05 15:56:35 None
408 China 2022-06-05 15:56:45 None
408 United States 2022-06-06 01:32:25 None
408 United States 2022-06-06 01:32:25 None
408 United States 2022-06-06 01:32:29 None
...
Processed files: /var/log/httpd/access_log, /var/log/httpd/access_log.1
Processed log entries: 23614
Matched log entries: 112
```

Answer: 112

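The statistics give a single total. To see how such a total splits by status code and country, you can post-process the table output, for example as sketched below. The sketch assumes whitespace-separated lines in the field order shown above (status, country, date, time, request) and copes with multi-word country names such as `United States` by taking everything between the status and the first `YYYY-MM-DD` token as the country. `breakdown` is a hypothetical helper, not part of the tool.

```python
import re
from collections import Counter

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}$")


def breakdown(lines):
    """Count (status, country) pairs in table output lines."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3 or not tokens[0].isdigit():
            continue  # skip "...", progress/stats lines and other non-entry lines
        # The country name runs from the 2nd token up to the first date token.
        for i, tok in enumerate(tokens[1:], start=1):
            if DATE_RE.match(tok):
                counts[(tokens[0], " ".join(tokens[1:i]))] += 1
                break
    return counts


sample = [
    "408 United States 2022-06-01 03:45:18 None",
    "421 United States 2022-06-01 13:08:29 GET / HTTP/1.1",
    "408 China 2022-06-02 06:30:51 None",
    "408 China 2022-06-02 06:30:51 None",
    "...",
]
for (status, country), n in sorted(breakdown(sample).items()):
    print(status, country, n)
# 408 China 2
# 408 United States 1
# 421 United States 1
```
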
**Q: Which user agents have clients used recently?**

```
httpd-logparser --files-list /var/log/httpd/access_log --included-fields user_agent | sort -u
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
fasthttp
Go-http-client/1.1
HTTP Banner Detection (https://security.ipip.net)
kubectl/v1.12.0 (linux/amd64) kubernetes/0ed3388
libwww-perl/5.833
libwww-perl/6.06
libwww-perl/6.43
Microsoft Office Word 2014
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50728)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Win64; x64; Trident/7.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.30729; .NET CLR 3.5.30729; Tablet PC 2.0)
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)
...
...
Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0
Mozilla/5.0 (X11; Linux x86_64; rv:73.0) Gecko/20100101 Firefox/73.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0
Mozilla/5.0 zgrab/0.x
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12sns; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13rl; +http://researchscan.comsys.rwth-aachen.de)
NetSystemsResearch studies the availability of various services across the internet. Our website is netsystemsresearch.com
None
python-requests/1.2.3 CPython/2.7.16 Linux/4.14.165-102.185.amzn1.x86_64
python-requests/2.10.0
python-requests/2.19.1
python-requests/2.22.0
python-requests/2.23.0
python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1062.12.1.el7.x86_64
python-requests/2.6.0 CPython/2.7.5 Linux/3.10.0-1062.18.1.el7.x86_64
Python-urllib/3.7
Ruby
Wget/1.19.4 (linux-gnu)
WinHTTP/1.1
```

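`sort -u` lists each user agent once but drops how often it appeared; `sort | uniq -c | sort -rn` keeps the counts. The same idea in Python is a `Counter` over the output lines, sketched below with a hypothetical `top_user_agents` helper that expects one user agent per line, as produced by the command above.

```python
from collections import Counter


def top_user_agents(lines, n=10):
    """Return the n most common user agents with their request counts."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    return counts.most_common(n)


agents = ["Go-http-client/1.1", "Wget/1.19.4 (linux-gnu)", "Go-http-client/1.1", ""]
print(top_user_agents(agents))  # [('Go-http-client/1.1', 2), ('Wget/1.19.4 (linux-gnu)', 1)]
```

To read piped output from `httpd-logparser` directly, pass `sys.stdin` instead of the list.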
**Q: What is the time difference between requests from a single client? Exclude Finland and include all access_log files.**

```
httpd-logparser --included-fields http_status,time,time_diff,country --countries "\!Finland" --files-regex /var/log/httpd/old/access_log
200 Taiwan 2022-06-19 12:21:47 NEW_CONN
200 Taiwan 2022-06-19 12:21:48 +1
200 Taiwan 2022-06-19 12:21:49 +1
200 Taiwan 2022-06-19 12:21:49 0
200 Taiwan 2022-06-19 12:21:49 0
200 Taiwan 2022-06-19 12:21:49 0
200 Taiwan 2022-06-19 12:21:50 +1
200 Taiwan 2022-06-19 12:21:49 -1
200 Taiwan 2022-06-19 12:21:49 0
200 Taiwan 2022-06-19 12:21:50 +1
200 Taiwan 2022-06-19 12:21:50 0
200 Taiwan 2022-06-19 12:21:50 0
200 Taiwan 2022-06-19 12:21:51 +1
200 Taiwan 2022-06-19 12:21:56 +5
200 Taiwan 2022-06-19 12:22:04 +8
200 Taiwan 2022-06-19 12:22:05 +1
200 Taiwan 2022-06-19 12:22:06 +1
200 Taiwan 2022-06-19 12:22:06 0
200 Taiwan 2022-06-19 12:22:06 0
302 Taiwan 2022-06-19 12:22:07 +1
200 Taiwan 2022-06-19 12:22:07 0
200 Taiwan 2022-06-19 12:22:07 0
200 Taiwan 2022-06-19 12:22:07 0
200 Taiwan 2022-06-19 12:22:07 0
200 Taiwan 2022-06-19 12:22:07 0
200 Taiwan 2022-06-19 12:22:14 +7
200 Taiwan 2022-06-19 12:22:14 0
200 Japan 2022-06-19 12:34:49 NEW_CONN
200 Japan 2022-06-19 12:34:54 +5
200 United States 2022-06-19 12:55:44 NEW_CONN
200 United States 2022-06-19 12:55:44 0
200 United States 2022-06-19 12:55:50 +6
200 United States 2022-06-19 12:55:55 +5
302 France 2022-06-19 13:01:30 NEW_CONN
200 United States 2022-06-19 13:10:07 NEW_CONN
200 United States 2022-06-19 13:10:12 +5
302 China 2022-06-19 13:15:59 NEW_CONN
302 China 2022-06-19 13:16:10 +11
302 China 2022-06-19 13:16:11 +1
200 Germany 2022-06-19 13:27:42 NEW_CONN
200 Hong Kong 2022-06-19 13:40:02 NEW_CONN
200 Hong Kong 2022-06-19 13:40:02 0
200 Hong Kong 2022-06-19 13:40:02 0
...
200 India 2022-06-19 13:45:03 NEW_CONN
200 India 2022-06-19 13:45:04 +1
200 India 2022-06-19 13:45:04 0
200 India 2022-06-19 13:45:04 0
200 India 2022-06-19 13:45:04 0
200 India 2022-06-19 13:45:05 +1
200 India 2022-06-19 13:45:05 0
200 India 2022-06-19 13:45:05 0
...
```

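Judging from the sample, `time_diff` prints `NEW_CONN` when a new client appears and otherwise the signed difference in seconds from the previous request (`+1`, `0`, `-1`). The sketch below reproduces that pattern under an assumption the sample does not settle: it compares each entry with the immediately preceding entry and starts a new connection whenever the remote host changes. The `time_diffs` name and the `(remote_host, datetime)` input shape are hypothetical.

```python
from datetime import datetime


def time_diffs(entries):
    """entries: iterable of (remote_host, datetime) in log order.

    Yields "NEW_CONN" when the host differs from the previous entry,
    otherwise the signed difference in whole seconds ("+5", "0", "-1").
    """
    prev_host, prev_time = None, None
    for host, ts in entries:
        if host != prev_host:
            yield "NEW_CONN"
        else:
            delta = int((ts - prev_time).total_seconds())
            yield "0" if delta == 0 else f"{delta:+d}"
        prev_host, prev_time = host, ts


log = [
    ("198.51.100.4", datetime(2022, 6, 19, 12, 21, 47)),
    ("198.51.100.4", datetime(2022, 6, 19, 12, 21, 48)),
    ("198.51.100.4", datetime(2022, 6, 19, 12, 21, 48)),
    ("198.51.100.4", datetime(2022, 6, 19, 12, 21, 47)),
    ("203.0.113.9", datetime(2022, 6, 19, 12, 34, 49)),
]
print(list(time_diffs(log)))  # ['NEW_CONN', '+1', '0', '-1', 'NEW_CONN']
```

Negative values can appear because Apache writes a log line when a request completes but stamps it with the time the request was received.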
**Q: How can I get CSV output with selected fields only, a day limit, only the last 100 log entries processed, and a header row?**

```
httpd-logparser --files-list /var/log/httpd/access_log --geo-location --sort-by time --included-fields time,country,city,http_request --day-lower 27-06-2022 --verbose --tail 100 --output-format csv --print-headers
Date/Time,Country,City,Request
...
2022-06-27 23:33:14,United States,Unknown: 37.750999, -97.821999,GET /git/explore/repos?sort=recentupdate&q=dds-format&tab= HTTP/1.1
2022-06-27 23:33:16,United States,Unknown: 37.750999, -97.821999,GET /git/explore/repos?sort=reversealphabetically&q=transmission&tab= HTTP/1.1
2022-06-27 23:33:19,United States,Unknown: 37.750999, -97.821999,GET /git/explore/repos?sort=feweststars&q=real-time-strategy&tab= HTTP/1.1
2022-06-27 23:33:21,United States,Unknown: 37.750999, -97.821999,GET /git/explore/repos?sort=feweststars&q=shell-script&tab= HTTP/1.1
2022-06-27 23:34:28,United States,Austin,GET /XXX HTTP/1.1
2022-06-27 23:34:28,United States,Austin,GET /css/XXX HTTP/1.1
2022-06-27 23:34:28,United States,Austin,GET /css/XXX HTTP/1.1
2022-06-27 23:34:28,United States,Austin,GET /js/XXX HTTP/1.1
2022-06-27 23:34:29,United States,Austin,GET /js/XXX HTTP/1.1
2022-06-27 23:34:29,United States,Austin,GET /js/XXX HTTP/1.1
2022-06-27 23:34:29,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:29,United States,Austin,GET /js/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:30,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /webfonts/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /webfonts/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:31,United States,Austin,GET /webfonts/XXX HTTP/1.1
2022-06-27 23:34:32,United States,Austin,GET /images/XXX HTTP/1.1
2022-06-27 23:34:32,United States,Austin,GET / HTTP/1.1
2022-06-27 23:34:32,United States,Austin,GET /images/favicon-32x32.png HTTP/1.1
2022-06-27 23:34:32,United States,Austin,GET /XXX HTTP/1.1
2022-06-27 23:34:37,United States,Austin,GET /images/favicon-32x32.png HTTP/1.1
2022-06-27 23:34:59,United States,Austin,None
2022-06-27 23:35:02,Germany,Unknown: 51.299301, 9.490900,GET /git/ HTTP/1.1
2022-06-27 23:35:04,United States,Austin,None
```

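When post-processing this CSV, note that the coordinate fallback for unknown cities (for example `Unknown: 37.750999, -97.821999`) contains a comma, so in the sample above those rows split into five columns under a plain comma split, while the header has four. If you generate or re-export such data yourself, Python's `csv` module quotes any field containing the delimiter. A minimal sketch using one row copied from the sample:

```python
import csv
import io

header = ["Date/Time", "Country", "City", "Request"]
row = ["2022-06-27 23:33:14", "United States", "Unknown: 37.750999, -97.821999",
       "GET /git/ HTTP/1.1"]

buf = io.StringIO()
writer = csv.writer(buf)  # default QUOTE_MINIMAL quotes fields that contain ","
writer.writerow(header)
writer.writerow(row)
text = buf.getvalue()
print(text)

# Reading it back yields exactly four columns per row.
rows = list(csv.reader(io.StringIO(text)))
print([len(r) for r in rows])  # [4, 4]
```
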
## Usage

```
usage: httpd-logparser [-h] [-fr [FILES_REGEX]] [-f [FILES_LIST]] [-c CODES [CODES ...]] [-cf [COUNTRIES]] [-tf [TIME_FORMAT]] [-if [INCL_FIELDS]]
                       [-ef [EXCL_FIELDS]] [-gl] [-ge [GEOTOOL_EXEC]] [-gd [GEO_DATABASE_LOCATION]] [-dl [DATE_LOWER]] [-du [DATE_UPPER]] [-sb [SORTBY_FIELD]]
                       [-ro] [-st] [-p] [--httpd-conf-file] [--httpd-log-nickname] [-lf LOG_FORMAT] [-ph] [--output-format {table,csv}]
                       [--head [READ_FIRST_LINES_NUM]] [--tail [READ_LAST_LINES_NUM]] [--sort-logs-by {date,size,name}] [--verbose]

Apache HTTPD server log parser

optional arguments:
  -h, --help            show this help message and exit
  -fr [FILES_REGEX], --files-regex [FILES_REGEX]
                        Apache log files matching input regular expression. (default: None)
  -f [FILES_LIST], --files-list [FILES_LIST]
                        Apache log files. Regular expressions supported. (default: None)
  -c CODES [CODES ...], --status-codes CODES [CODES ...]
                        Print only these numerical status codes. Regular expressions supported. (default: None)
  -cf [COUNTRIES], --countries [COUNTRIES]
                        Include only these countries. Negative match (exclude): "\!Country" (default: None)
  -tf [TIME_FORMAT], --time-format [TIME_FORMAT]
                        Output time format. (default: %d-%m-%Y %H:%M:%S)
  -if [INCL_FIELDS], --included-fields [INCL_FIELDS]
                        Included fields. All fields: all, log_file_name, http_status, remote_host, country, city, time, time_diff, user_agent, http_request
                        (default: http_status,remote_host,time,time_diff,user_agent,http_request)
  -ef [EXCL_FIELDS], --excluded-fields [EXCL_FIELDS]
                        Excluded fields. (default: None)
  -gl, --geo-location   Check origin countries with external "geoiplookup" tool. NOTE: Automatically includes "country" and "city" fields. (default: False)
  -ge [GEOTOOL_EXEC], --geotool-exec [GEOTOOL_EXEC]
                        "geoiplookup" tool executable found in PATH. (default: geoiplookup)
  -gd [GEO_DATABASE_LOCATION], --geo-database-dir [GEO_DATABASE_LOCATION]
                        Database file directory for "geoiplookup" tool. (default: /usr/share/GeoIP/)
  -dl [DATE_LOWER], --day-lower [DATE_LOWER]
                        Do not check log entries older than this day. Day syntax: 31-12-2020 (default: None)
  -du [DATE_UPPER], --day-upper [DATE_UPPER]
                        Do not check log entries newer than this day. Day syntax: 31-12-2020 (default: None)
  -sb [SORTBY_FIELD], --sort-by [SORTBY_FIELD]
                        Sort by an output field. (default: None)
  -ro, --reverse        Sort in reverse order. (default: False)
  -st, --show-stats     Show short statistics at the end. (default: False)
  -p, --show-progress   Show progress information. (default: False)
  --httpd-conf-file     Apache HTTPD configuration file with LogFormat directive. (default: /etc/httpd/conf/httpd.conf)
  --httpd-log-nickname  LogFormat directive nickname (default: combinedio)
  -lf LOG_FORMAT, --log-format LOG_FORMAT
                        Log format, manually defined. (default: None)
  -ph, --print-headers  Print column headers. (default: False)
  --output-format {table,csv}
                        Output format for results. (default: table)
  --head [READ_FIRST_LINES_NUM]
                        Read first N lines from all log entries. (default: None)
  --tail [READ_LAST_LINES_NUM]
                        Read last N lines from all log entries. (default: None)
  --sort-logs-by {date,size,name}
                        Sorting order for input log files. (default: name)
  --verbose             Verbose output. (default: False)
```

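`--included-fields`, `--excluded-fields` and `--geo-location` interact. The sketch below shows one plausible way to resolve them into the final column list, using the field names and default from the help text above. The rules it applies are assumptions read off that text (`all` expands to every field, `--geo-location` adds `country` and `city`, exclusions are applied last), columns are emitted in the order the help text lists them (which matches the table examples), and `resolve_fields` is hypothetical.

```python
ALL_FIELDS = ["log_file_name", "http_status", "remote_host", "country", "city",
              "time", "time_diff", "user_agent", "http_request"]
DEFAULT_FIELDS = "http_status,remote_host,time,time_diff,user_agent,http_request"


def resolve_fields(included=None, excluded=None, geo_location=False):
    """Return the output columns for the given option values (hypothetical rules)."""
    wanted = {f.strip() for f in (included or DEFAULT_FIELDS).split(",")}
    if "all" in wanted:
        wanted = set(ALL_FIELDS)
    if geo_location:
        wanted |= {"country", "city"}  # "Automatically includes country and city"
    if excluded:
        wanted -= {f.strip() for f in excluded.split(",")}
    unknown = wanted - set(ALL_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(sorted(unknown))}")
    return [f for f in ALL_FIELDS if f in wanted]


print(resolve_fields("time,country,city,http_request", excluded="city"))
# ['country', 'time', 'http_request']
```
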
## License

GPLv3.