Whenever I work on my website, I find dealing with the error logs from Apache, MySQL and PHP a hassle. Splunk tries to solve this problem by keeping track of the log files you have and letting you search through each entry. It works fairly well. They also claim to use Artificial Intelligence to predict threats to your website. After some digging, I found that this functionality is only available as a plugin, and a very expensive one.

I decided that this would be a perfect project to undertake: a free system like Splunk that attempts to find website users with suspicious activity. Together with three other students, I built a system in Java that attempts to mimic the functionality claimed by Splunk.

The system first watches for changes to a set of files chosen by the user (essentially a simple implementation of the tail command). Each time a new log entry is recorded, the system notifies every active “researcher”. A researcher is simply a class that accepts log entries as strings, parses them as it chooses, and keeps tabs on possible threats to the system. When it deems that there is a threat, it notifies a separate process of the threat’s nature, severity, and IP address. The system is illustrated by this dependency graph:

A diagram depicting the flow of the ThreatAnalyzer.
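
As a rough illustration of the flow described above (a sketch under assumptions, not the project's actual code), the watcher and researcher pieces could look something like this. All names here (Threat, Researcher, FileTailer, LogDispatcher) are made up for the example, and polling the file for new bytes is just one simple way to imitate tail.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical threat report: the offending IP, a severity, and a description. */
final class Threat {
    final String ip;
    final int severity;
    final String nature;

    Threat(String ip, int severity, String nature) {
        this.ip = ip;
        this.severity = severity;
        this.nature = nature;
    }
}

/** A researcher gets each raw log line and a callback for reporting threats. */
interface Researcher {
    void onLogEntry(String line, Consumer<Threat> report);
}

/** Polling stand-in for tail: returns the lines appended since the last poll. */
final class FileTailer {
    private final Path file;
    private long offset;

    FileTailer(Path file) {
        this.file = file;
        this.offset = file.toFile().length(); // start at the current end, like tail -f
    }

    List<String> poll() {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            if (raf.length() < offset) {
                offset = 0; // file was truncated or rotated, so start over
            }
            raf.seek(offset);
            String line;
            // readLine() maps each byte to one char and also returns an unterminated last line.
            while ((line = raf.readLine()) != null) {
                lines.add(line);
            }
            offset = raf.getFilePointer();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}

/** Passes every new log line to every registered researcher. */
final class LogDispatcher {
    private final List<Researcher> researchers = new ArrayList<>();
    private final Consumer<Threat> commander;

    LogDispatcher(Consumer<Threat> commander) {
        this.commander = commander;
    }

    void register(Researcher researcher) {
        researchers.add(researcher);
    }

    void publish(String line) {
        for (Researcher researcher : researchers) {
            researcher.onLogEntry(line, commander);
        }
    }
}
```

A driver loop would call poll() on each watched file every second or so and hand every returned line to publish(). Because Researcher has a single method, a quick experiment can register one as a lambda.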

The process (Commander) that receives threats from researchers aggregates all threat severities for each IP address and reports to the user once a user-defined level has been reached. Over time, the threat level of each IP address decreases, so that ordinary use of the website spread over a long period counts as good activity, not bad. Therefore, an IP address that accesses the website over a thousand times in a few minutes would have a much higher score than an IP address that accesses the website the same number of times in a month.
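
To make the scoring idea concrete, here is a minimal sketch of a Commander-like class. The linear decay (a fixed number of points per second), the injected clock, and the alert callback are all assumptions made for the example; the description above does not say how the decay was actually computed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.LongSupplier;

/** Sketch of a per-IP threat score that decays linearly over time (assumed decay model). */
final class Commander {
    private static final class Entry {
        double score;
        long updatedAtMs;
    }

    private final Map<String, Entry> scores = new HashMap<>();
    private final double threshold;       // user-chosen alert level
    private final double decayPerSecond;  // points removed per second of quiet
    private final LongSupplier clockMs;   // injectable clock, e.g. System::currentTimeMillis
    private final BiConsumer<String, Double> alert;

    Commander(double threshold, double decayPerSecond,
              LongSupplier clockMs, BiConsumer<String, Double> alert) {
        this.threshold = threshold;
        this.decayPerSecond = decayPerSecond;
        this.clockMs = clockMs;
        this.alert = alert;
    }

    /** Adds a severity to the IP's decayed score; alerts on every report at or above the threshold. */
    synchronized void report(String ip, double severity) {
        long now = clockMs.getAsLong();
        Entry entry = scores.computeIfAbsent(ip, key -> {
            Entry fresh = new Entry();
            fresh.updatedAtMs = now;
            return fresh;
        });
        entry.score = decayed(entry, now) + severity;
        entry.updatedAtMs = now;
        if (entry.score >= threshold) {
            alert.accept(ip, entry.score);
        }
    }

    /** Current score of an IP after applying decay, or 0 if it was never reported. */
    synchronized double scoreOf(String ip) {
        Entry entry = scores.get(ip);
        return entry == null ? 0.0 : decayed(entry, clockMs.getAsLong());
    }

    private double decayed(Entry entry, long now) {
        double elapsedSeconds = (now - entry.updatedAtMs) / 1000.0;
        return Math.max(0.0, entry.score - decayPerSecond * elapsedSeconds);
    }
}
```

With a threshold of 10 and a decay of 1 point per second, two reports of severity 6 made two seconds apart reach the threshold, while the same two reports made a minute apart do not, because the first one has fully decayed by then.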

The two researchers we implemented to show the system’s functionality are the FrequentAccessResearcher and the ErrorResearcher. The FrequentAccessResearcher is responsible for determining whether a user is attempting to hit the website with a large number of requests. If a user makes enough accesses within a short amount of time, this researcher reports the user’s IP address to the commander. The ErrorResearcher checks whether a user’s requests result in errors, and reports any such user to the commander as well. With this researcher, we attempt to find users who may be trying to exploit the website by fuzzing, which frequently results in errors due to the random nature of the requests sent.
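
The core checks of the two researchers can be sketched as follows. This is only one plausible way to write them, not the original code: the sliding window, the limits passed to the constructors, and the assumption that access-log lines follow Apache's Common Log Format are all choices made for the example. For brevity each class returns a boolean instead of calling the commander.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Flags an IP once it makes more than maxRequests within a sliding window. */
final class FrequentAccessResearcher {
    private final Map<String, Deque<Long>> recent = new HashMap<>();
    private final int maxRequests;
    private final long windowMs;

    FrequentAccessResearcher(int maxRequests, long windowMs) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
    }

    /** Records one access; returns true if the IP is now over the limit. */
    boolean recordAccess(String ip, long timestampMs) {
        Deque<Long> times = recent.computeIfAbsent(ip, key -> new ArrayDeque<>());
        times.addLast(timestampMs);
        // Drop accesses that have fallen out of the window.
        while (timestampMs - times.peekFirst() >= windowMs) {
            times.removeFirst();
        }
        return times.size() > maxRequests;
    }
}

/** Counts error responses (HTTP status 400 and above) per IP. */
final class ErrorResearcher {
    // Assumed line shape (Common Log Format): ip ident user [time] "request" status bytes
    private static final Pattern ACCESS_LINE =
        Pattern.compile("^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"[^\"]*\" (\\d{3}) ");

    private final Map<String, Integer> errorCounts = new HashMap<>();
    private final int maxErrors;

    ErrorResearcher(int maxErrors) {
        this.maxErrors = maxErrors;
    }

    /** Returns true if this line pushes its IP past the error limit. */
    boolean inspect(String line) {
        Matcher matcher = ACCESS_LINE.matcher(line);
        if (!matcher.find()) {
            return false; // not a line this sketch understands
        }
        int status = Integer.parseInt(matcher.group(2));
        if (status < 400) {
            return false;
        }
        int count = errorCounts.merge(matcher.group(1), 1, Integer::sum);
        return count > maxErrors;
    }
}
```

For example, new FrequentAccessResearcher(3, 1000) stays quiet for the first three requests from an address within one second and flags the fourth. The error counter here never forgets; in the full system the commander's decay would take care of old offences.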