Saturday, September 14, 2013

Don't Throw Away Your Logs -- Generate Electricity from Garbage

Every day, software applications and products around the world generate terabytes of log files. When everything is going fine, people generally keep the logs for a set period and then delete or shred them. When something goes wrong, people read the logs to find the reason for the failure or error.

Most of the time we look at log files only when software fails or reports errors. Let's look at them from another point of view: as a source of information we can use to reduce errors and to improve the stability and performance of our software.

Production issues have always been a nightmare for most organizations. People firefight in production to resolve issues and keep the application stable and working. Whenever an error or issue occurs, the application writes the error and its description to the log files. People read the log file, fix the problem, and wait for the next issue to happen. By exploring these log files we can instead extract how often each error occurs and come up with a generic solution or an alerting mechanism in the application, so that we either reduce the occurrences of errors or take preventive action when an alert tells us one is about to happen.

In some situations we get repetitive errors caused by programming mistakes, such as null pointer errors. If we know the points in the software where these errors occur, we can rewrite those modules or programs and reduce this type of programming error in production.
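As a rough illustration of finding those error points, the sketch below counts how often each exception is raised from each location. It assumes Java-style stack traces, where a line naming the exception is followed by `at package.Class.method(File.java:NN)` frames; the function name `error_points` and both regular expressions are hypothetical and would need adjusting to your own log format.

```python
import re
from collections import Counter

# Assumed format: an exception class name ending in "Exception" or "Error".
EXCEPTION = re.compile(r"([\w.$]+(?:Exception|Error))")
# Assumed format: a Java-style stack frame such as "at com.example.Foo.bar(Foo.java:42)".
FRAME = re.compile(r"^\s*at\s+([\w.$<>]+)\(")


def error_points(lines):
    """Count (exception, top stack frame) pairs found in an iterable of log lines."""
    counts = Counter()
    pending = None  # exception name still waiting for its first stack frame
    for line in lines:
        frame = FRAME.match(line)
        if frame:
            # Only the first frame after an exception line marks the error point.
            if pending is not None:
                counts[(pending, frame.group(1))] += 1
                pending = None
            continue
        exc = EXCEPTION.search(line)
        if exc:
            pending = exc.group(1)
    return counts
```

Calling `error_points(open("app.log"))` on a log file (the file name is only an example) returns a `Counter`, so `most_common(5)` lists the five places that fail most often. Note that "Caused by:" lines are counted as separate error points in this sketch.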

In other situations we need alarm points in the software. We can develop a utility that alerts us when an error, such as an Out Of Memory error, is about to happen.
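A minimal sketch of such an alarm point is a threshold check on memory usage. The function name `check_memory`, the 90% default threshold, and the message text are assumptions for illustration; how the used and maximum values are obtained depends on your platform and is left out here.

```python
from typing import Optional


def check_memory(used_bytes: int, max_bytes: int, threshold: float = 0.9) -> Optional[str]:
    """Return a warning message when usage reaches the threshold, otherwise None."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    ratio = used_bytes / max_bytes
    if ratio >= threshold:
        # The wording of the alert is illustrative only.
        return f"WARNING: memory at {ratio:.0%} of limit; an Out Of Memory error may follow"
    return None
```

A small monitoring loop could call this every few seconds and send the returned message by email or to a dashboard whenever it is not `None`.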

In still other situations we can collect information about the time taken by individual use cases, such as create customer or deposit transaction, and improve performance wherever the time taken is not up to the mark.
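The sketch below shows one way to pull such timings out of a log. It assumes the application writes entries containing `usecase=<name> elapsed_ms=<number>`; that format, the function name `slow_use_cases`, and the limit are all hypothetical and should be replaced with whatever your application really logs.

```python
import re
from collections import defaultdict

# Assumed log fragment: "usecase=createCustomer elapsed_ms=120"
TIMING = re.compile(r"usecase=(\w+)\s+elapsed_ms=(\d+)")


def slow_use_cases(lines, limit_ms):
    """Return {use case: average ms} for use cases whose average exceeds limit_ms."""
    totals = defaultdict(lambda: [0, 0])  # name -> [sum of ms, number of samples]
    for line in lines:
        match = TIMING.search(line)
        if match:
            entry = totals[match.group(1)]
            entry[0] += int(match.group(2))
            entry[1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return {name: avg for name, avg in averages.items() if avg > limit_ms}
```

Averages hide occasional very slow requests, so a real report might also track the maximum or a percentile per use case.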

Many situations and errors can be avoided or reduced by taking preventive action, and log files can help us do that. But how?

We can explore the occurrences of errors in these log files using home-grown programs or available open source software. To collect the log files and extract information from them, we can use the steps below.

  1. Collect all of the application's log files for one year (or for as long a period as you can) in a single place or directory.
  2. Write a program, in whatever language suits you, that scans these log files one by one and generates a report showing the number of occurrences of each error.
  3. We now have data about the errors and their occurrences throughout the year.
  4. Identify the most frequent errors and analyze whether we can avoid or reduce them by improving our code, increasing vigilance, or adding alerts.
  5. Some issues, like connection leaks, memory leaks, and null pointer exceptions, can be avoided by investigating the application at a micro level and improving the code where these errors originate.
  6. Some errors, like Out Of Memory errors, can be avoided by alerting users before they happen. We can create a small application or utility that runs alongside the application and, when memory utilization reaches a specified level, alerts users that an Out Of Memory error could happen within the next few minutes. A similar situation arises when the application uses more database connections than a specified limit and could crash with a connection-not-available error.
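Steps 1 to 3 can be sketched in a few lines of Python. This is only one possible approach: it assumes the collected logs sit in one directory with a `.log` extension and that errors show up as Java-style class names ending in `Exception` or `Error`. The names `count_errors` and `report` and the regular expression are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed pattern: optional package prefix, then a class name ending in Exception/Error,
# for example "java.lang.NullPointerException" or "OutOfMemoryError".
ERROR_NAME = re.compile(r"\b(?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error)\b")


def count_errors(log_dir, pattern="*.log"):
    """Scan every matching file in log_dir and count each error name found."""
    counts = Counter()
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes in it.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(ERROR_NAME.findall(line))
    return counts


def report(counts, top=10):
    """Format the most frequent errors, one per line, count first."""
    return "\n".join(f"{n:>8}  {name}" for name, n in counts.most_common(top))
```

Running `print(report(count_errors("/path/to/collected/logs")))` (the path is a placeholder) prints the most frequent errors first, which is the input needed for step 4.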

Using the above steps we can identify the error points and their occurrences, improve our application to handle or avoid such situations, and reduce the chances of application unavailability.

This is just like generating electricity from garbage.

In the next post we will explore how Apache Hadoop can help us find the error patterns in our log files.