Every day, all around the world, terabytes of log files are generated by software applications and products. If everything is going fine, people generally keep them for a specific retention period and then throw away or shred the log files. If something goes wrong, people look at the log files and try to find the reason for the failure or error.
Most of the time we look at log files only when a failure or error occurs in our software. Let's explore these log files from another point of view: we can mine them for information that helps us reduce errors, improve stability, and improve the performance of our software.
Production issues have always been a nightmare for most organizations. People firefight in production to fix and resolve issues and keep the application stable and working. Whenever an error or issue occurs, the application writes the error and its description to a log file. People read the log file, fix the issue, and wait for the next one to happen. By exploring these log files we can extract information about how often each error occurs and come up with a generic solution or an alerting mechanism in the application, so that we can either reduce the occurrences of errors or take preventive action when we are alerted that an error is about to happen.
There are situations where we get repetitive errors that are related to programming mistakes, like null pointer errors. In this case, if we can identify the error points in the software, we can rewrite those modules or programs and reduce such programming errors in production.
There are situations where we need alarm points in our software, so we can develop a utility that runs alongside the application and alerts us when an error, such as an Out Of Memory error, is about to happen.
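As an illustration, here is a minimal sketch of such an alarm utility in Python. It assumes the third-party psutil package is installed (pip install psutil); the 90% threshold and 30-second polling interval are made-up values for illustration, not recommendations.

```python
import time

import psutil  # assumed third-party dependency for reading memory stats

THRESHOLD_PERCENT = 90  # illustrative alert level, tune for your system
POLL_SECONDS = 30       # illustrative sampling interval


def watch_memory():
    """Poll system memory usage and warn before it is exhausted."""
    while True:
        used = psutil.virtual_memory().percent
        if used >= THRESHOLD_PERCENT:
            # A real utility could send an email or page an operator here.
            print(f"ALERT: memory utilization at {used:.1f}% - "
                  f"an Out Of Memory error may occur soon")
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    watch_memory()
```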
There are situations where we can collect information about the time taken by individual use cases, like creating a customer or posting a deposit transaction, and if the time taken by these actions is not up to the mark, we can work on improving the performance.
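If the application logs elapsed times, a small script can aggregate them per use case. The sketch below assumes a hypothetical log line format such as `2017-03-01 10:12:01 INFO createCustomer took 2300 ms`; the regular expression would need to be adapted to your application's actual format.

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<useCaseName> took <milliseconds> ms"
PATTERN = re.compile(r"(\w+) took (\d+) ms")


def average_timings(log_path):
    """Print the average elapsed time per use case found in one log file."""
    totals = defaultdict(lambda: [0, 0])  # use case -> [total ms, count]
    with open(log_path, errors="ignore") as f:
        for line in f:
            match = PATTERN.search(line)
            if match:
                use_case, ms = match.group(1), int(match.group(2))
                totals[use_case][0] += ms
                totals[use_case][1] += 1
    for use_case, (total, count) in totals.items():
        print(f"{use_case}: {total / count:.0f} ms average over {count} calls")
```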
A lot of these situations and errors can be avoided or reduced by taking some preventive actions, and log files can help us do that. But how?
We can use these log files to explore the occurrences of errors and issues using home-grown programs or available open source software. To explore and extract information from the log files, we can use the steps below.
- Collect all log files of the application for one year (or for as long a period as you can) in a single place or directory.
- Write a program, in any language that suits you, which scans these log files one by one and generates a report showing the occurrence count for each error (a sketch follows this list).
- Now we have data about the errors and their occurrences throughout the year.
- Identify the most frequent errors and analyze whether we can avoid or reduce them by improving our code, increasing vigilance, or adding alerts.
- Some issues, like connection leaks, memory leaks, and null pointer exceptions, can be avoided by doing a micro-level investigation of the application and improving the code to prevent these errors.
- Some errors can be avoided by alerting the user before they are about to happen, like Out Of Memory errors. We can create a small utility that runs alongside the application and, if memory utilization is approaching a specified level, alerts the user that an Out Of Memory error may occur in the next few minutes. Similarly, an application may use more database connections than a specified limit and crash with a connection-not-available error; the same kind of alert can help avoid that.
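Here is a minimal Python sketch of the scanning program from the second step. It assumes error names appear in log lines as words ending in "Error" or "Exception" (as in Java stack traces); both that pattern and the `logs` directory name are assumptions to adapt to your own environment.

```python
import os
import re
from collections import Counter

# Matches words like NullPointerException or OutOfMemoryError;
# adjust to whatever your application's log format actually contains.
ERROR_PATTERN = re.compile(r"(\w+(?:Exception|Error))")


def count_errors(log_dir):
    """Scan every file in log_dir and report error counts, highest first."""
    counts = Counter()
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as f:
            for line in f:
                counts.update(ERROR_PATTERN.findall(line))
    for error, count in counts.most_common():
        print(f"{count:8d}  {error}")


if __name__ == "__main__":
    count_errors("logs")  # hypothetical directory holding the year's logs
```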
So, using the above steps, we can identify the error points and their occurrences, improve our application to handle or avoid such situations, and reduce the chances of application unavailability.
This is just like generating electricity from garbage.
In the next post we will explore how Apache Hadoop can help us discover the error patterns in our log files.