The “Spot the log entry” contest
February 6, 2007
Most of us programmers were taught to write comments, externalize constants, indent code and, of course, log the flow of control. Credit goes to the open-source community for providing wonderful frameworks such as Log4j that make logging easy.
The framework authors foresaw the problem of growing log files and wisely provided a way to control the level of logging. While this helps a lot in categorizing log entries, it does not solve the most common problem for a developer:
That of “Spotting the log entry” that is most relevant to the situation.
Let me explain. Picture this: a large application has been deployed in production for a while. The log level has been set to INFO because the application support team argues that ERROR does not give them enough detail for troubleshooting. In the application code, a few developers have gotten carried away and have logged the entry and exit of each method call (re-inventing the “around” advice in AOP, if you may call it that). The result – log files that have rolled over many times, totalling around 100 MB across just as many files.
These files are not always easily available to the developer debugging a problem that has been reported in production. The developer needs to quickly find the log entry that helps to diagnose the issue. This is where the “Spot the log entry” contest begins.
Obstacles on this course include (but are not limited to):
- Having to sift through potentially hundreds of thousands of log entries
- Log files located on remote machines accessible via protocols like SSH/SFTP.
- Noise in log files – entries that log entry and exit from methods
- Absence of tools to analyze the structured data contained in log entries. Text editors are an often-used but poor choice.
- Inability to analyze contiguous log events. For example, log entries made from one thread do not appear together in a log file on systems under medium to heavy use.
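The last obstacle, interleaved threads, can be worked around once each line is parsed. Here is a minimal sketch, assuming lines follow a layout like `timestamp [thread] LEVEL category - message`. The layout, the class name `ThreadGrouper` and the sample lines are illustrative assumptions, not taken from any real application:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regroups interleaved log lines by the thread that wrote them.
class ThreadGrouper {
    // Assumed layout: "<date> <time> [<thread>] <LEVEL> <category> - <message>"
    private static final Pattern THREAD = Pattern.compile("^\\S+ \\S+ \\[([^\\]]+)\\] ");

    static Map<String, List<String>> groupByThread(List<String> lines) {
        // LinkedHashMap keeps threads in order of first appearance.
        Map<String, List<String>> byThread = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = THREAD.matcher(line);
            // Lines that do not fit the layout (e.g. stack trace lines) go to a catch-all bucket.
            String thread = m.find() ? m.group(1) : "(unparsed)";
            byThread.computeIfAbsent(thread, k -> new ArrayList<>()).add(line);
        }
        return byThread;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2007-02-06 10:15:01,120 [http-1] INFO  com.example.Orders - order received",
            "2007-02-06 10:15:01,121 [http-2] INFO  com.example.Orders - order received",
            "2007-02-06 10:15:01,300 [http-1] ERROR com.example.Orders - payment failed");
        groupByThread(lines).forEach((thread, entries) -> {
            System.out.println(thread);
            entries.forEach(e -> System.out.println("  " + e));
        });
    }
}
```

Within each group the original order is preserved, so the flow of a single request can be read top to bottom.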
There are a few ways in which this problem can be addressed:
- Avoid AOP-like log statements in code. Use AOP at runtime to instrument byte-code on the fly if required.
- Clear logging strategy:
  - Log an error when it is first encountered, not each time it is handled as it propagates up the call stack.
  - Use appropriate log levels to differentiate between debug information, messages, warnings and errors.
  - Use log patterns that provide sufficient information to analyze the flow, for example the timestamp, thread, category and priority.
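The "log an error once" rule could look roughly like the sketch below. To keep it dependency-free it uses `java.util.logging` as a stand-in for Log4j, and every name in it (`LogOnceDemo`, `LoggedException`, the `orders` logger, the layer methods) is hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of "log once, where the error first surfaces".
// java.util.logging stands in for Log4j; all names here are hypothetical.
class LogOnceDemo {
    private static final Logger LOG = Logger.getLogger("orders");

    /** Marks a failure that has already been written to the log. */
    static class LoggedException extends RuntimeException {
        LoggedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Lowest layer: the error is detected here, so it is logged here, once.
    static void readOrder() {
        try {
            throw new IllegalStateException("connection refused");
        } catch (IllegalStateException e) {
            LOG.log(Level.SEVERE, "could not read order", e);
            throw new LoggedException("could not read order", e);
        }
    }

    // Middle layer: lets the exception pass through without logging it again.
    static void processOrder() {
        readOrder();
    }

    // Top layer: handles the failure but does not log it a second time.
    static String handleRequest() {
        try {
            processOrder();
            return "ok";
        } catch (LoggedException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRequest());
    }
}
```

The marker exception is only one way to signal "already logged". The point is that a single failure produces a single log entry, however many layers it passes through.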
What if your log files are still huge after all this? It’s time to invest in tools that help you spot your log entry.
Some of us at MindTree (http://www.mindtree.com) looked around for open-source tools for log analysis when we had to inspect logs from around a dozen servers. Chainsaw (http://logging.apache.org/log4j/docs/chainsaw.html) was a decent implementation but not good enough for our needs. Commercial tools were not satisfactory either.
That’s when we decided to implement Insight – an application analysis tool.
To start with, it was conceived to do comprehensive log analysis. In brief, it does the following:
- Provides visual analysis of any pattern-based log file
- Analyzes logs from remote servers over (S)FTP and HTTP
- Supports tailing of local files and provides a plug-in for Eclipse
- Provides summary and detailed views of a log event
- Supports “non-mutating” analysis of the data set – such as search and sort
- Supports “mutating” analysis of the data set – via progressive filtering
- Helps to locate the “context” of an event, i.e. a snapshot of the log entries around a specific log entry
- Is optimized for performance and footprint size:
  - Loads 1000 entries in around 375 ms
  - VM size stays between 45 and 60 MB even after loading 110 000 entries
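To make two of these ideas concrete, here is a minimal sketch of a "context" view and of progressive filtering over an in-memory list of entries. It is not Insight's implementation. The class `LogViews`, its method names and the sample entries are assumptions made for illustration:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helpers illustrating "context" and progressive filtering.
class LogViews {
    // Context: the entries within `radius` positions of the entry at index `hit`.
    // `hit` must be a valid index into `entries`; the list itself is not changed.
    static List<String> context(List<String> entries, int hit, int radius) {
        int from = Math.max(0, hit - radius);
        int to = Math.min(entries.size(), hit + radius + 1);
        return entries.subList(from, to);
    }

    // Progressive filtering: each call narrows the result of the previous one.
    static List<String> filter(List<String> entries, Predicate<String> keep) {
        return entries.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
            "10:15:01 [http-1] INFO  order received",
            "10:15:02 [http-2] INFO  order received",
            "10:15:03 [http-1] ERROR payment failed",
            "10:15:04 [http-2] INFO  order shipped",
            "10:15:05 [http-1] INFO  request finished");

        // One entry on either side of the ERROR at index 2.
        context(entries, 2, 1).forEach(System.out::println);

        // First narrow to one thread, then narrow that result to errors.
        List<String> oneThread = filter(entries, e -> e.contains("[http-1]"));
        List<String> errors = filter(oneThread, e -> e.contains("ERROR"));
        errors.forEach(System.out::println);
    }
}
```

In this sketch, "mutating" simply means that the working set is replaced by the filtered result, so each later filter sees fewer entries.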
See the attached presentation for details on Insight and testimonials: Insight features
Our developers are now front runners in the “Spot the log entry” contest.
———————————————————————————–
MindTree Insight is now an open-source project on SourceForge and is available at:
http://sourceforge.net/projects/mindtreeinsight
The download of the latest release is available at:
http://sourceforge.net/project/showfiles.php?group_id=212019&package_id=254922
——————————————————————————–
October 2, 2007 at 6:44 am
Did you ever finish developing it? Is it available to download, or did you commercialize it?
October 2, 2007 at 8:07 am
Sohel,
Yes, we did complete development. Insight has seen 5 major releases to date. MindTree has an enterprise-wide licensing model for Insight. There are also talks of making it open source. It could take a while, though, as the legal department is working on the license.
In the meantime, the commercial licensing model will continue. Let me know if you require details.
November 29, 2007 at 2:57 am
Hi.
Good design, who made it?
January 17, 2008 at 9:29 am
[...] Colleagues (idea by Regu) created a tool they named Insight last year and have just made it open source. Here is the [...]
April 1, 2008 at 1:15 pm
Hi,
It seems to be a nice tool.
I’ve been having some problems when using it.
I’m setting the primary and secondary patterns, which are the same ones that I have configured in my log4j configuration file, and when I try to open a log file the tool says:
“specified log pattern doesnot match data in: …”
Could you tell me what I’m doing wrong?
Nico
April 1, 2008 at 1:58 pm
Nicolas,
It most likely has to do with your log file and the pattern you have used. Please pass on your log4j.properties file and a sample log file, and I can figure out the pattern for you.
Note that the pattern is case and space sensitive and has to be an exact match.
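To illustrate why the match has to be exact, here is a rough sketch of turning a Log4j-style conversion pattern into a regular expression. It is not Insight's actual code. The class `PatternToRegex` is hypothetical, it handles only a small subset of specifiers (`%d` in the default ISO8601 form, `%t`, `%p`, `%c`, `%m`, `%n`), and the sample log line is made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: converts a small subset of Log4j conversion patterns to a regex.
class PatternToRegex {
    // A conversion specifier such as %d, %t, %-5p, %c or %m, with an optional width modifier.
    private static final Pattern SPEC = Pattern.compile("%(-?\\d+)?(?:\\.\\d+)?([dtpcmn])");

    static Pattern toRegex(String layout) {
        StringBuilder regex = new StringBuilder("^");
        Matcher m = SPEC.matcher(layout);
        int last = 0;
        while (m.find()) {
            // Literal text between specifiers must match exactly, spaces included.
            if (m.start() > last) {
                regex.append(Pattern.quote(layout.substring(last, m.start())));
            }
            switch (m.group(2)) {
                case "d": regex.append("(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"); break;
                case "t": regex.append("(.+?)"); break;
                case "p": regex.append(" *(DEBUG|INFO|WARN|ERROR|FATAL) *"); break; // allows padding
                case "c": regex.append("(\\S+)"); break;
                case "m": regex.append("(.*)"); break;
                default: break; // %n: the line terminator is assumed to be stripped already
            }
            last = m.end();
        }
        if (last < layout.length()) {
            regex.append(Pattern.quote(layout.substring(last)));
        }
        return Pattern.compile(regex.append("$").toString());
    }

    public static void main(String[] args) {
        String line = "2008-04-01 13:15:00,123 [main] INFO  com.example.App - started";
        // Same layout as the one that produced the line: matches.
        System.out.println(toRegex("%d [%t] %-5p %c - %m%n").matcher(line).matches());
        // One space missing before "[": no longer matches.
        System.out.println(toRegex("%d[%t] %-5p %c - %m%n").matcher(line).matches());
    }
}
```

Because the literal text between the specifiers is quoted verbatim, a single missing space or a differently cased specifier is enough to make every line fail to match.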
April 1, 2008 at 2:38 pm
Hi again!
As you said, it was a problem with the pattern layout.
Thx for everything!
Nico
Buenos Aires – Argentina
April 1, 2008 at 7:49 pm
Note after looking at screenshot:
Actually, log events have `Severity`, not `Priority`. Although they can be prioritized by severity, it seems odd to call it `Priority` in the first place.
April 2, 2008 at 4:35 am
Eugene,
I guess we simply borrowed what Log4j said. See
http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html
where “p” in the pattern string is defined as “Used to output the priority of the logging event.”
April 3, 2008 at 3:19 pm
Grats!
You made a useful and easy-to-use tool.
I’d like to collaborate, if that is possible.
berserkpi@gmail.com
April 4, 2008 at 4:43 am
Alejandro,
Glad to know you liked MindTree Insight. You are of course welcome to participate!
To start, please take a look at the bugs and feature requests on the project site and let us know if you would like to take up any of those.
April 4, 2008 at 2:27 pm
OK
I’ll do that.
Thx.
September 16, 2008 at 1:06 pm
I’ve downloaded your application hoping I could use RegEx to match it to my custom log file. It’s based on a log4j format. I’ve looked at your preference.xml and I don’t think I can customize it for my needs. Any chance you could lead me to some docs I could use to adapt your tool to read my log files?
Thanks!
September 17, 2008 at 3:22 am
Aken,
We have exported and viewed Windows events using Insight. This goes to show that you can view any pattern-based log file so long as the fields in the pattern are ones supported by log4j and Insight. Of course, you may have some issues with priorities other than debug, info, warn, error and fatal.
We unfortunately don’t have documentation on the entire design. The classes are well documented though. You may want to look at the following:
Log4JPatternInterpeter – Creates a ReceiverFormat object from the specified pattern. You may want to customize this class to recognize RegEx patterns.
LogInterpreter – Parses and creates the LogEvent instances from the log files. Internally uses multiple Apache Oro Regex pattern matching classes.
The LogEvent is the data object recognized by the rest of Insight and contains all the information pertaining to an event logged by the logging framework.
I hope this helps.