As programmers, most of us were taught to write comments, externalize constants, indent code and, of course, log the flow of control. Credit goes to open source for providing wonderful frameworks like Log4j to enable logging.

The framework authors did foresee the problem of growing log files and were smart enough to provide means to control the level of logging. While this helps a lot in categorizing log entries, it does not solve the most commonly occurring problem for a developer:

That of “spotting the log entry” most relevant to the situation.

Let me explain. Picture this: a large application has been deployed in production for a while. The log level has been set to INFO because the application support team argues that ERROR does not give them enough detail for troubleshooting. In the application code, a few developers have gotten carried away and have logged entry into and exit from each method call (re-inventing the “around” advice of AOP, if you may call it that). The result – log files that have rolled over many times, totalling around 100 MB across just as many files.

These files are not always easily available to the developer debugging a problem that has been reported in production. The developer needs to quickly find the log entry that helps to diagnose the issue, and here the “Spot the log entry” contest begins.

Obstacles in this course include (but are not limited to):

  • Having to sift through potentially hundreds of thousands of log entries
  • Log files located on remote machines accessible via protocols like SSH/SFTP.
  • Noise in log files – entries that merely record entry into and exit from methods
  • Absence of tools to analyze the structured data contained in log entries. Text editors are an often-used, but poor, choice.
  • Inability to analyze related log events as a contiguous set. For example, log entries made from one thread do not appear together in a log file on medium- to heavy-use systems.

There are a few ways in which this problem can be addressed:

  • Avoid AOP-like log statements in code. Use AOP at run time to instrument byte code on the fly if required.
  • Clear logging strategy
    • Log errors where they are first encountered, not each time they are handled, say as an exception progresses up the call stack.
    • Use of appropriate log levels to differentiate between debug information, messages, warnings and errors.
    • Use of log patterns that provide sufficient information to analyze the flow – logging the timestamp, thread, category and priority, for example (see the sketch after this list).

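As an illustration of such a strategy, here is a minimal Log4j 1.x sketch. The class, method and message names are made up for illustration; the point is a pattern that carries timestamp, thread, priority and category, and an error that is logged exactly once, where it is first caught:

```java
import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class LoggingStrategyExample {

    private static final Logger LOG = Logger.getLogger(LoggingStrategyExample.class);

    public static void main(String[] args) {
        // The pattern carries timestamp, thread, priority and category so that an
        // entry can be correlated later, e.g. "2008-02-01 10:12:01,123 [main] ERROR com.acme.Foo - ..."
        PatternLayout layout = new PatternLayout("%d{ISO8601} [%t] %-5p %c - %m%n");
        Logger.getRootLogger().addAppender(new ConsoleAppender(layout));

        try {
            generateInvoice();
        } catch (Exception e) {
            // Log the error once, where it is first caught, with the stack trace;
            // callers further up the stack should not log it again.
            LOG.error("Invoice generation failed", e);
        }
    }

    private static void generateInvoice() throws Exception {
        LOG.debug("Entering generateInvoice()"); // method entry/exit noise belongs at DEBUG, not INFO
        throw new IllegalStateException("simulated failure");
    }
}
```
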
What if your log files are still huge after all this? It's time to invest in tools that help you spot your log entry.

Some of us at MindTree (http://www.mindtree.com) looked around for open-source tools for log analysis when we had to inspect logs from around a dozen servers. Chainsaw (http://logging.apache.org/log4j/docs/chainsaw.html) was a decent implementation but not good enough. Commercial tools were not satisfactory either.

That's when we decided to implement Insight – an application analysis tool.

Insight

To start with, it was conceived to do comprehensive log analysis. In brief, it provides the following:

  1. Provides visual analysis of any pattern-based log file
  2. Analyzes logs from remote servers over (S)FTP and HTTP
  3. Supports tailing of local files, and a plug-in for Eclipse
  4. Provides summary and detailed views of log events
  5. Supports “non-mutating” analysis of the data set – such as search and sort
  6. Supports “mutating” analysis of the data set – via progressive filtering
  7. Helps to locate the “context” of an event, i.e. a snapshot of log entries around a specific log entry
  8. Optimized for performance and footprint size
    1. Loads 1,000 entries in around 375 ms
    2. VM size between 45 and 60 MB even after loading 110,000 entries

See the attached presentation for details on Insight and testimonials: Insight features

Our developers are now front runners in the “Spot the log entry” contest 🙂

———————————————————————————–

MindTree Insight is now an open-source project on SourceForge and is available at:
http://sourceforge.net/projects/mindtreeinsight

The download of the latest release is available at:

http://sourceforge.net/project/showfiles.php?group_id=212019&package_id=254922
——————————————————————————–

Thanks to the likes of Google, the world today sees application scalability in a new light – that of not depending solely on Symmetric Multi-Processor (SMP) boxes. What probably does not come out clearly from such implementations is optimization of the available CPU power.

I read somewhere that only a portion of the world’s available processing power is actually used. On the other hand, don't many of us worry about applications being slow? One area that is being addressed to improve performance in regular J2EE applications is that of data access, through application strategies (partitioning of data, lazy loading, etc.) and technologies (caches and data grids), often provided by the vendors themselves.

The ubiquitous nature of HTTP and its applications has created its own patterns of application design. The positive ones being:

  • Stateless behaviour of applications
  • Tiers in the application
  • Application security and identity management

On the other hand, it has also led to stereotyping of applications. I am yet to see a significant number of applications that deviate from the standard J2EE patterns: MVC –> Service Locator –> Facade –> DAO. It has constrained us in some ways:

  • The flow from the web tier to the database and back is one thread of activity
  • Patterns do not let us think otherwise
  • Platform specifications are considered sacred – not spawning new threads inside an EJB container, for example

In physical deployments, we consider the job done by having, say, a hardware load balancer in place to seemingly “load balance” requests between servers. It's not often that load balancing happens based on the nature of the work that needs to be done to service a request. It is often a simple IP-level round robin or, at best, a weighted one based on the CPUs on a server.

This leads to the question: is scalability a factor of the number of machines/CPUs?

It appears so unless we think differently. To illustrate the point: in an IP-level load-balanced setup, once a request is assigned to a server, the burden is solely on the machine servicing the request, which processes the entire thread of execution for a period of time while other servers may have unused processing capacity.

There are ways to address this issue and ensure high CPU utilization before deciding that scalability is a factor of the number of machines/CPUs:

  1. Co-locating applications: different applications have varied peak loads. Co-locating applications on a shared setup (software, i.e. framework, and hardware) ensures overall better scalability and availability. [I have worked on an engagement where we had 6 applications co-deployed in production on just 2 blade servers]
  2. Leveraging the multi-threading capabilities of the JVM. But isn't that against the specifications? Actually no, if you use the features of the platform that multi-thread for you – Message Driven Beans (MDBs), for example.

There are some fundamental changes to the way we design applications in order to make the second point (multi-threading) a reality.

Let's take an example: the sequence of activity in a regular J2EE application to generate invoices would involve:

  1. Validating the incoming data
  2. Grouping request data – by article, customer, country, etc.
  3. Retrieving master and transactional data from the RDBMS
  4. Calculating the invoice amount – tax, other computations
  5. Generating the final artifact – XML, PDF, etc.

In most designs, steps 1 to 5 happen via components implemented in one or other of the stereotyped J2EE tiers, and the execution is therefore serial in nature.

What if we implemented a few of the above steps using the Command pattern, i.e. the component takes a well-defined request and produces a well-defined response using only the request data provided?

This component may then be remoted: as SOA services or just plain remotely invocable objects, say stateless EJBs.
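A minimal sketch of that idea follows. The type names (WorkRequest, TaxRequest, InvoiceTaxCommand) and the tax rates are purely illustrative assumptions, not part of any framework; the point is that the command is stateless and self-contained, so it can be remoted as a stateless EJB, an RMI object or a SOA service:

```java
import java.io.Serializable;

// Marker types for self-contained units of work; Serializable so they can be
// remoted or carried as JMS object messages.
interface WorkRequest extends Serializable { }
interface WorkResponse extends Serializable { }

// A command takes a well-defined request and produces a well-defined response,
// using nothing but the request data provided.
interface Command<Q extends WorkRequest, R extends WorkResponse> {
    R execute(Q request);
}

class TaxRequest implements WorkRequest {
    final double invoiceAmount;
    final String countryCode;
    TaxRequest(double invoiceAmount, String countryCode) {
        this.invoiceAmount = invoiceAmount;
        this.countryCode = countryCode;
    }
}

class TaxResponse implements WorkResponse {
    final double tax;
    TaxResponse(double tax) { this.tax = tax; }
}

// Stateless and side-effect free: safe to execute on any machine.
class InvoiceTaxCommand implements Command<TaxRequest, TaxResponse> {
    public TaxResponse execute(TaxRequest request) {
        double rate = "IN".equals(request.countryCode) ? 0.125 : 0.10; // illustrative rates only
        return new TaxResponse(request.invoiceAmount * rate);
    }
}
```
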

Go on and now implement a request processor that breaks up a request into multiple discrete units of work. Each unit of work is then a message – a request and a response. The messages may then be distributed across machines using a suitable channel – a JMS queue with MDBs listening to it. The interesting thing happens here: the container spawns MDB instances depending on the number of messages in the queue and other resource-availability factors, thereby providing the multi-threaded execution. The MDBs themselves may be deployed across machines to “truly” spread the load across the available machines.
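Continuing the sketch above (and reusing the hypothetical TaxRequest and InvoiceTaxCommand types), an MDB consuming such units of work might look like this. The queue name is an assumption, and reply handling is omitted:

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.ObjectMessage;

@MessageDriven(mappedName = "jms/InvoiceWorkQueue", // assumed queue name
    activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType",
                                  propertyValue = "javax.jms.Queue")
    })
public class InvoiceWorkMdb implements MessageListener {

    public void onMessage(Message message) {
        try {
            // Each message carries one discrete unit of work produced by the request processor.
            TaxRequest unitOfWork = (TaxRequest) ((ObjectMessage) message).getObject();
            TaxResponse result = new InvoiceTaxCommand().execute(unitOfWork);
            // The response would typically be sent to a reply queue or correlated
            // back to the originating request; omitted here for brevity.
        } catch (Exception e) {
            // A real implementation would route the failure to an error/dead-letter queue.
            throw new RuntimeException(e);
        }
    }
}
```

The container decides how many concurrent instances of this bean to run, and the beans can be deployed on several machines, which is what spreads one logical request across the available CPUs.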

I therefore believe that scalability in a well-designed system is a factor of the number of threads that can be efficiently executed in parallel, and not just of the number of machines/CPUs.