Tuesday, November 23, 2010

JRugged - Making your code more RUGGED

Cowboy coding
We have all done some cowboy coding at some point in our lives.  I think we can even recognize the moment when we are about to be asked to do it.  The scenario goes something like this:

Your boss or your boss's boss comes down and says "... we have to have this new thing - and we have to have it by this deadline or the sky will fall and we will all die..."

It is at this point that, as a developer, you begin to work through in your head exactly how you will tackle the problem at hand.  It is also usually at this point that you decide which shortcuts to take in order to get the job done on time.  We, as developers, cut corners on things like:
  • unit-testing (because we tend to do it after development, making it feel superfluous)
  • monitoring/data collecting about the system we are developing
  • resilience to failure
These last two items, monitoring and resilience to failure, are the focus of this post.

Out of the box monitoring
When I mention monitoring, what is the first thing that comes to mind?  Is the machine my software is deployed on running?  Does it have connectivity?  What is the program’s memory footprint?  How much free memory or disk is available?  While these are all important basic questions to be able to answer, they do very little to help you understand your running, deployed software.  To understand your running software you need to be able to answer questions like:
  • How many requests per second has my software performed in the past minute, hour, day?
  • How often did my software fail in the past minute, hour, day?
  • How often did my software succeed in fulfilling a request in the past minute, hour, day?
  • What was the latency for the calls that were made into my software in the past minute, hour, day?
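
To make these questions concrete, here is a minimal sketch of the kind of bookkeeping needed to answer them for a single time window. It is not how JRugged does it; the class name WindowedRequestStats and all of its methods are made up for illustration, and it assumes requests are recorded in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: keeps request outcomes for one sliding time window. */
class WindowedRequestStats {
    private static final class Sample {
        final long timestampMillis;
        final long latencyMillis;
        final boolean success;

        Sample(long timestampMillis, long latencyMillis, boolean success) {
            this.timestampMillis = timestampMillis;
            this.latencyMillis = latencyMillis;
            this.success = success;
        }
    }

    private final long windowMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();

    WindowedRequestStats(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one finished request; timestamps must not go backwards. */
    synchronized void record(long nowMillis, long latencyMillis, boolean success) {
        samples.addLast(new Sample(nowMillis, latencyMillis, success));
        evictOlderThan(nowMillis - windowMillis);
    }

    /** Number of successes (or failures) still inside the window. */
    synchronized long count(long nowMillis, boolean success) {
        evictOlderThan(nowMillis - windowMillis);
        long n = 0;
        for (Sample s : samples) {
            if (s.success == success) {
                n++;
            }
        }
        return n;
    }

    /** All requests inside the window, divided by the window length in seconds. */
    synchronized double requestsPerSecond(long nowMillis) {
        evictOlderThan(nowMillis - windowMillis);
        return samples.size() / (windowMillis / 1000.0);
    }

    /** Mean latency of successes (or failures) in the window; 0.0 if there are none. */
    synchronized double averageLatencyMillis(long nowMillis, boolean success) {
        evictOlderThan(nowMillis - windowMillis);
        long total = 0;
        long n = 0;
        for (Sample s : samples) {
            if (s.success == success) {
                total += s.latencyMillis;
                n++;
            }
        }
        return n == 0 ? 0.0 : (double) total / n;
    }

    // Drops samples whose timestamp is at or before the cutoff.
    private void evictOlderThan(long cutoffMillis) {
        while (!samples.isEmpty() && samples.peekFirst().timestampMillis <= cutoffMillis) {
            samples.removeFirst();
        }
    }
}
```

One instance with a 60,000 ms window would cover the "past minute" questions; separate instances could cover the hour and the day. Passing the clock value in explicitly is just a choice that keeps the sketch easy to test.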

Fault Tolerance
How often is it the case that the software you build has to call an outside resource?  Maybe your software needs to make an API call to another system to get some information/data, or maybe your system has to integrate with a remote system in a specific way; how do you insulate yourself from that other system's failures?  How do you keep the system you develop responsive, allowing it to fail quickly and respond to end users in a gracefully degraded way?

I believe that making a software system fail "gracefully and quickly" when an outside resource is not available is usually accomplished by introducing something like timeouts, retries or other systematic back-off mechanisms.  Adding timeouts or back-off can be problematic, however, and can cause additional unforeseen issues like threads that hang or thread counts that run out of control.  What we would really like to do is detect errors and, if the error rate is high enough, turn off ALL remote calls to that resource, saving the user and the system the cost of having to 'wait' for timeouts to occur.
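
The "turn off all remote calls" idea can be sketched in a few lines. What follows is a hand-rolled illustration, not JRugged's CircuitBreaker: the class SimpleCircuitBreaker, its methods and the fallback parameter are all hypothetical, and it trips on a count of consecutive failures as a simplification of the error rate described above.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a circuit breaker; not JRugged's implementation. */
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long resetMillis;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1; // -1 means the breaker has not tripped

    SimpleCircuitBreaker(int failureThreshold, long resetMillis) {
        this.failureThreshold = failureThreshold;
        this.resetMillis = resetMillis;
    }

    /** True while the breaker is tripped and the reset period has not elapsed. */
    synchronized boolean isOpen(long nowMillis) {
        return openedAtMillis >= 0 && nowMillis - openedAtMillis < resetMillis;
    }

    /** Convenience overload that uses the system clock. */
    <T> T invoke(Callable<T> call, T fallback) {
        return invokeAt(System.currentTimeMillis(), call, fallback);
    }

    /** Runs the call unless the breaker is open; returns the fallback on any failure. */
    <T> T invokeAt(long nowMillis, Callable<T> call, T fallback) {
        if (isOpen(nowMillis)) {
            return fallback; // fail fast: the remote resource is never touched
        }
        try {
            T result = call.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure(nowMillis);
            return fallback;
        }
    }

    // A success closes the breaker and clears the failure streak.
    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        openedAtMillis = -1;
    }

    // Enough failures in a row trip (or re-trip) the breaker.
    private synchronized void onFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = nowMillis;
        }
    }
}
```

While the breaker is open, callers get the fallback immediately instead of waiting on a timeout. Once the reset period has passed, the next call is let through as a trial: a success closes the breaker, and a failure trips it again.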

Enter a Java project that helps you move beyond out of the box monitoring and timeout-based fault tolerance with ease: JRugged.

JRugged
JRugged provides straightforward add-ons to existing code to make it more tolerant of failures and easier to manage/monitor. In other words, it makes your Java code more rugged!

The purpose of the project is to help answer the questions we posed above in a straightforward and easy-to-understand way.  By answering questions like "how many requests per second am I currently processing?", JRugged makes it dead simple for any project to gather and understand the metrics for its running systems, and it helps make those production systems as resilient to failure as possible.

For collecting performance statistics we have a PerformanceMonitor object that provides the following output:

RequestCount: 26
AverageSuccessLatencyLastMinute: 974.3247446446182
AverageSuccessLatencyLastHour: 1051.3236248591827
AverageSuccessLatencyLastDay: 1052.9298656896194
AverageFailureLatencyLastMinute: 0.0
AverageFailureLatencyLastHour: 0.0
AverageFailureLatencyLastDay: 0.0
TotalRequestsPerSecondLastMinute: 0.34042328314561054
SuccessRequestsPerSecondLastMinute: 0.34042328314561054
FailureRequestsPerSecondLastMinute: 0.0
TotalRequestsPerSecondLastHour: 0.006920235124926995
SuccessRequestsPerSecondLastHour: 0.006920235124926995
FailureRequestsPerSecondLastHour: 0.0
TotalRequestsPerSecondLastDay: 2.893097268271628E-4
SuccessRequestsPerSecondLastDay: 2.893097268271628E-4
FailureRequestsPerSecondLastDay: 0.0
TotalRequestsPerSecondLifetime: 0.6241298190023524
SuccessRequestsPerSecondLifetime: 0.6241298190023524
FailureRequestsPerSecondLifetime: 0.0
SuccessCount: 26
FailureCount: 0

For adding fault tolerance and resilience we have CircuitBreakers.  A CircuitBreaker exposes the following attributes:

LastTripTime: 0
TripCount: 0
ByPassState: false
ResetMillis: 10000
HealthCheck: GREEN
Status: UP
FailureInterpreter: org.fishwife.jrugged.DefaultFailureInterpreter@39ce9085
ExceptionMapper: null

There are three modules in the project:

jrugged-core
Contained in jrugged-core are building block classes that can be used independently to build out functionality within your application.  Most of the items in jrugged-core utilize a simple decorator pattern, allowing them to be easily inserted into your existing Java projects with little or no impact.

For example, if a developer wanted to wrap a performance monitor around a section of code for a backend call it might look like the following:

// Requires: import java.util.concurrent.Callable;
public BackEndData processArgument(final String myArg) throws Exception {
     final BackEndService theBackend = backend;
     // Run the backend call through the monitor so it is counted and timed.
     return perfMonitor.invoke(new Callable<BackEndData>() {
          public BackEndData call() throws Exception {
               return theBackend.processArgument(myArg);
          }
     });
}

If you were then interested in the number of requests per second made to that back-end you could interrogate the 'perfMonitor' object to find out.

jrugged-spring
The jrugged-spring library is built upon the classes and items exposed in jrugged-core to provide an easy Spring integration path.  jrugged-spring provides Spring interceptors that can be utilized in conjunction with a Spring proxy to 'wrap' methods in classes based on regular Spring configuration files.  If you are currently using Spring, this is the way to go, as no change to your existing code is needed.  Just add the needed lines to the Spring config and the performance information is automatically gathered into an object that can be exposed easily with JMX.

jrugged-aspects
Similar to the jrugged-spring library, the aspects library provides the user with handy annotations that can be used on methods to wrap the performance monitoring or circuit breaker classes around the target method.  Getting statistics or incorporating graceful and quick failures becomes simply a matter of adding the appropriate annotation and assigning it a name; the method then starts collecting information or gains the fault tolerance of a circuit breaker.

What is the take away?
Having lots of information about how our software runs and handles failures is important to the business, and we should already be building mechanisms into our code to provide it; the problem is that there is rarely time to do so.  JRugged makes adding these critical components to your software so easy that it would be criminal not to add them.  Please go and check out the project at http://code.google.com/p/jrugged/; we are always looking for comments on how this works out for you and suggestions for future enhancements.
