Wednesday, November 24, 2010

Volkswagen - An unavoidable rant

Long Time Owner
Let me first lay the groundwork for this post.  I have been a very long time VW owner and enthusiast, from as far back as when I was thinking about getting my own driver's license.  Before that my family owned VWs as well: my father had a brown gasoline Rabbit (a 1972, I think) and my mother owned a silver diesel Rabbit (a 1979, I believe).  I owned a number of odd cars before getting my first VW, which was a 1997 Passat 1.9 TDI sedan.  In between my other cars and finally getting the 1997 TDI, I purchased a diesel Rabbit for my wife, which was eventually replaced by a 1994 Jetta, which was in turn replaced with a white roll-top Cabrio.  As the Cabrio declined, she and I switched cars for a while, until my wife bought her first brand new car: a green metallic Cabrio with a tan convertible top.

To say that I like my VWs is a bit of an understatement.  Now to be sure, I am NOT as hard-core as some, but I have been very loyal to the VW brand.  The cars are generally fun to drive, easy for the average gear head to repair and maintain, and, at least for all the VWs I have owned, relatively cheap to own and maintain... that is, until my purchase of a 2003 Passat gasoline 1.8T Wagon.

Track Record
In fairness, before I get into the rant proper: all the VWs I have had have had some trouble with specific items.  The diesel Rabbit had the 'rabbit leak' over the fuse panel, giving it all sorts of interesting issues, including a propensity to short and run the battery down while it was 'sitting' for multi-day periods.  The same problem was prevalent in the white Cabrio that my wife had for a while.  Despite those problems the cars continued to operate and run, and the fixes for any 'major' problems either of those cars ever had averaged a MAX of $500.  We had a belt wear out on the 1994 Jetta, and we never had any trouble with the green Cabrio right up to trading it in on the 2003 Passat.  The 1997 TDI that I purchased had relay 109 fail, has gone through a number of window regulators, and, other than normal maintenance, has had few if any real problems.  My TDI currently has 215,000 miles on it and recently got the largest infusion of TLC it has ever gotten when I spent $3,200+ to do the following:
  • Belts + Pulleys
  • Suspension on ALL 4 corners
    • Struts, Springs, etc.
  • Water Pump
  • Window Regulator on the Driver's Side
  • Add a CCV Filter (Mann Provent)
  • Remove and Clean the Intake Manifold
  • Remove and clean the Intercooler and Piping
  • Replace the Entire Exhaust from the Turbo Back to the tail (The OEM had essentially fallen OFF the car)
  • New Clutch
  • Starter Cleaning/Rebuild
  • Water Thermostat
Some of this work I did myself, while some I had done at my local repair shop.  In total I may have put $5,000-$6,000 into the TDI since I purchased it new in 1997.  I CANNOT say the same about the 2003 Passat Wagon, and hence this rant.

The reason I will NEVER own another VW Automobile
The 2003 Passat Wagon 1.8T is 100% the reason that I will never own another VW in my life.  This car has had so many problems that it makes me slightly sick:
  • Fuel pump
  • Fuel level sender/indicator
  • Water Thermostat
  • Radio Antenna Shorting/Rusting
  • Moon roof 'Flooding'
  • Ignition/Coil Pack Failure
  • A/C Bearing Failure
  • OIL SLUDGE (x2)
and - recall after recall after recall.  In fact, it seems like every time we got a recall notice on a part, that part would then fail while driving, forcing us to tow the car in and get the recall item replaced.  This car has stranded my wife in intersections and other situations too numerous for me to count.  Even with all these issues, it is the last one in the list that is the worst of them all, because it appears to be the one thing that VW wanted to 'hide' after it started to occur on ALL of the 1.8T engines: an engine oil sludging issue of class-action-lawsuit proportions.  Let's look at the situation from my point of view; the 1.8T engine seems to have the following issues:
  1. An oil sump that is too small
  2. A quantity of oil that is also too small to dissipate the engine heat appropriately
  3. An originally specified oil filter that is too small to cope with oil sludge that occurs because of 1 and 2
  4. A turbo that can get hot enough to glow with a slight cherry color
  5. Insufficient cooling of the turbo means that engine oil comes in contact with VERY high temps compounding oil sludge issues
  6. Using 'conventional' oil compounds these issues
  7. Oil screen and uptakes that are too small
So - at first the documentation for the car did not indicate anything special about the oil to be used, just normal oil of a specified weight.  Then VW issued an update addendum indicating that only 5w-40 FULL SYNTHETIC oil should be used.  Shortly after that update was received, the oil pressure light in the car came on despite there being plenty of oil in the car.  On taking the car to VW, they indicated that the engine was sludged and the repair would be $1,800.  They said that they would cover the cost if I were able to produce every oil change slip I ever had, to prove that I had done what was asked by the dealer and VWNoA (VW North America).  Of course I don't have every slip for every oil change, and of course I did not take the car back to the dealer every time for its oil change - because that is expensive and insane.  So $1,800 later, we got the car back.  Fast-forward to today - and the car has the sludge problem AGAIN.  This car has been problem after problem after problem, to the point that we wonder WHEN it will fail next, not IF it will fail next.  This engine was poorly designed from the start, giving it a propensity to sludge and fail, sometimes catastrophically.

Post after post after post after post of people with the same oil sludge problem, and VWNoA essentially turning a blind eye to it.  The engine should have just been recalled en masse because it was prone to this issue.

No car I have EVER OWNED has had this level of problem with oil sludging.  Not my TDI, not any of the Rabbits, Cabrios, or Jettas that I have owned has ever had a problem like this, even with some serious abuse of the change intervals - NOT A SINGLE SLUDGE PROBLEM.  Change with conventional oil, change with synthetic oil, change with some mix of both in the same engine... not a single problem.  The 1.8T seems to MAKE sludge due to its design.  It would have been interesting if VW had made some improvements to engine parts to attempt to address this issue.  Instead, VW redesigned the whole thing as a beefier 2.0T engine and did away with the 1.8T altogether.

Quality Counts
I have never felt so abused/betrayed by a car in my life.  Every time I turned around there was another problem leaving my wife stranded in the middle of nowhere.  If it wasn't the oil sludge costing me money, it was the fuel pump (eventually recalled), or the ignition coils (also recalled), or water in the cabin from the moon roof drains being clogged.  This is by far the worst VW I have ever owned, and despite a long history of VW ownership, this car has caused both my wife and me to turn away from VW in disgust.  VW's inability to take responsibility for poor design and parts choices is inexcusable in my mind, and I cannot take the chance of getting another car like this one.  Sorry VW - with this latest round of engine oil sludge in my 1.8T (despite religious, possibly even anal-retentive oil changes) you have just lost a long-time customer (from childhood till now).

Tuesday, November 23, 2010

JRugged - Making your code more RUGGED

Cowboy coding
We have all done some cowboy coding at some point in our lives.  I think we can even recognize when we are about to be asked to perform our cowboy coding.  The scenario goes something like this:

Your boss or your boss's boss comes down and says "... we have to have this new thing - and we have to have it by this deadline or the sky will fall and we will all die..."

It is at this point that, as a developer, you begin to run through how you might develop what is being asked for, working out in your head exactly how you will tackle the problem at hand.  It is also usually right at this point that decisions get made about which short cuts need to be taken in order to get the job complete on time.  We, as developers, short-cut things like:
  • unit-testing (because we tend to do it after development, making it feel superfluous)
  • monitoring/data collecting about the system we are developing
  • resilience to failure
These last two items, monitoring and resilience to failure, are the focus of this post.

Out of the box monitoring
When I mention monitoring, what is the first thing that comes to mind?  Is the machine my software is deployed on running?  Does it have connectivity?  What is the program's memory footprint?  How much free memory or disk is available?  While all of these are important base questions to know the answers to, they do very little to help you understand your running, deployed software.  To understand your running software you need to be able to answer questions like:
  • How many requests per second has my software performed in the past minute, hour, day?
  • How often did my software fail in the past minute, hour, day?
  • How often did my software succeed in fulfilling a request in the past minute, hour, day?
  • What was the latency for the calls that were made into my software in the past minute, hour, day?
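To make these questions concrete, the bookkeeping behind them can be sketched in a few lines.  This is a hypothetical illustration (class and method names are mine, not JRugged's) of the sort of counters a monitor has to keep:

```java
// Hypothetical sketch of the bookkeeping needed to answer the questions
// above.  Names are illustrative only, not JRugged's API.
public class RequestStats {
    private long successCount = 0;
    private long failureCount = 0;
    private long totalSuccessLatencyMillis = 0;
    private final long startMillis = System.currentTimeMillis();

    public void recordSuccess(long latencyMillis) {
        successCount++;
        totalSuccessLatencyMillis += latencyMillis;
    }

    public void recordFailure() {
        failureCount++;
    }

    // Average latency of successful calls, in milliseconds.
    public double averageSuccessLatencyMillis() {
        return successCount == 0
            ? 0.0
            : (double) totalSuccessLatencyMillis / successCount;
    }

    // Total requests per second over the lifetime of this object.
    public double requestsPerSecondLifetime() {
        long elapsed = Math.max(1, System.currentTimeMillis() - startMillis);
        return (successCount + failureCount) * 1000.0 / elapsed;
    }

    public long successCount() { return successCount; }
    public long failureCount() { return failureCount; }
}
```

JRugged's PerformanceMonitor tracks the same kinds of numbers, broken out by minute, hour, day, and lifetime windows, without you having to write any of this yourself.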
Fault Tolerance
How often is it the case that the software you build has to call an outside resource?  Maybe your software needs to make an API call to another system to get some data, or maybe your system has to integrate with a remote system in a specific way; how do you insulate yourself from that other system's failures?  How do you keep the system you develop responsive, allowing it to fail quickly and respond to end users in a gracefully degraded way?

I believe that making a software system "gracefully and quickly" fail when an outside resource is not available is usually attempted by introducing timeouts, retries, or other systematic back-off mechanisms.  Adding timeouts or back-off can be problematic, however, and can cause additional unforeseen issues like threads that hang or thread counts that run out of control.  What we really want is to detect errors and, if the error rate is high enough, turn off ALL remote calls to that resource, saving the user and the system from the cost of having to 'wait' for timeouts to occur.
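That "detect errors and stop calling" idea is the circuit breaker pattern.  As a toy sketch of the concept (this is my own minimal illustration, not JRugged's implementation): after a threshold of consecutive failures the breaker opens and callers fail fast; once a reset window has elapsed, it lets a single trial call through.

```java
import java.util.concurrent.Callable;

// Toy circuit breaker: trips OPEN after a failure threshold, fails fast
// while open, and allows one trial call after the reset window elapses.
// An illustration of the concept only, not JRugged's implementation.
public class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long resetMillis;
    private int consecutiveFailures = 0;
    private long lastTripMillis = 0;
    private boolean open = false;

    public SimpleCircuitBreaker(int failureThreshold, long resetMillis) {
        this.failureThreshold = failureThreshold;
        this.resetMillis = resetMillis;
    }

    public <T> T invoke(Callable<T> call) throws Exception {
        if (open) {
            if (System.currentTimeMillis() - lastTripMillis < resetMillis) {
                // Fail fast instead of waiting on a timeout.
                throw new IllegalStateException("circuit open: failing fast");
            }
            open = false; // half-open: let one trial call through
        }
        try {
            T result = call.call();
            consecutiveFailures = 0; // success closes the breaker
            return result;
        } catch (Exception e) {
            if (++consecutiveFailures >= failureThreshold) {
                open = true;
                lastTripMillis = System.currentTimeMillis();
            }
            throw e;
        }
    }

    public boolean isOpen() { return open; }
}
```

A real implementation such as JRugged's adds configurable failure interpretation, status reporting, and thread safety; the sketch above shows only the core state machine.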

Enter a Java project that helps you move beyond out of the box monitoring and timeout-based fault tolerance with ease: JRugged.

JRugged provides straightforward add-ons to existing code to make it more tolerant of failures and easier to manage/monitor. In other words, it makes your Java code more rugged!

The purpose of the project is to help answer the questions posed above in a straightforward, easy-to-understand way.  By answering questions like "How many requests per second am I processing right now?", JRugged makes it dead simple for any project to gather and understand the metrics for its running systems, as well as helping make those production systems as resilient to failure as possible.

For collecting performance statistics we have a PerformanceMonitor object that provides the following output:

RequestCount: 26
AverageSuccessLatencyLastMinute: 974.3247446446182
AverageSuccessLatencyLastHour: 1051.3236248591827
AverageSuccessLatencyLastDay: 1052.9298656896194
AverageFailureLatencyLastMinute: 0.0
AverageFailureLatencyLastHour: 0.0
AverageFailureLatencyLastDay: 0.0
TotalRequestsPerSecondLastMinute: 0.34042328314561054
SuccessRequestsPerSecondLastMinute: 0.34042328314561054
FailureRequestsPerSecondLastMinute: 0.0
TotalRequestsPerSecondLastHour: 0.006920235124926995
SuccessRequestsPerSecondLastHour: 0.006920235124926995
FailureRequestsPerSecondLastHour: 0.0
TotalRequestsPerSecondLastDay: 2.893097268271628E-4
SuccessRequestsPerSecondLastDay: 2.893097268271628E-4
FailureRequestsPerSecondLastDay: 0.0
TotalRequestsPerSecondLifetime: 0.6241298190023524
SuccessRequestsPerSecondLifetime: 0.6241298190023524
FailureRequestsPerSecondLifetime: 0.0
SuccessCount: 26
FailureCount: 0

For adding fault tolerance and resilience we have CircuitBreakers.  CircuitBreakers provide the following characteristics:
LastTripTime: 0
TripCount: 0
ByPassState: false
ResetMillis: 10000
HealthCheck: GREEN
Status: UP
FailureInterpreter: org.fishwife.jrugged.DefaultFailureInterpreter@39ce9085
ExceptionMapper: null

There are three modules in the project:

Contained in jrugged-core are building-block classes that can be used independently to build out functionality within your application.  Most of the items in jrugged-core utilize a simple decorator pattern, allowing them to be easily inserted into your existing Java projects with little or no impact.

For example, if a developer wanted to wrap a performance monitor around a section of code that makes a back-end call, it might look like the following:

public BackEndData processArgument(final String myArg) throws Exception {
    final BackEndService theBackend = backend;
    return perfMonitor.invoke(new Callable<BackEndData>() {
        public BackEndData call() throws Exception {
            return theBackend.processArgument(myArg);
        }
    });
}

If you were then interested in the number of requests per second made to that back-end you could interrogate the 'perfMonitor' object to find out.

The jrugged-spring library is built upon the classes exposed in jrugged-core to provide an easy Spring integration path.  jrugged-spring provides Spring interceptors that can be utilized in conjunction with a Spring proxy to 'wrap' methods in classes based on regular Spring configuration files.  If you are currently using Spring, this is the way to go, as no change to your existing code is needed.  Just add the needed lines to the Spring config and the performance information is automatically gathered into an object that can easily be exposed with JMX.
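As a rough illustration of what those "needed lines" might look like, here is a sketch using Spring's standard ProxyFactoryBean; the jrugged-spring interceptor class name below is assumed for the sake of the example and should be checked against the project's documentation:

```xml
<!-- Hypothetical sketch: the jrugged interceptor class name is
     illustrative, not copied verbatim from jrugged-spring. -->
<bean id="perfMonitorInterceptor"
      class="org.fishwife.jrugged.spring.PerformanceMonitorInterceptor"/>

<bean id="backendService"
      class="org.springframework.aop.framework.ProxyFactoryBean">
    <property name="target" ref="rawBackendService"/>
    <property name="interceptorNames">
        <list>
            <value>perfMonitorInterceptor</value>
        </list>
    </property>
</bean>
```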

Similar to the jrugged-spring library, the jrugged-aspects library provides the user with handy annotations that can be applied to methods to wrap the performance-monitoring or circuit-breaker classes around the target method.  Getting statistics, or incorporating graceful and quick failures, becomes simply a matter of adding the appropriate annotation and assigning a name, and the library starts collecting information or providing the fault tolerance of a circuit breaker.
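In sketch form, the annotation style looks something like the following.  To keep the example self-contained, the annotation is declared inline; it stands in for whatever annotations jrugged-aspects actually exposes, whose names may differ:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Self-contained sketch of the annotation style described above.  The
// annotation is declared here for illustration; the real annotations
// exposed by jrugged-aspects may have different names.
public class AnnotationSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface MonitoredCall {
        String value(); // the name under which statistics are collected
    }

    // An aspect woven at runtime would notice this annotation and wrap
    // the method with a performance monitor or circuit breaker.
    @MonitoredCall("backendLookup")
    public String lookup(String key) {
        return "value-for-" + key;
    }
}
```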

What is the take away?
Having lots of information about how our software runs and handles failures is important to the business, and we should already be building mechanisms into our code to provide it; the problem is that there is rarely time to do so.  JRugged makes adding these critical components to your software so easy that it would be criminal not to add them.  Please go check out the project; we are always looking for comments on how this works out for you and suggestions for future enhancements.

Monday, November 22, 2010

all about being "Good enough", not the best

To be honest, I do not believe that people will move from an iPhone to an Android handset because it happens to have cool or 'must have' applications; I believe that people will move from one to the other because the Android platform is 'good enough', has enough of the same features, and is cheaper than the iPhone.  I don't buy into Steve Jobs' idea that the quantity of quality applications is what matters - that statement is marketing fluff that works only when the sheeple take it in fully, and it lasts only fleetingly before the American populace is on to the next thing, keeping up with the Joneses.

I draw similarities between the marketing done for the iPhone and its applications and Rosie the Riveter.  When the war was over, and all the women who had been working to make the machines of war for their men off fighting were being told to go back to the house and the kitchen so that the returning men could have jobs, common marketing and media helped to reinforce the idea that doing so would be good all around.  You saw shows like Leave it to Beaver and other media depicting the perfect housewives in high heels, cooking and cleaning.  Eventually that pervasive depiction of the perfect housewife became a mainstream understanding again, and a woman's place was the house and home: the kitchen, doing housework in heels, and having dinner on the table when her 'Man' came home.

People don't specifically need killer Android applications in order to choose the Android platform.  They need something that is 'good enough' or 'similar enough' at a reasonable price.  Given a choice, people might choose the iPhone to have the same thing as a friend, driven by pressure from that friend to purchase and pay the associated premiums - but I believe that eventually the pressure applied will dissipate, the need to keep up with the Joneses will dissipate, and 'good enough' will do for most people.  I do want Android to be a great platform, but honestly I think having something 'good enough' that allows me to do as I please with the device is much more appealing - I voted with my dollars and purchased an Android phone when I could have gotten the iPhone 4.

Sunday, November 21, 2010

Feeling like a hero should be your sign that something is wrong

I was reading the following post on "Work Around Cultures" and got to thinking about how many places I have worked, and how many friends I have listened to talk about their jobs - all but one of the places I have worked, and a fair number of the places my friends work, fall into the category of "Work Around Cultures".  Now to be fair, the post is about medical workplaces, but I believe it to be just as applicable to the software companies I have worked for in my career.

The article points out that there are several consequences to what the author calls "Patch-It" work arounds.

1) Increasing medical error
Work-arounds lead to interruptions, which in hospitals are associated with errors and accidents. They increase the cumulative workload for nurses; higher workloads are associated with worse patient outcomes.

The same sort of issue can be found in IT departments and operations groups, where work-arounds lead to interruptions and errors because the work-around is not well documented or understood.  Consider the following: a work-around is done by one person; the next person who works on that system has to remember that a work-around was done, or know whom to talk to about the work previously done, in order to work with that system.  This can lead to errors and mistakes.  Granted, the mistakes made are not generally life and death, but they can be costly nonetheless.

2) Wasting resources. 
Individually, work-arounds don’t appear to waste much time, but studies have shown that required hunting and fetching eat up as much as 36 to 60 minutes per 7.5-hour shift.

As you can well imagine from #1, when you need to make a change to a system, or need to make an operational change, having to find the individual who made a work-around can be costly in terms of the time required to track down the needed information.  I would speculate that the amount of time wasted is very similar to the numbers indicated in the article, if not higher.  Dealing with work-arounds causes people a great deal of hassle trying to figure out what the work-around was meant to do and who put it in place in the first place.

3) Promoting employee burnout. 
Persistently lacking resources required to do one’s job takes physical and psychological tolls that lead to nurses’ burnout.

Again, following from #1 and #2, you can well imagine that people would eventually get very tired of having to track down work-arounds and the people who performed them.  I can hear everyone I have worked with in the past saying something like "... I was not hired to do this, I was hired to do X..."

4) Creates a work-around culture. 
When work-arounds are common, people are less likely to seek system improvements.  There’s an insidious aspect to the culture that causes people to dismiss notions of improving things and learn to live with imperfection.

Work-arounds seem to beget work-arounds, similar to telling a little white lie that you then need to cover up with another little white lie, or even bigger lies.  I believe that having to work around things in an IT culture is equivalent to the idea of technical debt.  There may be a good reason to incur the debt and perform a work-around, as long as you make the space and time to go back and 'repair' it.  Left in place, however, work-arounds can be a drag on the culture and lead people to believe that things will never get fixed.  Worse, people may start to believe that the work-arounds are EASIER than actually fixing the underlying problems.

The article points out that a work-around culture can be compelling to people because it provides a way to get the job done.  Paraphrasing and translating from the article: work-arounds provide a way for the worker to 'get the job done', not involve the manager (keeping managers and workers happier), and foster a hero-like feeling.  The work-around culture, however, is terribly detrimental to the organization, because it can translate into a growing list of work-arounds that might never be solved, possibly leading to the need for "The Big Re-Write" of the developed software.

“There’s an insidious culture of ‘That’s just the way it is around here.’”
     — Anita Tucker

Managers can help alleviate this by setting a tone that encourages the workers who report to them to report the issues they find.  Doing so has to go hand in hand with actually doing something about the reported items, but in most circles doing something about reported issues is the easy part.  I personally always work to make sure I am addressing the root causes of problems, not just tossing a band-aid on them.  I don't like repeating work I could have saved myself by doing it better, or right, the first time.  What does your work culture support?  If where you work would rather you patch it and work around it, and ignores the root causes... it might be time to look for a new job.

Friday, November 12, 2010

Professionalism (Software Engineering)

Software Engineering is a discipline much maligned by a fair number of people.  Actually, what bothers most people is the idea that developing software is an 'Engineering' practice, because it places software developers on a level playing field with civil, chemical, or materials engineers (or any other engineering discipline).  Engineering disciplines require certifications, testing, and a great deal of knowledge, all of which gets boiled down to 'permits' and other legal documents that determine an individual's qualifications to be labeled an 'Engineer' - none of which software ENGINEERING currently has.  All you have to do to be labeled a software engineer is 'develop software'; that is it - no fancy diplomas, no permit tests, no other qualifications required.

There are some indications, however, that this is changing from the inside out.  Software engineers would like to be considered highly professional, and are talking about what it takes to have professionalism in software engineering and to have it represented as a top-notch profession.

1) Your code must be clean
Take this point as you will - your code must be clean, easy to read, and easily maintained.  You shouldn't have overly complicated methods, unrecognizable variable names, or just plain nasty nested if/else statements.  Think twice about class files that are over 1000 lines long and methods that are over 5-10 lines long... take the time to take pride in your work (like me editing this ... again).

2) Your code should always include unit tests
Unit tests do two things: one, they help you validate that what you have written is what you intended; and two, they provide a way for you to verify, after you re-factor to reach the first goal of having clean code, that you have not broken anything.  Unit testing provides a very critical safety net - you are lost without it.
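As a trivial illustration of that safety net, consider pinning down a small function's behavior with a handful of checks before refactoring it (the names here are illustrative; in practice you would use a framework such as JUnit):

```java
// A trivial function whose behavior we pin down with checks before
// refactoring, then re-run after.  Illustrative example only.
public class PriceCalculator {
    // Applies a percentage discount, never letting the price go negative.
    public static double discounted(double price, double percentOff) {
        double result = price * (1.0 - percentOff / 100.0);
        return Math.max(0.0, result);
    }
}
```

Re-running the same checks after a refactor tells you immediately whether the behavior changed.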

3) You should practice
Programming, like anything else, requires practice to make you better and to keep your skills sharp.  Doing programming katas in your spare time, to exercise the skills needed to keep code clean and always have unit tests, is of vital importance to being professional.  If you were in the NFL, or baseball, or any other sport or musical profession, you would need to practice to keep your skills up - programming is no different.

4) It pays to be multi-lingual
The basics of programming are essentially the same no matter which language you typically program in.  Because of that, many programming languages look very similar but have subtle differences in their syntax, or in their effect when executed.  It can be beneficial to know several ways to skin a given cat, as well as to understand which tools are best suited to a given job.

5) Pair Program where possible
Learning can be a two-way street (and often is), so it pays to pair with someone when you are doing work - you can have one person writing failing tests that the other person makes pass.  It can be eye-opening and enlightening to have someone to talk with and discuss ideas with while you are coding the solution to a problem.  You may find yourself writing far less code than you would have otherwise if you include a partner.

Is it possible to be professional without the above items?  Yes, I imagine it is; but I believe that if you want to call yourself a professional software engineer, you should be doing the items above at a minimum, and treating everything else as simply commonplace.

Businesses and Money

A business that makes nothing but money is a poor business.
--Henry Ford