Wednesday, July 1, 2009

How Do We Measure High Availability?


I would like to react to a post on the Forrester blog that asks an important question: How Do We Measure High Availability?

These are the main issues we faced in my company:
  1. Explain and show the business how to define a service level. The business always wants 99.9% availability and 24/7 support. But when they discover the price of building the infrastructure to support that SLA, and of measuring it, they reduce the SLA to the minimum. Yet if the application goes down, they will scream and send emails to every VP, forgetting their own decision not to invest in the infrastructure. I'm still struggling with this one.
  2. Make the business SLA enforceable. Legal contracts with infrastructure suppliers are of particular importance. Penalties are really difficult to apply, and getting any money back is harder still. So do not go to a big supplier if you are not big yourself: penalties will not frighten any big company in that area.
  3. Do not forget the network. There are cheap Content Delivery Networks today that let you start small and optimize availability at the "Internet cloud" level.
  4. Choose between synthetic monitoring (replaying the same scenario every x minutes to simulate user actions on the application) and real monitoring (capturing real data and building dashboards with a passive appliance on the network).
  5. Availability should also be related to business value chains: the number of clients lost or unable to access the application is highly important. If global availability is low because of an FTP server used once a month, who cares? Availability should be mapped to hard dollars (or euros).
  6. HA is difficult to set up since it involves both development and operations teams. Operations will look at server and network availability and deduce the application's availability from it. But all servers can be up while the application is stalled: application availability is different from server availability. So who is responsible for what? This creates issues for defining tools, processes and reporting ...
  7. Measuring internally is different from measuring externally ...
  8. Do you prefer cloud or on-premises solutions? If you do not have a CAPEX budget, move to rental solutions and OPEX.
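
To make points 1 and 5 concrete, here is a minimal Python sketch of the arithmetic behind them. It first converts an availability target into a yearly downtime budget, then compares a plain average of availabilities with one weighted by business value. The service names, availability figures and weights are made up for illustration, and the weighting scheme is only one possible way to map availability to money.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(sla_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_hours * (1 - sla_percent / 100)


@dataclass
class Service:
    name: str
    availability: float     # measured availability, between 0.0 and 1.0
    business_weight: float  # hypothetical share of revenue that depends on it


def plain_average(services):
    """Every service counts the same, whatever its business value."""
    return sum(s.availability for s in services) / len(services)


def weighted_availability(services):
    """Each service counts in proportion to its business weight."""
    total_weight = sum(s.business_weight for s in services)
    weighted_sum = sum(s.availability * s.business_weight for s in services)
    return weighted_sum / total_weight


# Illustrative figures only: a web shop that carries most of the revenue
# and an FTP server that is used once a month.
services = [
    Service("web shop", availability=0.999, business_weight=0.95),
    Service("ftp server", availability=0.900, business_weight=0.05),
]

print(f"99.9% allows {allowed_downtime_hours(99.9):.2f} hours of downtime per year")
print(f"plain average:    {plain_average(services):.4f}")
print(f"weighted average: {weighted_availability(services):.4f}")
```

With these made-up numbers, a 99.9% target leaves about 8.76 hours of downtime per year, and the rarely used FTP server drags the plain average down far more than the weighted one. That is the "who cares?" of point 5 expressed in numbers.
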
Hope this will help.

