Traditional SLA Flavors
Originally (circa 1995-2000), "web hosting" providers simply offered space on a shared web server where individuals and businesses could upload a web site, which at the time consisted of a simple set of files, for public access. These companies mostly grew out of the older public Internet Service Providers (ISPs). The SLA for this type of "shared web hosting" service was typically a detailed replica of the purchase invoice (e.g., the amount of disk space the user was authorized to use, the bandwidth cap, etc.). During the early web era, businesses were happy simply to get a glorified version of a marketing brochure published online; business did not come to a screeching halt if the web server was down for a while, as it would if the office telephones went dead.
As web sites grew in size and complexity and evolved into full-scale business applications (circa 1998-2005, with the evolution of intranets and extranets), businesses and IT organizations increasingly required full control over the Internet-connected computer systems the sites ran on. As an alternative to shared hosting, service providers offered leased space in the racks of their growing data centers for equipment fully owned and controlled by business customers. Essentially, this type of "co-location" service package consisted of delivering reliable electricity, air-conditioning, Internet bandwidth, and physical space for computer equipment owned or leased by the business (dedicated, not shared with other customers). Fortunately, co-location services mainly provide resources that are easy to measure and quantify (e.g., electricity and bandwidth use, as opposed to hosted software use). This allowed more sophisticated and competitive billing arrangements to become popular, such as "burstable" Internet bandwidth packages ("burstable" is primarily an accounting term describing dynamic billing for Internet use based on usage measured over a time window, such as a monthly billing period). But these co-located systems were different beasts from the simple marketing brochure web site: these servers now hosted web sites and applications accessed regularly by a growing number of businesses' coveted customers, performed more and more tasks critical to business operations, and generated real revenue through e-commerce. So these advances were inevitably followed by negotiated warranties, or at least resolution commitments, for those unfortunate times when customers were denied the online service(s). The SLA paperwork grew dramatically in size.
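Burstable billing is easier to see with numbers. The sketch below assumes the widely used 95th-percentile method (the text above does not say which formula any given provider used): periodic bandwidth samples are sorted, the top 5% are discarded as bursts, and the highest remaining sample is billed. The function names, committed rate, and fees are hypothetical.

```python
import math


def burstable_billable_mbps(samples_mbps: list[float], percentile: float = 95.0) -> float:
    """Return the bandwidth rate billed under percentile-based burstable billing.

    Usage samples (e.g. one per five minutes) are sorted, the top
    (100 - percentile)% are ignored as short "bursts", and the highest
    remaining sample becomes the billable rate (nearest-rank percentile).
    """
    if not samples_mbps:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mbps)
    # 1-based nearest-rank position of the percentile sample.
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def monthly_charge(samples_mbps: list[float], committed_mbps: float,
                   base_fee: float, overage_per_mbps: float) -> float:
    """Flat fee for the committed rate plus a per-Mbps charge above it."""
    billable = burstable_billable_mbps(samples_mbps)
    overage = max(0.0, billable - committed_mbps)
    return base_fee + overage * overage_per_mbps


if __name__ == "__main__":
    # Hypothetical month of 100 samples: mostly 10 Mbps with a few spikes.
    quiet = [10.0] * 95 + [90.0] * 5   # 5% of samples spike: all ignored
    busy = [10.0] * 94 + [90.0] * 6    # 6% spike: the 95th percentile lands on a spike
    print(burstable_billable_mbps(quiet))           # 10.0
    print(monthly_charge(busy, 10.0, 500.0, 20.0))  # 2100.0
```

The design point is that short spikes cost nothing as long as they stay within the top 5% of samples; sustained heavy use is what raises the bill.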
Limiting Liability Before Guaranteeing Service...
Naturally, SLAs were first written to protect the service provider from any damages claimed by a customer (liability). After all, the Internet is an unreliable, uncontrollable medium on which to deliver business services, and managing a large data center stacked full of computer systems is complicated enough that making any guarantees about service continuity to customers, much less warranties, is risky. Customers are always free to go elsewhere if they are not satisfied. In fact, there was not much difference between an early business SLA and a "Terms of Service" statement made to individual users of mass-market services like AOL or free services like Google's GMail:
Example of Google GMail's "Limitation of Liability" Statement (Free version):
15.1 SUBJECT TO OVERALL PROVISION IN PARAGRAPH 14.1 ABOVE, YOU EXPRESSLY UNDERSTAND AND AGREE THAT GOOGLE, ITS SUBSIDIARIES AND AFFILIATES, AND ITS LICENSORS SHALL NOT BE LIABLE TO YOU FOR:
(A) ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL CONSEQUENTIAL OR EXEMPLARY DAMAGES WHICH MAY BE INCURRED BY YOU, HOWEVER CAUSED AND UNDER ANY THEORY OF LIABILITY.. THIS SHALL INCLUDE, BUT NOT BE LIMITED TO, ANY LOSS OF PROFIT (WHETHER INCURRED DIRECTLY OR INDIRECTLY), ANY LOSS OF GOODWILL OR BUSINESS REPUTATION, ANY LOSS OF DATA SUFFERED, COST OF PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES, OR OTHER INTANGIBLE LOSS;
...
But as the business web hosting market grew, and providers got better at building and managing redundant, highly available infrastructures, competitive "service guarantee" features tailored for business users started to creep into SLAs, such as credits for usage corresponding to unplanned downtime over a specific threshold, guaranteed response times to outages, etc. However, as business use of computer systems hosted by these third-party providers exploded after 2004, the stress of this rapid growth, and the inability to scale up systems fast enough, started to exercise the section of the SLA that providers hoped never to visit: "...in the event of service disruption." These outages often led to a cat-and-mouse game between providers and users over SLA clauses, ranging from outages being categorized as planned or expected, to response time being emphatically decoupled from resolution time, to custom tailoring of the definition of downtime itself. And keep in mind, I am only referring to computer hardware and, at most, the operating system software, not the many additional layers of application software heaped on top of the computer systems delivering online business service(s). In many cases, a hosting provider need only guarantee that some platform is running and "ping-able" from its local network, not from the Internet. And, as many business users know, this is a far cry from an online business service actually being fully functional and available for use.
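The cat-and-mouse game over definitions has real arithmetic behind it. This minimal sketch, using a hypothetical credit schedule and outage log (not any provider's actual terms), computes monthly availability with and without planned maintenance counted as downtime, then looks up the credit owed.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool  # True if the provider classified it as scheduled maintenance


# Hypothetical credit schedule: (minimum availability %, credit as % of the monthly fee).
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 50)]


def availability_pct(outages: list[Outage], period_minutes: float,
                     exclude_planned: bool = True) -> float:
    """Share of the period counted as 'up' under a given definition of downtime."""
    counted = sum(o.minutes for o in outages
                  if not (exclude_planned and o.planned))
    return 100.0 * (period_minutes - counted) / period_minutes


def credit_pct(availability: float) -> int:
    """Look up the service credit owed for a measured availability."""
    for floor, credit in CREDIT_TIERS:
        if availability >= floor:
            return credit
    return CREDIT_TIERS[-1][1]  # only reached if availability is negative


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes
    log = [Outage(240, planned=True), Outage(30, planned=False)]
    for exclude in (True, False):
        a = availability_pct(log, month, exclude_planned=exclude)
        print(f"exclude_planned={exclude}: {a:.3f}% -> credit {credit_pct(a)}%")
```

Run as-is, the same outage log yields 99.931% availability (no credit) when the four-hour maintenance window is excluded, and 99.375% (a 10% credit) when it counts as downtime.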
GoGrid.COM SLA Limiting Liability Example
"Problems related in any way to the Customer server operating system or any other software on the customer server, or to the actions of Customers or third parties, do not constitute Failures and so are not covered by this SLA."
Guaranteeing Web Applications? The Known Before The Unknown...
As commercial online application use flourished, interest grew in securing guarantees covering this increasingly important application functionality. But initially, the new "Application Service Providers (ASPs)" did not want to go anywhere near SLA-style service promises for applications. And understandably so: hardware and operating system software could be somewhat controlled and managed with steady configuration and investment, and those technologies followed a digestible set of standards; applications, on the other hand, were similar to the "Wild West." A typical online business application tied together multiple databases, files, web servers, and specialized software consisting of hundreds of thousands of lines of code, making it cost-prohibitive for even the world's largest QA departments to test every feature against every permutation a user might stumble across. Only in recent years have robust web application load testing systems been built, by those ASPs that can afford them, to help predict how some applications might behave under the stress of simulated mass concurrent web users before exposure to real users. And only mature, well-tested applications, whose functionality is well known and modeled, have journeyed into the realm of application-level SLAs. Business users typically pay a premium for this level of service guarantee as well, but the price can be driven down if the volume is high enough. For example, Google Apps' GMail Premier service offers some warranty in the form of credits (for an arguably extremely mature application: e-mail) for a reasonable annual price:
Google Apps Premier SLA
Keep in mind that this software is hosted on massive "grid computing" systems, which span hundreds of thousands of computer systems in many global data centers. However, extremely successful commercial and free services like Salesforce.com and Google Apps can be credited with attracting and growing business "in the cloud," despite the lack of a traditional SLA. And many businesses, especially small ones, choose to accept the risk of running some operations on free, non-SLA systems. The low cost is hard to resist, and there is some perceived safety in the massive number of people using the system, although service outages are no less disruptive.
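The load testing mentioned earlier boils down to firing many simultaneous requests and summarizing response times. The sketch below is a toy, not a real load testing tool: `fake_request_handler` is a hypothetical stand-in for the application (a real test would send HTTP requests to a staging system), and threads play the role of concurrent users.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request_handler(request_id: int) -> str:
    """Hypothetical stand-in for one request to the application under test."""
    time.sleep(0.01)  # simulate roughly 10 ms of server-side work
    return f"ok-{request_id}"


def timed_call(request_id: int) -> float:
    """Issue one request and return its latency in seconds."""
    start = time.perf_counter()
    fake_request_handler(request_id)
    return time.perf_counter() - start


def load_test(concurrent_users: int, requests_per_user: int) -> dict:
    """Run requests from many simulated users in parallel and summarize latency."""
    total = concurrent_users * requests_per_user
    # One worker thread per simulated user, so up to `concurrent_users`
    # requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = sorted(pool.map(timed_call, range(total)))
    return {
        "requests": total,
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (total - 1))],  # approximate 95th percentile
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    print(load_test(concurrent_users=20, requests_per_user=5))
```

Raising `concurrent_users` against a real system while watching the median and 95th-percentile latencies is one way to see how an application degrades under load before real users do.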
Microsoft Live (includes Office Live and SkyDrive) Service Agreement Warranty Section
"13. WE MAKE NO WARRANTY.
We provide the service 'as-is,' 'with all faults' and 'as available.'"
But what about custom-tailored applications, or even commercial applications highly configured for a specific business's needs, like online CRM and ERP sites? ...Still no SLA love? The risk to the provider, and the cost to mitigate that risk, have historically been too great.
Breaking Applications Apart: Software As A Service (SaaS)
A relatively new approach to creating online applications that are more apt to be resilient, and possibly (eventually) warranted to some extent, is to break them apart into the smaller components that make up the overall web application and offer those components as stand-alone, reliable, measurable services in their own right. In the ideal version of this "Service Oriented Architecture (SOA)," a web application would primarily be an aggregator of separate, distributed services, each a specialized expert in a particular business function with a stake in the successful delivery of that specific service to many users, not just to a single application. SaaS has become realistic now that market demand for these "horizontal" services has grown enough to sustain their businesses, and web service technology standards have made their way into mainstream development tools. Commercial application components such as "Databases As A Service (DaaS)," online storage grids such as Amazon's very successful S3 service, and Microsoft's new Windows Azure Platform are increasingly being used to construct new online web applications. Although some form of SLA might be available for purchase for each of these services separately, a business would not currently find an SLA whose coverage spanned the multiple services used to create an entire web application. Some research is being conducted on multi-party SLAs (SLA at SOI). However, it is apparent that SLA advances will depend on a new generation of applications and services that make their performance fully transparent, publish Quality of Service (QoS) metrics, and are designed to support an SLA's objectives.
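One reason no single SLA spans a whole composed application is that availabilities multiply. Assuming each service fails independently, an application that needs all of its services up is only as available as the product of their individual availabilities. The service mix and the 99.9% figure below are hypothetical.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of an application that is down if ANY required service is down.

    Assumes the services fail independently of one another.
    """
    return prod(availabilities)


def monthly_downtime_minutes(availability: float, days: int = 30) -> float:
    """Expected minutes of downtime per month at a given availability."""
    return (1.0 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical: database, storage and compute services, each sold at 99.9%.
    single = 0.999
    composite = serial_availability([single] * 3)
    print(f"one service:    {single:.4%}, ~{monthly_downtime_minutes(single):.1f} min/month")
    print(f"three services: {composite:.4%}, ~{monthly_downtime_minutes(composite):.1f} min/month")
```

Three services that each meet a 99.9% SLA can together be down roughly three times as long (about 129.5 versus 43.2 minutes in a 30-day month), and none of the three individual SLAs says anything about the combination.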
Changing the Focus of the SLA
When I managed a large data center that hosted a large-scale data acquisition application suite for corporate customers, we suffered a catastrophic power outage caused by an unpredicted scenario that our many redundant systems did not cover. At that point, there was one thing I would have appreciated from our data center provider in addition to operations credits: assistance with managing the aftermath. The majority of my customers were technically knowledgeable, understanding, and entirely reasonable. But they wanted from me the same things I wanted from my provider: answers to questions like "What exactly happened? Please provide us with a clearly written play-by-play of how and when the issue was detected and escalated," "What was done to resolve the issue?," "What is the plan for ensuring this issue never occurs again?," etc. Salesforce.com was one of the first large CRM application providers to recognize this desire among its business users, and it addressed it by creating an online status console of the systems that made up its grid. Some CRM analysts criticized this as a ploy to avoid an actual application SLA. But this type of real-time information came in quite handy for business customers who in turn supported many users of their own. Google Apps also provides an application status dashboard, which most assuredly relieves some of the support pressure from its hundreds of thousands of users during unplanned outages. And Google acknowledges the customer communication need as well:
Google Apps Premier SLA credit and commitment to communication
"3. In cases where your business requires an in-depth dialogue about the outage, we'll support your internal communication process through participation in post-mortem calls with you and your management team."
There has also been some investigation into possible methods of guaranteeing the application user's "experience" by abstracting away the systems and application services used to deliver that experience, and somehow covering an application's "use cases" in an SLA. Few concrete examples have surfaced in this area, and in my opinion any such approach would inevitably lead full circle back to applications being designed from the ground up to support SLAs, QoS metrics, monitoring, and all of the tools the provider needs to ensure service delivery.
Businesses continue to assume the risk of depending on critical Internet-hosted software without SLAs; they have been operating this way for years. But as these applications become even more critical to business operations, the absence of the SLA is becoming one of the biggest hurdles to conducting big business "in the cloud."
