How to Avoid the Bankrupt Private Cloud Part 5/6
Lessons learned from successful Virtual Data Center pioneers
This is the third in a three-part blog series that releases a new white paper with the above title.
Today’s final installment concludes with Lessons 5 and 6.
For more information, join today’s webinar led by the author, Graham Gillen, who will look at this in detail and answer questions. To register, click here.
Lesson 5: Become strategic by winning over Tier 1 Application Owners.
So you’re confident in your cloud architecture. You’ve got a solution that gives you cross-platform insight, including the complex relationship between performance and capacity. How do you get Tier 1 application owners on board?
Tier 1 applications are the lifeblood of business. They are also usually the most complex and expensive applications to operate and maintain, with the greatest potential benefits to gain from cloud-based infrastructures. But championing this new infrastructure poses new challenges that are psychological as well as technological.
By nature, Tier 1 application owners are risk averse because the business is impacted if performance is degraded. As a result they prefer over-provisioning to ensure performance, whether you are talking about physical or virtual resources. Tier 1 app owners also typically want dedicated resources with minimal consolidation ratios (which goes against the grain of shared resource models). And while cost savings are important, don’t expect them to be impressed by your server consolidation success: a few dollars saved on hardware is less important to them than NOT hearing from end-users about performance problems. So instead of just touting server consolidation, the best cloud computing champions will persuade application owners to migrate to the cloud infrastructure by stating,
“I can ensure the same or better performance for your application while saving you money on hardware and operational expenses at the same time.” When asked “how will you do this?” they will reply:
- I can map your application workload to my virtual / cloud environment.
- I can “right size” provisioning to meet your choice of “Silver,” “Gold” or “Platinum” SLAs, based on your budget.
- When your application has peak loads – planned or unplanned – I can dynamically reallocate or provision additional resources – and you only pay for the usage “as you go.”
- I can analyze and forecast performance and easily add – or reduce – computing capacity.
- I can provide you with disaster recovery at a fraction of current cost, and I can get you back up just as quickly.
The lesson: as you design your cloud architecture, make sure you have the technology to address the questions above. But keep in mind the psychology of your customer (Tier 1 application owners). To them it’s almost always about performance, security and compliance first, and cost second.
Lesson 6: Dynamic resource allocation needs Behavior Learning Technology.
To be more specific, you need predictive analytics powered by “behavior learning” technology. This is a new concept so we’ll elaborate. The key question at this point is: have we covered all our bases to ensure performance for our Tier 1 application owners?
One of the key capabilities in the architecture of private cloud infrastructures is the ability to dynamically “right-size” resources to proactively meet periods of peak demand for applications. This goes back to the roots of “autonomic computing,” marketed for years by the likes of IBM. Virtualization technologies and private or hybrid cloud computing models have brought it closer to reality because changes can now be made quickly and dynamically.
A new class of software called “Service Directors” promises to orchestrate the dynamic resource allocation life cycle for cloud infrastructures. But how will these Service Directors know when and what type of allocation changes to make? Can you rely on manual methods similar to how VMware DRS (Distributed Resource Scheduler) currently works?
“With VMware DRS, users define the rules for allocation of physical resources among virtual machines. The utility can be configured for manual or automatic control.”2
Bad resource allocation policies can put the performance of Tier 1 applications at risk in a dynamic environment with so many moving parts. And this type of manual approach is reactive rather than proactive. By the time your allocation rule kicks in, it may already be too late. As one industry analyst puts it, the problem is that “Many IT services are reaching a level of complexity where sophisticated mathematical algorithms and object models of the services are more precise and efficient than even your most talented engineers.”3 So what is the solution? As Gartner sees it, the answer may be in “behavior learning” technology. In an overview of the technology, Gartner states:
“Consider behavior learning technologies to gain a consolidated, holistic view of IT infrastructure and to gain an early warning of potential outages based on trends and patterns.”4
Behavior learning technology uses mathematics to provide the predictive analytics necessary to proactively manage performance in the cloud. The diagram below explains how all the ideas we’ve discussed come together in a solution.
First, existing monitoring solutions collect KPIs important in measuring performance and capacity. Then technology like Netuitive’s Behavior Learning Engine™ aggregates and analyzes this data to understand and correlate behavior across IT silos. If performance problems are forecasted, the software makes a pre-emptive suggestion to a Service Director application which closes the “control loop” by triggering resource allocation changes after ensuring workflow and business policy compliance.
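To make the idea of “learning behavior” a little more concrete, here is a minimal sketch in Python. It is not Netuitive’s Behavior Learning Engine, whose internals are not described in this post; it simply assumes that “normal” can be approximated by a per-hour mean and standard deviation of one KPI. The function names, the `tolerance` parameter and the sample CPU figures are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def learn_baseline(samples):
    """Learn a per-hour baseline (mean, standard deviation) from history.

    `samples` is an iterable of (hour_of_day, value) pairs for one KPI.
    """
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    # stdev needs at least two points, so hours with less history are skipped.
    return {h: (mean(v), stdev(v)) for h, v in by_hour.items() if len(v) >= 2}


def is_abnormal(baseline, hour, value, tolerance=3.0):
    """Return True when `value` falls outside the learned band for `hour`."""
    if hour not in baseline:
        return False  # no learned behavior yet, so nothing to compare against
    avg, spread = baseline[hour]
    # Guard against a zero spread when every historical value was identical.
    return abs(value - avg) > tolerance * max(spread, 1e-9)


# Hypothetical CPU utilisation history (%): quiet at 03:00, busy at 14:00.
history = [(3, 10), (3, 12), (3, 11), (14, 70), (14, 75), (14, 72)]
baseline = learn_baseline(history)

print(is_abnormal(baseline, 3, 60))   # 60% at 03:00 is far from the learned norm
print(is_abnormal(baseline, 14, 73))  # 73% at 14:00 is within the learned band
```

The point of the sketch is the contrast with a fixed rule: a static “alert above 80%” threshold would ignore 60% CPU at 03:00, while a learned baseline treats it as unusual for that time of day.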
The diagram below is an example of closed-loop architecture for managing dynamic resource allocation in private cloud infrastructures. Behavior learning technology like Netuitive is uniquely suitable to play a central analytics role in the architecture.
- Performance and capacity KPIs and system attributes are collected into a centralized database (for example, Netuitive’s Performance Management Database, or PMDB).
- Integrations to service catalogs automatically group underlying components by service or application.
- Key business KPIs, policies and Service Level Agreement information are integrated into the analytics engine.
- Predictive analytics (with behavior learning technology) forecasts impending performance issues and sends Trusted Alarms to IT Operations consoles; if the problem is resource related, a Trusted Trigger is also sent to the Service Director application.
- Management policies may require human operator confirmation of resource allocation decisions, which may be subject to various business constraints managed by the Service Director.
- After ensuring that resource allocation requests conform to business policy and workflow requirements, the Service Director initiates action in resource controllers (such as VMware DRS or Microsoft SCVMM).
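The last three steps of the loop above can be sketched roughly as follows. This is an illustration only, under stated assumptions: the class and function names, the 0.85 forecast limit and the “units” of capacity are hypothetical, and the final step merely records an allocation in a dictionary rather than calling a real resource controller such as VMware DRS or Microsoft SCVMM.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    service: str
    resource: str          # e.g. "cpu" or "memory"
    forecast_util: float   # forecast utilisation, 0.0 to 1.0


@dataclass
class Policy:
    max_extra_units: int           # business cap on added capacity
    needs_operator_approval: bool  # is human confirmation required?


def analytics_step(service, resource, forecast_util, limit=0.85):
    """Emit a trigger only when the forecast breaches the (assumed) limit."""
    if forecast_util > limit:
        return Trigger(service, resource, forecast_util)
    return None


def service_director_step(trigger, policy, requested_units, approved=False):
    """Check a trigger against business policy; return units to allocate."""
    if trigger is None:
        return 0
    if policy.needs_operator_approval and not approved:
        return 0  # hold the change until a human operator confirms it
    return min(requested_units, policy.max_extra_units)


def resource_controller_step(allocations, service, units):
    """Stand-in for a resource controller: record the new allocation."""
    if units > 0:
        allocations[service] = allocations.get(service, 0) + units
    return allocations


# One pass around the loop for a hypothetical "billing" service.
allocations = {"billing": 4}
policy = Policy(max_extra_units=2, needs_operator_approval=False)
trigger = analytics_step("billing", "cpu", forecast_util=0.93)
units = service_director_step(trigger, policy, requested_units=3)
resource_controller_step(allocations, "billing", units)
print(allocations)
```

Keeping the policy check in its own step mirrors the separation described above: the analytics layer only recommends, and the Service Director decides whether and how much to act.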
The final lesson: to automate accurate resource allocation in dynamic cloud infrastructures, you need predictive analytics powered by “behavior learning” technology.
Enterprises typically share the same major goals when investigating virtualized cloud computing models, the most common of which are increased efficiencies through resource sharing, and improved agility to respond to fast-moving business needs. Cloud infrastructures have potential for improved change control through easier replication and rollback, as well as faster (and possibly cheaper) disaster recovery. But the ultimate sign of success for those leading cloud computing initiatives is to run their top applications on this new infrastructure.
After overcoming early design challenges like eliminating virtual sprawl and the proliferation of monitoring point tools, you can win over Tier 1 application owners with a compelling and broad value proposition. But at this point, you may painfully discover that you are ill-equipped to deal with the complexity of performance management in the new infrastructure. There are too many moving parts with complex dependencies, and tools that rely on manual, policy-based approaches to performance monitoring and dynamic resource allocation simply won’t work. You risk letting application owners down on promised performance and flexibility.
So in researching the problem, consider that Gartner believes that solutions like Netuitive “move IT Operations to a more proactive state, where issues can be detected and addressed before affecting the business.”5 And for running Tier 1 applications in your virtualized private cloud, this may just be what the doctor ordered.
Netuitive provides predictive analytics software for IT. Using its patented Behavior Learning Engine, Netuitive software replaces manual, rules-based methods for performance monitoring with automated statistical analysis that correlates and self-learns the operational behavior of IT systems and applications. It then forecasts issues before they impact performance and isolates root causes wherever a problem occurs. By integrating with nearly all of the industry's leading monitoring tools including VMware, HP, Microsoft, BMC, IBM, and CA, Netuitive is able to provide a holistic view across physical, virtual and cloud infrastructures enabling you to proactively manage performance and capacity.
“Netuitive takes all this telemetry out a massively virtualized infrastructure (the Cloud), learns what’s normal, and fine tunes the monitoring accordingly to point out exceptions. This is highly compelling.” - CTO, Top 10 Global Bank
References and Further Reading
1. Kotsovinos, Evangelos; “Virtualization: Blessing or Curse?”; 22 November, 2010; ACM Queue White Paper;
2. TechTarget online definition; http://searchvmware.techtarget.com/sDefinition/0,,sid179_gci1307805,00.html
3. O’Donnell, Glenn; Dines, Rachel; “Virtualization planning: 4 Systems Management Keys to Success”; May 20, 2010;
4. Williams, David; “An Introduction to IT Operations Behavior Learning Tools”; 16 December, 2009; Gartner Research; ID Number G00172843.
5. Williams, David; “Behavior Learning Software Enables Proactive Management at one of World's Largest Telecom Companies”; 15 April 2009; Gartner Research; ID Number G00167307.
6. “Netuitive Technical Overview: Behavior Learning Technology and Predictive Analytics for IT”; December 2010; available at www.netuitive.com.