Don’t be fooled by the cloud tech-talk of instances, databases, code, and APIs into thinking that cloud is all about technology. Unfortunately, this kind of tech-talk can convince IT service management (ITSM) professionals that cloud is just another technology evolution with zero impact on ITSM. However, those who realize that cloud is all about business, applications, service, and operations will understand its impact across the ITIL ITSM best practice framework and, in particular, its impact on capacity management.
Cloud requires a different capacity management process to that used for traditional, on-premises IT services. The key changes are in the following five approaches, all of which make capacity management more granular, and move from the long-range “vague” to the short-range “specific”:
- Forecast horizon is shorter
- Speed of change is faster
- Blend of resources is finer
- Automated changes in response
- Balance of CapEx and OpEx
An ITSM professional who understands these five cloud capacity management approaches will be a huge asset to any organization, measured in terms of the business bottom line as well as service quality.
Forecast Horizon is Shorter
Buying the full IT stack for on-premises IT service delivery is a long, difficult, complex, and expensive process. Want to know how long?
It takes months, with nine to twelve being standard, to design, procure, and deploy any reasonably complex system on-premises. Once procured, it has a lifetime of three, five, or seven years, maybe longer. This is the very long horizon of on-premises capacity management.
Over that time, capacity is over-provisioned for peak workloads and this over-provisioning burns money. One might as well be throwing dollar bills out of the window. But in the traditional IT operations spirit of “I only get fired for outages,” capacity management thinking prefers to avoid under-provisioning that can hurt customers and therefore the business.
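To put a rough number on that burned money, here is a minimal sketch that compares capacity sized for the peak with the capacity actually used. The demand figures and the unit cost are made-up illustrations, not data from any real system.

```python
def idle_capacity_cost(hourly_demand, unit_cost_per_hour):
    """Cost of capacity that is paid for but unused when sizing for the peak.

    hourly_demand: units of capacity needed in each hour (hypothetical data).
    unit_cost_per_hour: assumed cost of one unit of capacity for one hour.
    """
    peak = max(hourly_demand)
    provisioned_unit_hours = peak * len(hourly_demand)
    used_unit_hours = sum(hourly_demand)
    return (provisioned_unit_hours - used_unit_hours) * unit_cost_per_hour


def utilization(hourly_demand):
    """Fraction of peak-sized capacity that is actually used."""
    return sum(hourly_demand) / (max(hourly_demand) * len(hourly_demand))


if __name__ == "__main__":
    demand = [2, 2, 2, 10]  # three quiet hours and one peak hour
    print(idle_capacity_cost(demand, unit_cost_per_hour=1.0))
    print(utilization(demand))
```

The lower the utilization, the more of the fixed on-premises spend is paying for capacity that sits idle between peaks.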
A capacity manager doesn’t have this multi-year horizon with cloud services. The capacity manager now only needs to forecast as far ahead as the time it takes to add more capacity to the cloud services. That is, on average, around 15 minutes from decision to deployment, including the time to make a coffee and get comfy in front of the console.
Cloud capacity managers also make longer-range forecasts to save money by purchasing reserved cloud capacity, sometimes saving over 60% in costs. So capacity management still has a role to play in longer-range planning, but it’s now about financial efficiency, not the avoidance of disaster.
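The reserved-capacity decision comes down to a break-even calculation, sketched below. It assumes a simplified pricing model in which a reservation is billed for every hour of the term whether it is used or not; the rates are hypothetical and real provider pricing has more options.

```python
def breakeven_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of the term an instance must run before reserving is cheaper."""
    return reserved_hourly / on_demand_hourly


def cheaper_option(hours_used, hours_in_term, on_demand_hourly, reserved_hourly):
    """Compare paying on demand for the hours used with reserving the whole term."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = hours_in_term * reserved_hourly  # paid even when idle
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


if __name__ == "__main__":
    # Hypothetical rates: the reservation is 60% cheaper per hour.
    print(breakeven_utilization(on_demand_hourly=1.0, reserved_hourly=0.4))
    print(cheaper_option(hours_used=6000, hours_in_term=8760,
                         on_demand_hourly=1.0, reserved_hourly=0.4))
```

Under these assumptions, a 60% discount only pays off when the forecast says the capacity will be in use for more than 40% of the term. That is why the longer-range forecast still matters.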
Speed of Change is Faster
As if predicting capacity changes wasn’t hard enough, responding to them is difficult in non-cloud systems.
Capacity managers cannot quickly respond to unplanned changes in demand if it takes months to procure and deploy capacity on-premises. The brand is then damaged and customers leave if the IT service is down or the business can’t adequately process transactions during highly-visible seasonal fluctuations such as summertime or Christmas (when, unfortunately, many staff are off work).
Cloud components can be scaled quickly, and even large amounts of capacity (10,000 VMs anyone?) can be added in a few hours with some extra communication with the cloud service provider. Plus, the business can scale down quickly too and turn off all of that excess capacity when the seasonal fluctuation subsides. This can’t be done on-premises; it can only be done in the cloud.
Blend of Resources is Finer
On-premises systems might be measured by the number and size of datacenters, comms rooms, and racks. Adding a server might mean adding another rack. That might mean adding another switch, and another rack. Which then might mean extending the closet or room, or even the datacenter.
To avoid hitting these capacity potholes, long-range capacity management forecasting is done to provide more capacity well ahead of the predicted demand. This is a standard “best practice” enterprise approach that’s wasteful and expensive.
In the cloud, it’s possible to keep on adding VMs without worrying about any physical infrastructure or other capacity limits – and so now the granularity of capacity is one virtual machine.
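The difference in granularity can be illustrated with a small sketch that rounds demand up to the nearest purchasable step. The rack size of 40 servers is an assumption chosen purely for illustration.

```python
import math


def provisioned_units(demand, step):
    """Capacity that must be provisioned when it can only be bought in steps."""
    return math.ceil(demand / step) * step


if __name__ == "__main__":
    demand = 41  # servers' worth of workload (hypothetical)
    print(provisioned_units(demand, step=40))  # on-premises: whole racks
    print(provisioned_units(demand, step=1))   # cloud: single VMs
```

With coarse steps, one extra unit of demand can force the purchase of a whole new rack. With a step of one VM, provisioned capacity can track demand almost exactly.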
If it’s possible to use higher-order cloud services such as AWS S3 storage, then operations are further removed from storage capacity considerations. These services are so scalable that a normal enterprise will never hit the limits, so no capacity management is required in the traditional sense. Capacity management now moves to the question “How efficiently are we using our capacity, and can we save money?”
Automated Changes in Response
Responding to expected and unexpected demand causes much stress for a capacity manager. For instance, in a typical fixed-size, on-premises IT system there are physical limits to the processing capacity.
The normal behavior when capacity demand exceeds current supply is to push out or de-prioritize non-production workloads – something has to give. But what if getting the new product live is also business critical, and that’s what the non-production workloads are doing? Is the unplanned production capacity demand now delaying an important product release, promised to customers already through advertising and other communications?
In the cloud, this is handled differently. Capacity managers can use automated systems such as AWS EC2 Auto Scaling to manually, by schedule, or dynamically add capacity, such as more compute or more load balancers. The only upper limit to capacity supply is how much the business can afford to spend.
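The dynamic case can be sketched as a simple proportional rule: scale the instance count so that average utilization moves back toward a target, within a floor and a budget-driven ceiling. This is an illustrative simplification and not the actual algorithm used by AWS EC2 Auto Scaling; the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current_instances, observed_cpu_pct, target_cpu_pct,
                     min_instances, max_instances):
    """Proportional scaling rule (a sketch, not a provider's real algorithm).

    max_instances stands in for the budget ceiling: the most capacity the
    business is willing to pay for.
    """
    raw = math.ceil(current_instances * observed_cpu_pct / target_cpu_pct)
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # Four instances running hot at 90% CPU against a 60% target.
    print(desired_capacity(4, 90, 60, min_instances=2, max_instances=10))
    # The same fleet nearly idle at 15% CPU.
    print(desired_capacity(4, 15, 60, min_instances=2, max_instances=10))
```

In this sketch the only hard upper bound is `max_instances`, which mirrors the point that the limit on supply is financial rather than physical.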
Balance of CapEx and OpEx
Pay-as-you-go (PAYG) builds on measured service, one of the five essential cloud characteristics in the NIST definition of cloud computing. This consumption-focused purchasing method means that you can align operational expenditure to business need by consuming only the cloud services you need. The alternative approach with on-premises is purchasing hardware and software, and owning (and managing) these assets for a three, five, or seven-year period.
Some organizations have budget arrangements to annually plan spend against capital expenditure. This can also be done with the cloud with mix-and-match reserved capacity (annual) and PAYG (on demand). This allows capacity managers to cater for mostly-steady but occasionally-“bursty” workloads.
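A minimal sketch of that mix-and-match decision follows: for a given demand profile, it searches for the reserved baseline that minimizes total cost, with any demand above the baseline paid for on demand. The demand profile and both hourly rates are hypothetical.

```python
def blended_cost(hourly_demand, reserved_units, reserved_hourly, on_demand_hourly):
    """Total cost of a reserved baseline plus PAYG capacity for the bursts."""
    reserved_cost = reserved_units * reserved_hourly * len(hourly_demand)
    burst_unit_hours = sum(max(0, d - reserved_units) for d in hourly_demand)
    return reserved_cost + burst_unit_hours * on_demand_hourly


def best_reserved_level(hourly_demand, reserved_hourly, on_demand_hourly):
    """Brute-force search for the cheapest reserved baseline."""
    candidates = range(0, max(hourly_demand) + 1)
    return min(
        candidates,
        key=lambda r: blended_cost(hourly_demand, r, reserved_hourly, on_demand_hourly),
    )


if __name__ == "__main__":
    # Mostly steady at 4 units, with one burst to 12 units.
    demand = [4, 4, 4, 4, 4, 4, 4, 12]
    level = best_reserved_level(demand, reserved_hourly=0.4, on_demand_hourly=1.0)
    print(level, blended_cost(demand, level, 0.4, 1.0))
```

For this profile, reserving the steady baseline and bursting on demand is cheaper than either reserving for the peak or paying on demand for everything.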
The other demonstration of mixing OpEx with CapEx is the so-called Hybrid Cloud model: mix the CapEx-laden on-premises systems with OpEx-savvy public cloud, handling the steady-state workloads on-premises and the fluctuations in the public cloud. That is, if you can achieve this technically, architecturally, and operationally.
Capacity management is still important, but different, when it comes to cloud. The old constraints no longer apply: a modern capacity manager is now constrained only by budget (and its efficient use) and by a workload’s ability to exploit cloud architecture for auto scaling.
This blog was originally published in May 2017 on the SysAid blog.
Rafi Rainshtein is the VP R&D and DevOps at SysAid. He leads the troops who are responsible for delivering SysAid's products and continuously improving their quality and performance. He’s been programming since the age of six (really!) and can already see his geeky legacy continuing with his three young sons. He used to write comic strips for fun and publish them on his own website before he progressed into the animation field where he got to work with the likes of Elmo, Beavis and Butt-Head, and Lara Croft. Now, with over 20 years’ experience in software development management, Rafi maintains a healthy balance by adding basketball and running into his mix of hobbies.