The concept of multi-cloud has been kicking around for a while but became much more of a hot topic through 2020 and into 2021. The idea is that organisations deliberately choose to place or split workloads across more than one cloud vendor. This article discusses multi-cloud strategies, why organisations choose them and the challenges involved.
There are several reasons why organisations choose a multi-cloud approach, the most common of which are:
“We want to ensure we have a flexible approach to technology and equip our business with the best solution for the situation”
“We’ve witnessed the regional and global outages the cloud platforms endure. We don’t want our business to be dependent on one cloud vendor on that basis”
“We have a primary cloud for the majority of workloads and offer our business a secondary cloud service for niche or edge case requirements where our primary cloud doesn’t have the right solution”
“We want to avoid vendor lock-in”
Distributing a single workload across clouds is talked about from time to time but I’ve only witnessed it once and it wasn’t a deliberate strategy.
The idea is that you deliberately spread your workload across two or more cloud vendors. This could be to ensure extreme levels of high availability should a cloud service go down entirely (which, to be fair, does happen), or it could be to enable quick disengagement should an organisation no longer want to work with a particular cloud vendor, or want to sell part of the business.
Having a primary cloud is a common model as it allows organisations to concentrate their spending, effort and attention on a single cloud. Sometimes this model is deliberately chosen by the Cloud Centre of Excellence (CCoE) within an organisation – the team responsible for evangelising cloud and setting standards within the business.
It can also be by accident. An organisation has chosen a cloud vendor and established a number of workloads. However, a business requirement for a workload needs a service that perhaps your chosen cloud doesn’t have but another does. No problem, we can start up a new tenant very quickly and get going. The organisation then ends up with an 80/20 split.
A third and common route to multi-cloud, also by accident, is through mergers and acquisitions.
The reason for a multi-cloud strategy I’m hearing more and more about is the fear of vendor lock-in.
What is Vendor Lock-in and Why is it Perceived to be Bad?
I guarantee that at some point in their life everyone reading this has had a terrible experience with a company they’ve purchased from. We all experience it in our home lives with purchasing cars, renting houses, broadband subscriptions, energy suppliers, grocery deliveries, etc. Vendor lock-in is where you enter a long-term agreement that makes it difficult or detrimental to break the contract or, at contract end, to move on to another supplier. This is often compounded by clauses in the contract that cause you to spend more money than originally intended.
Historically in IT managed services some vendor lock-in contracts have been horrific, and these are the ones that tar everyone with the same brush. I’ve worked with companies I won’t name who won’t lift a finger if the request isn’t listed as a service covered in the contract, or who charge an eye-watering fee to do so. These contracts are often explicitly crafted to itemise in fine detail everything that is covered, so that there will certainly be many exclusions and therefore a regular additional income because the client is left with no choice but to pay.
I am overjoyed to say that these types of companies and contracts have been dying out for some time. Public cloud has made good headway here with pay-as-you-go services: a pay-for-what-you-use attitude with no-quibble cancellations. However, the memory lingers, particularly in the public sector.
Vendor Lock-in and Servers
I typically hear about vendor lock-in from those who are highly competent in server design and management.
The great thing about servers is that the apps held within them are highly portable (whoa there, hear me out a minute). Whether the server is physical, or a VM on VMware, Azure, AWS or Google, a Red Hat server is still a Red Hat server and a Windows server is still a Windows server. There is zero change regardless of where the server is located. So the applications deployed and structured on those servers also remain the same regardless of location, making it easy to port or distribute the application between or across clouds. In my opinion that’s where the multi-cloud strategy to avoid vendor lock-in starts and finishes. From a server perspective it’s easy to stipulate a no lock-in multi-cloud requirement because I can easily move my servers anywhere (relatively speaking).
Let’s just pull on that thread a little…
Servers don’t operate on their own. They need storage, networking, security services, patching services, backup services, etc. All those services need their own supporting infra, capability and management. Most of these services have capabilities specific to a cloud vendor.
ExpressRoute for Azure is a good example here. ExpressRoute is a feature-rich, highly resilient, high-speed point-to-point Azure communication service for connecting offices, data centres and Azure regions together. For most enterprises it’s a de facto conduit and core part of any distributed Azure deployment, and an integral part if you’re following the Cloud Adoption Framework at Enterprise Scale. It’s also not something you can pick up and put down on a month-by-month basis like you can with a VM.
Another great example specific to VMs is the use of Reserved Instances (RIs). You can purchase RIs on a long-term commitment basis in order to gain significant reductions in the consumption cost of your VMs. For example, if a VM costs around $100 a month on pay-as-you-go, the potential saving is in the region of $70-$80 a month. Good, right? However, those commitments come in terms of 1 or 3 years, so you’re committing to the cloud platform for that length of time.
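To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. The function name and the 70% discount are my own illustrative assumptions taken from the example figures above, not published pricing, and it assumes the reservation is paid for in full whether or not the VM is still in use (real offers may allow exchanges or cancellation on their own terms).

```python
def reserved_instance_commitment(monthly_payg: float, discount: float, term_years: int) -> dict:
    """Compare a reserved commitment with pay-as-you-go (PAYG) for one VM.

    Hypothetical helper for illustration only:
    monthly_payg - PAYG cost per month for the VM
    discount     - assumed RI discount as a fraction (0.7 means 70% off)
    term_years   - length of the commitment, 1 or 3 years
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction between 0 and 1")
    if term_years not in (1, 3):
        raise ValueError("term_years must be 1 or 3")

    months = term_years * 12
    monthly_reserved = monthly_payg * (1 - discount)
    committed_total = monthly_reserved * months   # owed for the whole term
    payg_total = monthly_payg * months            # same period without a reservation

    # Number of PAYG months that would cost as much as the whole commitment.
    # Leaving the platform before this point means PAYG would have been cheaper.
    break_even_months = committed_total / monthly_payg

    return {
        "committed_total": round(committed_total, 2),
        "payg_total": round(payg_total, 2),
        "saving_if_fully_used": round(payg_total - committed_total, 2),
        "break_even_months": round(break_even_months, 2),
    }


if __name__ == "__main__":
    print(reserved_instance_commitment(100.0, 0.7, 3))
```

With a $100 a month VM, an assumed 70% discount and a 3 year term, the commitment totals $1,080 against $3,600 on pay-as-you-go. The saving is real, but only if the VM stays put for longer than about 11 months, which is exactly the kind of soft lock-in I’m describing.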
What I’m getting at is that once you start deploying to a cloud platform, regardless of where the workload is housed you are subjected to a certain level of vendor lock-in. It is unavoidable but it should not be scary.
Note – For the benefit of app portability the trade-off is that servers are expensive, difficult to automate and come with an enormous maintenance overhead. I’m not an advocate of using servers to build new apps.
Management and Governance – Choosing Your Tools
One of the big decisions you have to address frequently for management and governance is whether to choose the capability native to a cloud vendor or a 3rd party solution that is cloud agnostic. Both have their pros and cons.
Using tools and services provided by the cloud vendor comes with several advantages:
- Sometimes free to use
- Paid services wrapped up in the same billing structure and management
- Vendor-provided tools often have greater compatibility and can be more feature rich
- If you’re an MSP seeking vendor specific competencies such as Azure Expert MSP then using native management and governance tools is often a demonstrable requirement
- Requires less administrative maintenance (alliances, billing, technical skills, etc)
Using 3rd party tools also offers several advantages:
- Same interface and functionality regardless of cloud platform – good for organisations with larger teams
- Provides easy roll-up governance and reporting – single pane of glass
- Training for one product across several clouds rather than one service per cloud
Distributing a workload across multiple clouds is phenomenally complex in terms of seeing it as a whole from end to end. This is needed for monitoring, observability, security, billing and engineering. You’re almost definitely into a lot of bespoke engineering to tie it all together or a number of 3rd party products. This automatically pumps up the cost of headcount, skills and 3rd party purchases.
This is not to say it isn’t the right strategy, but it’s a strategy that needs deep pockets and a lot of staff.
Obtaining and Retaining Skills
All of the multi-cloud strategies have one particular challenge in common: obtaining and retaining the skills needed to service the business effectively.
Public cloud platforms offer a very large and comprehensive set of services replete with features to suit many different types of workload, and manage those workloads. Added to that is the frequency with which new features and services are rolled out. This paints a picture of a highly complex and fluid platform of services. In a word, they are immense.
I can say from experience that the effort involved in obtaining the right skills and keeping them current comes with a very high cost to the organisation. There is much more time needed now in R&D, training, vendor events, and certification so that staff can not only understand a new platform capability and how to apply it, but also how it benefits a particular workload or the business as a whole. There are several strategies to manage this challenge, the most common of which is to have a core team servicing the business that is supplemented by contracting help from professional and managed service providers.
But make no mistake, it’s a colossal effort and cost to maintain the right level and distribution of skills for one vendor. It obviously multiplies when you add another cloud vendor.
All clouds are not equal. They cater for different mindsets and methodologies.
Amazon and Google both cater heavily to open source developers, architects, administrators and operators. This is a super dynamic community comfortable with adopting tools and code without a support contract and customising them to suit their needs. Taking something with some jigsaw edges and extending the capability rapidly.
Microsoft historically has catered to the opposite end of the market, building servers, applications and services that provide a complete solution, managed through an easy to use GUI. This has continued into Azure with an incredibly rich console interface and feature rich set of services. Microsoft have also invested heavily in management and governance for decades and build it into everything they do.
This is not to say that Microsoft doesn’t embrace open source or the open source community (they purchased GitHub and kept it that way), but there is a reason their server and application products were phenomenally successful prior to our modern age of public cloud. If I could sum up Microsoft I’d say their aim was to make IT easy and accessible. I think this sits at the heart of Azure and is one of the reasons they keep breaking revenue records each year.
Multi-cloud is not inherently bad but it comes with some significant challenges.
Operating multiple clouds as a strategy requires significant investment in people and training. The greatest challenge in cloud skills management is retaining those skills. This needs more than just a good salary; you need a good culture that fosters a challenging but supportive and fun environment.
Vendor lock-in is unavoidable to an extent if you truly commit to taking the advantages of cloud.
Spreading or splitting workloads across multiple clouds increases complexity. The greater the complexity the greater the investment required.
Operating multiple clouds comes with more one-way door decisions (decisions that you’re stuck with or that are extremely hard or costly to undo). Chief among these are management and governance choices.
Using more than one cloud grants you greater technological and geographical choice.
Ultimately, whether a multi-cloud strategy is suitable depends on the objectives of the business.
If I were the decision maker in an organisation, what would I do? I wouldn’t set out to have a multi-cloud approach from the get-go. Start with one and go deep with commitment and skills. Expand if it’s right to do so.