Much of the information below was gathered from a February 9, 2012 article by Alex Foster, an Enterprise Product Manager at Windstream, along with my own points and observations.
For a high-growth market, the cloud industry is remarkably lacking in standardized terminology to describe what we are offering. Pundits often talk about the cloud turning computing into a standardized utility service, but at the moment the industry has not yet created the vocabulary and taxonomy that would enable customers to buy computing services based on well-understood composition.
So businesses need to understand the details of what exactly they are getting. In a recent blog post, Gartner’s Lydia Leong gave some alarming examples of cloud providers conveniently twisting the definition of ‘private’ to fit whatever they happened to be selling. When there is this much room for interpretation on something as basic as public versus private, buyers clearly need to look under the hood.
So what should you know about your prospective cloud provider? Assuming the environment will be used for some sort of production application or virtual data center (not a pure test and development or batch process data analysis solution) here are some good starting points for discussion:
1. Understanding shared resources: Try getting beyond the usual public/private/hybrid terms to understand which resources are shared and might be a point of concern from a performance or security standpoint.
What hardware (servers, blades, disks, LAN network gear, WAN network gear, firewalls, etc.) will be completely dedicated to your cloud? Ideally your provider will offer multiple options that you can combine in a hybrid model, allowing you to tailor the underlying cloud resources to specific application and security requirements. Over time you should be able to map a service provider’s products to a matrix like the one below, which we use ourselves:
For any shared resources (particularly compute and storage) you’ll want to understand how resources are allocated to customers and what sort of performance you can expect.
For example, most public cloud providers heavily oversubscribe RAM, CPU and storage and this can lead to intermittent performance problems for demanding applications, particularly at busy times of day. Other cloud providers run cloud environments without oversubscription to ensure critical applications have plenty of resources available at all times.
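The effect of oversubscription is easy to reason about with some back-of-the-envelope arithmetic. The figures below are purely hypothetical (no provider publishes these numbers for a given host), but they show how an oversubscription ratio translates into contention headroom:

```python
# A sketch of oversubscription math using assumed, illustrative numbers --
# these figures do not come from any specific provider.

physical_cores = 32   # cores on one physical host
vcpus_sold = 128      # vCPUs allocated to tenants on that host

oversub_ratio = vcpus_sold / physical_cores   # 4:1 oversubscription

# If every VM averages 20% CPU utilization, the expected aggregate demand is:
avg_utilization = 0.20
expected_busy_cores = vcpus_sold * avg_utilization   # 25.6 cores of demand

# Headroom before tenants start contending for CPU time:
headroom = physical_cores - expected_busy_cores      # 6.4 cores

print(f"Oversubscription ratio: {oversub_ratio:.0f}:1")
print(f"Expected demand: {expected_busy_cores:.1f} of {physical_cores} cores")
```

Note how little slack is left: if average utilization creeps above 25% at a busy time of day, demand exceeds the physical cores and every tenant on the host slows down.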
2. Storage performance should also be one of the first areas you explore to understand the level of performance the cloud will be able to deliver for demanding transactional applications.
Storage I/O performance limits are one of the most frequent and most significant challenges customers run into when they migrate applications from internal environments to commodity public clouds.
Storage performance problems are particularly pernicious because they tend to hit your most critical transactional applications hardest, so it is important to pick a vendor whose high-performance storage offerings can meet the needs of your applications.
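A quick sizing estimate helps make this conversation with a provider concrete. The function and figures below are an illustrative sketch (the write penalty, transaction rates, and I/O counts are assumptions, not any provider's spec):

```python
# Rough sizing sketch with assumed figures: estimate the backend IOPS a
# transactional application will demand from a shared storage pool.

def required_iops(tps, reads_per_txn, writes_per_txn, write_penalty=2):
    """Approximate backend IOPS for a transactional workload.

    write_penalty models write amplification from RAID or replication
    (an assumption; the real factor depends on the storage design).
    """
    return tps * (reads_per_txn + writes_per_txn * write_penalty)

# Example: 500 transactions/sec, 4 reads and 2 writes per transaction
need = required_iops(tps=500, reads_per_txn=4, writes_per_txn=2)
print(need)  # 4000 backend IOPS
```

With a number like this in hand, you can ask the provider directly whether their shared storage tier guarantees that many IOPS per tenant, or only offers it on a best-effort basis.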
3. Why hypervisors matter: Another area to investigate is how the hypervisor (or virtualization layer) allocates resources. While there are several major hypervisor choices on the market, VMware-based clouds tend to have a significant advantage here, as VMware has some of the most robust and mature features for allocating resources or shares of resources. This can prevent the ‘noisy neighbor’ problem that plagues many cloud environments, where one busy virtual machine interferes with the performance of other customers on the same physical server.
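The general mechanism behind share-based allocation is proportional division of a contended resource. The toy model below illustrates the idea only; it is not VMware's actual scheduling algorithm, and the names and numbers are made up:

```python
# Toy model of proportional-share CPU allocation -- the general idea behind
# hypervisor "shares", not any vendor's real algorithm.

def allocate(capacity_mhz, vms):
    """Split contended CPU capacity in proportion to each VM's shares.

    vms: list of (name, shares) pairs.
    Returns a dict mapping VM name to its MHz entitlement.
    """
    total_shares = sum(shares for _, shares in vms)
    return {name: capacity_mhz * shares / total_shares for name, shares in vms}

# Two tenants contend for one host: the noisy VM holds default shares,
# the critical VM holds elevated shares, so contention hurts it far less.
entitlements = allocate(10_000, [("noisy-vm", 1000), ("critical-vm", 4000)])
print(entitlements)  # {'noisy-vm': 2000.0, 'critical-vm': 8000.0}
```

The point is that under contention the hypervisor enforces these proportions, so a busy neighbor cannot starve a VM that holds higher shares.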
VMware-based clouds also offer a number of operational advantages to the service provider that tend to benefit the customer in turn.
For example, VMware offers live migration of running virtual machines (known as vMotion) from one physical server to another. This allows a provider to manually or automatically move customer workloads in a non-disruptive fashion for maintenance, upgrades or workload balancing across physical servers to provide better performance.
VMware features like vMotion and DRS are taken for granted in most virtualized enterprise environments but amazingly, many cloud providers do not have these capabilities.
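The decision a DRS-style balancer makes can be sketched in a few lines. This is an illustrative simplification, not VMware's actual DRS algorithm, and the host and VM names are invented:

```python
# Illustrative sketch of a DRS-style rebalancing decision (not VMware's
# real algorithm): if host load is imbalanced beyond a threshold, pick a
# VM to live-migrate from the busiest host to the least busy one.

def rebalance(hosts, threshold=0.2):
    """hosts: {host: {vm: cpu_load_fraction}}.

    Returns a (vm, source_host, dest_host) migration, or None if the
    cluster is already balanced within the threshold.
    """
    load = {h: sum(vms.values()) for h, vms in hosts.items()}
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    if load[src] - load[dst] <= threshold:
        return None
    # Move the lightest VM on the busy host to narrow the gap gently.
    vm = min(hosts[src], key=hosts[src].get)
    return (vm, src, dst)

hosts = {
    "esx1": {"vm-a": 0.5, "vm-b": 0.4},   # heavily loaded host
    "esx2": {"vm-c": 0.2},                # lightly loaded host
}
print(rebalance(hosts))  # ('vm-b', 'esx1', 'esx2')
```

Because the migration itself is non-disruptive (vMotion keeps the VM running while it moves), the provider can make moves like this continuously without the customer noticing.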
The importance of these operational features was highlighted a few years ago when Amazon had to perform rolling reboots across their EC2 cloud environment to apply a scheduled software upgrade. This created planned downtime that customers had to work around for every VM running in Amazon’s cloud.
Because most of our solutions are focused on mission-critical applications, they leverage VMware features that automatically balance and non-disruptively migrate workloads across multiple physical hosts. This ensures that customers get the best possible performance and aren’t impacted by planned migrations or upgrades. In the rare event of a sudden server failure, another VMware feature ensures that all the affected VMs are restarted on a different host. These features sound basic to many sysadmins with VMware experience, yet many public clouds lack these capabilities.
Hopefully these examples show that even something seemingly simple, just cloud resources in this case, can be highly nuanced when you get into the details of one provider’s implementation versus another.