Cloud Mechanics: The Cost of Customer Involvement in Managed Cloud Services

Afkham Azeez
Sep 13, 2024

As someone fascinated by all things mechanical, I find inspiration in the humor of a sign displayed in a mechanic's shop, one whose labor rate climbs with the customer's level of involvement. I must admit, I've been guilty of similar behavior as a customer, and I wouldn't blame the mechanic for wanting to charge me extra. In my role leading the SRE team, I often encounter situations where I wish we had a similar pricing board for our managed cloud services.

Don't get me wrong. I enjoy tinkering with machines myself, and I sometimes mess up and have to call in a professional mechanic. At that point I end up paying more, because the mechanic has to repair the damage I may have done in addition to fixing the original problem.

So, what parallels can be drawn between customer interference in a mechanic’s work and the involvement of customer teams in how we install, manage, and operate managed clouds? In what ways does increased customer involvement drive up our costs?

As a business, it’s highly cost-effective for us to run deployments using standardized installation and monitoring scripts that adhere to well-defined, tried-and-tested processes. In other words, utilizing ‘cookie-cutter’ deployments allows us to take full advantage of established models and leverage the team’s deep familiarity with the tools, workflows, and methodologies. This streamlined approach not only reduces complexity but also enables us to achieve economies of scale, which translates into tangible benefits for our customers. By minimizing variations and complications, we can lower operational costs, ensure faster response times, and improve overall adherence to SLAs, ultimately providing a more efficient and cost-effective service.

Unfortunately, this streamlined approach isn't always feasible. Many customers have their own internal operations or cloud teams, with specific standards, preferred technologies, governance frameworks, and security policies that they impose on our operations team. These requirements may suit the customer's broader IT environment, but they often complicate the deployment and management of our managed cloud services. As a result, we must set aside the simple, efficient 'cookie-cutter' solution and make customizations, sometimes significant ones, to meet these specific demands.

These customizations can involve adopting entirely different governance policies, reconfiguring network setups, altering monitoring tools and strategies, and making changes at the process level to comply with customer-defined standards. This adds layers of complexity that deviate from our tried-and-tested methods, which are optimized for best practices, compliance, performance, cost efficiency, and scalability. The more deviations there are from our standard processes, the harder it becomes to leverage economies of scale, driving up both the time and cost involved in maintaining these environments.

Moreover, a key challenge arises when our team is given limited access to manage and operate the systems. Many customers want to retain a degree of control over their infrastructure, which limits our ability to make real-time decisions or automate certain processes. In such cases, the customer's teams dictate the terms of operation, which can severely restrict our ability to respond quickly to incidents, roll out updates promptly, drive continuous improvement, or optimize performance. Restricted access also erodes the agility our teams rely on to manage cloud infrastructure efficiently and maintain uptime.

To further complicate matters, the customer’s internal teams sometimes make changes to the deployment environment without notifying us in advance. These uncoordinated changes, whether it’s tweaking configurations or updating systems, can lead to unforeseen issues, system downtime, or performance degradation. When problems occur, it can take significantly longer to diagnose and resolve the root cause, especially when we have to backtrack through unauthorized changes. This type of scenario not only increases downtime but also requires us to dedicate extra resources for troubleshooting, which in turn inflates costs for the customer.

On top of these operational hurdles, the increased communication overhead between our team and the customer’s internal teams adds another layer of complexity. Constant back-and-forth discussions, meetings to align processes, and additional approvals slow down the overall speed at which we can execute tasks. Each decision, change, or issue requires more steps for validation, which not only prolongs implementation timelines but also diverts resources away from more critical tasks. This coordination overhead translates directly into higher operational costs, as more man-hours are spent on managing communication, mitigating risks, and rectifying issues, rather than focusing on the core service.

Conflicts between our team and the customer's operations teams often arise from differing priorities, technical approaches, or misaligned expectations. These disagreements can lead to significant delays as both teams try to resolve issues, navigate organizational politics, and reach compromises. The back-and-forth consumes time, money, and energy that could be better spent on continuous improvement, R&D, or optimizing the cloud environment. Prolonged conflicts can also take a mental and human toll on the teams involved, causing frustration, burnout, and diminished morale. This not only impacts the quality of work but also strains relationships, making future collaboration even more difficult.

All of these factors combined (customized requirements, restricted access, uncoordinated customer changes, communication bottlenecks, and inter-team conflicts) drive up costs considerably. What could have been a smooth, cost-effective, and high-performance deployment becomes bogged down by inefficiencies. While we are fully capable of adapting to these custom demands, the associated costs, both in time and resources, increase for everyone involved. For the customer, this translates into higher service costs, longer response times, and potentially less effective cloud infrastructure management, as the added complexity detracts from our ability to deliver an optimal solution.

Just like the mechanic’s banner that escalates rates based on the customer’s level of involvement, the same principle holds true in our managed cloud services. When customers allow us to operate with minimal interference, using standardized processes and well-defined models, we can deliver efficient, cost-effective solutions. However, when they step in — whether by imposing custom requirements, limiting our access, or making changes without coordination — the complexity rises, much like when a customer “helps” the mechanic.

As with the mechanic who charges more for having to undo or navigate around a customer’s input, the more involvement and customizations required by the customer’s team, the higher the costs. The additional communication, troubleshooting, and reconfiguration eat away at the efficiency that would otherwise lead to lower costs, faster response times, and better outcomes.

Ultimately, the key to achieving the best results, both for us and our customers, is trust. Just as a mechanic delivers the best service when left to do their job, we provide the most efficient and cost-effective cloud management when we can apply our expertise with minimal constraints. When customers give us the freedom to operate smoothly, everyone benefits — through lower costs, streamlined operations, and optimal performance.

Written by Afkham Azeez

Head of SRE at WSO2, radio amateur (4S7AZE), nature lover, 4x4 enthusiast, maker, mechanic at heart