Cloudthread Y Combinator
February 1, 2023

CloudCostClip: How to maximize rate optimization across clouds

Insights from Sanjna Srivatsa on rate optimization: how to get started, KPIs, the challenges of implementing rate optimization across clouds, and the advantages of doing rate optimization in-house.

Daniele Packard:

Hey, Sanjna. I'm very excited to chat with you about rate optimization. I think you have a particularly unique and nuanced perspective: partly because rate optimization is part of what you manage at VMware, which has a large, complex multi-cloud and hybrid-cloud footprint, and partly because your background as a data scientist gives you a unique perspective on FinOps in general and rate optimization specifically. I'm very excited to jump into some questions, and thank you so much for taking the time.

Sanjna Srivatsa:

I'm really excited to be here. Thank you for having me.

Daniele Packard:

Whenever we do CloudCostClips, the first couple of questions give a bit of context, and then we jump into our guest's unique perspective. Say I'm new to FinOps and I've only ever used on-demand instances: what would your two-sentence description of rate optimization be?

Sanjna Srivatsa:

If I were to describe rate optimization in one sentence, it would be that rate optimization is an effort to pay less for long-lasting, predictable workloads. There are many tools that can help you execute on rate optimization, like reservations and savings plans, but it is important to note that all of these are simply billing constructs; they do not directly affect your workload. In exchange for a commitment to host a certain type or amount of your workload on a vendor's cloud, you get a discount, much like signing up for a one-year gym membership upfront.

Daniele Packard:

I love the analogy. You mentioned reservations and savings plans. Could you elaborate on the differences between them and when it's most appropriate to use each type of rate optimization?

Sanjna Srivatsa:

Reservations, or RIs, are resource-based commitments. When you purchase an RI, you need to be pretty specific about the region and instance type, among other things, so any reservation you purchase will map to one type of workload at any given time. There can be some flexibility to exchange or convert them, depending on the cloud provider. Reservations that are more rigid, have longer terms, and require larger upfront payments tend to provide the highest discounts. Savings plans, or SPs, are spend-based commitments. They give you more flexibility, but slightly lower discounts compared to reservations. With savings plans, you commit to spending a fixed dollar amount per hour, say $1 per hour, and as each hour passes, any unused part of that amount is gone; you cannot carry it forward. Each vendor's custom algorithm will then "float," in air quotes, this commitment across your workloads to get the best discount for all of them. So if you have workloads in obscure regions or on obscure instance types, this is the ideal tool to use, as savings plans are not tied to a region or instance type. Both reservations and savings plans, however, are tied to one service or a small group of related services. And RIs and SPs, as I said earlier, are simply billing constructs. I think this is something that trips people up: they feel like it has something to do with their workloads. It does not, so you can get started with RIs and SPs at any time, and it will not have an impact on your workload.
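To make the "use it or lose it" hourly commitment concrete, here is a minimal Python sketch of billing hours under a spend-based commitment. It is a simplified model, not any vendor's actual billing logic: it assumes a single uniform discount for all eligible usage, whereas real savings plans have different rates per instance family and region and apply the commitment where the discount is largest. The function names and numbers are hypothetical.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Total cost for one hour under a spend-based commitment (simplified model).

    on_demand_usage: eligible usage this hour, valued at on-demand prices ($)
    commitment: committed spend per hour at discounted rates ($)
    discount: assumed uniform discount vs. on-demand (0.25 means 25% off)
    """
    # Each committed dollar buys 1 / (1 - discount) dollars of on-demand-equivalent usage.
    covered_capacity = commitment / (1 - discount)
    # Usage the commitment cannot cover is billed at on-demand prices.
    overflow = max(0.0, on_demand_usage - covered_capacity)
    # The commitment is charged whether or not it is used.
    return commitment + overflow


def unused_commitment(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Portion of this hour's commitment that expires unused (it cannot carry forward)."""
    used = min(on_demand_usage * (1 - discount), commitment)
    return commitment - used


if __name__ == "__main__":
    usage_by_hour = [1.5, 1.0, 0.5]  # hypothetical on-demand-equivalent usage ($/hour)
    for usage in usage_by_hour:
        bill = hourly_bill(usage, commitment=0.75, discount=0.25)
        waste = unused_commitment(usage, commitment=0.75, discount=0.25)
        print(f"usage ${usage:.2f}: pay ${bill:.2f}, unused commitment ${waste:.3f}")
```

In the third hour the bill ($0.75) is higher than on-demand would have been ($0.50), because $0.375 of commitment simply expires. This is why commitments only pay off for steady, predictable usage.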

Daniele Packard:

Amazing, great descriptions. This next question is a bit more relevant to your role as a FinOps manager, and I'd love to hear your view both for someone starting out and at scale. What is an appropriate way to think about rate optimization goals, metrics, or targets for success? What KPIs are used to evaluate success?

Sanjna Srivatsa:

In the simplest terms, our mantra has been to buy small and often. This is hard to achieve if you haven't maintained the discipline right from the start. When you buy small, you give yourself the time you need to understand your usage patterns and the variance you encounter, and while you are taking that time, you can already start saving because you bought small. When you buy frequently, you reduce your risk in the event that your workload changes dramatically, because your commitments also expire in small, staggered pieces. We call these pockets of change: if someone wants to migrate or downsize, these pockets show up often, and they create opportunities for us to change our workload without creating too much waste. We also want to ensure that our reservation fleet is diverse in term length, rigidity, and type (for example, reservations versus compute savings plans) and across services, as you would expect. So we basically follow the same principles that a good, safe personal finance investor would. In terms of goals and coverage targets, I think a high-level generalization would be that more is better, but it really depends. You should only buy RIs and savings plans when you are planning to use them at least until the break-even point. The break-even period is what we call the minimum time you must use an RI to not lose money on the whole deal. If that means your average coverage is 60%, that might be the right number for you. It completely depends on your usage trends and your risk tolerance, and all of this takes time to learn. If you are a small business and you just do not have the expertise, you can use a third-party vendor to help you start saving money while you learn this about yourself.
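The break-even period and the "buy small and often" principle can both be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not VMware's process: it assumes a reservation's full cost (upfront payment plus any monthly fee for the whole term) is owed regardless of use, and it ignores exchanges, resale, and price changes. All function names and prices are hypothetical.

```python
def break_even_months(upfront: float, monthly_fee: float, term_months: int,
                      on_demand_monthly: float) -> float:
    """Months of use needed before a reservation costs no more than on-demand.

    Assumes the whole commitment (upfront + monthly fee for every month of the
    term) is owed even if you stop using the resource early.
    """
    total_commitment = upfront + monthly_fee * term_months
    return total_commitment / on_demand_monthly


def expirations(purchases: dict[int, float], term_months: int) -> dict[int, float]:
    """Map each month to the amount of commitment that expires in it.

    purchases: purchase month -> amount committed that month (hypothetical units)
    """
    expiring: dict[int, float] = {}
    for month, amount in purchases.items():
        end = month + term_months
        expiring[end] = expiring.get(end, 0.0) + amount
    return dict(sorted(expiring.items()))


if __name__ == "__main__":
    # A 12-month reservation: $360 upfront + $30/month vs. $100/month on-demand.
    print(break_even_months(360, 30, 12, 100))  # 7.2 months

    # Same total commitment, bought all at once vs. in 12 monthly tranches.
    lump_sum = {0: 12.0}
    laddered = {month: 1.0 for month in range(12)}
    print(expirations(lump_sum, 12))  # all 12.0 expires in month 12
    print(expirations(laddered, 12))  # 1.0 expires every month from 12 to 23
```

Under this model, a 12-month reservation priced 40% below on-demand ($720 vs. $1,200) has to be used for about 7.2 of its 12 months to break even. The laddered schedule reaches the same total commitment once fully purchased, but a slice expires every month, which is what produces the pockets of change for migrations or downsizing.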

Daniele Packard:

That's really interesting. We've talked in the past about how VMware uses multiple clouds. In a multi-cloud environment, what makes rate optimization across clouds challenging? What unique challenges pop up when you're working with multiple clouds?

Sanjna Srivatsa:

Predictability is the hardest aspect of rate optimization, particularly in a multi-cloud scenario. As sure as you are about your product workloads and business goals today, things can change, and RIs aren't built for change. It is a little ironic that something based on predictability isn't aligned with business goals that change. In time, you will learn how to strike the balance between risk and return. The risks in this case are change and potential waste; the return is a significant discount. So, depending on where your company is in its journey, you'll have to learn to strike that balance.
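One way to reason about this risk/return balance is to replay historical usage against several candidate commitment levels and compare net savings. The Python sketch below does this under simplifying assumptions: a single uniform discount, a commitment expressed as on-demand-equivalent dollars per hour, and a history that is representative of the future (exactly the predictability assumption that can break). The usage series, discount, and function names are hypothetical.

```python
from typing import List


def net_savings(usage: List[float], commitment: float, discount: float) -> float:
    """Savings vs. paying on-demand for every hour of a usage history.

    usage: hourly usage valued at on-demand prices ($/hour)
    commitment: on-demand-equivalent usage committed each hour ($/hour)
    discount: fraction saved on committed usage (0.25 means 25% off)
    """
    on_demand_total = sum(usage)
    # Every hour pays the discounted price for the full commitment, used or not,
    # plus on-demand prices for any usage above the commitment.
    committed_total = sum(commitment * (1 - discount) + max(0.0, u - commitment)
                          for u in usage)
    return on_demand_total - committed_total


def coverage(usage: List[float], commitment: float) -> float:
    """Share of total usage that falls under the commitment."""
    return sum(min(u, commitment) for u in usage) / sum(usage)


def best_commitment(usage: List[float], candidates: List[float], discount: float) -> float:
    """Candidate commitment with the highest net savings on this history."""
    return max(candidates, key=lambda c: net_savings(usage, c, discount))


if __name__ == "__main__":
    usage = [10.0, 10.0, 8.0, 6.0, 10.0, 4.0]  # hypothetical hourly usage ($/hour)
    candidates = [0.0, 4.0, 6.0, 8.0, 10.0]
    for c in candidates:
        print(f"commit {c:>4}: saves {net_savings(usage, c, 0.25):.2f}, "
              f"coverage {coverage(usage, c):.0%}")
    print("best:", best_commitment(usage, candidates, 0.25))
```

With this made-up history and a 25% discount, committing to $6/hour saves the most ($7 over six hours), even though peak usage is $10/hour: covering the peaks would leave commitment idle in the quieter hours. That level covers about 71% of usage, which illustrates why the right coverage target depends on usage trends and risk tolerance rather than being "as high as possible."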

Daniele Packard:

Interesting. You talked about finding the break-even point and about your mantra of buying small and often. I'd love to hear what creating a rate optimization process in-house looks like.

Sanjna Srivatsa:

I cannot go into too much detail, but I can say that we had crawl, walk, and run phases. We started off with a manual process, moved on to a semi-automated state with a little more intelligence from ML in the background, and now we're working toward being fully automated with more intelligence.

Daniele Packard:

Amazing. What are the reasons for bringing it in-house? As you said, someone just starting out might use a third party to help with rate optimization, but you made the decision to bring it in-house. I'd love to hear about the benefits, challenges, and trade-offs of doing rate optimization in-house versus through a third party.

Sanjna Srivatsa:

Cloud costs are probably among the top three line items for any large enterprise today, and that is true for us as well. Outsourcing that management means a big chunk of money leaves the company in fees, which is not desirable. At the same time, we want to save as much money as possible and do a good job of it, because good rate optimization can impact the bottom line just as much as some products contribute to the top line. So it's actually very powerful. For these reasons, we decided to build our own RI management system. Additionally, our team has business context, which, in my opinion, is critical to predicting short-term usage trends. Third parties can have excellent AI- and machine learning (ML)-driven prediction models, but those models will fail if they don't have business context for what your company is going through and what each product lifecycle has planned. So having the right analytics and ML skills, combined with good business context, was kind of our secret sauce for success in rate optimization. There are some challenges when we see unexpected and erratic downsizing. An increasing workload is not really a problem, but erratic workload reductions when you already hold RIs are a little problematic. That doesn't happen often, but when it does, we explore third-party help if we need it.
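The point about business context can be illustrated by comparing a purely historical forecast with one that also applies planned changes reported by product teams. The Python sketch below is hypothetical: the trailing-average baseline, the `PlannedChange` record, and the numbers are illustrative assumptions, not a description of VMware's models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlannedChange:
    """A known business event, such as a migration or downsizing (hypothetical)."""
    start_month: int  # first forecast month affected (0 = next month)
    delta: float      # change in usage, $/hour (negative means a reduction)


def trailing_average(history: List[float], window: int = 3) -> float:
    """History-only baseline: mean of the last `window` months of usage."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def forecast(history: List[float], months: int,
             changes: Optional[List[PlannedChange]] = None) -> List[float]:
    """Flat baseline forecast, optionally adjusted by planned business changes."""
    baseline = trailing_average(history)
    planned = changes or []
    result = []
    for month in range(months):
        # Apply every planned change that has taken effect by this month.
        adjustment = sum(c.delta for c in planned if month >= c.start_month)
        result.append(max(0.0, baseline + adjustment))
    return result


if __name__ == "__main__":
    history = [100.0, 100.0, 100.0]  # hypothetical average $/hour per month
    downsizing = [PlannedChange(start_month=2, delta=-40.0)]
    print(forecast(history, 4))              # history only
    print(forecast(history, 4, downsizing))  # with business context
```

The history-only forecast keeps predicting $100/hour, so commitments sized to it would go partly unused after the planned downsizing. The adjusted forecast signals that commitments running past month 2 should be sized closer to $60/hour.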

Daniele Packard:

Yeah, that's really interesting. It's important to understand the business context. So often we talk to companies where it's clear that rate optimization is happening in a silo, or they're using a simple time series forecast, and there's no bridge or communication between business forecasts and an actual cloud forecast based on the growth predicted for the business. So it's great to hear you talk about that; it is undoubtedly a gap we see frequently within organizations. Well, this has been super insightful for me, and I hope it will be super insightful for everyone who listens to this CloudCostClip. Thank you so much, again, for joining and taking the time.

Sanjna Srivatsa:

Thank you, Daniele. It was wonderful talking to you.

Copyright © 2024 CloudThread Inc.
All rights reserved.