Cloudthread Y Combinator
March 9, 2023

CloudCostClip: Essential FinOps workflows and whether they can be automated

Expert opinion by Dann Berg, based on his extensive experience at Datadog and FullStory. What can be automated, when is automation needed at all, and are we all getting replaced by AI anytime soon?

Ilia Semenov:

Hi everyone,

Today we have a true FinOps celebrity and a talented writer, Dann Berg, with us.

We will be discussing an important topic with him: essential FinOps workflows and whether they can be automated.

Dann, thank you for taking the time to chat today. How have you been?

Dann Berg:

I've been well, and I want to say thank you so much for having me on CloudCostClip.

It's great to be here.

Ilia Semenov:

It's great to have you. You have really deep expertise in managing cloud costs, and you did it for a long time at Datadog and most recently at FullStory. I bet you have a really strong perspective on the automation of FinOps’ day-to-day tasks.

Maybe we can start with a quick introduction to the context. Let's talk about examples of essential FinOps workflows and how much attention you've paid throughout your career to automating them.

Dann Berg:

I'm happy to jump into that. It's an interesting question, how to automate your workflows and what the essential processes are. I think it goes hand in hand with knowing when it's time to grow and scale your team, or the people you're working with, and the practice internally at a company. Usually, as a FinOps practitioner, your goal is to analyze costs, find valuable stuff, and communicate that out.

You do cost investigations as needed, and that's one element of your work. I feel that in all of that there are areas for automation, and it depends on whether you're building your own tools at your company, in which case there's going to be specific automation, or using third-party tools, in which case there might be some sort of automation around how you interact with those tools.

All of these are going to be ripe for automation. At first, it comes from you doing the work and figuring out what you need to do, and then it comes time to either automate it or hire new people and pass the work along to them. What I find with a lot of FinOps work is that you might put in a quarter or two of work to create a lot of value. So you create this thing of value. It's a report you handed off to engineers. You handed it up to senior leadership, and they said, "This is wonderful. We want to continue having this." And so, suddenly, this project that had an end date is now recurring work. The only way to make that recurring work manageable is either to automate it as much as possible or to grow the team, or you could just stop investigating new features of the FinOps organization and just focus on that.

So, it's either stop growing, automate, or grow the team. That's really where I see automation fitting in.

Ilia Semenov:

Yeah, that makes a lot of sense. And based on your experience, what kinds of workflows do you find the most automated today versus the least automated?

Dann Berg:

I think the most automated process usually involves analysis. Usually that's a recurring process, either daily, weekly, or monthly, however it is set up in your organization. Often, it's done with a third-party tool, and this can be automated so that you can just go and pull up a dashboard that shows all the information you need; maybe those dashboards are sent to the individual people who need to see them. There could be automation around alerting: you identify the valuable alerts, set them up to go to the right people, and have them include actionable data, so the stuff they might need to investigate for that particular alert is automatically included.

Those are all forms of automation. Where I think it's most challenging to have automation is with the artisanal investigations that might pop up, especially at the beginning when you've first started at a company. There's always going to be something new or different popping up, and it's those cost anomaly investigations where you really can't have automation.

You can't have automation if you're doing something new. But hopefully, if it's something that you might need to repeat, you're able to build some sort of automation around it to shorten the amount of time that it takes in the future.
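The alerting idea above, an alert that reaches the right team and already carries the data needed to investigate, can be sketched roughly as follows. This is only an illustrative sketch, not a description of any tool mentioned in the conversation: the function names, the z-score rule, the threshold of 3, and the shape of the alert dictionary are all assumptions.

```python
from statistics import mean, stdev


def latest_day_z_score(daily_costs):
    """Z-score of the most recent day's cost against the preceding days."""
    history, latest = daily_costs[:-1], daily_costs[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all counts as extreme.
        return 0.0 if latest == mu else float("inf")
    return (latest - mu) / sigma


def build_alert(team, daily_costs, today_by_service, baseline_by_service,
                threshold=3.0, top_n=3):
    """Return an alert dict with the top cost drivers, or None if nothing is unusual.

    All field names here are hypothetical; adapt them to your own tooling.
    """
    z = latest_day_z_score(daily_costs)
    if z <= threshold:
        return None
    # Actionable context: which services grew the most versus their baseline.
    deltas = {
        service: cost - baseline_by_service.get(service, 0.0)
        for service, cost in today_by_service.items()
    }
    top_drivers = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "team": team,
        "latest_cost": daily_costs[-1],
        "z_score": z,
        "top_drivers": top_drivers,
    }


alert = build_alert(
    team="payments",
    daily_costs=[100, 102, 98, 101, 99, 180],
    today_by_service={"EC2": 150.0, "S3": 20.0, "RDS": 10.0},
    baseline_by_service={"EC2": 70.0, "S3": 20.0, "RDS": 10.0},
)
quiet = build_alert(
    team="payments",
    daily_costs=[100, 102, 98, 101, 99, 100],
    today_by_service={"EC2": 70.0},
    baseline_by_service={"EC2": 70.0},
)
```

The point of the `top_drivers` field is the one made above: whoever receives the alert should not have to open a dashboard to find out which service moved.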

Ilia Semenov:

That makes a lot of sense. From my personal experience, especially when I was working at Electronic Arts years ago, where there was a central unit with a lot of game teams, I found that the least automated part of my workflow was going around the game teams, which were independent engineering teams, and trying to collaborate with them. That was a very repetitive thing. Almost every day I had to go and ask somebody: "Do you need this workload? For how long? Should I plan for that or not?" That was one part: getting the projections. Another part was when I was looking at certain anomalies or cost optimization recommendations. I was going to them and trying to deliver this idea that "Hey, let's implement this because, you know, we are wasting money," and that was a huge challenge. It was essentially that collaboration piece, trying to streamline that.

From your experience, how painful was it for you? And do you think these types of things can ever be automated? We all know that FinOps is a culture in and of itself, and it's all about the collaboration between engineering, finance, and business. Can that be automated at all?

Dann Berg:

Yeah, it depends on the culture and where you work. So when I was at Datadog, we had monthly meetings with the top spending engineering teams, and that couldn't be automated because we were meeting in person.

We were talking, and the information that we were exchanging could have been exchanged via email, so there could have been some form of automation. But I feel like a lot would be missing from that, and again, it depends on the work culture. It's possible that the information exchanged in those meetings was, at a certain point, valuable enough to meet in person, and that later you want to exchange it via email or some other report instead. But as long as you're still seeing the value in the process, it doesn't make sense to change it, unless it's negatively impacting somebody else's workload, like they can't take the time to meet in person because they're working on X, Y, or Z.

So it depends on your situation. I think it makes sense, when you have processes that involve other people, such as recurring meetings, to check in regularly to see whether that in-person meeting is still providing value, and to have some kind of measurement of that. Maybe you can have an email that's automatically sent to them monthly, which they fill out with the information you need, or maybe it's just easier to meet over Zoom for 15 minutes. It depends on your organization.

Ilia Semenov:

And what do you think about the trend we can witness today around CI/CD FinOps automation, where engineering teams are getting their pull requests blocked, for example, based on the expected cost impact? Do you think it’s a type of automation that can change this whole interaction?

Dann Berg:

Yeah, that goes back to culture. There are a lot of organizations with a culture where they don't want to slow down engineering: you do whatever you can to keep the pace of engineering fast, and then you respond.

So if there is a cost spike or a cost anomaly, you hopefully have systems in place to catch that and then go back and remedy it.

When you start having these FinOps practices that fit into the CI/CD workflow and attempt to be proactive, hopefully you're set up in a way that doesn't slow down engineering. But in the example that you gave, if somebody is trying to commit some code that is over the threshold, you might put a red or a yellow light on that change and thus slow it down. As a result, you caught something that you couldn't otherwise have caught. And I think, depending on what you're doing or how you're setting the red and yellow lights, it could be okay to have them. In my experience, when people are committing code, you can only really catch things like cost anomalies if there is a mistake. So if you have some sort of check in the CI/CD that catches mistakes like, "Oh, I thought I was doing sixty, but it's six hundred," that's great.

It doesn't happen often, but if it does, it's great to catch it as it's happening. A lot of other times when there have been cost anomalies, it has less to do with code that's being committed and more to do with usage anomalies or something happening on that side. So having the automation in place allows us to catch that and communicate it as soon as possible, hopefully before the bill arrives. Even two days later is really when you want to be doing it.
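The red and yellow lights described above could look something like the following in a CI job. This is a minimal sketch under stated assumptions: the monthly cost estimates are taken as given (how they are produced is out of scope), and the function names and the 20% and 100% thresholds are hypothetical choices, not values from the conversation.

```python
def classify_cost_change(current_monthly, projected_monthly,
                         yellow_pct=20.0, red_pct=100.0):
    """Map an estimated monthly cost change to a 'green', 'yellow', or 'red' light."""
    if current_monthly <= 0:
        # No baseline to compare against: flag any new spend for a human look.
        return "yellow" if projected_monthly > 0 else "green"
    change_pct = (projected_monthly - current_monthly) / current_monthly * 100
    if change_pct >= red_pct:
        return "red"
    if change_pct >= yellow_pct:
        return "yellow"
    return "green"


def gate_exit_code(light):
    """Only a red light fails the pipeline; yellow is meant as a warning."""
    return 1 if light == "red" else 0


# "I thought I was doing sixty, but it's six hundred."
typo_light = classify_cost_change(current_monthly=60.0, projected_monthly=600.0)
modest_light = classify_cost_change(current_monthly=60.0, projected_monthly=75.0)
small_light = classify_cost_change(current_monthly=60.0, projected_monthly=62.0)
```

Keeping yellow as a non-blocking warning is one way to respect the concern about not slowing engineering down, while still stopping the sixty-versus-six-hundred kind of mistake.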

Ilia Semenov:

Yeah, that makes complete sense. It's a dynamic environment, and CI/CD alone cannot be the answer to all the problems here.

It's a nice segue into my next question. I'm very curious to know how you see the future of FinOps workflow automation tools. Do you have any visions or ideas?

Dann Berg:

I think it's in its infancy. So right now we're just seeing an explosion of different solutions in this space, and I think they fall into a few different categories. We touched on CI/CD tools, and I think that proactive CI/CD FinOps tools are a very nascent category. I think there's a lot of room for innovation in that category.

The other place is the reactive tools, or "Cost Explorer plus," as I call them. It's a whole suite of tools that ingest detailed billing data and show back exactly where the spend comes from. Even that, I think, is in its infancy. We had the early players in the space, which all sold very early, in my opinion.

And now we're seeing the second wave just coming through. I think over the next decade, we're going to see even more growth, then consolidation, and then some cool stuff happening. So it's really interesting for me to work in this space and be able to watch this all play out.

Ilia Semenov:

That's very interesting for me as well. Well, with the current AI hype going on, do you think that in 10 years this will be taken over by, say, ChatGPT 7.0, and you know, FinOps practitioners will have nothing to automate because everything will be done for them?

Dann Berg:

When it comes to AI, I have difficulty answering specific questions like, "How will it impact FinOps?" That's because I think the impact it is going to have on society in general is going to be so seismic that in 10 years, I don't know if that is going to be a question we can answer or even ask. I think when it comes to AI, we are on the edge of a knife right now, and it's very possible that in one direction we go to human extinction, and in the other direction it would be grand: humans living forever, living in harmony, and all the other beautiful things. It sounds like science fiction, but everything that I'm reading about the AI space makes me think that these risks are very real, and so right now I just feel terrified overall.

Ilia Semenov:

Thanks for sharing. I sincerely hope it is the latter rather than the former, and that we will all arrive on the right side.

Dann Berg:

Well, Ray Kurzweil says that in 2040 we're going to have the singularity, everything's going to be perfect, and he thinks humans and AI will be one in perfect harmony.

Nick Bostrom, who wrote Superintelligence, has a bit of a different take on it in terms of the risks. These are two very smart people, so we'll see who's right.

Ilia Semenov:

Yeah, awesome.

Now we are getting to the end of our discussion, which was super interesting, and as per tradition, I want to ask you: what is your recommendation to somebody starting their FinOps practice today and thinking about what they should automate first and what they should pay attention to first?

Dann Berg:

For anybody that's in this space, I think it's still so early, so there's still so much opportunity. Even when it's late, there's an opportunity, but if you start getting into it now, you are in there at the right time.

If I had to start, I would get certified. I'd join the FinOps Foundation and just get started. Anyone who doesn't have a role yet should open up an AWS account, spin up some things, play around with Cost Explorer, get into the actual current data, and see if they can create some analysis that shows what they are doing and automate things around there.

Just get that practice, and then follow the job board and see what you can find.
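As one possible starting exercise along the lines suggested above, the sketch below totals cost per service from a response shaped like the output of AWS Cost Explorer's `get_cost_and_usage` call grouped by service. It deliberately does not call AWS: `sample_response` is made-up illustrative data, and `total_by_service` is a hypothetical helper name. In practice the dictionary would come from the Cost Explorer API in your own account.

```python
from collections import defaultdict


def total_by_service(response, metric="UnblendedCost"):
    """Sum cost per service across all time periods, largest first."""
    totals = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings, so convert before summing.
            totals[service] += float(group["Metrics"][metric]["Amount"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Made-up sample data in the assumed response shape, for illustration only.
sample_response = {
    "ResultsByTime": [
        {
            "TimePeriod": {"Start": "2023-03-01", "End": "2023-03-02"},
            "Groups": [
                {"Keys": ["Amazon EC2"],
                 "Metrics": {"UnblendedCost": {"Amount": "10.25", "Unit": "USD"}}},
                {"Keys": ["Amazon S3"],
                 "Metrics": {"UnblendedCost": {"Amount": "1.5", "Unit": "USD"}}},
            ],
        },
        {
            "TimePeriod": {"Start": "2023-03-02", "End": "2023-03-03"},
            "Groups": [
                {"Keys": ["Amazon EC2"],
                 "Metrics": {"UnblendedCost": {"Amount": "20.25", "Unit": "USD"}}},
                {"Keys": ["Amazon S3"],
                 "Metrics": {"UnblendedCost": {"Amount": "2.5", "Unit": "USD"}}},
            ],
        },
    ]
}

totals = total_by_service(sample_response)
for service, amount in totals.items():
    print(f"{service}: {amount:.2f} USD")
```

Once a small summary like this works on real data, scheduling it and sending the result to the people who need it is a natural first piece of automation.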

Ilia Semenov:

Yeah, great recommendation; I second that. Thank you so much. It was a pleasure.

Dann Berg:

It was great being here. Thank you.

Make cloud costs a first class metric for your engineering organization.
Copyright © 2024 CloudThread Inc.
All rights reserved.