alwaysOn

Things I probably should have known...


Cynthia Unwin

1 day ago · 7 min read

What I learned building an agentic ant colony.

Photo by Jorge Coromina on Unsplash

Over the past week or so I have been coding an agent ant colony that restores service to a running application. The agents don't do complex RCA, log tickets, interact with engineers, etc. They just keep the application up and running. It was definitely fun, and I learned some interesting and useful things. What I built: a simple Python-based web ordering application with a front end supported by two micro-services each with

Cynthia Unwin

Jan 15 · 7 min read

Agents: How do we know they work?

Photo by Dean Pugh on Unsplash

Agentic platforms are everywhere and we are pushing forward to use more and more AI-driven software. As Site Reliability Engineers we need to really think about what it means to run diverse agent platforms at scale. We need to think about what needs to be in place to make them manageable. How do we know right now that our agents are working? What do we need to see in the logs to troubleshoot when they don't? What data needs to be gathered a

Cynthia Unwin

Jan 8 · 4 min read

Framing the Problem

Photo by Gaspar Uhas on Unsplash

Following through on the fundamental assumption that the key to solving a problem is understanding what that problem is and being able to ask the right questions about it to expose how to create a solution, it's time to step back and take a quick review of what it is that we, the AI Enabled SRE community, are talking about when we discuss AIOps. When we look at IT Operations through the lens of how we implement AIOps we deal wi

Cynthia Unwin

Jan 4 · 4 min read

Asking the right question

Photo by Camylla Battani on Unsplash

"If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." Albert Einstein

I've been thinking about Bas Pluim's comment from my post the other day. It isn't a new thought but it is a really important one. As I look back on my career, it is clear how much time we (as an industry) spend solving the wrong problem and Bas's comment about not just achieving a goal but taking

Cynthia Unwin

Jan 2 · 4 min read

The first rule of Agent Driven AIOps

Photo by Raffaele Parente on Unsplash

There are lots of rules for success when it comes to Agent Driven AIOps. Allowing non-deterministic software to take action in your critical environments is high risk. But it's also high reward. So, how do we manage this risk? There are a lot of layers to the answer to this question but let's start with something that is obvious but harder than it looks. Software running in a complex system is affected by circumstances external t

Cynthia Unwin

Jan 1 · 3 min read

It's really a search problem...

Photo by ün LIU on Unsplash

Or, more specifically, it's a knowledge synthesis problem. Building AI agents or agent teams for AIOps systems isn't hard. Even if you build them from scratch, a bit of Python and an API key gets you a piece of non-deterministic software that can legitimately do some cool things. It can even do some smart things. The trick is to get it to do consistently useful things. This is much harder. Lots of things contribute to this from choosing the ri

Cynthia Unwin

Dec 14, 2025 · 3 min read

Ants as Agents

Photo by Christian Holzinger on Unsplash

Today I learned a new word. Stigmergy. It's a good word. Stigmergy is a form of indirect communication where agents coordinate their actions by modifying their shared environment, leaving traces (like pheromones or digital markers) that influence the subsequent behavior of other agents, creating complex, self-organized systems without central control. I learned this word when I was reading about ants. I was reading about ants b

Cynthia Unwin

Aug 4, 2025 · 6 min read

Getting AIOps Right

Photo by Immo Wegmann on Unsplash

Several years ago I wrote an article about there no longer being a role for support teams who are just...

Cynthia Unwin

Jan 20, 2025 · 4 min read

What Oatmeal Taught me about Software

Photo by Andrea Tummons on Unsplash

Recently I had a life-changing moment. I was having lunch with a friend and he told me that he...

Cynthia Unwin

Jan 13, 2025 · 3 min read

We need to talk about Agile...

Photo by Trnava University on Unsplash

I know a lot of companies and teams do an excellent job of Agile development at Enterprise scale....

Cynthia Unwin

Jun 5, 2023 · 5 min read

There is no more "keeping the lights on"

Several years ago I came across a presentation by an Architect at IBM named Simon Grieg. It was a presentation that I found when I was...

Cynthia Unwin

Mar 18, 2023 · 3 min read

Why Cloud Projects Fail

I talk to teams on a daily basis that explain to me that their solution is different. The normal rules don't apply to them because they...

Cynthia Unwin

Jan 17, 2023 · 5 min read

The Search for Broken Feedback Loops

Digital transformation is only partly about technology and in many very real ways technology is the easy part. The real crux of...

Cynthia Unwin

Jan 8, 2023 · 6 min read

Why is declarative state so important to the future of operations?

I spend a large amount of time working with teams that are struggling to manage the increase in solution complexity that is affecting all...

Cynthia Unwin

Apr 26, 2022 · 3 min read

Time to Think

Last week I went to the office for the first time in more than 2 years. I had to wear shoes. I lived through it though, and coming out...

Cynthia Unwin

Nov 17, 2021 · 5 min read

The power of testing your understanding.

I've been waging war against technical debt for the better part of two decades. I've watched good teams drown slowly beneath waves of...

Cynthia Unwin

Jun 26, 2021 · 7 min read

What does it take to make technical operations teams resilient?

Over the past year of lockdowns and home schooling, of changing roles at work and at home, I have spent a lot of time thinking about...

Cynthia Unwin

Apr 15, 2020 · 4 min read

Episode Two: What Makes Mainframes Different?

If you started this journey with me in "Episode One" you will have read about mainframe computers being fast, resilient, secure and...

Cynthia Unwin

Apr 5, 2020 · 4 min read

Episode One: Why Mainframes?...

I have spent the last 20 years absorbed by computers, computing, and software and I like to think that I have a broad range of skills and...
