
alwaysOn

Things I probably should have known...


What I learned building an agentic ant colony.

Photo by Jorge Coromina on Unsplash
Over the past week or so I have been working on coding an agent ant colony that restores service to a running application. The agents don't do complex RCA, log tickets, interact with engineers, etc. They just keep the application up and running. It was definitely fun, and I learned some interesting and useful things. What I built: A simple Python-based web ordering application with a front end supported by two micro-services each with

Agents: How do we know they work?

Photo by Dean Pugh on Unsplash
Agentic platforms are everywhere and we are pushing forward to use more and more AI-driven software. As Site Reliability Engineers we need to really think about what it means to run diverse agent platforms at scale. We need to think about what needs to be in place to make them manageable. How do we know right now that our agents are working? What do we need to see in the logs to troubleshoot when they don't? What data needs to be gathered a

Framing the Problem

Photo by Gaspar Uhas on Unsplash
The key to solving a problem is understanding what that problem is and being able to ask the right questions to expose a solution. Following through on that fundamental assumption, it's time to step back and take a quick review of what we, the AI Enabled SRE community, are talking about when we discuss AIOps. When we look at IT Operations through the lens of how we implement AIOps we deal wi

Asking the right question

Photo by Camylla Battani on Unsplash
"If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." Albert Einstein
I've been thinking about Bas Pluim's comment from my post the other day. It isn't a new thought but it is a really important one. As I look back on my career, it is clear how much time we (as an industry) spend solving the wrong problem and Bas's comment about not just achieving a goal but taking

The first rule of Agent Driven AIOps

Photo by Raffaele Parente on Unsplash
There are lots of rules for success when it comes to Agent Driven AIOps. Allowing non-deterministic software to take action in your critical environments is high risk. But it's also high reward. So, how do we manage this risk? There are a lot of layers to the answer to this question but let's start with something that is obvious, but is harder than it looks. Software running in a complex system is affected by circumstances external t

It's really a search problem...

Photo by ün LIU on Unsplash
Or more specifically it's a knowledge synthesis problem. Building AI agents or agent teams for AIOps systems isn't hard. Even if you build them from scratch, a bit of Python and an API key gets you a piece of non-deterministic software that can legitimately do some cool things. It can even do some smart things. The trick is to get it to do consistently useful things. This is much harder. Lots of things contribute to this from choosing the ri

Ants as Agents

Photo by Christian Holzinger on Unsplash
Today I learned a new word. Stigmergy. It's a good word. Stigmergy is a form of indirect communication where agents coordinate their actions by modifying their shared environment, leaving traces (like pheromones or digital markers) that influence the subsequent behavior of other agents, creating complex, self-organized systems without central control. I learned this word when I was reading about ants. I was reading about ants b

Getting AIOps Right

Photo by Immo Wegmann on Unsplash
Several years ago I wrote an article about there no longer being a role for support teams who are just...

What Oatmeal Taught me about Software

Photo by Andrea Tummons on Unsplash
Recently I had a life-changing moment. I was having lunch with a friend and he told me that he...

We need to talk about Agile...

Photo by Trnava University on Unsplash
I know a lot of companies and teams do an excellent job of Agile development at Enterprise scale....

There is no more "keeping the lights on"

Several years ago I came across a presentation by an Architect at IBM named Simon Grieg. It was a presentation that I found when I was...

Why Cloud Projects Fail

I talk to teams on a daily basis that explain to me that their solution is different. The normal rules don't apply to them because they...

The Search for Broken Feedback Loops

Digital transformation is only partly about technology and in many very real ways technology is the easy part. The real crux of...

Time to Think

Last week I went to the office for the first time in more than 2 years. I had to wear shoes. I lived through it though, and coming out...

The power of testing your understanding.

I've been waging war against technical debt for the better part of two decades. I've watched good teams drown slowly beneath waves of...

Episode One: Why Mainframes?...

I have spent the last 20 years absorbed by computers, computing, and software and I like to think that I have a broad range of skills and...


©2020 by alwaysOn. Proudly created with Wix.com
