How to build useful DevOps Platforms?

DevOps is complex. It does not come with a prescription, installation guide, user’s manual, or maintenance manual. It doesn’t even have a standard definition! How can anyone expect to get good results with DevOps without clear and definitive guidance?

Marc Hornbeek, “Engineering DevOps: From Chaos to Continuous Improvement… and Beyond”

DevOps! DevOps everywhere! My goodness, we are doomed. OK, I’m kidding. Or not? Maybe I should be worried instead, because if you ask “what actually is DevOps?”, you will get many different answers. What worries me more is that many people in IT believe that implementing DevOps is about deploying a bunch of tools like Kubernetes, Jenkins, ElasticStack and Prometheus, and that’s it. Well… it’s not. Integrated DevOps tools are just one part of a DevOps Platform. It is not enough to deploy them and wait for the results. Even more: a Platform can do more harm than good if you don’t focus on the culture, surroundings and actual needs of Delivery Teams. Why? Well, let’s first establish what a DevOps Platform is, especially as it has been skyrocketing in search trends for two years.

“DevOps Platform” search trend in Google.

What is a DevOps Platform?

Kubernetes. No, kidding. KIDDING! And yet, I’ve met many engineers and managers who think like that. So let’s be clear on the definition: a Platform is not only the technology, but also the methods by which you use it and the people who make it useful and usable. In short, we can talk about three pillars of a DevOps Platform, and technology is only one of them.

DevOps Platform Components

A good platform provides standards, templates, APIs, and well-proven best practices for Dev teams to use to innovate rapidly and effectively. A good platform should make it easy for Dev teams to do the right things in the right way for the organization (…) This drive to “simplify the developers’ life” (as Conway Law puts it) and reduce cognitive load is an essential aspect of a good platform (…) A good platform will also serve to reduce the need for security and audit teams to spend time with the Dev teams.

Matthew Skelton, Manuel Pais, “Team Topologies”

Now it is time to cover what DevOps Platforms can do. And they can do A LOT, probably much more than you actually need (like Excel: you probably use only 20% of its functionality). In short, a DevOps Platform is the groundwork on which Business Application Delivery Teams can design, build, deploy and run their applications without worrying much about how this groundwork works. So a well-designed Platform is not only technology but, most importantly, the Services provided by DevOps Engineers (the Platform Team) to remove the technology burden related to infrastructure, deployment methods, security and resources.

And this is not even the full potential of a DevOps Platform, so do not try to implement all of it step by step, but choose wisely what you actually need! Besides, I haven’t found a DevOps Platform Capability Map anywhere, so maybe I was the first one to try to draw it? Let me know if you find one!

As you can see, DevOps Platforms are quite a big topic, and now you probably sense why I told you that a Platform can do more harm than good if not addressed correctly. Let’s summarize why, and add a bit more:

  • First, DevOps tools have a significant technology entry barrier. We can think of them as “backends for backends”, so configuring those tools requires not only programming skills but also knowledge of infrastructure, security, containerization, deployment, and how logging & tracing work in backend applications: all topics related to enterprise software delivery! It’s not a skill you will possess after one weekend. What is even worse, the tools are changing rapidly. The ones we considered an innovation a few years ago (like Jenkins, Elasticsearch or Kubernetes) are now standards. Just look at the trends to see when DevOps Platforms started to be googled worldwide: not long ago! Many technologies are still in the R&D phase, and you won’t find DevOps Engineers with more than 10 years of experience to fix all the early-stage issues. Such people don’t exist yet, so you need to carefully select tools for well-recognised purposes.
  • Second, even if you create a bug-free DevOps Platform (not possible), you need to introduce it to your Business Application Delivery Teams. If you treat this purely technically (“look! we have created a Kubernetes Cluster with ArgoCD, now if you implement a CI pipeline in GitHub Actions, you will be able to push your applications to the environment!”), what you do is throw the full cognitive load of how the platform works and how to use it onto them. Besides technology, you also need to think about how you are going to serve your Customer. Yes, when you are a DevOps Platform Owner, the Business Application Delivery Team is your Customer! What you need to do is define services and an operating model which make it simple to use the power of your Platform without a lot of additional learning by programmers, because their focus should be on delivering Business Features. You don’t require a driver to know how the power steering gear works, only how to influence where the car is going by moving the steering wheel.
  • Third, if you don’t do the right discovery of needs first, you can end up with a large, complicated Platform combining many different technologies. And trust me, if you don’t think about the right scale and focus only on “we will implement all the options Platforms can provide”, you are going to end up with a never-ending delivery project. Just look at the size of the DevOps Platform Capabilities Map. It took me a few hours just to describe how wide this topic is.

First: do not start with the Platform! Define its purpose first!

When you deliver a Business Application, you don’t start by playing with Java. First, you need to understand what your Business wants to achieve with the new app: how it can simplify their lives or make more money (or both). In short, you start with the Purpose and Business Scope of the Product.

I suggest the same approach for a DevOps Platform. DevOps Platforms are designed to simplify Delivery Teams’ lives, so first you must discover what makes their lives difficult. There are many techniques to do it (I have described many of them in the Minimum Valuable Product article), but for gathering requirements and value delivery opportunities my personal favourite is Value Stream Mapping, enhanced by a technology & architecture review. It is a simple technique to map the delivery process as it really is, which helps to uncover the plain truth and to think about possible bottlenecks or remedies the Platform can introduce. The technique is well defined in the book “Value Stream Mapping” by Karen Martin and Mike Osterling, and enhanced with a technology and culture perspective in “Project to Product” by Mik Kersten.

Value Stream Mapping is a good technique to find bottlenecks & automation possibilities within the Delivery Process: the places where the DevOps Platform Team can help!
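
To make the idea of "finding bottlenecks" a bit more tangible, here is a minimal Python sketch of the arithmetic usually done on a value stream map: summing process time and lead time per step, computing the activity ratio (process time divided by lead time), and pointing at the step with the longest waiting time. The step names and hour values are invented for illustration only, and `Step` and `summarize` are hypothetical helper names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    process_time_h: float  # hands-on work time, in hours
    lead_time_h: float     # elapsed time including waiting, in hours


def summarize(steps):
    """Return basic value stream metrics for a list of steps."""
    total_pt = sum(s.process_time_h for s in steps)
    total_lt = sum(s.lead_time_h for s in steps)
    # The step with the largest gap between lead time and process time
    # is where work waits the longest.
    longest_wait = max(steps, key=lambda s: s.lead_time_h - s.process_time_h)
    return {
        "total_process_time_h": total_pt,
        "total_lead_time_h": total_lt,
        "activity_ratio_pct": round(100 * total_pt / total_lt, 1),
        "longest_wait": longest_wait.name,
    }


# Illustrative numbers only (assumed, not measured anywhere).
steps = [
    Step("Code review", 1, 8),
    Step("Manual unit test run", 2, 4),
    Step("Deployment approval", 0.5, 24),
    Step("Deploy to DEV", 1, 2),
]
summary = summarize(steps)
print(summary)
```

In this made-up example most of the elapsed time is waiting for an approval, which is an organizational topic rather than a tooling one. That is exactly the kind of finding a purely technical Platform would miss.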

With Value Stream Mapping, the DevOps Platform Owner can act as a Facilitator, but it is also a good idea to hire an external Consultant for that role, because Value Stream Mapping is not only a tool for discovering ideas for DevOps Platform functionality but also a good starting point for a DevOps Transformation Program. Value Stream Mapping is “hard to trick”, so you will discover not only technology- or practice-related challenges, but also the organizational culture and its true impact on how Delivery Teams cooperate.

In my practice, I also like to dig into architecture details & the tools used for delivery. I often ask about deployment pipeline & practice details, branching strategies, coordination of testing & deployment between closely coupled components, security tools and practices, and key stakeholders. It helps me understand the Value Stream better and very often find a potential DevOps solution for bottlenecks or ways to speed up delivery.

Remember: no matter which technique you use, you need to run it with as many Delivery Teams as possible, to catch the most common software delivery challenges in your organization. If you stick with only one or a few teams, you can be misled about the actual bottlenecks or showstoppers or, worse, build a Platform for only one Team, which may disrupt others.

The platform, therefore, needs a roadmap curated by product-management practitioners, possibly co-created but at least influenced by the needs of users (Dev teams).

Matthew Skelton, Manuel Pais, “Team Topologies”

Think about Operating Model, Scaling & Development Strategy

After Value Stream Mapping and discussions with your future Customers & Stakeholders (Delivery Teams and their Leaders), you will probably be able to estimate the scale of the Delivery Organization you are about to serve and which challenges you need to address first with your Platform. Now is a good time to think about how your Platform and your Team will interact with other Teams in your organization.

According to Team Topologies, the preferred interaction mode for Platforms is X-as-a-Service. It means a simple contract for the services your Team is delivering to others, with a well-structured, repeatable methodology. You can use the Team API pattern to define it (more about Team API here). But this is not enough: the Team API should drive the definition of an Operating Model, which contains not only the Services but also architecture boundaries, service level objectives, interaction modes (besides those defined as X-as-a-Service), and responsibilities and deliverables between Teams (not only the Platform one!). An Operating Model is not just “paper documentation”, because it impacts the permissions, roles & ownership around particular components and their configuration in the technical implementation of the Platform. If you are an experienced Engineer, you will be able to “guess” the operating model just by looking at repositories, pull requests, access levels and roles within the tools, and the structure of cluster namespaces.
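
One way to keep a Team API from staying on paper is to write it down as data that both people and tools can read. The Python sketch below is only an illustration under my own assumptions: the field names (`interaction_mode`, `slo`, `owner`), the service names and the SLO wording are all hypothetical, and Team Topologies does not prescribe this structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    name: str
    interaction_mode: str  # e.g. "x-as-a-service", "collaboration", "facilitating"
    slo: str               # agreed service level objective, in plain words
    owner: str             # team responsible for the service


@dataclass
class TeamAPI:
    team: str
    services: list = field(default_factory=list)

    def find(self, name):
        """Return the service with the given name, or None if not offered."""
        for service in self.services:
            if service.name == name:
                return service
        return None


# Hypothetical contract of a Platform Team (names and SLOs are assumptions).
platform_api = TeamAPI(
    team="platform-team",
    services=[
        Service("namespace-provisioning", "x-as-a-service",
                "new namespace within 1 working day", "platform-team"),
        Service("pipeline-onboarding", "collaboration",
                "kick-off within 1 week of request", "platform-team"),
    ],
)

requested = platform_api.find("namespace-provisioning")
print(requested.interaction_mode, "|", requested.slo)
print(platform_api.find("database-tuning"))  # not part of the contract
```

A request that is not in the list is a signal for a conversation about the Operating Model, not something to be solved silently on a Friday evening.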

The DevOps Platform Team should aim to provide well-defined services under an agreed SLA*/SLO for Business Application Delivery Teams, and the Operating Model is the first step in defining responsibilities and drawing the boundaries between Teams. This picture is inspired by the Team Topologies book, which I highly recommend reading!

Once you have covered the Operating Model, think about the Development Strategy and Scale of the Product. You can do it at the “Epic” level by prioritising which capabilities of the DevOps Platform need to be delivered rapidly for Delivery Teams (based on the Value Stream Mapping exercise) and which of the Services you are going to provide need immediate automation (based on the scale of your organization’s needs). Define at least a high-level roadmap of platform development for a time period you can predict (it depends on the scale of the initiative and the organization). Many Agile Practitioners may argue that planning a roadmap ahead is not the way to go, but at least a high-level idea of what you want to achieve with your Platform will help to prioritize its delivery and the setting up of services. And, most importantly, to size your Team, because hiring will be a challenge.

Hire the best people – and make them a Team

This is what happens when you google DevOps Engineer. Yes – everybody is looking for them now. It will be a challenge for you as well.

“DevOps Engineer” search trend in Google

Hiring a good DevOps Engineer is even more difficult, trust me. DevOps Engineering is a complex profession which requires knowledge of all aspects of Software Delivery. Personally, I like to work with DevOps Platform Teams that combine DevOps Engineers from different backgrounds: software development, cloud/infrastructure, and administration & operations. Such a combination is a good start for a great DevOps Platform Team.

This is how I see brilliant DevOps Engineers 🙂

The most difficult part of this hiring exercise will be to make those individuals a Team. So, what should you do? Hire the best Platform Owner and the best Scrum Master (or Platform Manager, if you will not deliver the Platform in Scrum). I don’t see any other options here. Trust me: it does not matter whether you hire DevOps Engineers directly or through outsourcing or body leasing. The problem of Team Formation will be the same, and a mature, responsible Team is a key success factor for delivering a DevOps Platform. DevOps Engineers know their value on the market, and what can keep them working on your Product is its innovation, its impact on the organization, and their well-being as Team Members. Innovation you can handle with technology and space for R&D. The rest only the Product Owner and Scrum Master pair can provide.

Define customer-centric DevOps Platform backlog

The platform team’s main clientele is the product teams.

Kenichi Shibata, “How to build a Platform Team Now!”

Like the backlogs of applications supporting Business Value Streams, Platform backlogs need to be Customer-oriented. Why? There are a few reasons:

The first reason, in my opinion the most important one, is that the structure of the backlog shapes the way the Platform Team thinks about the essence of their work. Let me explain it with an example User Story:

  • A story in Jira was called “Implement Grafana Dashboards”. The description of this story contained the sentence: “Using Terraform Scripts, implement dashboard in Grafana which presents CPU usage of Virtual Machines”. What is the first noun you see in this story? Grafana. What are the others? Dashboard, Script, Machine. So what is the essence of delivering this story? Architecture Components and the Technology used. For whom are we doing it, why, and how is it going to be useful? Not important.
  • How would I rephrase this story? “As Operator, I’d like to see CPU usage of machine my application is running in time intervals, so I can analyse long-term trends and proactively monitor my application”. What is the first noun you see in this story? Operator. My Customer! This story is also more challenging and exciting for the DevOps Engineer: our expert can now think about how to implement it, which technology to use, and how to design the Dashboard so it is useful for the Operator. We have given this DevOps Engineer the context of their delivery! On the technology level it will probably be a Terraform script with Grafana Dashboard configuration, deployable by some pipeline, so technically it will be the same as the first story. But on the mindset level (and for guests or stakeholders trying to understand the backlog when they look at it in our Jira), it will not be some nerdy idea about some magical Grafana, but a response to an actual need of the Delivery Teams.
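
If you want a cheap guard against technology-first stories creeping back into the backlog, the customer-oriented format can even be checked mechanically. The Python sketch below is only an illustration: the “As <role>, I’d like <need>, so <benefit>” template is my assumption about the story format used above, and `parse_story` is a hypothetical helper, not a Jira feature.

```python
import re

# Assumed template: "As <role>, I'd like <need>[, so <benefit>]".
STORY_PATTERN = re.compile(
    r"^As (?:an? )?(?P<role>[\w ]+?), I['’]d like (?P<need>.+?)"
    r"(?:, so (?P<benefit>.+))?$"
)


def parse_story(title):
    """Return role/need/benefit for a customer-oriented story, else None."""
    match = STORY_PATTERN.match(title.strip())
    return match.groupdict() if match else None


customer_story = (
    "As Operator, I'd like to see CPU usage of machine my application is "
    "running in time intervals, so I can analyse long-term trends and "
    "proactively monitor my application"
)
technology_story = "Implement Grafana Dashboards"

print(parse_story(customer_story)["role"])  # the Customer comes first
print(parse_story(technology_story))        # no Customer in sight
```

A check like this says nothing about whether the story is valuable, of course. It only reminds the author to name the Customer before naming the tool.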

Second reason: the possible capabilities of DevOps Platforms are enormous. If you take the DevOps Platform Capability Map and create stories out of it, you are going to end up with a Product to be released in 2-3 years. What should the Delivery Teams do in the meantime, wait? Fight with topics unrelated to business value delivery? Such a waste! There is a remedy for it, which is a customer-oriented backlog. You don’t need to implement ALL possible capabilities of the Platform; you need a platform which addresses your Delivery Teams’ real needs. The relation between the Value Stream Map (or any other technique you used for analysis) and your backlog will be easier to track when the stories refer not to technology but to the remedies you agreed on during your workshops with Delivery Teams. Simply put, you won’t over-engineer your Platform if the backlog always focuses on the Developers’ needs.

Third reason: if you create a customer-oriented backlog, you will most probably deliver the Platform faster. Why? A backlog oriented on business features has stories that are complete (or almost complete) from a functional perspective. You know exactly when you can deploy a useful backlog item: once it is completed! With a technology-oriented backlog, you can’t be sure that what you deploy will be fully usable for users. Let’s look at an example:

  • A story in Jira was called “Set-up Jenkins pipelines for using SonarQube”. The description of this story contained the sentence: “All pipelines in Jenkins need to be able to run unit tests, backed by a SonarQube test coverage check, within the Build Pipeline”. This story means that all tests, in all available pipelines, need to be part of a pipeline. Sounds cool, but this is a big story! In Jenkins, we also have platform pipelines. This story implies that the deployment of platform components also needs to be covered by the implementation. Of course all of the functionality mentioned is important, and at some point it needs to be implemented in order to call our Platform Mature. But does it need to be implemented NOW?
  • I would decompose this story into smaller ones, putting a customer need first. During Value Stream Mapping our programmers reported that they need to run unit tests manually to make sure that everything will go smoothly before they hit the deploy button in Jenkins. Let’s remove this burden from them by letting the pipeline do it every time they build the app! So one of the stories will be “As Programmer, I’d like to have unit tests automatically run for my application before deployment to DEV Environment”. Other stories will relate to other types of tests, and there will be stories related to the checks our DevOps Engineers like to run before deploying Platform Components. Such decomposed stories, oriented on an actual, high-priority need, can be deployed independently, and faster than the technology-oriented ones.

Technology-oriented backlog is a path to the Dark Side

Build the TVP and validate Platform Usefulness

In all cases, we should aim for thinnest viable platform (TVP) and avoid letting the platform dominate the discourse. As Allan Kelly says, “software developers love building platforms and, without significant product management input, will create a bigger platform than needed”. A TVP is a careful balance between keeping the platform small and ensuring that the platform is helping to accelerate and simplify software delivery for teams building on the platform.

Matthew Skelton, Manuel Pais, “Team Topologies”

You have a Team, a backlog, and an idea of the architecture and tools to be used. Now it is time to deliver! What can go wrong in this step? Planning, Release Frequency and Feedback Monitoring.

I have experienced some successful Platform Deliveries. One big organization started with a pre-configured product, which was then adjusted to the outcomes of the analysis done with Delivery Teams. The platform was big from the very beginning, but it was used by more than ten business domains, with a large number of engineers building solutions upon it. Another, much smaller organization decided to integrate a few deployment automation technologies and address the deployment bottleneck in its Delivery Teams first. These organizations had two things in common: both Platforms’ initial releases addressed the high-priority needs first, being Thinnest Viable from the very beginning, and both Platforms were delivered within a few months. Also, both organizations gathered feedback after the initial releases and planned the next steps based on real data.

Let’s start with Feedback Monitoring: Value Stream Mapping is a great tool not only for gathering requirements, but also for monitoring the outcomes. Some time after the initial release (three weeks, a month, two months, depending on how often Delivery Teams release or how heavy the operations they run are) it is worth repeating the VSM exercise, just to see whether the Platform Tools & Services actually improved the initial situation. It will also help to set the next priorities of Platform delivery. Yes, it takes some time to wait for the outcomes, and this is why Planning & Release Frequency are so crucial.
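
Comparing the two exercises can be as simple as putting the lead times side by side. Here is a small Python sketch of that comparison, with the same caveat as before: the step names and hours are invented, and `lead_time_change` is a hypothetical helper.

```python
def lead_time_change(before, after):
    """Return the percentage change in lead time per step (negative = faster)."""
    changes = {}
    for step, old in before.items():
        new = after.get(step)
        if new is None or old == 0:
            continue  # step was removed, not re-measured, or had no baseline
        changes[step] = round(100 * (new - old) / old, 1)
    return changes


# Lead times in hours from two workshops (illustrative values only).
before = {
    "Manual unit test run": 4.0,
    "Deployment approval": 24.0,
    "Deploy to DEV": 2.0,
}
after = {
    "Manual unit test run": 0.5,
    "Deployment approval": 24.0,
    "Deploy to DEV": 1.0,
}

changes = lead_time_change(before, after)
for step, pct in changes.items():
    print(f"{step}: {pct:+.1f}%")
```

In this made-up case the automated steps improved, while the approval step did not move at all. That is a good candidate for the next conversation with the Delivery Teams, and probably not one that another tool will fix.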

Even a well-shaped backlog of customer-oriented stories is worth nothing if the outcomes of the Team’s work are not released to customers. My suggestion is to make a first release with a small amount of functionality as fast as possible. You may fail at the beginning to improve the situation of the Delivery Teams, but it will still be better than a big release after a long wait. Frequent releases will also uncover some DevOps Culture topics in your organization, which is why we can say that DevOps Platforms can really boost the whole Transformation!

Frequent Releases and rapid feedback are not possible without the right Planning. It is not only a headache for the Platform Owner, who should be supported here by the DevOps Platform Team. The Platform Owner’s job is very difficult, because this person will not code much of the Platform and so will lose hands-on experience with DevOps Tools (and with DevOps tools changing so rapidly, a gap of just a few months is enough to risk missing some technology possibilities and limitations). This is why I’ve written a dedicated section on Team Formation: you really need the best people.

Boost Platform Maturity – and experiment!

Crucially, the evolution of the platform “product” is not simply driven by feature requests from Dev teams; instead, it is curated and carefully shaped to meet their needs in longer term.

Matthew Skelton, Manuel Pais, “Team Topologies”

Platforms are not short-term Sprints; they are long-run Marathons. As I see it, this long run is driven by three factors:

  • Uncovering the organization’s potential: While gathering feedback (using VSM or any other technique), at some point you will see that the Value Stream is not only improving but also changing in its structure. A DevOps Platform, combined with Agile Culture & DevOps Transformation, will uncover the organization’s full Software Delivery potential, which will bring new challenges and topics to be addressed by a more and more mature Platform. Your Platform architecture, Platform services and your team will evolve. A Platform is not something you introduce once. However, after some time you will notice more operations automation than new feature delivery, especially as the number of Delivery Teams using your Platform increases. Uncovering the organization’s potential will eventually lead to Platform scaling.
  • Scale: A successful Delivery Team which has increased its delivery potential thanks to Platform Services is going to attract other Teams to use the Platform. Be ready for this from the very beginning by measuring the efficiency and predictability of Platform Services in order to plan the automation. Scaling the Platform is not only about technology (with Cloud, this part should be easy) but mostly about the capacity of Platform Services. Even the most successful Platform can become a bottleneck if we don’t fulfill the demand from Delivery Teams.
  • Technology: As I’ve already mentioned many times, DevOps Tools are changing fast. Many of those changes bring some early-stage technology burden, but even more bring new possibilities for Platform features, Operating Models and Services. Terraform brought deployment simplification and almost endless possibilities with the infrastructure-as-code approach. GitHub introduced Actions for building pipelines without the need to configure a dedicated toolset. ElasticStack integrated application observability tools into one package. Even big platforms not originally designed as DevOps Platforms, like MuleSoft, are going in this direction to simplify deployment & operations. What you need is to stay up to date. This is why the DevOps Platform Team needs some dedicated time for technology R&D and migrations: to unlock more possibilities while your organization grows in the maturity and scale of its software delivery.

And speaking about R&D…

No! Try not! Do! Or do not. There is no try.

Yoda, “Star Wars Episode V: The Empire Strikes Back”

Experimentation is crucial in the long term for your Platform to be successful. Your Team needs dedicated time for it. However, experimentation also needs to be justified and have a clear goal. Playing with a tool just for fun (trying) is not enough; we need to know what the outcomes of those actions will be. You need to run an experiment for a reason: checking the entry barrier and possible simplifications a new AWS Service can bring compared to self-provisioning, validating options to increase deployment performance with a new CI/CD pipeline, checking the Active Directory integration options in OpenSearch compared to the currently used open-source version of Elasticsearch, etc. In short, R&D should answer questions, not just produce empty lines of code. In Team Topologies you can even find a dedicated Team type for such actions: the Enabling Team. Especially with a lot of demand from Delivery Teams, it is worth running such a dedicated Enabling Team for bigger R&D processes around DevOps tools and practices. Smaller topics I would still keep in the DevOps Platform Team.