Platform Engineering is easy, right? You just automate Kubernetes cluster creation, deploy Backstage to trigger it, set up ELK for monitoring – and that’s it, right? Well, that is about as true as saying “building Business Applications is easy – you just need to create a Dockerfile with a Spring Boot application in it”. If you look only from a technical perspective, maybe both sentences are correct. But it’s like describing how to build a car with “oh, you just need to bend the sheet metal appropriately and insert the pistons” without even asking… where will this car be used? In the desert? In an F1 race?
Let me put it plainly: Platform Engineering is not a technical initiative focused on automating tools. Platform Engineering is a method of organising core IT capabilities to work as a service – and you won’t achieve that just by “automating OpenShift”.
But the good news is that building a Platform is no different from building any other Software Solution. An Internal Developer Platform also serves business functionality to business users, supporting the business processes in the Company. In this case, our users are Application Teams, our business process is the SDLC, and our functionalities are the core IT capabilities needed to run and operate business apps. And just like any other Software Delivery, we should start an Internal Developer Platform initiative with… functional & system analysis.
Internal Developer Platform Analysis Framework
Before we even start the analysis, let’s quickly clarify why we practice Platform Engineering.
The goal is a frictionless, self-service developer experience that offers the right capabilities to enable developers and others to produce valuable software with as little overhead as possible. The platform should increase developer productivity, along with reducing the cognitive load.
Source: Gartner
And… what is cognitive load? In short, it is everything developers have to do in order to deliver business value that is not related to that value itself. Cognitive Load can be managing clusters, creating documentation, or fulfilling regulation-required tasks such as reporting software to a validation authority. Cognitive Load is definitely not writing Business Logic in an Application and then performing Business Operations in Production. That is what we pay developers to do, and it should be their primary focus. We do not pay developers to fight or play with technology.
Our primary goal as a Platform Team is to reduce the Cognitive Load of our Application Teams across the software delivery lifecycle and to boost their productivity. We do this by taking over technology-heavy, repeatable core-IT tasks such as network connectivity integration, managing runtime, managing the CI/CD toolset, managing observability, security, cloud services etc. Instead of leaving these to the Teams, we lower the entry barrier and provide core-IT functionality as a Service. The tricky part is that what counts as Cognitive Load in one Company may not count in another. So we need to detect Cognitive Load first, before we even start with the Platform. Then we need to decide how the Platform & Platform Team should work (designing the Operating Model & Services). Only then should we select technology that matches the requirements and does not trigger a revolution… causing more Cognitive Load.
How to assess Cognitive Load?
The easiest place to start is… asking developers what they struggle with. The few issues that keep coming up across the Teams you interview are probably causing 80% of the trouble. Implementing just 20% of those initial ideas in your Internal Developer Platform may be enough to solve it and score a quick win. Sometimes you don’t even need a fancy portal…
OK, I’ve mentioned interviews. This is one technique, but it can be time-consuming. What I recommend is combining SDLC process mapping with self-assessments, interviews &… a review of current practices and technology. Why? Because you don’t want your Platform to cause a revolution and a lot of migrations, which would be… another cognitive load for the developers. You can find a nice self-assessment here, from Mark Hornbeek.
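To make the 80/20 idea concrete, here is a minimal Python sketch. It assumes you have already written interview answers down as short pain-point labels. The interview data, the labels and the `top_pain_points` helper are all hypothetical. This is a plain Pareto tally, not the assessment method from the course. It counts how many Teams mention each pain point and keeps the most frequent ones until they cover about 80% of all mentions.

```python
from collections import Counter

# Hypothetical interview notes: each inner list holds the pain points one Team mentioned.
interviews = [
    ["waiting for network connectivity", "setting up CI/CD", "managing secrets"],
    ["setting up CI/CD", "waiting for network connectivity"],
    ["waiting for network connectivity", "writing compliance reports"],
    ["setting up CI/CD", "managing secrets", "waiting for network connectivity"],
    ["configuring dashboards"],
]


def top_pain_points(interviews, coverage=0.8):
    """Return the most-mentioned pain points that together cover `coverage` of all mentions."""
    # dict.fromkeys de-duplicates within one Team while keeping a stable order.
    counts = Counter(point for team in interviews for point in dict.fromkeys(team))
    total = sum(counts.values())
    selected, covered = [], 0
    for point, n in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(point)
        covered += n
    return selected, counts


selected, counts = top_pain_points(interviews)
for point in selected:
    print(f"{counts[point]} Teams: {point}")
```

With this made-up data, three of the five pain points account for 9 of the 11 mentions. That is where a first Platform iteration could focus.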
My Efficient Platform Manager Course, Module 3, Lesson 2 covers this in detail. It reviews which techniques to use for analysing processes, practices & technology, and how to use them, with supporting materials. It also explains how to approach the analysis in a large enterprise and what the result of such an analysis should be for building an Internal Developer Platform.
What is a Platform Operating Model, what are Services, why do we need them – and how do we design them?
First, let’s quickly agree on the terms used in this article. An Operating Model in IT is a broad term. It describes how we approach the SDLC in our Company in terms of governance, sourcing, processes & organisational structure. In short, an IT Operating Model defines how IT Teams organise their daily work. We can define one for a Platform Team as well. A Platform Operating Model defines:
- what the Platform Team is responsible for;
- how the team is structured;
- which architecture components & tasks fall under its responsibility or accountability;
- which Services the Platform Team provides to others.

Services have well-defined inputs and outputs, a strictly defined execution time and end products, and a clear definition of how and by whom their fulfilment can be triggered.
Platform Operating Model Design
Platform Operating Model design should follow from the Cognitive Load discovery, especially the overall maturity of the Stream-aligned Teams and the technologies they already use. Just as each Company’s business processes differ from its competitors’, Platform Operating Models will differ too:
- A startup building AI-based apps needs one kind of model. Teams may require their own clusters & wider permissions to CI/CD and Observability Tools, just to validate whether model training is heading in the right direction. In this case, the Platform Team will only provide cluster templates & runtime;
- A large enterprise doing classic application delivery needs a different model. Teams may not want to touch runtime, CI/CD, observability tools & secret management, because these are already standardised and fit most applications. In this case, the Platform Team will be responsible for everything except writing business application code, testing it & operating it in production.
We can describe the Platform Operating Model with a RACI Matrix. It lists all SDLC-related tasks & jobs to be performed and all Teams (or types of Teams) the Platform plans to serve. This RACI Matrix should be the analytical input for designing the permission model and implementing RBAC in the DevSecOps Tools managed by the Platform.
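One way to turn a RACI Matrix into RBAC input is sketched below in Python. Everything in it is an assumption for illustration: the task names, the two team types and the `derive_rbac` helper. The mapping from RACI letters to permission levels (`PERMISSION_BY_ROLE`) is the biggest assumption, since every organisation will set it differently and real DevSecOps tools express it in their own role models. The sketch also enforces the usual RACI convention of exactly one Accountable party per task.

```python
# Hypothetical RACI Matrix: SDLC task -> {team type: RACI letters}.
raci = {
    "write and test business code": {"stream-aligned": "RA", "platform": "I"},
    "maintain CI/CD pipeline templates": {"stream-aligned": "C", "platform": "RA"},
    "deploy to production": {"stream-aligned": "R", "platform": "A"},
    "manage secrets store": {"stream-aligned": "I", "platform": "RA"},
}

# Assumed mapping from RACI role to a generic permission level (not any real tool's roles).
PERMISSION_BY_ROLE = {"A": "owner", "R": "write", "C": "read", "I": "read"}
RANK = ["read", "write", "owner"]  # weakest to strongest


def validate(raci):
    """Raise if any task does not have exactly one Accountable team."""
    for task, roles in raci.items():
        accountable = [team for team, letters in roles.items() if "A" in letters]
        if len(accountable) != 1:
            raise ValueError(f"{task!r} needs exactly one Accountable team, found {accountable}")


def derive_rbac(raci):
    """Give each team the strongest permission implied by its letters for every task."""
    validate(raci)
    grants = {}
    for task, roles in raci.items():
        for team, letters in roles.items():
            level = max((PERMISSION_BY_ROLE[letter] for letter in letters), key=RANK.index)
            grants.setdefault(team, {})[task] = level
    return grants


for team, tasks in derive_rbac(raci).items():
    print(team, tasks)
```

With this matrix, Stream-aligned Teams get `write` on production deployments. The Platform Team is Accountable for that task, so it gets `owner`.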
Materials for Operating Model design, including a ready-to-fill RACI Matrix for an Internal Developer Platform, can be found in my Efficient Platform Manager Course, Module 3, Lesson 3.
Services Design
Platform Services should directly address the Teams’ Cognitive Load and match the planned Platform responsibilities. In short, each Service should define:
- Input – easy for Developers to provide, using information they already have & in a language they understand (for example, as a Platform Engineer, don’t ask Developers for the IP address of the system they want to connect to; simply ask for the name of the system / API and look it up yourself)
- Output – a well-defined result of the Service fulfilment (for example, if the Platform Service is New Application Onboarding, the result will include three things: a link to the GitHub repository the Developer can access, a link to a demo application exposed in each environment (ideally up to production), and instructions on how to trigger a deployment)
- SLA/SLO – well-designed Services should have metrics & KPIs to verify whether they work correctly and deliver the expected value. Execution Time is one of them, but the Platform Team should also track metrics related to the business applications it serves, since the goal of Platform Engineering is to boost the productivity of the Stream-aligned Teams.
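The three elements of a Service definition can be captured in a small data structure. The Python sketch below is hypothetical. The `PlatformService` class and its fields are illustrative, and so are the one-day / 95% targets for New Application Onboarding; they are not recommendations. The sketch only checks execution time against an SLO. The business-application metrics mentioned above would need data from outside the Platform.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class PlatformService:
    name: str
    inputs: list[str]       # what Developers provide, in their own language
    outputs: list[str]      # what they get back when the Service is fulfilled
    target_time: timedelta  # SLO: maximum execution time per request
    slo_ratio: float        # share of requests that must finish within target_time

    def slo_met(self, execution_times: list[timedelta]) -> bool:
        """Check whether enough requests finished within the target execution time."""
        if not execution_times:
            return True  # no requests yet, so nothing has been violated
        within = sum(1 for t in execution_times if t <= self.target_time)
        return within / len(execution_times) >= self.slo_ratio


# Hypothetical definition of the onboarding Service described above.
onboarding = PlatformService(
    name="New Application Onboarding",
    inputs=["application name", "owning Team", "systems / APIs to connect to"],
    outputs=["GitHub repository link", "demo application link per environment",
             "deployment instructions"],
    target_time=timedelta(days=1),
    slo_ratio=0.95,
)

recent = [timedelta(hours=h) for h in (2, 5, 20, 30)]
print(onboarding.slo_met(recent))
```

Here only three of the four recent requests finished within a day (75%), so the sketch reports that the 95% objective was missed.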
Service definitions should be public. You can use the Team API Template from Team Topologies.
Examples of Services, with a template for designing them & SLA metrics for measuring Platform success, can be found in my Efficient Platform Manager Course, Module 3, Lesson 4.
How to select Technology?
Revolution or Evolution? Although I personally prefer evolution, there is no golden rule for introducing Platform Tech. Every decision you make about Platform Technology should pass three filters: the results of the Cognitive Load discovery, fit with the Platform Operating Model & Services, and the technology portfolio you currently have in the organisation. The level of automation and SDLC coverage with the toolset should also be driven by the Platform business case. A large enterprise may need almost full coverage and automation, while a company that already has DevSecOps tools and few stream-aligned Teams may need just simple orchestration. Sometimes you may even decide that no Platform is required at all, just DevSecOps Engineers in the cross-functional Teams, or SRE Engineers supporting business-critical applications.
The build vs buy dilemma will also appear here, along with another one: Portal first vs Platform first. Again, there is no single rule. Technology selection should be based on a mix of:
- analysis outcomes;
- Platform Strategy;
- the number of stream-aligned Teams;
- the current technology portfolio;
- the technical possibilities for implementing Services and RBAC in each selected Platform Tool.
I have my own “algorithm” for the build vs buy dilemma for Platform Tools, and I’ve seen first-hand how Platform Services & the Operating Model shape Platform Architecture. I’ve put all of this in my Efficient Platform Manager Course, Module 3, Lesson 5.
Do I really need all of that…? Platform Pillars & what happens if you skip them.
First, let’s start with the pillars of a successful Platform. I see three elements that are necessary before we can even speak about an Internal Developer Platform:
- You need a Platform Team, so developers have one place, with one responsible party, that provides everything they need to build & operate business applications;
- You need an Operating Model, so everyone knows who is responsible for what, which Company Standards the Platform supports, and which Services to expect from the Platform;
- You need SDLC coverage with automated tools. Automation is necessary to make configuration orchestration across the tools repeatable & fast. Otherwise, we risk human errors and bottlenecks that lead to shadow DevOps in stream-aligned Teams, which becomes their Cognitive Load.
Here is a video with a detailed explanation of what happens if we skip any of those parts. It is also a fragment of my “Efficient Platform Manager” Course, Chapter 3, Lesson 1. The course is designed to help you design & manage Internal Developer Platforms. It provides techniques & frameworks to analyse cognitive load, plan the Operating Model & Services, and build a strong business case. It also helps you avoid the mistakes I’ve made as a Platform Manager for a few Internal Developer Platforms since 2020.