Zero Defects – Defining Excellence Post WannaCry
I started out my professional career on a manufacturing scholarship in a FTSE 100 tobacco firm in the early 1990s. Gallaher was a leader in a wave of innovation which ultimately saw the automation of cigarette production. The process replaced floors where thousands (or tens of thousands) of workers manually rolled cigarettes with a mechanised world where machines humming in the background spat out a thousand cigarettes a minute. The impact on the business was profound: at the time, compensation-related costs accounted for just 6% of turnover.
The experiences at Gallaher eventually led me on a journey through Japanese manufacturing principles, then Fordism and mass production. Several underlying engineering themes underpin all of these, including the use of automation to drive productivity gains, one-minute changeovers, iterative refinement to drive platform evolution, and aiming for "zero" defects.
From my perspective, analogies can be drawn between the mechanisation of the cigarette business (or manufacturing as a whole) and the wave of evolution now crashing through the Microsoft platform and IT in general. Cybersecurity is the latest driver in an accelerating process, not least because each patch cycle signposts the vulnerabilities in previous releases: a hacker's dream.
The WannaCry incident a few months ago highlighted the plight of IT departments across the world. A lot of big names got hit, but considering that the majority of smaller incidents were swept under the carpet, and that the virus was blackholed quickly enough to prevent a critical mass, it could be inferred that the global Microsoft install base narrowly missed catastrophe.
Speaking to CIOs, CTOs and other IT practitioners post WannaCry has left me with the impression that the entire sector has been wrong-footed and pretty much everyone is scrambling to catch up.
Most concerning (to me at least) is that the vast majority of firms have yet to adopt a zero defects approach to desktop patching, undoubtedly the primary attack surface.
In conversation with IT professionals over the last few months, it seems most firms haven't even figured out what the goal is when it comes to desktop patching, let alone set their sights on 100% compliance. Senior technologists in some of the leading outsourcers and MSPs seem most advanced in their thinking, with clear targets of 95%, 98.5% or 99% coverage. Even here there is still a lot of confusion, as each IT management team ponders which specific updates to apply: a true legacy of an era when firms had whole teams dedicated to deciphering and beta testing patches, often one update at a time, to assess the impact on critical systems, both internal and vendor software.
The overriding question here is whether the sector has the luxury of methodically analysing, directing and beta testing updates one at a time in a new world where each release is eagerly awaited by the cyber criminal. The reality, of course, is that the Microsoft Windows ecosystem is a rich tapestry of vendor products, and patch cycles break things. Many professionals rightly question the wisdom of applying major Microsoft patch releases before major vendors assess the impact on their platforms. Citrix is a great example, where a number of recent Microsoft releases have triggered near-catastrophic issues with key components, including VDA agents.
All this leaves many IT managers and CIOs feeling like they've been dealt an impossible hand. Predatory cyber criminals are poised to pounce the moment each new vulnerability is announced, so IT departments need to move fast. Yet moving fast introduces all sorts of risks that in reality can do even more damage than the cyber criminal. A key point should not be overlooked: if a firm's IT platform blows up in a global cyber attack that impacts thousands of firms, the CIO/CTO has an easy out – everyone else got caught too. Meanwhile, if overzealous patching brings down internal systems, her neck is on the block. Layer on top of this panicked executive teams and engineering organisations deeply engaged in glacial legacy practices and you have the perfect storm. It's impossible to move. If you don't move you're dead; if you do move you're dead; and to make it worse, all eyes are on you.
Unsurprisingly, I'm aware of an increasing number of senior IT managers who've made the sensible move and got out of the sector.
What is the solution?
Coming from a process engineering background, my view is somewhat simplistic. In my mind, for the desktop in particular, the optimal strategy (if not the only viable strategy) is to aggressively pursue a policy of complete major Microsoft patch compliance, ideally targeting 100% coverage before the start of the next patch cycle (or zero deviation). And no cherry-picking. Yes, things will break from time to time, but it has to be said that Microsoft's QA process recently seems way tighter than it used to be, and we have only seen a handful of issues since we adopted the approach.
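To make the target concrete, the zero-deviation check above can be sketched as a simple coverage calculation over a device inventory. This is an illustrative sketch only: the machine names, dates and inventory format are hypothetical, not taken from any real estate.

```python
# Illustrative sketch: measuring patch-compliance coverage against a
# 100%-before-the-next-cycle target. All machine names and dates below
# are hypothetical examples.
from datetime import date

# Hypothetical inventory: machine name -> date of last cumulative update applied
inventory = {
    "TRADER-PC-014": date(2017, 7, 11),
    "CONF-ROOM-3": date(2017, 5, 9),    # the forgotten conference-room PC
    "SALES-LAPTOP-7": date(2017, 6, 13),
}

# Patch Tuesday of the current cycle (second Tuesday of the month)
current_cycle = date(2017, 7, 11)

# A machine is compliant if it has taken the current cycle's update
compliant = {m for m, patched in inventory.items() if patched >= current_cycle}
coverage = 100.0 * len(compliant) / len(inventory)

print(f"Coverage: {coverage:.1f}%")
for machine in sorted(set(inventory) - compliant):
    print(f"Non-compliant: {machine}")
```

The point of tracking the non-compliant set explicitly, rather than just the percentage, is that a zero-deviation policy makes every straggler an actionable item instead of a rounding error in a 98.5% figure.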
CIOs/CTOs in other MSPs we spoke to when we were formulating this objective thought it was madness. Any 100% baseline in desktop land is tricky to begin with; there is always a salesperson's laptop or conference room PC that someone has forgotten about.
Beyond that, while we don't run the biggest desktop estate in the world, it is very complex, with over 100 customers and 250 separate offices spread across 25 countries. Many of the customers have neither the budget nor the inclination for a major hardware refresh, so there is some very exotic kit, and pretty much all of them are power Excel users with lots of proprietary vendor or in-house software systems (including plug-ins). Many of the power users are plugged into the markets 24 hours a day through the week, so have an innate resistance to change windows or reboots, with the quants and other true power users online over weekends. Oh, and did I mention that traders have extreme intolerance to their systems being down in market hours (one for a separate blog post)? All in all, an IT manager's worst nightmare.
The retort of course is that picking clear, simple and measurable objectives to pursue is a cornerstone of great engineering, forcing the process engineer to challenge all the assumptions.