Digital Business Operations: Fire Your CMDB

As outlined in a prior post, Digital Business operations require new thinking and new technologies. As Operations evolves to meet the needs of Digitalization, so too must the core systems of record for operations. When running a data center with physical assets, or even user assets such as laptops, printers, and desktops, the CMDB was a useful construct for understanding what you had. Things were static, and thus the problem was more easily solved. In reality, almost no one had an accurate CMDB. Coverage most often hovered around 80%, a figure based on the beliefs of staff, and the data was often maintained by a combination of automated and manual processes. The use cases for the CMDB are often tied to ITSM processes such as request, incident, problem, and change management. Good data capture, recording asset and component ownership and configuration, made these processes more robust, reliable, and accurate.

Discovery tools which crawled technologies or leveraged the network made data discovery accurate, leading to a well-maintained CMDB. In my personal experience, I found the network approach to be great (I was a very early nLayers customer at Thomson Reuters), but it had challenges around packet capture and aggregation. These remain challenges today with any packet-based collection, and they are made significantly worse in public cloud environments, which were not a factor last decade.

When virtualization entered the fold, the number of workloads increased and became more dynamic, but it wasn’t a major problem for the existing systems to handle this change, aside from an increase in scale. As applications evolved, configuration moved out of the application server configuration. Database connections, connection pools, message queues, Memcache systems, and other components have transitioned to being defined in the runtimes, and discovery tools increasingly have issues collecting configuration information. Most enterprises have dozens of configuration management tools and automation stacks. These range from legacy solutions provided by BMC, HP, IBM, CA, and others to new-ish open source tools such as Chef, Puppet, Ansible, Salt, and more. Today these teams are looking at orchestrating infrastructure and creating new layers, for example with the open source project Terraform. The reason for the fragmentation in configuration management is that applications and their associated infrastructure keep evolving while still depending on older applications running on classical infrastructure (ex: VMware, Tibco, and Mainframe). Over time the stack becomes complicated and costly. None of the new players in this space seem to support legacy technologies, and the legacy vendor solutions make it cumbersome to support modern architectures. The result is that we have a mess, and no path forward to reduce debt.

In today’s world, with a high degree of self-service (public and private cloud including SaaS), containers, and orchestration, these discovery tools do not work. The processes which make up an ITSM strategy, often underpinned by ITIL (more on this later), no longer function in an efficient and highly automated system. Finally, building dependency maps and graphs is no longer possible or feasible via a centralized repository. Adding technology support to a non-functional process would not fix the problem. For example, CMDB discovery tools which add Docker support or attempt to handle Kubernetes or Swarm are missing the point and lack the capability to collect data from ephemeral systems. The technology is not the only thing which has changed; the desired business outcomes have changed as well. The net result is that agility is paramount, delivered by cross-functional product engineering teams. That shift requires a culture change within IT, and yesterday's solutions do not support these initiatives.
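To make the point about ephemeral systems concrete, here is a minimal sketch in Python. It assumes a hypothetical discovery tool that scans at a fixed interval and can only catalog a workload that is running at the moment of a scan. The function name, the scan interval, and the workload lifetimes are illustrative assumptions, not taken from any product.

```python
import math


def seen_by_periodic_scan(lifetimes, scan_interval):
    """Return, per workload, whether any scan lands inside its lifetime.

    Assumption: scans fire at t = 0, scan_interval, 2 * scan_interval, ...
    and a workload is cataloged only if it is running at a scan instant.
    lifetimes is a list of (start, end) times in seconds, end exclusive.
    """
    results = []
    for start, end in lifetimes:
        # Earliest scan instant that is not before the workload started.
        first_scan = math.ceil(start / scan_interval) * scan_interval
        results.append(first_scan < end)
    return results


# Hypothetical workloads: one long-lived VM and two short-lived containers.
workloads = [(0, 86_400), (10, 40), (3_500, 3_700)]
print(seen_by_periodic_scan(workloads, scan_interval=3_600))
# prints [True, False, True]
```

With an hourly scan, the 30-second container is never recorded, and the third workload is caught only by luck of timing. Shortening the interval narrows the gap but never closes it.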

The business demands rapid innovation via incremental but continuous improvement, which results in a high frequency of change across infrastructure (physical and logical), applications, and environments. Discovering and controlling these systems is a challenge from a security and audit perspective, and also from a service assurance perspective. Decentralized IT organizations, driven by the business, need to move quickly and experiment, and the mandate of innovation often contradicts the methodologies of centralized IT organizations. Technology which relies upon access to systems that are often outside of IT's area of control is an ever-growing challenge. These issues require us to shift data collection from an approach of crawling and cataloging data towards instrumentation (scraping web services is fine if your data is lightweight, but typically depth is not captured in these exposed API endpoints). Instrumentation provides a more accurate understanding of dependencies and user experience, offers a dynamic way to understand relationships between physical and logical components, and allows us to create new use cases for this data to solve some of the problems the CMDB was designed to address. The next generation of CMDB will be dynamically modeled. I have seen sprinklings of this in a couple of startups, but they are missing the right type of instrumentation to provide real-time capabilities. Events, logs, and metrics are not enough, and ingesting topological models is not sufficient. This must be instrumentation across application and network.
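As a rough illustration of what "dynamically modeled" could mean, the sketch below derives a dependency map from call records emitted by instrumentation rather than from a crawled catalog. This is a minimal sketch under stated assumptions: the `CallRecord` shape, the field names, and the five-minute freshness window are all hypothetical, not the format of any particular APM or tracing product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One observed call, as instrumentation might report it (hypothetical shape)."""
    caller: str        # logical service that made the call
    callee: str        # logical service that received the call
    callee_host: str   # physical or virtual host (or container) serving the callee
    timestamp: float   # seconds since some epoch


def build_dependency_map(records, now, max_age_seconds=300.0):
    """Build service dependencies and placement from recent observations only.

    Anything not observed within max_age_seconds drops out of the model,
    so the map follows the environment instead of a stored inventory.
    """
    depends_on = defaultdict(set)
    runs_on = defaultdict(set)
    for record in records:
        if now - record.timestamp > max_age_seconds:
            continue  # stale observation, the component may no longer exist
        depends_on[record.caller].add(record.callee)
        runs_on[record.callee].add(record.callee_host)
    return dict(depends_on), dict(runs_on)


# Hypothetical observations; the last one is too old to count.
records = [
    CallRecord("web", "checkout", "container-a1", 990.0),
    CallRecord("checkout", "orders-db", "db-host-1", 995.0),
    CallRecord("web", "legacy-cart", "vm-7", 100.0),
]
depends_on, runs_on = build_dependency_map(records, now=1000.0)
print(depends_on)
print(runs_on)
```

The design choice worth noting is that relationships between logical services and the hosts serving them are computed on demand from what was actually observed, so nothing has to be registered or reconciled by hand.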

My next post on this subject will focus on ITSM, which today is disconnected from both customer support and development. I am a strong believer in fixing these issues the right way with teamwork, and I’m looking forward to sharing my thoughts on this topic.
