Digital Business Operations: Fire Your CMDB

As outlined in a prior post, Digital Business operations require new thinking and new technologies. As Operations evolves to meet the needs of Digitalization, so too must the core systems of record for operations. When running a data center with physical assets, or even user assets such as laptops, printers, and desktops, the CMDB was a useful construct for understanding what you had; things were static, so the problem was more easily solved. In reality, almost no one had an accurate CMDB. Coverage most often hovered around 80%, based on the beliefs of staff, and was usually driven by a combination of automated and manual processes. The use cases for the CMDB are often tied to ITSM processes such as request, incident, problem, and change management. Good data capture recording asset and component ownership and configuration made these processes more robust, reliable, and accurate.

Discovery tools which crawled technologies or leveraged the network for data made discovery accurate, leading to a well-maintained CMDB. In my personal experience, I found the network approach to be great (I was a very early nLayers customer at Thomson Reuters), but it had challenges around packet capture and aggregation. Those remain challenges today with any packet-based collection, and they are significantly worse in public cloud environments, which were not a concern last decade.

When virtualization entered the fold, the number of workloads increased and became more dynamic, but aside from an increase in scale this was not a major problem for the existing systems to handle. As applications evolved, configuration moved out of the application server configuration, for example database connections, connection pools, message queues, Memcache systems, and other components. Those are now defined in the runtimes, and discovery tools increasingly struggle to collect configuration information. Most enterprises have dozens of configuration management tools and automation stacks, ranging from legacy solutions provided by BMC, HP, IBM, CA, and others to newer open source tools such as Chef, Puppet, Ansible, Salt, and more. Today these teams are looking at orchestrating infrastructure and creating new layers; examples include the open source project Terraform. The reason for the fragmentation in configuration management is that evolving applications and their associated infrastructure still rely on older applications running on classical infrastructure (e.g. VMware, TIBCO, and mainframe). Over time the stack becomes complicated and costly. None of the new players in this space seem to support legacy technologies, and the legacy vendor solutions make it cumbersome to support modern architectures. The result is that we have a mess and no path forward to reduce debt.
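To make the shift concrete, here is a minimal sketch (the service and environment variable names are hypothetical, not from any specific product) of configuration that now lives only in the running process rather than in an application server's configuration files, which is why crawling the host or the image no longer reveals the real dependencies:

```python
import os

# Hypothetical service: its downstream dependencies are injected at deploy
# time via environment variables (or a secrets manager), not stored in any
# configuration file a discovery agent could crawl on the host.
DATABASE_URL = os.environ.get("DATABASE_URL", "postgres://localhost:5432/orders")
CACHE_HOSTS = os.environ.get("CACHE_HOSTS", "memcache-1:11211,memcache-2:11211").split(",")
QUEUE_URL = os.environ.get("QUEUE_URL", "amqp://broker:5672/")

def describe_dependencies():
    """Return the dependencies this instance actually uses right now.

    These values only exist inside the running process; a scan of the
    deployed artifact or the filesystem sees none of them.
    """
    return {
        "database": DATABASE_URL,
        "caches": CACHE_HOSTS,
        "queue": QUEUE_URL,
    }

if __name__ == "__main__":
    print(describe_dependencies())
```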

In today’s world, with a high degree of self-service (public and private cloud, including SaaS), containers, and orchestration, these discovery tools do not work. The processes that make up an ITSM strategy, often underpinned by ITIL (more on this later), no longer function in an efficient, highly automated system. Finally, building dependency maps and graphs via a centralized repository is no longer possible or feasible. Adding technology support to a non-functional process does not fix the problem. For example, CMDB discovery tools which add Docker support or attempt to handle Kubernetes or Swarm are missing the point and lack the capability to collect data from ephemeral systems. The technology is not the only thing which has changed; so have the desired business outcomes. The net result is that agility is paramount, implemented by cross-functional product engineering teams. That shift requires a culture change within IT, and yesterday's solutions do not support these initiatives.
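A minimal sketch of the ephemerality problem, assuming the official `kubernetes` Python client and access to a cluster: snapshot the pod inventory twice and diff it. Even a short interval apart, pods created and destroyed by the orchestrator make any point-in-time "discovery" record stale by the time it lands in a repository.

```python
import time
from kubernetes import client, config

def pod_snapshot():
    """Return {pod UID: pod name} for every pod currently in the cluster."""
    config.load_kube_config()  # or config.load_incluster_config() inside a pod
    v1 = client.CoreV1Api()
    return {p.metadata.uid: p.metadata.name
            for p in v1.list_pod_for_all_namespaces().items}

before = pod_snapshot()
time.sleep(30)
after = pod_snapshot()

created = set(after) - set(before)
deleted = set(before) - set(after)
print(f"pods created in 30s: {len(created)}, pods deleted: {len(deleted)}")
```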

The business demands rapid innovation via incremental but continuous improvement, which results in a high frequency of change across infrastructure (physical and logical), applications, and environments. Discovering and controlling these systems is a challenge from a security and audit perspective, but also from a service assurance perspective. Decentralized IT organizations driven by the business need to move quickly and experiment, and that mandate of innovation often contradicts the methodologies of centralized IT organizations. Technology which relies upon access to systems that are often outside IT's area of control is an ever-growing challenge. These issues require us to shift data collection from an approach of crawling and cataloging data toward instrumentation (scraping web services is fine if your data is lightweight, but depth is typically not captured in these exposed API endpoints). Instrumentation provides a more accurate understanding of dependencies and user experience, gives us a dynamic way to understand relationships between physical and logical components, and allows us to create new use cases for this data to solve some of the problems the CMDB was designed to address. The next generation of CMDB will be dynamically modeled. I have seen sprinklings of this in a couple of startups, but they are missing the right type of instrumentation to provide real-time capabilities. Events, logs, and metrics are not enough, and ingesting topological models is not sufficient; this must be instrumentation across the application and the network.
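As a rough illustration of what "dynamically modeled" could mean, here is a minimal sketch that derives service-to-service dependencies from instrumentation (trace spans) rather than from a crawled inventory. The span fields and service names are illustrative and not tied to any particular tracer or vendor.

```python
from collections import defaultdict

# Illustrative spans: each span records which service emitted it and which
# span (and therefore which service) called it.
spans = [
    {"id": "a1", "parent": None, "service": "web-frontend"},
    {"id": "b2", "parent": "a1", "service": "orders-api"},
    {"id": "c3", "parent": "b2", "service": "postgres"},
    {"id": "d4", "parent": "b2", "service": "payments-api"},
]

def dependency_graph(spans):
    """Return {caller service: set of callee services} from parent/child spans."""
    by_id = {s["id"]: s for s in spans}
    graph = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent"])
        if parent and parent["service"] != span["service"]:
            graph[parent["service"]].add(span["service"])
    return graph

for caller, callees in dependency_graph(spans).items():
    print(caller, "->", ", ".join(sorted(callees)))
```

Because the graph is rebuilt from whatever the instrumentation reports right now, it reflects relationships as they exist at runtime instead of as they were last cataloged.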

My next post on this subject will focus on ITSM, which today is disconnected from both customer support and development. I am a strong believer in fixing these issues the right way, with teamwork, and I’m looking forward to sharing my thoughts on this topic.
