Thursday, May 7, 2015
John, we couldn’t agree more with this letter; the specific issue is context. Too many of the tools and systems in use, both commercial and open source, are metric or event (log) collectors, providing dashboards with little context about what is happening. To provide proper context and operational visibility, one must understand the relationships and data flows between metrics and events.
This well-written letter makes many points we completely agree with at AppDynamics. Words like "predictive" and "fixing issues automatically" are not things we prescribe, and Gartner has also long condoned the use of "predictive" in ITOA scenarios ("IT Operations Analytics Technology Requires Planning and Training," Will Cappelli, December 2012). The area where we disagree is having early warning indicators of problems that are escalating. If technology is employed that collects end-user experience from the browser, and that performance is baselined by geography, then degradation across the user community is often an early warning indicator that something is behaving abnormally. We have customers who have seen a vast reduction in complete outages (P1 issues) and an increase in degraded-service issues (P2 issues). This is evidence that the use of AppDynamics can in fact reduce the number of outages by providing early warning indicators.

We have other evidence showing that legacy enterprise monitoring tools are far too slow; this is a coupling of older technology with organizational or process issues, which prevents alerts from getting into the right hands in a timely manner. For example, in an enterprise with siloed teams and tools, a storage contention bottleneck on a particular array would often be seen by the storage team, but the lack of application operations visibility and timely escalation would result in service issues. This can, of course, be solved by fixing organizational issues, but that is a challenge at scale.
Monday, May 4, 2015
Sourcing strategies are evolving rapidly. CIOs have to handle this shift, where innovation isn't coming from the usual vendors. There is a clear movement toward those who can facilitate the most critical agendas to enable digital business. New organizational structures are required that allow for a culture of innovation and experimentation. In order to innovate, teams that typically consist of larger, slower-moving units must transform into small, startup-type or project-based units.
AppDynamics CEO Jyoti Bansal often speaks about his methodology for creating a startup within a startup. Many new product initiatives begin with a small team of two to four engineers and product managers tasked with building a minimum viable product. In this model, startup teams are augmented with small acquisitions that come with talent. This new talent provides new perspectives, skills, and core intellectual property, allowing the acquirer to meet customer needs quickly. This is similar to the model employed at Google, Facebook, Yahoo, or Twitter, allowing for incubation and experimentation with new, unorthodox ideas. At Google, some of these new ideas become winners, like Gmail and Google Apps for business, while many fail. Some of these new products take several iterations before they can be productized. Google Talk is one example that was allowed to "sunset" but was reborn as Google Hangouts, which has seen rampant adoption in its second iteration with the integration of other services. Google Hangouts on Android has over 500 million installations.
This is the new pace of innovation, something CIOs must now replicate in their own operations. A Gartner headline says, "Digital Business Economy is Resulting in Every Business Unit Becoming a Technology Startup." What this means is that if you do not evolve your organization, thinking, and sourcing, you will be unable to compete. A furious strategy of large acquisitions results in non-integrated technologies that fail to compete at a time when technology is a major advantage in digital transformations.
Within IT operations management, an area I've studied and tracked for over two decades, we saw many large acquisitions over the years from HP, IBM, BMC, and CA. These acquisitions provided growth and breadth by adding new capabilities that were slotted into a portfolio. That growth strategy, though once effective, ended up creating too much complexity. The complexity required services, which in turn led to challenges in staying current.
This technology debt and lag do not meet the needs of today's technology buyers. The large acquisition strategy has slowly fallen off; meanwhile, these big players have had major challenges making the transition to organic innovation. Gartner predicts that "By 2017, at least two of the 'big four' IT operations management vendors will cease to be viable strategic partners for 60 percent of I&O organizations." We've already seen this play out with the privatization of yesterday's innovators and ongoing execution issues with others.
New, innovative upstarts in ITOM run at different speeds, allowing them to create, build, and provide the value customers want without the heavy services, consulting, and integration burdens. These new vendors have highly differentiated strategies and approaches that keep them innovative and give them a competitive advantage.
The question remains for CIOs: of these innovative vendors, which will be able to continue down the innovation path as they grow from medium-sized companies into large ones? As the revenues of these vendors continue to increase substantially, innovation at scale remains an open question. But at AppDynamics, we're focused on the culture and strategy that will let us retain our innovation as we build the next-generation ITOM technology for software-defined businesses.
A decade ago, when I first learned of Apdex, it was thanks to a wonderful technology partner, Coradiant. At the time, I was running IT operations and web operations, and brought Coradiant into the fold. Coradiant was ahead of its time, providing end-user experience monitoring capabilities via packet analysis. The network-based approach was effective in a day when the web was less rich. Coradiant was one of the first companies to embed Apdex in its products.
As my colleague Jim Hirschauer pointed out in a 2013 blog post, the Apdex index is calculated from counts of satisfied and tolerating requests: the number of satisfied samples, plus half the tolerating samples, divided by the total number of samples. The definition of a user being "satisfied" or "tolerating" involves a lot more than just performance, but the applied use cases for Apdex are unfortunately focused on performance only. Performance is still a critical criterion, but the definition of satisfied or tolerating is situational.
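As a sketch, the standard Apdex formula looks like this (the helper for classifying raw response times against a threshold T is my own illustration; Apdex defines "satisfied" as at or under T and "tolerating" as between T and 4T):

```python
def apdex(satisfied: int, tolerating: int, total: int) -> float:
    # Apdex: satisfied samples count fully, tolerating samples count half,
    # frustrated samples count zero.
    if total == 0:
        raise ValueError("no samples")
    return (satisfied + tolerating / 2) / total

def apdex_from_times(response_times, threshold):
    # Standard Apdex classification: satisfied <= T, tolerating <= 4T,
    # everything slower is frustrated.
    satisfied = sum(1 for t in response_times if t <= threshold)
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return apdex(satisfied, tolerating, len(response_times))
```

With a 1-second threshold, the samples `[0.5, 1.5, 5.0, 0.8]` score (2 + 1/2) / 4 = 0.625, collapsing two satisfied users, one tolerating user, and one frustrated user into a single number.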
I'm currently writing this from 28,000 feet above northern Florida, over barely usable in-flight internet, which makes me wish I had a 56k modem. I am tolerating the latency and bandwidth, but not the $32 I paid for this horrible experience; but hey, at least Twitter and email work. I self-classify as an "un-tolerating" user, but I am happy with some connectivity. People who know me will tell you I have a bandwidth and network problem; hence, my threshold for a tolerable network connection is abnormal. My Apdex score would be far different from the average user's due to my personal perspective, just as a business user's would differ from a consumer's, based on the specific situation in which they use an application. Other criteria that affect satisfaction include the type of device in use and that device's connection type.
The thing that is missing from Apdex is the notion of a service level. There are two ways to manage service level agreements. First, a service level may be calculated, as we do at AppDynamics with our baselines. Second, it may be a static threshold the customer expects; we support this use case in our analytics product. Together, these two ways of calculating an SLA cover the right ways to measure and score performance.
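A minimal sketch of the two approaches, with hypothetical function names (the static check is a plain comparison; the baseline check here flags responses more than n standard deviations above the historical mean, one simple way to approximate a calculated baseline):

```python
import statistics

def violates_static_sla(response_ms, sla_ms):
    # Static SLA: a fixed threshold the customer expects.
    return response_ms > sla_ms

def violates_baseline_sla(response_ms, history_ms, n_deviations=2.0):
    # Calculated SLA: flag anything more than n standard deviations
    # above the historical mean for this metric.
    mean = statistics.mean(history_ms)
    stdev = statistics.stdev(history_ms)
    return response_ms > mean + n_deviations * stdev
```

The baseline variant needs no manual tuning per metric, which is why it scales to every metric collected; the static variant captures a contractual expectation that a baseline cannot know about.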
This is AppDynamics’ Transaction Analytics Breakdown for users who had errors or poor user experience over the last week, and their SLA class:
Simple SLAs are in the core APM product. Here is a view of requests that fell below the calculated baseline, indicating which of them were in SLA violation.
Combining an SLA with Apdex results in a meaningful number. Unfortunately, I cannot take credit for this idea. Alain Cohen, one of the brightest minds in performance analysis, was the co-founder and CTO (almost co-CEO) of OPNET. Alain discussed with me his ideas for a new performance index concept called OpDex, which fixes many of the Apdex flaws by applying an SLA. Unfortunately, Alain is no longer solving performance problems for customers; he's decided to take his skills and talents elsewhere after a nice payout.
Alain shared his OpDex plan with me in 2011; thankfully, all of the details are outlined in this patent, which was granted in 2013. OPNET's great run of innovation has ended, however, and Riverbed has failed to pick up where it left off, but at least they have patents to show for these good ideas and concepts.
The other issue with Apdex is that the formula ignores the distribution of users. CoScale outlined this issue in a detailed blog post, explaining that histograms are a far better way to analyze a variant population. This is no different from looking at performance metrics coming from the infrastructure layer, where histograms and heat charts tend to provide much better visual analysis.
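The idea can be sketched simply: instead of collapsing response times into one score, bucket them so the shape of the distribution stays visible (the function name is mine, not CoScale's):

```python
from collections import Counter

def latency_histogram(response_times_ms, bucket_ms=100):
    # Key each sample by its bucket's lower edge. A bimodal population,
    # which a single Apdex score would average away, shows up as two peaks.
    buckets = Counter(int(t // bucket_ms) * bucket_ms for t in response_times_ms)
    return dict(sorted(buckets.items()))
```

For example, `latency_histogram([50, 120, 130, 450], 100)` yields `{0: 1, 100: 2, 400: 1}`, preserving the outlier that a single averaged score would hide.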
AppDynamics employs automated baselines for every metric collected and measures deviations out of the box. We also support static SLA thresholds as needed. Visually, AppDynamics has a lot of options, including viewing data in histograms, looking at percentiles, and providing an advanced analytics platform for whatever use cases our users come up with. We believe these are valid alternatives to relying extensively on Apdex, which has its own set of downsides.
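Percentiles can be sketched the same way; here is a nearest-rank p95, for example (illustrative code, not the AppDynamics implementation):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest value at or below which
    # at least p percent of the sorted samples fall.
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]
```

Unlike an average or a single Apdex score, a p95 or p99 latency tells you directly what your worst-served users are experiencing.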