- Historical data management: Every monitoring tool must handle historical data; it is required to build reports and dashboards and to understand trends.
- Streaming data management: Most monitoring systems use a combination of batching and streaming to collect, analyze, and send monitoring data. Streaming data collection and analysis is a requirement for most APM and NPMD solutions.
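The batching-plus-streaming pattern mentioned above can be sketched in a few lines. This is a minimal, hypothetical collector (the class and parameter names are mine, not from any particular product) that buffers streaming data points and flushes a batch downstream when either a size limit or a time interval is reached:

```python
import time

class BatchingCollector:
    """Buffers streaming data points and flushes them in batches,
    either when the batch is full or when a flush interval elapses."""

    def __init__(self, max_batch=100, flush_interval=10.0, sink=None):
        self.max_batch = max_batch          # flush when buffer reaches this size
        self.flush_interval = flush_interval  # ...or when this many seconds pass
        self.sink = sink or (lambda batch: None)  # downstream sender (hypothetical)
        self.buffer = []
        self.last_flush = time.monotonic()

    def collect(self, point):
        self.buffer.append(point)
        if (len(self.buffer) >= self.max_batch
                or time.monotonic() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()
```

Real agents add backpressure, retries, and compression on top of this, but the size-or-time flush trigger is the core of most batching collectors.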
- Log data ingestion: Due to the uptake of solutions built on top of the Elasticsearch project, log analytics has become a feature of most monitoring tools. By my count, there have been at least 10 new products introduced this year that collect and analyze logs.
- Wire data ingestion: Although packet data is a useful data source, it is hard to scale in public or private clouds. Nonetheless, many products use packet data; in today's environments it is often captured on each OS instance and analyzed in a distributed fashion, rather than via taps/spans, which come with their own challenges and limitations. Interestingly enough, Gartner mentions only taps/spans, not distributed collection and analytics.
- Metric data ingestion: Every monitoring tool must collect time-series metrics.
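To make "time-series metrics" concrete, here is a minimal sketch of what a metric data point and a toy in-memory store might look like. The shapes here (name, value, timestamp, tags) mirror the common time-series model, but the class names are illustrative, not from any specific tool:

```python
from dataclasses import dataclass, field
import time

@dataclass
class MetricPoint:
    """One sample of a time-series metric: a named value at a timestamp,
    with optional tags (dimensions) for slicing."""
    name: str
    value: float
    timestamp: float = field(default_factory=time.time)
    tags: dict = field(default_factory=dict)

class MetricStore:
    """Toy in-memory store keyed by metric name; real systems persist,
    downsample, and index by tags as well."""
    def __init__(self):
        self.series = {}

    def ingest(self, point: MetricPoint):
        self.series.setdefault(point.name, []).append(point)

    def query(self, name, since=0.0):
        return [p for p in self.series.get(name, []) if p.timestamp >= since]
```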
- Document text ingestion: This is less well understood in the market, even though many of the log analytics tools do an excellent job of ingesting human-readable documents and applying natural language processing to that data. Many of these advances are becoming table stakes as the Elasticsearch platform builds more advanced capabilities into its core, and those capabilities continue to improve.
- Automated pattern discovery and prediction: Just two years ago, very few monitoring tools had automated pattern discovery, but today many solutions have capabilities that reduce the human effort involved in problem detection. The challenge in this category is that many data sources lose their relationships or structure once ingested; many tools can infer those relationships from the data, but do so with a high number of false-positive correlations.
- Anomaly detection: Similar to the last category, this has been around for a long time but has become more common, since manually managing thresholds is neither scalable nor sustainable. Some products on the market have had this capability for a decade or longer.
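The contrast with manually managed thresholds is easy to illustrate. A minimal sketch of threshold-free anomaly detection (one common approach among many, not how any particular vendor implements it) is a rolling z-score: flag a value when it deviates too far from the recent sliding-window mean:

```python
import math
from collections import deque

class RollingAnomalyDetector:
    """Flags values more than `threshold` standard deviations from the
    mean of a sliding window -- no manually managed static limits."""

    def __init__(self, window=30, threshold=3.0):
        self.window = deque(maxlen=window)  # recent history
        self.threshold = threshold          # z-score cutoff

    def observe(self, value):
        anomalous = False
        if len(self.window) >= 5:  # need some history before judging
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = math.sqrt(var)
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous
```

Production systems layer seasonality handling and smarter baselines on top, but the key point stands: the baseline adapts to the data instead of being hand-tuned.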
- Root cause determination: This is probably the area in this model where the most innovation is occurring. Very few products on the market do this effectively. Most offerings use rule-based expert systems, which have many flaws when they encounter new problems they were not programmed to handle. If this problem were solved well, we wouldn't see massive multi-hour outages occurring regularly. This area will evolve quickly over the next two years.
- On-premises delivery: This one is fascinating, as almost all software is becoming SaaS-delivered. Gartner is even now saying that by 2025, 80% of enterprises will migrate to and then shut down their traditional data centers, versus 10% today. As those data centers close, SaaS delivery becomes the de facto method of purchasing the software used to manage applications and the infrastructure running in public or managed clouds.
Great, so if this is open, it will solve all of our interoperability issues and allow me to use multiple APM and tracing tools at once? It will help avoid vendor or project lock-in and unlock cloud services that are opaque or invisible? Nope! Why not?
Today there are many different tracing implementations providing end-to-end transaction monitoring, because each project or vendor has different capabilities and use cases for the traces. Most tool users don't need to know the implementation details, but when manually instrumenting with an API, t…