
Realtime market data systems

I don't really talk about this much, but we deliver a lot of real-time market data to thousands of customers. Part of my responsibility is running a group that monitors the real-time environment, which involves unusual things like multicast and other requirements that most standard products don't handle. A good example is tracking exchange holidays and the open and close times for each of the 200+ exchanges whose feeds we bring into our global POPs.
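To make the exchange-hours problem concrete, here is a minimal sketch of a "quiet feed" check that only alerts while an exchange should be trading. Everything in it is an assumption for illustration: the `ExchangeCalendar` class, the `should_alert` function, the 30-second threshold, and the sample exchange are hypothetical and are not taken from our actual tooling. It also uses a fixed UTC offset for simplicity, so it ignores daylight saving time; a real calendar would need proper time zone handling.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time, timedelta, timezone


@dataclass(frozen=True)
class ExchangeCalendar:
    """Hypothetical per-exchange trading calendar (illustrative values only)."""
    name: str
    utc_offset: timedelta          # fixed offset; ignores DST (simplification)
    open_time: time                # local opening time
    close_time: time               # local closing time
    holidays: frozenset = field(default_factory=frozenset)  # local dates

    def is_open(self, now_utc: datetime) -> bool:
        """Return True if the exchange should be trading at now_utc."""
        local = now_utc.astimezone(timezone(self.utc_offset))
        if local.weekday() >= 5:               # Saturday or Sunday
            return False
        if local.date() in self.holidays:      # exchange holiday
            return False
        return self.open_time <= local.time() < self.close_time


def should_alert(calendar: ExchangeCalendar,
                 now_utc: datetime,
                 seconds_since_last_tick: float,
                 max_quiet_seconds: float = 30.0) -> bool:
    """Alert on a quiet feed only while the exchange is expected to be open."""
    if not calendar.is_open(now_utc):
        return False                           # a closed market is quiet by design
    return seconds_since_last_tick > max_quiet_seconds


# Made-up exchange: UTC-5, open 09:30 to 16:00 local, closed on 25 December.
example = ExchangeCalendar(
    name="EXAMPLE",
    utc_offset=timedelta(hours=-5),
    open_time=time(9, 30),
    close_time=time(16, 0),
    holidays=frozenset({date(2024, 12, 25)}),
)
```

With this shape, a feed that has been silent for 45 seconds on a normal trading morning raises an alert, while the same silence on a holiday, a weekend, or after the close does not. Multiply that by 200+ exchanges, each with its own calendar, and it becomes clear why off-the-shelf products struggle here.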

The infrastructure sees more changes from development than anything else at the company. Figuring out the proper "state" is next to impossible, which makes monitoring a challenge, to say the least. There is a custom tool, and we are working with the developers to give it a web services interface so we can better understand state before presenting false alarms to the real-time operators.

We are cleaning up the rules by pushing responsibility for writing proper rules onto the developers. This should fix things as we audit the existing 75,000 rules in the custom monitoring tool. Going forward, rules are required with each software release.

A side effect of all these custom, old, homebuilt, somewhat crappy tools is that we need to extract non-standard metrics and use them for capacity analysis. Aside from the standard system and network capacity planning, there also has to be software capacity planning. The team generates a monthly report, which is very large and complex and takes anywhere from 60 to 100 man-hours to create. I want to automate more of the report, or build some kind of self-service reporting or BI portal. These are initial thoughts, but it's something we need to start discussing.

Have a good weekend, please leave comments!

