Skip to main content

Netapp Issues

We’ve had a few Netapp issues since we went live. One of them was that we were shipped the wrong card, and so we had to reconfigure the nodes with the proper 10GE fiber cards right before we went live. We weren’t aware of the cluster requirements that you had to set the Partner IP. Last night we rebooted both nodes and did a failover test after adding in the partner IPs into the multiple interfaces we are using. The failover worked great, and everything is good.

Every Sunday we have also been seeing a degrading performance issue on one of the filers, it starts out by some pack loss over the LAN, and cascades down to the filer eventually either being rebooted or going down to ping. This effects the 1g interfaces and the 10g interfaces as well. This filer is serving NFS for vmware, as well as CIFS for standard fileserving to a farm of webservers.

I’ve had a case open with Netapp, but the response time of the engineers has been lackluster, which is surprising since we have 1 outage per week on this node. I just noticed yesterday that we were seeing errors on the filer 10g interface (only 1 of them) but after the reboot there were none. The switch wasn’t seeing any errors, only the filer.

Data has been changed below (Network and Address):
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis Queue
e0a* 1500 none none 0 0 0 0 0 0
e0b 1500 10.10.5/24 *SNIP* 1m 0 1m 0 0 0
e0c* 1500 none none 0 0 0 0 0 0
e0d 1500 10.10.3/24 *SNIP* 25m 0 16m 0 0 0
e2a 1500 10.10.5/24 *SNIP* 38m 2m 3m 0 0 0
e2b 1500 10.10.2/24 *SNIP* 65k 0 18m 0 0 0
lo 8160 127 localhost 30k 0 30k 0 0 0

Comments

Popular posts from this blog

Misunderstanding "Open Tracing" for the Enterprise

When first hearing of the OpenTracing project in 2016 there was excitement, finally an open standard for tracing. First, what is a trace? A trace is following a transaction from different services to build an end to end picture. The latency of each transaction segment is captured to determine which is slow, or causing performance issues. The trace may also include metadata such as metrics and logs, more on that later. Great, so if this is open this will solve all interoperability issues we have, and allow me to use multiple APM and tracing tools at once? It will help avoid vendor or project lock-in, unlock cloud services which are opaque or invisible? Nope! Why not? Today there are so many different implementations of tracing providing end to end transaction monitoring, and the reason why is that each project or vendor has different capabilities and use cases for the traces. Most tool users don't need to know the implementation details, but when manually instrumenting wi

NPM is Broken

As someone who bought and implemented NPM solutions, covered them as an analyst, and now watches the industry, one cannot help but notice that NPM(D) is broken. According to Gartner themselves, the data center is rapidly changing, the data center is going away, m aybe not as quickly as Capp states, but it’s happening. This is apparent by the massive public cloud growth posted by Amazon, Microsoft, and Google in their infrastructure businesses. This means that traditional appliance-based NPMD offerings will not work, nor will traditional ways of collecting packet data. Many of the flow offerings do not handle the new types of flows which these services generate, but most importantly they do not understand the internet, which is the most important part of assuring services in cloud hosted environments. The network itself is not just moving to overlay a-la NSX and ACI, it's moving inside of orchestrated containers, and new proxy/load balancing systems typically built off component

F5 Persistence and my 6 week battle with support

We've been having issues with persistence on our F5's since we launched our new product. We have tried many different ways of trying to get our clients to stick on a server. Of course the first step was using a standard cookie persistence which the F5 was injecting. All of our products which use SSL is being terminated on the F5, which makes cookie work fine even for SSL traffic. After we started seeing clients going to many servers, we figured it would be safe to use a JSESSIONID cookie which is a standard Java application server cookie that is always unique per session. We implemented the following Irule (slightly modified in order to get more logging): http://devcentral.f5.com/Default.aspx?tabid=53&view=topic&postid=1171255 (registration is free) when HTTP_REQUEST { # Check if there is a JSESSIONID cookie if {[HTTP::cookie "JSESSIONID"] ne ""}{ # Persist off of the cookie value with a timeout of 2 hours (7200 seconds) p