Friday, February 20, 2009

Lots of updates, projects, running and ducking

Sorry for the lack of posting; I have been travelling and working a lot trying to get a bunch of projects out the door. We have about 6 weeks left of craziness until there is a break before some other larger projects. At least most of the infrastructure and underpinnings for those will be done by then, which should make things less crazy.

We are close to moving our marketing site over to Tridion (beta launch today, release next Friday), so marketing can deal with the content independently of the code and products. Tridion is a crappy product to deal with: it breaks a lot, and it's hard to keep the publishing system working. The upside is that it's very flexible. As far as CMSes go it's very overpriced, the support is sub-par (compared to higher-end tools I have dealt with), and it's not something I would recommend to most companies.

I am headed to Geneva in 1.5 weeks to finish up some migrations we are doing: essentially moving some servers and integrating some products together.

Our Netapp is still crashing weekly due to a bug with the 10G card. Netapp is having a hard time debugging the core files we have provided. I'm pretty surprised this has taken 6 weeks, and I've escalated to my sales folks to hopefully get them moving.

I'm trying to sell a bunch of surplus gear that we pulled from our old datacenter, which we moved out of on Sunday. Bunch of servers and other random hardware. Curious to see what the lot will get.

Things are good; we have a lot of projects to deliver, but we are progressing, learning, and advancing toward the new product launch. I hope all of my readers are doing well and that you enjoy the weekend.

Wednesday, February 4, 2009

Netapp Issues

We’ve had a few Netapp issues since we went live. One of them was that we were shipped the wrong card, so we had to reconfigure the nodes with the proper 10GE fiber cards right before we went live. We also weren’t aware of the cluster requirement that you have to set the partner IP on each interface. Last night we rebooted both nodes and did a failover test after adding the partner IPs to the multiple interfaces we are using. The failover worked great, and everything is good.
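
For anyone doing the same thing, the gist of it (7-mode console commands; the addresses below are placeholders and the interface names are just ours, so adjust for your own setup) is to add the partner option to each interface in /etc/rc and then exercise the failover from the console:

    ifconfig e0b 10.10.5.x netmask 255.255.255.0 partner e0b
    ifconfig e2a 10.10.5.y netmask 255.255.255.0 partner e2a
    cf status      # confirm the cluster pair is enabled and healthy
    cf takeover    # force a takeover to test failover
    cf giveback    # return resources to the partner node once you're satisfied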

Every Sunday we have also been seeing a degrading performance issue on one of the filers: it starts out with some packet loss over the LAN and cascades until the filer either has to be rebooted or stops responding to ping. This affects the 1G interfaces as well as the 10G interfaces. This filer is serving NFS for VMware, as well as CIFS for standard file serving to a farm of webservers.

I’ve had a case open with Netapp, but the response time of the engineers has been lackluster, which is surprising since we have one outage per week on this node. I just noticed yesterday that we were seeing errors on one of the filer's 10G interfaces (only one of them), but after the reboot there were none. The switch wasn’t seeing any errors, only the filer.
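
For reference, the counters I'm watching come from the standard console commands (nothing exotic; e2a is just our 10G port, so substitute your own interface names):

    netstat -i        # per-interface packet and error counters
    ifstat e2a        # detailed stats for a single interface
    ifstat -z e2a     # zero the counters so new errors stand out after a reboot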

Data has been changed below (Network and Address):
Name  Mtu   Network     Address     Ipkts  Ierrs  Opkts  Oerrs  Collis  Queue
e0a*  1500  none        none            0      0      0      0       0      0
e0b   1500  10.10.5/24  *SNIP*         1m      0     1m      0       0      0
e0c*  1500  none        none            0      0      0      0       0      0
e0d   1500  10.10.3/24  *SNIP*        25m      0    16m      0       0      0
e2a   1500  10.10.5/24  *SNIP*        38m     2m     3m      0       0      0
e2b   1500  10.10.2/24  *SNIP*        65k      0    18m      0       0      0
lo    8160  127         localhost     30k      0    30k      0       0      0