Tuesday, December 15, 2009

Back from vacation and diving in head first!

I had an awesome cruise in the Caribbean with my wife for the last 10 days. I am feeling very good and back in the swing of things. I missed my great colleagues, house, and cats.

On my 1st day back we saw some odd packet loss around 6pm in our office. We ended up finding a couple of bad switches, and essentially rewired the whole office. It was a good time. We also put an order in for a couple of better Dell switches that support STP, management, and spanning. These features are something we really wanted, but we had these crappy web-managed Dell switches which cannot do much of anything. It was a fun 16 hour day yesterday, and thank you to Jamie for working so late with me to fix the problem. We still have 1 issue left to fix with the router -> firewall connection, but it should be done late tonight or tomorrow morning.

We have a software push tonight to production systems. It should go smoothly, but we keep running the risk of not letting the code settle long enough before we push it.

Also our VM environment is ever growing; supporting our enterprise product is really becoming a major drag on the infrastructure. We still have a lot of cleanup to do on the legacy environments, but the dates mostly keep getting pushed back. We are starting to be too risky on this side of things for my comfort.

We've been seeing major database growth as well, so we are doing an online volume expansion using SnapDrive and the iSCSI LUNs we host on the NetApps. We've done this before without issue, but there is always some risk involved.

Hope all the readers are doing well, and I will update soon with what's been going on.

Wednesday, November 25, 2009

Phone system

We elected to trial one office upgrade with a new phone system, on much different technology than my colleagues and I are used to. We decided to go with a Fonality system. The support seems great, the product is very advanced, and the price is VERY good. We are doing our first SIP turnup and phone system install in the Paris office this weekend. I hope it goes well. If this works out we'll be rolling out Fonality across the company over time. Munich is next on the list.

It's essentially a modified Asterisk system. Fonality also manages the development of Trixbox, which is the largest distribution of Asterisk on the market. Looking forward to seeing it in action!

Really amazing features, and I like the fact that we can use phones from almost any supported vendor (Cisco, Polycom, Grandstream, Aastra).

Fonality PBXtra - http://pbxtra.fonality.com/

Office 2010

I have been using Office 2010 for about 6 months now. It was a bit rough the last 4 months or so. I was also very sad that I didn't have my favorite add-ins such as Plaxo, Xobni, and others. I have been running them on a desktop I can remote into if I need them or need to sync my Outlook data via these extensions. I got the new build installed after having to mess with the uninstall process. It was really annoying but it worked out. The new beta build is very nice. It's more stable, and there are several items added which didn't exist previously. You can also see in Outlook how they are starting to integrate social media into email, something Xobni has been doing very well for quite a while. The Microsoft social media integration isn't complete or even really working, while the Xobni version is very advanced and works great.

I wish that Microsoft had built some kind of wrapper to allow the previous add-ins to work with Outlook, but I guess that's not going to happen.

Xobni - www.xobni.com

Plaxo - www.plaxo.com

Thursday, November 5, 2009

Windows 7 UAC articles

This is really upsetting me. I keep seeing this as I read my news tonight:

http://www.betanews.com/article/Sophos-study-suggests-Windows-7-UACs-default-setting-is-selfdefeating/1257455306

I was one of the only ones who seemed to think Vista was a good user interface and OS upgrade from XP. Of course it could have been more optimized and even better, which is what Windows 7 is. I also found the UAC feature in Vista to be very good, and similar to how those of us who use Unix are used to working: you su to root when you need to do something elevated, otherwise you operate at user level. The typical end user complained, "It keeps asking me to elevate so often, I don't understand what this means." On Windows 7, Microsoft decided to prompt for elevation only in certain cases (by default), so the inconvenience of the extra click, otherwise known as security, was essentially removed. This makes Windows 7 in its default setting much less secure than Vista.

Being a systems and infrastructure guy, we get the same Vista feature in Windows Server 2008 (based on Vista), and R2 (based on 7). They kept the same escalation we had in Vista enabled out of the box on both platforms. This is especially good for a server OS. I have been seeing some of the admins (not in my group, but DBAs) disable this feature, and I always implore them to turn it back on. I explain the reason it's there, and it will save them, either from doing something by accident, or by something running in their session they aren't aware of.
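For anyone who wants their own admin scripts to respect the same model, here is a minimal Python sketch (not anything Microsoft ships) that checks whether the current session is elevated before doing something risky. The helper names `is_elevated` and `require_elevation` are made up for illustration; the only platform calls assumed are `shell32.IsUserAnAdmin` on Windows and `os.geteuid` on Unix.

```python
import ctypes
import os
import sys


def is_elevated() -> bool:
    """Return True if the current process is running with elevated rights."""
    if sys.platform == "win32":
        # With UAC on, this is False for an admin account until the
        # process has actually been elevated.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    # On Unix-like systems, effective UID 0 means root (e.g. after su/sudo).
    return os.geteuid() == 0


def require_elevation(action: str) -> None:
    """Refuse to run a risky action unless the session was deliberately elevated."""
    if not is_elevated():
        raise PermissionError(
            f"{action!r} needs an elevated session (su / Run as administrator)"
        )
```

Calling `require_elevation("drop database")` at the top of a destructive script gives roughly the same safety net as the UAC prompt: nothing dangerous happens from a normal session by accident.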

Then you get other poorly designed software such as HP's QuickTest Professional which still cannot run with any level of UAC enabled. It takes 4 years to make your application work with UAC? Really?

So basically, user feedback prompted Microsoft to reduce the nags (otherwise known as security), and now the press and AV vendors are touting that Windows is less secure? Seems like a catch-22 for Microsoft: they want to sell operating systems, but they also need to placate people like me who would like a secure OS. I understand they are shipping the servers hardened, and the clients less so, but is that a good idea? I think my mom will thank them :)

We did it

We finally launched the new platform. It's been pretty difficult both pre and post launch supporting the business, developers, and QA folks, and doing perf testing. Lots of stuff to fix, and I'm really looking forward to the cleanup part. It's always fun to recover space and processing power which is not needed on the new platform. Things are progressing well.

http://www.techcrunch.com/2009/11/02/mfg-com-takes-off-the-cuffs-with-manufacturing-marketplace-redesign/

We have to move a couple offices in the near future, and we're trying to open one in India. All of that planning and work is keeping us busy as well.

I am very happy to have a new global helpdesk manager onboard. Great addition to our team!

Wednesday, October 21, 2009

Ipmonitor, spiceworks, and vendor maintenance

I'm happy that SolarWinds has released a major upgrade to ipMonitor. Too bad they didn't notify me that my maintenance expired about a month ago. I'm renewing it now, and looking forward to v10. This product is excellent, cheap, and does a great job with agentless monitoring. You can also tweak it to monitor pretty much anything as needed. Such a good deal for a great all around product.

Speaking of maintenance, I also just investigated and found out that our F5 Big-IP maintenance expired in April. Glad we have an HA pair in case of issues in the next little while, but I don't understand why vendors and resellers don't keep on top of customers. It's essentially free money they aren't going to get if they don't chase folks about it.

Vendors who do a good job with maintenance:

  • IBM
  • Oracle
  • Sybase
  • HP Software
  • Cisco

Vendors who are horrible with maintenance in my experience:

  • HP hardware
  • Dell
  • F5

What Cisco does that's really cool is integrate maintenance information into other tools, so it shows up when you have inventory tools such as HP Network Automation System (formerly Opsware NAS) as well as Cisco's own CiscoWorks.

Another thing we do, because we are cheap and love good free software, is leverage this awesome product called Spiceworks. I can't believe what you can do with the product. We've been using it for over a year now, and it's completely free:

  1. System inventory
  2. Hardware details and changes
  3. Software installs and changes
  4. Up/down monitoring
  5. Event log monitoring
  6. Disk monitoring (only used as a blanket monitor, non production)
  7. Exchange monitoring
  8. Antivirus definitions monitoring
  9. Active Directory integration
  10. Interface graphing (firewalls, routers, switches)
  11. Network mapping (relationships of devices and switches)

It's very easy to find a user's system based on the login, and it's very easy to see changes in software and hardware. It's also easy to track warranties using this script:

http://community.spiceworks.com/how_to/show/197

This awesome script populates the inventory with the warranty expiration on Dell devices (including servers, printers, switches, desktops, and laptops).

Right now we have 5 collectors feeding one instance, so I can do global scanning and aggregate the results in a single repository.

Wednesday, September 16, 2009

Netapp SATA perf

I decided that when we built out the NetApp gear we would put low volume and QA data on the SATA disks and save the FC disk for databases, VMware, and other intensive stuff. Now I'm looking at performance on the QA VMware boxes with the SATA disks and thinking I shouldn't have done that. It's been quite good in general, but when there is a backup running or other disk intensive actions occur it grinds to a halt. I really need to figure out a way to move onto an FC aggregate at some point.

Vmware issues resolution

I had support on the line for a while to fix my errors with the reports. They finally fixed it; it was some obscure bug which is fixed in the next major patch for vSphere 4. It was a complex fix they had to do, but it finally works. Nice job by support.

Friday, August 28, 2009

Sonicwall Sonicpoints

These things are still having issues. We got them stable for 2 weeks, but now one of them is on the fritz again. The N access points seem to be more problems than the G ones. I have another service case open with Sonicwall, their support is pretty unresponsive in general. Annoying.

Too bad, because the product is great!

Vsphere server issues and upgrade progress

So I found out that using the host update tool versus Vcenter update manager is much easier and more reliable when moving from ESXi 3.5 to 4.0. Before I was using the update manager and it wasn't working all that reliably. So far I haven't had any issues using the host update tool. I've done many upgrades now, and I only have 4 left, 3 of which I am doing this weekend.

Whenever I speak to vmware they always think I'm using ESX, when I prefer and expect that people should move to the more appliance model of ESXi. With 4.0 they are pretty much on par, and I'm going to stick with ESXi.

On one of my vSphere 4.0 servers (VirtualCenter) it's doing this annoying thing when I try to use the performance overview:

Perf Charts service experienced an internal error.

Message:

Report application initialization is not completed successfully. Retry in 60 seconds.

In my stats.log I see this:

[28 Aug 09, 22:28:07] [ERROR] com.vmware.vim.stats.webui.startup.StatsApplicationLauncher : Task execution produced an error. Re-initialization attempt #26 will startup after 60 seconds...
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: com.vmware.vim.stats.webui.StatsReportException: Unable to open VC DataSource.
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:205)
    at java.util.concurrent.FutureTask.get(FutureTask.java:80)
    at com.vmware.vim.stats.webui.startup.StatsApplicationLauncher$1.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:595)
Caused by: java.lang.IllegalStateException: com.vmware.vim.stats.webui.StatsReportException: Unable to open VC DataSource.
    at com.vmware.vim.stats.webui.startup.StatsApplicationLauncher$1$1.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    ... 1 more
Caused by: com.vmware.vim.stats.webui.StatsReportException: Unable to open VC DataSource.
    at com.vmware.vim.stats.webui.startup.VcDataSourceInitializer.init(Unknown Source)
    at com.vmware.vim.stats.webui.startup.StatsReportInitializer.createInitializers(Unknown Source)
    at com.vmware.vim.stats.webui.startup.StatsReportInitializer.init(Unknown Source)
    ... 9 more
Caused by: org.apache.tomcat.dbcp.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out.)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1225)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.getConnection(BasicDataSource.java:880)
    at com.vmware.vim.stats.webui.startup.VcDataSourceInitializer.openVcDataSource(Unknown Source)
    ... 12 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection to the named instance has failed. Error: java.net.SocketTimeoutException: Receive timed out.
    at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.getInstancePort(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(Unknown Source)
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(Unknown Source)
    at org.apache.tomcat.dbcp.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
    at org.apache.tomcat.dbcp.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:294)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1247)
    at org.apache.tomcat.dbcp.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1221)
    ... 14 more

I have an open case with VMware right now. We'll see if I can get it fixed.
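The bottom of the trace fails in getInstancePort with a receive timeout, so one thing worth ruling out while waiting on support is whether the vCenter box can reach the SQL Server Browser service at all. This is a minimal Python sketch, assuming the named-instance lookup is the standard SQL Server Browser query on UDP 1434; the function names are made up, the host and instance names in the usage note are hypothetical, and this is a diagnostic aid rather than the fix VMware ends up applying.

```python
import socket


def parse_browser_response(payload: bytes) -> dict:
    """Parse an SQL Server Browser reply into a {key: value} dict."""
    # A reply starts with 0x05, then a 2-byte little-endian length.
    if len(payload) < 3 or payload[0] != 0x05:
        raise ValueError("not an SQL Server Browser response")
    size = int.from_bytes(payload[1:3], "little")
    text = payload[3:3 + size].decode("ascii", errors="replace")
    # The body is "key;value;key;value;...;;", so pair up alternating fields.
    fields = text.rstrip(";").split(";")
    return dict(zip(fields[0::2], fields[1::2]))


def lookup_instance_port(host: str, instance: str, timeout: float = 5.0) -> int:
    """Ask the Browser service on UDP 1434 which TCP port a named instance uses."""
    # 0x04 = request for a single named instance, name is null-terminated.
    request = b"\x04" + instance.encode("ascii") + b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 1434))
        payload, _ = sock.recvfrom(4096)  # times out if nothing answers
    return int(parse_browser_response(payload)["tcp"])
```

Running something like `lookup_instance_port("vcdb01", "VIM_SQLEXP")` from the vCenter server (both names are placeholders) either returns a port number or times out. A timeout here would point at the Browser service or a firewall in between rather than at the stats application itself.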

Friday, August 14, 2009

Ahh my favorite

So we pay a LOT for support on some software; we're talking about 30-50k. I just got this email from one of my vendors. They happen to be one I am not fond of, and one I have commented on before:

Only 2 months and 10 days for a ticket reply for a new license.

Date of email reply - 8/20/2009 3:37am

Dear Jonah

Please find attached the license requested

Kind regards,



From: support.emea@somecompany.com [mailto:support.emea@somecompany.com]
Sent: Wednesday, June 10, 2009 03:22
To: Support EMEA
Subject: INC000000007516 License-request

Tuesday, July 28, 2009

Uverse hell

Day one - Thursday

I got my Uverse installed. The install went okay and took about 5 hours to get running. Once the tech left I switched from my Clear WiMAX connection to the Uverse. It looked good, but then I got on the PPTP connection at work, and it started dropping me every 5-9 minutes. I called them and worked with support for 2 hours and tried lots of settings on the router, including lowering the MTU and upping the timeout. Nothing fixed it at all, and they kept my case open in case I wanted to call back. Later on I was playing with it and lowered the timeout, which seemed to fix the issue. The router has a very small connection table, which is a known issue; I think with the 1 day timeout there were lots of open connections which weren't being cleaned up.

Day two - Saturday

Turned on the TV to start setting up some of the DVR recordings, and the DVR wasn't working. I called ATT and the box was dead, we tried to do a factory reset and OS reload several times and the box was toast. They promptly sent out a tech a few hours later. He tried with 4 different boxes and was unable to get it working. He was there for about 3 hours, and he was out of ideas. It was 8:30PM on Saturday. He also reset my router since the TV connects to the router and uses the line that is shared.

Day Three – Sunday

I set the router back up with the same settings I had before, but once again I started to get PPTP issues and drops. I messed with it for a while and gave up again.

Day Four – Monday

I turned on the DVR, and what do you know, it started working perfectly. Makes no sense to me. I called support once again about my VPN issues and gave them my case number, and they told me they don't work on VPN issues and that, regardless of my case and the tech I had worked with, I should go away. I spoke to the shift supervisor and she told me the same thing, but was nicer about it. I told them I would disconnect service if I wasn't able to use a simple VPN. They were happy to lose me as a customer, and transferred me to the disconnection line. When I got someone there, I asked them if I could have TV only, and they said I need to have internet as well. I told them I would work on it some more, and let them know if I wanted to disconnect.

We happen to have firewalls which give us SSL VPN as a free option, so one of my co-workers set it up, and it worked perfectly. No drops, and no issues. At least I have a fix for now, so I can work at home if needed.

I called customer service and got a credit for my outage time and a discounted rate for the next 6 months. All in all I've wasted about 20 hours on this mess, but I'm glad it's working now. The customer support has been good, aside from the lady who refused to help me.

Service impressions

The base 6down/512up is decent, but no better than the Clear WiMAX service. The TV seems very good, and the DVR is better than others I have seen, but I still miss my Tivo HD XL. Nothing beats Tivo!

I wish I had my FIOS like I had back in Boston :(

Wednesday, July 22, 2009

Move - clear wimax - uverse

I haven't been posting because I just moved, and I'm also getting married in the next few weeks, but enough of the personal stuff. Let's get into some good tech talk :)

I have a Verizon wireless card in my laptop which comes in handy when I don't have other ways to access the internet. It's a good connection, and usable for working as needed. The main issue is there is some lag/latency in the connection.

We have a Clear WiMAX device at work which is our backup internet connection if we lose the fiber link. I decided to borrow this connection during my move to deal with my lack of internet. All I have to say is that the connection quality and speed are amazing. The lag is almost nonexistent, and the throughput is superb. I can even download a lot on it without an issue.

In my new location I can get ATT Uverse, which is similar to the Verizon FIOS connection I had back in Boston. I loved FIOS, and was very sad when I had to move down to Atlanta and get on Comcast. I had a struggle getting the FIOS installed, but eventually it worked and was rock solid for the duration of my service. It was also about $35 per month less than Comcast.

Uverse costs as much as Comcast, which is fine if the quality is there. I've had a small struggle so far getting Uverse installed, but tomorrow is the big day when my place will hopefully be fully provisioned. I will post more on the quality of the connection and the TV capability compared to Comcast, Verizon (FIOS/Wireless), and Clear.

Another major difference is that on FIOS they ran standard cable from the fiber demarcation point to the TV, so I used my Tivo HD XL box with cablecards, which was perfect! On Uverse they use a full IPTV device, which means my Tivo isn't going to work, but the picture quality and features should be better. The Uverse box also supports recording 4 channels at once, versus my Tivo's capability to record 2 channels at once. http://www.att.com/Common/totalhomedvr/

We'll see if it works well, more on that this weekend.

So I should be posting more, telling you all how it is to deal with vendors and technology we use in every day work and home life.

Leave comments, I like a good dialog!

Vmware ESXi upgrades from 3.0 to 4.0

I've been trying to move hosts over to 4.0 from 3.0, but I've run into countless issues on each migration. I'm trying to find a good method to do this consistently, but it's just not really possible from what I've been seeing.

When I do speak to support, they keep confusing ESXi with ESX, and ask me to do things that cannot be done on ESXi. I've also been studying whether I should run ESX or ESXi. For the way we deploy, the lightweight nature of ESXi makes more sense, and I believe it's the proper model. They still need to develop a proper CLI/SDK or some way to get into the boxes better in a controlled manner without using the "unsupported" login method that support always asks you to use. That method requires console access, which we don't have on lots of our hosts.

So far we have 2 hosts upgraded, and quite a few to do somehow or other. I'll keep plugging away.

Vmware support is very good, and responsive, good company to deal with, and great technology. They just haven't figured out the upgrades.

Microsoft print spooler I hate you

I've been debugging crashes on the print spooler on a Windows 2008 x64 domain controller which is a print server. With all of the advances Microsoft has put into Vista and Windows 7 on the client side to help debug and diagnose issues, such as the automated collection technology inside Performance and Reliability Monitor, you'd figure that they could figure out a way to isolate a fault down to a vendor driver. That's not the case. I've had to jump through hoops for about 8 weeks to figure out what the causes of the crashes are. I think it's partially related to running x64 and x64 drivers while having mostly x86 clients printing. It's kind of a mess, but it's been improving with patches and the countless dumps and other files I've been sending them.

Issues with Tridion

Updated on 7/24:

We use the Tridion CMS (www.tridion.com), which is a high end CMS product, and we have had a lot of trouble with the product in the past. The content manager is a strange beast which uses a combination of VB, .NET, and other technologies. It's always breaking, and is not reliable. We need to debug messed up stuff on it on a regular basis. We have the systems under change control, but it still seems to manage to break easily. The good news is that it spits out generated JSP pages, which seem very reliable running on Resin application servers.

The support is always excellent and responsive, which helps us deal with code issues that we have with development. In the last case they actually went out of their way to get our content database and replicate the issue pointing to our code. This is something that few vendors would do for a small customer as we are. It still doesn't make up for the strange design of the system. I am not sure if this is related to our implementation or a problem with the product itself.

Here is a snippet from my emails with support:


It is not supported to run other versions of .Net on the Content Management server - only the version listed in section 2.2.d of the "SDL Tridion R5 Product Prerequisites 5.3.pdf" (Microsoft .NET Framework version 2.0) document are supported. Please uninstall all versions of .Net Framework, reboot, and install the supported version.


Last time I checked Microsoft made .NET fully backwards compatible, and even keeps the frameworks in different folders:

For example on my machine:

Directory of C:\Windows\Microsoft.NET\Framework

04/22/2009 02:17 AM v1.0.3705
04/22/2009 02:17 AM v1.1.4322
06/29/2009 03:14 PM v2.0.50727
04/22/2009 05:01 AM v3.0
05/18/2009 02:58 PM v3.5
06/29/2009 11:08 AM VJSharp

Since my last post was a bit too harsh, and I didn't credit where I meant to credit I now have to meet with the Tridion folks on Monday... what did I get myself and my poor colleagues into. Oops...

Wednesday, July 1, 2009

Sorry for lack of updates, lots of stuff

Have been in Geneva for the last 3 weeks now. Been busy over here getting things in order and making sure we are ready for a major product launch over here. I am also working on some staffing and support issues for here in Europe.

Our CEO was just on CNBC, and he did an awesome job, yay Mitch!
http://www.cnbc.com/id/15840232?video=1169688483&play=1

Upgraded our vCenter servers up to vSphere 4. The upgrade of the servers themselves went well, but I tried to do a hypervisor upgrade, which failed. I have a case open with VMware on the upgrade. I'm also having some strange connectivity issue on one of my physical systems. The new product is very cool, and the new alarming system is a much needed improvement. Overall very happy with it so far. I am looking forward to testing the performance with the new hypervisor.

Some cool stuff my team has done: integrated all of our Linux boxes with winbind to AD. It's much easier to admin them and control access now, very good stuff. We can give developers granular access to QA and Dev systems now.

We are also building a syslog infrastructure which is similar to the one I built in my previous company. Essentially it's a syslog-ng frontend to Splunk, where we can forward and log streams as needed. The cool thing is that my sharp engineers at work are figuring it out on their own and learning new stuff in the process. This is the kind of thing that makes doing infrastructure and systems cool.
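A quick way to smoke-test a setup like this is to fire a test message at the syslog-ng front end and then search for it in Splunk. Here is a minimal Python sketch, assuming the front end accepts plain UDP syslog; the helper name, the host name in the usage note, port 514, and the logger name are all placeholders rather than our actual config.

```python
import logging
import logging.handlers


def make_syslog_logger(host: str, port: int = 514,
                       name: str = "infra-test") -> logging.Logger:
    """Build a logger that forwards records to a remote syslog listener."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler sends over UDP by default, using the "user" facility.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name so it is easy to search for.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Something like `make_syslog_logger("syslog-frontend.example.com").info("hello from the test box")` should then show up on the indexing side if the forwarding chain is wired correctly. UDP gives no delivery guarantee, so a missing message only tells you to start looking, not where the break is.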

Working with Microsoft on a spooler issue that's been annoying. Printers are so fussy.

We closed a deal with HP to get a bunch of new software. My team will be working on implementing some new technologies to add onto our existing HP investments. We currently use HP Performance Center, HP Quality Center, and HP QuickTest Pro. We added more users into Performance Center. We also purchased HP Real User Monitor (RUM, which requires BAC... I thought I got away from BAC at my last gig). We also will be implementing HP Diagnostics for Java, which should be a very useful tool for us leading up to our new product release.

That should do it for now :)

Tuesday, June 16, 2009

In Geneva

Just trying to wrap up a few things, and get some more help on my team here. With the language barrier I face in Geneva, meeting and spending time face to face really helps a lot. I'm headed up to France for my brother's wedding for a few days, then a week back in the office, and then I'm headed home. I will post more info on my trip and some details of the projects and such I have going on when I am back (or flying across the ocean and not sleeping). Been working long days, and long nights. I have been here almost a week now, but it seems like 3 days. I miss home :)

Wednesday, May 27, 2009

Production Java Profilers

I have been looking at java profiling tools. Mostly focusing on the following vendors:

  1. Precise – Good on java, Superb on the database, no RUM capability
    1. Would make a good tool for us, to help upgrade our current toolset of Idera SQLdm (which is a great product as well, but more limited).
  2. Dynatrace – Great on java, no DB, Superb RUM capability
    1. Coradiant brought me into this tool, which I had discounted due to a previous bad experience (about 2 years ago) when the product was very new into the .NET space. Product has a lot of similarities to the way that the Identify Appsight works, where you export a file and the developers open the file and do the deeper analysis.
    2. I also see they just announced the partnership today here : http://www.prweb.com/releases/Performance_Management/solution/prweb2464644.htm
  3. Wily – Best on java, no DB, not sure on RUM
    1. Put together a great deal for us, but I'm not sure how well the RUM products work, since they are new to the game.
    2. Not sure I want to give up my Coradiant…
  4. HP Diagnostics – Good on java, no DB, no RUM
    1. Best pricing, since we already do business with them, but I didn't see a lot of advancements in the tool since HP purchased them.

So that's the current project, along with a new helpdesk/bugtracking system (JIRA) we are building. We are also doing some additional work on our SEO. Other than that, it's just the standard engineering work, support, and smaller projects.

Cool post from Alistair

Big fan of what Alistair has been up to, and he has a new site here: http://www.rednod.com/ I really loved this post, and the overview of the process and software needed to build software : http://www.rednod.com/index.php/2008/12/07/testing-and-launching-a-web-app-what-every-startup-needs-to-know/

Lots of what big companies do in this space is overly complex, or missing key parts of the process. Using these types of tools will save time and create better software.

Wednesday, May 13, 2009

Been a good week – Windows 7 and Engineering Update

Sorry I haven't been posting as much as usual. Last weekend, I had some free time, and ended up re-installing one of my desktops at home with Windows 7. I liked the OS so much I ended up doing the same on my laptop. The OS is very responsive, and although I do like Vista, I agree that Windows 7 is a major improvement in many areas. The new taskbar took some getting used to, but the search and overall responsiveness are welcomed.

In more fun server news, I have been working on lots of different issues/engineering tasks:

  1. Tape backup drive is hosed, and I've been eating lots of time with Dell on this issue, I wish they would just swap it out. Annoying, but the service is much improved from what it was before when I was dealing with them previously.
  2. Redoing a bunch of stuff on the NetApp to get ready for the new product release. It's all coming together now :)
    1. Having some odd issues with multi-protocol permissions and such, also having an issue with some of our NFS exports not showing up in the web GUI (FilerView) but showing up on the CLI. Strange one.
  3. Put up a MSSQL cluster, but had a strange issue. Debugged it with Microsoft for a few days.
    1. Issue creating failover cluster - Getting Event ID: 1570.
    2. Tried to turn on the firewall as per Microsoft's recommendation. I don't see why this matters, but we did it anyway. I had to make some custom rules for UDP 3343 and TCP 3343.
    3. Ended up having a user in AD with the same name as the computer account. This was giving us the access denied error. Renamed the user, and the cluster was fine.
    4. Was on the phone and email for too long with the engineer, but happy it got resolved.
  4. Getting a new conference calling solution for cheaper calls than what we pay now with BT. We use the conference calling internally since we have offices in Europe and Asia. We prefer and try to use Skype as much as possible.
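For the cluster issue in item 3, the collision between a user and a computer account is easy to check for ahead of time. This is a minimal Python sketch, assuming you have already exported the user and computer logon names (sAMAccountName values) from AD by whatever means you prefer; the function name and the sample names are made up for illustration.

```python
def find_name_collisions(user_names, computer_names):
    """Return computer accounts whose name clashes with a user logon name."""
    # AD name comparisons are case-insensitive, so normalize to lower case.
    users = {name.lower() for name in user_names}
    clashes = []
    for computer in computer_names:
        # Computer accounts carry a trailing "$" in sAMAccountName.
        base = computer.rstrip("$").lower()
        if base in users:
            clashes.append(computer)
    return clashes
```

For example, `find_name_collisions(["jsmith", "SQLCLUSTER1"], ["SQLCLUSTER1$", "WEB01$"])` flags `SQLCLUSTER1$`, which is the kind of clash that gave us the access denied error until the user was renamed.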

Have to head back to Geneva next month for 3-4 weeks, which should be a good useful trip! Looking forward to working with my company over there, my main engineer has been there for a week, and will be there for another 2 weeks. He has been really fixing things up in the office, which are all much needed improvements.

Last item… Go Celtics! Go Bruins :)

Monday, April 20, 2009

Been a bit crazy

I have been travelling quite a bit for work, I spent some productive time in Geneva, and Munich. I have been gone for about 3 weeks total. I had a nice weekend in Amsterdam as well.

The weeks consisted of a datacenter move, some engineering and upgrades, and fixing some issues in our other offices as well. Overall the trip was good, but a bit too long of course. Made some good progress communicating with our core development team, and nailing down a lot of things on the table. I'm happy to be back, and looking forward to the big push coming to a new product launch. There is always a big project on the horizon that we work towards.

I've also been trying to beef up and implement centralized storage in order to properly utilize our ESXi implementations. We have a homegrown NAS for NFS in one office, an older AX150i in another office, and I'm looking to get some more storage in the other office. I wish we could afford to put Netapps everywhere and replicate... maybe next year :)

Speaking of Netapps, it seems that the upgrade about 11 days ago from 7.3.1 to 7.3.1P3 has finally fixed the weekly crashing issues we have been having for the last 4 months. It took a long time to get it fixed, but glad it seems to be fixed.

I have one last Sonicwall I am trying to get upgraded through the Sonicwall trade-in program, then I'll finally be on a single firewall platform. Should make things easier to manage and to train the staff on. They are also great products and nice to work with. UTM is the way to go.

Anyone have any comments on good replacement tools for Acrobat Pro (editing PDFs)? Please spare me the Mac user comments.

Monday, April 6, 2009

In Geneva

Finished some moving around of servers and one of our products this past weekend.  Also did some cleanup so we can vacate some older cabinet space.  It runs much faster on the new environment, and we are doing lots to optimize it even more.  We have plans to VM a lot of it, which should help with management and scalability.  Everything went really well over here, we had some issues but not a ton.  Learned a lot about the other environments and how we can streamline a lot of it. 

I’m going to spend the rest of the week meeting with various people, and spending time with my engineer over here.  He’s pretty excited about a lot of what’s going on, and that’s great.  It’s always nice to teach eager people.

I’m headed to the Munich office next week to fix some ongoing connectivity issues.  They have a DSL line which runs over ISDN (Annex B), and it’s been very problematic.  I’m also installing a new firewall to get them onto the corporate standards, as well as deploying a new access point.

Been working non-stop since I arrived last Thursday morning, and I hope the days aren’t as long for the rest of this week.  I think I worked 28 hours last weekend alone.

Wednesday, March 25, 2009

Cool Skype Addons

We use a lot of Skype at work, and it’s just nice and easy especially with the Skype Polycom speakerphones (http://www.amazon.com/gp/product/B000GG0EFY), I love them.

I was searching for two nice apps for use with Skype. We really like Zoho Meeting (free 1 on 1), but I wanted something with better features, similar to the Glance application we use in sales over here. I found this app Oneeko (http://www.oneeko.com) which is really slick, and has excellent Skype integration. It’s by far the coolest Skype/screen sharing type app I have used.

I also wanted to record my Skype conversations to mp3 without having to pay $20-$30 to buy an app. I found a pretty decent app called Callgraph (http://callgraph.biz) which works very well and puts all my conversations and meetings into mp3.
Now I am all set with my new tools :)

Sunday, March 15, 2009

Released new CMS, Wireless

A couple weeks late, but we finally launched the new CMS for our corporate site. It’s good that marketing is in control of the site now, but the CMS has some major issues. I’m not going to get into the details, but it doesn’t seem like the right technology for most people who want to implement a CMS. We changed our main domain, and further split the marketing site from the application, which makes my team’s lives much easier in operations. Glad to have Coradiant in place, it’s very handy for these kinds of moves. Also happy that we put in Omniture with this new CMS release. I would like to test Coradiant Edge and the Omniture integration at some point.

We have been putting in Sonicpoints to replace the hackjob dd-wrt setup I tried to put into place. I really like dd-wrt, and it’s very nice for my home use, but it seems to have some issues with the N radios, not too sure. Anyway, the Sonicpoints can be controlled from our Sonicwall NSA appliances, which makes security and implementation easier and faster. They are also very cheap, so that’s another positive. Trying to get them rolled out fully this week. Bought some for our Geneva office too, I’m headed there in a couple weeks.

I also implemented some new Netapp monitoring and such which is also pretty cool.

I’m headed to London on Tuesday night for some vacation and to spend time with my brother and his fiancé. Will be back a week from tomorrow (Monday), so you probably will not get another post for at least 10 days.
Have a good week.

Thursday, March 5, 2009

Updates – Netapp, HP, Coradiant

Our Netapp (one of the nodes) is still crashing even after upgrading from ONTAP 7.3 to 7.3.1. So far there is no advice from Netapp on the issue. Obviously I am not happy, since this has been going on for 7 weeks now. I just had a meeting with my Netapp rep, and he assured me the support would get better. I hope this is the case, as it’s been pretty bad so far. Speaking of bad support, HP support is pretty much the worst of any company on the market. I’ve had so many support issues just trying to get licenses and upgrades for all the HP (Mercury) software that we own for QA. It’s been very painful. I’m trying to switch off HP support (aside from software upgrades) but we shall see.

Finally got the Coradiant box hooked up in Geneva, and I just did an upgrade this morning to the newest OS. It’s working well. I have no idea what’s in the release notes, but I will soon find out.

Friday, February 20, 2009

Lots of updates, projects, running and ducking

Sorry for the lack of posting, I have been travelling and working a lot trying to get a bunch of projects out the door. We have about 6 weeks left of craziness until there is a break before some other larger projects. At least most of the infrastructure and underpinnings will be done for that which should make things less crazy.

We are close to moving our marketing site over to Tridion (beta launch today, and release next Friday), so marketing can deal with the content independently of the code and products. Tridion is a crappy product to deal with, it breaks a lot, and it's hard to keep the publishing system working. The upside is that it's very flexible. As far as CMSs go it's very overpriced, the support is sub-par (compared to higher end tools I have dealt with), and it's not something I would recommend to most companies.

I am headed to Geneva in 1.5 weeks to finish the rest of some migrations we are doing. Essentially moving some servers and integrating some products together.

Our Netapp is still crashing weekly due to a bug with the 10G card. Netapp is having a hard time debugging the core files we have provided. I'm pretty surprised this has taken 6 weeks, and I've escalated to my sales folks to hopefully get them moving.

I'm trying to sell a bunch of surplus gear that we pulled from our old datacenter, which we moved out of on Sunday. Bunch of servers and other random hardware. Curious to see what the lot will get.

Things are good, we have a lot of projects to deliver, but we are progressing, learning, and advancing to the new product launch. Hope all of my readers are doing well and you enjoy the weekend.

Wednesday, February 4, 2009

Netapp Issues

We’ve had a few Netapp issues since we went live. One of them was that we were shipped the wrong card, so we had to reconfigure the nodes with the proper 10GE fiber cards right before we went live. We also weren’t aware of the cluster requirement that you have to set the partner IP. Last night we rebooted both nodes and did a failover test after adding the partner IPs to the multiple interfaces we are using. The failover worked great, and everything is good.

Every Sunday we have also been seeing a degrading performance issue on one of the filers. It starts out with some packet loss over the LAN, and cascades down to the filer eventually either being rebooted or no longer responding to ping. This affects the 1g interfaces and the 10g interfaces as well. This filer is serving NFS for vmware, as well as CIFS for standard fileserving to a farm of webservers.

I’ve had a case open with Netapp, but the response time of the engineers has been lackluster, which is surprising since we have 1 outage per week on this node. I just noticed yesterday that we were seeing errors on the filer 10g interface (only 1 of them) but after the reboot there were none. The switch wasn’t seeing any errors, only the filer.

Data has been changed below (Network and Address):
Name  Mtu   Network     Address    Ipkts  Ierrs  Opkts  Oerrs  Collis  Queue
e0a*  1500  none        none       0      0      0      0      0       0
e0b   1500  10.10.5/24  *SNIP*     1m     0      1m     0      0       0
e0c*  1500  none        none       0      0      0      0      0       0
e0d   1500  10.10.3/24  *SNIP*     25m    0      16m    0      0       0
e2a   1500  10.10.5/24  *SNIP*     38m    2m     3m     0      0       0
e2b   1500  10.10.2/24  *SNIP*     65k    0      18m    0      0       0
lo    8160  127         localhost  30k    0      30k    0      0       0
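
Spotting the one interface with errors in output like the above is easy to miss by eye, so here is a rough Python sketch of how it could be flagged automatically. It assumes the report has been captured as plain text with the same ten columns shown above, and that the "k" and "m" suffixes mean thousands and millions (an assumption; the counters are rounded either way, so the ratio is only approximate). The function names and the trimmed sample are illustrative, not part of any Netapp tooling.

```python
# Assumed meaning of the rounded counter suffixes in the report.
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}

# Trimmed copy of the interface statistics shown in the post.
SAMPLE_REPORT = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Collis Queue
e0b 1500 10.10.5/24 *SNIP* 1m 0 1m 0 0 0
e2a 1500 10.10.5/24 *SNIP* 38m 2m 3m 0 0 0
e2b 1500 10.10.2/24 *SNIP* 65k 0 18m 0 0 0
"""


def parse_count(text):
    """Turn a counter such as '65k' or '38m' into an approximate integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def interfaces_with_errors(report):
    """Map interface name -> input error ratio for interfaces showing errors."""
    flagged = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows have exactly ten columns; skip the header and anything else.
        if len(fields) != 10 or fields[0] == "Name":
            continue
        try:
            ipkts, ierrs, _opkts, oerrs = (parse_count(f) for f in fields[4:8])
        except ValueError:
            continue  # not a counter row
        if ierrs or oerrs:
            name = fields[0].rstrip("*")
            flagged[name] = ierrs / ipkts if ipkts else 0.0
    return flagged


if __name__ == "__main__":
    for name, ratio in interfaces_with_errors(SAMPLE_REPORT).items():
        print(f"{name}: roughly {ratio:.1%} of input packets had errors")
```

On the sample, only e2a (one of the 10g interfaces) is flagged, which matches what I noticed by hand.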

Thursday, January 29, 2009

Keeping Busy – Akamai changes, Change windows, and QA

Been super busy cleaning up from the move, and trying to make progress on the QA environment buildout and other projects. We need to get QA up so we can shut down the old datacenter, and stop a bunch of replication scripts we wrote. We’ve been building out a lot of VMs, and messing with the Netapp Flexclones and such, pretty useful. Should be done with QA later next week hopefully, but with the late start I’m not sure. It also depends how the weekend release goes.

I am trying to provision a bunch of new properties with Akamai and the China CDN. It’s always slow going getting approval from the government when we provision, but it works out well. Kind of annoyed that I bought an Akamai SSL certificate (up to ten domains) and now they need professional services whenever I want to use a domain off it. It’s like nothing is ever simple with them. Too bad they own the market; if I don’t like it I can’t go elsewhere.

Need to nail down some better maintenance windows and communications about releases and timing. This should help the sanity of all the IT folks, not just my area (Techops). I wish I had a good change management system that was simple and good for maintenance. Every solution I look at for simple change management and notifications is complex, expensive, and overkill for my needs.

QA is using a terminal server to do testing, which avoids them having to do hostfile hacks. It should help the testing accuracy, and we can much better control how they do their testing. Simple fix for an annoying problem!

Wednesday, January 14, 2009

Datacenter cutover this weekend


We are putting the finishing touches on a project which will move us from an old design (many 1U boxes, and some DBs with DAS arrays) to a modern, flexible infrastructure.  In the new design there is NAS, Vmware, and much less hardware.   It should also allow us to scale much better while fully utilizing storage and server hardware.  We have plans to launch a new product later this year, so this prepares us to better launch the product.  We still have some loadtesting to do, but we are in pretty good shape. 
After this project is over we have a couple of other large projects to complete before March to help merge together environments in our hosting facility.   Should be a very long weekend, but looking forward to being on the new environment.