Posts

Showing posts from 2008

Akamai EDNS

My team is running yet another move from internally hosted DNS to Akamai EDNS. I've done the same at CCBN and Thomson on other projects previously. The product is great, and the design is superb. We should start moving on Tuesday for web forwarding (from godaddy) and start using Akamai EDNS on Wednesday. I'm looking for improved response times and much better DNS resiliency than we had before. More when it's implemented :)

Been in Geneva doing a datacenter build out

I was in Geneva for the last 12 days building out our new datacenter. We are getting ready to move our US hosting facility to Geneva so that we can merge the MFG and sourcingparts environments. It's going to take several months overall, but the first steps are going to occur in the next few weeks. Everything went really well. We installed and moved a lot of gear: servers, F5 Big-ips, (Unknown Vendor) HA firewalls, and a nice Netapp 3040 dual head cluster. The only issue was that we got a quad 1gig card in it versus the 10G copper card we wanted. Netapp only makes a 10G fiber card, so we had to buy different switch modules as well. Netapp ate the extra cost on the cards for us, which was nice. We also can't run LACP on the 10G fiber cards, so we lose switch redundancy. The Netapp is currently serving Vmware ESXi over NFS, and it screams. We also are using iSCSI for our MSSQL clusters. The speeds on Vmware and iSCSI are very good even with the current 4G we are running unt

Trip to Geneva

I arrived in lovely Geneva on Tuesday morning, and met some of the guys, as well as one of the engineers who works for me here. Great cool city, reminds me of Banff. We spent some time in the datacenter preparing, and getting acquainted with the staff and facility. We unpacked all of the gear, around 4 tons total. It was a lot of work. Today we started racking and configuring after working on some stuff in the office first. Trying to install some new DCs, access points, CA, and RADIUS authentication for 802.1x using dd-wrt. Still need to finish this work tomorrow. We also have Netapp coming in to finish the configuration of the new FAS3040s we bought. Had a great dinner with some of the guys here in France, and doing a bunch of work tonight to catch up. More as we progress, I'll be here for another 8 days working on the datacenter. Nite.

Week in review – Hosting Move planning, and other projects

Had a productive week, and didn't kill myself. I write this from the office on Saturday, where I am working on fixing our fileserver, which isn't configured properly (on the disk config). A colleague and I booked a trip to Geneva for 12/2 for 10 days. We finished getting everything out the door for that work, aside from the f5 cluster, which we are still working on. We swapped our two production boxes out for a loaner on Wednesday, and that went ok. We are still working on our new hosting contract in Geneva. Everyone wants no downtime, but we need time to build the environment. I didn't have the cash to buy everything, so we have to do some shuffling of equipment. Much of that is repurposing machines with ESXi and moving the old machines to VMs. Other projects we are working on include a Microsoft licensing deal, new web conferencing, office videoconferencing, a new ticketing system, and some project planning.

Bad bad week, kinda…

From REALLY BAD to AWESOME :)

REALLY BAD: We had a little exchange issue this week, which was not recovered from so well. That's fixed going forward.

SOMEWHAT BAD: On top of that my biggest project is starting to be realized, which is a datacenter move. That's going okay, but we have some interesting swapping to do. I need to move my f5s this week, and ship a whole bunch of networking gear. I have a Netapp, and lots of servers ready to be installed.

BAD: The issue is that my new contract isn't done yet, which is another thing I have to do. I also just realized we aren't going to do managed backups due to the $5k a month they want to charge us, so now I'm ordering a library as well.

BAD: Found out we opened a German office without any IT signoff, no firewall, connectivity issues, and the staff there speaks no English. Trying to get my hands on that, but it's a bad situation. Need to order some gear.

BAD: We have 4 web conferencing tools in use he

Changes at my Company

I'm now running the group as of 2 weeks ago. That means all operations, enterprise, production, networks, systems, etc. I have a good crew of guys, and I'm looking to better augment some of our staff. Happy to have been given the opportunity, and looking forward to making an even larger impact than I have up until this point. Lots of things to get in order which are eating my time aside from all the project management, purchasing, and other engineering I also have to do. It's been very crazy. I'll be posting more on some of the highlights of this week as well.

Monitoring, Reporting, and other stuff :)

Over the last week, I was sick a couple days. I used some of that time to build a cool management dashboard based on data coming from various tools:

Coradiant – E2E, Host, Status Codes, Sessions, good versus errored
Google Analytics – Pageviews
ACE – Ticket metrics

We are also evaluating the Dejaclick product from Alertsite to enhance the basic remote reporting we are using. The tool is very slick, and the pricing is pretty reasonable for what you get. We are also going to be looking at Coradiant Truesight Edge next week to get more data from our Akamaized traffic (which most of it is). Other than that just ordering a bunch of RAM for some boxes we are turning into ESXi machines, and also ordering some new Dell switches to run 10Gig Ethernet to the new Netapps we are putting in place.

Domain fun

I have been busy moving about 400 domains to godaddy. It's been a lot of fun (NOT) consolidating across our current 14 registrars. Once this is done we can start to do web forwarding and migrating onto our new DNS infrastructure. The new one is fully split, where we have a proper external, internal, and update system running. Good design, and should serve us much better.

Enterprise Fun

I am rewiring a bunch of servers, and reconfiguring a large fileserver on Friday. We need to repartition an 8TB volume into smaller slices. I'm going to use the knoppix CD and some of the tools on that distro to move and recut the NTFS partition so that we can run VSS. Progressing on the Exchange 2007 migration, everything seems to be 100% now. All we have to do is get some of the mailboxes small enough so we can move them over properly. We are shooting for under 500MB, too bad some of our bad users have 7G mailboxes :( Pain.
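The under-500MB mailbox target is easy to check with a short script. Here is a minimal sketch in Python, assuming the mailbox sizes have already been exported as (name, size in MB) pairs. The function name and the sample names and sizes are invented for illustration, and nothing here talks to Exchange itself.

```python
LIMIT_MB = 500  # target from the post: mailboxes should be under 500MB


def oversized(mailboxes, limit_mb=LIMIT_MB):
    """Return (name, size_mb) pairs that are not under the limit, largest first."""
    over = [(name, size) for name, size in mailboxes if size >= limit_mb]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical sample data, not real users or real sizes.
sample = [("alice", 120), ("bob", 7168), ("carol", 499), ("dave", 650)]

if __name__ == "__main__":
    for name, size in oversized(sample):
        print(f"{name}: {size} MB (target is under {LIMIT_MB} MB)")
```

Sorting largest first puts the worst offenders at the top of the list, which is probably where the cleanup conversation needs to start.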

Outages

Our site has been having a lot of issues, so we are offloading the image serving onto its own 2 boxes. We also added another 15 spindles on the production DB server, which seems to be helping a lot. We are seeing higher traffic volumes on our Coradiant reports, and we are also seeing lower latency. Good work by the whole team to fix the environment.

Licensing

Working on some licensing for Redhat as well for our new environment. I don't really like dealing with Redhat, but in order to have a good supported OS we have no choice. They are already using it in some places. We were deciding if we were going to switch to CentOS, but figured it was smarter to have the fallback on a vendor.

Vmware and Firewalls

We've been pushing more esxi here, and we are going to run our remote offices off a single esxi host running a domain controller, exchange server, and vpn server. We did the same setup for both Shanghai and Geneva offices. We are going to ship them with the new firewalls which should be a good setup, and easy to manage.  

Tons of progress this week – VM, Exchange, Firewalls, Security, Storage/NAS

Had a very productive and busy week. Between server crashes taking down the production site and the build-out project we are working on, the team was very busy. I built over a dozen VMs, and 3 ESXi boxes. I also worked on some of the config to finish out the main exchange 2007 implementation. We are waiting for ESXi boxes to ship to the remote offices, which will house a mailbox server, a domain controller, and possibly a client access server. One of my colleagues (who knows the setup of the WAN well) is working on the configuration and testing of our new firewall infrastructure. We are putting in all Sonicwall NSA series appliances. We have a 2400 for Shanghai, and 3500's for Atlanta and Geneva. The production environment will be off a pair of HA 4500 series boxes. The features, ease of setup, and price were excellent. We also have good resources from the reseller and the vendor if we need them. We are also running a pilot (which has been going quite slow) of the Q

Updates on some projects – NAS/Firewalls

Senior management has given us a date of December to have our current production hosting here in Atlanta (colo, downtown) moved to Geneva (hosting facility we have there). In order to do this we'll need to virtualize a lot, and build out a much better network and storage infrastructure. This will also help us in the future as we grow the business as well. We pretty much decided at this point that Bluearc is a better solution for us. It was very close between them and Netapp, but it was a matter of performance over more advanced software. We are looking to wrap up the deal soon and get the purchase completed. We have also decided to go with a sonicwall solution at 5 locations. The product looks very good, and will enable us to have a proper VPN mesh between all sites. We will also be replacing some of our web and spam standalone filters with the new NSA UTM device. Making some plans to move our Sharepoint and Spam filter from our colo to the corporate office. Not much else

Week updates

Been working on an F5 which is having issues. I was going to reload the OS, but I think I can fix it. Still working on it. Waiting for a reply from support. Finally figured out a way to export sites from a sharepoint collection, and import them into the root collection. Something I'll need to do before we can reorganize the structure of our sharepoint here. I should have this finished early next week. Have to schedule some downtime to do the export/import. Got my first ESXi machine, with the embedded hypervisor. Jamie and I will rack it up and get it running on Monday, then we should be good to start loading the backlog of about 5 VMs to start. Should be making a decision on NAS and Firewalls next week. Have to nail down the specific configs and pricing. Threw SQL 2008 (DB, SSAS, Report server) on my development box, seems pretty cool so far. Completed the implementation of DB monitoring and machine monitoring on production and enterprise hosts. I think that covers this week's

Wireless networks using certificates painful

Another project which is being annoying is hooking up a few hacked dd-wrt boxes to my Active Directory CA by using IAS. Still doing some testing, and I'll post my findings and details once the team figures out why Windows XP doesn't seem to want to work (we are trying SP3 now). Works like a charm on Vista :) Love Vista.

Sharepoint annoyances

Having a bit of a time trying to fix up this mess of a sharepoint site. First thing, we got our 3GB database off SQL express. Now that we are on standard, I need to fix up the database itself. A co-worker (Steve) and I are having a tough time figuring out how to restructure the site, since it was set up in a very odd manner. I think we have some good ideas, so let's hope we can get a good structure to it. I would post the hierarchy we are moving towards here, but I probably shouldn't :)

Storage/NAS Evals

We are looking for a strong NAS system, I'm leaning towards a clustered scale up/out system versus buying a box that I have to replace every 4 years. I think Isilon is the proven leader in this space, and we've selected them to go up against the king (Netapp). Here are the criteria we used to get it down to these two. We are looking into them in depth now.

Requirement / Sub-Feature: Weight

Architecture
    Centralized Management: 6
    Clustered Device (NAS, Controller, and Power): 10
    Appliance Model: 8
    Add storage with no downtime: 10
    Add bandwidth and nodes without downtime: 10
    Add cache without downtime (increase IO): 8
    Auto balance IO across disks and connections: 10

Security
    AD Integration: 10

Data Management
    Tiered design (2 or 3 tier): 8
    Migration of data based on usage and other vars: 8
    Content aware: 5
    Ability to snapshot multiple times, and replicate a snap: 10
    Rapid snapshot: 6
    NDMP support: 5
    Backup Exec

Things happening - Firewalls

We decided to evaluate Sonicwall and Cisco based on our assessment. We are digging in depth over the next week or so. Here are the criteria. These are needed for both hosting/colocation and for our 3 major offices:

Requirement / Sub-Feature: Weight

Architecture
    Centralized Management: 10
    Clustered Device: 8
    Appliance Model: 8
    QOS management: 6
    Appliance must support up to 2G (external) and 3G (internal): 10

Security
    Stateful Firewall: 10
    Full packet inspection and content filtering: 10
    IPSEC VPN Support: 10
    SSL VPN Support: 8
    OpenVPN Support: 8
    PPTP Support: 9
    AD integration for authentication: 6
    Anti virus/Anti spam/Anti spyware: 4
    IDS/IPS: 9
    Enforce desktop patchlevel and AV, Quarantine user: 4
    Wireless security: 4
    Behavioral analysis: 7

Management
    Reporting via Web: 8
    Email reporting/Monitoring and alerting of issues: 8
    Usage/compliance reporting per user (AD integration): 6
    Backup/Restore: 8
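Weighted criteria like these lend themselves to a simple scoring matrix. Here is a minimal sketch in Python, assuming each vendor gets a rating from 0 to 5 per criterion. The rating scale, the function name, and the sample ratings are all hypothetical; only the criterion names and weights come from the list above, and only a few rows are included to keep it short.

```python
# A handful of criteria and weights copied from the list above.
WEIGHTS = {
    "Centralized Management": 10,
    "Clustered Device": 8,
    "Stateful Firewall": 10,
    "SSL VPN Support": 8,
    "IDS/IPS": 9,
}


def weighted_score(ratings, weights=WEIGHTS, max_rating=5):
    """Return a 0-100 score: earned weight-points over the maximum possible."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    earned = sum(weights[name] * ratings[name] for name in weights)
    possible = max_rating * sum(weights.values())
    return 100.0 * earned / possible


# Made-up ratings for two unnamed vendors, for illustration only.
vendor_a = {
    "Centralized Management": 5,
    "Clustered Device": 4,
    "Stateful Firewall": 5,
    "SSL VPN Support": 3,
    "IDS/IPS": 4,
}
vendor_b = {name: 3 for name in WEIGHTS}

if __name__ == "__main__":
    for label, ratings in (("vendor A", vendor_a), ("vendor B", vendor_b)):
        print(f"{label}: {weighted_score(ratings):.1f}")
```

Normalizing to a 0-100 scale keeps scores comparable if criteria get added or reweighted later, and raising an error on a missing rating avoids quietly scoring a vendor on an incomplete sheet.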

Missed a couple things

I'm moving us onto SQL Standard from SQL Express, which is what the corporate intranet site is running. That upgrade was tested, and I'm doing production today. I've also set up some proper backup jobs for the database. Starting tomorrow, another engineer here and I are redoing the Sharepoint structure, permissions, etc. It should be a lot clearer, and make it easier to understand what people are up to. I haven't done a lot of Sharepoint administration, so it should be a good learning experience!

Recap of week 1.25 :)

So first week on the new job, and making some good progress. I am learning the infrastructure and some issues that have been bothering us. We have done the following items:

Monitoring
Redid the Coradiant Truesight setup to better catch items and view backend information. Got visibility to additional network areas.
Implemented Solarwinds IPMonitor. We are installing it at the colo and at our enterprise office.
Testing Idera DM, deciding if it will work for us. We need better DB monitoring and diagnostics.

Infrastructure planning
Did initial grading of clustered scale up/out NAS solutions. I will post more details as the project progresses.
Did requirements for new firewall solutions, still have yet to nail this down and grade them.
Built plan around fixing exchange, and moving to a multi-site international infrastructure on Windows 2008 and Exchange 2007.
Started planning a DNS revamp, and proper split domain configurations.
Working on a new wireless implementation as we speak, us

The fun begins

I had a smooth 15.5 hour drive down to Atlanta from Boston on Saturday. I have to do it again with my girl, cats, and other car in a couple weeks. Just working on renting my place in Boston, should be done in a couple days. I got my place on Sunday, which is awesome. It's on Grant Park, and it's huge and empty. The city is great, and I'm doing a reverse commute. I've been digging in on some things (yes, it's only day 2):

Looked at some network issues, but I haven't determined anything really yet.
Sharepoint migration plan from SQL Express to SQL Standard.
Temporary VM setup for testing
Purchasing a proper VMware box for real dev VM
Looking at some NAS vendors, and putting together a requirements plan. I will post that once we nail it down today or tomorrow.
    Support for Virtualization (ESX clusters)
    Exchange clustering
    SQL clustering
    Located in Geneva development, Colocation, and Corporate office for DR/replication of data.
Looking at deploying www.dd-wrt.com an

NAS NAS NAS

I've only really dealt with a couple scalable NASes, and the EMC Celerra was a nightmare for many reasons. I always did like Netapp, they work well and are easy to manage. Now there are the "next gen" NASes out there, and at the new gig we need something good, which can scale easily and run clusters off MS and Linux technologies. I was looking at the following "cool" vendors. Has anyone used these before? These are in order of which ones I think are coolest:

Isilon - I know we have some at my current company, and they work well. Nice product, no idea on costs.
Onstor - Seen these guys when I was looking at 3par, product looked good, cheap, and solid.
Netapp - old faithful is something that's good with storage :) Easy to find people who've used it. Who hasn't?
Bluearc - seems good
Ibrix - Looks interesting since you don't need to buy hardware, but might be too "out there"
Pillar - looks okay...

News time- Moving jobs and moving locations

Now that I've told most of the people I work with, and the vendors I deal with often, I can share that I've decided to accept a position at a very exciting startup. The company is located in Atlanta, so I'll be moving down south. It should be interesting since with the high growth Atlanta is not as "southern" as it could be. The company name is MFG.com (www.mfg.com) and it's a very interesting business. They match up companies who want to have parts manufactured with manufacturers. The kicker is that they have a good presence in Europe, and a large presence in China. It allows smaller shops to get the benefits the big companies have in leveraging the global economy. It has some very high profile investors, and has a profitable, fast growing business. I will be doing all kinds of infrastructure work there. There is a hot list of major issues which I hope to start to tackle right away, but I'll be moving back into a role further from the tools and techn

A bit crazy

Sorry I haven't updated you all. I will be updating on Friday. Mostly it's to announce something which I am going to be undertaking. It should be interesting, and a bit of a shift back to my roots. More on Friday!

HP Software Universe

I attended and spoke at HPSU this year. It was a good conference, and HP is making a lot of progress on the products which we are very interested in. I'm going to go into each of these and explain some of what I saw, and what we are up to with each product.

HP NAS
New UI is coming down the road. Addition of bare metal provisioning.
Internally: We are moving from 7.0 to 7.2 here in the next few weeks. We are just finishing up an MS-SQL Multimaster setup with the UK this week. It's been painful, and taken us 4 attempts now, but we are close.

HP SAS
Some interesting stuff with other customers, best practices, and other tips on it.
Internally: We are moving towards a small SAS deployment in 2008, probably 200-400 systems. Just firming up some budgets, since we are splitting the cost among a couple budgets.

HP OO
Tons of new content, which makes OO above and beyond. Multimaster is coming down the pipe, which should be really nice. Since the platform of OO is very similar to N

HP Software Universe 2008

I'm giving a talk at HPSU on our conversion off OVIS onto BAC, BPM, and Sitescope. Should be pretty good, and straightforward, it's taken a while to get it all baked and integrated, but things are progressing well now.     Are any readers planning on attending the show? It would be good to hear from you guys who are, and we can meet up. Thanks!

HP – Real User Monitoring (RUM)

I like what I see from HP RUM, my issue is that the pricing and model for it don't work in a shared environment. I can't nail down each application, webserver, application server, or environment. I prefer to deploy it at the edge of the datacenter and just use it as needed. This is how we deploy and use our current real user monitoring from Coradiant. HP has been trying to be flexible, and create a pricing model that works for us over the last 6 months, but we haven't made the kind of progress that I need to see.     We can't test or deploy a solution which isn't cost effective compared to what I am getting from my current vendor. I don't want to POC the product if it doesn't make financial sense. Hopefully at HP Software Universe we'll make more progress, and enable us to test the product. Ultimately the solution is very compelling and fits nicely with our usage of Business Availability Center (BAC) and related products.

Tool Replacements

I have a couple of short term tool replacements we are gearing up for as we normalize toolsets across the company. We are an HP NNM shop, and we've decided to finally get rid of it, due to the numerous problems we've been dealing with over the past several years. The amount of work that this tool needs is immense. This is why HP is doing a ground up rewrite of it for 8.0. We can't wait for it, since it's not mature enough. We are implementing IBM ITNM (Formerly Precision). We are also going forward with enhancing our event management by using IBM Impact. Both of these products will provide a lot of value for us in terms of efficiency and increasing the effectiveness of the event management layer we already use from IBM. We are also looking to get HP SAS in for a small install which will scale out. It's looking like 400 systems to start with. More on that as we nail down the project and timelines. More on the HP side, we are also looking to scale out HP OO and start usi

IBM Pulse 2008 - Review

I spent Monday-Wednesday at IBM Pulse in Orlando. It was a good show, but quite a few of the sessions were full when I arrived. It was frustrating because they didn't offer them more than once. The morning sessions were mostly pie in the sky, and not very useful to me. I got to spend a lot of time with senior people in engineering, architecture, and acquisitions/strategy. I also got to meet people I knew from online or other dealings with IBM. Overall, the show was a good use of my time, and I found it enjoyable.

Here are some of my highlights:
ITM 6.2.1 improvements including agentless capabilities and such.
New reporting framework based on BIRT which will be rolling forward.
New UI which is being pushed and was on display from TBSM 4.2.
Hearing about what other customers are up to (mostly bad decisions from what I've seen).
Affirmation of ITNM (Precision) as a best of breed tool, with an excellent roadmap.

Some things which are bad and make no sense:
Focus on manufactur

Week in review

Been working out some organizational stuff with my counterpart of the company we purchased. I've also been spending time figuring out the future toolset we will be using for server/app monitoring, event management, network monitoring and management. It will be a combination of our toolsets based on what makes sense. Automation is a huge goal for us of course. Need to put together some documentation on what we have, and then complete a strategy in the next 4-5 weeks.

Conference Updates

I am speaking at HP Software Universe in June in Vegas. It should be interesting. I am talking about our conversion off OVIS onto HP Sitescope, HP BPM, and HP BAC. We are very happy with the new products, and most of all my monitoring customers absolutely love the visibility and clarity these new tools give them. I am attending IBM Pulse as well in Orlando this weekend. This is my first IBM show, and I'm very interested in ITNM (which we are moving to from NNM in the next 12 months), Omnibus, Webtop, and some of the other IBM tools we use. The show doesn't seem as well put together as shows from CA, HP, or EMC. We'll see how it runs. The venue is definitely questionable… I'm not a big fan of the Disney junk :)

Been a little out of it – Updates on my job

Once again I've been quite busy dealing with the integration of a large company we purchased. They have a lot of major toolset issues which need to be worked through. I feel there are a lot of architecture problems with what they are using for monitoring. Even though the project is 18 months old and not progressing well, they are still sinking money into a sinking ship. Should be interesting to see when they want to scale this back, or actually design a portfolio which is best of breed, versus trying to implement a full solution from a single large vendor. Should be a matter of time… Then they will ask for my help and I'll be willing to assist. I don't want to be the architect when I can't make major changes to the design which has a lot of flaws. I am interested in potentially running a transformation group, which would include global reach of automation, virtualization, etc. This would be a new area for us, and we need to get serious with products like HP OO, SA

Travels

Been travelling a bit the last week, and I am off to London for most of next week to work on strategy and organizational structure. I was offered a seat at IBM Pulse, but I don't think I'm going to make it. I will still be attending HP Software Universe in mid-June, which should be interesting given the advances in the HP products from Mercury and Opsware integrations.

Toolset Changes

    As many of you know, we just completed a large purchase of another company, so in the next few weeks I will know what I am going to be doing. In the meantime we are figuring out how to converge our toolsets. There will be more activity here as we work through the issues and determine the final set of products we are going to be pushing out across the universe of the company. Based on some of the deals we make with IBM and other vendors it could also create other areas where we can standardize. More to come in this area.

IBM POTs

This week we were offsite at the tech center in NYC for a day trip. We looked at IBM Tivoli Provisioning Manager (TPM), the provisioning and deployment product. It is one of the products we are considering standardizing on. I wanted to get a clear picture of what it can and cannot do versus Opsware SAS. The product looks good, but I still need to write up the full gap analysis. It definitely would meet most of our patching, inventory, and deployment requirements, but it doesn't fill the system administration, or complex audit and control requirements we are given due to customer audits and regulatory compliance. Last week IBM brought us a POC for IBM Tivoli Monitoring (ITM), which is the monitoring platform. It compares to HP Openview Operations (OVO). We are going to bring it in house to do more testing, but after the initial 1 day with the product, we found the following comparison to be true:

Issues in POC: Multiple times the agent died, and the server died. There was no in

Realtime market data systems

I don't really talk about this much, but we deliver a lot of real time market data to thousands of customers. Part of my responsibility is running a group which monitors the real time environment, with very strange things like multicast and other abnormal requirements that most standard products don't deal with. A good example is the exchange holidays and the open and close times for each of the 200+ exchanges whose feeds we bring into global POPs. The infrastructure has more changes by development than anything else at the company. Figuring out the proper "state" is next to impossible, which makes monitoring a challenge to say the least. They have a custom tool, which we are working with the developers on to get a web services interface so we can better understand state before presenting false alarms to the Realtime operators. We are cleaning up the rules by pushing the responsibility onto developers to write the proper rules. This should fix things as we aud