Posts

Showing posts from 2007

Away

I have been very tied up with finals for my classes, but I've enjoyed a quiet period at work the last week. We have a lot of projects and upcoming initiatives for 2008 which I will post about when I am back. I am going out of town for a week; more later. Happy New Year!

SevOne

Update on this stuff: we purchased the product, and we are deploying soon. Can't wait to get rid of Vitalnet and get a product which is easy to maintain and scale!

Future Tools

We are still having a lot of problems getting value from NNM for simple topology-based suppression and up/down monitoring. We have a new account team and a new technical lead, so maybe that will help HP fix our problems. The other option is NNM 8, which we are looking at, but it will be missing some features until 8.1 comes out. We had a very good evaluation and POC with IBM Precision and Impact. Precision looks good, but we need to spend more time looking at it and comparing the amount of work needed to keep it up and running against NNM. Impact was an easy sell. It will help us immensely with event enrichment and some of the automations we currently have in Omnibus. We'll see how the pricing negotiations work out.

So many tools, so many problems

It seems like many of my more "complex" legacy tools are not set up "right" for what we want to do. I'm always struggling with some of these products. A good example today is HP NNM, which is our MPLS and topology manager. The main goals of the tool are to build a layer 2/3 topology and do server and upstream suppression to avoid alarm floods. Of course none of it works right, and it's a constant pain in the rear. We have tried on many occasions to fix this with the help of HP. We have tried numerous suggestions. Finally I got on the phone today with a product manager and got the real story. We can implement some changes which would fix it for now; the downside is that we'd lose some of our DR capabilities. The product wasn't designed to actually operate in a DR type of way (strange for a market-leading product). They have a new version coming out in roughly a year which should fix this, and we'd have to re-architect the infrastructure yet again
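To make the upstream suppression idea concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not how NNM implements it: the topology, the device names, and the `root_causes` function are all made up for the example, and it assumes the topology is a simple tree with one upstream device per node.

```python
# Hypothetical topology: each device maps to the device directly upstream of it.
# None marks the device closest to the monitoring station. All names are invented.
UPSTREAM = {
    "core-rtr": None,
    "dist-sw1": "core-rtr",
    "access-sw1": "dist-sw1",
    "server-a": "access-sw1",
    "server-b": "access-sw1",
    "dist-sw2": "core-rtr",
    "server-c": "dist-sw2",
}


def root_causes(down, upstream=UPSTREAM):
    """Split down devices into root causes and suppressed (downstream) alarms.

    A device is suppressed when any device on its path toward the core is
    also down, because its outage is then most likely just a symptom.
    Assumes the topology is a tree (no loops).
    """
    down = set(down)
    roots, suppressed = set(), set()
    for device in down:
        parent = upstream.get(device)
        # Walk toward the core until we hit a down ancestor or run out of path.
        while parent is not None and parent not in down:
            parent = upstream.get(parent)
        if parent is None:
            roots.add(device)
        else:
            suppressed.add(device)
    return roots, suppressed


if __name__ == "__main__":
    roots, suppressed = root_causes(
        ["access-sw1", "server-a", "server-b", "server-c"]
    )
    print("Alarm on:", sorted(roots))
    print("Suppressed:", sorted(suppressed))
```

In this made-up outage, the access switch and the unrelated server would raise alarms, while the two servers behind the failed switch would be suppressed instead of adding to the flood.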

VOIP Monitoring Project

Initially we deployed VOIP to save on interoffice communication. With the presence we have in India, and its growth, there are some huge savings attached to it. We then decided to modernize the call centers that we have deployed. That meant we were deploying Cisco solutions that were very network centric, using Call Manager and the rest of the suite. In the next phase we realized that we were spending a huge amount of money on conference calling services. We use a LOT from every vendor you can think of. We even max out some of our vendors when we run conference calls for earnings periods. We started to implement Cisco MeetingPlace. The product has been working well, and we are monitoring it the same way we monitor any set of Cisco devices and a few Windows machines. One of our CEOs was on a conference call and it started to have voice quality issues. That brought up the fact that we are missing a whole bunch of ways to debug and fix issues relating to voice and o

Traffic Aggregation

We are having somewhat of an issue getting proper visibility into the network traffic. Netflow is useful, but limited in the data it can give you. There are more and more products which really need to see the traffic in order to work properly. I came up with a business case on why we need to do this back in June:
Coradiant monitoring of the traffic: We seem to miss pockets, and we cannot get a proper view of the backend web server response and quality by looking at data on the edge of the datacenter.
VOIP monitoring: This will be the topic of my next post.
Sniffing and debugging issues
IDS deployment: We currently have scattered IDS deployments, which we need to centralize.
It's finally gotten to the point where we want to implement it. One of our network guys came up with a nice design using the Apcon switches and Cisco switches to help with multiplexing the data more easily. This will allow us to implement monitoring and spanning during the day without reconfiguring product

Homegrown Tools

Due to recent organizational changes, I am working closely with other parts of my business. At the same time we are buying another large company. We now have 3 strategies in the company that I find very interesting and conflicting at the same time. I am open to change regardless of direction. It just needs to make sense financially and have a solid technological foundation. It needs to meet the needs of the consumers of our tools internally and externally. One of our divisions moved off IBM tools for server monitoring to modified open source tools. The products are cost effective in terms of deployment to many machines, but they need full time developers on staff. My team uses mostly HP OV products on the systems, but we use a lot of other tools for other areas. We have a very good deal from HP on the agent pricing. The company we are buying is a big IBM shop; all they use is IBM. This means we have 3 camps: Open/Free Closed/Diverse Closed/Single Vend

Appliances and Upgrades

We have a bunch of devices which are "appliances", and the vendor has slowly refreshed the hardware. Our devices are 2.5 years old, and the new devices came out earlier this year. Now the new revisions of the software are not going to have all of the features on the "older" appliances. The machines just have older CPUs and less memory. They want to charge me something (I haven't seen the quote yet) to upgrade the appliances. I think they will charge me more than just the cost of the new hardware. I'm kind of upset about this strategy to get some sales for the end of the year. We still pay maintenance, which should include all of the upgrades we need. Not nice!

Capacity Management

We have a serious issue around capacity management, and around what the options are for doing a good job of it. I feel there are the following types of methodologies, listed in order from least to most involved:
1. Ignore capacity and just fight the fires as they arise. Pros: Easy, no additional work involved. Cons: Unhappy customers, lots of fires, no way to budget for growth or system changes.
2. Treat capacity as an overall metric across unrelated systems, networks, and software. Pros: Gives you an overall idea of the usage of your capacity. Cons: No actionable information, thus you don't actually fix any capacity issues.
3. Treat capacity as alarms, where we get a message and a ticket when a capacity threshold is passed. Pros: Gives you actionable alarms on capacity problems. Cons: Doesn't give you any priority, or prevent the alarms from being ignored (crying wolf).
4. Treat capacity as reports. Pros: Gives you an idea of what you must take action on. Cons: No idea when you will run out o
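As a rough illustration of two of the approaches above, the Python sketch below shows a plain threshold alarm next to a simple straight-line projection of when a resource would fill up. The projection is an assumption added for the example (a least-squares fit over made-up samples), not a method described in this post, and the function names, the 90% limit, and the sample data are all hypothetical.

```python
def threshold_alarm(used_pct, limit_pct=90.0):
    """Alarm-style capacity check: fires only once the limit is crossed."""
    return used_pct >= limit_pct


def days_until_full(samples, capacity):
    """Estimate days from the last sample until usage reaches capacity.

    samples is a list of (day_number, used_amount) pairs. A least-squares
    straight line is fitted through them. Returns None when there is too
    little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        return None
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


if __name__ == "__main__":
    # Made-up disk usage in GB over four days on a 1000 GB volume.
    usage = [(0, 500), (1, 510), (2, 520), (3, 530)]
    print("Alarm now?", threshold_alarm(100.0 * usage[-1][1] / 1000))
    print("Days until full:", days_until_full(usage, capacity=1000))
```

The alarm stays silent at 53% used, while the projection already gives a date to plan against, which is the kind of information a threshold or a static report cannot provide on its own.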

Appliance v. Software

Along my path as an IT god :) I have found myself going back and forth on the appliance model:
Year 2000: Why would I want to buy an appliance, when it's the same software, and I can't choose the exact form factor or implementation of hardware?
Year 2003: I can just buy this box and plug it in and I have a solution. When I have a problem I just swap out the box and load a config file onto it.
Year 2005: If I use my own hardware I can use the same box for 5 things at once, thus allowing the box to do more with the same power and space consumption. Especially when I introduce virtualization.
Year 2007: I'm so sick of vendors blaming hardware, and having the battles between hardware and software vendors. The worst is when you have IBM, Microsoft, and vendor x all fighting. I also like the fact that I know my appliance will be in warranty, because it's a single renewal. Not to mention some of these appliance makers that have been around for a while are

Open Source Downsides

Open source is awesome, whether you need a Swiss army knife or some glue between two solutions that don't quite mesh the way you need them to. It's also great for analysis and basic tasks that you need at your fingertips. It's versatile and deep in functionality. The best part is the cost (aside from the legal ramifications of using it commercially). The downside is that, working in a large, complex environment where I have responsibility across so much diversity and complexity, many of my needs are not met well enough to warrant a platform selection:
Manageability: Policy, Templates, Audits, Grouping
Deployment: Ssh, telnet, wmi, remote command
Scale: Distributed systems globally, Failover
Reporting: Complex reporting needs, Logging sophistication
Most of these issues are completely missed in most open source tools, as it turns a Swiss army knife into a nuclear weapon. There is nothing wrong with it, but either companies need to understand and implement these around open source tools (

Xangati Install and comments

Just got this new small startup's product/appliance installed today. It's an interesting concept. Take the technology developed in IDS and NBADS and implement a monitoring and profiling system designed to monitor for issues and network throughput stats. It's all netflow based, so it's a quick implementation. It's quite interesting, and we'll see how it runs over the next 8 weeks. Looking forward to working with them!

Congrats to Opsware

I have been a big fan of the direction and type of software Opsware has been working on since I got involved in this area about 13 months ago. The company is really on target for the needs of companies the size of mine. It's great news that HP bought them on Monday. HP is a strategic partner of ours, and I'm very interested to see what better integration of the products can do. Specifically:
Server/App: OVO, Sitescope, BAC, BPM, and Opsware SAS
Network: NNM, RAMS, and Opsware NAS
What great integration ideas will come of Opsware PAS and its integration with ServiceCenter and Assetcenter? Very interesting ideas, and we should see a lot of good things over the next 12 months. You are now seeing the fruits of the Mercury acquisition in the replacement of some of the weaker HP products with more solid offerings from Mercury. The migration to these products is still a bit of a work in progress. HP's big missing piece is still event management. OVO is good for this, but it's n

Network graphing tools

We are currently looking to replace Vitalnet, Ehealth, and MRTG/Cricket/Cacti. The products we are looking at are Netscout (which we have a small install for), Sevone (we have a POC running with them now), Netreo, and Nimbus. We have a specific list of requirements, but we are also looking to have it fill other needs as well. More on this as we better define this. Thanks.

SiteScope, BPM, and BAC

We have both products deployed, and we are rolling to production very soon. We just have a bit more testing to do, and some integration testing with OVO for event flow. Otherwise we will try to push agentless as much as possible. We will only use OVO agents when we need more detailed stats or script execution. Hopefully we can use Opsware PAS to minimize the use of those agents even further.

Host Monitoring – Agent vs Agentless

We monitor all of our hosts with an agent based solution. You get the most flexibility, but the administration and upgrade of agents is time consuming and people intensive. I want to move us towards having both solutions. A development box doesn't need the ability to run complex operational scripts, and we overpay for that monitoring. In the future I hope to re-prioritize the monitoring tools for the needs of the environment. More on this as we move forward with the transition from HP to the Mercury tool.

Log processing

I am looking at a way to help network support deal with the huge number of log entries coming from the firewalls. I could use a cool tool like Splunk, but there is so much data that the cost is high. I am thinking about maybe using a sniffer logging product. I have to talk to more people, but I'm not sure what the best tools are for the job at hand.

Surface computing

Surface computing in depth demo: http://www.popularmechanics.com/technology/industry/4217348.html
Microsoft Milan site: http://www.microsoft.com/surface/
Really cool stuff from Microsoft. The video is very impressive. I saw the beginnings of this back in the day. There are a couple of cool older projects which remind me of much of the stuff that I see in the video:
Bumptop: http://youtube.com/watch?v=M0ODskdEPnQ http://www.bumptop.com
MIT Media Lab AudioPad: http://youtube.com/watch?v=lxAD1QIv_dw http://www.jamespatten.com/audiopad/index.php
Enjoy these cool new pieces of technology.

HP Software Universe

I will be attending HP Software Universe in mid June. I am particularly interested in this show for many reasons: Mercury integration, specifically BAC/BPM and SiteScope into the old OV product line. Evolution of NNM. Migration of the 2 Helpdesks and Asset systems to a single HP software system. ITIL v3 changes and what HP is doing to support it. MPLS capabilities across other HP tools. UDDI (Systinet) progress.

Interop Las Vegas - 2007

I was privileged to speak at the Interop show in Las Vegas. My discussion was on Datacenter Modeling, specifically around using Dependency Mapping and the CMDB to effect monitoring. The discussion went well, and I was given quite a few interesting questions to answer. Overall I got very good feedback on the session. The rest of the show was decent. The educational content was a bit too high level. I would have liked to have seen the kind of ranking that you typically see in other shows. A good range of rankings would be introductory, executive, intermediate, advanced, expert. The show floor was huge and excellent. I met quite a few interesting vendors, which I am interested in pursuing. Overall the show was useful but not amazing. Some of the all day workshops would probably have been more useful and in depth. I would have liked to have attended one of them, but I didn't have the time to do so.

Configuration management requirements

I seem to be working on several configuration management projects at once. Everyone uses a different set of scripts or tools. No one has any documented information about what they are solving with the systems, and what they want to do. They seem to have started with the "backup the configuration of x" requirement, and evolved into:
Monitoring (error detection, service outages, checking for known problematic conditions)
Capacity management
Reporting
The issue is the lack of documentation: knowing what the systems' intended uses are, as well as their current uses, would make standardizing on software much easier. I'm battling to accomplish this, and it's very slow and painful. I think authority will help force changes, which should start to happen here.
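For readers wondering what the basic "backup the configuration of x" requirement looks like once it grows into change detection, here is a minimal Python sketch using only the standard library. It is not taken from any of the scripts or products mentioned here; the device name and the configuration text are invented, and a real tool would also have to fetch the configuration from the device, which this example skips.

```python
import difflib
import hashlib


def config_changed(previous, current):
    """Cheap change check: compare SHA-256 digests of two config snapshots."""
    old_digest = hashlib.sha256(previous.encode("utf-8")).hexdigest()
    new_digest = hashlib.sha256(current.encode("utf-8")).hexdigest()
    return old_digest != new_digest


def config_diff(previous, current, name="device"):
    """Return a unified diff between two config snapshots as a list of lines."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=f"{name} (previous)",
            tofile=f"{name} (current)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Invented snapshots of a hypothetical router configuration.
    old = "hostname rtr1\ninterface Gi0/1\n description uplink\n"
    new = "hostname rtr1\ninterface Gi0/1\n description uplink to core\n"
    if config_changed(old, new):
        print("\n".join(config_diff(old, new, name="rtr1")))
```

Keeping the digest check separate from the diff means the expensive comparison only runs when something actually changed, which matters when the same script has to cover a large number of devices.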

Netbrain Workbench

We are using a really cool piece of software called Netbrain. The tool has many overlaps with lots of tools we use and are looking at. It's a really nice "workbench" product for troubleshooting and designing changes. The person who runs the company is incredibly smart, and is a CCIE. It's new and early for this firm, but it's worth checking them out. The product has many interesting ideas and capabilities.

Speaking Engagements – Splunk / Interop

I will be speaking on behalf of Splunk about how we use the software, and where we want to go with the tool. It's a very interesting product, which is used in other parts of my company as well. I am happy to share our experiences with this excellent software. The event is in New York on May 1st from 10am-1pm. Please feel free to email me or contact Splunk if you wish to attend. I am also going to Interop and I will be speaking about some of our management issues around complexity and the relationships between applications, servers, and networks. The show looks very interesting and there are several sessions I am looking forward to attending. I will post more as it comes closer. We are still 1 month out.

London and Security/Privacy

I was in London for a week on vacation. I was hanging out with my brother for his 30th. It’s crazy over there how little privacy you have. There are cameras EVERYWHERE, and where there aren’t they have trucks that drive around with mounted cameras. There are speed cameras, cameras in the cabs, cameras in the tube, and in bars. Over the last 5 years London has overtaken NY as a financial and banking capital of the world. More deals and companies operate there. The reason is the over-regulation of the USA. Much of this is due to SOX and other laws. Now that the NYSE owns Euronext, which operates many of the exchanges in the EU and UK, will we be changing? Will the changes entail moving towards more regulations in the UK, or will the US realize these oppressive regulations are hurting the US economy? With all of the security and regulations in focus everywhere it’s important to know the landscape. I feel this is about gaining relevant knowledge. I do this

Onaro Sanscreen

We have been trying (since our last major storage outage in November 2006) to get this product configured properly for our environment. The issue is that we run so much old software to manage our storage that it's a constant struggle to get things working together. Our storage team is too busy to devote time to getting the products that would help them operational. We need to come up with a plan to keep management software current. This includes all of our tools. The product is excellent for operational needs, but it's not something that I think our team would use well in engineering. I do see value in the storage architects using the product. We'll see how it pans out. The salespeople at Onaro are very aggressive, which can make them hard to deal with. On a side note, their performance product looks very nice, and it's something the business side has been trying to get for years.

Network configuration management – Wrap Up

I am working on wrapping up a couple of projects so I haven't been blogging as much. I am currently finishing up our network configuration management project. It's been going very well and we are finding so many uses for the technology: Here is a snippet from our business case on the product: This product enables large amounts of functionality that we don't currently have from the CiscoWorks LME implementation. The major advantages are multi-vendor support (Cisco Wireless, Cisco, Nortel, Checkpoint, PIX, and F5). The product tracks all changes, captures configurations, and allows for software and configuration upgrades centrally. It allows for dynamic complex grouping, enabling us to track the environment when devices are added and removed. The tool allows for policy management, inventory, and vulnerability management. Ability to proxy into the legacy environment to manage devices we've never been able to access. Opsware NAS allows for the sharing of information

Research group?

Can someone please post comments on these questions. Does anyone here work for a software or IT company? If you do, and you design product, or deal with software/IT architectures, how much time do you spend doing research? How much of that time do you spend using Analysts and web searches? Qualifying technology and direction? Do you have an internal department which does research, and helps coordinate that stuff? I find that lots of people who use the analyst seats are the senior management, and those people don't need to use analysis. I am fighting over Gartner right now, but it's very frustrating.

DST?

It's funny how many of my house appliances and cars and such just have the Daylight Saving Time rules hardcoded. It's going to be a pain, because not only will they not change this weekend, but they will change in a month or two. I've heard a little press about the change, but it will cause outages and issues. It will not be the non-issue that year 2000 was when it happened. As for my company, it has been in planning since August 2006, but it wasn't seriously engaged until January 2007. We've been frantic trying to get things fixed in time. We have done as much as we can, so we'll see how it affects us. If we had better auditing and deployment tools this would have been a lot easier to manage. I am hoping that this gives me the ammunition that I need to convince people to start taking this more seriously. We will start a larger Opsware SAS pilot once DST has passed, and we've fixed those issues which popped up.
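The hardcoding problem is easy to see with a little date arithmetic. The Python sketch below is just an illustration (the helper names are made up): it computes the US DST start date under the rule that applied from 1987 through 2006, the first Sunday in April, and under the rule in effect from 2007, the second Sunday in March. A device with the old rule baked in changes its clock on the wrong day.

```python
from datetime import date, timedelta

SUNDAY = 6  # datetime.date.weekday() counts Monday as 0 and Sunday as 6


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th given weekday in a month (n starts at 1)."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset + 7 * (n - 1))


def old_rule_dst_start(year):
    """US rule used from 1987 through 2006: first Sunday in April."""
    return nth_weekday(year, 4, SUNDAY, 1)


def new_rule_dst_start(year):
    """US rule starting in 2007: second Sunday in March."""
    return nth_weekday(year, 3, SUNDAY, 2)


if __name__ == "__main__":
    new_start = new_rule_dst_start(2007)
    old_start = old_rule_dst_start(2007)
    print("New rule start:", new_start)
    print("Old (hardcoded) rule start:", old_start)
    print("Days a hardcoded clock is off:", (old_start - new_start).days)
```

Anything still using the old rule will be an hour off for the whole stretch between the two dates, and will then make its own change on the old date.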

Coradiant Meetings @ Montreal

Had a great trip to Montreal to meet with the Development and R&D people for Coradiant. This product has been great for us. There is some learning curve to using the product effectively, but there is no shortage of uses for their excellent products. They are always thinking ahead, and we are trying more and more to leverage their technology for monitoring, troubleshooting, and analytics. I met with some of the product folks, support people, and development leads. Really good talent making great tools. Keep up the good work.

Opsware NAS progress, Alterpoint setbacks.

Opsware has been very helpful to us with the NAS product. It's been doing everything we need and then some. The reporting is excellent as well. We are going to present the solution soon to the rest of network engineering and architecture. Alterpoint was supposed to start over a week ago, but they had to push it off due to concerns they were having with the deal. I think their product is excellent, and I'd like to get it in the hands of the folks who use these tools day to day and get an honest opinion of it. Alterpoint has a great reputation, and I'm looking forward to evaluating the solution.

MOM 2007 RC2

I installed MOM, and I am playing with it. We may decide to evaluate it as part of the consolidation project. It's a lot better now with this release, and the Linux support seems more solid (provided by Quest Software). Still have a lot to test (such as network devices, etc). More later.

Vista on primary machine

I took the plunge 2 weeks ago, and installed Vista Ultimate on my primary work laptop. I love the new OS. It's a lot faster, easier to use, and better. I have been running the betas and release on my main desktop at home, but I was prevented from using it for work since we didn't have a workable Nortel Contivity VPN client. Now that I have a beta that finally works, it was all systems go.
Issues:
We disable the built in firewall here at work, so I am running the Jetico Firewall betas which work on Vista. Zonealarm doesn't work on it, which is odd, since they built the Windows firewall initially.
Our antivirus doesn't support Vista. I am running the open source clamwin, which seems to work well. I don't get viruses anyways.
The built in indexing was working fine for the first week, but now it doesn't index my pst that is open in Outlook. I am using the inferior Google Desktop, but it works fine :)
Things I like:
Sidebar is really nice, there ar

Opsware NAS POC / Alterpoint

The POC went off without a hitch regarding the network tools. So far it's been very helpful in reporting and understanding our infrastructure. The product is working perfectly against our requirements. Alterpoint was scheduled to start the POC today, but we had to move it due to some political issues. It's been quite confusing, but they seem to have sorted things out internally. The sales person I am dealing with has been quite removed from the opportunity, but they are fixing that.

Forward plans

Just posting a note that we should have much more solid plans after the 5th. This includes our forward plans for what we are planning on tackling and when. We will also know how the resourcing will look, as this is a lot of additional work for all of our teams.

Application management

I work really hard to get a tool that people need across the organization, and I can’t seem to get any takers for trying it. People are so busy doing the day to day that they are unable to step back and spend time on making efficiencies. We spend a lot of time chasing issues, and we have no way to audit configurations, servers, and applications. If people were to step back and actually look at the technology they maintain and support and the tools they use, they would have a lot more time to manage projects and would spend a lot less time doing operational work. I’ve asked my boss to help me work out these issues, and get people to pay more attention. The last thing we need is a tool which is in use by our groups in TechOps, but not in use by the application support, QA, and development people. Another frustrating Friday... I'll be at work until 7:30 again :(

Datacenter Automation Decisions - Rev2

As referenced by my last post, you can see this is high priority for everyone at my company. We have decided that Bladelogic can fit one area very well, but the vision, strategy, and direction of the company is not in line with our needs. Bladelogic had a stronger POC team, and better talent as well. If Opsware can do what we need for the application configuration management and deployment, then it would be the best choice for the following reasons:
Dependency mapping and use of the agents to do that. The EMC/nlayers solution is better but not feasible for our network.
Opsware has an excellent visual application manager which can help us troubleshoot problems and changes quickly.
End to end view of Network and soon Storage assets. We have started a POC with the NAS product, and it can deliver very good network data to the server tool. Alterpoint, who Bladelogic partners with for network info in the tool, is next week. The integration is not as tight, which is understandable. S

Consolidation

As part of our overall technology consolidation across my company we have determined a high priority list of items as follows:
Technology        Sub-Technology                            Need  Difficulty  Phase
Server            Application Deployment                    10    6           1
                  Basic Host Agent                          10    2           1
                  Realtime Performance/Capacity Planning    7     2           1
                  Bare Metal Provisioning                   2     3           5
                  Configuration Management                  4     4           5
                  Patching                                  4     1           1
                  Asset Management                          8     2           2
Database          Realtime Performance/Alarming             10    8           3
                  Configuration Management                  2     5           4
Application/HTTP  Real User                                 10    3           1
                  Synthetic Static/Transaction              5     5           4
                  Outsourced Monitoring                     8     6           4
                  Java/NET                                  6     9           4
Reporting         Outsourced Web Analytics                  2