Friday, December 28, 2007

Away

I have been very tied up with finals for my classes, but I've enjoyed a quiet period at work over the last week. We have a lot of projects and upcoming initiatives for 2008, which I will post about when I am back. I am going out of town for a week; more later.

Have a happy New Year!

Friday, December 7, 2007

SevOne

An update on this: we purchased the product, and we are deploying it soon. I can't wait to get rid of Vitalnet and have a product that is easy to maintain and scale!

Future Tools

We are still having a lot of problems getting value from NNM for simple topology-based suppression and up/down monitoring. We have a new account team and a new technical lead, so maybe that will help HP fix our problems. The other option is NNM 8, which we are looking at, but it will be missing some features until 8.1 comes out.

We had a very good evaluation and POC with IBM Precision and Impact. Precision looks good, but we need to spend more time with it and compare the effort needed to keep it up and running against NNM.

Impact was an easy sell. It will help us immensely with event enrichment and some of the automations we currently have in Omnibus. We'll see how the pricing negotiations work out.

Wednesday, November 7, 2007

So many tools, so many problems

It seems like many of my more "complex" legacy tools are not set up "right" for what we want to do. I'm always struggling with some of these products.

A good example today is HP NNM, which is our MPLS and topology manager. The main goals of the tool are to build a layer 2/3 topology and to do server and upstream suppression to avoid alarm floods. Of course none of it works right, and it's a constant pain in the rear. We have tried on many occasions, and with numerous suggestions, to fix this with the help of HP.
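
Upstream suppression boils down to a simple rule: if a device is unreachable and something on its path back to the poller is also down, the downstream alarm is probably collateral. The Python sketch below shows only that rule. The single-parent topology (each device reached through exactly one upstream device) and the device names are hypothetical simplifications; this is not how NNM models a real layer 2/3 topology with redundant paths.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical topology model: each device maps to the single upstream device
# it is reached through (None means it sits next to the polling station).
Topology = Dict[str, Optional[str]]


def upstream_path(topology: Topology, device: str) -> List[str]:
    """Return the devices between `device` and the poller, nearest first."""
    path: List[str] = []
    seen = {device}
    parent = topology.get(device)
    while parent is not None and parent not in seen:  # stop if the data has a loop
        path.append(parent)
        seen.add(parent)
        parent = topology.get(parent)
    return path


def classify_alarms(topology: Topology, down: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split down devices into probable root causes and alarms to suppress."""
    root_causes: Set[str] = set()
    suppressed: Set[str] = set()
    for device in down:
        if any(hop in down for hop in upstream_path(topology, device)):
            suppressed.add(device)  # something upstream is down too
        else:
            root_causes.add(device)  # nothing upstream is down: raise this one
    return root_causes, suppressed


topology = {
    "core1": None,
    "dist1": "core1",
    "access1": "dist1",
    "server1": "access1",
    "server2": "access1",
    "dist2": "core1",
}
roots, quiet = classify_alarms(topology, {"dist1", "access1", "server1", "server2"})
print(sorted(roots), sorted(quiet))  # prints ['dist1'] ['access1', 'server1', 'server2']
```

Four devices are down, but only one ticket (dist1) should be opened; the rest are explained by it.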

Finally I got on the phone today with a product manager and got the real story. We can implement some changes that would fix it for now; the downside is that we'd lose some of our DR capabilities. The product wasn't designed to operate in a DR configuration (strange for a market-leading product). They have a new version coming out in roughly a year which should fix this, and we'd have to re-architect the infrastructure yet again.

The good news is that we are ahead of HP: we've been using products they have only recently acquired. Our use of these products, and the fact that we are ahead of the game, makes them want to talk to us. This is a good thing, because we can help shape the future of their products. It helps us and it helps them.

In the meantime, I'm deciding whether we can afford a short-term fix followed by a major re-architecture in a year. I want to investigate IBM Precision, as it's really slick, and its integration with Netcool Omnibus/Webtop would make our lives easier. It's really going to be a matter of cost and functionality. We looked at Precision several years ago, when it was a Micromuse product (IBM has since acquired Micromuse), and it did not "work" in our environment. It was a young product, and it has been completely rewritten since then. I have high hopes; we'll see how the POC goes, which should be well underway in early December.

Monday, October 22, 2007

VOIP Monitoring Project

Initially we deployed VOIP to save on interoffice communication. Given our presence in India and its growth, the savings are huge. We then decided to modernize our call centers, which meant deploying very network-centric Cisco solutions built on CallManager and the rest of the suite.

In the next phase we realized that we were spending a huge amount of money on conference calling services. We use a LOT, from every vendor you can think of, and we even max out some of our vendors when we run conference calls for earnings periods. So we started to implement Cisco MeetingPlace. The product has been working well, and we monitor it the same way we monitor any set of Cisco devices and a few Windows machines.

One of our CEOs was on a conference call when it started to have voice quality issues. That brought up the fact that we are missing many ways to debug and fix issues relating to voice and overall network quality. We came up with the following criteria, and a set of vendors we are looking at:

Metrics

These metrics must apply anywhere the KPIs are used: jitter, delay, R-Factor, MOS, codec transcoding, and QoS (load, CPU, memory).
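
The R-Factor and MOS metrics are linked: the ITU-T G.107 E-model computes a transmission rating R (roughly 0 to 100) from delay, loss, and codec impairments, then maps R to an estimated MOS between 1 and 4.5. The Python sketch below covers only that final mapping step, which is handy for sanity-checking the numbers a vendor's probe reports; computing R itself is not shown.

```python
def r_to_mos(r: float) -> float:
    """Estimate MOS from an E-model R-factor using the ITU-T G.107 mapping."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


for r in (50, 70, 80, 90, 93.2):
    print(f"R={r:5.1f}  MOS={r_to_mos(r):.2f}")
```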

Monitoring and Packet Capture (vendors under evaluation: Network General, NetQOS, WildPackets)

  1. General: Must provide 7 days of buffered, full-packet capture at 4 Gb/s.
  2. Monitoring: Provide network and service monitoring based on the metrics defined above.
  3. Operations: Provide real-time network packet capture to monitor and troubleshoot problems quickly.
  4. Design and Planning: Ability to conduct VOIP quality testing and simulate traffic to measure and fix potential problems.
  5. Design and Planning: Provide MOS scores as well as R-Factor values for each call.
  6. Operations: Analyze and track against KPIs to ensure voice quality and performance.
  7. Operations: Ability to provide per-call analysis and support multiple voice signaling protocols.
  8. Operations: Analyze packet-level details of RTP, RTCP, MGCP, SIP, and H.323 streams.
  9. Operations: Evaluate packet delay variation, packet loss, and jitter.
  10. Operations: Provide Application Response Time (ART) analysis.
  11. Operations: Provide full 7-layer decodes, alarms, and triggers.
  12. Design and Planning: Provide call playback or replay of captured data to hear and see VOIP impairments.
  13. Operations: Ability to extend monitoring and troubleshooting to remote locations.
  14. Operations: Perform analysis of mission-critical applications both in real time and post-capture.
  15. Operations: Provide efficient per-call filtering to expedite problem resolution and minimize packet transfer.
  16. Operations: Provide clear identification of VOIP calls, video conference calls, and other services.
  17. Operations: Utilize standard SNMP and RMON probes, with the ability to monitor relevant OIDs.
  18. Operations: Capable of importing and exporting files in standard formats.
  19. Monitoring: Monitor Cisco and other telephony devices from Avaya or Nortel.
  20. Operations: Analyze SCCP call setup.
  21. Design and Planning: Measure network performance of RTP, MGCP, and H.323.
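
Several of the requirements above (RTP analysis, packet delay variation, jitter) rest on the interarrival jitter estimator defined in RFC 3550: for consecutive packets, compare the change in arrival time with the change in RTP timestamp, then smooth the absolute difference with a gain of 1/16. A minimal Python sketch, assuming arrival times have already been converted to RTP timestamp units and an 8 kHz clock (as for G.711) when converting to milliseconds:

```python
from typing import Iterable, Optional, Tuple


def interarrival_jitter(packets: Iterable[Tuple[int, int]]) -> float:
    """RFC 3550 interarrival jitter estimate.

    `packets` yields (rtp_timestamp, arrival_time) pairs in arrival order,
    both expressed in RTP timestamp units.
    """
    jitter = 0.0
    previous: Optional[Tuple[int, int]] = None
    for timestamp, arrival in packets:
        if previous is not None:
            prev_ts, prev_arrival = previous
            # D(i-1, i): change in transit time between consecutive packets
            d = (arrival - prev_arrival) - (timestamp - prev_ts)
            # Exponential smoothing with gain 1/16, as specified in RFC 3550
            jitter += (abs(d) - jitter) / 16
        previous = (timestamp, arrival)
    return jitter


def units_to_ms(value: float, clock_rate_hz: int = 8000) -> float:
    """Convert RTP timestamp units to milliseconds (8 kHz clock assumed)."""
    return value * 1000 / clock_rate_hz


# 20 ms G.711 packets (160 units apart); the third packet arrives 10 units late.
stream = [(0, 0), (160, 160), (320, 330), (480, 480)]
j = interarrival_jitter(stream)
print(f"jitter = {j:.3f} units = {units_to_ms(j):.3f} ms")
```

Perfectly paced packets give a jitter of zero; a single late packet disturbs two consecutive transit-time differences, which is why the estimate keeps rising on the fourth packet.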

Web Reporting / Dashboard View of Information and Events

  1. Executive: Reports generated in HTTP web format for easy viewing, with no plug-ins or Java required on the client.
  2. Executive: Reports can be formatted or imported into a dashboard-style display.
  3. Executive: Provide canned and custom reports.
  4. Executive: Historical data archived or saved for 1 year, and exportable in an automated manner to standard DB platforms (MSSQL, Oracle, or MySQL).
  5. Executive: Able to import or export files from third-party or freeware monitoring tools to create reports.
  6. Executive: Report on key performance indicators and measurements.
  7. Executive: Provide measurement of SLA levels.
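
For a sense of scale, the 7-day, 4 Gb/s capture requirement works out to about 302 TB of raw storage if the links were saturated the whole time. The Python arithmetic below assumes 4 gigabits per second sustained, no compression or packet slicing, and decimal terabytes; real traffic averages lower, so treat the result as an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def capture_storage_bytes(rate_bps: float, days: float) -> float:
    """Raw bytes needed to buffer every packet at a sustained bit rate."""
    return rate_bps / 8 * days * SECONDS_PER_DAY


needed = capture_storage_bytes(4e9, 7)
print(f"{needed / 1e12:.1f} TB")  # prints 302.4 TB
```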

Traffic Aggregation

We have had trouble getting proper visibility into network traffic. NetFlow is useful, but limited in the data it can give you, and more and more products need to see the actual traffic in order to work properly. Back in June I put together a business case for why we need to do this:

  1. Coradiant monitoring of the traffic

    We seem to miss pockets of traffic, and we cannot get a proper view of backend web server response and quality by looking at data on the edge of the datacenter.

  2. VOIP monitoring

    This will be the topic of my next post

  3. Sniffing and debugging issues
  4. IDS deployment

    We currently have scattered IDS deployments, which we need to centralize.

It's finally gotten to the point where we want to implement it. One of our network engineers came up with a nice design using Apcon switches and Cisco switches to make multiplexing the data easier. This will allow us to set up monitoring and SPAN sessions during the day without reconfiguring production switches. It's something we really need, and it will help immensely.

Homegrown Tools

Due to recent organizational changes, I am working closely with other parts of the business. At the same time, we are buying another large company. We now have 3 strategies in play across the company, which I find very interesting and conflicting at the same time. I am open to change regardless of direction; it just needs to make sense financially, have a solid technological foundation, and meet the needs of the internal and external consumers of our tools.

One of our divisions moved off IBM tools for server monitoring to modified open source tools. These tools are cost effective to deploy to many machines, but they need full-time developers on staff. My team uses mostly HP OV products on the systems, along with a lot of other tools for other areas, and we have a very good deal from HP on agent pricing. The company we are buying is a big IBM shop; all they use is IBM. This means we have 3 camps:

  1. Open/Free
  2. Closed/Diverse
  3. Closed/Single Vendor

Let's see which will win over the next couple of years :)

Appliances and Upgrades

We have a number of "appliance" devices whose hardware the vendor has slowly refreshed. Our devices are 2.5 years old, and the new devices came out earlier this year. New revisions of the software will not support all of their features on the "older" appliances, which simply have older CPUs and less memory.

They want to charge me to upgrade the appliances (I haven't seen the quote yet), and I think it will be more than just the cost of the new hardware. I'm kind of upset about this strategy to get some sales in before the end of the year… We still pay maintenance, which should include all of the upgrades we need…

Not nice!

Thursday, September 27, 2007

Capacity Management

We have a serious issue with capacity management and with deciding what our options are for doing it well. As I see it, there are the following types of methodologies, in order from least to most sophisticated:

  1. Ignore capacity and just fight the fires as they arise.
    1. Pros: Easy, no additional work involved.
    2. Cons: Unhappy customers, lots of fires, no way to budget for growth or system changes.
  2. Treat capacity as an overall metric across unrelated systems, networks, and software.
    1. Pros: Gives you an overall idea of how much of your capacity is in use.
    2. Cons: No actionable information, thus you don't actually fix any capacity issues.
  3. Treat capacity as alarms, where we get a message and a ticket based on a capacity being passed.
    1. Pros: Gives you actionable alarms on capacity problems.
    2. Cons: Doesn't give you any priority, or prevent the alarms from being ignored (crying wolf).
  4. Treat capacity as reports.
    1. Pros: Gives you an idea of what you must take action on.
    2. Cons: No idea when you will run out of capacity, floods email boxes with reports.
  5. Treat capacity as a statistical analysis.
    1. Pros: You can properly analyze the timeline for upgrades, hot spots, and products of importance.
    2. Cons: Requires more investment in software, and people.

We are at various maturities, but overall the idea is to get closer to number 5.
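
The simplest form of approach 5 is a linear trend: fit usage over time by least squares and project when the line crosses the capacity limit. A minimal Python sketch, assuming steady growth and a hypothetical 100 GB volume sampled weekly; seasonal peaks, step changes, and peak-versus-average sizing would need more than a straight line.

```python
from typing import Optional, Sequence, Tuple


def projected_exhaustion(samples: Sequence[Tuple[float, float]],
                         capacity: float) -> Optional[float]:
    """Fit usage = slope * t + intercept by least squares and return the t at
    which the fitted line reaches `capacity` (a value before the last sample
    means capacity is already exceeded), or None if usage is not growing."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("samples need at least two distinct times")
    s_tu = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = s_tu / s_tt
    intercept = mean_u - slope * mean_t
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Weekly disk usage in GB on a hypothetical 100 GB volume.
weekly = [(0, 50), (1, 52), (2, 54), (3, 56)]
print(projected_exhaustion(weekly, 100))  # prints 25.0
```

Unlike a threshold alarm, the result is a date you can budget against: here the volume fills around week 25.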

Thursday, August 30, 2007

Appliance v. Software

Along my path as an IT god :) I have found myself going back and forth on the appliance model:

Year 2000:
    Why would I want to buy an appliance, when it's the same software, and I can't choose the exact form factor or implementation of hardware?
Year 2003:
    I can just buy this box and plug it in and I have a solution. When I have a problem I just swap out the box and load a config file onto it.
Year 2005:
    If I use my own hardware I can use the same box for 5 things at once, thus allowing the box to do more with the same power and space consumption. Especially when I introduce virtualization.
Year 2007:
    I'm so sick of vendors blaming hardware, and having the battles between hardware and software vendors. The worst is when you have IBM, Microsoft, and vendor x all fighting. I also like the fact that I know my appliance will be in warranty, because it's a single renewal.

Not to mention some of these appliance makers that have been around for a while are starting to refresh hardware. That's a brilliant idea! I get new boxes with new features for a fraction of the cost of the older box!

Appliance is my vote now!

Open Source Downsides

Open source is awesome, whether you need a Swiss Army knife or some glue between two solutions that don't quite mesh the way you need them to. It's also great for analysis and for basic tasks that you need at your fingertips. It's versatile and deep in functionality. The best part is the cost (aside from the legal ramifications of using it commercially). The downside is that in a large, complex environment, where I am responsible for so much diversity, open source tools leave unmet many of the needs that would warrant selecting them as a platform:

  1. Manageability
    1. Policy
    2. Templates
    3. Audits
    4. Grouping
  2. Deployment
    1. SSH
    2. Telnet
    3. WMI
    4. remote command
  3. Scale
    1. Distributed systems globally
    2. Failover
  4. Reporting
    1. Complex reporting needs
    2. Logging sophistication

Most open source tools miss these needs completely, because meeting them turns a Swiss Army knife into a nuclear weapon. There is nothing wrong with that, but either companies need to understand these needs and implement them around open source tools (as GroundWork is slowly attempting to do), or we need to stop using open source management and monitoring tools in large-scale environments.

I really hope that these companies take a stand and do this, as it will help reduce cost, increase choice, and make a better, more maintainable system for me to manage and implement. The flexibility is key, and open source is the king of flexibility.

Xangati Install and comments

I just got this small startup's new product/appliance installed today. It's an interesting concept: take the technology developed for IDS and NBADS and use it to build a monitoring and profiling system for network issues and throughput stats. It's all NetFlow based, so it's a quick implementation. We'll see how it runs over the next 8 weeks. Looking forward to working with them!
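
Since the product works entirely from NetFlow, it helps to see what a flow record actually contains. Below is a minimal Python parser for the NetFlow version 5 export format (a 24-byte header followed by 48-byte flow records). It is an illustrative sketch, not how Xangati processes flows: it ignores timestamps and sampling, and does not handle the template-based v9/IPFIX formats. Each record carries addresses, ports, protocol, and packet and byte counts, but no payload, which is why flow data cannot replace full packet capture.

```python
import socket
import struct
from typing import List, NamedTuple

# NetFlow v5 wire layout (network byte order): 24-byte header, 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


class Flow(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    packets: int
    octets: int


def parse_v5(datagram: bytes) -> List[Flow]:
    """Decode the flow records in one NetFlow v5 export datagram."""
    version, count = HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"not NetFlow v5 (version {version})")
    if len(datagram) < HEADER.size + count * RECORD.size:
        raise ValueError("truncated datagram")
    flows = []
    for i in range(count):
        (src, dst, _nexthop, _in_if, _out_if, packets, octets, _first, _last,
         src_port, dst_port, _pad1, _tcp_flags, protocol, _tos,
         _src_as, _dst_as, _src_mask, _dst_mask, _pad2) = RECORD.unpack_from(
            datagram, HEADER.size + i * RECORD.size)
        flows.append(Flow(socket.inet_ntoa(src), socket.inet_ntoa(dst),
                          src_port, dst_port, protocol, packets, octets))
    return flows


# Build a one-record datagram to exercise the parser (values are made up).
header = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0)
record = RECORD.pack(socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
                     socket.inet_aton("0.0.0.0"), 1, 2, 10, 1500, 0, 0,
                     5060, 5060, 0, 0, 17, 0, 0, 0, 24, 24, 0)
print(parse_v5(header + record))
```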

Wednesday, July 25, 2007

Congrats to Opsware

I've been a big fan of the direction and type of software Opsware has been working on since I got involved in this area about 13 months ago. The company is really on target for the needs of companies the size of mine. It's great news that HP bought them on Monday. HP is a strategic partner of ours, and I'm very interested to see what better integration of the products can do. Specifically:

Server/App : OVO, Sitescope, BAC, BPM, and Opsware SAS

Network : NNM, RAMS, and Opsware NAS

I'm curious what integration ideas will come from Opsware PAS and its integration with ServiceCenter and AssetCenter. There are very interesting possibilities, and we should see a lot of good things over the next 12 months. You are now seeing the fruits of the Mercury acquisition in the replacement of some of the weaker HP products with more solid offerings from Mercury, although the migration to these products is still a bit of a work in progress.

HP's big missing piece is still event management. OVO is good for this, but it's not as customizable as something like IBM/Netcool.

Monday, July 9, 2007

Network graphing tools

We are currently looking to replace Vitalnet, eHealth, and MRTG/Cricket/Cacti. The products we are looking at are NetScout (which we have a small install of), SevOne (we have a POC running with them now), Netreo, and Nimbus. We have a specific list of requirements, but we are also looking for the product to fill other needs as well. More on this as we better define them.
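
Whichever tool replaces MRTG/Cricket/Cacti, the core graphing math is the same: poll the IF-MIB octet counters (ifInOctets/ifOutOctets, or the 64-bit ifHCInOctets/ifHCOutOctets on fast links), difference consecutive samples, and divide by the polling interval. A minimal Python sketch, assuming at most one counter wrap per interval. A device reboot also resets the counter; real tools detect that by checking sysUpTime, which is not shown here. On a saturated gigabit link a 32-bit octet counter wraps roughly every 34 seconds, so 5-minute polling really needs the 64-bit counters.

```python
def octet_delta(previous: int, current: int, counter_bits: int = 32) -> int:
    """Octets counted between two polls, allowing for one counter wrap."""
    return (current - previous) % (2 ** counter_bits)


def utilization_pct(previous: int, current: int, interval_s: float,
                    if_speed_bps: int, counter_bits: int = 32) -> float:
    """Average link utilization (%) over one polling interval."""
    bits = octet_delta(previous, current, counter_bits) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Two polls 300 s apart on a 100 Mb/s interface; the 32-bit counter wrapped.
print(utilization_pct(4_294_000_000, 1_500_000_000, 300, 100_000_000))
```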

Thanks.

SiteScope, BPM, and BAC

We have both products deployed, and we are rolling to production very soon. We just have a bit more testing to do, including integration testing with OVO for event flow. Otherwise we will push agentless monitoring as much as possible, and only use OVO agents when we need more detailed stats or script execution. Hopefully we can use Opsware PAS to minimize the use of those agents even further.

Friday, June 15, 2007

Host Monitoring – Agent vs Agentless

We monitor all of our hosts with an agent based solution. You get the most flexibility, but the administration and upgrade of agents is time consuming and people intensive. I want to move us towards having both solutions. A development box doesn't need the ability to run complex operational scripts, and we overpay for that monitoring. In the future I hope to re-prioritize the monitoring tools for the needs of the environment. More on this as we move forward with the transition from HP to the Mercury tool.

Log processing

I am looking for a way to help network support deal with the huge number of log entries coming from the firewalls. I could use a cool tool like Splunk, but there is so much data that the cost is high. I may look at a sniffer-based logging product instead. I have to talk to more people, since I'm not sure what the best tools are for the job at hand.
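
One way to cut the volume before paying to index it is to pre-aggregate: collapse repeated denies into counts per source and destination port. A minimal Python sketch follows. The key=value log format is hypothetical; real firewall logs (Checkpoint, PIX syslog, and so on) each need their own parsing.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log format, one event per line, e.g.:
#   2007-06-15T10:00:01 fw1 action=deny src=10.1.2.3 dst=192.0.2.10 dport=445
FIELD = re.compile(r"(\w+)=(\S+)")


def summarize_denies(lines: Iterable[str],
                     top: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Collapse denied connections into counts per (source, destination port)."""
    counts: Counter = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))
        if fields.get("action") == "deny":
            counts[(fields.get("src", "?"), fields.get("dport", "?"))] += 1
    return counts.most_common(top)


sample = [
    "2007-06-15T10:00:01 fw1 action=deny src=10.0.0.5 dst=192.0.2.1 dport=445",
    "2007-06-15T10:00:02 fw1 action=deny src=10.0.0.5 dst=192.0.2.2 dport=445",
    "2007-06-15T10:00:03 fw1 action=accept src=10.0.0.6 dst=192.0.2.1 dport=80",
    "2007-06-15T10:00:04 fw1 action=deny src=10.0.0.7 dst=192.0.2.1 dport=23",
]
for (src, dport), hits in summarize_denies(sample):
    print(f"{src} -> port {dport}: {hits} denies")
```

Support staff then read a short ranked summary instead of every raw line; the raw logs can still be kept for forensics.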

Thursday, May 31, 2007

Surface computing

Surface computing in depth demo     http://www.popularmechanics.com/technology/industry/4217348.html

Microsoft Milan site            http://www.microsoft.com/surface/

Really cool stuff from Microsoft. The video is very impressive. I saw the beginnings of this back in the day. There are a couple of cool older projects which remind me of much of the stuff that I see in the video:

Bumptop     http://youtube.com/watch?v=M0ODskdEPnQ

http://www.bumptop.com

MIT Media Lab AudioPad    http://youtube.com/watch?v=lxAD1QIv_dw

http://www.jamespatten.com/audiopad/index.php

Enjoy these cool new pieces of technology.

HP Software Universe

I will be attending HP Software Universe in mid June. I am particularly interested in this show for many reasons:

  1. Mercury integration, specifically BAC/BPM and SiteScope into the old OV product line.
  2. Evolution of NNM.
  3. Migration of the 2 Helpdesks and Asset systems to a single HP software system.
  4. ITIL v3 changes and what HP is doing to support it.
  5. MPLS capabilities across other HP tools.
  6. UDDI (Systinet) progress.

Interop Las Vegas - 2007

I was privileged to speak at the Interop show in Las Vegas. My talk was on datacenter modeling, specifically on using dependency mapping and the CMDB to drive monitoring. The discussion went well, I got quite a few interesting questions, and overall the feedback on the session was very good.

The rest of the show was decent, but the educational content was a bit too high level. I would have liked to see the kind of session ranking you typically see at other shows; a good range would be introductory, executive, intermediate, advanced, and expert.

The show floor was huge and excellent. I met quite a few interesting vendors that I am interested in pursuing. Overall the show was useful but not amazing. The all-day workshops looked more useful and in depth; I would have liked to attend one of them, but I didn't have the time.

Monday, May 7, 2007

Configuration management requirements

I seem to be working on several configuration management projects at once. Everyone uses a different set of scripts, or tools. No one has any documented information about what they are solving with the systems, and what they want to do. They seem to have started with the "backup the configuration of x" requirement, and evolved into:

  1. Monitoring (Error detection, service outages, checking for known problematic conditions)
  2. Capacity management
  3. Reporting

Documenting both the intended and the current uses of these systems would make standardizing on software much easier, but no one has done it. I'm battling to accomplish this, and it's very slow and painful. I think authority will help force changes, which should start to happen here.
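
The "backup the configuration of x" requirement naturally grows into change detection: compare each new backup with the previous one and report only real differences. A minimal Python sketch using the standard library's difflib; the volatile-line prefixes are IOS-style examples and should be treated as assumptions to adjust per platform.

```python
import difflib
from typing import List

# Lines that change without any real configuration change. These prefixes are
# examples only (assumed, IOS-style); adjust them per platform.
VOLATILE_PREFIXES = ("! Last configuration change",
                     "! NVRAM config last updated",
                     "ntp clock-period")


def normalize(config: str) -> List[str]:
    """Drop volatile lines so they never show up as changes."""
    return [line for line in config.splitlines()
            if not line.strip().startswith(VOLATILE_PREFIXES)]


def config_changes(previous: str, current: str, device: str) -> List[str]:
    """Unified diff between two backups of one device, ignoring volatile lines."""
    return list(difflib.unified_diff(normalize(previous), normalize(current),
                                     fromfile=f"{device} (previous)",
                                     tofile=f"{device} (current)",
                                     lineterm=""))


old = ("hostname r1\n! Last configuration change at 10:00 by alice\n"
       "interface GigabitEthernet0/1\n description uplink-old\n")
new = ("hostname r1\n! Last configuration change at 11:00 by bob\n"
       "interface GigabitEthernet0/1\n description uplink-new\n")
print("\n".join(config_changes(old, new, "r1")))
```

An empty result means nothing meaningful changed, which is what lets the same backup job double as a monitoring and audit feed.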

Thursday, April 19, 2007

Netbrain Workbench

We are using a really cool piece of software called Netbrain. The tool has many overlaps with lots of tools we use and are looking at. It's a really nice "workbench" product for troubleshooting and designing changes. The person who runs the company is incredibly smart, and is a CCIE. It's new and early for this firm, but it's worth checking them out. The product has many interesting ideas and capabilities.

Speaking Engagements – Splunk / Interop

I will be speaking on behalf of Splunk about how we use the software and where we want to go with it. It's a very interesting product, which is used in other parts of my company as well, and I am happy to share our experiences with this excellent software. The event is in New York on May 1st from 10am-1pm. Please feel free to email me or contact Splunk if you wish to attend.

I am also going to Interop and I will be speaking about some of our management issues around complexity and the relationships between applications, servers, and networks. The show looks very interesting and there are several sessions I am looking forward to attending. I will post more as it comes closer. We are still 1 month out.

Wednesday, April 11, 2007

London and Security/Privacy

I was in London for a week on vacation, hanging out with my brother for his 30th. It's crazy over there how little privacy you have. There are cameras EVERYWHERE, and where there aren't, they have trucks that drive around with mounted cameras. There are speed cameras, cameras in the cabs, cameras in the tube, and cameras in bars.

Over the last 5 years London has overtaken NY as a financial and banking capital of the world, with more deals and companies operating there. The reason is over-regulation in the USA, much of it due to SOX and other laws. Now that the NYSE owns Euronext, which operates many of the exchanges in the EU and UK, will things change? Will the changes mean more regulation in the UK, or will the US realize these oppressive regulations are hurting the US economy?

With all of the security and regulation in focus everywhere, it's important to know the landscape. I feel the way to do that is by gaining relevant knowledge, which I do through certifications, studying security, and learning complex systems. The other aspect of this is the law and regulations. I am working on going to law school part time, but in the short term I am getting my CISA certification in June. I got my CISSP in 2006. Stuff to think about.

Wednesday, March 21, 2007

Onaro Sanscreen

We have been trying (since our last major storage outage in November 2006) to get this product configured properly for our environment. The issue is that we run so much old software to manage our storage that it's a constant struggle to get things working together. Our storage team is too busy to devote time to getting the products that would help them operational. We need to come up with a plan to keep management software current, and this includes all of our tools.

The product is excellent for the operational needs, but it's not something that I think our team would use well in engineering. I do see value in the storage architects using the product. We'll see how it pans out. The salespeople at Onaro are very aggressive, which can make them hard to deal with.

On a side note, their performance product looks very nice, and it's something the business side has been trying to get for years.

Network configuration management – Wrap Up

I am working on wrapping up a couple of projects so I haven't been blogging as much. I am currently finishing up our network configuration management project. It's been going very well and we are finding so many uses for the technology:

Here is a snippet from our business case on the product:

This product enables a lot of functionality that we don't currently have from the CiscoWorks LME implementation. The major advantage is multi-vendor support (Cisco Wireless, Cisco, Nortel, Checkpoint, PIX, and F5). The product tracks all changes, captures configurations, and allows for software and configuration upgrades centrally. It allows for dynamic, complex grouping, enabling us to track the environment as devices are added and removed. The tool allows for policy management, inventory, and vulnerability management. It can also proxy into the legacy environment to manage devices we've never been able to access. Opsware NAS enables the sharing of information through advanced reporting and dashboards, and these reports can be used for reporting to auditors and customers (the business).

Other major benefits of the tool:

  1. Switch port utilization and capacity
  2. Checking and fixing DST compliance across network devices. (corrected several hundred devices). This is what the Server product would have helped us with as well.
  3. Generate inventory reports to allow for verification of maintenance renewals. Reports included serial numbers, modules, models, and IOS versions.
  4. Update access controls, and enable passwords across large numbers of devices easily.
  5. Port capacity planning, and switch port utilization for future switch purchases.
  6. Dynamic grouping allows for inventory to be grouped and reported on by business ownership.
  7. The GSOC was given a large list of IP addresses of virus-infected machines. The only way to find these machines previously was to hop from switch to switch, tracing the MAC and IP addresses through MAC tables, CAM tables, etc., until a switch port could be identified. Opsware does this in seconds with its search for addresses "seen from port" feature.
  8. Ability to track what was changed on devices, and by whom. This has been used on various occasions to avoid outages and finger pointing.
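
The manual process in item 7 boils down to two lookups: resolve the IP address to a MAC address through a router's ARP table, then find the switch port where that MAC is learned on a non-trunk port. A minimal Python sketch over hypothetical, already-collected tables; this is not how Opsware NAS implements its search.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical inputs, as they might be collected from device tables:
#   arp:        IP address -> MAC address (from a router's ARP table)
#   mac_tables: switch -> {MAC address -> port} (from each switch's CAM table)
#   uplinks:    switch -> ports that connect to other switches (trunks)


def find_access_port(ip: str,
                     arp: Dict[str, str],
                     mac_tables: Dict[str, Dict[str, str]],
                     uplinks: Dict[str, Set[str]]) -> Optional[Tuple[str, str]]:
    """Return (switch, port) where the host behind `ip` is directly attached."""
    mac = arp.get(ip)
    if mac is None:
        return None
    for switch, table in mac_tables.items():
        port = table.get(mac)
        # A MAC learned on a trunk is only passing through; keep looking.
        if port is not None and port not in uplinks.get(switch, set()):
            return switch, port
    return None


arp = {"10.1.1.50": "0011.2233.4455"}
mac_tables = {
    "core1": {"0011.2233.4455": "Gi1/1"},
    "access7": {"0011.2233.4455": "Fa0/12"},
}
uplinks = {"core1": {"Gi1/1"}, "access7": {"Gi0/1"}}
print(find_access_port("10.1.1.50", arp, mac_tables, uplinks))
```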

Wednesday, March 7, 2007

Research group?

Can someone please post comments on these questions.

Does anyone here work for a software or IT company?

If you do, and you design product, or deal with software/IT architectures, how much time do you spend doing research?

How much of that time do you spend using Analysts and web searches?

Qualifying technology and direction?

Do you have an internal department that does research and helps coordinate that work? I find that many of the people who use the analyst seats are senior management, and those people don't need the analysis. I am fighting over Gartner right now, and it's very frustrating.

DST?

It's funny how many of my household appliances, cars, and such just have the daylight saving time rules hardcoded. It's going to be a pain, because not only will they not change this weekend, but they will change in a month or two. I've heard a little press about the change, but it will cause outages and issues. It will not be the non-issue that the year 2000 was when it happened.
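
Hardcoded devices misbehave because the US rule itself changed in 2007: the old rule ran from the first Sunday in April to the last Sunday in October, and the new one runs from the second Sunday in March to the first Sunday in November. A small Python sketch computing both sets of transition dates for a given year (the function names are illustrative):

```python
import calendar
from datetime import date
from typing import Tuple


def nth_sunday(year: int, month: int, n: int) -> date:
    """n-th Sunday of the month (n = 1, 2, ...), or the last Sunday when n == -1."""
    last_day = calendar.monthrange(year, month)[1]
    sundays = [day for day in range(1, last_day + 1)
               if date(year, month, day).weekday() == calendar.SUNDAY]
    return date(year, month, sundays[n - 1] if n > 0 else sundays[n])


def hardcoded_pre_2007_rule(year: int) -> Tuple[date, date]:
    """US rule used 1987-2006: first Sunday in April to last Sunday in October."""
    return nth_sunday(year, 4, 1), nth_sunday(year, 10, -1)


def us_rule_from_2007(year: int) -> Tuple[date, date]:
    """US rule from 2007: second Sunday in March to first Sunday in November."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)


print("old rule:", *hardcoded_pre_2007_rule(2007))
print("new rule:", *us_rule_from_2007(2007))
```

For 2007 the two rules disagree for three weeks in spring (March 11 to April 1) and one week in fall (October 28 to November 4), which is exactly when the hardcoded clocks are wrong.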

At my company, the DST change has been in planning since August 2006, but it wasn't seriously engaged until January 2007. We've been frantic trying to get things fixed in time. We have done as much as we can, so we'll see how it affects us.

If we had better auditing and deployment tools, this would have been a lot easier to manage. I am hoping this gives me the ammunition I need to convince people to start taking this more seriously. We will start a larger Opsware SAS pilot once DST has passed and we've fixed the issues that popped up.

Monday, February 26, 2007

Coradiant Meetings @ Montreal

Had a great trip to Montreal to meet with the development and R&D people at Coradiant. This product has been great for us. There is some learning curve to using it effectively, but there is no shortage of uses for their excellent products. They are always thinking ahead, and we are trying more and more to leverage their technology for monitoring, troubleshooting, and analytics.

I met with some of the product folks, support people, and development leads. Really good talent making great tools.

Keep up the good work.

Friday, February 9, 2007

Linuxworld NYC Next Week

I'm going to be attending Linuxworld next week. The show was good last year. Looking forward to seeing what is new in the open source community.

Opsware NAS progress, Alterpoint setbacks.

Opsware has been very helpful to us with the NAS product. It's been doing everything we need and then some, and the reporting is excellent as well. We are going to present the solution soon to the rest of network engineering and architecture.

Alterpoint was supposed to start over a week ago, but they had to push it off due to concerns they were having with the deal. I think their product is excellent, and I'd like to get it in the hands of the folks who use these tools day to day and get an honest opinion of it.

Alterpoint has a great reputation, and I'm looking forward to evaluating the solution.

Tuesday, January 30, 2007

MOM 2007 RC2

I installed MOM and I am playing with it. We may decide to evaluate it as part of the consolidation project. It's a lot better with this release, and the Linux support (provided by Quest Software) seems more solid.

Still have a lot to test (such as network devices, etc).

More later.

Vista on primary machine

I took the plunge 2 weeks ago and installed Vista Ultimate on my primary work laptop. I love the new OS. It's a lot faster, easier to use, and better. I have been running the betas and the release on my main desktop at home, but I couldn't use it for work since we didn't have a workable Nortel Contivity VPN client. Now that I have a beta client that finally works, it was all systems go.

Issues:
We disable the built-in firewall here at work, so I am running the Jetico Firewall betas, which work on Vista. ZoneAlarm doesn't work on it, which is odd, since they built the Windows firewall initially.
Our antivirus doesn't support Vista. I am running the open source clamwin, which seems to work well. I don't get viruses anyways.
The built-in indexing was working fine for the first week, but now it doesn't index the PST I have open in Outlook. I am using the inferior Google Desktop instead, but it works fine :)

Things I like:
Sidebar is really nice, there are some good widgets for it. It doesn't slow things down like the yahoo widgets do.

Opsware NAS POC / Alterpoint

The POC went off without a hitch regarding the network tools. So far it's been very helpful for reporting on and understanding our infrastructure, and the product meets our requirements perfectly.

Alterpoint was scheduled to start the POC today, but we had to move it due to some political issues. It's been quite confusing, but they seem to have sorted things out internally. The salesperson I am dealing with has been quite removed from the opportunity, but they are fixing that.

Forward plans

Just posting a note that we should have much more solid plans after the 5th, including what we plan to tackle and when. We will also know how the resourcing will look, as this is a lot of additional work for all of our teams.

Friday, January 19, 2007

Application management

I work really hard to get a tool that people need across the organization, and I can't seem to get any takers for trying it. People are so busy with the day to day that they are unable to step back and spend time creating efficiencies. We spend a lot of time chasing issues, and we have no way to audit configurations, servers, and applications. If people were to step back and actually look at the technology they maintain and support, and at the tools they use, they would have a lot more time to manage projects and a lot less time doing operational work.

I've asked my boss to help me work out these issues and get people to pay more attention. The last thing we need is a tool that is used by our groups in TechOps but not by the application support, QA, and development people.

Another frustrating Friday... I'll be at work until 7:30 again :(

Wednesday, January 17, 2007

Datacenter Automation Decisions - Rev2

As referenced by my last post, you can see this is high priority for everyone at my company. We have decided that Bladelogic can fit one area very well, but the vision, strategy, and direction of the company is not in line with our needs. Bladelogic had a stronger POC team, and better talent as well. If Opsware can do what we need for the application configuration management and deployment, then it would be the best choice for the following reasons:

  • Dependency mapping and use of the agents to do that.
    • The EMC/nlayers solution is better but not feasible for our network
    • Opsware has an excellent visual application manager which can help us troubleshoot problems and changes quickly.
  • End to end view of Network and soon Storage assets.
    • We have started a POC with the NAS product, and it can deliver very good network data to the server tool.
    • Alterpoint, which Bladelogic partners with for network information in the tool, is next week. The integration is not as tight, which is understandable.
  • Scalability and resiliency.
    • Bladelogic has a lack of built in replication and agent failover. The agent and replication can be adapted using 3rd party tools.
    • Our ultimate scope is over 20,000 systems with varying uses of the tool.

We are going to deploy Opsware before a general decision is made for the company, and if the product cannot do what we need we will go with Bladelogic. Either way with our lack of centralized authentication, the major issue is going to be getting some kind of agent to deploy software on the systems. We will see how this pans out.

Needless to say Bladelogic is not happy about this decision, and rightly so. I have explained to them that this is not a final decision, but it’s a better direction due to strategy and needs of our business. Software is a mix of capabilities, direction, and the suite of offerings a single entity can offer us.

Sunday, January 14, 2007

Consolidation

As part of our overall technology consolidation across my company, we have determined a high-priority list of items as follows:


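Server
  Application Deployment: Need 10, Difficulty 6, Phase 1
  Basic Host Agent: Need 10, Difficulty 2, Phase 1
  Realtime Performance/Capacity Planning: Need 7, Difficulty 2, Phase 1
  Bare Metal Provisioning: Need 2, Difficulty 3, Phase 5
  Configuration Management: Need 4, Difficulty 4, Phase 5
  Patching: Need 4, Difficulty 1, Phase 1
  Asset Management: Need 8, Difficulty 2, Phase 2
Database
  Realtime Performance/Alarming: Need 10, Difficulty 8, Phase 3
  Configuration Management: Need 2, Difficulty 5, Phase 4
Application/HTTP
  Real User: Need 10, Difficulty 3, Phase 1
  Synthetic Static/Transaction: Need 5, Difficulty 5, Phase 4
  Outsourced Monitoring: Need 8, Difficulty 6, Phase 4
  Java/.NET: Need 6, Difficulty 9, Phase 4
Reporting
  Outsourced Web Analytics: Need 2, Difficulty 1, Phase 5
  Web Analytics: Need 8, Difficulty 4, Phase 1
  Data Correlation and Rollup: Need 8, Difficulty 10, Phase 5
Network
  Event Management: Need 10, Difficulty 2, Phase 1
  Asset/Discovery/Configuration/Deployment: Need 9, Difficulty 2, Phase 2
  Configuration Assurance: Need 9, Difficulty 8, Phase 3
  Logging: Need 10, Difficulty 1, Phase 1
  Circuit Management: Need 3, Difficulty 8, Phase 5
  Cable Management: Need 4, Difficulty 10, Phase 5
  Engineering/Simulation/Provisioning: Need 8, Difficulty 6, Phase 3
  Route Analytics: Need 5, Difficulty 1, Phase 4
  Flow Analytics: Need 7, Difficulty 5, Phase 4
  Performance Graphing: Need 6, Difficulty 3, Phase 2
Storage
  Event Management: Need 10, Difficulty 2, Phase 2
  Configuration Management: Need 7, Difficulty 3, Phase 3
  Utilization/Backup Reporting/Capacity Planning: Need 10, Difficulty 3, Phase 2
Event Management
  Console: Need 10, Difficulty 4, Phase 1
  Correlation: Need 7, Difficulty 9, Phase 5
  Security Event Management: Need 7, Difficulty 7, Phase 3
Process
  Ticketing: Out of scope
  Change Management: Out of scope
  Physical Datacenter Management: Out of scope
  Dependency Mapping: Out of scope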