Wednesday, December 20, 2006

Technology consolidation – Datacenter Automation

Across the company we lack tools for configuration management on the network, storage, and server sides. Datacenter automation (Bladelogic, Opsware) is badly needed, as the initial meetings of our cross-market group made clear.

The work I have done so far will be leveraged by the group as we select and roll out a technology across the corporation. The business side is clamoring for these tools to help deployment, troubleshooting, and management of their servers. I am pushing back so we can make the right decision.

Do I roll something out with the intention of possibly replacing it when we are ready? It's a hard call to make, but we'll see how it pans out over the next 3 weeks.

More on this soon.

Technology alignment

My company is undergoing a realignment of our direction and technical operations groups across all of our market groups. This includes many initiatives, but the one I am involved in is monitoring and tools in use to conduct operations. The other groups don't seem to have a solid strategy, nor a solid understanding of how to solve their operational problems. This is in part due to the fact that they are always firefighting, and thus not able to analyze situations and determine a path.

The strategy and products I have selected in my roadmap are partly shaped by the fact that we have certain technologies ingrained in our process which are not problematic enough to replace. With this new initiative, replacing them becomes not only possible, but probable. This means we'll have the BEST tools in place in our roadmap, judged on supportability, cost, and open standards (open source).

Now the tools and set of products will be aligned across the whole organization, across the 20,000+ servers and 10,000+ network devices.

Wednesday, December 6, 2006

Business Intelligence for Infrastructure Monitoring

How cool would it be if you could look at your servers behind a VIP on a load-balancer and tell which servers were serving more errors, serving traffic slower, or not getting the requests they should be in comparison to your other servers? I have a product in place to do just this; it's not released, so I can't talk too much about it.

The issue is that we have to embed some of the data in the HTTP requests, or cookies, which I need help from development to do. I am getting pushback about security and information disclosure, which I am fighting through. I understand it could be an issue, but it's the only way that we can properly track usage and performance. It would really help to have an Internet-based standard for tracking requests, backend connections, and usage. Using weblogs is not feasible when you have thousands of servers that don't have any consistency. This is my first pass at what I want to add to the HTTP headers:

Type            Variable Name    Explanation / Sample
Application     ApplicationName  StockTickerApp
Application     ServiceName      TickerWebService
Application     PageName         Login_Page
Application     TransactionID    Unique ID of request
Infrastructure  ServerID         Something like fcweb01 + the last octet of the IP (.124) = fcweb01124
User            UserName         Bsmith
User            CompanyName      Customer ABC

More on this later.
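To make the header idea above a bit more concrete, here is a minimal Python sketch. The `X-Track-` prefix, the helper names, and the sample IP address are assumptions made up for illustration; nothing here is an existing standard or the unreleased product's format.

```python
import uuid

# Hypothetical header prefix; the post does not define a naming convention.
PREFIX = "X-Track-"

def build_server_id(hostname: str, ip: str) -> str:
    """Hostname plus the last octet of the IP, e.g. fcweb01 + 124."""
    return hostname + ip.rsplit(".", 1)[-1]

def build_tracking_headers(app, service, page, hostname, ip, user, company):
    """Return the proposed tracking fields as a dict of HTTP headers."""
    fields = {
        "ApplicationName": app,
        "ServiceName": service,
        "PageName": page,
        "TransactionID": uuid.uuid4().hex,  # unique ID per request
        "ServerID": build_server_id(hostname, ip),
        "UserName": user,
        "CompanyName": company,
    }
    return {PREFIX + name: value for name, value in fields.items()}

# Sample values from the table above; the IP address is invented.
headers = build_tracking_headers(
    "StockTickerApp", "TickerWebService", "Login_Page",
    "fcweb01", "10.0.0.124", "Bsmith", "Customer ABC",
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Given the security pushback, fields like UserName and CompanyName are the ones most likely to need masking or hashing before they leave the server.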

Sunday, November 26, 2006

Housekeeping on other projects (BI, MSSQL)

I am working on implementing a new shared SQL cluster so our Openview tools can run on better hardware infrastructure. I also hope to leverage that SQL server for Iconclude, a workflow tool we are implementing more and more. I am trying to get a proper, cleaner server architecture for the tools. I am also adding a high end server for the BI tool to compile the Proclarity cubes going forward.

The server config mgmt race is coming to a close, with Bladelogic seeming to have the win so far. I am working on the final touches of the evaluation.

I am also on the home stretch with the Symantec versus Quest database tools, and hope to wrap that up in the next 2 weeks.

That's mostly it on my major projects. I have a lot of other smaller projects, including:

  • Qip Integration
  • Monitoring work for better consistency on some of our internal apps
  • Coradiant next gen products
  • Business Intelligence work with Proclarity
  • Syslog integration for Coradiant Truesight and Onaro Sanscreen
  • Trap integration for database tools
  • Coradiant instrumentation of some leased line infrastructure

Network Configuration Management

The other areas we are looking at around the network tools are the Opnet suite for engineering (which is a serious investment for a tool), and the configuration management areas.

We have gotten several demos, and sent out 4 RFPs. Out of the 4 we sent out, we got 3 back. We are trying to get it down to 2 vendors for a POC. This is slightly different than the server side, because I have a much larger team to work with on the POC and the work needed to get it down to a great tool for us. Here are the criteria extracted from the RFP. There will be more on this as we move forward narrowing down the field of 3 (HP OVNCM, Opsware NAS, Alterpoint DeviceAuthority):

Requirement / Sub Facts

· Company Viability and History
  o Customers of specific product
  o Revenue Growth FY 05-06
  o Total Revenue FY 05

· Install Base
  o Financial Customers
  o Largest Install

· Technologies
  o Windows
  o Linux
  o Java
  o Perl
  o Native Eclipse, Visual Studio, or other native IDE
  o XML
  o MSSQL
  o Oracle
  o MySQL
  o x64

· Extensibility and Robustness
  o Portal
  o Web Services API
  o .NET, Java, Perl API
  o CLI
  o Open CMDB
  o DR/HA
  o Active Directory, TACACS, RADIUS
  o Granular permissioning

· Device Support
  o Nortel, Cisco, F5, Checkpoint
  o telnet, ssh, rlogin, snmp, oob
  o Auto discovery of devices
  o De-Duplication of devices
  o Dynamic grouping

· Configuration and Usage
  o Ease of Installation
  o Reporting (PDF, XML, HTML, DOC, CSV)
  o Report Delivery
  o Modeling and Visio support
  o Support for Perl, Expect, and Shell
  o Syntax checking
  o No java or plugin client
  o Upgrade IOS with verification of hardware support

· Change Tracking
  o Comparison Live to Snap
  o Change notification to run collection (syslog, snmp)
  o Generate SNMP traps for changes
  o Enforcement of peer review before implementation
  o Compliance templates (SOX, GLBA, etc) with weighted application
  o Keystroke logging
  o Tracking of CPU, Memory, Users per device

· Asset Management
  o Contract management
  o Integration with Cisco Contract site

· Cost and Community
  o Maintenance Fees
  o List Price
  o User group meetings
  o Online user groups
  o Free development licenses
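One of the change-tracking criteria above, comparing a live device configuration to a stored snapshot, can be sketched with Python's standard `difflib`. This is only an illustration of the idea, not how any of the three vendors implement it. The configuration lines are invented sample data, and a real tool would pull the live config from the device over ssh or snmp rather than from a string.

```python
import difflib

def config_changes(snapshot: str, live: str) -> list[str]:
    """Return added (+) and removed (-) lines between snapshot and live config."""
    diff = difflib.unified_diff(
        snapshot.splitlines(), live.splitlines(),
        fromfile="snapshot", tofile="live", lineterm="", n=0,
    )
    body = list(diff)[2:]  # drop the ---/+++ file header lines
    # Hunk markers start with "@@", so only real changes remain.
    return [line for line in body if line.startswith(("+", "-"))]

# Invented sample configurations for illustration only.
snapshot = "hostname core-rtr1\nsnmp-server community public RO\nlogging 10.1.1.5"
live = "hostname core-rtr1\nsnmp-server community s3cret RO\nlogging 10.1.1.5"

for change in config_changes(snapshot, live):
    print(change)
```

A non-empty result is the point where a product would raise a change notification or an SNMP trap.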

Managed firewall monitoring services

We are looking at a couple of vendors in the Managed Security Services (MSS) space to do some firewall monitoring for us. Essentially they give us an added line of data security and best practices that we don't have the capability to provide ourselves. We are testing them on 3 pairs of key firewalls. These services do several things:

  1. Ship all of our firewall logs to a 3rd party, who does correlation and distillation, and has analysts who look at major events across the customer base.
  2. Send back alarms for critical issues and worms they detect.
  3. Log and report on the data, trends, and how our data compares against the collective whole of their customers.

The two vendors we are looking at have different pros and cons about their technology, methodology, and ability to provide these.

If this goes as planned and there is a major benefit, which I believe will be easy to prove, we will eventually roll this out to all major firewalls (of which we have about 70). I will also work on implementing Snort IDS systems to give the MSS more data and provide better visibility into our security events.

Wednesday, November 8, 2006

Configuration Management and Datacenter Automation Status

I have been evaluating the following vendor solutions for the past 3 weeks. We have all 4 of them installed in a small test environment consisting of various Windows systems and the technologies running on them. We are focusing on current pain points in configuration management, and we are also evaluating technology which we will need in the medium term. I am going to review how they are stacking up as I fill out the matrix of which products support which requirements.

Requirements:

· Monitor and track configuration/policy

o Create policy off Live including patches and settings

o Track compliance to the policy

o Enforce the policy

o Track changes made outside the product

o Prevent the execution of a specific exe or file

· Architecture

o Ability to have proxies in datacenters/envs

o Ability to have decentralized control over envs

o Ability to use a single uni-directional port

· CMDB

o Visualize relationships between servers

o Visualize relationships between server and network

o Track dependencies of servers and websites

o Configuration Management Interoperability

· Manage users and services

o Manage local users across servers

o Replicate credentials to other servers

o Manage services in real-time

o Verify status of services in real-time across servers

o Verify services port usage

· Usability

o How easy is the product to administrate

o How easy is the product to use

o How easy is the product to configure and setup

· Software asset collection

o Collect software revision and install details

o Collect how often and for how long software is used

· Hardware asset collection

o Collect data via DMI or Standard Protocol

o Collect detailed information

· Reporting capability

o Export to PDF,XLS

o Report on compliance, changes, and activity

o Open database with views that make it easy to query

· Software Deployment

o Support for MSI, RPM, and Sun Packages

o GUI for creating Packages

o Search and replace

o Reverse engineer files into packages

o Rollback

o Notifications via SNMP and SMTP

o Download patches, deploy, and rollback patches

o Create a policy of patches

· PXE deployment

o Provision OS and policy in one job
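The first requirement group (create a policy off a live server, then track compliance to it) can be sketched as a simple dictionary comparison. This is a toy illustration under my own assumptions: the setting names and values are hypothetical, and none of the four products necessarily works this way internally.

```python
def create_policy(live_settings: dict[str, str]) -> dict[str, str]:
    """Snapshot a known-good server's settings as the policy."""
    return dict(live_settings)

def compliance_violations(policy: dict[str, str], server_settings: dict[str, str]):
    """Return {setting: (expected, actual)} for every deviation from policy."""
    return {
        key: (expected, server_settings.get(key))
        for key, expected in policy.items()
        if server_settings.get(key) != expected
    }

# Hypothetical settings collected from a reference server.
golden = {"patch_level": "SP1", "telnet_service": "disabled", "ntp_server": "10.0.0.1"}
policy = create_policy(golden)

# Hypothetical server that has drifted from the policy.
drifted = {"patch_level": "SP1", "telnet_service": "enabled"}
violations = compliance_violations(policy, drifted)

for setting, (expected, actual) in violations.items():
    print(f"{setting}: expected {expected!r}, found {actual!r}")
```

Enforcing the policy would then mean pushing each expected value back to the server, which is where the products differ the most.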

The products we are reviewing are (in order of the installs):

CA – DSM, Cendura, and CMDB – The CMDB is the glue between the other components. The suite is very well done, and does a good job in general. The policy control is not as granular as in some of the others. There is also not a good package of supported configurations in the DSM product. So far I would rank them in 2nd or 3rd place. We still have more evaluation work to do on the products.

Bladelogic – Operations Manager – The product is excellent and easily extensible. The downsides are a complex security model and a UI that is not great. They don’t have a solid CMDB strategy. I would rank this product in 1st place so far. We still have work to do here as well.

Opsware – SAS, VAM – This product does an excellent job in the CMDB and visualization. The system is scalable and capable as well. The downsides are the complexity of deployment, some instability, and some growing pains as they re-architect some of the way the product operates. It doesn’t have as good a unified shell as Bladelogic has. This product shares 2nd or 3rd place with CA. We still have more evaluations to complete with the product.

HP – Radia – Let’s put it this way: after 2 days, the product hardly ran, and was not usable. I would still be working with them today if I hadn’t given up and asked them to stop the POC.

.NET 3.0 and Sysinternals release

While the .NET 3.0 release info was going out, there was Sysinternals news as well. Microsoft purchased Sysinternals a few months ago; the non-commercial side of the business is an excellent set of tools used almost everywhere now. They are incorporated in many commercial software packages for common tasks and debugging. Microsoft has finally moved the content over to its own site and bundled the tools together:


 

http://www.microsoft.com/technet/sysinternals/default.mspx


 

My favorites:

http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/ProcessExplorer.mspx

http://www.microsoft.com/technet/sysinternals/ProcessesAndThreads/Filemon.mspx

Thursday, October 19, 2006

Datacenter automation status

Over the last week, I have been testing the CA server management and CMDB product, as well as Bladelogic. Both products are good, but have their downsides. I am evaluating 4 products, and narrowing it down to 2 in order to deploy on a real QA/staging environment. The criteria we are testing on are as follows; each has a weight as well. More later:

Requirement / Sub-Feature

· Network Option
  o Track network device configuration
  o Create policy for configuration standards

· CMDB
  o Visualize relationships between servers
  o Visualize relationships between server and network
  o Track dependencies of servers and websites

· Software Deployment
  o Support for MSI, RPM, and Sun Packages
  o GUI for creating Packages
  o Search and replace
  o Reverse engineer files into packages
  o Rollback
  o Notifications via SNMP and SMTP
  o Download patches, deploy, and rollback patches
  o Create a policy of patches

· Hardware asset collection
  o Collect data via DMI
  o Collect detailed information

· Software asset collection
  o Collect software revision and install details
  o Collect how often and for how long software is used

· Reporting capability
  o Export to PDF,XLS
  o Report on compliance, changes, and activity
  o Open database with views that make it easy to query

· Multiple Datacenter capability
  o Ability to have proxies in datacenters/envs
  o Ability to have decentralized control over envs

· PXE deployment
  o Provision OS and policy in one job

· Monitor and track configuration/policy
  o Create policy off Live including patches and settings
  o Track compliance to the policy
  o Enforce the policy
  o Track changes made outside the product
  o Prevent the execution of a specific exe or file

· Manage users and services
  o Manage local users across servers
  o Replicate credentials to other servers
  o Manage services in real-time
  o Verify status of services in real-time across servers
  o Verify services port usage

· Usability
  o How easy is the product to administrate
  o How easy is the product to use
  o How easy is the product to configure and setup
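Since each criterion carries a weight, ranking the products comes down to a weighted score per product. Here is a minimal sketch of that arithmetic. The weights, the 0-5 scores, and the product labels are made-up placeholders, not the actual evaluation numbers.

```python
# Hypothetical weights per requirement category (not from the real matrix).
WEIGHTS = {"CMDB": 3, "Software Deployment": 2, "Usability": 1}

# Hypothetical 0-5 scores per product and category.
SCORES = {
    "Product A": {"CMDB": 2, "Software Deployment": 5, "Usability": 3},
    "Product B": {"CMDB": 5, "Software Deployment": 3, "Usability": 2},
}

def weighted_total(scores: dict[str, int], weights: dict[str, int]) -> float:
    """Weighted average of the category scores, on the same 0-5 scale."""
    total_weight = sum(weights.values())
    return sum(scores[cat] * w for cat, w in weights.items()) / total_weight

# Highest weighted score first.
ranking = sorted(SCORES, key=lambda p: weighted_total(SCORES[p], WEIGHTS), reverse=True)
for product in ranking:
    print(product, round(weighted_total(SCORES[product], WEIGHTS), 2))
```

The weights matter more than they look: with these placeholder numbers, Product B wins only because CMDB is weighted heaviest.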

Thursday, October 12, 2006

BI BI my oh my

I've been really absorbed in my BI project. The problem is that one of my tools is really fast, but requires a lot of development and building. The other tool is REALLY slow, because it runs all of its queries on demand. They told me they can fix it by pre-aggregating like the other tool does. Hopefully it will, but you never know.
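The difference between the two approaches is roughly the following. This is a toy sketch with invented data, not how either BI tool is actually built: the on-demand version scans every row for each question, while the pre-aggregated version pays that cost once up front and then answers from a lookup.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, server, request_count).
FACTS = [
    ("2006-10-10", "web01", 120),
    ("2006-10-10", "web02", 80),
    ("2006-10-11", "web01", 150),
    ("2006-10-11", "web02", 70),
]

def requests_on_demand(day: str) -> int:
    """Scan every fact row at query time (gets slower as the data grows)."""
    return sum(count for d, _, count in FACTS if d == day)

def build_daily_rollup(facts) -> dict[str, int]:
    """Pre-aggregate once, e.g. in a nightly job, so queries become lookups."""
    rollup = defaultdict(int)
    for day, _, count in facts:
        rollup[day] += count
    return dict(rollup)

ROLLUP = build_daily_rollup(FACTS)

def requests_preaggregated(day: str) -> int:
    """Answer the same question from the precomputed rollup."""
    return ROLLUP.get(day, 0)

print(requests_on_demand("2006-10-11"), requests_preaggregated("2006-10-11"))
```

The trade-off is the one described above: the fast tool needs the rollups designed and built ahead of time, which is where the development effort goes.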

Friday, October 6, 2006

Datacenter Automation

I have a lot of things going on, including POCs with CA, Bladelogic, and Opsware to deal with asset management (displacing another tool), configuration management, centralized credential management, and deployment. By testing these main areas we can figure out which product we want to deploy on around 4,000-5,000 of our servers. We need a better handle on change and configuration. It will improve the quality of our products and make reporting changes and noncompliance easier across the board. We will be able to fix things faster.

Hiatus

Sorry about being missing in action. I have been ignoring my blog and working a lot on finalizing my POCs with the BI vendors. I am paying attention to my blog again and will be writing more. Just got back from 2 days in NYC. Here I sit in my office on Friday night at about 7pm.

Tuesday, September 19, 2006

Managed Security

I am also working on managed security solutions for firewall monitoring for my company. We have a lackluster focus on anything security related. Having this monitoring will do a lot for our security infrastructure. I am looking at Symantec and Counterpane to provide the services for us. I am just now scoping the project and getting together a business case. People seem onboard which is good. We shall see how it progresses.

BI Update

Still working on Microsoft, Microstrategy, and Panorama tools. So far the Microsoft/Panorama stuff isn't going well. Microstrategy has very slow support (10 days to fix my last issue). I am on the fence about which one is a better selection. I do need to spend more time with the tools.

Networking tools

Just an update: I am looking at some networking tools from Packet Design. Route Explorer is a very neat tool, and we have a POC running in our networking team right now.

Sunday, August 27, 2006

MPLS monitoring and engineering tools

We have holes in our tools regarding our new MPLS network. We have a private backbone almost completed. When finished, sometime in Q4'06 or Q1'07, it will have 8 P nodes, ~40 PEs, and ~120 CEs. There is one OSPF domain and one CONFED BGP with a sub-AS in each region. All P nodes and most PEs are Cisco 7600/SUP7203BXL running 12.2(18)SXD1 and will be upgraded soon to the SXF train. We also have PEs that are 7200VXRs running IOS 12.4(5)a.

We need tools for provisioning, engineering, and monitoring the VPN links we are building on the MPLS network. We are looking into existing tools we own to see what they cover. I will post updates as we get our list of requirements answered and do more research. The companies we are looking at are: Lucent (vital), Cisco, HP, Opnet, and other smaller vendors.

Predictive Monitoring Tools

I was prepping a POC with Proactivenet, and they were pushing me to use the "Proactivenet" agent on my systems. Deploying another agent was not possible; existing tools and agents would have to be leveraged in order to do the correlation, statistical baselining, and predictive failure analysis. The ability to absorb metrics and data from HP Openview Operations (Windows and Unix) would not be ready until November. The POC is dead until then.

Netuitive is my second vendor; the POC with them is now being scoped and accelerated. They will support the agents that we have already purchased and deployed everywhere (approx 3,000 agents). The tool is also much better because they are looking at the low level metrics, versus the collected data that Proactivenet is using. We shall see as these products are fully deployed and tested.
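For anyone unfamiliar with statistical baselining, the simplest possible version looks like the sketch below: learn what "normal" is from history, then flag values that fall too far outside it. This is a plain standard-deviation check with invented CPU samples. It is an assumption-laden toy, and the commercial products certainly use far more sophisticated models than this.

```python
from statistics import mean, stdev

def is_anomalous(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag a value more than `threshold` standard deviations from the baseline."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != baseline
    return abs(value - baseline) / spread > threshold

# Hypothetical CPU utilisation samples (percent) for one server.
cpu_history = [41.0, 39.5, 40.2, 42.1, 40.8, 39.9, 41.3, 40.4]

print(is_anomalous(cpu_history, 41.0))  # within the normal band
print(is_anomalous(cpu_history, 75.0))  # far outside the baseline
```

The appeal over a static threshold is that a server which normally sits at 40% gets flagged at 75%, while one that normally runs at 75% does not.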

More soon....

BI Updates

I am testing some BI tools:

Microsoft Proclarity
Microstrategy
Panorama Novaview

The POC is built on Microstrategy, but I need to build a few more reports and such. Next week Panorama will be in working on the POC. Microstrategy is good, but the client and configuration are rather arcane. The web based tools are great, which is most of what my consumers will be using. If they have to use the Desktop product they will be confused, so I don't think that part is going to be a good fit for us. The other 2 products are front ends to Microsoft Analysis Services. The cubes and metrics need to be built, which is what I will be doing over the next week.

Monday, August 14, 2006

Been rather busy

I have been out of touch with my blog readers recently due to being very busy with work and personal things. I had a few trips and visitors for both work and fun.

Back to the technology... I am working hard on evaluating some BI vendors: Microsoft Proclarity, Panorama, and Microstrategy. So far I am much happier dealing with the Microsoft analysis services that Panorama and Proclarity are based on. Not to mention that the pricing I am getting is much cheaper from those 2 vendors. I will continue to build my POCs and show them internally.

I am also working on some management tool consolidation from Symantec I3, CA Spectrum, and other tools. A lot of the work is in architecture still, but we are also doing some implementation.

We are evaluating Proactivenet in the next week or two as well. More soon.

Thursday, July 20, 2006

So much good software so little time

My Bladelogic POC got shot down due to people being too busy. When are people going to realize that leveraging tools will save them time? By not biting the bullet you are just going to be scrambling to stay on top of things.

I am also looking at the following:
RBA : Opalis and Iconclude (very cool software packages)
Config Mgmt : Bladelogic
Log parsing and management : loglogic and splunk (leaning towards splunk)


I am integrating the following:
I3 and Netcool for SNMP traps.

Thursday, July 6, 2006

Predictive failures

The class of tools known as predictive and analytical monitoring tools is a very interesting concept. I am looking at 2 such vendors now, to try to find something which can sit inline with our Netcool (manager of managers) and produce more intelligent alarms and such. If it works well, it may even replace Netcool or be the go-to middleman to get to it.

It is a good idea to have not only the ability to predict, but also to correlate systems based on which ones alarm together, and how they behave leading up to a failure.
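Correlating systems by which ones alarm together can be sketched as a simple co-occurrence count. This is a rough illustration only: the alarm log, the system names, and the one-minute bucketing are all invented, and neither vendor's actual correlation method is shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical alarm log: (minute_bucket, system) pairs.
ALARMS = [
    (1, "db01"), (1, "web01"), (1, "web02"),
    (2, "web01"),
    (3, "db01"), (3, "web01"),
    (4, "san01"),
]

def co_alarm_counts(alarms) -> Counter:
    """Count how often each pair of systems alarms in the same time bucket."""
    by_bucket: dict[int, set[str]] = {}
    for bucket, system in alarms:
        by_bucket.setdefault(bucket, set()).add(system)
    pairs = Counter()
    for systems in by_bucket.values():
        # Sort so each pair is always counted under the same key.
        pairs.update(combinations(sorted(systems), 2))
    return pairs

for pair, count in co_alarm_counts(ALARMS).most_common():
    print(pair, count)
```

Pairs with a high count are candidates for a dependency, which is the kind of hint that could make the alarms reaching Netcool more intelligent.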

We are looking at :

Integrien Alive - http://www.integrien.com/alive.cfm
Proactivenet - http://www.proactivenet.com/

Proactivenet is already in a QA lab here at another part of my company, but it's not in production yet.

We shall see how and if the POCs progress. Expect me to keep you guys posted.

Monday, June 26, 2006

Results from software forum

Attending the forum strengthened my conviction that we need a core company that provides us with all of the technology that makes up the core CMDB and ITSM strategy. This includes change management, ticketing, and monitoring (as a whole). The integration needed between these items is not only important, but vital to having a closed loop process of implementing changes in a non-disruptive way.

Other tools which strengthen monitoring, infrastructure management, application dependencies, and other areas can easily be integrated and further solidify the core of the service delivery of the IT group. The only area I am on the fence about is asset management, where something like Radia or even Bladelogic, by ensuring compliance and proper change execution, moves it outside of a traditional asset management system and into the change verification arena.

My firm has been buying tools from HP, but has not adopted them across the ticketing, change management, and CMDB areas.

I have solidified the Enterprise Architecture now, and I will be finalizing this with people on the operations side, as well as the business units, to ensure we as an IT organization are going in the proper direction for our customers (the business) to fill the gaps and enable future innovation.

Wednesday, June 21, 2006

HP Software Forum

I have been in Miami at HP Software forum this week learning some good stuff I will be sharing with you guys this weekend.

automated fault correction
nlayers and where it's heading, hopefully (and how I feel about them in general)
CMDB
HP's overall vision around ITSM

I will be discussing the competition to HP and how they are stacking up.

More soon...

Wednesday, June 14, 2006

Service Desk (Ticketing) and CMDB

We currently use a small company's ticketing system for our IT service desk operations. I will leave the vendor's name out, because the product is not cutting it for many reasons. It can't scale, it doesn't have an open architecture, and it's a poor all-around product in the area of reporting and integration.

We have an old change tracking system, which is quite good (especially for its age), but it's developed internally, and it doesn't have the proper functionality that we need.

I feel that in order to properly implement a full blown CMDB solution, it needs to be integrated fully with the change management and ticketing systems. The only vendor that I feel can properly give you that infrastructure is HP. I'm going to be spending a lot of time understanding how their products work together to provide that. If anyone has comments on other comprehensive, enterprise class companies with that kind of offering, I am interested to know about it.

HP Software Forum

I will be attending HP software forum. It looks to be a very interesting show. I need to really accelerate my knowledge of their product line, and this is probably the best way to do it.

If anyone is going let me know (in comments).

Saturday, June 10, 2006

New position

I accepted a new position at my company. I'll be the Director of Enterprise Architecture. What that means is that I will provide vision, analysis, and evangelism for the groups that cover monitoring tools (IP, systems, network, dashboarding, rollup, soa, etc), CMDB, change management, ticketing, and level 1 support. This includes the current state and forward-thinking ideas around how to structure our tools and integration.

I have been doing a lot of reading and thinking over the last 2 weeks. I will be posting more as I get my thoughts in better order.

Thursday, June 1, 2006

BI and next gen reporting

You say reporting, and people pretty much say :

Microsoft Reporting Services
Crystal

In reality big companies need to invest in real Business Intelligence (BI). The business people need tools like these to analyze the metrics, build proper reports, and build good dashboards. My company has been trying to do this on the cheap. We are starting to realize we need a full blown OLAP based tool. The real issue is going to be finding the right experts in this area to help select the proper tools. These tools are very complex and far reaching. Once you pick one, that will be the vendor you are in bed with. It's a serious decision, and I hope I can make it happen.

This will take the standard web analytics tools and turn them into real tools which allow for the understanding and mining of any type of data, and not just web traffic.

This is very interesting technology that will help with the information overload we are dealing with.

Tuesday, May 23, 2006

The new management platforms

It seems like all of these new platforms are a bunch of best of breed products glued together so that data flows between the components. I know for a fact that HP has been pushing and building its products to work in this manner.

Why build a better mousetrap when you can just put the mousetrap into your product, take the end result, and apply it to another input (kill a mouse and feed it to your cat, sorry that's kind of gross)?

It's good because I often pick these best of breed companies and execute on them, only to find other companies OEMing the products. This has happened to me on several occasions in the last 6 months. It's funny because the vendor doesn't reveal the OEM, but I can tell either by wording, literature, or by seeing the product and the integration they did.

Companies that are the best in this approach are either learning how to do it (BEA, Bladelogic) or have been doing that for many years (EMC). It should be interesting anyways in the area of Enterprise Architecture.

Monday, May 15, 2006

CRM messes - Ticketing systems

I was meeting with the guy that is in charge of the CRM stuff here. He does most of the support, upgrades, and projects around them all. We use some big ugly CRM system (leaving out the name, because I don't like them at all). I have asked his boss a couple times about moving to salesforce.com. It would save us a lot of money and time. It's taken 2 years, but they finally listened and are taking the plunge. It's good to see that they are finally evolving off the crap they currently support, upgrade and deploy. The project is estimated to take 5 years. They are also moving parts of it (accounting) onto SAP, which is a great move too.

We have a lot of different ticketing systems too. We are trying to consolidate them into one set, but people seem to be moving in different directions. I deal with 5 systems currently. Each of them is pretty poor and overly complex IMHO. Some have bad workflow, some have bad reporting, and none seem to integrate with email.

If I spend most of my time in email or on the web, why can't I integrate those systems into my existing workflow (browser extension, or email based workflows)? They all just generate an email and you have to log in to see what's happening.

I prefer something nice with email integration like Cerberus Helpdesk - http://www.cerberusweb.com/

It's cheap too!