Can't Manage IT

Thursday, June 29, 2006

You only need 3 machines (replicas of production)....

In many technology shops, you typically find several non-production hosting environments (machines or groups of machines) used by the various participants in the development lifecycle to weed out the inevitable bugs (and new features) before applications or services get deployed to production. Since "non-production application (or service) hosting environments" is quite a mouthful, let's call them production replicas for short (as discussed below, the fidelity of these replicas may be intentionally or unintentionally compromised).

Having multiple replicas makes it easier to practice parallel development: one release can be in functional test while another release is being performance tuned or developed. It also gives your deployment and infrastructure players a place to practice installs, middleware upgrades or other fun tweaks, while the developers and testers are busy doing their own thing. This flexibility comes at a price, of course. The maintenance of these non-production environments is a significant activity and a costly investment. In what might be considered a fairly mature practice, you may find individual replicas for development integration, functional test and performance test; in this approach, you may have twice as many non-production CPUs as production CPUs and be paying for the middleware you run on them as well (although some vendors give you a break on non-production licenses). Hardware still costs something (as oil prices keep increasing, the power to run those CPUs is costing more than the hardware), and the people who maintain the hardware cost a lot more.

The main frustration is keeping these various instances in sync as much as possible (except when they're intentionally not in sync to test some patch or other change). If the replicas don't have the same OS or middleware on them, the first response from the application development teams when a new "feature" is discovered will be "It works on my machine!"

The synchronization activity is complex. Each replica may have multiple tiers (web, integration, database) and each tier may involve a dozen major software and/or hardware components. The replicas are frequently in different data centers with different network topologies, different interconnects with other (usually non-production!) services and applications, and different application configuration data. In addition, to save on infrastructure costs, the replicas are frequently crippled versions of production: they have fewer CPUs, or they are actually shared environments, e.g., the functional test and development integration replicas may run on the same physical host(s). If production and its replicas support multiple applications, the complexity is even higher, because multiple releases of these applications are moving through the replicas at once and they share some dependencies on the infrastructure.
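To make the synchronization problem a little more concrete, here is a minimal sketch of a drift check, assuming each replica's installed components can be exported as a simple tier → component → version manifest. The manifest format, the `find_drift` function and the replica names and version strings are all hypothetical illustrations, not the output of any particular tool.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest shape: {tier: {component: version}}
Manifest = Dict[str, Dict[str, str]]

MISSING = "<missing>"


def find_drift(reference: Manifest, replica: Manifest) -> List[Tuple[str, str, str, str]]:
    """Return (tier, component, reference_version, replica_version) for each mismatch.

    A component present on only one side is reported with the version "<missing>".
    Results are sorted by tier, then by component, so reports are stable.
    """
    drift = []
    for tier in sorted(set(reference) | set(replica)):
        ref_components = reference.get(tier, {})
        rep_components = replica.get(tier, {})
        for component in sorted(set(ref_components) | set(rep_components)):
            ref_version = ref_components.get(component, MISSING)
            rep_version = rep_components.get(component, MISSING)
            if ref_version != rep_version:
                drift.append((tier, component, ref_version, rep_version))
    return drift


if __name__ == "__main__":
    # Hypothetical manifests for production and a functional test replica.
    production = {
        "web": {"os": "5.1", "appserver": "6.0.2"},
        "database": {"os": "5.1", "dbms": "9.2"},
    }
    functional_test = {
        "web": {"os": "5.0", "appserver": "6.0.2"},
        "database": {"os": "5.1", "dbms": "9.2", "patch": "hotfix-17"},
    }
    for tier, component, prod_v, test_v in find_drift(production, functional_test):
        print(f"{tier}/{component}: production={prod_v} replica={test_v}")
```

Running the same comparison against every replica (with production as the reference) turns "It works on my machine!" into a concrete list of differences, including the intentional ones you are testing on purpose.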

So having replicas is useful, but costly. How can this cost be managed and reduced?

Reduce the number of replicas. As the title of this post suggests, you only need 3 logical replicas:
- one shared replica for development,
- a second for development integration and functional test,
- and a third for performance testing.
I suggest that development integration & functional test share the same physical replica to avoid the otherwise inevitable releases that fail testing's initial "sniff test" due to environmental differences.

If you have a Disaster Recovery environment (as you should), it should already be a 100% fidelity replica of production and you should consider using it as your performance test replica (reducing the number of non-production replicas to 2!). In addition to eliminating the need for a separate replica, using your DR replica for performance testing will increase the likelihood that your performance tests will be accurate and that DR will work correctly in the event of an actual disaster, instead of just during a DR test. If you're using some or all of your DR replica to support production usage (i.e., load sharing), you may have to make adjustments, e.g., time-shifting your performance tests to periods of low volume. If your DR has passive nodes that are only active during a disaster or test, you may have to adopt new methods for activating and passivating these tiers or retain (portions of) a separate replica.
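Time-shifting performance tests on a load-sharing DR replica starts with finding when production volume is lowest. A minimal sketch, assuming you have 24 hourly transaction counts for a typical day (the `quietest_window` function and the sample data are hypothetical), that picks the quietest block of consecutive hours, wrapping around midnight:

```python
from typing import Optional, Sequence, Tuple


def quietest_window(hourly_volume: Sequence[int], hours: int) -> Tuple[int, int]:
    """Return (start_hour, total_volume) of the lowest-volume run of `hours`
    consecutive hours, treating the day as circular (23:00 is followed by 00:00).

    Ties go to the earliest start hour.
    """
    n = len(hourly_volume)
    if not 0 < hours <= n:
        raise ValueError("hours must be between 1 and len(hourly_volume)")
    best_start = 0
    best_total: Optional[int] = None
    total = sum(hourly_volume[:hours])  # window starting at hour 0
    for start in range(n):
        if best_total is None or total < best_total:
            best_start, best_total = start, total
        # Slide the window forward one hour: add the hour entering, drop the hour leaving.
        total += hourly_volume[(start + hours) % n] - hourly_volume[start]
    return best_start, best_total


if __name__ == "__main__":
    # Hypothetical hourly transaction counts, hour 0 through hour 23.
    volume = [50, 40, 30, 20, 10, 10, 20, 60, 90, 100, 100, 100,
              100, 100, 100, 100, 100, 100, 90, 80, 70, 60, 55, 52]
    start, total = quietest_window(volume, hours=3)
    print(f"Schedule the 3-hour test run starting at {start:02d}:00 (load {total})")
```

The sliding sum keeps each step constant-time; for a single day of hourly data a brute-force scan would do just as well.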

Adopt the use of VMware or other deployment provisioning tools to make it easy to run development replicas on developer workstations. Your developers' workstations or laptops should be beefy enough to deploy the entire application environment (if they're not, your developers are probably less productive than they could be). Again, if you're successful at this, you may be able to get rid of another replica.

Adopt the use of Tripwire or other tools to track all system and application changes at a granular level. Incorporate these tools into your change management process. Make this information available on-line to the development/test teams, so they know what's changing and when. Ideally, your production environment should be using the same toolset as well.
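The basic technique behind Tripwire is to record a baseline of file checksums and later report which files were added, removed or modified. The following is a minimal sketch of that idea using only the Python standard library; it is not Tripwire's actual implementation, database or policy format, and the `snapshot`/`compare` functions are hypothetical names.

```python
import hashlib
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, str]:
    """Map every file under `root` (as a path relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in chunks so large binaries don't have to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify the differences between a baseline snapshot and a current one."""
    common = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in common if baseline[p] != current[p]),
    }
```

In a change management process, you would take a snapshot of each replica's configuration and application directories before a change window, run `compare` afterwards, and publish the resulting report where the development and test teams can see it.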

Friday, April 28, 2006

A fool with a tool...with enhancement (Aphorism of the Day)

I know this aphorism has been around for a long time, but I first heard it from Michael Marden a couple of years ago (Thanks, Michael!).... I've extended it a little...

A fool with a tool is still a fool, but if you build your own tool when it could have been bought or "borrowed" (open source), then you're the biggest fool of all....

Why do most programmers disdain tools, unless they have built them? Twenty years ago, the first program I ever wrote with an application framework was a bug database (as a demonstration app for the framework). I thought it was really cool (IMHO 8-)>).

It was useful as a demo, but no one wanted to use it as a product, even at the company I was working at, even though they had no on-line tracking system...when bug tracking is the #1 collaboration tool needed to achieve mature development processes!?!#@?

Of course, there are dozens (hundreds?) of open source bug tracking tools now. But why can’t the Bugzilla folks have a beer with the CodeTrack folks and/or the other dozens of bug tracking projects you can find on SourceForge and produce a single product?

Because one of them has some unique capability that the others lack?

Bad, bad programmer! Sit!

Copyright 2006, John Sovereign

Friday, April 14, 2006

Standards Oversight Board

Most analysts promote the establishment of a technology Standards Oversight Board (SOB) to handle waivers to established standards and drive the evolution of those standards. This is essentially a "dressed up" Purchasing function: if internal client organizations want to buy from non-preferred vendors, they need to justify it and get sign-off from some SOB.... 8-)>

The problem with establishing this as a governance board of internal "thought leaders" is that they will be perceived as performing a low-value support function...just like Purchasing! Of course, this perception is accurate...that's what they're doing...but it undermines their intellectual capital, which would be better invested in solving larger problems in the IT portfolio.

I served as an SOB for some time and experienced this loss of credibility first-hand. The following is one of my attempts to capture this with a sense of humor (with all flattery intended to Mr. David Letterman):

How many members of the SOB does it take to change a light bulb?

10. All of them. One to hold the light bulb stationary and the rest to rotate the heavens and the earth.
9. The SOB is a black hole from which light cannot escape. After being interrogated for several hours about its preferred behavior (wave vs. particle), light gives up trying to escape.
8. SOBs don't change bulbs. Now if you're looking for someone to blow your fuse box....
7. Vat are your Kwalities of Zervice requirements? If you don't know your Kwalities of Zervice, you might az vell be left in ze dark.
6. Seven. One to request the Design & Engineering teams to write the specifications of the light bulb and socket, one to review and approve the specifications, one to request Operations to actually change the light bulb, one to ask Operations to check if the bulb has been screwed in properly, one to test in production (pre-market) by flipping the switch ON and then OFF (an unfunded project to improve the user experience by adding a dimmer switch has already been branded Light Bulb Pro), one to do an article on the Intranet about this groundbreaking technology AND one more to assert that the bulb will keep glowing when the socket is migrated to Linux.
5. None – when you deploy Acme Brand Light Bulb, your officially Sanctioned source of light….
4. Are you sure you want to buy a light bulb? We have some spare capacity which can generate light and heat when run with sufficient load.
3. None. "Local illumination" is an application model governed by the Applications Solutions Standards board. The SOB only governs the e-utility infrastructure.
2. A corporate officer will need to sponsor your exception request and support from the majority of the SOBs will be necessary to approve it, if you are using a non-sanctioned illumination appliance (aka light bulb).
1. Define "light".

Monday, March 20, 2006

Fowler and Sovereign on Organizational Redesign

Obviously, as a stakeholder in and user of many organizational designs, I've been interested in organizational architecture for a long time. I began to see the relationship between system and organizational design, however, after a friend observed that "technology architecture tends to follow the organization", rather than the other way around (as you, or at least most architects, would expect the world to behave).

In his First Law of Distributed Object Design ("Don't!"), Fowler argues that all system designs should limit the number of (internal and external?) distribution boundaries. Sometimes this is called "limiting the exposed surface area". I believe this principle applies equally well to both man-made computer systems and organizations. System and Organization are both sub-classes of Organism.

P.S. A (different) friend has suggested that I need to find a couple of partners: the desired partnership name would be "Quid, Sovereign and Farthing". Anyone willing to change their names to Quid or Farthing?

Copyright John Sovereign, 2006

Aphorism of the Day: On Leadership

Motivational aphorism for the day:

I'd rather be thought of as a leader than be a "thought leader"....

Copyright John Sovereign, 2006

Tuesday, March 07, 2006

Data is a Virus

To paraphrase William S. Burroughs, "Data is a Virus"....

Data is the life-blood of any modern organization. A common problem at any IT shop is the proliferation of data sources and stores. Data management is a key IT capability which typically has no single owner. And unlike most other IT assets, data tends to move around a lot and grow quickly. Data is like life itself: it has an inherent desire to replicate. Like the stuff in your garage, it will grow to exceed all available disk space.

Some data, like a customer’s SSN and other Non-Public Information (NPI), needs to be quarantined like a “hot” virus. It should have an indisputable chain of ownership like the evidence on CSI. Few systems have the capability to track sensitive information at this level. How do organizations obtain this capability throughout core information models like Customer? Only by establishing iron-clad enterprise-wide information meta-models that “strongly type” these fine-grained data elements and requiring applications, middleware and tooling to enforce these rules.
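As one illustration of what "strongly typing" a fine-grained element could mean at the application level, here is a minimal sketch in Python: sensitive values get their own type that validates its format, masks itself when printed or logged, and records each hand-off and disclosure so the chain of custody travels with the value. The `SSN` class, its methods and the custody-log format are hypothetical, not a standard API; a real implementation would also need the middleware and tooling enforcement described above.

```python
import re
from typing import List, Tuple


class SSN:
    """A Social Security Number typed as Non-Public Information (NPI)."""

    _PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

    def __init__(self, value: str, owner: str) -> None:
        if not self._PATTERN.fullmatch(value):
            raise ValueError("SSN must have the form NNN-NN-NNNN")
        self._value = value
        # Chain of custody as (actor, action) pairs, oldest first.
        self.custody: List[Tuple[str, str]] = [(owner, "created")]

    def __repr__(self) -> str:
        # Mask the value so print(), logging and f-strings never leak it.
        return f"SSN(***-**-{self._value[-4:]})"

    __str__ = __repr__

    def transfer(self, new_owner: str) -> None:
        """Record a hand-off of the value to another owner."""
        self.custody.append((new_owner, "received"))

    def reveal(self, actor: str, purpose: str) -> str:
        """Return the raw value, recording who asked for it and why."""
        self.custody.append((actor, f"revealed for {purpose}"))
        return self._value
```

Because the raw value is only reachable through `reveal`, every disclosure leaves an entry in the custody log; anything that merely formats the object sees the masked form.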

If your business and systems architecture does not anticipate this life force, these pathogens will escape from behind your firewalls. I've even heard of some organizations contemplating “banning” data replication as a technical fix, as if they could ban life itself. If your IT shop isn’t demonstrably working towards practical, comprehensive management of your data (more than information security “awareness programs” or overly restrictive hobbling of your application delivery), it is exposing your brand to “tar and feathering” in the marketplace.

Copyright John Sovereign, 2006

Thursday, February 23, 2006

You can't manage it, if you don't measure it! NOT!

This maxim is simply not true!! Company cultures like Apple's are clear examples to the contrary. Steve Jobs is one of the premier practitioners of Management By Walking Around (MBWA); no metrics for him, nosireee....

Sunday, February 19, 2006

Manage IT before it manages you....

The question is not "Why can't we manage IT?" it's "Why DON'T we?!?"