Having multiple replicas makes it easier to practice parallel development: one release can be in functional test while another release is being performance tuned or developed. It also gives your deployment and infrastructure players a place to practice installs, middleware upgrades or other fun tweaks, while the developers and testers are busy doing their own thing. This flexibility comes at a price, of course. The maintenance of these non-production environments is a significant activity and a costly investment. In what might be considered a fairly mature practice, you may find individual replicas for development integration, functional test and performance test. With this approach, you may have twice as many non-production CPUs as production CPUs and be paying for the middleware you run on them as well (although some vendors give you a break on non-production licenses). Hardware still costs something (as oil prices keep increasing, the power to run those CPUs is costing more than the hardware), and the people who maintain the hardware cost a lot more.
The main frustration is keeping these various instances in sync as much as possible (except when they're intentionally not in sync to test some patch or other change). If the replicas don't have the same OS or middleware on them, the first response from the application development teams when a new "feature" is discovered will be "It works on my machine!"
The synchronization activity is complex. Each replica may have multiple tiers (web, integration, database) and each tier may involve a dozen major software and/or hardware components. The replicas are frequently in different data centers with different network topologies, different interconnects with other (usually non-production!) services and applications, and different application configuration data. In addition, to save on infrastructure costs, the replicas are frequently scaled-down versions of production with fewer CPUs, or they actually share an environment, i.e., the functional test and development integration replicas may run on the same physical host(s). If production and its replicas support multiple applications, the complexity is even higher, as you have multiple releases of these applications running through the replicas and they share some dependencies on the infrastructure.
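To make the synchronization problem concrete, here is a minimal sketch of a drift check between two replicas. It assumes you can already collect, for each replica, a simple inventory mapping component names to versions; the `find_drift` function, the inventory format and the sample component names and versions are all hypothetical and only there for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: component name -> installed version.
Inventory = Dict[str, str]


def find_drift(reference: Inventory, replica: Inventory) -> List[Tuple[str, str, str]]:
    """Return (component, reference_version, replica_version) for every mismatch.

    A component present on only one side is reported with "<missing>"
    on the other side.
    """
    drift = []
    for component in sorted(set(reference) | set(replica)):
        ref_version = reference.get(component, "<missing>")
        rep_version = replica.get(component, "<missing>")
        if ref_version != rep_version:
            drift.append((component, ref_version, rep_version))
    return drift


if __name__ == "__main__":
    # Made-up sample data, not taken from any real environment.
    production = {"os": "10.1", "app_server": "6.0.2", "database": "9.2"}
    functional_test = {"os": "10.1", "app_server": "6.0.1", "message_queue": "5.3"}

    for component, prod_version, test_version in find_drift(production, functional_test):
        print(f"{component}: production={prod_version} functional_test={test_version}")
```

Run against every tier of every replica, a report like this gives you something better than "It works on my machine!" to start the conversation with.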
So having replicas is useful, but costly. How can this cost be managed and reduced?
- Reduce the number of replicas. As the title of this post suggests, you only need 3 logical replicas:
  - one shared replica for development,
  - the second for development integration & functional test,
  - and the third for performance testing.
  I suggest that development integration & functional test share the same physical replica to avoid the otherwise inevitable releases that fail testing's initial "sniff test" due to environmental differences.
- If you have a Disaster Recovery environment (as you should), it should already be a 100% fidelity replica of production and you should consider using it as your performance test replica (reducing the number of non-production replicas to 2!). In addition to eliminating the need for a separate replica, using your DR replica for performance testing will increase the likelihood that your performance tests will be accurate and that DR will work correctly in the event of an actual disaster, instead of just during a DR test. If you're using (some or all) of your DR replica to support production usage, i.e., load sharing, you may have to make adjustments, e.g., time-shifting your performance tests to periods of low volume. If your DR has passive nodes that are only active during a disaster or test, you may have to adopt new methods for activating and passivating these tiers or retain (portions of) a separate replica.
- Adopt the use of VMware or other deployment provisioning tools to make it easy to run development replicas on developer workstations. Your developers' workstations or laptops should be beefy enough to deploy the entire application environment (if they're not, your developers are probably less productive than they could be). Again, if you're successful at this, you may be able to get rid of another replica.
- Adopt the use of Tripwire or other tools to track all system and application changes at a granular level. Incorporate these tools into your change management process. Make this information available on-line to the development/test teams, so they know what's changing and when. Ideally, your production environment should be using the same toolset as well.
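To illustrate the kind of granular change tracking the last point describes, here is a minimal sketch that fingerprints every file under a directory and reports what was added, removed or modified between two snapshots. This is not how Tripwire itself works or is invoked; the `snapshot` and `changes` functions are hypothetical names, and the sketch covers only file contents, not permissions, ownership or other attributes a real tool would track.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def changes(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots and list added, removed and modified files."""
    common = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

Taking a snapshot before and after each change window, and publishing the output of `changes` to the development/test teams, is one simple way to let them see what changed and when.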
Copyright 2006, John Sovereign
