Monday, January 14, 2013

Reproduce a Dynamic, Client-Server Production DB2 for z/OS Workload in Test with Optim Query Capture and Replay

Effective regression testing, and comparative application testing in general, depends largely on your ability to reproduce in a test environment a workload that closely mimics the production application workload of concern. Historically, the difficulty of reproducing a production workload in test has depended on the nature of that workload. Back when I started in IT (Ronald Reagan was in his first term as President of the USA), the focus of mainframe application performance testing efforts was often on batch workloads, and reproducing that type of production workload in test was not so tough: you had your input files, you had your batch jobs that were submitted in a certain sequence and with a certain timing through a job scheduling tool, you ran the jobs and measured the results against baseline data, and that was that.

Batch is still an important part of mainframe computing, but over time the emphasis at many DB2 for z/OS sites has shifted to transactional workloads; moreover, the nature of transactional workloads -- especially during the past 10 years or so -- has changed, with multi-tiered, network-attached, DRDA-using, client-server applications coming to the fore. On top of that, these modern transactional applications are much more dynamic than their forerunners. They are not driven by in-house staff performing clerical and data-entry functions via a "green screen" interface; instead, in many cases the end-users are external to the organization -- maybe consumers browsing online product catalogs and making purchase decisions, or perhaps employees of client companies in a business-to-business context, checking on orders or reviewing fulfillment history. If the end-users are internal to the organization, increasingly they are not clerical workers; rather, they are senior professionals, managers, and executives using analytics-oriented applications to improve speed and quality-of-outcome with respect to decision-making. The actions of these individuals, and the frequency and timing of their interactions with your DB2 for z/OS subsystem, are often hard to predict. For testing purposes, how do you get your arms around that?

And, getting your arms around that particular kind of application testing scenario is getting to be more and more important. If an environmental change (e.g., new system software releases) or an application modification is going to negatively impact performance from the end-user perspective, you REALLY want to catch that before the change goes into production. Elongated response time for in-house clerical staff is one thing, but poor performance affecting an external-to-the-organization end user can lead to lost business and, perhaps, long-term loss of customers (as the now-familiar adage goes, your competition is often just a click away). If performance degrades for an internal-use decision support application, likely as not it won't be a DBA getting calls from irate users -- it'll be directors and VPs and maybe your CIO getting calls from their peers on the business side of the organization.

In short, the challenge is tougher than it's been before, and the stakes are higher than they've been before. Gulp.

Fortunately, a recently announced and available IBM tool addresses this need very nicely. It's called IBM InfoSphere Optim Query Capture and Replay for DB2 on z/OS, and it came out just a couple of months ago. The fundamentals of what Optim Query Capture and Replay can do are spelled out in the product's name: it can capture a DDF application workload executing in your production DB2 for z/OS environment and enable the playing back of that captured workload in a test environment: the same SQL statements, with the same values, executed with the same volume and timing through the same number of connections to the DB2 subsystem.

This capture and replay capability by itself would come in very handy for things like regression testing, but that's not where the story ends. Suppose you want to see what would happen to response times if transaction volume were to increase by some amount? No problem: not only can Optim Query Capture and Replay play back a captured workload -- it can play it back at a higher speed; so, instead of, say, the 100 transactions per second seen in production for a client-server application workload, you could see how response times hold up at 150 transactions per second.

Speaking of response time, Optim Query Capture and Replay provides built-in reporting capabilities that help you to easily zero in on changes between baseline and replay test results.

What's more, Optim Query Capture and Replay can be used to invoke the IBM DB2 Cloning Tool to make a copy of a DB2 subsystem for testing purposes.

Oh, and I would be remiss if I failed to tell you that Optim Query Capture and Replay is not just about comparative workload testing. It's also a great tool for helping you to better understand a DB2 client-server application workload. Often, when it comes to these very dynamic, shape-shifting transactional applications, people want to get a better look at the trees in the forest. What SQL statements are being executed? What predicate values are being supplied by users? What columns are being retrieved? We are familiar with the idea of taking a "snapshot" of a database, but taking a snapshot (more accurately, a time slice), of a DDF workload seemed implausible -- until now. And why stop with just a better understanding of a client-server application workload? How about tuning it? The SQL statements in a workload captured by Optim Query Capture and Replay can be exported for analysis and tuning -- something you might do with a tool such as IBM's InfoSphere Optim Query Workload Tuner.

Now, all this good stuff would be less appealing if it came with too great a cost in terms of system overhead, so it's nice to know that Optim Query Capture and Replay has a pretty small footprint. This is true largely because the tool employs a "catch and throw" mechanism (more like "copy" than "catch") to send statements associated with a workload being captured to an external appliance, from which the workload can be replayed; thus, there is not a reliance on relatively expensive performance trace classes to get the statement-level data recorded by Optim Query Capture and Replay.

There you have it: a way to efficiently capture what may have appeared to you as an elusive workload, and to effectively use that captured workload for regression testing, "what if?" testing, and application SQL analysis. Check out Optim Query Capture and Replay, and get ready to go from, "My gut tells me..." to, "Here are the numbers." 

Monday, January 7, 2013

Get Hands-On with DB2 Automation Tool in Texas

First, my apologies for having let so much time go by since last posting to this blog. The fourth quarter of 2012 was a very busy time for me. The pace is a little less frenetic now, and I should be able to resume blogging here on a fairly regular basis (I hope to post an entry here within the next couple of days on a new and very cool tool that can be a big help in the area of DB2 for z/OS application testing).

Second, for those of you in the GREAT state of Texas (where I was born and raised), and particularly for people in the heart of The Lone Star State, I want to make you aware of an opportunity to get some hands-on time with the DB2 Automation Tool for z/OS (about which I blogged last year). IBM has partnered with the Heart of Texas DB2 Users Group (aka HOTDUG -- one of my favorite regional DB2 user group acronyms) to provide a free half-day of DB2 Automation Tool education and training. Show up, and you'll not only get the expected overview presentation -- you'll also get a demonstration on how to use the tool to set up profiles that can drive automated and intelligent DB2 for z/OS utility execution. AND you'll get to participate in a hands-on lab that will give you the opportunity to put the DB2 Automation Tool through its paces. AND you'll get breakfast and lunch. Sounds like a half-day well spent, to me.

You can RSVP for this event (appreciated, for food planning purposes, but not required for attendance) by sending an e-mail to my colleague Bill Houston (houstonb@us.ibm.com).

More information:

Date: Tuesday, January 22, 2013

Time: 9 AM to 12:00 PM

Location:
     IBM Executive Briefing Center
     Rio Grande Room (Building 904)
     11501 Burnet Road
     Austin, Texas  78758

Check it out. Get some food. Get some knowledge. Get your hands on a keyboard. 

Thursday, October 4, 2012

DB2 Utilities Enhancement Tool for z/OS: Easier, More Effective Management of Your DB2 Utility Jobs

John, a colleague of mine, was recently at a DB2 for z/OS site when a problem arose. Someone accidentally ran a LOAD utility with the REPLACE option on a table space that held a considerable amount of data. LOAD did what it is supposed to do when REPLACE is specified in the utility control statement: it cleared all the existing data from the table space and replaced those rows with records from the input data set. Unfortunately, the rows replaced -- and there were a lot of them -- were rows that the company wanted to keep in the table space. As the DB2 DBA team worked to recover the inadvertently discarded data (they were ultimately successful in doing so), John mentioned to the team leader that the problem could have been avoided via the IBM DB2 Utilities Enhancement Tool for z/OS (sometimes called UET). Needless to say, UET is now on that DB2 DBA team leader's "I want this" list.

John was referring in the aforementioned conversation to the syntax monitor function of Utilities Enhancement Tool -- one of several UET features that can enable you to more easily and effectively manage your IBM DB2 utility jobs. The syntax monitor can be used to examine control statements for utility jobs when they are submitted, allowing for prevention of job execution when options on the "no" list are specified (REPLACE, for example, in a LOAD utility statement) as well as addition of desired keywords to utility control statements when those are missing (one organization used UET to ensure that histogram statistics generation was requested for certain RUNSTATS jobs). Now, with respect to the LOAD REPLACE example I've used, you might be thinking that blanket interdiction of REPLACE for LOAD jobs would be overly burdensome in your environment -- perhaps there are times when you WANT to empty existing rows from a table space before adding new rows. Not a problem: Utilities Enhancement Tool allows you to apply a given utility syntax rule very broadly (e.g., at the DB2 subsystem level) or more narrowly (perhaps for a set of user IDs or job names -- you could, for example, set things up so that a utility keyword to be used on an exception basis could only be included in a job submitted by one of your senior DBAs). These rules and their application, by the way, are specified by way of an easy-to-create (and easy-to-modify) UET policy.

In addition to being able to being able to monitor the syntax of DB2 utility control statements, Utilities Enhancement Tool can monitor messages associated with utility job execution. With this capability, you can have UET issue a customized return code when a given message is encountered. That customized return code could be used to automatically trigger a responsive action through your site's automated operations system. Alternatively, operations personnel could be provided with response instructions specifically tied to utility return codes customized through Utilities Enhancement Tool (such an instruction might let operations folks know that a particular situation can be left alone until a DBA deals with it during daytime hours -- DBAs are a lot happier when they are not needlessly awakened in the middle of the night).

UET also delivers some extensions to the standard syntax of IBM DB2 utility control statements. For example, UET allows inclusion of the keyword PRESORT in a LOAD utility statement (an option that would not pass syntax muster in the absence of Utilities Enhancement Tool). When PRESORT is specified for a LOAD job, UET will invoke DFSORT (or, even better, DB2 Sort, if that product is installed on the system) to sort the input file in the target table's clustering key sequence. UET will then change the PRESORT keyword to the DB2 LOAD-recognized PRESORTED option, indicating to the utility that the input records are indeed sorted in clustering key sequence. LOAD will consequently execute its RELOAD and BUILD phases in parallel, and will avoid sorting the table's clustering index. The end result is a LOAD utility job that completes more quickly and consumes less CPU time than would be the case if the input file were not sorted in clustering key sequence.

There's more: when you execute the REORG utility with SHRLEVEL CHANGE to enable concurrent read/write access to the target table space's data, Utilities Enhancement Tool can automatically create the required mapping table for you. UET also provides a capability whereby you can have the tool cancel existing threads and block new threads that could prevent the successful completion of a utility operation.

UET provides you with a lot of useful information through its logging of interactions with DB2 utility jobs. Examining Utilities Enhancement Tool job log data will show you which policy rules were applied and which were not, and why. That knowledge, in turn, can help you to fine-tune your UET policy for a DB2 subsystem.

Of course, Utilities Enhancement Tool supports DB2 10 for z/OS. Like all of IBM's DB2 for z/OS tools, UET was ready for Version 10 when that release of DB2 became generally available a couple of years ago. Consider how UET could help you to introduce new utility keywords (maybe new in DB2 utilities, or just new for your organization) in your DB2 for z/OS environment in a low-risk, systematic, and managed way. Utilities Enhancement Tool can provide for you a new point of control for DB2 utility execution at your site, leading to greater efficiency and productivity, more timely exploitation of new utility control options, and fewer "oops" moments. Give UET a look, and think about how it could deliver for you.

Thursday, September 6, 2012

Intelligent Utility Scheduling with DB2 Automation Tool for z/OS

I have a few questions for you:
  • In your shop, are DB2 for z/OS database maintenance utilities such as RUNSTATS, COPY, and REORG executed on a periodic basis (e.g., once per week per table space) regardless of whether or not they actually need to be run for a given object?
  • Is it sometimes the case that DB2 utilities are NOT executed in your environment when they should be?
  • Do you spend more time than you'd like setting up and submitting DB2 utility jobs?
  • Are there higher-value activities to which you could attend if you weren't spending a lot of time looking after DB2 utility execution requirements?

If your answer to any of these questions would be, "yes," you ought to take a look at what the IBM DB2 Automation Tool for z/OS could do for you and your organization.

Here's the deal: mainframe DB2 DBAs are being asked to look after ever-larger databases (more and more objects, not just more data) and to keep these DB2 for z/OS systems in fighting trim with regard to CPU efficiency and in the expected lead spot when it comes to data availability. If these requirements are to be effectively addressed, certain DB2 utilities have to be run on a regular basis. Knowing this, DB2 for z/OS DBAs at plenty of sites have adopted a strategy of submitting RUNSTATS, COPY, and REORG jobs for table spaces based on what I'd call a calendar criterion. In other words, RUNSTATS, for example, might be executed once per month per table space. COPY might be executed once per week per table space for full backups, and once daily per table space for incremental backups between execution of full-backup jobs.

That mode of operation has the benefits of being simple and (generally) keeping things from "falling through the cracks," but it comes with some costs that your organization would probably like to avoid. Quite often, these costs hit you from the left and from the right -- that is, they are associated both with "not enough" and "too much" with respect to utility execution frequency:
  • On the "not enough" side, consider a once-per-month-per-table space approach to running the REORG utility. That might be OK for most of your table spaces, but you might have some tables with high insert activity (and not so much delete activity), and associated indexes on keys that are not continuously ascending (necessitating inserts of entries into the "middle" of these indexes). These indexes can become quite disorganized quite quickly, and that's not good for the performance of queries for which DB2 uses index scan access. Maybe once per month is not a sufficient REORG frequency for these table spaces (or separately for these indexes).
  • On the "too much" side, a common misstep is to execute RUNSTATS too frequently (an extreme case I've seen in the real world involved DAILY execution of RUNSTATS for a number of table spaces). Accurate catalog statistics are important for good query performance, but in many cases the data in a table space does not change significantly in a near-term time frame from a statistics perspective: row count is pretty stable, column cardinalities change little, high and low key values do not vary much, etc. In such cases, excessive execution of RUNSTATS burns CPU cycles without delivering performance benefits.
  • "Too much" also became more of an issue for REORG at some sites with the change introduced with DB2 9 for z/OS: when an online REORG of a subset of a partitioned table space's partitions is executed, any non-partitioned indexes (NPIs) defined on the underlying table will be reorganized in their entirety. That was a very good change from an availability perspective (it eliminated the so-called "BUILD2" phase of online REORG, which effectively blocked data access for a time), but it increased the system resource consumption (CPU cycles and disk space) associated with these REORG jobs. That, in turn, provided a new incentive for avoiding unnecessary REORGs.

Those are just a few examples, but you get the picture. The thing is, shifting to a purpose-driven utility execution strategy ("I need to run COPY for this table space because of recent update activity") versus a strictly calendar-based approach ("I'll execute COPY for this table space -- whether or not it's been updated -- because I haven't done that in X days") can take a lot of time if you go it alone -- and who has lots of time on his or her hands these days? The solution here is to not go it alone. Let DB2 Automation Tool assist you in getting to needs-based DB2 utility execution. With Automation Tool's easy-to-use ISPF interface, you can quickly define:
  • Object profiles -- sets of database objects (table spaces and/or indexes) for which you want to create utility execution action plans.
  • Utility profiles -- these designate the utilities you want to be executed, and how you want them to execute (referring to utility options).
  • Exception profiles -- the "smart" in smart utility execution. With DB2 Automation Tool, you have 180 different exceptions which you can use individually or in combination to determine when conditions warrant the execution of a utility for a database object. What's more, you can add greater sophistication to your smart utility execution criteria through formulas that you can implement with DB2 Automation Tool user exits and REXX EXECs
  • Job profiles -- where intention turns into action. Exception jobs can be automatically executed at regular intervals to evaluate objects, and utility jobs needed to address identified exceptions can be automatically submitted for execution.

Put it all together, and you get a DB2 utility execution mechanism that saves you time and saves your organization CPU and disk resources -- not only by avoiding unnecessary utility execution, but also through more effective maintenance of database objects and catalog statistics to help ensure consistently good DB2 application performance. Plus, with your own freed-up bandwidth you can more fully engage in things like application enablement and other activities that really deliver value to your company.

Something else to keep in mind: this is an IBM DB2 tool we're talking about, so of course it supports -- and exploits -- the latest version of DB2 for z/OS (that's of course DB2 10). Some examples of DB2 feature exploitation by DB2 Automation Tool are:
  • The DB2 administrative task scheduler. DB2 Automation tool makes it easier to utilize this DB2 capability and helps you to reduce associated time to value; furthermore, the DB2 administrative task scheduler offers an operationally flexible means of submitting utility jobs generated via DB2 Automation Tool.
  • DB2 10 autonomic statistics. DB2 Automation Tool provides an interface to the DB2 10 autonomic statistics stored procedures, which can be used to determine when statistics need to be collected for database objects, and to automate RUNSTATs execution. DB2 Automation Tool also provides an interface to manage maintenance windows for executing the RUNSTATS utility.
  • The new (with DB2 10) REORGCLUSTERSENS and REORGSCANACCESS columns of the SYSTABLESPACESTATS real-time statistics table in the DB2 catalog. These are among the exception criteria that can be used with DB2 Automation Tool to help determine when REORG should be executed for a table space.

Obviously, a high degree of integration with the DB2 "engine" is there. So, too, is integration with other IBM DB2 for z/OS tools. Want to generate an image copy of an object using a system-level DB2 backup generated with DB2 Recovery Expert for z/OS? Easily done with DB2 Automation Tool.

You have better things to do than deal with the scheduling of DB2 utilities, and your organization should get the benefits that come with need-based utility execution. Do yourself and your company a favor, and check out the IBM DB2 Automation Tool for z/OS. Smart utility execution is within your reach.

Tuesday, August 14, 2012

Got DB2 for z/OS? GET DATA STUDIO

If you work with DB2 for z/OS -- as a systems programmer, a DBA, an application developer, or in some other capacity -- you REALLY ought to be using IBM Data Studio. Why? Well, start with the fact that it's FREE and downloadable from the Web. Now, free is not such a big deal if the product in question is a fluffy piece of almost-nothing that delivers practically zilch in the way of useful functionality. In the case of Data Studio, free is a great thing, because the tool is loaded with features that can boost your productivity and effectiveness as a mainframe DB2 professional (in addition to DB2 for z/OS, Data Studio can be used with DB2 for Linux, UNIX, and Windows; DB2 for i, Informix, and several non-IBM relational database management systems). In this blog entry, I'll describe a few of the Data Studio capabilities that I most appreciate (and yes, I most definitely have it on my PC). I encourage you to get your own copy of Data Studio, and to explore its functionality to see how the tool can best deliver for you.

Before going further, a couple of preliminary information items: first, Data Studio as we know it today essentially made the scene in the fall of last year, with the delivery of Version 3.1 (the current release is Version 3.1.1). If you last looked at Data Studio prior to Version 3.1, look again. That was a big-time product upgrade.

Second, note that Data Studio comes in two forms: the administration client and the full client. There's a page on IBM's Web site that provides detailed information on the features of the administration client and the full client, but I'll tell you here that the major difference between the two Data Studio clients is the support for development of DB2-accessing Java code provided by the full client, and the ability you have with the full client to "shell-share" with other Eclipse-based tools, including a number of IBM's Rational and InfoSphere Optim products. Now, you might think, "Given that Data Studio is free, why wouldn't I just download the full client, and not bother with figuring out whether or not a capability I want is or isn't provided by the administration client?" In fact, going with the full client is a pretty good idea, as far as I'm concerned. One reason there are two clients is that the full client is about three times the size of the administration client. That can be important in terms of download time, IF you don't have access to a high-speed Internet connection. These days, lots of folks have high-speed Internet connections of 50+ or even 100+ Mbps, and when that's the case download time is less of a concern and getting the Data Studio full client is a good choice. If your download capacity is more limited, and if you don't need Java development support and/or shell-sharing capability with other Eclipse-based tools, the Data Studio administration client could be best for you.

OK, on now to my personal favorites among the capabilities provided by Data Studio:
  • SQL statement tuning. So much to like here. I like the access plan graph that I can get for a query. I like how I can quickly get information about an element in the access plan graph (for example, a table accessed by the query you're analyzing) just by hovering over it with my cursor. I like how I can get more information about that element in the access plan graph by clicking on it, and even more information via the handy explorer window I get for a clicked-on element of the access plan graph (if the element clicked on is a table, the explorer window lets me very quickly obtain information about indexes on the table, keys of these indexes, columns in the index keys, etc.). I like that the access plan graph shows me the optimizer estimates for the number of result set rows going into a step of the access plan, and for the number of rows coming out of that step. I REALLY like Data Studio's statistics advisor, which provides me with RUNSTATS utility control statements aimed at gathering statistics which could help the DB2 optimizer to choose a better-performing access path for the query I'm tuning (people at the IBM Support Center who spend their time working on query performance problems will tell you that MOST of the situations with which they deal could be resolved through updated and/or enriched catalog statistics). I am one who for years analyzed query performance by slogging through data retrieved from the PLAN_TABLE. I finally got wise and started leveraging the query tuning aids provided by Data Studio, and as a result I got more efficient -- and more effective -- in getting queries to run faster.
  • Submitting SQL statements. As I stayed for a long time with the old ways of query tuning before getting modern with Data Studio, so I also persisted in submitting SQL statements the old way: via SPUFI in a TSO/ISPF session. Know what got me into using Data Studio versus SPUFI for interactive SQL statement execution? XML. I was trying one day to retrieve data in an XML column in a DB2 for z/OS table using SPUFI, and got majorly frustrated in trying to get that to work. Thinking, "Why not?" I gave it a go using Data Studio, and it worked like a champ right off the bat. Not only did I get the XML data that I wanted using Data Studio, I also got it very nicely formatted, with different lines and different levels of indentation showing clearly the structure of the returned XML data. I've also gotten to where I appreciate what I can do with a multi-row query result set obtained via Data Studio -- an example is easy sorting of the result set rows by any column's values. I also like that Data Studio kind of looks over your shoulder as you enter a query's text, showing a little red X by a line of the query if you've left something out that would cause a syntax error -- I like seeing this and getting a chance to correct the text BEFORE submitting the statement for execution. I like that Data Studio keeps a record of statements you've submitted, making it very easy to re-submit a query if you want to. I could go on and on, but the bottom line is this: Data Studio is not just "SPUFI on a PC" when it comes to entering and executing SQL statements -- it's better than SPUFI.
  • Formatting SQL statements. Over the years I've done a fair amount of query analysis work in DB2 for z/OS-based data warehouse environments. Some of the queries you might be asked to look over in such an environment might take up several pages when printed out. If one of these humongous queries (lots of tables joined, lots of subquery predicates, lots of CASE expressions, etc.) is handed to you unformatted, you've got a big job on your hands just to make it readable -- a big job, that is, if you don't have Data Studio. If you do have Data Studio, you just paste the query into an SQL input window and ask Data Studio to format it for you. You then get indentations and line breaks that make the big hairy query easy to read, and that makes working with the query much easier.
  • Retrieving data from the DB2 catalog. Sure, you can do this by issuing queries against catalog tables, but it can be done a whole lot more quickly using Data Studio's data source explorer when you're connected to a DB2 for z/OS subsystem. 
  • Developing SQL stored procedures. So-called native SQL procedures, introduced with DB2 9 for z/OS, are, in my opinion, the future of DB2 stored procedures. Developing, testing, and deploying native SQL procedures is a snap with Data Studio. I've done it, and so can you.

The above-listed Data Studio features are just the ones that I've used most extensively -- there is a LOT more that you can do with the tool, such as creating Web services for DB2 data access; creating, altering, and dropping tables and indexes and other database objects; generating DDL from objects in a database; generating diagrams that show database objects and relationships between those objects; and comparing one database object with another (e.g., the definition of table A as compared to the definition of table B). The best way to see what Data Studio can do for YOU is to get your hands on it and exercise the functionality yourself.

Hey, if you're like me -- a longtime mainframe DB2 professional -- then you've done most of the DB2 work you've needed to do over the years using the good old 3270 "green screen" interface. That interface will be with us for a long time to come, I'm sure, but for a number of DB2-related tasks a GUI is not just a different way -- it's a better way. Data Studio is your opportunity to put this assertion of mine to the test. Give it a try. I think you'll like what you find.
    

Wednesday, July 18, 2012

Quickly and Easily Clone DB2 Subsystems and Objects with DB2 Cloning Tool for z/OS

There are times when a clone of an existing DB2 for z/OS subsystem can be very handy for program testing purposes (or to speed up a production application process -- more on that momentarily). Suppose you want to clone one of your DB2 subsystems. What will you have to do? Some of the required steps -- identifying the subsystem to z/OS via PARMLIB updates, setting up RACF (or third-party equivalent) security, providing DB2 libraries for the subsystem (e.g., SDSNLOAD, SDSNEXIT), determining the volumes in the disk subsystem that will be used for the clone -- are pretty straightforward and not particularly time-consuming. Then there's the data set part. You have to initialize all the "system" data sets for the clone (catalog, directory, bootstrap data set, log data sets, etc.), and you have to deal with all of the "user" data sets (potentially, tens of thousands) associated with table spaces and indexes -- including any required data set renaming and alteration of object identifier information so that these IDs will line up with those in the clone's DB2 catalog. Ugh. You can grind through this data set tedium yourself, but there's a MUCH better way to get the job done: use IBM's DB2 Cloning Tool for z/OS to handle all the data set-related work, and get your clone established way faster (and without the risk of error linked to the manual approach).

DB2 Cloning Tool makes life easier for DBAs with respect to two main functions: cloning a DB2 subsystem in its entirety, and cloning individual objects or sets of objects (table spaces and indexes) by overlaying objects on a target subsystem with objects from a source subsystem (this object-level cloning functionality is sometimes referred to as "object refresh"). For the cloning of an entire subsystem, what has to be done ahead of time is the system stuff I mentioned above (PARMLIB update to identify the new subsystem to z/OS, RACF set-up, DB2 libraries, etc.). DB2 Cloning Tool takes it from there, doing the heavy lifting related to dealing with all the DB2 data sets. As noted previously, a key benefit of DB2 Cloning Tool utilization is speed. One of the ways in which DB2 Cloning Tool delivers here is through its ability to utilize ultra-fast, disk subsystem-based data replication technology, such as FlashCopy in IBM disk storage systems, and compatible technologies provided by other storage system vendors. FlashCopy replication of one volume's contents to another volume is near-instantaneous from a source system perspective, and on top of that you get some nice CPU savings on the source and target z/OS LPARs because the physical data copying work is accomplished by processors in the disk subsystem.

FlashCopy exploitation is great, but DB2 Cloning Tool is by no means limited to using that technology to accomplish a subsystem cloning operation. You can direct DB2 Cloning Tool to use any of a number of mechanisms to create a DB2 subsystem clone. For example, you could choose to go with dump/restore processing, or you could have DB2 Cloning Tool create a clone using your disk subsystem's remote mirroring functionality (e.g., IBM's Metro Mirror technology, formerly known as PPRC). DB2 Cloning Tool can also take a system-level backup generated by the IBM DB2 Recovery Expert for z/OS and use that to create a clone of a DB2 subsystem (I blogged about Recovery Expert earlier this year). DB2 Cloning Tool's flexibility with respect to the mechanisms used to establish a clone of a DB2 subsystem means that you can go with the approach that best suits a particular situation in your shop.

Another way in which DB2 Cloning Tool speeds things up is its proprietary process for renaming data sets, when that is required for a cloning operation. Far faster than IDCAMS RENAME, this process by itself can take a big chunk out of the time that would otherwise be required to clone a large DB2 subsystem.

Speed of execution is also enhanced through features that boost DBA productivity. This certainly comes into play when you want to use DB2 Cloning Tool to clone objects -- that is, to refresh a subset of the data in a cloned DB2 subsystem. Here, DBA work is accelerated through an interface that is very similar to the DB2 LISTDEF utility. The DB2 utility DSN1COPY, which can be used to perform object ID translation when copying a DB2 data set from one subsystem to another, is limited to one input data set per execution. By contrast, DB2 Cloning Tool allows "wild-carding" and INCLUDE/EXCLUDE specifications that make it easy to pull a whole set of objects into one data refresh operation. For example, you could refresh data in a target subsystem using all source system table spaces in database XYZ whose names begin with the characters 'EMP'. Want to include the indexes on those table spaces? No problem. Want to do a refresh using source table space ABC and all table spaces referentially related to it? Piece of cake.

And hey, DB2 Cloning Tool doesn't just make you more productive -- it also helps to keep you out of trouble by taking care of the storage system stuff (e.g., volume initialization) that needs doing when a DB2 subsystem is cloned. You might say that DB2 Cloning Tool understands database-speak (the interface is DBA-centric) and translates that as needed into data set and storage-speak. In addition to protecting you from mistakes, this "storage-aware" aspect of DB2 Cloning Tool also cuts way down on the time investment required of your colleagues in storage administration: you collaborate with these folks for the initial set-up of a DB2 subsystem clone, and then you just turn the crank using the DB2 Cloning Tool.

As for use cases, I mentioned in my opening paragraph that a DB2 subsystem clone is most often thought of in the context of application testing. Indeed, DB2 clones are very useful for that purpose, but don't think that this is the only way your organization can derive value from DB2 Cloning Tool. One company used the product to surmount what had been a vexing challenge in their production DB2 environment. The problem in this case concerned a report-generation application that took three hours to complete. That would be OK, except for the fact that the DB2 data on which the reports are based had to remain static during execution of the application, and at most there was a two-hour window during which the target DB2 tables could be in a read-only state. This organization hit on the idea of using DB2 Cloning Tool to create a clone of the DB2 subsystem, against which the reporting application could run. The clone is created by DB2 Cloning Tool in only 7 minutes (thanks to exploitation of fast replication technology, mentioned previously), and is "thrown away" once the report-generating application has completed (clone disposal is another DB2 Cloning Tool function -- done in a neat and orderly fashion, on request). Because the clone is a separate DB2 subsystem, database object names don't have to be changed, and that means no required program code changes for the reporting application. Because the clone subsystem is created so quickly, data-changing applications can get going almost two hours earlier than before. Everybody's happy.

Seems like a good point on which to close: when creating (and refreshing) a DB2 subsystem clone is so easy and so fast (and it can be when you use DB2 Cloning Tool), you can advantageously use DB2 clones in ways that might not otherwise have occurred to you. Food for thought.

Thursday, July 5, 2012

Proactive SQL Statement Tuning with IBM InfoSphere Optim Query Workload Tuner for DB2 for z/OS

SQL statement tuning -- most often in the form of query tuning -- is important when it comes to boosting the CPU efficiency of a DB2 for z/OS application workload. Sure, there are "system-level" adjustments that one can make to reduce  the CPU consumption of a DB2 application workload (things like enlarging and/or page-fixing buffer pools, binding high-use packages with RELEASE(DEALLOCATE), setting up CICS-DB2 protected entry threads, and -- in a data sharing environment -- increasing the size of the global lock structure), and these definitely have their place as part of an overall DB2 performance tuning strategy; however, a given system-level change will often have a CPU overhead impact of just a few percentage points. For really dramatic improvements in CPU efficiency (and run times), one generally needs to turn one's focus to particular SQL statements. In some cases, I've seen statement-focused tuning actions reduce the CPU cost of a query by 80%, 90%, or more (with attendant reductions in elapsed time).

So, SQL statement tuning can be a very good thing. Trouble is, for many DBAs it's an interrupt-driven process: you're working away at something, and your work is interrupted by a phone call, or an instant message, or (oh, boy) someone standing at your desk to tell you about a query that is running too long. So, you drop what you were doing, and you analyze the problem query and take some action that (hopefully) brings execution time down to an acceptable level. Then, you get back to whatever task you had to temporarily suspend, and roll along until the next "fix this query!" disruption.

If you're tired of this reactive approach to query tuning -- tired (as we say in the USA) of chasing after horses that have gotten out of the barn -- then you should take a look at a tool that can put you in control -- a tool that can enable you to get out in front of problem queries and transform query tuning from disaster response to an ongoing process that delivers results you can easily document and communicate to the higher-ups in your organization. The tool I'm talking about is IBM's InfoSphere Optim Query Workload Tuner for DB2 for z/OS, and it can be the key to making your query tuning efforts proactive in nature.

One of the defining aspects of Query Workload Tuner is in the name: workload. With Query Workload Tuner, you can bring powerful tuning capabilities to bear on not just a single query, but a set of queries. We call that set a "query workload." What is it? It's what you need and want it to be. Query Workload Tuner makes it easy to define a set -- or sets -- of queries that matter to your company. Is it the top 25 dynamic SQL statements, ranked by CPU or elapsed time, pulled from the DB2 for z/OS dynamic statement cache? Top CPU-consuming queries -- whether static or dynamic -- as identified by IBM's DB2 Query Monitor for z/OS or another vendor's query monitor? The SQL statements in a DB2 package? The statements that access a particular database object? In any case, you tell Query Workload Tuner what you want to tune, and once you've defined a set of queries you can save the workload definition and re-generate the query set of interest when you want to. At plenty of sites, query tuning has become process-driven in exactly this way: a DBA tunes a specified query set, then periodically -- weekly, biweekly, monthly, or at some other interval that suits the needs of the organization -- regenerates the query workload and tunes it again. If that set is a "top n" group of queries, it could well change from interval to interval, as formerly high-cost queries fall out of the "top n" mix (a result of their running faster following tuning actions) and others bubble up.

To boost your SQL statement tuning productivity, Query Workload Tuner provides several "advisors" through which sets of queries can be run. Two of the most useful of these are the Statistics Advisor and the Index Advisor. The Statistics Advisor will generate RUNSTATS utility control statements that can be used to update DB2 catalog statistics so as to enable the DB2 optimizer to choose better-performing access paths for queries in the target workload (for example, it might be recommended that value-frequency statistics be gathered to provide DB2 with information about the distribution of duplicate values in a column or set of columns). The Index Advisor will suggest table-indexing changes aimed at reducing the CPU cost of a set of queries, thereby helping you to get the biggest bang for the buck with respect to new and modified indexes (maximizing the performance payback from new and/or expanded indexes is important, as indexes have associated costs and you don't want to incur these without a significant return on investment). A particularly attractive aspect of query tuning via updated statistics and indexing is the fact that performance benefits can be realized without having to change SQL statement coding. That's a big deal these days, as it's increasingly common to have a situation in which SQL statements cannot be modified, either because they are generated by a query/reporting tool or they are issued by purchased applications (e.g., a DB2-accessing ERP or CRM application).

Of course, when you're engaged in a proactive query workload tuning process it's nice to be able to see how things have changed for the better (and -- just as important -- to show other people in your organization how things have changed for the better). Query Workload Tuner delivers a big assist in this area via its Workload Access Plan Comparison feature. Using this capability, you can see the before-tuning and after-tuning change in cost for a query workload (the figures are estimated costs, but these are usually a good indicator of actual performance), along with details such changes in table join order, join method, index utilization, etc. And, Query Workload Tuner's Workload Access Plan Comparison feature isn't just a great way to track the progress of your ongoing query tuning efforts -- it's also very useful for zeroing in on access path changes resulting from a migration from one version of DB2 for z/OS to another.

One other thing: if it's been a while since you last looked at Query Workload Tuner, look again. This is a product that is being regularly enhanced by IBM. Version 3.1 of InfoSphere Optim Query Workload Tuner for DB2 for z/OS, delivered in the fall of 2011, provided the aforementioned Workload Access Plan Comparison functionality. More recently, with Version 3.1.1, the Query Workload Tuner Client was replaced by plug-ins that make IBM's Data Studio the end-user interface for the product.

So, consider what your DB2 for z/OS query tuning experiences are like now ("SMITH! Speed that query up NOW!"), and what you'd like them to be ("Nice work, Smith -- you've taken a lot of CPU time out of that application"). Query Workload Tuner can help you to get from A to B.