DB2 for z/OS: Beyond the Engine: 2012

Thursday, October 4, 2012

DB2 Utilities Enhancement Tool for z/OS: Easier, More Effective Management of Your DB2 Utility Jobs

John, a colleague of mine, was recently at a DB2 for z/OS site when a problem arose. Someone accidentally ran a LOAD utility with the REPLACE option on a table space that held a considerable amount of data. LOAD did what it is supposed to do when REPLACE is specified in the utility control statement: it cleared all the existing data from the table space and replaced those rows with records from the input data set. Unfortunately, the rows replaced -- and there were a lot of them -- were rows that the company wanted to keep in the table space. As the DB2 DBA team worked to recover the inadvertently discarded data (they were ultimately successful in doing so), John mentioned to the team leader that the problem could have been avoided via the IBM DB2 Utilities Enhancement Tool for z/OS (sometimes called UET). Needless to say, UET is now on that DB2 DBA team leader's "I want this" list.

John was referring in the aforementioned conversation to the syntax monitor function of Utilities Enhancement Tool -- one of several UET features that can enable you to more easily and effectively manage your IBM DB2 utility jobs. The syntax monitor can be used to examine control statements for utility jobs when they are submitted, allowing for prevention of job execution when options on the "no" list are specified (REPLACE, for example, in a LOAD utility statement) as well as addition of desired keywords to utility control statements when those are missing (one organization used UET to ensure that histogram statistics generation was requested for certain RUNSTATS jobs). Now, with respect to the LOAD REPLACE example I've used, you might be thinking that blanket interdiction of REPLACE for LOAD jobs would be overly burdensome in your environment -- perhaps there are times when you WANT to empty existing rows from a table space before adding new rows. Not a problem: Utilities Enhancement Tool allows you to apply a given utility syntax rule very broadly (e.g., at the DB2 subsystem level) or more narrowly (perhaps for a set of user IDs or job names -- you could, for example, set things up so that a utility keyword to be used on an exception basis could only be included in a job submitted by one of your senior DBAs). These rules and their application, by the way, are specified by way of an easy-to-create (and easy-to-modify) UET policy.

In addition to being able to being able to monitor the syntax of DB2 utility control statements, Utilities Enhancement Tool can monitor messages associated with utility job execution. With this capability, you can have UET issue a customized return code when a given message is encountered. That customized return code could be used to automatically trigger a responsive action through your site's automated operations system. Alternatively, operations personnel could be provided with response instructions specifically tied to utility return codes customized through Utilities Enhancement Tool (such an instruction might let operations folks know that a particular situation can be left alone until a DBA deals with it during daytime hours -- DBAs are a lot happier when they are not needlessly awakened in the middle of the night).

UET also delivers some extensions to the standard syntax of IBM DB2 utility control statements. For example, UET allows inclusion of the keyword PRESORT in a LOAD utility statement (an option that would not pass syntax muster in the absence of Utilities Enhancement Tool). When PRESORT is specified for a LOAD job, UET will invoke DFSORT (or, even better, DB2 Sort, if that product is installed on the system) to sort the input file in the target table's clustering key sequence. UET will then change the PRESORT keyword to the DB2 LOAD-recognized PRESORTED option, indicating to the utility that the input records are indeed sorted in clustering key sequence. LOAD will consequently execute its RELOAD and BUILD phases in parallel, and will avoid sorting the table's clustering index. The end result is a LOAD utility job that completes more quickly and consumes less CPU time than would be the case if the input file were not sorted in clustering key sequence.

There's more: when you execute the REORG utility with SHRLEVEL CHANGE to enable concurrent read/write access to the target table space's data, Utilities Enhancement Tool can automatically create the required mapping table for you. UET also provides a capability whereby you can have the tool cancel existing threads and block new threads that could prevent the successful completion of a utility operation.

UET provides you with a lot of useful information through its logging of interactions with DB2 utility jobs. Examining Utilities Enhancement Tool job log data will show you which policy rules were applied and which were not, and why. That knowledge, in turn, can help you to fine-tune your UET policy for a DB2 subsystem.

Of course, Utilities Enhancement Tool supports DB2 10 for z/OS. Like all of IBM's DB2 for z/OS tools, UET was ready for Version 10 when that release of DB2 became generally available a couple of years ago. Consider how UET could help you to introduce new utility keywords (maybe new in DB2 utilities, or just new for your organization) in your DB2 for z/OS environment in a low-risk, systematic, and managed way. Utilities Enhancement Tool can provide for you a new point of control for DB2 utility execution at your site, leading to greater efficiency and productivity, more timely exploitation of new utility control options, and fewer "oops" moments. Give UET a look, and think about how it could deliver for you.

Thursday, September 6, 2012

Intelligent Utility Scheduling with DB2 Automation Tool for z/OS

I have a few questions for you:

In your shop, are DB2 for z/OS database maintenance utilities such as RUNSTATS, COPY, and REORG executed on a periodic basis (e.g., once per week per table space) regardless of whether or not they actually need to be run for a given object?
Is it sometimes the case that DB2 utilities are NOT executed in your environment when they should be?
Do you spend more time than you'd like setting up and submitting DB2 utility jobs?
Are there higher-value activities to which you could attend if you weren't spending a lot of time looking after DB2 utility execution requirements?

If your answer to any of these questions would be, "yes," you ought to take a look at what the IBM DB2 Automation Tool for z/OS could do for you and your organization.

Here's the deal: mainframe DB2 DBAs are being asked to look after ever-larger databases (more and more objects, not just more data) and to keep these DB2 for z/OS systems in fighting trim with regard to CPU efficiency and in the expected lead spot when it comes to data availability. If these requirements are to be effectively addressed, certain DB2 utilities have to be run on a regular basis. Knowing this, DB2 for z/OS DBAs at plenty of sites have adopted a strategy of submitting RUNSTATS, COPY, and REORG jobs for table spaces based on what I'd call a calendar criterion. In other words, RUNSTATS, for example, might be executed once per month per table space. COPY might be executed once per week per table space for full backups, and once daily per table space for incremental backups between execution of full-backup jobs.

That mode of operation has the benefits of being simple and (generally) keeping things from "falling through the cracks," but it comes with some costs that your organization would probably like to avoid. Quite often, these costs hit you from the left and from the right -- that is, they are associated both with "not enough" and "too much" with respect to utility execution frequency:

On the "not enough" side, consider a once-per-month-per-table space approach to running the REORG utility. That might be OK for most of your table spaces, but you might have some tables with high insert activity (and not so much delete activity), and associated indexes on keys that are not continuously ascending (necessitating inserts of entries into the "middle" of these indexes). These indexes can become quite disorganized quite quickly, and that's not good for the performance of queries for which DB2 uses index scan access. Maybe once per month is not a sufficient REORG frequency for these table spaces (or separately for these indexes).
On the "too much" side, a common misstep is to execute RUNSTATS too frequently (an extreme case I've seen in the real world involved DAILY execution of RUNSTATS for a number of table spaces). Accurate catalog statistics are important for good query performance, but in many cases the data in a table space does not change significantly in a near-term time frame from a statistics perspective: row count is pretty stable, column cardinalities change little, high and low key values do not vary much, etc. In such cases, excessive execution of RUNSTATS burns CPU cycles without delivering performance benefits.
"Too much" also became more of an issue for REORG at some sites with the change introduced with DB2 9 for z/OS: when an online REORG of a subset of a partitioned table space's partitions is executed, any non-partitioned indexes (NPIs) defined on the underlying table will be reorganized in their entirety. That was a very good change from an availability perspective (it eliminated the so-called "BUILD2" phase of online REORG, which effectively blocked data access for a time), but it increased the system resource consumption (CPU cycles and disk space) associated with these REORG jobs. That, in turn, provided a new incentive for avoiding unnecessary REORGs.

Those are just a few examples, but you get the picture. The thing is, shifting to a purpose-driven utility execution strategy ("I need to run COPY for this table space because of recent update activity") versus a strictly calendar-based approach ("I'll execute COPY for this table space -- whether or not it's been updated -- because I haven't done that in X days") can take a lot of time if you go it alone -- and who has lots of time on his or her hands these days? The solution here is to not go it alone. Let DB2 Automation Tool assist you in getting to needs-based DB2 utility execution. With Automation Tool's easy-to-use ISPF interface, you can quickly define:

Object profiles -- sets of database objects (table spaces and/or indexes) for which you want to create utility execution action plans.
Utility profiles -- these designate the utilities you want to be executed, and how you want them to execute (referring to utility options).
Exception profiles -- the "smart" in smart utility execution. With DB2 Automation Tool, you have 180 different exceptions which you can use individually or in combination to determine when conditions warrant the execution of a utility for a database object. What's more, you can add greater sophistication to your smart utility execution criteria through formulas that you can implement with DB2 Automation Tool user exits and REXX EXECs
Job profiles -- where intention turns into action. Exception jobs can be automatically executed at regular intervals to evaluate objects, and utility jobs needed to address identified exceptions can be automatically submitted for execution.

Put it all together, and you get a DB2 utility execution mechanism that saves you time and saves your organization CPU and disk resources -- not only by avoiding unnecessary utility execution, but also through more effective maintenance of database objects and catalog statistics to help ensure consistently good DB2 application performance. Plus, with your own freed-up bandwidth you can more fully engage in things like application enablement and other activities that really deliver value to your company.

Something else to keep in mind: this is an IBM DB2 tool we're talking about, so of course it supports -- and exploits -- the latest version of DB2 for z/OS (that's of course DB2 10). Some examples of DB2 feature exploitation by DB2 Automation Tool are:

The DB2 administrative task scheduler. DB2 Automation tool makes it easier to utilize this DB2 capability and helps you to reduce associated time to value; furthermore, the DB2 administrative task scheduler offers an operationally flexible means of submitting utility jobs generated via DB2 Automation Tool.
DB2 10 autonomic statistics. DB2 Automation Tool provides an interface to the DB2 10 autonomic statistics stored procedures, which can be used to determine when statistics need to be collected for database objects, and to automate RUNSTATs execution. DB2 Automation Tool also provides an interface to manage maintenance windows for executing the RUNSTATS utility.
The new (with DB2 10) REORGCLUSTERSENS and REORGSCANACCESS columns of the SYSTABLESPACESTATS real-time statistics table in the DB2 catalog. These are among the exception criteria that can be used with DB2 Automation Tool to help determine when REORG should be executed for a table space.

Obviously, a high degree of integration with the DB2 "engine" is there. So, too, is integration with other IBM DB2 for z/OS tools. Want to generate an image copy of an object using a system-level DB2 backup generated with DB2 Recovery Expert for z/OS? Easily done with DB2 Automation Tool.

You have better things to do than deal with the scheduling of DB2 utilities, and your organization should get the benefits that come with need-based utility execution. Do yourself and your company a favor, and check out the IBM DB2 Automation Tool for z/OS. Smart utility execution is within your reach.

Tuesday, August 14, 2012

Got DB2 for z/OS? GET DATA STUDIO

If you work with DB2 for z/OS -- as a systems programmer, a DBA, an application developer, or in some other capacity -- you REALLY ought to be using IBM Data Studio. Why? Well, start with the fact that it's FREE and downloadable from the Web. Now, free is not such a big deal if the product in question is a fluffy piece of almost-nothing that delivers practically zilch in the way of useful functionality. In the case of Data Studio, free is a great thing, because the tool is loaded with features that can boost your productivity and effectiveness as a mainframe DB2 professional (in addition to DB2 for z/OS, Data Studio can be used with DB2 for Linux, UNIX, and Windows; DB2 for i, Informix, and several non-IBM relational database management systems). In this blog entry, I'll describe a few of the Data Studio capabilities that I most appreciate (and yes, I most definitely have it on my PC). I encourage you to get your own copy of Data Studio, and to explore its functionality to see how the tool can best deliver for you.

Before going further, a couple of preliminary information items: first, Data Studio as we know it today essentially made the scene in the fall of last year, with the delivery of Version 3.1 (the current release is Version 3.1.1). If you last looked at Data Studio prior to Version 3.1, look again. That was a big-time product upgrade.

Second, note that Data Studio comes in two forms: the administration client and the full client. There's a page on IBM's Web site that provides detailed information on the features of the administration client and the full client, but I'll tell you here that the major difference between the two Data Studio clients is the support for development of DB2-accessing Java code provided by the full client, and the ability you have with the full client to "shell-share" with other Eclipse-based tools, including a number of IBM's Rational and InfoSphere Optim products. Now, you might think, "Given that Data Studio is free, why wouldn't I just download the full client, and not bother with figuring out whether or not a capability I want is or isn't provided by the administration client?" In fact, going with the full client is a pretty good idea, as far as I'm concerned. One reason there are two clients is that the full client is about three times the size of the administration client. That can be important in terms of download time, IF you don't have access to a high-speed Internet connection. These days, lots of folks have high-speed Internet connections of 50+ or even 100+ Mbps, and when that's the case download time is less of a concern and getting the Data Studio full client is a good choice. If your download capacity is more limited, and if you don't need Java development support and/or shell-sharing capability with other Eclipse-based tools, the Data Studio administration client could be best for you.

OK, on now to my personal favorites among the capabilities provided by Data Studio:

SQL statement tuning. So much to like here. I like the access plan graph that I can get for a query. I like how I can quickly get information about an element in the access plan graph (for example, a table accessed by the query you're analyzing) just by hovering over it with my cursor. I like how I can get more information about that element in the access plan graph by clicking on it, and even more information via the handy explorer window I get for a clicked-on element of the access plan graph (if the element clicked on is a table, the explorer window lets me very quickly obtain information about indexes on the table, keys of these indexes, columns in the index keys, etc.). I like that the access plan graph shows me the optimizer estimates for the number of result set rows going into a step of the access plan, and for the number of rows coming out of that step. I REALLY like Data Studio's statistics advisor, which provides me with RUNSTATS utility control statements aimed at gathering statistics which could help the DB2 optimizer to choose a better-performing access path for the query I'm tuning (people at the IBM Support Center who spend their time working on query performance problems will tell you that MOST of the situations with which they deal could be resolved through updated and/or enriched catalog statistics). I am one who for years analyzed query performance by slogging through data retrieved from the PLAN_TABLE. I finally got wise and started leveraging the query tuning aids provided by Data Studio, and as a result I got more efficient -- and more effective -- in getting queries to run faster.
Submitting SQL statements. As I stayed for a long time with the old ways of query tuning before getting modern with Data Studio, so I also persisted in submitting SQL statements the old way: via SPUFI in a TSO/ISPF session. Know what got me into using Data Studio versus SPUFI for interactive SQL statement execution? XML. I was trying one day to retrieve data in an XML column in a DB2 for z/OS table using SPUFI, and got majorly frustrated in trying to get that to work. Thinking, "Why not?" I gave it a go using Data Studio, and it worked like a champ right off the bat. Not only did I get the XML data that I wanted using Data Studio, I also got it very nicely formatted, with different lines and different levels of indentation showing clearly the structure of the returned XML data. I've also gotten to where I appreciate what I can do with a multi-row query result set obtained via Data Studio -- an example is easy sorting of the result set rows by any column's values. I also like that Data Studio kind of looks over your shoulder as you enter a query's text, showing a little red X by a line of the query if you've left something out that would cause a syntax error -- I like seeing this and getting a chance to correct the text BEFORE submitting the statement for execution. I like that Data Studio keeps a record of statements you've submitted, making it very easy to re-submit a query if you want to. I could go on and on, but the bottom line is this: Data Studio is not just "SPUFI on a PC" when it comes to entering and executing SQL statements -- it's better than SPUFI.
Formatting SQL statements. Over the years I've done a fair amount of query analysis work in DB2 for z/OS-based data warehouse environments. Some of the queries you might be asked to look over in such an environment might take up several pages when printed out. If one of these humongous queries (lots of tables joined, lots of subquery predicates, lots of CASE expressions, etc.) is handed to you unformatted, you've got a big job on your hands just to make it readable -- a big job, that is, if you don't have Data Studio. If you do have Data Studio, you just paste the query into an SQL input window and ask Data Studio to format it for you. You then get indentations and line breaks that make the big hairy query easy to read, and that makes working with the query much easier.
Retrieving data from the DB2 catalog. Sure, you can do this by issuing queries against catalog tables, but it can be done a whole lot more quickly using Data Studio's data source explorer when you're connected to a DB2 for z/OS subsystem.
Developing SQL stored procedures. So-called native SQL procedures, introduced with DB2 9 for z/OS, are, in my opinion, the future of DB2 stored procedures. Developing, testing, and deploying native SQL procedures is a snap with Data Studio. I've done it, and so can you.

The above-listed Data Studio features are just the ones that I've used most extensively -- there is a LOT more that you can do with the tool, such as creating Web services for DB2 data access; creating, altering, and dropping tables and indexes and other database objects; generating DDL from objects in a database; generating diagrams that show database objects and relationships between those objects; and comparing one database object with another (e.g., the definition of table A as compared to the definition of table B). The best way to see what Data Studio can do for YOU is to get your hands on it and exercise the functionality yourself.

Hey, if you're like me -- a longtime mainframe DB2 professional -- then you've done most of the DB2 work you've needed to do over the years using the good old 3270 "green screen" interface. That interface will be with us for a long time to come, I'm sure, but for a number of DB2-related tasks a GUI is not just a different way -- it's a better way. Data Studio is your opportunity to put this assertion of mine to the test. Give it a try. I think you'll like what you find.

Wednesday, July 18, 2012

Quickly and Easily Clone DB2 Subsystems and Objects with DB2 Cloning Tool for z/OS

There are times when a clone of an existing DB2 for z/OS subsystem can be very handy for program testing purposes (or to speed up a production application process -- more on that momentarily). Suppose you want to clone one of your DB2 subsystems. What will you have to do? Some of the required steps -- identifying the subsystem to z/OS via PARMLIB updates, setting up RACF (or third-party equivalent) security, providing DB2 libraries for the subsystem (e.g., SDSNLOAD, SDSNEXIT), determining the volumes in the disk subsystem that will be used for the clone -- are pretty straightforward and not particularly time-consuming. Then there's the data set part. You have to initialize all the "system" data sets for the clone (catalog, directory, bootstrap data set, log data sets, etc.), and you have to deal with all of the "user" data sets (potentially, tens of thousands) associated with table spaces and indexes -- including any required data set renaming and alteration of object identifier information so that these IDs will line up with those in the clone's DB2 catalog. Ugh. You can grind through this data set tedium yourself, but there's a MUCH better way to get the job done: use IBM's DB2 Cloning Tool for z/OS to handle all the data set-related work, and get your clone established way faster (and without the risk of error linked to the manual approach).

DB2 Cloning Tool makes life easier for DBAs with respect to two main functions: cloning a DB2 subsystem in its entirety, and cloning individual objects or sets of objects (table spaces and indexes) by overlaying objects on a target subsystem with objects from a source subsystem (this object-level cloning functionality is sometimes referred to as "object refresh"). For the cloning of an entire subsystem, what has to be done ahead of time is the system stuff I mentioned above (PARMLIB update to identify the new subsystem to z/OS, RACF set-up, DB2 libraries, etc.). DB2 Cloning Tool takes it from there, doing the heavy lifting related to dealing with all the DB2 data sets. As noted previously, a key benefit of DB2 Cloning Tool utilization is speed. One of the ways in which DB2 Cloning Tool delivers here is through its ability to utilize ultra-fast, disk subsystem-based data replication technology, such as FlashCopy in IBM disk storage systems, and compatible technologies provided by other storage system vendors. FlashCopy replication of one volume's contents to another volume is near-instantaneous from a source system perspective, and on top of that you get some nice CPU savings on the source and target z/OS LPARs because the physical data copying work is accomplished by processors in the disk subsystem.

FlashCopy exploitation is great, but DB2 Cloning Tool is by no means limited to using that technology to accomplish a subsystem cloning operation. You can direct DB2 Cloning Tool to use any of a number of mechanisms to create a DB2 subsystem clone. For example, you could choose to go with dump/restore processing, or you could have DB2 Cloning Tool create a clone using your disk subsystem's remote mirroring functionality (e.g., IBM's Metro Mirror technology, formerly known as PPRC). DB2 Cloning Tool can also take a system-level backup generated by the IBM DB2 Recovery Expert for z/OS and use that to create a clone of a DB2 subsystem (I blogged about Recovery Expert earlier this year). DB2 Cloning Tool's flexibility with respect to the mechanisms used to establish a clone of a DB2 subsystem means that you can go with the approach that best suits a particular situation in your shop.

Another way in which DB2 Cloning Tool speeds things up is its proprietary process for renaming data sets, when that is required for a cloning operation. Far faster than IDCAMS RENAME, this process by itself can take a big chunk out of the time that would otherwise be required to clone a large DB2 subsystem.

Speed of execution is also enhanced through features that boost DBA productivity. This certainly comes into play when you want to use DB2 Cloning Tool to clone objects -- that is, to refresh a subset of the data in a cloned DB2 subsystem. Here, DBA work is accelerated through an interface that is very similar to the DB2 LISTDEF utility. The DB2 utility DSN1COPY, which can be used to perform object ID translation when copying a DB2 data set from one subsystem to another, is limited to one input data set per execution. By contrast, DB2 Cloning Tool allows "wild-carding" and INCLUDE/EXCLUDE specifications that make it easy to pull a whole set of objects into one data refresh operation. For example, you could refresh data in a target subsystem using all source system table spaces in database XYZ whose names begin with the characters 'EMP'. Want to include the indexes on those table spaces? No problem. Want to do a refresh using source table space ABC and all table spaces referentially related to it? Piece of cake.

And hey, DB2 Cloning Tool doesn't just make you more productive -- it also helps to keep you out of trouble by taking care of the storage system stuff (e.g., volume initialization) that needs doing when a DB2 subsystem is cloned. You might say that DB2 Cloning Tool understands database-speak (the interface is DBA-centric) and translates that as needed into data set and storage-speak. In addition to protecting you from mistakes, this "storage-aware" aspect of DB2 Cloning Tool also cuts way down on the time investment required of your colleagues in storage administration: you collaborate with these folks for the initial set-up of a DB2 subsystem clone, and then you just turn the crank using the DB2 Cloning Tool.

As for use cases, I mentioned in my opening paragraph that a DB2 subsystem clone is most often thought of in the context of application testing. Indeed, DB2 clones are very useful for that purpose, but don't think that this is the only way your organization can derive value from DB2 Cloning Tool. One company used the product to surmount what had been a vexing challenge in their production DB2 environment. The problem in this case concerned a report-generation application that took three hours to complete. That would be OK, except for the fact that the DB2 data on which the reports are based had to remain static during execution of the application, and at most there was a two-hour window during which the target DB2 tables could be in a read-only state. This organization hit on the idea of using DB2 Cloning Tool to create a clone of the DB2 subsystem, against which the reporting application could run. The clone is created by DB2 Cloning Tool in only 7 minutes (thanks to exploitation of fast replication technology, mentioned previously), and is "thrown away" once the report-generating application has completed (clone disposal is another DB2 Cloning Tool function -- done in a neat and orderly fashion, on request). Because the clone is a separate DB2 subsystem, database object names don't have to be changed, and that means no required program code changes for the reporting application. Because the clone subsystem is created so quickly, data-changing applications can get going almost two hours earlier than before. Everybody's happy.

Seems like a good point on which to close: when creating (and refreshing) a DB2 subsystem clone is so easy and so fast (and it can be when you use DB2 Cloning Tool), you can advantageously use DB2 clones in ways that might not otherwise have occurred to you. Food for thought.

Thursday, July 5, 2012

Proactive SQL Statement Tuning with IBM InfoSphere Optim Query Workload Tuner for DB2 for z/OS

SQL statement tuning -- most often in the form of query tuning -- is important when it comes to boosting the CPU efficiency of a DB2 for z/OS application workload. Sure, there are "system-level" adjustments that one can make to reduce the CPU consumption of a DB2 application workload (things like enlarging and/or page-fixing buffer pools, binding high-use packages with RELEASE(DEALLOCATE), setting up CICS-DB2 protected entry threads, and -- in a data sharing environment -- increasing the size of the global lock structure), and these definitely have their place as part of an overall DB2 performance tuning strategy; however, a given system-level change will often have a CPU overhead impact of just a few percentage points. For really dramatic improvements in CPU efficiency (and run times), one generally needs to turn one's focus to particular SQL statements. In some cases, I've seen statement-focused tuning actions reduce the CPU cost of a query by 80%, 90%, or more (with attendant reductions in elapsed time).

So, SQL statement tuning can be a very good thing. Trouble is, for many DBAs it's an interrupt-driven process: you're working away at something, and your work is interrupted by a phone call, or an instant message, or (oh, boy) someone standing at your desk to tell you about a query that is running too long. So, you drop what you were doing, and you analyze the problem query and take some action that (hopefully) brings execution time down to an acceptable level. Then, you get back to whatever task you had to temporarily suspend, and roll along until the next "fix this query!" disruption.

If you're tired of this reactive approach to query tuning -- tired (as we say in the USA) of chasing after horses that have gotten out of the barn -- then you should take a look at a tool that can put you in control -- a tool that can enable you to get out in front of problem queries and transform query tuning from disaster response to an ongoing process that delivers results you can easily document and communicate to the higher-ups in your organization. The tool I'm talking about is IBM's InfoSphere Optim Query Workload Tuner for DB2 for z/OS, and it can be the key to making your query tuning efforts proactive in nature.

One of the defining aspects of Query Workload Tuner is in the name: workload. With Query Workload Tuner, you can bring powerful tuning capabilities to bear on not just a single query, but a set of queries. We call that set a "query workload." What is it? It's what you need and want it to be. Query Workload Tuner makes it easy to define a set -- or sets -- of queries that matter to your company. Is it the top 25 dynamic SQL statements, ranked by CPU or elapsed time, pulled from the DB2 for z/OS dynamic statement cache? Top CPU-consuming queries -- whether static or dynamic -- as identified by IBM's DB2 Query Monitor for z/OS or another vendor's query monitor? The SQL statements in a DB2 package? The statements that access a particular database object? In any case, you tell Query Workload Tuner what you want to tune, and once you've defined a set of queries you can save the workload definition and re-generate the query set of interest when you want to. At plenty of sites, query tuning has become process-driven in exactly this way: a DBA tunes a specified query set, then periodically -- weekly, biweekly, monthly, or at some other interval that suits the needs of the organization -- regenerates the query workload and tunes it again. If that set is a "top n" group of queries, it could well change from interval to interval, as formerly high-cost queries fall out of the "top n" mix (a result of their running faster following tuning actions) and others bubble up.

To boost your SQL statement tuning productivity, Query Workload Tuner provides several "advisors" through which sets of queries can be run. Two of the most useful of these are the Statistics Advisor and the Index Advisor. The Statistics Advisor will generate RUNSTATS utility control statements that can be used to update DB2 catalog statistics so as to enable the DB2 optimizer to choose better-performing access paths for queries in the target workload (for example, it might be recommended that value-frequency statistics be gathered to provide DB2 with information about the distribution of duplicate values in a column or set of columns). The Index Advisor will suggest table-indexing changes aimed at reducing the CPU cost of a set of queries, thereby helping you to get the biggest bang for the buck with respect to new and modified indexes (maximizing the performance payback from new and/or expanded indexes is important, as indexes have associated costs and you don't want to incur these without a significant return on investment). A particularly attractive aspect of query tuning via updated statistics and indexing is the fact that performance benefits can be realized without having to change SQL statement coding. That's a big deal these days, as it's increasingly common to have a situation in which SQL statements cannot be modified, either because they are generated by a query/reporting tool or they are issued by purchased applications (e.g., a DB2-accessing ERP or CRM application).

Of course, when you're engaged in a proactive query workload tuning process it's nice to be able to see how things have changed for the better (and -- just as important -- to show other people in your organization how things have changed for the better). Query Workload Tuner delivers a big assist in this area via its Workload Access Plan Comparison feature. Using this capability, you can see the before-tuning and after-tuning change in cost for a query workload (the figures are estimated costs, but these are usually a good indicator of actual performance), along with details such changes in table join order, join method, index utilization, etc. And, Query Workload Tuner's Workload Access Plan Comparison feature isn't just a great way to track the progress of your ongoing query tuning efforts -- it's also very useful for zeroing in on access path changes resulting from a migration from one version of DB2 for z/OS to another.

One other thing: if it's been a while since you last looked at Query Workload Tuner, look again. This is a product that is being regularly enhanced by IBM. Version 3.1 of InfoSphere Optim Query Workload Tuner for DB2 for z/OS, delivered in the fall of 2011, provided the aforementioned Workload Access Plan Comparison functionality. More recently, with Version 3.1.1, the Query Workload Tuner Client was replaced by plug-ins that make IBM's Data Studio the end-user interface for the product.

So, consider what your DB2 for z/OS query tuning experiences are like now ("SMITH! Speed that query up NOW!"), and what you'd like them to be ("Nice work, Smith -- you've taken a lot of CPU time out of that application"). Query Workload Tuner can help you to get from A to B.

Friday, May 25, 2012

DB2 for z/OS Recovery: Getting it Right -- and Fast -- with Recovery Expert

Some people seem to be under the impression that recovery has drifted down, importance-wise, in the hierarchy of issues associated with running a DB2 for z/OS system. These folks might be thinking that, with the ever-increasing reliability of mainframe hardware, software, and storage systems, DB2 recovery operations just aren't required as often as they once were. On top of that, there's a feeling that point-in-time data recovery -- more complex than recover-to-currency -- is largely a thing of the past, owing to factors such as the increasing prevalence of round-the-clock transactional traffic accessing DB2 databases. Finally, there are the ongoing improvements in backup and recovery capabilities that are delivered with new releases of DB2 for z/OS, seen as mitigating some challenges that presented themselves in years past.

Well, these factors are indeed part of the current landscape: mainframe systems, long known for rock-solid reliability, are even more robust than before. Point-in-time data recovery is trickier when both online transactions and batch jobs update data in the same tables at the same time. DB2 for z/OS backup and recovery functionality has advanced in important ways in recent years. With all that said, however, I will tell you that recovery is as front-and-center in the world of DB2 for z/OS as it's ever been. Here's why:

The financial cost of downtime is escalating. It's an online world. Organizations' customers and clients demand access to data and services at all times. For some companies, the cost of data and application unavailability is measured in millions of dollars per hour.
The need for point-in-time data recovery is very much present in today's systems. Batch job input files still occasionally contain errors that have to be subsequently backed out of the database; furthermore, the iterative nature of application development common at many sites results in more-frequent database schema changes, and these at times need to be reversed.
With more options for DB2 for z/OS backup and recovery come more decisions. If there are multiple go-forward paths for a given recovery situation, which one is the best? If there are more recovery assets nowadays (e.g., system-level as well as object-level backups), how do you track and manage them?

Lots to think about. Fortunately, there is a tool available that can make you better at DB2 for z/OS recovery than you were before. As the voice-over from the beginning of the old television series, The Six Million Dollar Man, put it (yes, I'm dating myself here): "Better. Faster. Stronger." That tool is IBM's DB2 Recovery Expert for z/OS.

Recovery Expert delivers the greatest value when you need it most: 2:00 AM, data unavailable, application programs failing, phone ringing non-stop, escalation to upper management imminent. Pressure's on, cost of mistakes is high, time is absolutely of the essence. Do you even know what your recovery options are? Will you choose a good one? Are the required recovery assets in place (backups, log files, etc.)? Will you get the JCL right? Will you try to multi-thread some of the recovery processing? You can go with your gut and hope for a good outcome, or you can raise the odds of success by getting the advanced technology of Recovery Expert on your side. Fire up the GUI (browser-based starting with the recently announced 3.1 release) or the ISPF interface, point and click (or populate ISPF panel fields) to input information on the recovery task at hand, and let Recovery Expert do its thing. You'll quickly be presented with a list of alternative recovery procedures, with the best-performing option on top. Make the choice that's right for the situation (minimization of elapsed time might be your objective, or you might be focused on CPU consumption or some other aspect of the recovery operation), and leave it to Recovery Expert to build and execute the jobs that will get things done right the first time around. You even have the ability to specify a degree of parallelization for processes involved in the recovery operation. Speed and accuracy are significantly enhanced, and you get to tell the higher-ups that the problem is resolved, versus fighting through a storm of panic and stress.

Hey, and outside those 2:00 AM moments (which, I hope, are few and far between for you), Recovery Expert can deliver all kinds of value for your company. Want to reverse a database schema change without leaving behind all the associated DB2 stuff -- not only indexes and packages, but triggers, views, stored procedures, and authorizations -- that would be impacted by an unload/drop/re-create/re-load operation? Check. Want to restore an accidentally dropped table? Check. Want to accelerate the leveraging of DB2 10 features such as roll-backward recovery? Check. Want to quickly implement DB2 system-level backups to make your backup procedures much simpler and more CPU-efficient? Check. Want to assess available recovery assets to ensure that you have what you need to accomplish various recovery tasks? Check. Want to find "quiet times" in the DB2 log to which you can aim point-in-time recovery operations? Check. Want to accommodate application-based referential integrity in your recovery operations? Check. Want to translate a timestamp to which you want to restore data to the requisite DB2 log RBA or LRSN? Check. Want to create a backup that can be handed over to the IBM DB2 Cloning Tool for z/OS to clone a subsystem? Check. Want to simplify the bringing forward of your existing backup and recovery procedures as part of a migration to DB2 10 for z/OS? Check.

That's a lot of checks, and I haven't even gotten into all that's there. Believe me, Recovery Expert addresses so many needs associated with DB2 for z/OS recovery, once you start using it you may wind up wondering how you were able to get along without it. The more data you have, the more DB2 subsystems you have, the more complex and demanding your DB2 application environment is, the greater the Recovery Expert payoff. Get Recovery Expert, and get bionic with your DB2 recovery capabilities. "We have the technology..."

Thursday, May 3, 2012

Administering (and Changing) a DB2 for z/OS Database: The Right Tools for the Job

A lot of DB2 for z/OS DBAs are being asked to do more these days:

There are more database objects to look after (sometimes tens of thousands of tables, indexes, and table spaces in one DB2 subsystem)
There is a greater variety of database objects to manage -- not just tables and such, but stored procedures, triggers, user-defined functions, sequences, and more.
There are more options available with respect to object definition and alteration. Examples of these new options are hash organization of data in a table, DDL-specified XML schema validation, in-lining of LOB data in a base table, and data versioning by way of temporal data support).
There are new demands for physical database design changes. DB2 10 provides, among other things, a non-disruptive process for converting simple, segmented, and "classic" partitioned table spaces to universal table spaces.
The number of DB2 environments at many sites has gone way up, and by "environments" I don't necessarily mean DB2 subsystems (though some organizations have scores of these). Even if the number of DB2 subsystems at your company is in the single digits, in just one of those subsystems you could have multiple different development and/or test "environments," which might be in the form of different schemas (and these might contain the same tables, with slight variations in names or logical or physical design).
The data security situation is under more scrutiny than ever. In a given subsystem, who has what privileges on which objects? And what about all those new security objects (e.g., roles, trusted contexts, row permissions, column masks) and privileges (EXPLAIN, "system" DBADM, DATAACCESS, etc.) introduced with DB2 9 and DB2 10 for z/OS?
Documenting of database changes is a high priority. Management are increasingly demanding that a historical record of database changes be maintained.

And with all this extra stuff on DBAs' plates, is help on the way in the form of more arms and legs to get things done? Not likely: organizations are still running pretty lean and mean with respect to IT staffing, and DBA teams are generally not growing in proportion to the work they are expected to accomplish. What you need is a DBA accelerator -- something that enables a DB2 for z/OS administrator to do what he (or she) could do on his own, only faster. That accelerator is available in the form of IBM's DB2 Administration Tool for z/OS (which I sometimes call the DB2 Admin Tool) and its complement, DB2 Object Comparison Tool for z/OS. Together, these offerings provide the capabilities that enable a DB2 DBA to do his job more time-efficiently than ever, and with higher quality, to boot (you can avoid errors that could occur through efforts to get more done in less time simply by hurrying). Consider just some of the possibilities:

Accelerate the answering of questions about your mainframe DB2 environment. Sure, information needed to respond to most any question about a DB2 system can be found in the catalog, and you can always query those tables via SELECT statements. Or, use the Admin Tool's catalog navigation interface and get your answers faster.
Accelerate the issuance of DB2 commands and SQL statements. Rather than free-forming these, relying on your recollection of syntax (or having the SQL Reference or Command Reference handy), use the statement- and command-build assist functionality of the Admin Tool to get it right -- fast.
Accelerate the leveraging of DB2 TEMPLATE and LISTDEF functionality. TEMPLATE (for controlling the names and other characteristics of data sets that can be dynamically allocated by DB2 utilities such as COPY) and LISTDEF (used when you want a utility job to be executed for a group of objects, such as all table spaces in a given database) are productivity-boosting functions delivered with (as I recall) DB2 V7. TEMPLATEs and LISTDEFs can save you a lot of time when it comes to defining and managing DB2 utility jobs, but you have to set them up first. The DB2 Admin Tool helps you to do that -- fast.
Accelerate analysis of DROP and REVOKE actions and avoid "gotcha" outcomes. The DB2 Admin Tool will show you the impact of, for example, dropping an object or revoking a privilege -- BEFORE you do the deed.
Accelerate the copying of statistics from one DB2 catalog to another for SQL statement access path analysis. If you want some assurance that access paths seen for SQL statements in a test environment are those that you'd get for the statements in the production DB2 system, the statistics for the target objects in the test DB2 subsystem's catalog had better match those in the production catalog. I know from personal experience that manually migrating a set of statistics from one DB2 catalog to another is a chore. The DB2 Admin Tool makes this process easy -- and fast.
Accelerate tracking of DB2 database changes, and foster collaboration in the implementation of those changes. The DB2 Admin Tool stores information about database changes in a repository (a set of DB2 tables), from which data can be quickly retrieved when needed. Does anyone actually enjoy documenting database change plans and operations? I don't. Let the Admin Tool take the documentation load off of your shoulders.
Accelerate the restoration of a database to a previous state. Want to take a changed database back to a preexisting state (something that can be very useful in a test environment)? Easily done with the DB2 Object Comparison Tool.
Accelerate the generation of database comparison information that matters. Maybe you want to compare two different databases (in the same DB2 subsystem or in different subsystems), but you don't want the output of the comparison operation to be cluttered with information that doesn't matter to you. For example, you KNOW that the schema name of objects in database A is different from the schema name used for those objects in database B, or maybe you KNOW that primary and secondary space allocation specifications are different in the two databases. The DB2 Object Comparison Tool provides masking ("when you see schema name X in the source, translate that to Y in the target for purposes of this comparison") and "ignore" functions ("in comparing this source to that target, disregard differences in PRIQTY and SECQTY values") so that you can get the comparison result information that is important to you.
Accelerate the propagation of database changes made in (for example) a development or test environment to production. Database designs tend to evolve over time. For various reasons, an individual table might be split into two tables, or vice versa; a column might be added to or removed from a table; a table's row-ordering scheme might change from clustered to hash-organized. Following successful testing, these changes have to be reproduced in the production system. The DB2 Object Comparison Tool can generate the change actions (ALTER, DROP, CREATE, etc.) needed to bring a target environment in sync with a source environment, and those changes can be automatically applied to the target system.

That's a pretty good list of capabilities, but it's by no means complete. You can get more information via the products' respective Web pages (pointed to by the links near the beginning of this entry), and LOTS more information from the Administration Tool and Object Comparison Tool users guides (the "Product library" link on each product's Web page will take you to a page from which you can download a PDF version of the user's guide).

And the story keeps getting better. IBM is committed to the ongoing enhancement of the Company's tools for DB2 on the System z platform. An important new feature for the DB2 Administration Tool and the DB2 Object Comparison Tool that was recently delivered via PTFs (the associated APARs are PM49907 for the Admin Tool and PM49908 for the Object Comparison Tool) is a batch interface to the products' change management functions. With CM batch (as the new feature is known), users can set up and reuse batch jobs to drive change management processes -- an alternative to the ISPF panel interface that is particularly useful for database change-related actions that are performed on a regular basis. John Dembinski, a member of the IBM team that develops the DB2 Administration Tool for z/OS and the DB2 Object Comparison Tool for z/OS, has written a comprehensive article that describes in detail the capabilities and uses of CM batch. This article should be showing up quite soon on IBM's developerWorks Web site. When I get the direct link to the article I'll provide it via a comment to this blog post.

A closing thought. Something I really want you to understand about the DB2 Administration Tool for z/OS and the DB2 Object Comparison Tool for z/OS is this: the products are not training wheels for inexperienced mainframe DB2 DBAs. To say that these tools are DB2 administration training wheels would be like saying that co-polymer monofilament strings are tennis training wheels for Rafael Nadal. Those high-tech racquet strings didn't make Rafa a great tennis player. Rather, they make him even better at the game than he otherwise would be. Sure, if you're pretty new to DB2 for z/OS then the Admin Tool and the Object Comparison Tool can help you to get productive in a hurry. If, on the other hand, you're a mainframe DB2 veteran, the Admin Tool and the Object Comparison Tool can provide a multiplier effect that will enable you to leverage your knowledge and skills more extensively than you may have thought possible. Wherever you are on the spectrum of DB2 for z/OS experience, YOU are the reason you're good at what you do. With the DB2 Administration Tool and the DB2 Object Comparison Tool, you can become you-plus. Maybe even you-squared. Check 'em out, and get accelerated.

Thursday, April 19, 2012

Taming DB2 for z/OS Dynamic SQL with Optim pureQuery Runtime

There was a time, not too many years ago, when dynamic SQL was relatively uncommon in many DB2 for z/OS environments. DB2 systems programmers and DBAs insisted on the use of static SQL, and very often got their way.

Nowadays, dynamic SQL is pervasive at a great many mainframe DB2 sites. This is due to several factors, including:

An increase in the use of DB2 for z/OS for business analytics purposes. This is kind of a "back to the future" thing. DB2 for the mainframe platform was initially positioned, when introduced in the mid-1980s, as a DBMS intended for use with decision support applications (versus "run the business" transactional and batch applications). Many organizations decided that the major programmer productivity gains achievable with SQL and the relational data model more than offset the additional overhead of data access versus non-relational DBMSs and flat files, and before long "run-the-business" applications that exclusively utilized static SQL came to dominate the DB2 for z/OS scene. Those static SQL applications are still going strong, but business analytics is "coming home" to where the data is, and that brings with it a lot of dynamic queries.
The growing prevalence of client-server application development and deployment in mainframe DB2 environments. I'm talking here about the application activity that a DB2 person would refer to as "the DDF workload" or "the DRDA workload:" SQL statements issued from programs on network-attached application servers, typically by way of standard database-access interfaces such as JDBC (for Java) and CLI/ODBC (often used with C, C++, and C# programs). These interfaces are very popular with application developers because they are non-DBMS-specific (and so can be used with a variety of database management systems). JDBC or CLI/ODBC on the client side of things generally means dynamic SQL at the DB2 server.
Greater utilization of packaged applications with DB2 for z/OS. When I first started working with DB2 (in the Version 1 Release 2 time frame), virtually all application code accessing data on mainframe computers was custom-built, either by a company's in-house developers or by third-party programmers working on a contract basis. In recent years, packaged applications -- often used for enterprise resource planning (ERP) or customer relationship management (CRM) -- have become very commonplace, and organizations using these products frequently use DB2 for z/OS to manage the associated data. Vendors of packaged applications typically utilize standard, non-DBMS-specific data access interfaces, and, as mentioned in the item above, that usually involves execution of dynamic SQL statements on the DB2 server.

So, dynamic SQL is, for most mainframe DB2 DBAs and systems programmers these days, a part of the landscape. That doesn't change the fact that dynamic SQL can present DB2 people with several challenges, relative to static SQL. These include:

Access path instability. A static SQL statement's data access path is "locked in" at program bind time. For dynamic SQL, access path selection happens at statement execution time. That being the case, changes in DB2 catalog statistics or other factors in the execution environment could cause a dynamic SQL statement's access path to change from one execution of the statement to another -- not so good if you're looking for consistency of performance.
CPU overhead. Things like DB2 for z/OS dynamic statement caching can help to reduce the high cost of dynamic SQL statement preparation, but dynamic SQL will still consume more CPU cycles than equivalent static SQL statements.
Security. The DB2 authorization ID of a process issuing dynamic SQL statements has to have the DB2 privileges necessary for execution of the statements (e.g., SELECT, INSERT, UPDATE, and/or DELETE privileges on target tables). Data access security can be more tightly controlled when static SQL is used, as in that case the authorization ID of the SQL-issuing process requires only the EXECUTE privilege on the DB2 package associated with the static SQL statements -- direct table access privileges are not needed.
Problem resolution. Suppose a dynamic SQL statement is causing an application performance problem. What client-side program issued the statement? The package name associated with the statement is not likely to be of much help to you: most all the statements issued by way of (for example) the JDBC driver provided by IBM's DB2 clients (e.g., DB2 Connect or the IBM Data Server Driver) will use the same package (that being one of the packages associated with the DB2 client). Throw an object-relational framework such as Hibernate or .NET LINQ into the mix, and tracing a poorly-performing SQL statement back to the statement-issuing client-side program can be even more difficult.

If you have to deal with these headaches in your environment, you owe it to yourself (and your organization) to check out the relief provided by an IBM offering called InfoSphere Optim pureQuery Runtime for z/OS. pureQuery Runtime provides multiple features that can help you to improve the performance of, and get a better handle on, your client-server DB2 for z/OS dynamic query workload. In this blog entry I'm going to highlight my two favorite pureQuery features: dynamic-to-static SQL statement transformation, and enhanced development of data-access code for Java applications. More information on these and other pureQuery capabilities can be found via the Web page pointed to by the preceding hyperlink.

Dynamic-to-static SQL statement transformation

I'm using the word "transformation" somewhat loosely here. pureQuery doesn't change JDBC or CLI/ODBC calls in client-side source programs. Instead, via a feature called client optimization, pureQuery captures SQL statements issued by client programs and store these in a DB2-managed repository database (this does not disrupt execution of the programs -- copies of the JDBC or CLI/ODBC calls are stored in the repository, and the program-issued statements proceed to execution on the back-end DB2 for z/OS server). An administrative interface provided via an IBM Data Studio plug-in enables the binding of captured SQL statements into packages, and subsequently the statement-issuing programs can execute in "static" mode, meaning that when pureQuery recognizes a statement that was previously captured and bound, the static form of the statement in the server-side DB2 package will be invoked -- this instead of going down the dynamic SQL execution path.

Statement capture and static-for-dynamic statement substitution is performed by a piece of pureQuery code that runs in an application server where client-side programs execute (the other pureQuery code component is the aforementioned Data Studio plug-in). An application server administrator can turn pureQuery statement capture mode on for client-side programs (usually done at the data source level -- multiple data sources can map to one DB2 system), and can set execution mode to static for programs whose SQL statements have been captured and bound into a package or packages.

Another useful capability delivered by pureQuery client optimization is replacement of literal values coded in client-side SQL statements with parameter markers at run time. Using parameter markers instead of literal values can significantly boost the "hit ratio" for DB2's dynamic statement cache, thereby reducing the CPU overhead of dynamic SQL. Perhaps developers of client-side programs at your site don't follow this practice, or it may be that you have to deal with literal-containing dynamic SQL statements produced by a query tool or issued by a purchased application package. Whatever the reason for literals in client program SQL statements in your environment, pureQuery-enabled literal substitution can boost the CPU efficiency of your DB2 for z/OS DDF workload (and keep in mind that even though server-side literal substitution -- sometimes referred to as statement concentration -- is an option provided by DB2 10 for z/OS, host performance is likely to be optimized when literal replacement is accomplished on the client side of an application).

pureQuery's client optimization feature, which requires NOTHING in the way of client-side program changes (and can be used with both custom-developed and packaged applications, and with applications that use frameworks such as Hibernate), delivers the benefits you expect from static SQL: stable access paths, improved CPU efficiency, more-robust security (programs executing in static mode can be executed using an ID that has only the EXECUTE privilege on the associated package), and accelerated SQL statement performance problem resolution. Regarding that last point, finding the client-side program that issued a problematic SQL statement is simplified when you have the name of a package that is much more specific to the program in question than are the DB2 client packages used generically by JDBC- and CLI/ODBC-using applications. That's an advantage delivered for all programs for which pureScale client optimization is used. For Java applications, pinpointing the originating point of a client-server SQL statement is even easier: pureQuery provides you with the name of the Java class and the line number at which the statement can be located.

A key ingredient in pureQuery's remedy for relief of headaches brought on by dynamic SQL is the previously-mentioned statement repository, and that repository serves as more than an input for package bind operations. It's also a great resource for SQL statement analysis in tuning, and you can boost the effective leveraging of that resource with Data Studio (pureQuery statement repository information can be imported into Data Studio). You'll find not only statement text information, but associated metadata and some statistics, too (statement execution counts and elapsed time). pureQuery's statement repository information delivers even more value when you combine it with IBM's InfoSphere Optim Query Workload Tuner for performance tuning and with the end-to-end DB2 application performance monitoring capability provided by the Extended Insight feature of IBM's Tivoli OMEGAMON XE for DB2 Performance Expert on z/OS.

One more thing about pureQuery client optimization: I noted near the beginning of this entry that the rise of business analytics activity on mainframe DB2 systems has contributed to the current prevalence of dynamic SQL in DB2 for z/OS environments. You might think that business intelligence applications couldn't get a performance boost from pureQuery because all the queries associated with such apps are ad-hoc. In fact, it is often the case that many BI queries are NOT ad-hoc. So-called "operational" analytics can be characterized by a large volume of structured queries that are transactional in nature, mixed in with the more complex, customized queries submitted by "power" users. pureQuery might indeed be a CPU efficiency booster for transactional BI -- give this some thought in the context of your environment.

Enhanced development of data-access code for Java applications

I pointed out previously that pureQuery's client optimization capability requires nothing in the way of changes to existing client-side programs. For new applications written in Java, pureQuery can enhance programmer productivity via an API that can be exploited to not only generate code for data access (in DB2, Informix, and Oracle databases), but to generate performance-optimized code, to boot. For example: with JDBC, there's no concept of what in DB2 SQL we call a singleton SELECT (i.e., a SELECT INTO statement) -- you have to go with open cursor/fetch/close cursor, even if you know that a query's result set can only contain a single row (as would be the case if rows were qualified using an "equals" predicate referencing a unique key). In such situations, utilizing the pureQuery API will result in the generation of a static singleton SELECT statement, potentially saving significant resources on the DB2 server side of the application, especially for a frequently-executed transaction. The pureQuery API can also optimize SQL statement performance with multi-row INSERT batching and through a feature referred to as "heterogeneous batching" -- a mechanism by which multiple database update operations (which could target several different tables) are combined in one "batch," thereby reducing network operations and improving transaction elapsed time -- sometimes dramatically.

If you want to get the most out of the productivity and performance benefits associated with coding to the pureQuery API, it's strongly recommended that you use Data Studio and its plug-in interface to pureQuery. This plug-in enables Data Studio to leverage built-in Eclipse capabilities on which pureQuery's Java code generation capabilities rely. I've mentioned Data Studio several times in this entry, and with good reason: the combination of pureQuery and Data Studio (the latter is free and downloadable from the Web) can provide your organization with a Java-DB2 application development environment that will boost programmer productivity, speed query analysis and tuning, and foster increased collaboration between Java developers and DB2 DBAs. I can't overstate how important Data Studio is becoming in getting the most out of DB2 for z/OS and IBM DB2 tools. I'll have more to say on this point in future entries I'll post to this blog.

A closing thought: if you haven't looked at pureQuery lately...

Have you looked at pureQuery before? Was that before October of 2011? If so, you need to look again. The 3.1 release delivered in October of 2011 was really big, providing multiple enhancements such as:

Improved performance for pureQuery client optimization
Simplified administration for enabling and managing pureQuery client optimization
pureQuery API productivity enhancements (including automatic mapping of JOIN queries to structured Java objects and a pluggable formatter for handling data conversions such as date/time values)
pureQuery API deployment enhancements (including more control over package names used for Java Data Access Objects)

It could be that the capabilities of today's Optim pureQuery Runtime intersect nicely with your organization's needs and concerns around dynamic SQL in your DB2 for z/OS client-server application environment. Don't just shrug your shoulders in the face of a growing DB2 for z/OS dynamic SQL workload. Manage it. Optimize it. Analyze and tune it. Productively develop it. With pureQuery.

Monday, March 26, 2012

How Long Has it Been Since You Last Checked Out the Performance of IBM's DB2 for z/OS Utilities?

Here is an interesting situation that exists at a number of DB2 for z/OS sites: an organization is licensed for the IBM DB2 Utilities Suite for z/OS, but opts to use utilities from another vendor for various DB2 database administration tasks (common examples are data load, table space or index reorganization, and generation of catalog statistics used for query access path optimization). Why do companies pay twice (once to IBM and once to another software vendor) for some of their DB2 utilities? An oft-cited reason is performance, as in, "We run vendor XYZ's DB2 data load utility because it is more CPU-efficient than IBM's LOAD utility." If this is the case at your shop, I have a question for you: when was the last time that you actually measured the performance of the IBM DB2 utilities in your DB2 for z/OS environment? If it was before DB2 9, you ought to revisit this issue -- doing so could provide your employer with an opportunity for significant savings in the area of mainframe software expenditure.

DB2 9 delivered major advancements with respect to the CPU efficiency of the IBM DB2 utilities. Tests of the performance of the DB2 9 utilities versus their DB2 V8 counterparts showed the following:

For LOAD: CPU savings of 5-30%
For LOAD of a partition of a partitioned table space: CPU savings of 35%
For LOAD REPLACE of a partition of a table space with non-partitioned indexes, using a dummy input data set (a technique regularly used to quickly "empty out" a partition): CPU savings of up to 70%
For RUNSTATS INDEX: CPU savings of 30-50%
For REORG INDEX: CPU savings of 40-50%
For REORG TABLESPACE and REBUILD INDEX: CPU savings of 5-20%
For CHECK INDEX: CPU savings of 20-60%

Similar gains were seen for utility job elapsed times. Detailed information on a number of performance tests can be found in chapter 6 of the IBM "red book" titled, "DB2 9 for z/OS Performance Topics."

How were these CPU efficiency and run time improvements, available in DB2 9 conversion mode, achieved? There are three primary factors:

A block-level interface to the DB2 index manager. The most significant CPU and elapsed time reductions are associated with utilities that process index keys. Whereas previously a call to the index manager was necessary for each key involved in a utility execution, with DB2 9 (and 10) blocks of keys can be passed with an index manager call. The resulting reduction in required index manager calls reduced CPU overhead for index-intensive utilities.
More efficient generation of index keys. This comes into play especially for non-padded indexes with varying-length keys.
Exploitation of shared private storage. z/OS 1.7 provided a new type of virtual storage resource known as shared private storage. With the availability of this resource (configured via the HVSHARE parameter of the IEASYSxx member of PARMLIB), address spaces can register to use a Virtual Storage Object (VSO) that is created just for those address spaces (the VSO is thus part of those address spaces' virtual storage, but not part of other address spaces' virtual storage -- this in contrast to ECSA, a shared resource that is part of every address space). A DB2 9 (or 10) utility address space can use a VSO (which is located above the 2 GB "bar" in virtual storage) to exchange data rows with the DB2 database services address space (aka DBM1) in a way that does not require the use of z/OS cross-memory services. That, in turn, reduces the CPU cost of utility execution.

While the big splash in terms of IBM DB2 utility performance was made by DB2 9, DB2 10 added some utility performance enhancements of its own, to wit (and you can get the details in chapter 9 of the IBM red book, "DB2 10 for z/OS Performance Topics"):

COPY utility exploitation of FlashCopy functionality. DB2 10's COPY utility can utilize FlashCopy to make near-instantaneous (from a data availability perspective) backups of individual database objects (this capability is also available for inline image copies generated through the execution of other utilities, such as LOAD and REORG). The object-level, DB2-directed FlashCopy backup functionality provided with DB2 10 builds on the volume-level FlashCopy exploitation delivered via the BACKUP SYSTEM utility introduced with DB2 V8 (and significantly enhanced with DB2 9). Object-level backups created using FlashCopy can reduce host CPU consumption, particularly when used for larger objects.
zIIP exploitation for RUNSTATS. A major portion of the processing done by a RUNSTATS execution in a DB2 10 system can be offloaded to zIIP engines (the percentage of zIIP offload will vary based on the nature of the work done by RUNSTATS). As just about everyone knows by now, zIIP MIPS are less expensive than general-purpose-engine MIPS.
Page-level versus row-level RUNSTATS sampling. People commonly specify, on a RUNSTATS utility control statement, a sampling percentage -- this to reduce the CPU consumption of RUNSTATS execution. Basically, this told RUNSTATS to look at a percentage of rows in the table -- versus all rows -- in generating statistics for the catalog. The thing is, in looking at a relatively low percentage of a table space's rows (25% is the default SAMPLE value), RUNSTATS might have to read a large percentage of a table space's pages. The new sampling capability, specified via the TABLESAMPLE SYSTEM option, can result in RUNSTATS examining significantly fewer of a table space's pages during execution, thereby reducing CPU time for an execution of the utility.
Online REORG use of list prefetch for disorganized indexes. When a DB2 10 online REORG job reorganizes an index that is disorganized (and this is particularly applicable to a REORG INDEX job, or to a partition-level REORG TABLESPACE job when the associated table has non-partitioned indexes), it can use list prefetch to efficiently access the index leaf pages in logical order when unloading the index to the shadow data set. This technique avoids the large number of synchronous index pages reads that might otherwise be required for the index unload, with associated CPU and elapsed time benefits.
Exploitation of data buffering enhancements for BSAM files. By utilizing more buffers for BSAM data and page-fixing these buffers (exploiting enhancements delivered in z/OS 1.9 and 1.10), DB2 10 utilities using BSAM can run faster (COPY and UNLOAD) and use less CPU time (COPY, UNLOAD, LOAD, and RECOVER) than in previous-release DB2 systems.
Variable-blocked spanned record support for LOAD and UNLOAD. I blogged about this a few weeks ago. Big-time performance improvement for LOAD and UNLOAD of tables containing LOB (large object) data.
LOAD and UNLOAD of data in internal format. By eliminating column-level processing, this option (retrofit to DB2 9 via the fix for APAR PM19584) can substantially reduce CPU and elapsed time for some LOAD and UNLOAD jobs.
PRESORTED option for LOAD. This new option (also retrofit to DB2 9 via the aforementioned APAR PM19584) eliminates LOAD-related sort of input data when that data has already been sorted in clustering key sequence prior to utility execution. Bypassing that sort operation can result in reduced CPU and elapsed time for LOAD utility execution.

On top of all this, the IBM DB2 utilities, which by default use DFSORT for sort processing, benefit from performance enhancements delivered by the DFSORT development team. An example is the performance boost -- especially as it pertains to CPU consumption -- provided by the fix for DFSORT APAR PM18196.

So, having laid out all these recent IBM DB2 utility performance benefits, I have two questions for you:

How fast, and how CPU-efficient, do your DB2 utility operations need to be, and can IBM's DB2 utilities meet those requirements? As noted in the opening paragraph of this blog entry, if you haven't put IBM's DB2 utilities to the test in a DB2 9 or DB2 10 environment, and if performance has been the reason for your organization's use of other vendors' DB2 utility products, it's time to give the IBM utilities another look. You could end up helping to reduce your company's mainframe software costs (especially if your organization is already licensed for IBM's DB2 Utilities Suite for z/OS, which is often the case).
If IBM's DB2 utilities, even in a DB2 9 or DB2 10 environment, don't quite clear your performance bar, would they if you also had the IBM DB2 Sort product? I blogged about DB2 Sort a couple of weeks ago. DB2 Sort can take the performance of the IBM DB2 utilities to another level, and the added cost of DB2 Sort could still mean software cost savings for your organization in light of what your company may be paying for another vendor's DB2 utility products.

Let IBM's DB2 utilities perform for your company. Could be a good move on your part.

Sunday, March 4, 2012

Faster, More Efficient, More Resilient DB2 Utility Sorting with DB2 Sort

If you're a mainframe DB2 person, there's a good chance that you've heard of an IBM software offering called DB2 Sort. If your knowledge of DB2 Sort doesn't extend much beyond "I've heard of it," you may have this vague idea that the product speeds up IBM DB2 for z/OS utility sort processing. You'd be on target there, but utility speed-up is only part of the DB2 Sort story. On top of that, your thinking as to how DB2 Sort accelerates DB2 utility execution may be off the mark. With this blog entry I want to clear up a few misconceptions about DB2 Sort that I think exist to some extent within the DB2 for z/OS user community. Additionally, I'll review recent developments with respect to DB2 Sort in particular, and IBM DB2 utility performance in general.

On to clarifications.

Misconception #1: DB2 Sort reduces utility elapsed and CPU times by optimizing DB2's use of DFSORT. You probably know that, starting with Version 8, DB2 for z/OS exclusively uses IBM's DFSORT as the base for accomplishing utility-related sorts. Over time, DB2's use of DFSORT has been enhanced, but DB2 Sort is not part of that story. Instead, it is an alternative sorting tool for DB2 -- something that DB2 can call instead of DFSORT to execute sorts as needed for utility processing (the DB2 utilities that leverage DB2 Sort are the most sort-intensive in the DB2 Utilities Suite: REORG, LOAD, REBUILD INDEX, RUNSTATS, and CHECK DATA/INDEX/LOB). While DFSORT is an outstanding general-purpose sort product, DB2 Sort is truly specialized software -- highly tuned to the unique requirements of DB2 utility sort operations. When an organization's demands with regard to DB2 utility performance present a challenging target, DB2 Sort can provide the means of clearing the bar.

A little backstory: not too long ago, IBM decided to team with a long-established leader in sort technology to develop a product that would conform especially closely to the particular characteristics of sort processing in a DB2 utility context. Developers from this partner company worked with IBM DB2 utility developers to turn concept into reality, and DB2 Sort is the result of that collaboration -- a collaboration, by the way, that is ongoing.

Misconception #2: The efficiency impact of DB2 Sort is strictly a CPU thing. DB2 Sort does indeed have a positive impact on DB2 utility CPU consumption, but hardware resource utilization optimization goes beyond that effect. Take disk space, for example: having enough of this for sort work is important if a DB2 utility is to perform well (and indeed, if it is to run to completion), but as disk vendors don't give their wares away you'd prefer not to have to over-allocate space to get the results you want. DB2 Sort has an advanced dynamic allocation feature that delivers a "space on demand" capability, providing a right-sized disk resource to get a utility going, and compensating -- if needs be -- for extra requirements by allocating additional space as needed to keep utility processing on track.

Then there's system memory. When DB2 Sort is performing work for a utility, it is aware not only of what it's doing for that particular utility, but of what other concurrently executing utilities are doing -- and are planning on doing -- in terms of sort processing. DB2 Sort will balance sort-related utilization of system memory across batch address spaces to help avoid (for example) excessive paging activity that could otherwise negatively impact the performance of DB2 utilities (and application programs).

And while I'm on the topic of hardware utilization optimization, I'll remind you that boosting CPU efficiency is not just a matter of reducing overall CPU utilization (though DB2 Sort does that) -- it's also about shifting work from general-purpose mainframe processors to less-expensive zIIP engines. Use DB2 Sort with your DB2 utilities, and you'll see a greater degree of work-offload to zIIPs (assuming that your system has zIIP engines).

Misconception #3: DB2 Sort is just about DB2 utility performance. You want your DB2 utilities to finish faster, but you also want them to finish, period. DB2 Sort can boost the resiliency of DB2 utilities, allowing more utility jobs to run to completion. One way in which DB2 Sort delivers here is through retries of dynamic disk space allocation actions: if an allocation request fails, DB2 sort can wait a while (a couple of minutes, for example) and then retry the allocation action. If the initial allocation request failed due to a temporary tie-up of resources, the retried allocation could succeed. When more DB2 utility jobs complete successfully, the organization benefits, and DBAs on "pager duty" get more restful nights.

Recent developments on the DB2 Sort front: Over time, the list of DB2 tools that can utilize DB2 Sort for improved performance has expanded beyond the IBM DB2 Utilities Suite. Among the latest additions are DB2 High Performance Unload 4.1 (with the fix for APAR PM41087), the DB2 Utilities Enhancements Tool 2.2 (specifically, the LOAD presort feature), and DB2 Log Analysis Tool 3.3. What tools will be next in gaining DB2 Sort exploitability? Stay tuned.

DB2 Sort adds speed and CPU savings to DB2 utilities that were already getting faster and more efficient. DB2 9 for z/OS provided big-time performance improvements for a number of DB2 utilities, especially those that process index keys: CPU consumption for some RUNSTATS INDEX, REORG INDEX, and CHECK INDEX jobs could be 50% less in a DB2 9 versus a DB2 Version 8 environment. Other utilities such as LOAD, RECOVER INDEX, and REBUILD INDEX can also execute with substantially better CPU efficiency in a DB2 9 system (organizations going from DB2 V8 directly to DB2 10 will see these utility performance benefits in the DB2 10 environment). Considering these performance gains along with the boost that DB2 Sort can deliver might get you to thinking about some things.

Have you looked at the IBM DB2 utilities lately? You ought to. While you're at it, check out DB2 Sort. Could be a nice fit for your organization.

Wednesday, February 15, 2012

Dynamic Management of a DB2 for z/OS Client-Server Workload with IBM InfoSphere Optim Configuration Manager

I’ve been blogging about DB2 for z/OS for quite some time, beginning with my time as an independent DB2 consultant (http://catterallconsulting.blogspot.com/), and continuing with a blog I started when I rejoined IBM in 2010 (“Robert’s DB2 blog”). I am going to continue to post to “Robert’s DB2 blog,” but I’ve seen a need to let DB2 for z/OS people know about tools that can boost their productivity, extend their capabilities, and enable them to more effectively utilize mainframe DB2 technology to their employers’ advantage. I’ve decided to address that need with a new blog, and this is it. I hope that you’ll check in periodically to find information on IBM DB2 for z/OS tools that will make you better at what you do.

In this initial entry, I want to tell you about a new tool that enables dynamic and fine-grained management of a DB2 for z/OS client-server workload (i.e., a workload that involves the use of DRDA and DB2’s Distributed Data Facility): IBM InfoSphere Optim Configuration Manager V2.1 (the product’s full name is a bit of a mouthful, so I’ll refer to it hereafter as OCM). How can this be a new product, you ask, if it’s Version 2.1? Well, this is one of those happy stories in which a DB2 for z/OS-using organization came to IBM with a need, and IBM developed a tool to address the customer’s requirement. That was OCM Version 1.1. It was determined that the tool would be of value to the broader DB2 for z/OS user community, and OCM was enhanced and packaged accordingly. That’s Version 2.1, made available in the fourth quarter of last year (note that OCM can also be used with DB2 for Linux, UNIX, and Windows).

Why do I want to use this inaugural entry in my “DB2 for z/OS: Beyond the Engine” blog to shine a light on OCM? For two interrelated reasons: 1) at many mainframe DB2 sites, client-server is the fastest growing component of the overall DB2 for z/OS workload, and at more than a few sites it’s already the largest workload component -- bigger than batch-DB2, bigger than CICS-DB2 (this in terms of mainframe server CPU time associated with SQL statement execution -- refer to an entry on my DB2 blog, and check this out yourself for your environment); and 2) for plenty of DB2 for z/OS people, the level of control they have over the client-server workload is not what they want it to be. OCM changes that story.

First, consider workload management. There are still people out there in DB2 land who believe that there’s no such thing as fine-grained management of a client-server DB2 for z/OS workload -- they think that all DB2 DDF work runs at the same priority in the system. Of course, it’s been years since that was true. You can slice and dice a client-server DB2 workload based on a variety of identifiers, and assign priorities -- via service classes -- in a z/OS WLM policy accordingly (there’s a very good write-up on this in an IBM “red book” on DB2 for z/OS distributed functions). All well and good, but chances are the WLM set-up you have for your client-server DB2 applications assumes that these applications are behaving as they should. Suppose an application starts misbehaving in a way that causes it to hog mainframe DB2 server resources, to the detriment (performance-wise) of other DB2-accessing apps? Using OCM, you could create a rule that would dynamically re-map the misbehaving application to a different WLM service class to which a lower priority has been assigned. How could you do that? There are several scenarios, but one would involve changing the authorization ID associated with the rogue application, if auth ID is used to map an app to a WLM service class. How is that done? Easy: OCM allows you to have what are called managed clients (those being machines on which a piece of code called the Optim Data Tools Runtime Client has been installed), and for such a client an OCM administrator can override application-provided information such as the DB2 authorization ID (more on this override capability to come). After the application’s performance-degrading problem has been resolved, it can be dynamically switched back to it’s original WLM service class -- and these WLM-related adjustments require nothing in the way of client-side code changes.

If you’re running DB2 for z/OS in data sharing mode on a Parallel Sysplex mainframe cluster, OCM gives you even more options for minimizing the adverse impact of a misbehaving client-server application. Suppose the aforementioned reclassification (from a WLM perspective) of the problematic app is not sufficiently minimizing its negative impact on the performance of other DB2 applications? No problem: you can use OCM to dynamically isolate the resource-hogging application to a particular member (or subset of members) in the data sharing group (this OCM capability leverages the location alias functionality of DB2 for z/OS, about which I blogged a few weeks ago). After the problem has been resolved, you use OCM to re-enable the spreading of the application’s transactions across all members (or a larger subset of members) of the data sharing group -- and again, this shrinking and growing of the application’s DB2-side execution space, if you will, can be accomplished with no need for changes on the client end of things.

And the OCM application scenarios go on: Would it be handy to be able to easily redirect connection requests from a particular application server to a different DB2 for z/OS server (perhaps one that’s just been migrated to DB2 10, in order to see how the application behaves when accessing a DB2 10 system)? You can do that with OCM. Remember my mention of OCM’s ability to override application-provided identifying information (information that’s crucial in terms of mapping to WLM service classes, and important for DB2-side application monitoring)? Would it be nice to have a means of correcting such identifying information if it’s incorrectly specified on the client side? Check. Got a DB2 client-server app that’s using too many connections to a DB2 for z/OS server? Would you like to be able, from a central control point, to reduce the number of transports from that application to DB2 at the DB2 client (e.g., IBM Data Server Driver Package or DB2 Connect) data source level? Very do-able with OCM. Want something that can automatically track changes made to DB2 client and server configuration settings? With OCM, you got it.

Bottom line: OCM puts you in control of your DB2 for z/OS client-server workload in ways you weren’t before. And, we’re talking about control at a pretty fine-grained level, if you want it (e.g., by application identifier, authorization ID, client IP address, JDBC data source name, etc.) -- you aren’t limited to overly large-scope knobs and levers.

Something else to keep in mind: OCM is a pretty new tool. In this initial version, the focus is on Java applications. Expect support to be extended to other client types over time -- keep your eye on this space.

Want to know more? Check out the OCM Information Center on the Web. Want to see this thing? Watch a video of OCM being put through its paces.

As pointed out previously, client-server is becoming an ever-more substantial and mission-critical component of DB2 for z/OS workloads at sites all over the world -- your site included, most likely. Don’t fear that. Manage it -- with OCM.