Monday, February 17, 2014

Leverage DB2 Analytics Accelerator Technology More Completely with the IBM DB2 Analytics Accelerator Loader for z/OS

Yes, it's been a while since I last posted an entry to this blog - I've been up to my ears in other stuff. Now, however, I have something I really want to blog about, so I'm making the time to put pen to paper (figuratively speaking). Here's the elevator version, in case you're just going up a floor or two: if your organization has an IBM DB2 Analytics Accelerator, you need to be looking at the Analytics Accelerator Loader. If you don't have a DB2 Analytics Accelerator, consider how that technology could be made even more valuable to your company when paired with the Analytics Accelerator Loader.

OK, here's more information (if you're not on an elevator, or you're taking a long elevator ride). Lots of people know about the DB2 Analytics Accelerator: it's an appliance (server, software, and storage), based on Netezza technology, that is deeply integrated with a front-end DB2 for z/OS system and extends the capabilities of that system to enable execution of particularly complex and/or data-intensive queries with eye-popping speed. In case after case, you hear of queries executing 1000 or more times faster with a DB2 Analytics Accelerator on the floor. This consistency of outstanding performance led people to think, "Wow, the DB2 Analytics Accelerator is fantastic for queries that target tables in my DB2 for z/OS database. Wish I could achieve that kind of performance effect (and with the same attractive economics) for my non-DB2 data." With the availability of the Analytics Accelerator Loader, such thinking is no longer wishful.

The value proposition of the DB2 Analytics Accelerator Loader can be nicely summed up via two key use cases: getting non-DB2 data into an Analytics Accelerator, and non-disruptive loading of DB2 data into an Analytics Accelerator.

Getting non-DB2 data into a DB2 Analytics Accelerator. As previously mentioned, a DB2 Analytics Accelerator is front-ended by a DB2 for z/OS system (that's part of the beauty of the solution -- more on this momentarily). The front-end DB2 for z/OS system needs to know about any tables that have been "accelerated" (i.e., which can be found on the Analytics Accelerator), and that means that the "accelerated" data has to be in DB2 tables in the first place, right? Well, that used to be the case, and it caused some folks to hold off on putting data from sources other than DB2 for z/OS into an Analytics Accelerator. That thinking was understandable. Suppose, for example, that you had 100 million rows of data in table XYZ in a non-DB2 DBMS. To get that table's data into a DB2 Analytics Accelerator, you had to create the table on the DB2 for z/OS front end, load that table, and then copy that data (via unload and load) into the Analytics Accelerator. Maybe you didn't have the space in your mainframe disk subsystem for that data. Maybe you didn't want your DB2 for z/OS DBAs to have to manage an extra 100 million-row table on the front-end DB2 system. For whatever reason (or reasons), you may have ended up at, "Oh, well. I'll just leave the data in table XYZ where it is."

The Analytics Accelerator Loader can turn that "Oh, well" into "Oh, yeah!" How? By enabling this scenario: unload data from that hypothetical non-DB2 table XYZ, and load it directly into the DB2 Analytics Accelerator. The table ends up existing on the front-end DB2 for z/OS system only in a logical sense -- the table definition is there in the front-end system's DB2 catalog, but ALL of the table's data resides ONLY on the Analytics Accelerator. Is this a hard thing to do? NO. All that's required in terms of preparation for Analytics Accelerator loading is to get the data from the non-DB2 for z/OS source object unloaded into a flat file that's compatible with the IBM DB2 for z/OS LOAD utility (i.e., a flat file and the accompanying field specification information that indicates where fields start and end within a record). And THAT means that you can load non-DB2 data from all kinds of sources directly into a DB2 Analytics Accelerator: from VSAM, from IMS, from mainframe database management systems other than DB2, and from a wide variety of non-mainframe DBMSs.
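
To make that concrete, here is a minimal sketch of the kind of LOAD-compatible field specification I'm talking about (the table and field names are hypothetical, and the Analytics Accelerator Loader's own control keywords for directing the load to the Accelerator are not shown). The point is simply that the input is an ordinary flat file described by ordinary LOAD field specifications:

   -- Field specifications indicate where each field starts and ends
   -- within a record of the unloaded flat file
   LOAD DATA INDDN SYSREC
     INTO TABLE MYSCHEMA.XYZ
     (ACCT_ID    POSITION(1:10)   CHAR(10),
      TX_DATE    POSITION(11:20)  DATE EXTERNAL,
      TX_AMOUNT  POSITION(21:32)  DECIMAL EXTERNAL(12,2))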

So, why put non-DB2 data into a DB2 Analytics Accelerator? To get tremendous speed-up for queries accessing that data, sure, but also because the Accelerator is front-ended by a DB2 for z/OS system. Queries are directed to the front-end DB2 for z/OS system as they would be if the Accelerator weren't present, so nothing new is needed there (the front-end DB2 system takes care of routing queries to the Analytics Accelerator). AND you get the advantage of the best-of-breed security offered by DB2 for z/OS (the only way to access data in the Accelerator is through the DB2 front-end, and data security is managed at that level). AND you get the superior performance monitoring available with DB2 for z/OS (IBM's OMEGAMON for DB2 for z/OS provides DB2 Analytics Accelerator monitoring capabilities). AND you get to leverage the skills and abilities of the DB2 for z/OS system support team that you already have in place, without putting a lot of extra work on those folks (the Analytics Accelerator administrative interface is an add-on to Data Studio, with which many DB2 for z/OS people are familiar).
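
As a small illustration (not Loader-specific, and with a hypothetical table and query): routing is governed on the DB2 for z/OS front end, for example through the CURRENT QUERY ACCELERATION special register, so an application simply issues its SQL against the front-end subsystem and DB2 decides whether the Accelerator runs it:

   -- Make queries in this scope eligible for acceleration
   SET CURRENT QUERY ACCELERATION = ENABLE;

   -- An ordinary query against the front-end DB2 subsystem; if the table is
   -- accelerated, DB2 routes the query to the Analytics Accelerator
   SELECT REGION, SUM(SALE_AMT) AS TOTAL_SALES
     FROM MYSCHEMA.SALES_HIST
    GROUP BY REGION;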

Non-disruptive loading of DB2 for z/OS data into a DB2 Analytics Accelerator. As appealing as that non-DB2-data angle is, it's not the whole of the Analytics Accelerator Loader story. For data that is already in a front-end DB2 for z/OS system, the Analytics Accelerator Loader enables non-disruptive loading of that data into a DB2 Analytics Accelerator. How so? Well, consider that the traditional DB2 Analytics Accelerator load operation involves unload from a source table on the DB2 for z/OS front end. If one wants the data unloaded from the front-end table to be time-consistent (i.e., not "fuzzy"), one needs to suspend updating of the front-end DB2 data while it's being unloaded. Enter the Analytics Accelerator Loader, and you have a great new option for loading data into a DB2 Analytics Accelerator: you can use an image copy of a DB2 for z/OS table space as input to a DB2 Analytics Accelerator load operation. That means zero impact on front-end objects. Not only that, but with the Analytics Accelerator Loader you can use an image copy together with the DB2 log to effect a point-in-time load of a table in a DB2 Analytics Accelerator, and that can be whatever point-in-time you want: after initial load from the input image copy data set, information from the DB2 log is used to roll the data in the table in the DB2 Analytics Accelerator forward to the desired point in time. And hey, there's more: you can use the Analytics Accelerator Loader to load data into a table in the DB2 Analytics Accelerator "as of current." Choose that option, and the Analytics Accelerator Loader will drive a new SHRLEVEL CHANGE (thus non-disruptive) FlashCopy (thus near-instantaneous) image copy that will then serve as input to the load of the associated table in the DB2 Analytics Accelerator.
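
For reference, the kind of non-disruptive FlashCopy image copy the Loader drives for an "as of current" load looks, in plain DB2 10 COPY terms, something like the sketch below (hypothetical object name; the Loader generates and manages this copy for you):

   -- SHRLEVEL CHANGE: updating continues; FLASHCOPY YES: near-instantaneous copy
   COPY TABLESPACE PRODDB.SALESTS
        SHRLEVEL CHANGE
        FLASHCOPY YES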

The IBM DB2 Analytics Accelerator has proven itself -- across industries and around the world -- as a game-changing solution that greatly enhances the value (from a decision support perspective) of an organization's data assets. The Analytics Accelerator Loader, in turn, enhances the value of a DB2 Analytics Accelerator by 1) expanding its range of application beyond DB2 for z/OS-resident data and 2) increasing flexibility and decreasing impact when it comes to loading into an Analytics Accelerator data that already is resident in a DB2 for z/OS system. This is one of those situations in which one plus one equals a lot more than two. The Analytics Accelerator Loader could be a real winner for your company. Give it some thought.

Monday, January 14, 2013

Reproduce a Dynamic, Client-Server Production DB2 for z/OS Workload in Test with Optim Query Capture and Replay

Effective regression testing, and comparative application testing in general, depends largely on your ability to reproduce in a test environment a workload that closely mimics the production application workload of concern. Historically, the difficulty of reproducing a production workload in test has depended on the nature of that workload. Back when I started in IT (Ronald Reagan was in his first term as President of the USA), the focus of mainframe application performance testing efforts was often on batch workloads, and reproducing that type of production workload in test was not so tough: you had your input files, you had your batch jobs that were submitted in a certain sequence and with a certain timing through a job scheduling tool, you ran the jobs and measured the results against baseline data, and that was that.

Batch is still an important part of mainframe computing, but over time the emphasis at many DB2 for z/OS sites has shifted to transactional workloads; moreover, the nature of transactional workloads -- especially during the past 10 years or so -- has changed, with multi-tiered, network-attached, DRDA-using, client-server applications coming to the fore. On top of that, these modern transactional applications are much more dynamic than their forerunners. They are not driven by in-house staff performing clerical and data-entry functions via a "green screen" interface; instead, in many cases the end-users are external to the organization -- maybe consumers browsing online product catalogs and making purchase decisions, or perhaps employees of client companies in a business-to-business context, checking on orders or reviewing fulfillment history. If the end-users are internal to the organization, increasingly they are not clerical workers; rather, they are senior professionals, managers, and executives using analytics-oriented applications to improve speed and quality-of-outcome with respect to decision-making. The actions of these individuals, and the frequency and timing of their interactions with your DB2 for z/OS subsystem, are often hard to predict. For testing purposes, how do you get your arms around that?

And, getting your arms around that particular kind of application testing scenario is getting to be more and more important. If an environmental change (e.g., new system software releases) or an application modification is going to negatively impact performance from the end-user perspective, you REALLY want to catch that before the change goes into production. Elongated response time for in-house clerical staff is one thing, but poor performance affecting an external-to-the-organization end user can lead to lost business and, perhaps, long-term loss of customers (as the now-familiar adage goes, your competition is often just a click away). If performance degrades for an internal-use decision support application, likely as not it won't be a DBA getting calls from irate users -- it'll be directors and VPs and maybe your CIO getting calls from their peers on the business side of the organization.

In short, the challenge is tougher than it's been before, and the stakes are higher than they've been before. Gulp.

Fortunately, a recently announced and available IBM tool addresses this need very nicely. It's called IBM InfoSphere Optim Query Capture and Replay for DB2 on z/OS, and it came out just a couple of months ago. The fundamentals of what Optim Query Capture and Replay can do are spelled out in the product's name: it can capture a DDF application workload executing in your production DB2 for z/OS environment and enable the playing back of that captured workload in a test environment: the same SQL statements, with the same values, executed with the same volume and timing through the same number of connections to the DB2 subsystem.

This capture and replay capability by itself would come in very handy for things like regression testing, but that's not where the story ends. Suppose you want to see what would happen to response times if transaction volume were to increase by some amount. No problem: not only can Optim Query Capture and Replay play back a captured workload -- it can play it back at a higher speed; so, instead of, say, the 100 transactions per second seen in production for a client-server application workload, you could see how response times hold up at 150 transactions per second.

Speaking of response time, Optim Query Capture and Replay provides built-in reporting capabilities that help you to easily zero in on changes between baseline and replay test results.

What's more, Optim Query Capture and Replay can be used to invoke the IBM DB2 Cloning Tool to make a copy of a DB2 subsystem for testing purposes.

Oh, and I would be remiss if I failed to tell you that Optim Query Capture and Replay is not just about comparative workload testing. It's also a great tool for helping you to better understand a DB2 client-server application workload. Often, when it comes to these very dynamic, shape-shifting transactional applications, people want to get a better look at the trees in the forest. What SQL statements are being executed? What predicate values are being supplied by users? What columns are being retrieved? We are familiar with the idea of taking a "snapshot" of a database, but taking a snapshot (more accurately, a time slice) of a DDF workload seemed implausible -- until now. And why stop with just a better understanding of a client-server application workload? How about tuning it? The SQL statements in a workload captured by Optim Query Capture and Replay can be exported for analysis and tuning -- something you might do with a tool such as IBM's InfoSphere Optim Query Workload Tuner.

Now, all this good stuff would be less appealing if it came with too great a cost in terms of system overhead, so it's nice to know that Optim Query Capture and Replay has a pretty small footprint. This is true largely because the tool employs a "catch and throw" mechanism (more like "copy" than "catch") to send statements associated with a workload being captured to an external appliance, from which the workload can be replayed; thus, there is not a reliance on relatively expensive performance trace classes to get the statement-level data recorded by Optim Query Capture and Replay.

There you have it: a way to efficiently capture what may have appeared to you as an elusive workload, and to effectively use that captured workload for regression testing, "what if?" testing, and application SQL analysis. Check out Optim Query Capture and Replay, and get ready to go from, "My gut tells me..." to, "Here are the numbers." 

Monday, January 7, 2013

Get Hands-On with DB2 Automation Tool in Texas

First, my apologies for having let so much time go by since last posting to this blog. The fourth quarter of 2012 was a very busy time for me. The pace is a little less frenetic now, and I should be able to resume blogging here on a fairly regular basis (I hope to post an entry here within the next couple of days on a new and very cool tool that can be a big help in the area of DB2 for z/OS application testing).

Second, for those of you in the GREAT state of Texas (where I was born and raised), and particularly for people in the heart of The Lone Star State, I want to make you aware of an opportunity to get some hands-on time with the DB2 Automation Tool for z/OS (about which I blogged last year). IBM has partnered with the Heart of Texas DB2 Users Group (aka HOTDUG -- one of my favorite regional DB2 user group acronyms) to provide a free half-day of DB2 Automation Tool education and training. Show up, and you'll not only get the expected overview presentation -- you'll also get a demonstration on how to use the tool to set up profiles that can drive automated and intelligent DB2 for z/OS utility execution. AND you'll get to participate in a hands-on lab that will give you the opportunity to put the DB2 Automation Tool through its paces. AND you'll get breakfast and lunch. Sounds like a half-day well spent, to me.

You can RSVP for this event (appreciated, for food planning purposes, but not required for attendance) by sending an e-mail to my colleague Bill Houston (houstonb@us.ibm.com).

More information:

Date: Tuesday, January 22, 2013

Time: 9:00 AM to 12:00 PM

Location:
     IBM Executive Briefing Center
     Rio Grande Room (Building 904)
     11501 Burnet Road
     Austin, Texas  78758

Check it out. Get some food. Get some knowledge. Get your hands on a keyboard. 

Thursday, October 4, 2012

DB2 Utilities Enhancement Tool for z/OS: Easier, More Effective Management of Your DB2 Utility Jobs

John, a colleague of mine, was recently at a DB2 for z/OS site when a problem arose. Someone accidentally ran a LOAD utility with the REPLACE option on a table space that held a considerable amount of data. LOAD did what it is supposed to do when REPLACE is specified in the utility control statement: it cleared all the existing data from the table space and replaced those rows with records from the input data set. Unfortunately, the rows replaced -- and there were a lot of them -- were rows that the company wanted to keep in the table space. As the DB2 DBA team worked to recover the inadvertently discarded data (they were ultimately successful in doing so), John mentioned to the team leader that the problem could have been avoided via the IBM DB2 Utilities Enhancement Tool for z/OS (sometimes called UET). Needless to say, UET is now on that DB2 DBA team leader's "I want this" list.

John was referring in the aforementioned conversation to the syntax monitor function of Utilities Enhancement Tool -- one of several UET features that can enable you to more easily and effectively manage your IBM DB2 utility jobs. The syntax monitor can be used to examine control statements for utility jobs when they are submitted, allowing for prevention of job execution when options on the "no" list are specified (REPLACE, for example, in a LOAD utility statement) as well as addition of desired keywords to utility control statements when those are missing (one organization used UET to ensure that histogram statistics generation was requested for certain RUNSTATS jobs). Now, with respect to the LOAD REPLACE example I've used, you might be thinking that blanket interdiction of REPLACE for LOAD jobs would be overly burdensome in your environment -- perhaps there are times when you WANT to empty existing rows from a table space before adding new rows. Not a problem: Utilities Enhancement Tool allows you to apply a given utility syntax rule very broadly (e.g., at the DB2 subsystem level) or more narrowly (perhaps for a set of user IDs or job names -- you could, for example, set things up so that a utility keyword to be used on an exception basis could only be included in a job submitted by one of your senior DBAs). These rules and their application, by the way, are specified by way of an easy-to-create (and easy-to-modify) UET policy.
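
To illustrate (with hypothetical object names, and without showing UET's own policy-definition syntax), these are the kinds of utility control statements a policy might act on -- blocking the first at submission time if REPLACE is on the "no" list, and ensuring that histogram statistics are requested in the second:

   -- A statement a "no REPLACE" rule could block before it empties the table space
   LOAD DATA INDDN SYSREC LOG NO
     REPLACE
     INTO TABLE PRODSCHM.ORDERS

   -- A RUNSTATS statement as it might look after a rule adds histogram statistics
   RUNSTATS TABLESPACE PRODDB.ORDERTS
     TABLE(PRODSCHM.ORDERS)
     COLGROUP(ORDER_DATE) HISTOGRAM NUMQUANTILES 10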

In addition to being able to monitor the syntax of DB2 utility control statements, Utilities Enhancement Tool can monitor messages associated with utility job execution. With this capability, you can have UET issue a customized return code when a given message is encountered. That customized return code could be used to automatically trigger a responsive action through your site's automated operations system. Alternatively, operations personnel could be provided with response instructions specifically tied to utility return codes customized through Utilities Enhancement Tool (such an instruction might let operations folks know that a particular situation can be left alone until a DBA deals with it during daytime hours -- DBAs are a lot happier when they are not needlessly awakened in the middle of the night).

UET also delivers some extensions to the standard syntax of IBM DB2 utility control statements. For example, UET allows inclusion of the keyword PRESORT in a LOAD utility statement (an option that would not pass syntax muster in the absence of Utilities Enhancement Tool). When PRESORT is specified for a LOAD job, UET will invoke DFSORT (or, even better, DB2 Sort, if that product is installed on the system) to sort the input file in the target table's clustering key sequence. UET will then change the PRESORT keyword to the DB2 LOAD-recognized PRESORTED option, indicating to the utility that the input records are indeed sorted in clustering key sequence. LOAD will consequently execute its RELOAD and BUILD phases in parallel, and will avoid sorting the table's clustering index. The end result is a LOAD utility job that completes more quickly and consumes less CPU time than would be the case if the input file were not sorted in clustering key sequence.
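
Here's a sketch of what that looks like in practice (hypothetical objects; exact keyword placement is illustrative). The statement below would fail syntax checking under plain LOAD, but with UET in the picture the PRESORT keyword triggers the input sort and is rewritten as PRESORTED before the utility sees it:

   LOAD DATA INDDN SYSREC LOG NO
     RESUME YES
     -- PRESORT is the UET extension: sort the input in clustering key sequence
     PRESORT
     INTO TABLE PRODSCHM.ORDER_HIST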

There's more: when you execute the REORG utility with SHRLEVEL CHANGE to enable concurrent read/write access to the target table space's data, Utilities Enhancement Tool can automatically create the required mapping table for you. UET also provides a capability whereby you can have the tool cancel existing threads and block new threads that could prevent the successful completion of a utility operation.
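
For context, a SHRLEVEL CHANGE REORG needs a mapping table named on the MAPPINGTABLE keyword, as in the sketch below (hypothetical names); creating that table ahead of time is the chore Utilities Enhancement Tool can take off your hands:

   REORG TABLESPACE PRODDB.ORDERTS
         SHRLEVEL CHANGE
         MAPPINGTABLE DBA01.MAP_ORDERTS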

UET provides you with a lot of useful information through its logging of interactions with DB2 utility jobs. Examining Utilities Enhancement Tool job log data will show you which policy rules were applied and which were not, and why. That knowledge, in turn, can help you to fine-tune your UET policy for a DB2 subsystem.

Of course, Utilities Enhancement Tool supports DB2 10 for z/OS. Like all of IBM's DB2 for z/OS tools, UET was ready for Version 10 when that release of DB2 became generally available a couple of years ago. Consider how UET could help you to introduce new utility keywords (maybe new in DB2 utilities, or just new for your organization) in your DB2 for z/OS environment in a low-risk, systematic, and managed way. Utilities Enhancement Tool can provide for you a new point of control for DB2 utility execution at your site, leading to greater efficiency and productivity, more timely exploitation of new utility control options, and fewer "oops" moments. Give UET a look, and think about how it could deliver for you.

Thursday, September 6, 2012

Intelligent Utility Scheduling with DB2 Automation Tool for z/OS

I have a few questions for you:
  • In your shop, are DB2 for z/OS database maintenance utilities such as RUNSTATS, COPY, and REORG executed on a periodic basis (e.g., once per week per table space) regardless of whether or not they actually need to be run for a given object?
  • Is it sometimes the case that DB2 utilities are NOT executed in your environment when they should be?
  • Do you spend more time than you'd like setting up and submitting DB2 utility jobs?
  • Are there higher-value activities to which you could attend if you weren't spending a lot of time looking after DB2 utility execution requirements?

If your answer to any of these questions is "yes," you ought to take a look at what the IBM DB2 Automation Tool for z/OS could do for you and your organization.

Here's the deal: mainframe DB2 DBAs are being asked to look after ever-larger databases (more and more objects, not just more data) and to keep these DB2 for z/OS systems in fighting trim with regard to CPU efficiency and in the expected lead spot when it comes to data availability. If these requirements are to be effectively addressed, certain DB2 utilities have to be run on a regular basis. Knowing this, DB2 for z/OS DBAs at plenty of sites have adopted a strategy of submitting RUNSTATS, COPY, and REORG jobs for table spaces based on what I'd call a calendar criterion. In other words, RUNSTATS, for example, might be executed once per month per table space. COPY might be executed once per week per table space for full backups, and once daily per table space for incremental backups between execution of full-backup jobs.
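
In control-statement terms, that calendar-driven approach amounts to submitting jobs with statements like the following on a fixed schedule, whether or not the data has changed (hypothetical object name; a SYSCOPY DD statement for the output image copy data set is assumed in each job's JCL):

   -- Weekly full image copy
   COPY TABLESPACE PRODDB.CUSTTS FULL YES SHRLEVEL REFERENCE

   -- Daily incremental image copy (copies only changed pages)
   COPY TABLESPACE PRODDB.CUSTTS FULL NO SHRLEVEL REFERENCE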

That mode of operation has the benefits of being simple and (generally) keeping things from "falling through the cracks," but it comes with some costs that your organization would probably like to avoid. Quite often, these costs hit you from the left and from the right -- that is, they are associated both with "not enough" and "too much" with respect to utility execution frequency:
  • On the "not enough" side, consider a once-per-month-per-table space approach to running the REORG utility. That might be OK for most of your table spaces, but you might have some tables with high insert activity (and not so much delete activity), and associated indexes on keys that are not continuously ascending (necessitating inserts of entries into the "middle" of these indexes). These indexes can become quite disorganized quite quickly, and that's not good for the performance of queries for which DB2 uses index scan access. Maybe once per month is not a sufficient REORG frequency for these table spaces (or separately for these indexes).
  • On the "too much" side, a common misstep is to execute RUNSTATS too frequently (an extreme case I've seen in the real world involved DAILY execution of RUNSTATS for a number of table spaces). Accurate catalog statistics are important for good query performance, but in many cases the data in a table space does not change significantly in a near-term time frame from a statistics perspective: row count is pretty stable, column cardinalities change little, high and low key values do not vary much, etc. In such cases, excessive execution of RUNSTATS burns CPU cycles without delivering performance benefits.
  • "Too much" also became more of an issue for REORG at some sites with the change introduced with DB2 9 for z/OS: when an online REORG of a subset of a partitioned table space's partitions is executed, any non-partitioned indexes (NPIs) defined on the underlying table will be reorganized in their entirety. That was a very good change from an availability perspective (it eliminated the so-called "BUILD2" phase of online REORG, which effectively blocked data access for a time), but it increased the system resource consumption (CPU cycles and disk space) associated with these REORG jobs. That, in turn, provided a new incentive for avoiding unnecessary REORGs.

Those are just a few examples, but you get the picture. The thing is, shifting to a purpose-driven utility execution strategy ("I need to run COPY for this table space because of recent update activity") versus a strictly calendar-based approach ("I'll execute COPY for this table space -- whether or not it's been updated -- because I haven't done that in X days") can take a lot of time if you go it alone -- and who has lots of time on his or her hands these days? The solution here is to not go it alone. Let DB2 Automation Tool assist you in getting to needs-based DB2 utility execution. With Automation Tool's easy-to-use ISPF interface, you can quickly define:
  • Object profiles -- sets of database objects (table spaces and/or indexes) for which you want to create utility execution action plans.
  • Utility profiles -- these designate the utilities you want to be executed, and how you want them to execute (referring to utility options).
  • Exception profiles -- the "smart" in smart utility execution. With DB2 Automation Tool, you have 180 different exceptions that you can use individually or in combination to determine when conditions warrant the execution of a utility for a database object. What's more, you can add greater sophistication to your smart utility execution criteria through formulas that you can implement with DB2 Automation Tool user exits and REXX EXECs.
  • Job profiles -- where intention turns into action. Exception jobs can be automatically executed at regular intervals to evaluate objects, and utility jobs needed to address identified exceptions can be automatically submitted for execution.

Put it all together, and you get a DB2 utility execution mechanism that saves you time and saves your organization CPU and disk resources -- not only by avoiding unnecessary utility execution, but also through more effective maintenance of database objects and catalog statistics to help ensure consistently good DB2 application performance. Plus, with your own freed-up bandwidth you can more fully engage in things like application enablement and other activities that really deliver value to your company.

Something else to keep in mind: this is an IBM DB2 tool we're talking about, so of course it supports -- and exploits -- the latest version of DB2 for z/OS (that's of course DB2 10). Some examples of DB2 feature exploitation by DB2 Automation Tool are:
  • The DB2 administrative task scheduler. DB2 Automation Tool makes it easier to utilize this DB2 capability and helps you to reduce associated time to value; furthermore, the DB2 administrative task scheduler offers an operationally flexible means of submitting utility jobs generated via DB2 Automation Tool.
  • DB2 10 autonomic statistics. DB2 Automation Tool provides an interface to the DB2 10 autonomic statistics stored procedures, which can be used to determine when statistics need to be collected for database objects, and to automate RUNSTATS execution. DB2 Automation Tool also provides an interface to manage maintenance windows for executing the RUNSTATS utility.
  • The new (with DB2 10) REORGCLUSTERSENS and REORGSCANACCESS columns of the SYSTABLESPACESTATS real-time statistics table in the DB2 catalog. These are among the exception criteria that can be used with DB2 Automation Tool to help determine when REORG should be executed for a table space (an illustrative real-time statistics query follows this list).
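
Below is a minimal sketch -- plain SQL against the real-time statistics table, not DB2 Automation Tool syntax -- of the kind of needs-based check such exception criteria embody; the 10-percent change threshold is an arbitrary illustration, not a recommendation:

   -- Flag table spaces whose inserts plus deletes since the last REORG
   -- exceed 10 percent of total rows
   SELECT DBNAME, NAME, TOTALROWS, REORGINSERTS, REORGDELETES
     FROM SYSIBM.SYSTABLESPACESTATS
    WHERE TOTALROWS > 0
      AND (REORGINSERTS + REORGDELETES) > TOTALROWS / 10;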

Obviously, a high degree of integration with the DB2 "engine" is there. So, too, is integration with other IBM DB2 for z/OS tools. Want to generate an image copy of an object using a system-level DB2 backup generated with DB2 Recovery Expert for z/OS? Easily done with DB2 Automation Tool.

You have better things to do than deal with the scheduling of DB2 utilities, and your organization should get the benefits that come with needs-based utility execution. Do yourself and your company a favor, and check out the IBM DB2 Automation Tool for z/OS. Smart utility execution is within your reach.

Tuesday, August 14, 2012

Got DB2 for z/OS? GET DATA STUDIO

If you work with DB2 for z/OS -- as a systems programmer, a DBA, an application developer, or in some other capacity -- you REALLY ought to be using IBM Data Studio. Why? Well, start with the fact that it's FREE and downloadable from the Web. Now, free is not such a big deal if the product in question is a fluffy piece of almost-nothing that delivers practically zilch in the way of useful functionality. In the case of Data Studio, free is a great thing, because the tool is loaded with features that can boost your productivity and effectiveness as a mainframe DB2 professional (in addition to DB2 for z/OS, Data Studio can be used with DB2 for Linux, UNIX, and Windows; DB2 for i; Informix; and several non-IBM relational database management systems). In this blog entry, I'll describe a few of the Data Studio capabilities that I most appreciate (and yes, I most definitely have it on my PC). I encourage you to get your own copy of Data Studio, and to explore its functionality to see how the tool can best deliver for you.

Before going further, a couple of preliminary information items: first, Data Studio as we know it today essentially made the scene in the fall of last year, with the delivery of Version 3.1 (the current release is Version 3.1.1). If you last looked at Data Studio prior to Version 3.1, look again. That was a big-time product upgrade.

Second, note that Data Studio comes in two forms: the administration client and the full client. There's a page on IBM's Web site that provides detailed information on the features of the administration client and the full client, but I'll tell you here that the major difference between the two Data Studio clients is the support for development of DB2-accessing Java code provided by the full client, and the ability you have with the full client to "shell-share" with other Eclipse-based tools, including a number of IBM's Rational and InfoSphere Optim products. Now, you might think, "Given that Data Studio is free, why wouldn't I just download the full client, and not bother with figuring out whether or not a capability I want is or isn't provided by the administration client?" In fact, going with the full client is a pretty good idea, as far as I'm concerned. One reason there are two clients is that the full client is about three times the size of the administration client. That can be important in terms of download time, IF you don't have access to a high-speed Internet connection. These days, lots of folks have high-speed Internet connections of 50+ or even 100+ Mbps, and when that's the case download time is less of a concern and getting the Data Studio full client is a good choice. If your download capacity is more limited, and if you don't need Java development support and/or shell-sharing capability with other Eclipse-based tools, the Data Studio administration client could be best for you.

OK, on now to my personal favorites among the capabilities provided by Data Studio:
  • SQL statement tuning. So much to like here. I like the access plan graph that I can get for a query. I like how I can quickly get information about an element in the access plan graph (for example, a table accessed by the query you're analyzing) just by hovering over it with my cursor. I like how I can get more information about that element in the access plan graph by clicking on it, and even more information via the handy explorer window I get for a clicked-on element of the access plan graph (if the element clicked on is a table, the explorer window lets me very quickly obtain information about indexes on the table, keys of these indexes, columns in the index keys, etc.). I like that the access plan graph shows me the optimizer estimates for the number of result set rows going into a step of the access plan, and for the number of rows coming out of that step. I REALLY like Data Studio's statistics advisor, which provides me with RUNSTATS utility control statements aimed at gathering statistics which could help the DB2 optimizer to choose a better-performing access path for the query I'm tuning (people at the IBM Support Center who spend their time working on query performance problems will tell you that MOST of the situations with which they deal could be resolved through updated and/or enriched catalog statistics). I am one who for years analyzed query performance by slogging through data retrieved from the PLAN_TABLE. I finally got wise and started leveraging the query tuning aids provided by Data Studio, and as a result I got more efficient -- and more effective -- in getting queries to run faster.
  • Submitting SQL statements. As I stayed for a long time with the old ways of query tuning before getting modern with Data Studio, so I also persisted in submitting SQL statements the old way: via SPUFI in a TSO/ISPF session. Know what got me into using Data Studio versus SPUFI for interactive SQL statement execution? XML. I was trying one day to retrieve data in an XML column in a DB2 for z/OS table using SPUFI, and got majorly frustrated in trying to get that to work. Thinking, "Why not?" I gave it a go using Data Studio, and it worked like a champ right off the bat. Not only did I get the XML data that I wanted using Data Studio, I also got it very nicely formatted, with different lines and different levels of indentation showing clearly the structure of the returned XML data. I've also gotten to where I appreciate what I can do with a multi-row query result set obtained via Data Studio -- an example is easy sorting of the result set rows by any column's values. I also like that Data Studio kind of looks over your shoulder as you enter a query's text, showing a little red X by a line of the query if you've left something out that would cause a syntax error -- I like seeing this and getting a chance to correct the text BEFORE submitting the statement for execution. I like that Data Studio keeps a record of statements you've submitted, making it very easy to re-submit a query if you want to. I could go on and on, but the bottom line is this: Data Studio is not just "SPUFI on a PC" when it comes to entering and executing SQL statements -- it's better than SPUFI.
  • Formatting SQL statements. Over the years I've done a fair amount of query analysis work in DB2 for z/OS-based data warehouse environments. Some of the queries you might be asked to look over in such an environment might take up several pages when printed out. If one of these humongous queries (lots of tables joined, lots of subquery predicates, lots of CASE expressions, etc.) is handed to you unformatted, you've got a big job on your hands just to make it readable -- a big job, that is, if you don't have Data Studio. If you do have Data Studio, you just paste the query into an SQL input window and ask Data Studio to format it for you. You then get indentations and line breaks that make the big hairy query easy to read, and that makes working with the query much easier.
  • Retrieving data from the DB2 catalog. Sure, you can do this by issuing queries against catalog tables, but it can be done a whole lot more quickly using Data Studio's data source explorer when you're connected to a DB2 for z/OS subsystem. 
  • Developing SQL stored procedures. So-called native SQL procedures, introduced with DB2 9 for z/OS, are, in my opinion, the future of DB2 stored procedures. Developing, testing, and deploying native SQL procedures is a snap with Data Studio. I've done it, and so can you (a minimal example follows this list).
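
To give a flavor of that last item, here's a minimal native SQL procedure sketch (hypothetical names). Note that there is no EXTERNAL NAME or fenced language environment involved -- the body is SQL, and Data Studio walks you through deploying it:

   CREATE PROCEDURE DBA01.ORDER_COUNT
     (IN  P_CUST_ID INTEGER,
      OUT P_COUNT   INTEGER)
     LANGUAGE SQL
   BEGIN
     -- Count the customer's orders and return the result in the OUT parameter
     SELECT COUNT(*)
       INTO P_COUNT
       FROM DBA01.ORDERS
      WHERE CUST_ID = P_CUST_ID;
   END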

The above-listed Data Studio features are just the ones that I've used most extensively -- there is a LOT more that you can do with the tool, such as creating Web services for DB2 data access; creating, altering, and dropping tables and indexes and other database objects; generating DDL from objects in a database; generating diagrams that show database objects and relationships between those objects; and comparing one database object with another (e.g., the definition of table A as compared to the definition of table B). The best way to see what Data Studio can do for YOU is to get your hands on it and exercise the functionality yourself.

Hey, if you're like me -- a longtime mainframe DB2 professional -- then you've done most of the DB2 work you've needed to do over the years using the good old 3270 "green screen" interface. That interface will be with us for a long time to come, I'm sure, but for a number of DB2-related tasks a GUI is not just a different way -- it's a better way. Data Studio is your opportunity to put this assertion of mine to the test. Give it a try. I think you'll like what you find.
    

Wednesday, July 18, 2012

Quickly and Easily Clone DB2 Subsystems and Objects with DB2 Cloning Tool for z/OS

There are times when a clone of an existing DB2 for z/OS subsystem can be very handy for program testing purposes (or to speed up a production application process -- more on that momentarily). Suppose you want to clone one of your DB2 subsystems. What will you have to do? Some of the required steps -- identifying the subsystem to z/OS via PARMLIB updates, setting up RACF (or third-party equivalent) security, providing DB2 libraries for the subsystem (e.g., SDSNLOAD, SDSNEXIT), determining the volumes in the disk subsystem that will be used for the clone -- are pretty straightforward and not particularly time-consuming. Then there's the data set part. You have to initialize all the "system" data sets for the clone (catalog, directory, bootstrap data set, log data sets, etc.), and you have to deal with all of the "user" data sets (potentially, tens of thousands) associated with table spaces and indexes -- including any required data set renaming and alteration of object identifier information so that these IDs will line up with those in the clone's DB2 catalog. Ugh. You can grind through this data set tedium yourself, but there's a MUCH better way to get the job done: use IBM's DB2 Cloning Tool for z/OS to handle all the data set-related work, and get your clone established way faster (and without the risk of error linked to the manual approach).

DB2 Cloning Tool makes life easier for DBAs with respect to two main functions: cloning a DB2 subsystem in its entirety, and cloning individual objects or sets of objects (table spaces and indexes) by overlaying objects on a target subsystem with objects from a source subsystem (this object-level cloning functionality is sometimes referred to as "object refresh"). For the cloning of an entire subsystem, what has to be done ahead of time is the system stuff I mentioned above (PARMLIB update to identify the new subsystem to z/OS, RACF set-up, DB2 libraries, etc.). DB2 Cloning Tool takes it from there, doing the heavy lifting related to dealing with all the DB2 data sets. As noted previously, a key benefit of DB2 Cloning Tool utilization is speed. One of the ways in which DB2 Cloning Tool delivers here is through its ability to utilize ultra-fast, disk subsystem-based data replication technology, such as FlashCopy in IBM disk storage systems, and compatible technologies provided by other storage system vendors. FlashCopy replication of one volume's contents to another volume is near-instantaneous from a source system perspective, and on top of that you get some nice CPU savings on the source and target z/OS LPARs because the physical data copying work is accomplished by processors in the disk subsystem.

FlashCopy exploitation is great, but DB2 Cloning Tool is by no means limited to using that technology to accomplish a subsystem cloning operation. You can direct DB2 Cloning Tool to use any of a number of mechanisms to create a DB2 subsystem clone. For example, you could choose to go with dump/restore processing, or you could have DB2 Cloning Tool create a clone using your disk subsystem's remote mirroring functionality (e.g., IBM's Metro Mirror technology, formerly known as PPRC). DB2 Cloning Tool can also take a system-level backup generated by the IBM DB2 Recovery Expert for z/OS and use that to create a clone of a DB2 subsystem (I blogged about Recovery Expert earlier this year). DB2 Cloning Tool's flexibility with respect to the mechanisms used to establish a clone of a DB2 subsystem means that you can go with the approach that best suits a particular situation in your shop.

Another way in which DB2 Cloning Tool speeds things up is its proprietary process for renaming data sets, when that is required for a cloning operation. Far faster than IDCAMS RENAME, this process by itself can take a big chunk out of the time that would otherwise be required to clone a large DB2 subsystem.

Speed of execution is also enhanced through features that boost DBA productivity. This certainly comes into play when you want to use DB2 Cloning Tool to clone objects -- that is, to refresh a subset of the data in a cloned DB2 subsystem. Here, DBA work is accelerated through an interface that is very similar to the DB2 LISTDEF utility. The DB2 utility DSN1COPY, which can be used to perform object ID translation when copying a DB2 data set from one subsystem to another, is limited to one input data set per execution. By contrast, DB2 Cloning Tool allows "wild-carding" and INCLUDE/EXCLUDE specifications that make it easy to pull a whole set of objects into one data refresh operation. For example, you could refresh data in a target subsystem using all source system table spaces in database XYZ whose names begin with the characters 'EMP'. Want to include the indexes on those table spaces? No problem. Want to do a refresh using source table space ABC and all table spaces referentially related to it? Piece of cake.
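
For comparison, DB2's own LISTDEF syntax expresses that kind of pattern-based, referentially aware object list as shown below (hypothetical names); DB2 Cloning Tool's interface is similar in spirit, though its syntax is its own:

   LISTDEF EMP_OBJS
     -- All table spaces in database XYZ whose names begin with EMP
     INCLUDE TABLESPACE XYZ.EMP*
     -- The index spaces associated with those table spaces
     INCLUDE INDEXSPACES TABLESPACE XYZ.EMP*
     -- Table space ABC plus everything referentially related to it
     INCLUDE TABLESPACE XYZ.ABC RI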

And hey, DB2 Cloning Tool doesn't just make you more productive -- it also helps to keep you out of trouble by taking care of the storage system stuff (e.g., volume initialization) that needs doing when a DB2 subsystem is cloned. You might say that DB2 Cloning Tool understands database-speak (the interface is DBA-centric) and translates that as needed into data set and storage-speak. In addition to protecting you from mistakes, this "storage-aware" aspect of DB2 Cloning Tool also cuts way down on the time investment required of your colleagues in storage administration: you collaborate with these folks for the initial set-up of a DB2 subsystem clone, and then you just turn the crank using the DB2 Cloning Tool.

As for use cases, I mentioned in my opening paragraph that a DB2 subsystem clone is most often thought of in the context of application testing. Indeed, DB2 clones are very useful for that purpose, but don't think that this is the only way your organization can derive value from DB2 Cloning Tool. One company used the product to surmount what had been a vexing challenge in their production DB2 environment. The problem in this case concerned a report-generation application that took three hours to complete. That would be OK, except for the fact that the DB2 data on which the reports are based had to remain static during execution of the application, and at most there was a two-hour window during which the target DB2 tables could be in a read-only state. This organization hit on the idea of using DB2 Cloning Tool to create a clone of the DB2 subsystem, against which the reporting application could run. The clone is created by DB2 Cloning Tool in only 7 minutes (thanks to exploitation of fast replication technology, mentioned previously), and is "thrown away" once the report-generating application has completed (clone disposal is another DB2 Cloning Tool function -- done in a neat and orderly fashion, on request). Because the clone is a separate DB2 subsystem, database object names don't have to be changed, and that means no required program code changes for the reporting application. Because the clone subsystem is created so quickly, data-changing applications can get going almost two hours earlier than before. Everybody's happy.

Seems like a good point on which to close: when creating (and refreshing) a DB2 subsystem clone is so easy and so fast (and it can be when you use DB2 Cloning Tool), you can advantageously use DB2 clones in ways that might not otherwise have occurred to you. Food for thought.