Thursday, April 19, 2012

Taming DB2 for z/OS Dynamic SQL with Optim pureQuery Runtime

There was a time, not too many years ago, when dynamic SQL was relatively uncommon in many DB2 for z/OS environments. DB2 systems programmers and DBAs insisted on the use of static SQL, and very often got their way.

Nowadays, dynamic SQL is pervasive at a great many mainframe DB2 sites. This is due to several factors, including:
  • An increase in the use of DB2 for z/OS for business analytics purposes. This is kind of a "back to the future" thing. DB2 for the mainframe platform was initially positioned, when introduced in the mid-1980s, as a DBMS intended for use with decision support applications (versus "run the business" transactional and batch applications). Many organizations decided that the major programmer productivity gains achievable with SQL and the relational data model more than offset the additional overhead of data access versus non-relational DBMSs and flat files, and before long "run-the-business" applications that exclusively utilized static SQL came to dominate the DB2 for z/OS scene. Those static SQL applications are still going strong, but business analytics is "coming home" to where the data is, and that brings with it a lot of dynamic queries.
  • The growing prevalence of client-server application development and deployment in mainframe DB2 environments. I'm talking here about the application activity that a DB2 person would refer to as "the DDF workload" or "the DRDA workload:" SQL statements issued from programs on network-attached application servers, typically by way of standard database-access interfaces such as JDBC (for Java) and CLI/ODBC (often used with C, C++, and C# programs). These interfaces are very popular with application developers because they are non-DBMS-specific (and so can be used with a variety of database management systems). JDBC or CLI/ODBC on the client side of things generally means dynamic SQL at the DB2 server.       
  • Greater utilization of packaged applications with DB2 for z/OS. When I first started working with DB2 (in the Version 1 Release 2 time frame), virtually all application code accessing data on mainframe computers was custom-built, either by a company's in-house developers or by third-party programmers working on a contract basis. In recent years, packaged applications -- often used for enterprise resource planning (ERP) or customer relationship management (CRM) -- have become very commonplace, and organizations using these products frequently use DB2 for z/OS to manage the associated data. Vendors of packaged applications typically utilize standard, non-DBMS-specific data access interfaces, and, as mentioned in the item above, that usually involves execution of dynamic SQL statements on the DB2 server.

So, dynamic SQL is, for most mainframe DB2 DBAs and systems programmers these days, a part of the landscape. That doesn't change the fact that dynamic SQL can present DB2 people with several challenges, relative to static SQL. These include:
  • Access path instability. A static SQL statement's data access path is "locked in" at program bind time. For dynamic SQL, access path selection happens at statement execution time. That being the case, changes in DB2 catalog statistics or other factors in the execution environment could cause a dynamic SQL statement's access path to change from one execution of the statement to another -- not so good if you're looking for consistency of performance.
  • CPU overhead. Things like DB2 for z/OS dynamic statement caching can help to reduce the high cost of dynamic SQL statement preparation, but dynamic SQL will still consume more CPU cycles than equivalent static SQL statements.
  • Security. The DB2 authorization ID of a process issuing dynamic SQL statements has to have the DB2 privileges necessary for execution of the statements (e.g., SELECT, INSERT, UPDATE, and/or DELETE privileges on target tables). Data access security can be more tightly controlled when static SQL is used, as in that case the authorization ID of the SQL-issuing process requires only the EXECUTE privilege on the DB2 package associated with the static SQL statements -- direct table access privileges are not needed.
  • Problem resolution. Suppose a dynamic SQL statement is causing an application performance problem. What client-side program issued the statement? The package name associated with the statement is not likely to be of much help to you: most all the statements issued by way of (for example) the JDBC driver provided by IBM's DB2 clients (e.g., DB2 Connect or the IBM Data Server Driver) will use the same package (that being one of the packages associated with the DB2 client). Throw an object-relational framework such as Hibernate or .NET LINQ into the mix, and tracing a poorly-performing SQL statement back to the statement-issuing client-side program can be even more difficult.

If you have to deal with these headaches in your environment, you owe it to yourself (and your organization) to check out the relief provided by an IBM offering called InfoSphere Optim pureQuery Runtime for z/OS. pureQuery Runtime provides multiple features that can help you to improve the performance of, and get a better handle on, your client-server DB2 for z/OS dynamic query workload. In this blog entry I'm going to highlight my two favorite pureQuery features: dynamic-to-static SQL statement transformation, and enhanced development of data-access code for Java applications. More information on these and other pureQuery capabilities can be found via the Web page pointed to by the preceding hyperlink.


Dynamic-to-static SQL statement transformation

I'm using the word "transformation" somewhat loosely here. pureQuery doesn't change JDBC or CLI/ODBC calls in client-side source programs. Instead, via a feature called client optimization, pureQuery captures SQL statements issued by client programs and store these in a DB2-managed repository database (this does not disrupt execution of the programs -- copies of the JDBC or CLI/ODBC calls are stored in the repository, and the program-issued statements proceed to execution on the back-end DB2 for z/OS server). An administrative interface provided via an IBM Data Studio plug-in enables the binding of captured SQL statements into packages, and subsequently the statement-issuing programs can execute in "static" mode, meaning that when pureQuery recognizes a statement that was previously captured and bound, the static form of the statement in the server-side DB2 package will be invoked -- this instead of going down the dynamic SQL execution path.

Statement capture and static-for-dynamic statement substitution is performed by a piece of pureQuery code that runs in an application server where client-side programs execute (the other pureQuery code component is the aforementioned Data Studio plug-in). An application server administrator can turn pureQuery statement capture mode on for client-side programs (usually done at the data source level -- multiple data sources can map to one DB2 system), and can set execution mode to static for programs whose SQL statements have been captured and bound into a package or packages.

Another useful capability delivered by pureQuery client optimization is replacement of literal values coded in client-side SQL statements with parameter markers at run time. Using parameter markers instead of literal values can significantly boost the "hit ratio" for DB2's dynamic statement cache, thereby reducing the CPU overhead of dynamic SQL. Perhaps developers of client-side programs at your site don't follow this practice, or it may be that you have to deal with literal-containing dynamic SQL statements produced by a query tool or issued by a purchased application package. Whatever the reason for literals in client program SQL statements in your environment, pureQuery-enabled literal substitution can boost the CPU efficiency of your DB2 for z/OS DDF workload (and keep in mind that even though server-side literal substitution -- sometimes referred to as statement concentration -- is an option provided by DB2 10 for z/OS, host performance is likely to be optimized when literal replacement is accomplished on the client side of an application).

pureQuery's client optimization feature, which requires NOTHING in the way of client-side program changes (and can be used with both custom-developed and packaged applications, and with applications that use frameworks such as Hibernate), delivers the benefits you expect from static SQL: stable access paths, improved CPU efficiency, more-robust security (programs executing in static mode can be executed using an ID that has only the EXECUTE privilege on the associated package), and accelerated SQL statement performance problem resolution. Regarding that last point, finding the client-side program that issued a problematic SQL statement is simplified when you have the name of a package that is much more specific to the program in question than are the DB2 client packages used generically by JDBC- and CLI/ODBC-using applications. That's an advantage delivered for all programs for which pureScale client optimization is used. For Java applications, pinpointing the originating point of a client-server SQL statement is even easier: pureQuery provides you with the name of the Java class and the line number at which the statement can be located.

A key ingredient in pureQuery's remedy for relief of headaches brought on by dynamic SQL is the previously-mentioned statement repository, and that repository serves as more than an input for package bind operations. It's also a great resource for SQL statement analysis in tuning, and you can boost the effective leveraging of that resource with Data Studio (pureQuery statement repository information can be imported into Data Studio). You'll find not only statement text information, but associated metadata and some statistics, too (statement execution counts and elapsed time). pureQuery's statement repository information delivers even more value when you combine it with IBM's InfoSphere Optim Query Workload Tuner for performance tuning and with the end-to-end DB2 application performance monitoring capability provided by the Extended Insight feature of IBM's Tivoli OMEGAMON XE for DB2 Performance Expert on z/OS.

One more thing about pureQuery client optimization: I noted near the beginning of this entry that the rise of business analytics activity on mainframe DB2 systems has contributed to the current prevalence of dynamic SQL in DB2 for z/OS environments. You might think that business intelligence applications couldn't get a performance boost from pureQuery because all the queries associated with such apps are ad-hoc. In fact, it is often the case that many BI queries are NOT ad-hoc. So-called "operational" analytics can be characterized by a large volume of structured queries that are transactional in nature, mixed in with the more complex, customized queries submitted by "power" users. pureQuery might indeed be a CPU efficiency booster for transactional BI -- give this some thought in the context of your environment.


Enhanced development of data-access code for Java applications

I pointed out previously that pureQuery's client optimization capability requires nothing in the way of changes to existing client-side programs. For new applications written in Java, pureQuery can enhance programmer productivity via an API that can be exploited to not only generate code for data access (in DB2, Informix, and Oracle databases), but to generate performance-optimized code, to boot. For example: with JDBC, there's no concept of what in DB2 SQL we call a singleton SELECT (i.e., a SELECT INTO statement) -- you have to go with open cursor/fetch/close cursor, even if you know that a query's result set can only contain a single row (as would be the case if rows were qualified using an "equals" predicate referencing a unique key). In such situations, utilizing the pureQuery API will result in the generation of a static singleton SELECT statement, potentially saving significant resources on the DB2 server side of the application, especially for a frequently-executed transaction. The pureQuery API can also optimize SQL statement performance with multi-row INSERT batching and through a feature referred to as "heterogeneous batching" -- a mechanism by which multiple database update operations (which could target several different tables) are combined in one "batch," thereby reducing network operations and improving transaction elapsed time -- sometimes dramatically.

If you want to get the most out of the productivity and performance benefits associated with coding to the pureQuery API, it's strongly recommended that you use Data Studio and its plug-in interface to pureQuery. This plug-in enables Data Studio to leverage built-in Eclipse capabilities on which pureQuery's Java code generation capabilities rely. I've mentioned Data Studio several times in this entry, and with good reason: the combination of pureQuery and Data Studio (the latter is free and downloadable from the Web) can provide your organization with a Java-DB2 application development environment that will boost programmer productivity, speed query analysis and tuning, and foster increased collaboration between Java developers and DB2 DBAs. I can't overstate how important Data Studio is becoming in getting the most out of DB2 for z/OS and IBM DB2 tools. I'll have more to say on this point in future entries I'll post to this blog.


A closing thought: if you haven't looked at pureQuery lately...

Have you looked at pureQuery before? Was that before October of 2011? If so, you need to look again. The 3.1 release delivered in October of 2011 was really big, providing multiple enhancements such as:
  • Improved performance for pureQuery client optimization
  • Simplified administration for enabling and managing pureQuery client optimization
  • pureQuery API productivity enhancements (including automatic mapping of JOIN queries to structured Java objects and a pluggable formatter for handling data conversions such as date/time values)
  • pureQuery API deployment enhancements (including more control over package names used for Java Data Access Objects)

It could be that the capabilities of today's Optim pureQuery Runtime intersect nicely with your organization's needs and concerns around dynamic SQL in your DB2 for z/OS client-server application environment. Don't just shrug your shoulders in the face of a growing DB2 for z/OS dynamic SQL workload. Manage it. Optimize it. Analyze and tune it. Productively develop it. With pureQuery.