Identify the track name here
Technical Solutions Using Oracle and SAS
Andrew Holdsworth, Director World and ISV Performance, Server Technologies, Oracle Corporation and Howard Plemmons, Senior Software Manager, Database Interface Development, SAS Institute
Introduction
This paper attempts to communicate to both Oracle and SAS users both product specific technical information and the nature of the Engineering relationships between the two companies. If also provides some case study information concerning Oracle and SAS questions along with performance considerations when using the two products.
Oracle and SAS Development Relationship
Oracle has been supporting SAS Institute directly out of the database development organization for just over two years now. What this means is that SAS developers have direct access to the database development organization for technical questions and advice, rapid bug escalation and early availability of software.
The goal of this activity is to optimize the SAS/Oracle stack for SAS analytical data processing and SAS applications. As this relationship matures it is hoped that the SAS analytical engine is able to generate more complex, efficient and scalable SQL for the Oracle Database Server to process.
Both Oracle and SAS are fully committed to working on engineering issues to improve the user productivity of the Oracle/SAS product stack. Both Oracle and SAS have and will continue to attend each other’s user conferences on joint platform to give and receive user feedback.
Oracle and SAS Support Matrix
SAS has supported Oracle data access for many years on many different platforms and OS/Oracle combinations. For the new release of SAS 9, GA 2003, Oracle release support will be Oracle 8.1.7 and up, Oracle 9i and Oracle 10G, if supported by Oracle, on the following platform/OS combinations:
Windows – NT, XP, 2000, W64
UNIX –
AIX 64 bit running OS 5L
HPUX 11i 64 bit
Solaris V8 64 bit
CA/True 64
HPUX Itanium
Mainframe -
MVS – ZOS and OS390
Midrange -
Alpha VMS
SAS provides information off their technical support web site that shows specific Oracle versions that have been tested with specific SAS releases. To view this support matrix go to the following link:
This web application provides you with release/platform lookup capabilities for SAS/ACCESS by DBMS by Platform. If you have any support issues you can contact SAS Technical Support.
Current Oracle Technology Adoption at SAS
- OCI programming
- Partitioning and Parallelism
- RAC implementation
- LOBS
Oracle Database 10g New Features Applicable to SAS
- Wide Table Selects
The feature is enabled automatically. It improves performance of large data set selects.
- New CBO algorithms and costing model
- LOBS performance improvements
- Data pump Export and Import Utilities
The utilities provide very high-speed bulk movement of data and metadata from one database to another.
- Server Manageability
SYSAUX tablespace
Enhanced Wait Model
Enhanced Database Time Model
End-to-End Application Tracing
Server-Generated Alerts
Automatic Workload Repository
- OLAP support for Analytic Applications
Grouped table outer join.
Increased number of aggregates per query
- Globalization and Unicode improvements
CLOB and NCLOB implicit conversions
Globalization Development kit
Unicode 3.2 support
- SQL improvements
Case-Insensitive and Accent-Insensitive Query and Sort
Enhanced CONNECT BY support
Expression Filter
SQL Regular Expressions
Row Timestamp
- .NET and 64 bit Windows support
SAS System Overview
Figure 1
At the core of the SAS System is the Intelligence Architecture, the foundation of the SAS Intelligence model as shown above in Figure 1. The Intelligence Architecture contains the technology to perform ETL, business intelligence and analytical intelligence, leveraging the manageability, interoperability, usability, and scalability of the SAS system. Over 60 products are included in the Intelligence Architecture and are fully integrated to offer robust and portable functions of the SAS system.
Below the Intelligence Architecture is the infrastructure layer that consists of ERP, DBMS, mainframes, and various data sources. Regardless of the infrastructure, SAS can complement the existing infrastructure and leverage the IT investment to optimize total cost of ownership. For example, if the infrastructure consists of Oracle 9i running on HPUX 11i 64 bit, SAS provides the technology to access the data using SAS/ACCESS Interface to Oracle product. In addition, SAS fully integrates and cleanses the Oracle data, preparing the data to be analyzed with SAS software.
As described above the SAS/ACCESS Interface to Oracle product, it is one of many components of the Intelligence Architecture. In order to apply and gain any intelligence, accessing the data is the first critical step. Once the SAS system is plugged into your infrastructure, the power of SAS can be optimized. Leveraging 27 years of SAS knowledge and experience, the Intelligence Architecture also delivers a comprehensive suite of analytical and business intelligence software. Bundled into the SAS solutions, the technology provides you with insight and the power to know your customers, organization and suppliers. SAS gives you the intelligence and “The Power to Know.”
SAS Intelligence Value Chain SAS 9
Figure 2
The SAS Intelligence Value Chain is a framework for delivering consistent, quality enterprise intelligence, emphasizing the incremental value that each component brings and the additional value gained from these components being integrated. The SAS/ACCESS to Oracle product fits into the ETL(q) link in the chain. The SAS/ACCESS engine provides extraction and loading services (EL components of ETL) for applications, products and solutions in SAS.
SAS/ACCESS Products
The Access Research and Development Division (ARD) at SAS are responsible for implementation and support of the DBMS access products that interface SAS software to an underlying DMBS. We have developed and supported the Oracle interface for many years in my group on many different platforms and OS. Some basic terminology concerning SAS/ACCESS libname engine development:
SAS/ACCESS to Oracle Libname Engine – The SAS Libname Statement can be issued via the SAS 4GL or by external SAS applications via communication paths to MVA. Once issued the libname statement (e.g. libname x oracle user=scott pass=tiger;) is processed by the libname engine then connection is made to the DBMS server and the engine waits for application requests. Requests received by the libname engine are processed against the
DBMS tables identified in the request. The engine provides different modes of ACCESS against the DBMS tables (e.g. select, insert, update, delete and direct DDL and SQL statement processing).
MVA – Multi-Vendor Architecture provides SAS with a framework for portability and I/O between layers of the SAS system. The libname engine architecture is driven by a SAS I/O model that is used by all SAS applications that need to access data within the MVA framework.
SAS I/O –This identifies a system at SAS that processes application request for data. This API provides applications with a consistent methodology to access data from any source supported by SAS.
SAS Options – provides user control of the SAS/ACCESS Interface to Oracle Libname Engine and other SAS Procedures and processes. These options provide a granularity of control that allows you great latitude in tuning the SAS system and applications, such as the Oracle Libname Engine.
SAS Procedure – a program that can be executed using the SAS 4GL Language. This program performs a desired set of operations depending on parameters passed in on the execute line or set as SAS system options. Many procedures have the capability of interacting with the Oracle DBMS using the SAS I/O model and the SAS/ACCESS Interface to Oracle Libname Engine.
SAS V9 – a release of SAS that contains threaded I/O functionality that has been added to the SAS I/O model. This allows SAS applications and the underlying engines to access data in parallel in both engine extract and application processing functions. Implementation of this model has resulted in significant performance increases
Key features/what’s new in SAS/ACCESS Interface to Oracle in SAS V9:
There are many key features and new features in the SAS/ACCESS Interface to Oracle in SAS 9. Note that most features are geared toward performance improvements, and information display concerning how we interact with the Oracle DBMS. A general knowledge of the SAS/ACCESS Interface to Oracle product is a necessary aid in understanding the impact of these features.
The following options and features allow you to control interaction of SAS with Oracle are:
- Threaded reads - divide resource-intensive tasks into multiple independent units of work and execute those units in parallel. This extends the SAS MVA I/O model and provides blocked data in parallel to many multi-threaded SAS applications.
- Temporary table support - enables DBMS temporary tables to persist from one SAS step to the next. This support involves establishing a SAS connection to the DBMS that persists across SAS procedures and DATA steps.
- The new SQL options MULTI_DATASRC_OPT= and DBMASTER= optimize the performance of the SQL procedure. More detailed information is available about passing joins to the DBMS, determining when joins will fail, and optimizing WHERE clauses passed to Oracle.
- The SASTRACE= system option now provides improved debugging capabilities. Many layers of debugging are now available that supply information that can be used to tune SAS and Oracle.
- DBMS metadata can be stored and processed using the SAS Open Metadata Repository. Information needed to drive the Oracle engine is stored in the repository. Using the repository the SAS user can define metadata that will be used by SAS vertical products and solutions.
- The MULTI_DATASRC_OPT= option in the LIBNAME statement can be used in place of the DBKEY= option to improve performance when you are processing a join between two data sources.
- The DIRECT_EXE= option in the LIBNAME statement enables you to pass an SQL statement directly to a database using explicit pass-through, when you are using PROC SQL with a libref.
- You now have the ability to encode the DBMS password that appears in SAS source code so that it does not appear as text in SAS programs.
- The CHANGE statement can be used to rename SAS/ACCESS tables.
- BL_PARFILE = data set option creates a file that contains the SQL*Loader parameters.
- DBSLICE= data set option specifies user-supplied WHERE clauses to partition a DBMS query into component queries for threaded reads.
- DBSLICEPARM= data set option and option in the LIBNAME statement controls the scope of DBMS threaded reads and the number of threads.
- OR_UPD_NOWHERE= is now a data set option, in addition to being an option in the LIBNAME statement.
The new SAS I/O model for providing parallel data to SAS 9 applications. New in SAS 9 and enhanced in subsequent SAS 9 releases. This process gives you the ability to scale up with SAS and Oracle.
With the data streaming back and processed on multiple threads within the SAS application has netted significant performance boosts as compared with the previous SAS I/O model. This technology will be generally available with the SAS 9 technology release in 2003.
Current Oracle Technology Adoption at SAS:
Currently SAS/Access Interface utilizes the following Oracle features:
- OCI programming
- Partitioning and parallelism
- RAC support
- LOBS
Oracle Database 10g Features and Optimizations
When using SAS/Access interface to Oracle 10g the best optimization should be considered. The following new technologies introduced in 10g could prove beneficial top many SAS applications running on Oracle:
- Wide Table Selects
- New CBO algorithms and costing model
- LOB improvements
- Data pump
Performance Issues and Concerns with Oracle and SAS
Joint Customer Concerns
SAS technical support processes many questions from SAS customers that use Oracle as their DBMS. These questions from our current customer base, along with SAS Technical support analysis of new SAS 9 features, should provide insight into what our joint customers see:
0Implicit vs. Explicit pass through –
Customer ability to pass critical components of the SQL statement to Oracle may be directly proportional to performance. Joint customers ask about when and how components are passed to Oracle and how they can control them. Explicit pass through is where you write the Oracle SQL that is to be sent to the DBMS server. This is Oracle SQL that is not interpreted or changed by SAS. It is simply passed to the DBMS for processing.
Implicit pass through is where the SAS SQL, procedure or datastep code is interpreted and DBMS SQL is generated on your behalf. Components of Implicit SQL where statement may be passed to the DBMS and components may be processed within SAS. How the user constructs the where criteria and how the user sets up and tunes DBMS components is critical.
1Performance issues –
“Who is to blame for my poor performance?” The idea here is not to attach blame but to help joint customers work through performance issues. The key to this is obtaining information about what is happening under the covers. To do this the SASTRACE option in SAS 9 was given a considerable face lift. It now can report details on SQL, time to execute, where the execution time is spent and other details. This information along with Oracle tools such as the explain process, detailed later in this paper, give clues on how to increase performance.
2How SAS accommodates new releases of Oracle –
Each new Oracle release drives support questions through SAS Technical support. The main question is do we support the new Oracle release? The rule of thumb here is yes. If there have been significant changes in the client server structure then a new SAS/ACCESS image will be provided. For example with SAS 9 we will only support 64 bit access to Oracle. Given that we will provide 64 bit shared library images so you can access Oracle 8.1.7.x, 9.i and 10G.
3Many options to choose from, which is best –
The SAS/ACCESS product in SAS 9 provides many options that allow users to control the interactions between SAS and Oracle. Many of these options are performance related and should be studied carefully. Some of the options you need to investigate are, threaded data access using DBSLICE and DBSLICEPARM, data buffering using READBUFF and tracing information using SASTRACE.
4SAS support for SAS 6.12, 8.2 and SAS 9 components –
Support of old SAS/ACCESS components continues in SAS 9. For example PROC ACCESS and PROC DBLOAD can still be used in SAS V9; however, they are not being recommended. Support for these components continues at a SAS V6 level. They will not be enhanced or modified.
How you would move forward involves investigation into PROC MIGRATE and PROC CV2VIEW with the use of VALIDVARNAME options to maintain compatibility to existing jobs. You should investigate to see what component is right for you as you convert to SAS 9.
5Masking passwords –
The ability to mask Oracle passwords in SAS Datastep code generates a good deal of customer concern. We have recommended $OPSYSID as a solution; however, with customers where this is not an option mutual customers are asked to consider:
- Permissions on SAS files that contain code with passwords. SAS V6, V8 and V9 feature.
- Storing libname statements in the SAS registry where passwords are encrypted. SAS V8 and V9 feature.
- Using PROC PWENCODE for encrypting SAS/ACCESS passwords and include them in the SAS Datastep and procedure code. This combined with SAS Metadata Repository password management helps secure DBMS passwords used in SAS. SAS V9 feature.
Performance Issues
Concerns when using Oracle and SAS can be alleviated with a little design and process consideration. The following four common topics and included SAS code snips show how you can increase performance. What is required is consideration of the process that must take place in Oracle and SAS then choosing the best course of action.
SQL Statement Not Written for Index
When it comes to putting together SQL that will process against the Oracle DBMS it is critical to form the SQL correctly. The following two SAS code snips detail what you might be doing vs. what you should be doing. The Oracle explain output is attached for comparison between the two SAS procedures. Note the execution difference between the first and second proc SQL is more than 4x with the second proc SQL running faster than the first.