Performance Tuning & Best Practices for Informatica Developers, Version 1

Informatica 8.6

Performance Tuning & Best Practices on PowerCenter 8.6

Table of Contents

1. Introduction

2. Transformations & Best Practices

1)  Source qualifier

2)  Expression

3)  Lookup

4)  Sequence Generator

5)  Aggregator

6)  Filter

7)  Router

8)  Joiner

9)  Normalizer

10)  Sorter

11)  Rank

12)  Update Strategy

13)  External Procedure

14)  Stored Procedure

15)  XML Source Qualifier

16)  Union

3. Performance Tuning

o  Source Query Tuning

o  Transformation Tuning

o  Memory Optimization

o  Session Tuning

o  Session Partitioning

·  Database Partitioning

·  Hash Partitioning

·  Key Range

·  Pass-through

·  Round-Robin

o  Pushdown optimization

o  Identifying & Eliminating Bottlenecks

§  Target Bottleneck

§  Source Bottlenecks

§  Transformations /Mapping Bottlenecks

§  Session Bottlenecks

1. Introduction

This document provides a brief overview of performance tuning for optimizing Informatica in a real-time environment. It covers the most frequently used transformations and the best practices that can be applied to optimize performance. It also touches on related areas such as SQL/database tuning, debugging techniques, parallel processing, and pushdown optimization.

The primary objective of this document is to help Informatica developers deal with the different scenarios that arise in Informatica and to optimize performance.

2. Transformations

Transformations are Informatica repository objects used to build the business logic according to which the ETL is performed.

Below is the list of frequently used Informatica transformations.

1)  Source qualifier

2)  Expression

3)  Lookup

4)  Sequence Generator

5)  Aggregator

6)  Filter

7)  Router

8)  Joiner

9)  Normalizer

10)  Sorter

11)  Rank

12)  Update Strategy

13)  External Procedure

14)  Stored Procedure

15)  XML Source Qualifier

16)  Union

2.1 Source Qualifier

The Source Qualifier transformation is the first entry point of any data into a mapping. It is used to perform the following tasks:

·  Join data originating from the same source database

·  Filter records when the Informatica Server reads source data (i.e., via a SQL WHERE condition).

·  Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds an ORDER BY clause to the default SQL query.

·  Select only distinct values from the source. If you choose Select Distinct, the Informatica Server adds a SELECT DISTINCT statement to the default SQL query.

·  Create a custom query (SQL override) to issue a special SELECT statement for the Informatica Server to read source data.

Figure: joining two tables with one Source Qualifier transformation.
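For illustration, suppose a mapping reads from a hypothetical EMPLOYEES table with a source filter on SALARY, the number of sorted ports set to 2, and Select Distinct enabled. The generated default query would then look roughly like this:

    SELECT DISTINCT EMPLOYEES.EMP_ID, EMPLOYEES.DEPT_ID, EMPLOYEES.SALARY
    FROM EMPLOYEES
    WHERE EMPLOYEES.SALARY > 0
    ORDER BY EMPLOYEES.EMP_ID, EMPLOYEES.DEPT_ID

Note that the ORDER BY clause is built from the topmost N connected ports (here N = 2), so the order of the ports in the transformation matters.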

Best Practices

·  Only use SQL Overrides if there is a substantial performance gain or complexity decrease. SQL Overrides need to be maintained manually and any changes to the data structure will result in rewriting or modifying the SQL Override.

·  Do use the WHERE condition and SORTED ports in the Source Qualifier if possible, rather than adding a filter or a sorter transformation.

·  Delete unused ports / only connect what is used. Reducing the amount of data carried through the mapping provides better performance.

·  Tune source qualifier queries to return only the data you need.

·  Perform large lookups in the Source Qualifier instead of through a traditional lookup (see the sketch after this list).

·  When applicable, generate the default SQL in the Source Qualifier and use the ‘Validate’ option to verify that the resulting SQL is valid.
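As a sketch of the large-lookup point above, a lookup against a large reference table can be replaced by a join in the SQL override (table and column names here are hypothetical):

    SELECT O.ORDER_ID, O.ORDER_AMT, C.CUST_NAME
    FROM ORDERS O
    LEFT OUTER JOIN CUSTOMER_DIM C
      ON O.CUST_ID = C.CUST_ID

The left outer join keeps orders with no matching customer, mirroring an unmatched lookup returning NULL. Remember that such an override must be maintained by hand, as noted above.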

2.2 Expression

The Expression transformation is a passive transformation, used to calculate values in a single row before you write to the target. For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.

Local variables can be used in Expression transformations and greatly enhance the capabilities of this transformation object.
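A minimal sketch of local variables (port names are illustrative): the variable port is evaluated once per row and then reused by several output ports:

    v_FULL_NAME (variable) : LTRIM(RTRIM(FIRST_NAME)) || ' ' || LTRIM(RTRIM(LAST_NAME))
    o_FULL_NAME (output)   : v_FULL_NAME
    o_GREETING  (output)   : 'Dear ' || v_FULL_NAME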

Best Practices

·  Calculate once, use many times. Avoid calculating or testing the same value over and over. Calculate it once in an expression, and set a true/false flag. Within an expression, use variables to calculate a value used several times.

·  Create an anchor expression transformation that will map the source table to an intermediary transformation using the source column names. Do simple processes (LTRIM/RTRIM, string/numeric conversions, testing for NULL, etc.) in this transformation. This will enable an easier transition if the source table changes in the future.

·  Watch your data types. The engine will automatically convert compatible types. Sometimes conversion is excessive and happens in every transformation, which slows the mapping.

·  Expression names should begin with "EXP" followed by descriptive words.

·  Do not propagate ports out of an Expression transformation if they are not used in the mapping going forward.

·  Group input and input/output ports first, followed by variable ports, and then output ports. Incorrectly ordering the ports in an Expression transformation can lead to errors and/or inaccurate results.

·  If a reusable expression is being used to perform common calculations, consider using User-Defined Functions (UDFs), a feature introduced in PowerCenter 8.x (see the sketch after this list).
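As a sketch of the "calculate once" flag pattern and of a UDF call (port and function names are hypothetical):

    v_IS_VALID_DATE (variable) : IIF(IS_DATE(IN_DATE_STR, 'YYYYMMDD'), 1, 0)
    o_START_DATE    (output)   : IIF(v_IS_VALID_DATE = 1, TO_DATE(IN_DATE_STR, 'YYYYMMDD'), NULL)
    o_DATE_ERR_FLAG (output)   : IIF(v_IS_VALID_DATE = 0, 'Y', 'N')
    o_CLEAN_NAME    (output)   : :UDF.STANDARDIZE_NAME(IN_NAME)

Date validity is tested once in the variable port, and both output ports reuse the flag. The :UDF prefix is how PowerCenter invokes a user-defined function in an expression.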

2.3 Lookup Transformation

The Lookup transformation is used to look up data in a relational table, view, synonym, or flat file. When a lookup is used, the Informatica Server queries the lookup table based on the lookup ports in the transformation. It compares Lookup transformation port values to lookup table column values based on the lookup condition. Use the result of the lookup to pass to other transformations and the target.

One common error encountered when using the Lookup transformation involves using the Informatica $Source and $Target variables for the relational connection needed for the lookup. When a bulk loader is used, the $Target variable is not valid and must be replaced with the proper connection. Likewise, care should be taken when using the $Source variable to ensure that the proper database is being queried for results.

Lookups are dealt with in greater detail below.

Best Practices

·  Avoid using the $Source and $Target variables in the Lookup Connection Information. Connection names have been set up to be generic across the Production and Test environments. If possible, set the Connection Information in the Lookup transformation to one of these non-level-specific connection names.

·  Set the connections in the session for ease of migration.

·  Do not include more ports in the Lookup than necessary. Reducing the amount of data processed provides better performance.

·  Avoid datetime comparisons in lookup conditions; convert the values to strings and compare those instead (see the sketch after this list).

·  Not all sources and targets treat strings with leading or trailing blanks the same. It may be necessary to RTRIM and LTRIM string data prior to using it in a Lookup.

·  Lookups on small tables (fewer than 10,000 records) can be cached and can use ODBC connections. Lookups on large tables should also be cached. As a general practice, do not use uncached lookups.

·  In place of lookups, tables can be joined in the source qualifier. However, this often necessitates left joins, which can complicate source qualifiers. Weigh performance against ease of maintenance when deciding between source qualifiers and lookups.

·  When you create more than one lookup condition, place the conditions with an equality operator (=) first in order to optimize lookup performance.

·  Where lookup data does not change frequently, consider using a persistent cache to improve performance; for example, when validating state codes in the United States of America.
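Two sketches of the datetime and trimming points above (port names hypothetical), typically placed in an Expression transformation feeding the Lookup:

    v_LOAD_DATE_STR (variable) : TO_CHAR(LOAD_DATE, 'YYYYMMDD')
    v_CUST_CODE     (variable) : LTRIM(RTRIM(CUST_CODE))

The lookup condition then compares v_LOAD_DATE_STR and v_CUST_CODE against similarly formatted string columns in the lookup source.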

2.4 Sequence Generator

The Sequence Generator transformation generates numeric values and is used to create unique primary key values, replace missing primary keys, or cycle through a sequential range of numbers.

It contains two output ports that you can connect to one or more transformations.

When NEXTVAL is connected to the input port of another transformation, the Integration Service generates a sequence of numbers. When CURRVAL is connected to the input port of another transformation, the Integration Service generates the NEXTVAL value plus the Increment By value. For example, with Start Value 1 and Increment By 1, the first three rows receive NEXTVAL 1, 2, 3, and CURRVAL for those same rows returns 2, 3, 4.

If you connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row.

The Sequence Generator transformation is unique among all transformations because you cannot add, edit, or delete the default ports, NEXTVAL and CURRVAL.

Best Practices

·  Prefer a reusable Sequence Generator over separate Sequence Generators when using it to generate unique primary keys.

2.5 Aggregator

The Aggregator transformation allows you to perform aggregate calculations, such as averages and sums. The Aggregator transformation is an active transformation, which means that it can change the number of rows that pass through it, in contrast to the passive Expression transformation.

Additionally, Informatica allows for incremental aggregation. When this feature is used, the aggregate values are stored in cache files so that the target table does not need to be queried during a mapping run.

Best Practices

·  Factor out aggregate function calls where possible. SUM(A) + SUM(B) can become SUM(A + B), so the server only searches through and groups the data once (see the sketch after this list).

·  Do not use Aggregators for simple sorting; use the sorter transformation or the SORTED ports option of the Source Qualifier.

·  Minimize aggregate function calls by using “group by”.

·  Do place Aggregators as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.

·  Wherever possible, sort incoming data to an Aggregator and use the 'Sorted Input' option to improve performance.
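The factoring point is worth a sketch (port names hypothetical). The two output expressions below return the same total when neither port contains NULLs, but the second searches through and groups the data only once:

    o_TOTAL_COMP : SUM(SALARY) + SUM(BONUS)
    o_TOTAL_COMP : SUM(SALARY + BONUS)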

2.6 Filter Transformation

The Filter transformation allows you to filter rows in a mapping. You pass all the rows from a source transformation through the Filter transformation, and then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only rows that meet the condition pass through the Filter transformation.
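A filter condition is a single expression that must evaluate to TRUE for a row to pass; for example (ports hypothetical):

    STATUS = 'A' AND AMOUNT > 0

Rows with any other status, a non-positive amount, or a NULL in either port are dropped.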

Best Practices

·  Place Filters as early in the mapping as possible, as they reduce the number of records being processed, thereby improving performance.

·  Use a Filter to screen rows that would be rejected by an update strategy. (Rejected rows from an update strategy are logged to the bad file, decreasing performance.)

·  If you have an Aggregator transformation in the mapping, filter before the aggregation to avoid aggregating unnecessary rows.

If you need to test the same input data based on multiple conditions, consider using a Router transformation instead of creating multiple Filter transformations. When you use a Router transformation, the Integration Service processes incoming data only once. When you use multiple Filter transformations, the Integration Service processes incoming data once for each transformation.

2.7 Router Transformation

A Router transformation is used to conditionally test data and route records based upon that conditional test. It is similar to a Filter transformation in that both allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows that do not meet it, whereas a Router transformation tests data for one or more conditions and gives you the option to route rows that do not meet any of the conditions to a default output group.

Additionally, a Router transformation allows the programmer to test the same input data for multiple conditions – multiple Filter transformations would be needed to accomplish the same functionality.

If multiple routing conditions are needed, the Router transformation should be used in preference to multiple Filters, as this is more readable and more efficient since each row need only be tested once.
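A sketch of router group conditions (group names and ports hypothetical): each input row is evaluated once and passed to every group whose condition it satisfies:

    Group INSERT_ROWS : ISNULL(LKP_CUST_ID)
    Group UPDATE_ROWS : NOT ISNULL(LKP_CUST_ID) AND CHANGE_FLAG = 'Y'
    Group DEFAULT     : receives rows that match no user-defined group

Here rows missing from the lookup are routed to an insert leg and changed rows to an update leg.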

Best Practices

·  Routers may not be the best choice if the load order of the target(s) is important, since it is not possible to control the load order of the legs from a Router.

·  The target load method(s) must be carefully chosen when using routers, especially if the data is loading to the same target, in order to avoid table locks and ensure that the data is loaded in the correct order.

2.8 Joiner Transformation

The Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. It can also join two tables from the same source. This is generally only done when trying to avoid outer joins in the Source Qualifiers.

The two input pipelines include a master pipeline and a detail pipeline or a master and a detail branch. The master pipeline ends at the Joiner transformation, while the detail pipeline continues to the target.

One common point of confusion concerns the join types available. The table below summarizes the join types and their associated behavior:

Normal Join: The Informatica Server discards all rows of data from the master and detail sources that do not match, based on the condition.

Master Outer Join: The Informatica Server keeps all rows of data from the detail source and the matching rows from the master source. It discards the unmatched rows from the master source. Null values are inserted in the data stream where needed.

Detail Outer Join: The Informatica Server keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source. Null values are inserted in the data stream where needed.

Full Outer Join: The Informatica Server keeps all rows of data from both the master and detail sources. Null values are inserted in the data stream where needed.
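Because the naming is counterintuitive, a rough SQL analogy may help (M = master source, D = detail source; this is an analogy, not SQL the server generates):

    Normal Join       : M INNER JOIN D
    Master Outer Join : D LEFT OUTER JOIN M   (all detail rows kept)
    Detail Outer Join : M LEFT OUTER JOIN D   (all master rows kept)
    Full Outer Join   : M FULL OUTER JOIN D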

Best Practices

·  Whenever possible, perform joins in the database, i.e. in the source qualifier itself.

·  Whenever possible, sort incoming data to a Joiner transformation and use the ‘Sorted input’ option to improve performance.

·  To improve performance of an unsorted Joiner transformation, designate the source with fewer rows as the 'Master'.

2.9 Normalizer Transformation

The Normalizer transformation normalizes records from COBOL and other sources, allowing you to organize data in different formats. The Normalizer transformation is used primarily with COBOL sources, which are often stored in a de-normalized format. The OCCURS statement in a COBOL file nests multiple records of information in a single record. Using the Normalizer transformation, you break out repeated data within a record into separate records.

The Normalizer transformation can also be used with relational sources to create multiple rows from a single row of data.
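As an illustration (a hypothetical record layout), a COBOL field declared with

    05  QTR-SALES  OCCURS 4 TIMES  PIC 9(7)V99.

stores four quarterly amounts in a single record. A Normalizer with an occurs count of 4 turns one input record (CUST_ID, Q1, Q2, Q3, Q4) into four output rows of (CUST_ID, SALES), and its generated column ID (GCID) port identifies which occurrence (1 through 4) each output row came from.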