Introduction to Informatica
What is Informatica?
Informatica is a tool, supporting all the steps of Extraction, Transformation and Load process. Now a days Informatica is
also being used as an Integration tool.
Informatica is an easy to use tool. It has got a simple visual interface like forms in visual basic. You just need to drag and drop different objects (known as transformations) and design process flow for Data extraction transformation and load. These process flow diagrams are known as mappings. Once a mapping is made, it can be scheduled to run as and when required. In the background Informatica server takes care of fetching data from source, transforming it, & loading it to the target systems/databases.
Informatica can communicate with all major data sources (mainframe/RDBMS/Flat Files/XML/VSM/SAP etc), can move/transform data between them. It can move huge volumes of data in a very effective way, many a times better than even bespoke programs written for specific data movement only. It can throttle the transactions (do big updates in small chunks to avoid long locking and filling the transactional log). It can effectively join data from two distinct data sources (even a xml file can be joined with a relational table). In all, Informatica has got the ability to effectively integrate heterogeneous data sources & converting raw data into useful information.
In short, Informatica is worlds leading ETL tool & its rapidly acquiring market as an Enterprise Integration Platform.
Informatica Software Architecture
Informatica Software Architecture illustrated
Informatica ETL product, known as Informatica Power Center consists of 3 main components.
Informatica Software Architecture
Informatica Software Architecture illustrated
Informatica ETL product, known as Informatica Power Center consists of 3 main components.
1. Informatica PowerCenter Client Tools:
These are the development tools installed at developer end. These tools enable a developer to
- Define transformation process, known as mapping. (Designer)
- Define run-time properties for a mapping, known as sessions (Workflow Manager)
- Monitor execution of sessions (Workflow Monitor)
- Manage repository, useful for administrators (Repository Manager)
- Report Metadata (Metadata Reporter)
2. Informatica PowerCenter Repository:
Repository is the heart of Informatica tools. Repository is a kind of data inventory where all the data related
to mappings, sources, targets etc is kept. This is the place where all the metadata for your application is stored.
All the client tools and Informatica Server fetch data from Repository. Informatica client and server without repository
is same as a PC without memory/harddisk, which has got the ability to process data but has no data to process.
This can be treated as backend of Informatica.
3. Informatica PowerCenter Server:Server is the place, where all the executions take place. Server makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data in the target system. |
This architecture is visually explained in diagram below:
Standard: RDBMS, Flat Files, XML, ODBC Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2 EAI: MQ Series, Tibco, JMS, Web Services Legacy: Mainframes (DB2, VSAM, IMS, IDMS, Adabas)AS400 (DB2, Flat File) Remote Sources |
Standard: RDBMS, Flat Files, XML, ODBC Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2 EAI: MQ Series, Tibco, JMS, Web Services Legacy: Mainframes (DB2)AS400 (DB2) Remote Targets |
This is the sufficient knowledge to start with Informatica. So lets go straight to development in Informatica.
Informatica Transformations
In Informatica, Transformations help to transform the source
data according to the requirements of target system and it ensure the
quality of the data being loaded into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from source to target i.e it eliminates rows that do not meet the condition in transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it i.e it passes all rows through the transformation.
Transformations can be Connected or UnConnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.
List of Transformations
Following are the list of Transformations available in
PowerCenter:
Active Transformation
An active transformation can change the number of rows that pass through it from source to target i.e it eliminates rows that do not meet the condition in transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it i.e it passes all rows through the transformation.
Transformations can be Connected or UnConnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is called within another transformation, and returns a value to that transformation.
List of Transformations
Following are the list of Transformations available in
PowerCenter:
- Aggregator Transformation
- Expression Transformation
- Filter Transformation
- Joiner Transformation
- Lookup Transformation
- Normalizer Transformation
- Rank Transformation
- Router Transformation
- Sequence Generator Transformation
- Stored Procedure Transformation
- Sorter Transformation
- Update Strategy Transformation
- XML Source Qualifier Transformation
- Advanced External Procedure Transformation
- External Transformation
- Union Transformation
In the following pages, we will discuss in detail, all the above
Informatica Transformations and their significances in the ETL process
in detail.
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages
and sums (mainly to perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average
of monthly or yearly sales. Aggregate functions such as AVG, FIRST,
COUNT, PERCENTILE, MAX, SUM etc. can be used in aggregate
transformation.
Expression Transformation
Expression transformation is a Passive and Connected transformation. This can be used to calculate values in a single row before writing to the target. For example, to calculate discount of each product or to concatenate first and last names or to convert date to a string field.
Expression Transformation
Expression transformation is a Passive and Connected transformation. This can be used to calculate values in a single row before writing to the target. For example, to calculate discount of each product or to concatenate first and last names or to convert date to a string field.
Filter Transformation
Filter transformation is an Active and Connected transformation. This
can be used to filter rows in a mapping that do not meet the condition.
For example, to know all the employees who are working in Department 10
or to find out the products that falls between the rate category $500
and $1000.
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This
can be used to join two sources coming from two different locations or
from same location. For example, to join a flat file and a relational
source or to join two flat files or to join a relational source and a
XML source. In order to join two sources, there must be at least one
matching port. While joining two sources it is a must to specify one
source as master and the other as detail. The Joiner transformation
supports the following types of joins:
- Normal
- Master Outer
- Detail Outer
- Full Outer
Normal join discards all the rows of data from the master and detail source that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
Lookup Transformation
Lookup transformation is Passive and it can be both Connected and
UnConnected as well. It is used to look up data in a relational table,
view, or synonym. Lookup definition can be imported either from source
or from target tables.
For example, if we want to retrieve all the sales of a product with an ID 10 and assume that the sales data resides in another table. Here instead of using the sales table as one more source, use Lookup transformation to lookup the data for the product, with ID 10 in sales table.
Difference between Connected and UnConnected Lookup Transformation:
Connected lookup receives input values directly from mapping pipeline whereas UnConnected lookup receives values from: LKP expression from another transformation.
Connected lookup returns multiple columns from the same row whereas UnConnected lookup has one return port and returns one column from each row.
Connected lookup supports user-defined default values whereas UnConnected lookup does not support user defined values..
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
It is used mainly with COBOL sources where most of the time data is
stored in de-normalized format. Also, Normalizer transformation can be
used to create multiple rows from a single row of data.
Rank Transformation
Rank transformation is an Active and Connected transformation. It is
used to select the top or bottom rank of data. For example, to select
top 10 Regions where the sales volume was very high or to select 10
lowest priced products.
Router Transformation
Router is an Active and Connected transformation. It is similar to
filter transformation. The only difference is, filter transformation
drops the data that do not meet the condition whereas router has an
option to capture the data that do not meet the condition. It is useful
to test multiple conditions. It has input, output and default groups.
For example, if we want to filter data like where State=Michigan,
State=California, State=New York and all other States. It’s easy to
route data to different tables..
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected
transformation. It is used to create unique primary key values or cycle
through a sequential range of numbers or to replace missing keys.
It has two output ports to connect transformations. By default it has two fields CURRVAL and NEXTVAL(You cannot add ports to this transformation). NEXTVAL port generates a sequence of numbers by connecting it to a transformation or target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By value.
It has two output ports to connect transformations. By default it has two fields CURRVAL and NEXTVAL(You cannot add ports to this transformation). NEXTVAL port generates a sequence of numbers by connecting it to a transformation or target. CURRVAL is the NEXTVAL value plus one or NEXTVAL plus the Increment By value.
Stored Procedure Transformation
Stored Procedure transformation is a Passive and Connected &
UnConnected transformation. It is useful to automate time-consuming
tasks and it is also used in error handling, to drop and recreate
indexes and to determine the space in database, a specialized
calculation etc.
The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server. Stored Procedure is an executable script with SQL statements and control statements, user-defined variables and conditional statements. In case of stored procedure transformation procedure will be compiled and executed in a relational data source. You need data base connection to import the stored procedure in to your maping
Sorter Transformation
Sorter transformation is a Connected and an Active transformation. It allows to sort data either in ascending or descending order according to a specified field. Also used to configure for case-sensitive sorting, and specify whether the output rows should be distinct.
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected
transformation. When adding a relational or a flat file source
definition to a mapping, it is must to connect it to a Source Qualifier
transformation. The Source Qualifier performs the various tasks such as
overriding default SQL query, filtering records; join data from two or
more tables etc.
Update Strategy Transformation
Update strategy transformation is an Active and Connected
transformation. It is used to update data in target table, either to
maintain history of data or recent changes. You can specify how to treat
source rows in table, insert, update, delete or data driven.
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation. XML Source Qualifier is used only with an XML source definition. It represents the data elements that the Informatica Server reads when it executes a session with XML sources.
Union Transformation
The union transformation is used to merge multiple datasets from various streams or pipelines into one dataset. This transformation works similar to the UNION ALL, it does not remove any duplicate rows. It is recommended to use aggregator to remove duplicates are not expected at the target.
External Procedure Transformation
External Procedure transformation is an Active and Connected/UnConnected transformations. Sometimes, the standard transformations such as Expression transformation may not provide the functionality that you want. In such cases External procedure is useful to develop complex functions within a dynamic link library (DLL) or UNIX shared library, instead of creating the necessary Expression transformations in a mapping.
Differences between Advanced External Procedure and External Procedure Transformations:
External Procedure returns single value, where as Advanced External Procedure returns multiple values.
External Procedure supports COM and Informatica procedures where as AEP supports only Informatica Procedures.
Advanced External Procedure Transformation
Advanced External Procedure transformation is an Active and Connected transformation. It operates in conjunction with procedures, which are created outside of the Designer interface to extend PowerCenter/PowerMart functionality. It is useful in creating external transformation applications, such as sorting and aggregation, which require all input rows to be processed before emitting any output rows.
No comments:
Post a Comment