Tuesday, August 5, 2008

More Informatica Faq's


What is Informatica ?

If you want to set up a Data Warehouse - then you will love to have Informatica software. It will greatly simplify DW design, and numerous routine tasks related to data transformation and migration (ETL - Extract, Transform, and Load), day-to-day maintenance and management.

* www.informatica.com

Informatica has a simple visual interface. You do most of the work by simply dragging and dropping with your mouse in the Designer. This graphical approach makes it also very easy to understand what is going on (it is "self-documenting" in a sense).

Informatica can communicate with all major databases, can move/transform data between them. It can move huge volumes of data in a very effective way. It can throttle the transactions (do big updates in small chunks to avoid long locking and filling the transactional log). It can effectively do joins between tables in different databases on different servers. The tasks are performed by Informatica Server (Unix or MS Windows). You get a client application called "Server Manager" to work with the server.

You design your processes in a client application called "Designer". This is where you you tell what the source databases and tables will be, what will be the targets, and how you move/transform the data.

Informatica uses its own database called "Metadata Repository Database", or simply a Repository. Repository stores the data (rules) needed for data extraction, transformation, loading, and management. You get a client application "Repository Manager" to work with the repository.

Informatica comes in different packages:

* Informatica PowerCenter license - has all options, including distributed metadata, ability to organize repositories into a data mart domain and share metadata accross repositories.
* PowerMart PowerMart - a limited license (all features except distributed metadata and multiple registered servers).

The short overview below is based on PowerCenter v.1.7, and PowerMart 4.7 (this is dated at ~2000). Since then new versions were made, and all the version numbering was changed. The latest version of the PowerCenter is v.7 (end of 2003 - cost ~$200,000).

Other products by Informatica:

* Informatica PowerAnalyzer (web based tool for data analysis - Business Intelligence)
* Informatica SuperGlue - managing metadata (data about data). Directory of data (personalized), graphical representation of data quality and flow, flexible analysis and reporting (based on PowerAnalyzer) of overall data volumes, loading performance, etc.


Working with Informatica


Here are the pieces of the puzzle:

* source database(s), target database(s), repository metadata database
* Informatica server
* PC-based client software (Designer, Server Manager, Repository Manager)

Setting everything up is also straighforward. Once the server components are installed and configured, you install the client applications, configure ODBC, register the Informatica Server in the Server Manager. Create a Repository, create users and groups, edit users profiles. Add source and target definitions, set up mapping between the sources and targets, create a session for each mapping - and run sessions (resulting in writing data to targets).



Repository Manager




The Repository Manager allows you navigate through multiple repositories and folders inside the repositories. Navigating it is very similar to navigating standard MS Windows Explorer. You have expandable tree on the left (Navigator Window) - and list of details of the objects in the selected folder (Main Window).

click on the image to enlarge it.
Repository Manager


click on the image to enlarge it.
Repository Login

Folders main contain Nodes (subfolders) - Sessions, Batches, Sources, Targets, Transformations, Mapplets (reusable sets of transformations) and Mappings. They in turn may contain corresponding individual repository objects - sessions, batches, sources, targets, transformations, mapplets and mappings, as well as shortcuts, batches, and session logs.

Interface is simple and intuitive, For example, to see the properties of an object - right-click on it - and select Properties. To create a new repository - choose Repository-Create Repository (you have to run in admin mode to be able to do this). Etc. You can reorder the columns in the main window by dragging, and sort by any column (just click on the corresponding header). The set of columns in the main window is different for each kind of the node or object.

Note: PowerMart Repositories are standalone. PowerCenter repositories can be standalone, local, or global.

You can work with:

* Repositories - create, backup, copy, restore, upgrade, and delete repositories.
* Users & Groups - (choose Security menu) - Create, edit, and delete repository users and groups, assign and revoke repository privileges (on a group or user level), and folder permissions, view locks - and unlock objects, versions, and folders. Privelege types:
o Session Operator
o Use Designer
o Browse Repository
o Create Sessions and Batches
o Administer Repository
o Administer Server
o Super User
* Folders - (choose Folder menu) - create/edit/delete folders inside repositories, copy within a repository or between repositories. Folders can be shared or not shared.
* Reports - add/remove reports
* Import/export repository connection info
* Analyze source/target, mapping, shortcut dependencies.
* Search by keyword
* View properties of repository objects
* Customize the Repository Manager (add, edit, remove repositories in the Navigator, edit repository connection info, view/hide windows)

Below the Navigator and the Main Window you may see two more windows:

* Dependency window - to see dependencies for a selected object (source-target, mapping, shortcut dependencies).
* Output window - shows what is happening.


Designer Client Application



Designer consists of several tools (choose Tools menu):

* Source Analyzer - (choose Sources menu) to import or create source definitions for flat file, Cobol, ERP, and Relational Databases. Note - double-click on the title-bar opens a pop-up to edit definitions.
* Warehouse Designer - to import or create target definitions (choose Targets - Generate/Execute SQL, or Targets-Create to create manually).
* Transformation Developer - to create reusable transformations.
* Mapplet Designer - to create mapplets (reusable sets of transformations)
* Mapping Designer - to create mappings (m_somename).

Windows:

* Navigator - to connect and word with multiple repositories and folders, copy objects and create shortcuts.
* Workspace - to view/edit sources, targets, mapplets, transformations, and mappings.
* Output window and Status bar
* Overview (choose View-Overview) - optional window - to simplify wiewing workbooks containing large mappings or large number of objects.

Warehouse Designer - Import Tables:

Click on the image to enlarge it

Edit Table's Definitions:

Mapping Designer - creating mappings:

Click on the image to enlarge it
Mapping Designer


Click on the image to enlarge it.
Example of a mapping


Click on the image to enlarge it.
Overview window

Note: you can open several workspaces (workbooks) - choose Window - New Window, and then select appropriate tool.

To make a mapping:

* Switch to the Mapping Designer
* Choose Mapping-Create - and enter a new name (m_xxxx)
* Drag a source table from the navigator to the work space. Note, that the designer will also automatically create and show a Source Qualifier transformation (this is a temp. table created by Informatica Server).
* Drag a target table to the work space
* Drag one-by-one fields from source to target - thus creating graphical connections. Note - you can also delete connection by selecting it - and pressing DEL button.
* Choose Layout-Arrange

Note: Source has only Ouput ports, Source Qualifier has both input and output ports.

Here are some transformations:

* Advanced External Procedure - ...
* Aggregator - to do things like "group by".
* ERP Source Qualifier - ...
* Expression - to use various expressions.
* External procedure - ...
* Filter - to filter data.
* Joiner - to make joins between separate databases, file, ODBC sources.
* Lookup - to create local copy of the data.
* Normalizer - to transform denormalized data into normalized data.
* Rank - to select only top (or bottom) ranked data.
* Sequence Generator - to generate unique IDs for target tables.
* Source Qualifier - to filter sources (SQL, select distinct, join, etc.)
* Stored Procedure - to run stored procedures in the database - and capture their returned values.
* Update Strategy - to flag records in target for insert,delete, update (defined inside a mapping).

To create a transformation, simply click on the corresponding transformation icon on the transformations toolbar - and then click in the workspace between the tables. The Designer adds a new transformation.

Click on the image to enlarge it.

Chose Layout-Link Columns, drag needed fields from Source Qualifier to the Transformation, double-click on the title bar of the transformation to edit the transformation.

In the "Edit Transformations" dialog box you can check/uncheck necessary options (I/O ports, Group-By), add new ports as necessary, edt the expressions for each port (and validate them).

Click on the image to enlarge it.

You can click on the Expression field - and edit expression in the Expression Editor:

You can chain transformations. You can do joins between tables in different databases using "Lookup" transformation to create local copy of the data. You connect transformations by dragging with the mouse from port to port.


Server Manager



Sessions are sets of instructions for Informatica Server when and how to move data from sources to targets.

Server Manager - a client application used to create and manage sessions and batches, and to configure session connections. You can monitor multiple Informatica Servers, navigate through folders and repositories. Here is what you can do in Server Manager:

* Monitor, add, edit, delete Informatica Server info in the repository
* Stop the infomratica Server
* Configure database, external loader, and FTP connections
* Manage sessions and batches - create, edit, delete, copy/move within a folder, start/stop, abort sessions, view session logs, details, session performance details.

Windows:

* Navigator & Configure windows
* Monitor
* Output
* Status bar

Click on the image to enlarge it.

- Note:As usual you can dock/undock the windows by double-clicking the title bar and/or dragging.
- Note: Cancel button - appears at the bottom-left when the program communicates with the Informatica Server.
- Note: Server Manager can mark a session invalid if something is wrong. You can open session properties, edit, and try again.
- Note: you can create/customize toolbars.

To create a session in Server Manager:

* Select a Repository in the Navigator - and connect to it.
* Choose "Server Configuration - Database Conections" - and add connections to sources and target.
* Choose "Server Configuration - Register Server" - to connect to the server.
* In the Navigator - open a folder with mappings.
* Choose Operations - Add Sessions (or click on "Add Session" button) and select the mapping.
* You will get a session Wizard:







Monitoring and Running a Session:

* Select the Informatica Server in the navigator - and choose Server Configuration - Monitor - to toggle the monitor window. Then you can choose Server Requests - Start/Stop polling, or you can choose Server Requests - Session overview - to refresh the monitor.

Running the Session:

* Select the session wit the mouse - and choose Server Requests - Start (or click on the start button on the toolbar).

Organize sessions in a batch:

* To create a batch choose Operations - Add Batch (or click on the corresponding button on the toolbar).
* Once you created and opened the batch - you can add seesions into it by dragging them into the batch. You can reorder them inside the batch, or you can check the Concurrent option to run the sessions concurrently inside the batch.
* You start the batch the same way as a session - select it - and click the start button (or choose Server Requests - Start).

No comments: