Tuesday, August 5, 2008

Frequently Asked Questions in Data Warehousing - Concepts

Beginners
1. What is a Data Warehouse?
2. What is a DataMart?
3. What is Data Mining?
4. What do you mean by Dimension Attributes?
5. What is the difference between a data warehouse and a data mart?
6. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
7. What is a star schema?
8. What is meant by the grain of a star schema?
9. What is a snowflake schema?
10. What is a surrogate key?
11. What Oracle tools are available to design and build a data warehouse/data mart?
12. What is a Cube?
13. What does ETL stand for?
14. What is Aggregation?
15. What is Business Intelligence?
16. What is transitive dependency?
17. What is the current version of Informatica?
18. What are the tools in Informatica? Why do we use those tools?
19. What is a transformation?
20. What is a mapping?
21. What is a factless fact table?
22. What is a Schema?
23. What is a Context?
24. What is a Domain key?
Advanced
25. Who are the Data Stewards and what is their role?
26. What are the most important features of a data warehouse?
27. What is the easiest way to build a corporate-specific time dimension?
28. What is a Real-Time Data Warehouse - RTDW?
29. What is a Slowly Changing Dimension?
30. What is a Conformed Dimension?
31. What is TL9000?




What kinds of data belong in a data warehouse?
Data that comes from your mainframe or client/server computing systems, data that you use to manage your business, or any type of data that has value to your business. The idea behind the data warehouse is to capture all types of data into a central location. Once this is done you have the ability to link different types of data together and turn that data into valuable information that can be used for your business needs, analysis, discovery and planning.
Why would I want to access the data warehouse when I have a mainframe computing system?
Your computing system is set up to handle subject-specific, day-to-day business and transaction processing, such as payroll or course registration. The reports created in this type of system are specific to the subject matter. The benefits of putting your data into the data warehouse include:
 Merging subject specific data together to create information
 Standardizing data across the University
 Improving turnaround time for reporting
 Lowering costs because you can produce your own reports instead of costly, centrally printed and distributed mainframe reports
 Sharing data or allowing others to easily access your data will free staff from the tasks of extracting data and reporting for other departments or colleges
What's metadata? What's a data dictionary?
Metadata is data about data. Metadata gives you data element definitions, data layouts, and information about each data element's location. A data element is the smallest unit of data that can be described, for example the zip code field within an address database record. The University's data warehouse refers to its metadata as data dictionaries. You can access the data dictionaries on the IDEA web page. Click on the Information button, then click on the Data Element Dictionary for the database of your choice.

Informatica FAQs

7. What is a mapplet? A mapplet is a set of transformations that you build in the Mapplet Designer and can reuse in multiple mappings.

8. What is a transformation? It is a repository object that generates, modifies, or passes data.

How can you improve session performance in an Aggregator transformation? Use sorted input.
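
To illustrate why sorted input helps an aggregation, here is a minimal Python sketch. It is not Informatica code, and the row layout and function names are made up for illustration. With unsorted input, a running total must be held for every group until the last row is read; with input already sorted on the group key, each group can be emitted as soon as the key changes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of (region, amount), already sorted by region.
rows = [("east", 10), ("east", 5), ("west", 7), ("west", 3)]

def aggregate_unsorted(rows):
    """Keep a running total for every group until all rows have been read."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def aggregate_sorted(rows):
    """Emit each group's total when the key changes; one group is held at a time."""
    for region, group in groupby(rows, key=itemgetter(0)):
        yield region, sum(amount for _, amount in group)
```

Both functions give the same totals; the sorted version simply needs far less state, which is the intuition behind the sorted input option.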


29. What are the types of lookup? Connected and unconnected.

35. How does the Informatica Server sort string values in a Rank transformation?

When the Informatica Server runs in ASCII data movement mode, it sorts session data using a binary sort order. If you configure the session to use a binary sort order, the Informatica Server calculates the binary value of each string and returns the specified number of rows with the highest binary values for the string.
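
As a rough illustration of a binary sort order, the Python sketch below compares strings by their byte values and keeps the top n. The function name and sample data are assumptions for illustration only; this is not how the Informatica Server is implemented.

```python
def top_n_binary(values, n):
    """Return the n strings with the highest binary (byte-wise) values."""
    return sorted(values, key=lambda s: s.encode("ascii"), reverse=True)[:n]

names = ["apple", "Banana", "cherry"]
top_two = top_n_binary(names, 2)
```

Note that in a byte-wise comparison every uppercase ASCII letter sorts below every lowercase letter, so "Banana" ranks lower than "apple" here.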



36. What are the rank caches?

During the session, the Informatica Server compares an input row with rows in the data cache. If the input row outranks a stored row, the Informatica Server replaces the stored row with the input row. The Informatica Server stores group information in an index cache and row data in a data cache.





37. What is the RANKINDEX in a Rank transformation?

The Designer automatically creates a RANKINDEX port for each Rank transformation. The Informatica Server uses the RANKINDEX port to store the ranking position for each record in a group. For example, if you create a Rank transformation that ranks the top 5 salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
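
The idea of a rank index per group can be sketched in Python as follows. This is only an illustration under assumed names (`rank_top_n`, the `quarter`/`name`/`sales` fields); it does not reproduce the Rank transformation's caching behaviour.

```python
from collections import defaultdict

def rank_top_n(rows, group_key, rank_key, n):
    """Keep the top n rows per group and number them 1..n in a RANKINDEX field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    ranked = []
    for group in sorted(groups):
        top = sorted(groups[group], key=lambda r: r[rank_key], reverse=True)[:n]
        for index, row in enumerate(top, start=1):
            ranked.append({**row, "RANKINDEX": index})
    return ranked

sales = [
    {"quarter": "Q1", "name": "ann", "sales": 300},
    {"quarter": "Q1", "name": "bob", "sales": 500},
    {"quarter": "Q1", "name": "cy", "sales": 100},
    {"quarter": "Q2", "name": "ann", "sales": 200},
    {"quarter": "Q2", "name": "bob", "sales": 50},
]
top_sellers = rank_top_n(sales, "quarter", "sales", 2)
```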





38. What is the Router transformation?





A Router transformation is similar to a Filter transformation because both transformations allow you to use a condition to test data. However, a Filter transformation tests data for one condition and drops the rows of data that do not meet the condition. A Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.


If you need to test the same input data based on multiple conditions, use a Router transformation in a mapping instead of creating multiple Filter transformations to perform the same task.
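
The difference between the two can be sketched in Python. The function and group names below are hypothetical, and the sketch assumes that a row meeting several conditions is passed to each matching group.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]

def route_rows(rows, groups):
    """Router: several named conditions; rows meeting none go to DEFAULT."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed

orders = [{"amount": 5}, {"amount": 50}, {"amount": 500}]
routed = route_rows(orders, {
    "small": lambda r: r["amount"] < 10,
    "large": lambda r: r["amount"] >= 100,
})
```

A single pass over the input feeds every group, which is why one Router is preferable to several Filters reading the same data.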





39. What are the types of groups in a Router transformation? Input group and output group.

The Designer copies property information from the input ports of the input group to create a set of output ports for each output group.

There are two types of output groups:
User-defined groups
Default group

You cannot modify or delete the default group.





40. Why do we use the Stored Procedure transformation? For populating and maintaining databases.





42. What are the types of data that pass between the Informatica Server and a stored procedure?

Three types of data:
Input/output parameters
Return values
Status code


43. What is the status code?

The status code provides error handling for the Informatica Server during the session. The stored procedure issues a status code that indicates whether or not the stored procedure completed successfully. This value cannot be seen by the user. It is used only by the Informatica Server to determine whether to continue running the session or stop.





44. What is the Source Qualifier transformation?

When you add a relational or flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier transformation represents the records that the Informatica Server reads when it runs a session.





45. What are the tasks that the Source Qualifier performs?

Join data originating from the same source database
Filter records when the Informatica Server reads source data
Specify an outer join rather than the default inner join
Specify sorted records
Select only distinct values from the source
Create a custom query to issue a special SELECT statement for the Informatica Server to read source data





46. What is the target load order?

You specify the target load order based on the source qualifiers in a mapping. If you have multiple source qualifiers connected to multiple targets, you can designate the order in which the Informatica Server loads data into the targets.





47. What is the default join that the Source Qualifier provides?

Inner equijoin





48. What are the basic requirements for joining two sources in a Source Qualifier?

The two sources should have a primary key and foreign key relationship
The two sources should have matching data types





49. What is the Update Strategy transformation?

This transformation is used to maintain either the history data or just the most recent changes in the target table.





50. Describe the two levels at which the update strategy is set.





Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.





Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.





51. What is the default source option for the Update Strategy transformation?

Data driven





52. What is data driven?

The Informatica Server follows the instructions coded into the Update Strategy transformations within the session mapping to determine how to flag records for insert, update, delete, or reject.

If you do not choose the data driven option, the Informatica Server ignores all Update Strategy transformations in the mapping.
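
The following Python sketch illustrates the difference between data driven processing and treating every record as an insert. The constant names echo the DD_ flags used by the Update Strategy transformation, but the numeric values, the `flag_row` rule, and the dictionary-based target are all assumptions made for this illustration.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3  # illustrative values

def flag_row(row, target):
    """Hypothetical mapping logic that decides how each row is flagged."""
    if row["id"] is None:
        return DD_REJECT
    if row.get("deleted"):
        return DD_DELETE
    return DD_UPDATE if row["id"] in target else DD_INSERT

def run_session(rows, target, data_driven=True):
    """Apply rows to the target; without data_driven, flags are ignored."""
    rejected = []
    for row in rows:
        flag = flag_row(row, target) if data_driven else DD_INSERT
        if flag == DD_REJECT:
            rejected.append(row)
        elif flag == DD_DELETE:
            target.pop(row["id"], None)
        else:  # insert and update both write the row in this dict-based target
            target[row["id"]] = row
    return rejected
```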





53. What are the options in the target session of the Update Strategy transformation?

Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table





54. What are the types of mapping wizards provided in Informatica?





The Designer provides two mapping wizards to help you create mappings quickly and easily. Both wizards are designed to create mappings for loading and maintaining star schemas, a series of dimensions related to a central fact table.





Getting Started Wizard. Creates mappings to load static fact and dimension tables, as well as slowly growing dimension tables.


Slowly Changing Dimensions Wizard. Creates mappings to load slowly changing dimension tables based on the amount of historical dimension data you want to keep and the method you choose to handle historical dimension data.





55. What are the types of mapping in the Getting Started Wizard?

Simple Pass Through mapping:





Loads a static fact or dimension table by inserting all rows. Use this mapping when you want to drop all existing data from your table before loading new data.





Slowly Growing Target mapping:





Loads a slowly growing fact or dimension table by inserting new rows. Use this mapping to load new data when existing data does not require updates.





56. What are the mappings that we use for slowly changing dimension tables?

Type 1: Rows containing changes to existing dimensions are updated in the target by overwriting the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.


Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do not need to keep any previous versions of dimensions in the table.





Type 2: The Type 2 Dimension Data mapping inserts both new and changed dimensions into the target. Changes are tracked in the target table by versioning the primary key and creating a version number for each dimension in the table.


Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension table when you want to keep a full history of dimension data in the table. Version numbers and versioned primary keys track the order of changes to each dimension.





Type 3: The Type 3 Dimension mapping filters source rows based on user-defined comparisons and inserts only those found to be new dimensions to the target. Rows containing changes to existing dimensions are updated in the target. When updating an existing dimension, the Informatica Server saves existing data in different columns of the same row and replaces the existing data with the updates.
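
A minimal Python sketch of the Type 1 and Type 3 behaviours described above, assuming a dimension held as a dictionary keyed by a natural key. The function names and the `prev_` column prefix are hypothetical.

```python
def apply_type1(target, key, new_attrs):
    """Type 1: overwrite the existing dimension row; no history is kept."""
    target[key] = dict(new_attrs)

def apply_type3(target, key, column, new_value):
    """Type 3: keep the previous value in another column of the same row."""
    row = target[key]
    row["prev_" + column] = row[column]
    row[column] = new_value

customers = {"C1": {"city": "Pune"}}
apply_type1(customers, "C1", {"city": "Delhi"})    # Pune is lost
apply_type3(customers, "C1", "city", "Mumbai")     # Delhi is kept as prev_city
```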





57. What are the different types of Type 2 dimension mapping?

Type 2 Dimension/Version Data mapping: In this mapping, an updated dimension in the source is inserted into the target along with a new version number, and a newly added dimension in the source is inserted into the target with a primary key.

Type 2 Dimension/Flag Current mapping: This mapping is also used for slowly changing dimensions. In addition, it creates a flag value for each changed or new dimension. The flag indicates whether a row is the current version of the dimension. The most recent version is saved with a current flag value of 1, and the versions it replaces are saved with the value 0.

Type 2 Dimension/Effective Date Range mapping: This is another flavor of the Type 2 mapping used for slowly changing dimensions. This mapping also inserts both new and changed dimensions into the target, and changes are tracked by the effective date range for each version of each dimension.
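
The three Type 2 tracking methods can be shown side by side in one Python sketch. Real mappings use one method at a time; combining them here, along with the field names and the in-memory list, is an assumption made purely for illustration.

```python
from datetime import date

def apply_type2(history, natural_key, attrs, today):
    """Insert a new version of a dimension and retire the current one."""
    current = [r for r in history
               if r["natural_key"] == natural_key and r["current_flag"] == 1]
    version = 0
    if current:
        old = current[0]
        if old["attrs"] == attrs:
            return  # nothing changed, so no new version is needed
        old["current_flag"] = 0      # flag method: the old row is no longer current
        old["end_date"] = today      # date range method: close the old range
        version = old["version"] + 1 # version method: next version number
    history.append({
        "natural_key": natural_key, "attrs": dict(attrs), "version": version,
        "current_flag": 1, "begin_date": today, "end_date": None,
    })

history = []
apply_type2(history, "C1", {"city": "Pune"}, date(2008, 1, 1))
apply_type2(history, "C1", {"city": "Delhi"}, date(2008, 8, 5))
```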





58. How can you recognize whether or not newly added rows in the source are inserted into the target?

In the Type 2 mapping we have three options for recognizing newly added rows:
Version number
Flag value
Effective date range





59. What are the two types of processes that Informatica uses to run a session?

The Load Manager process. Starts the session, creates the DTM process, and sends post-session email when the session completes.
The DTM process. Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.





60. What are the new features of the Server Manager in Informatica 5.0?

You can use command line arguments for a session or batch. This allows you to change the values of session parameters, mapping parameters, and mapping variables.

Parallel data processing: This feature is available for PowerCenter only. If you run the Informatica Server on an SMP system, you can use multiple CPUs to process a session concurrently.

Process session data using threads: The Informatica Server runs the session in two processes, as explained in the previous question.





61. Can you generate reports in Informatica?

Yes. By using the Metadata Reporter we can generate reports in Informatica.





62. What is the Metadata Reporter?

It is a web-based application that enables you to run reports against repository metadata. With the Metadata Reporter, you can access information about your repository without knowledge of SQL, the transformation language, or the underlying tables in the repository.





63. Define mapping and session.

Mapping: A set of source and target definitions linked by transformation objects that define the rules for transformation.
Session: A set of instructions that describe how and when to move data from sources to targets.





64. Which tool do you use to create and manage sessions and batches and to monitor and stop the Informatica Server?

The Informatica Server Manager





65. Why do we partition a session in Informatica?

Partitioning improves session performance by reducing the time needed to read the source and load the data into the target.





66. What tasks are necessary to partition a session?

Configure the session to partition source data.
Install the Informatica Server on a machine with multiple CPUs.





67. How does the Informatica Server increase session performance through partitioning the source?

For relational sources, the Informatica Server creates a connection for each partition of a single source and extracts a separate range of data over each connection. The Informatica Server reads multiple partitions of a single source concurrently. Similarly, for loading, the Informatica Server creates multiple connections to the target and loads partitions of data concurrently.

For XML and file sources, the Informatica Server reads multiple files concurrently. For loading the data, the Informatica Server creates a separate file for each partition (of a source file). You can choose to merge the targets.
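
The idea of reading separate key ranges concurrently can be sketched with Python's standard thread pool. The in-memory `SOURCE` list stands in for a source table, and the function names are hypothetical; a real session would open one database connection per partition.

```python
from concurrent.futures import ThreadPoolExecutor

SOURCE = list(range(1, 101))  # stand-in for a source table keyed 1..100

def read_partition(bounds):
    """Read one key range, as one connection would for one partition."""
    low, high = bounds
    return [key for key in SOURCE if low <= key <= high]

def read_partitioned(ranges):
    """Read every partition concurrently and return results in range order."""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        return list(pool.map(read_partition, ranges))

partitions = read_partitioned([(1, 50), (51, 100)])
```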





68. Why do you use repository connectivity?

Each time you edit or schedule a session, the Informatica Server communicates directly with the repository to check whether or not the session and users are valid. All the metadata of sessions and mappings is stored in the repository.





69. What tasks does the Load Manager process perform?

Manages session and batch scheduling: When you start the Informatica Server, the Load Manager launches and queries the repository for a list of sessions configured to run on the Informatica Server. When you configure a session, the Load Manager maintains a list of sessions and session start times. When you start a session, the Load Manager fetches the session information from the repository to perform the validations and verifications prior to starting the DTM process.

Locking and reading the session: When the Informatica Server starts a session, the Load Manager locks the session in the repository. Locking prevents you from starting the same session again while it is running.

Reading the parameter file: If the session uses a parameter file, the Load Manager reads the parameter file and verifies that the session-level parameters are declared in the file.

Verifying permissions and privileges: When the session starts, the Load Manager checks whether or not the user has the privileges to run the session.

Creating log files: The Load Manager creates a log file that contains the status of the session.





70. What is the DTM process?

After the Load Manager performs validations for the session, it creates the DTM process. The DTM process creates and manages the threads that carry out the session tasks. It creates the master thread, and the master thread creates and manages all the other threads.





71. What are the different threads in the DTM process?

Master thread: Creates and manages all other threads.

Mapping thread: One mapping thread is created for each session. It fetches session and mapping information.

Pre- and post-session threads: These are created to perform pre- and post-session operations.

Reader thread: One thread is created for each partition of a source. It reads data from the source.

Writer thread: It is created to load data to the target.

Transformation thread: It is created to transform data.
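
The division of work between reader, transformation, and writer threads can be pictured with a small Python pipeline. This is only an analogy built on the standard `queue` and `threading` modules; the names are hypothetical and the real DTM process is far more involved.

```python
import queue
import threading

def run_pipeline(source_rows, transform):
    """Move rows from a reader thread through a transformer to a writer."""
    to_transform, to_write, target = queue.Queue(), queue.Queue(), []
    done = object()  # sentinel marking the end of the data

    def reader():
        for row in source_rows:
            to_transform.put(row)
        to_transform.put(done)

    def transformer():
        while (row := to_transform.get()) is not done:
            to_write.put(transform(row))
        to_write.put(done)

    def writer():
        while (row := to_write.get()) is not done:
            target.append(row)

    # The calling thread plays the part of the master thread.
    threads = [threading.Thread(target=f) for f in (reader, transformer, writer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return target
```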





72. What are the data movement modes in Informatica?

The data movement mode determines how the Informatica Server handles character data. You choose the data movement mode in the Informatica Server configuration settings. Two data movement modes are available in Informatica:

ASCII mode
Unicode mode





73. What are the output files that the Informatica Server creates while a session runs?

Informatica Server log: The Informatica Server (on UNIX) creates a log for all status and error messages (default name: pm.server.log). It also creates an error log for error messages. These files are created in the Informatica home directory.

Session log file: The Informatica Server creates a session log file for each session. It writes information about the session into the log file, such as the initialization process, creation of SQL commands for reader and writer threads, errors encountered, and the load summary. The amount of detail in the session log file depends on the tracing level that you set.

Session detail file: This file contains load statistics for each target in the mapping. Session details include information such as the table name and the number of rows written or rejected. You can view this file by double-clicking the session in the monitor window.

Performance detail file: This file contains information known as session performance details, which helps you see where performance can be improved. To generate this file, select the performance detail option in the session property sheet.





Reject file: This file contains the rows of data that the writer does not write to targets.

Control file: The Informatica Server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.

Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.

Indicator file: If you use a flat file as a target, you can configure the Informatica Server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.

Output file: If the session writes to a target file, the Informatica Server creates the target file based on the file properties entered in the session property sheet.





Cache files: When the Informatica Server creates a memory cache, it also creates cache files. The Informatica Server creates index and data cache files for the following transformations:

Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation





74. In which circumstances does the Informatica Server create reject files?

When it encounters DD_REJECT in an Update Strategy transformation
When a row violates a database constraint
When a field in the row was truncated or overflowed





75. What is polling?

Polling displays updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the Informatica Server.





76. Can you copy a session to a different folder or repository?

Yes. By using the Copy Session Wizard you can copy a session to a different folder or repository, but the target folder or repository must contain the mapping for that session. If the target folder or repository does not have the mapping, you have to copy the mapping first, before you copy the session.



77. What is a batch? Describe the types of batches.

A grouping of sessions is known as a batch. There are two types of batches:
Sequential: Runs sessions one after the other.
Concurrent: Runs sessions at the same time.

If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use a concurrent batch, which runs all the sessions at the same time.
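
The two batch types can be sketched in Python, treating each session as a plain callable. The function names are hypothetical and the thread pool is only a stand-in for the way the server starts sessions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_sequential(sessions):
    """Sequential batch: each session starts after the previous one ends."""
    return [session() for session in sessions]

def run_concurrent(sessions):
    """Concurrent batch: all sessions are started at the same time."""
    with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
        futures = [pool.submit(session) for session in sessions]
        return [future.result() for future in futures]
```

Sessions that depend on each other's targets belong in `run_sequential`; independent sessions can go through `run_concurrent`.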





78. Can you copy batches? No.

79. How many sessions can you create in a batch? Any number of sessions.





80. When does the Informatica Server mark a batch as failed?

If one of the sessions is configured to run only if the previous session completes, and that previous session fails.





81. What command is used to run a batch? pmcmd is used to start a batch.





82. What are the different options used to configure sequential batches? Two options:

Run the session only if the previous session completes successfully.
Always run the session.





83. In a sequential batch, can you run a session if the previous session fails?

Yes, by setting the option to always run the session.





84. Can you start a batch within a batch?

You cannot. If you want to start a batch that resides in a batch, create a new independent batch and copy the necessary sessions into the new batch.





85. Can you start a session inside a batch individually?

We can start an individual session only in a sequential batch. In a concurrent batch we cannot do this.





86. How can you stop a batch? By using the Server Manager or pmcmd.





87. What are session parameters?

Session parameters, like mapping parameters, represent values you might want to change between sessions, such as database connections or source files.

The Server Manager also allows you to create user-defined session parameters. The following are user-defined session parameters:

Database connections
Source file name: Use this parameter when you want to change the name or location of the session source file between session runs.
Target file name: Use this parameter when you want to change the name or location of the session target file between session runs.
Reject file name: Use this parameter when you want to change the name or location of the session reject files between session runs.





88. What is a parameter file?

A parameter file defines the values for parameters and variables used in a session. It is a plain file created with a text editor such as WordPad or Notepad.

You can define the following values in a parameter file:
Mapping parameters
Mapping variables
Session parameters
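
To make the idea concrete, the Python sketch below reads a small parameter-style text file into a dictionary. The layout shown (a bracketed heading followed by name=value lines) and every name in it are assumptions for illustration; check the product documentation for the exact format your version expects.

```python
# Hypothetical parameter file contents; names and layout are illustrative only.
SAMPLE = """
[sales_folder.load_orders]
$$LoadDate=2008-08-05
$SourceFile=/data/in/orders.csv
"""

def parse_parameter_file(text):
    """Return {heading: {name: value}} for a simple name=value text file."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections

params = parse_parameter_file(SAMPLE)
```

Keeping such values in a file means a session can be pointed at a different source file or load date without editing the mapping.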





89. How can you access a remote source in your session?

Relational source: To access a relational source that is situated in a remote place, you need to configure a database connection to the data source.

File source: To access a remote source file, you must configure the FTP connection to the host machine before you create the session.

Heterogeneous: When your mapping contains more than one source type, the Server Manager creates a heterogeneous session that displays source options for all types.








90. What is the difference between partitioning relational targets and partitioning file targets?

If you partition a session with a relational target, the Informatica Server creates multiple connections to the target database to write target data concurrently. If you partition a session with a file target, the Informatica Server creates one target file for each partition. You can configure session properties to merge these target files.





91. What are the transformations that restrict the partitioning of sessions?

Advanced External Procedure transformation and External Procedure transformation: These transformations contain a check box on the Properties tab to allow partitioning.

Aggregator transformation: If you use sorted ports, you cannot partition the associated source.

Joiner transformation: You cannot partition the master source for a Joiner transformation.

Normalizer transformation

XML targets





92. Performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so that sessions run during the available load window for the Informatica Server. You can increase session performance in the following ways.





The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid them where you can.

Flat files: If your flat files are stored on a machine other than the Informatica Server machine, move those files to the machine that hosts the Informatica Server.
Relational data sources: Minimize the connections between sources, targets, and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.
Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.





You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes 2 bytes to store a character.





If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.





We can improve session performance by configuring the network packet size, which controls how much data can cross the network at one time. To do this, go to the Server Manager and choose Server Configure, Database Connections.





If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.
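
The drop, load, rebuild pattern can be sketched with Python's built-in `sqlite3` module. The table and index names are hypothetical, and SQLite is used only because it needs no setup; on a real target database you would issue the equivalent statements, for example from pre- and post-session commands.

```python
import sqlite3

def bulk_load(conn, rows):
    """Drop the index, load all rows, then rebuild the index once."""
    conn.execute("DROP INDEX IF EXISTS idx_sales_region")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
bulk_load(conn, [("east", 10), ("west", 7)])
```

Rebuilding the index once after the load avoids updating it for every inserted row.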





Running parallel sessions by using concurrent batches also reduces the time needed to load the data, so concurrent batches may also increase session performance.





Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.





If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.
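
Why a cache helps can be seen in this Python sketch, which compares one query per input row with a single query whose result is held in memory. The `region` table and function names are hypothetical, and `sqlite3` stands in for the lookup source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO region VALUES (?, ?)", [("E", "East"), ("W", "West")])

def lookup_uncached(conn, codes):
    """Query the lookup table once for every input row."""
    names = []
    for code in codes:
        found = conn.execute(
            "SELECT name FROM region WHERE code = ?", (code,)).fetchone()
        names.append(found[0] if found else None)
    return names

def lookup_cached(conn, codes):
    """Read the lookup table once, then answer every row from memory."""
    cache = dict(conn.execute("SELECT code, name FROM region"))
    return [cache.get(code) for code in codes]
```

Both functions return the same values; the cached version trades memory for far fewer round trips to the database.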





If your session contains a Filter transformation, place that Filter transformation as close to the sources as possible, or use a filter condition in the Source Qualifier.

Aggregator, Rank, and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.





92. What is the difference between a mapplet and a reusable transformation?

A mapplet consists of a set of transformations that is reusable. A reusable transformation is a single transformation that can be reused.

Variables or parameters created in a mapplet cannot be used in another mapping or mapplet, whereas variables created in a reusable transformation can be used in any other mapping or mapplet.

We cannot include source definitions in reusable transformations, but we can add sources to a mapplet.

The whole transformation logic is hidden in the case of a mapplet, but it is transparent in the case of a reusable transformation.

We cannot use COBOL Source Qualifier, Joiner, or Normalizer transformations in a mapplet, whereas we can make them reusable transformations.





93. Define the Informatica repository.





The Informatica repository is a relational database that stores information, or metadata, used by the Informatica Server and Client tools. Metadata can include information such as mappings describing how to transform source data, sessions indicating when you want the Informatica Server to perform the transformations, and connect strings for sources and targets.





The repository also stores administrative information such as usernames and passwords, permissions and privileges, and product version.





Use the Repository Manager to create the repository. The Repository Manager connects to the repository database and runs the code needed to create the repository tables. These tables store metadata in a specific format that the Informatica Server and client tools use.





94. What are the types of metadata stored in the repository?

The following types of metadata are stored in the repository:





Database connections


Global objects


Mappings


Mapplets


Multidimensional metadata


Reusable transformations


Sessions and batches


Short cuts


Source definitions


Target definitions


Transformations





95. What is the PowerCenter repository?





The PowerCenter repository allows you to share metadata across repositories to create a data mart domain. In a data mart domain, you can create a single global repository to store metadata used across an enterprise, and a number of local repositories to share the global metadata as needed





96. How can you work with a remote database in Informatica? Do you work directly by using remote connections?





To work with a remote data source you need to connect to it with a remote connection. However, it is not preferable to work with the remote source directly through remote connections. Instead, bring the source onto the local machine where the Informatica Server resides. If you work directly with a remote source, session performance decreases because less data can be passed across the network in a given time.





97. What are the new features in Informatica 5.0?





You can debug your mapping in the Mapping Designer.


You can view the workspace over the entire screen.


The Designer displays a new icon for invalid mappings in the Navigator window.


You can use a dynamic lookup cache in a Lookup transformation.


You can create mapping parameters or mapping variables in a mapping or mapplet to make mappings more flexible.


You can export objects from the repository and import objects into the repository. When you export a repository object, the Designer or Server Manager creates an XML file to describe the repository metadata.


The Designer allows you to use the Router transformation to test data for multiple conditions. The Router transformation lets you route groups of data to a transformation or target.


You can use XML data as a source or target.
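
As a rough illustration of what the Router transformation in the list above does, the following Python sketch tests each row against several group conditions. This is not Informatica code: the function `route_rows`, the group names, and the sample rows are all hypothetical, and sending unmatched rows to a default group is an assumption about how a router typically behaves.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def route_rows(rows: List[Row], groups: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Test each row against every group condition.

    A row is passed to every group whose condition it satisfies; rows that
    satisfy no condition go to the "DEFAULT" group (an assumption here).
    """
    routed: Dict[str, List[Row]] = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed


# Hypothetical sample data and group names, for illustration only.
orders = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 500},
    {"id": 3, "amount": -10},
]
result = route_rows(
    orders,
    {
        "SMALL": lambda r: 0 <= r["amount"] < 100,
        "LARGE": lambda r: r["amount"] >= 100,
    },
)
```

Each entry in `result` could then feed a different downstream transformation or target.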





Server Enhancements:





You can use the command line program pmcmd to specify a parameter file to run sessions or batches. This allows you to change the values of session parameters, and mapping parameters and variables, at runtime.





If you run the Informatica Server on a symmetric multi-processing system, you can use multiple CPUs to process a session concurrently. You configure partitions in the session properties based on source qualifiers. The Informatica Server reads, transforms, and writes partitions of data in parallel for a single session. This is available for PowerCenter only.





The Informatica Server creates two processes, the Load Manager process and the DTM process, to run sessions.





Metadata Reporter: a web-based application used to run reports against repository metadata.





You can copy a session across folders and repositories using the Copy Session Wizard in the Informatica Server Manager.





With new email variables, you can configure post-session email to include information, such as the mapping used during the session





98. What is incremental aggregation?





When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the Informatica Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time you run the session.
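
The idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how the Informatica Server stores its aggregate data: the function `apply_changes`, the region keys, and the amounts are hypothetical, and the aggregate is assumed to be a simple sum per group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def apply_changes(totals: Dict[str, float], changes: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Fold only the newly captured rows into the saved aggregate totals."""
    updated = defaultdict(float, totals)
    for key, amount in changes:
        updated[key] += amount
    return dict(updated)


# First run: the whole source is aggregated once.
saved = apply_changes({}, [("east", 100.0), ("west", 40.0), ("east", 25.0)])

# Later run: only the rows captured since the previous run are processed,
# instead of re-reading the entire source.
saved = apply_changes(saved, [("west", 10.0), ("north", 5.0)])
```

The second call touches two rows rather than five, which is the saving incremental aggregation aims for when the source changes only a little between runs.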





99. What are the scheduling options to run a session?





You can schedule a session to run at a given time or interval, or you can run the session manually.





The scheduling options are:





Run only on demand: the Informatica Server runs the session only when a user starts it explicitly.


Run once: the Informatica Server runs the session only once, at a specified date and time.


Run every: the Informatica Server runs the session at regular intervals, as configured.


Customized repeat: the Informatica Server runs the session at the dates and times specified in the Repeat dialog box.
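
To make the difference between the four options concrete, here is a small Python sketch that works out the next start time for each one. It only models the descriptions above: the function `next_run`, its option names, and the sample dates are hypothetical and do not correspond to any Informatica setting or API.

```python
import math
from datetime import datetime, timedelta
from typing import List, Optional


def next_run(
    option: str,
    now: datetime,
    start: Optional[datetime] = None,
    interval: Optional[timedelta] = None,
    custom_times: Optional[List[datetime]] = None,
) -> Optional[datetime]:
    """Return the next scheduled start time, or None if nothing is scheduled."""
    if option == "on_demand":
        # Runs only when a user starts the session explicitly.
        return None
    if option == "run_once":
        # A single run at the specified date and time, if it has not passed.
        return start if start is not None and start >= now else None
    if option == "run_every":
        if start is None or interval is None or interval <= timedelta(0):
            raise ValueError("run_every needs a start time and a positive interval")
        if now <= start:
            return start
        periods = math.ceil((now - start) / interval)
        return start + periods * interval
    if option == "custom":
        # The earliest explicitly listed date and time that has not passed.
        upcoming = [t for t in (custom_times or []) if t >= now]
        return min(upcoming) if upcoming else None
    raise ValueError(f"unknown scheduling option: {option}")


start = datetime(2008, 8, 5, 0, 0)
now = datetime(2008, 8, 5, 7, 30)
every_six_hours = next_run("run_every", now, start=start, interval=timedelta(hours=6))
```

With a start at midnight and a six-hour interval, a check at 07:30 lands on the 12:00 run, while the on-demand option never schedules anything by itself.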





100. What is tracing level and what are the types of tracing level?





Tracing level represents the amount of information that the Informatica Server writes to a log file.


Types of tracing level:


Normal


Verbose


Verbose init


Verbose data





101. What is difference between stored procedure transformation and external procedure transformation?





In a Stored Procedure transformation, the procedure is compiled and executed in a relational data source. You need a database connection to import the stored procedure into your mapping. In an External Procedure transformation, the procedure or function is executed outside the data source; that is, you need to build it as a DLL to access it in your mapping. No database connection is needed for an External Procedure transformation.





102. How do you recover sessions?





If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration.


Use one of the following methods to complete the session:


· Run the session again if the Informatica Server has not issued a commit.


· Truncate the target tables and run the session again if the session is not recoverable


· Consider performing recovery if the Informatica Server has issued at least one commit
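
The three choices above amount to a small decision rule, which can be written out as a Python sketch. The function `completion_method` and its two boolean inputs are hypothetical simplifications; in practice the decision also depends on the mapping, session, and server configuration mentioned earlier.

```python
def completion_method(commit_issued: bool, recoverable: bool) -> str:
    """Pick how to complete a stopped or failed session."""
    if not commit_issued:
        # Nothing reached the target, so a plain rerun is safe.
        return "run the session again"
    if not recoverable:
        # Partial data was committed and cannot be resumed cleanly.
        return "truncate the target tables and run the session again"
    # At least one commit was issued and the session can be resumed.
    return "perform recovery"
```

For example, `completion_method(commit_issued=True, recoverable=True)` returns "perform recovery".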





103. If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record the next time you run the session?





As explained above, the Informatica Server has three methods for completing a failed session. Use Perform Recovery to load the records from the point where the session failed.





104. What is Perform Recovery?





When the Informatica Server starts a recovery session, it reads the OPB_SRVR_RECOVERY table and notes the row ID of the last row committed to the target database. The Informatica Server then reads all sources again and starts processing from the next row ID. For example, if the Informatica Server commits 10,000 rows before the session fails, when you run recovery, the Informatica Server bypasses the rows up to 10,000 and starts loading with row 10,001.


By default, Perform Recovery is disabled in the Informatica Server setup. You must enable Recovery in the Informatica Server setup before you run a session so the Informatica Server can create and/or write entries in the OPB_SRVR_RECOVERY table.
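
The row-skipping behaviour described above can be sketched in Python. This is a simplified model, not the server's implementation: the function `rows_to_load` is hypothetical, the OPB_SRVR_RECOVERY lookup is replaced by a plain `last_committed_row` argument, and the source is assumed to return rows in the same order on every read.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator


def rows_to_load(source_rows: Iterable[Dict[str, int]], last_committed_row: int) -> Iterator[Dict[str, int]]:
    """Read the source from the start again, bypassing rows already committed."""
    return islice(source_rows, last_committed_row, None)


# Hypothetical source of 10,005 rows; the first 10,000 were committed
# before the session failed.
source = ({"row_id": i} for i in range(1, 10_006))
remaining = list(rows_to_load(source, last_committed_row=10_000))
```

Here `remaining` starts at row 10,001. The sketch also shows why a stable row order matters: if the source returned rows in a different order on the second read, skipping by position would drop the wrong rows.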





105. How to recover the standalone session?





A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.





To recover sessions using the menu:


1. In the Server Manager, highlight the session you want to recover.


2. Select Server Requests-Stop from the menu.


3. With the failed session highlighted, select Server Requests-Start Session in Recovery Mode from the menu.





To recover sessions using pmcmd:


1. From the command line, stop the session.


2. From the command line, start recovery.





106. How can you recover a session in sequential batches?





If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property





To recover sessions in sequential batches configured to stop on failure:





1. In the Server Manager, open the session property sheet.


2. On the Log Files tab, select Perform Recovery, and click OK.


3. Run the session.


4. After the batch completes, open the session property sheet.


5. Clear Perform Recovery, and click OK.





If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous session


If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.



107. How to recover sessions in concurrent batches?





If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session.


To recover a session in a concurrent batch:


1. Copy the failed session using Operations-Copy Session.


2. Drag the copied session outside the batch to be a standalone session.


3. Follow the steps to recover a standalone session.


4. Delete the standalone copy.



108. How can you complete unrecoverable sessions?


Under certain circumstances, when a session does not complete, you need to truncate the target tables and run the session from the beginning. Run the session from the beginning when the Informatica Server cannot run recovery or when running recovery might result in inconsistent data.





109. Under what circumstances does the Informatica Server produce an unrecoverable session?





The Source Qualifier transformation does not use sorted ports.


You change the partition information after the initial session fails.


Perform Recovery is disabled in the Informatica Server configuration.


The sources or targets change after the initial session fails.


The mapping contains a Sequence Generator or Normalizer transformation.


A concurrent batch contains multiple failed sessions.


110. If I make any modifications to my table in the back end, are they reflected in the Informatica Warehouse Designer, Mapping Designer, or Source Analyzer?


No. Informatica is not concerned with the back-end database. It displays the information that is stored in the repository. If you want back-end changes reflected in the Informatica screens, you have to import the definitions again from the back end through a valid connection and replace the existing definitions with the imported ones.

111. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can you map these three ports directly to the target?

No. Unless you join those three ports in the Source Qualifier, you cannot map them directly.

More Informatica FAQs


What is Informatica ?

If you want to set up a Data Warehouse - then you will love to have Informatica software. It will greatly simplify DW design, and numerous routine tasks related to data transformation and migration (ETL - Extract, Transform, and Load), day-to-day maintenance and management.

* www.informatica.com

Informatica has a simple visual interface. You do most of the work by simply dragging and dropping with your mouse in the Designer. This graphical approach makes it also very easy to understand what is going on (it is "self-documenting" in a sense).

Informatica can communicate with all major databases, can move/transform data between them. It can move huge volumes of data in a very effective way. It can throttle the transactions (do big updates in small chunks to avoid long locking and filling the transactional log). It can effectively do joins between tables in different databases on different servers. The tasks are performed by Informatica Server (Unix or MS Windows). You get a client application called "Server Manager" to work with the server.

You design your processes in a client application called "Designer". This is where you specify what the source databases and tables will be, what the targets will be, and how you move/transform the data.

Informatica uses its own database called "Metadata Repository Database", or simply a Repository. Repository stores the data (rules) needed for data extraction, transformation, loading, and management. You get a client application "Repository Manager" to work with the repository.

Informatica comes in different packages:

* Informatica PowerCenter license - has all options, including distributed metadata, the ability to organize repositories into a data mart domain, and sharing metadata across repositories.
* Informatica PowerMart - a limited license (all features except distributed metadata and multiple registered servers).

The short overview below is based on PowerCenter v.1.7, and PowerMart 4.7 (this is dated at ~2000). Since then new versions were made, and all the version numbering was changed. The latest version of the PowerCenter is v.7 (end of 2003 - cost ~$200,000).

Other products by Informatica:

* Informatica PowerAnalyzer (web based tool for data analysis - Business Intelligence)
* Informatica SuperGlue - managing metadata (data about data). Directory of data (personalized), graphical representation of data quality and flow, flexible analysis and reporting (based on PowerAnalyzer) of overall data volumes, loading performance, etc.


Working with Informatica


Here are the pieces of the puzzle:

* source database(s), target database(s), repository metadata database
* Informatica server
* PC-based client software (Designer, Server Manager, Repository Manager)

Setting everything up is also straightforward. Once the server components are installed and configured, you install the client applications, configure ODBC, and register the Informatica Server in the Server Manager. Then create a Repository, create users and groups, and edit user profiles. Add source and target definitions, set up mappings between the sources and targets, create a session for each mapping, and run the sessions (resulting in writing data to targets).



Repository Manager




The Repository Manager allows you to navigate through multiple repositories and folders inside the repositories. Navigating it is very similar to navigating the standard MS Windows Explorer. You have an expandable tree on the left (Navigator Window) and a list of details of the objects in the selected folder (Main Window).

(Screenshot: Repository Manager)

(Screenshot: Repository Login)

Folders may contain nodes (subfolders) - Sessions, Batches, Sources, Targets, Transformations, Mapplets (reusable sets of transformations) and Mappings. These in turn may contain the corresponding individual repository objects - sessions, batches, sources, targets, transformations, mapplets and mappings, as well as shortcuts and session logs.

The interface is simple and intuitive. For example, to see the properties of an object, right-click on it and select Properties. To create a new repository, choose Repository-Create Repository (you have to run in admin mode to be able to do this). You can reorder the columns in the main window by dragging, and sort by any column (just click on the corresponding header). The set of columns in the main window is different for each kind of node or object.

Note: PowerMart Repositories are standalone. PowerCenter repositories can be standalone, local, or global.

You can work with:

* Repositories - create, backup, copy, restore, upgrade, and delete repositories.
* Users & Groups - (choose Security menu) - create, edit, and delete repository users and groups, assign and revoke repository privileges (on a group or user level) and folder permissions, view locks, and unlock objects, versions, and folders. Privilege types:
    o Session Operator
    o Use Designer
    o Browse Repository
    o Create Sessions and Batches
    o Administer Repository
    o Administer Server
    o Super User
* Folders - (choose Folder menu) - create/edit/delete folders inside repositories, copy within a repository or between repositories. Folders can be shared or not shared.
* Reports - add/remove reports
* Import/export repository connection info
* Analyze source/target, mapping, shortcut dependencies.
* Search by keyword
* View properties of repository objects
* Customize the Repository Manager (add, edit, remove repositories in the Navigator, edit repository connection info, view/hide windows)

Below the Navigator and the Main Window you may see two more windows:

* Dependency window - to see dependencies for a selected object (source-target, mapping, shortcut dependencies).
* Output window - shows what is happening.


Designer Client Application



Designer consists of several tools (choose Tools menu):

* Source Analyzer - (choose Sources menu) to import or create source definitions for flat file, Cobol, ERP, and Relational Databases. Note - double-click on the title-bar opens a pop-up to edit definitions.
* Warehouse Designer - to import or create target definitions (choose Targets - Generate/Execute SQL, or Targets-Create to create manually).
* Transformation Developer - to create reusable transformations.
* Mapplet Designer - to create mapplets (reusable sets of transformations)
* Mapping Designer - to create mappings (m_somename).

Windows:

* Navigator - to connect and work with multiple repositories and folders, copy objects and create shortcuts.
* Workspace - to view/edit sources, targets, mapplets, transformations, and mappings.
* Output window and Status bar
* Overview (choose View-Overview) - optional window - to simplify viewing workbooks containing large mappings or a large number of objects.

(Screenshot: Warehouse Designer - Import Tables)

(Screenshot: Edit Table's Definitions)

(Screenshot: Mapping Designer - creating mappings)

(Screenshot: Example of a mapping)

(Screenshot: Overview window)

Note: you can open several workspaces (workbooks) - choose Window - New Window, and then select appropriate tool.

To make a mapping:

* Switch to the Mapping Designer
* Choose Mapping-Create - and enter a new name (m_xxxx)
* Drag a source table from the navigator to the work space. Note, that the designer will also automatically create and show a Source Qualifier transformation (this is a temp. table created by Informatica Server).
* Drag a target table to the work space
* Drag one-by-one fields from source to target - thus creating graphical connections. Note - you can also delete connection by selecting it - and pressing DEL button.
* Choose Layout-Arrange

Note: A Source has only output ports; a Source Qualifier has both input and output ports.

Here are some transformations:

* Advanced External Procedure - ...
* Aggregator - to do things like "group by".
* ERP Source Qualifier - ...
* Expression - to use various expressions.
* External procedure - ...
* Filter - to filter data.
* Joiner - to make joins between separate databases, file, ODBC sources.
* Lookup - to create local copy of the data.
* Normalizer - to transform denormalized data into normalized data.
* Rank - to select only top (or bottom) ranked data.
* Sequence Generator - to generate unique IDs for target tables.
* Source Qualifier - to filter sources (SQL, select distinct, join, etc.)
* Stored Procedure - to run stored procedures in the database - and capture their returned values.
* Update Strategy - to flag records in target for insert, delete, or update (defined inside a mapping).

To create a transformation, simply click on the corresponding transformation icon on the transformations toolbar - and then click in the workspace between the tables. The Designer adds a new transformation.


Choose Layout-Link Columns, drag the needed fields from the Source Qualifier to the transformation, and double-click on the title bar of the transformation to edit it.

In the "Edit Transformations" dialog box you can check/uncheck necessary options (I/O ports, Group-By), add new ports as necessary, and edit the expressions for each port (and validate them).


You can click on the Expression field and edit the expression in the Expression Editor.

You can chain transformations. You can do joins between tables in different databases using "Lookup" transformation to create local copy of the data. You connect transformations by dragging with the mouse from port to port.
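
As a loose analogy for chaining, the Python sketch below passes rows through two steps in sequence, the way linked ports carry data from one transformation to the next. Nothing here is Informatica code: `filter_shipped`, `add_line_total`, `run_mapping`, and the sample rows are hypothetical stand-ins for a Filter and an Expression transformation.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Transformation = Callable[[Iterable[Row]], Iterator[Row]]


def filter_shipped(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter-style step: pass through only rows that meet a condition."""
    return (row for row in rows if row["status"] == "shipped")


def add_line_total(rows: Iterable[Row]) -> Iterator[Row]:
    """Expression-style step: derive a new output port from existing ports."""
    for row in rows:
        yield {**row, "line_total": row["qty"] * row["price"]}


def run_mapping(source: Iterable[Row], steps: List[Transformation]) -> List[Row]:
    """Feed the output of each step into the next one, in order."""
    rows: Iterable[Row] = source
    for step in steps:
        rows = step(rows)
    return list(rows)


# Hypothetical source rows, for illustration only.
orders = [
    {"id": 1, "status": "shipped", "qty": 2, "price": 5.0},
    {"id": 2, "status": "open", "qty": 1, "price": 9.0},
]
loaded = run_mapping(orders, [filter_shipped, add_line_total])
```

Only the shipped order reaches the end of the chain, and it arrives with the extra `line_total` field added by the second step.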


Server Manager



Sessions are sets of instructions that tell the Informatica Server when and how to move data from sources to targets.

Server Manager - a client application used to create and manage sessions and batches, and to configure session connections. You can monitor multiple Informatica Servers, navigate through folders and repositories. Here is what you can do in Server Manager:

* Monitor, add, edit, delete Informatica Server info in the repository
* Stop the Informatica Server
* Configure database, external loader, and FTP connections
* Manage sessions and batches - create, edit, delete, copy/move within a folder, start/stop, abort sessions, view session logs, details, session performance details.

Windows:

* Navigator & Configure windows
* Monitor
* Output
* Status bar


- Note: As usual you can dock/undock the windows by double-clicking the title bar and/or dragging.
- Note: Cancel button - appears at the bottom-left when the program communicates with the Informatica Server.
- Note: Server Manager can mark a session invalid if something is wrong. You can open session properties, edit, and try again.
- Note: you can create/customize toolbars.

To create a session in Server Manager:

* Select a Repository in the Navigator - and connect to it.
* Choose "Server Configuration - Database Connections" - and add connections to sources and targets.
* Choose "Server Configuration - Register Server" - to connect to the server.
* In the Navigator - open a folder with mappings.
* Choose Operations - Add Sessions (or click on "Add Session" button) and select the mapping.
* The Session Wizard opens.







Monitoring and Running a Session:

* Select the Informatica Server in the navigator - and choose Server Configuration - Monitor - to toggle the monitor window. Then you can choose Server Requests - Start/Stop polling, or you can choose Server Requests - Session overview - to refresh the monitor.

Running the Session:

* Select the session with the mouse - and choose Server Requests - Start (or click on the start button on the toolbar).

Organize sessions in a batch:

* To create a batch choose Operations - Add Batch (or click on the corresponding button on the toolbar).
* Once you have created and opened the batch, you can add sessions to it by dragging them into the batch. You can reorder them inside the batch, or you can check the Concurrent option to run the sessions concurrently inside the batch.
* You start the batch the same way as a session - select it - and click the start button (or choose Server Requests - Start).
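
The difference between a sequential batch and one with the Concurrent option checked can be pictured with a short Python sketch. It is only an analogy: `run_batch` and the session names are hypothetical, and threads stand in for whatever the Informatica Server actually does to run sessions side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_batch(sessions: List[Callable[[], str]], concurrent: bool = False) -> List[str]:
    """Run every session in the batch and return the results in batch order."""
    if not concurrent:
        # Sequential batch: each session starts after the previous one ends.
        return [session() for session in sessions]
    # Concurrent batch: all sessions are started together.
    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        futures = [pool.submit(session) for session in sessions]
        return [future.result() for future in futures]


# Hypothetical sessions, for illustration only.
batch = [lambda: "s_customers done", lambda: "s_orders done"]
sequential_results = run_batch(batch)
concurrent_results = run_batch(batch, concurrent=True)
```

Both calls return the same results; what changes is whether the sessions wait for each other, which matters when one session depends on the output of another.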

Data Warehousing FAQ's II

1) What are the various test procedures used to check whether the data is loaded in the back end, the performance of the mapping, and the quality of the data loaded in Informatica?
2) What are the common problems developers face during ETL development?
If you want to know the performance of a mapping at the transformation level, select the Collect Performance Data option in the session properties. At run time you can see the data in the Performance tab of the monitor, or you can get it from a file.
The PowerCenter Server names the file session_name.perf and stores it in the same directory as the session log. If there is no session-specific directory for the session log, the PowerCenter Server saves the file in the default log files directory.
The quality of the loaded data depends on the quality of the data in the source. If cleansing is required, perform data cleansing operations in Informatica so that the data loaded into the target is clean.




What will happen if you are using Update Strategy Transformation and your session is configured for "insert"?