Monday, August 31, 2009

How many ways can you create ports?

Two ways:

1. Drag the port from another transformation.

2. Click the Add button on the Ports tab.

What is the PivotTable® Service?

This is the primary component that connects clients to the Microsoft® SQL Server™ 2000 Analysis Server. It also provides the capability for clients to create local offline cubes, using it as an OLAP server. PivotTable® Service does not have a user interface; the clients using its services have to provide their own user interface.

Sunday, August 30, 2009

What are connected and unconnected transformations?

An unconnected transformation is not connected to other transformations in the mapping. A connected transformation is connected to other transformations in the mapping.

What are offline OLAP cubes?

These are OLAP cubes created by clients, end users or third-party applications accessing a data warehouse, relational database or OLAP cube through the Microsoft® PivotTable® Service. E.g. Microsoft® Excel™ is very popular as a client for creating offline local OLAP cubes from relational databases for multidimensional analysis. These cubes have to be maintained and managed by the end users who have to manually refresh their data.

Saturday, August 29, 2009

What are active and passive transformations?

An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

What are virtual cubes?

These are combinations of one or more real cubes and require no disk space to store them. They store only the definitions and not the data of the referenced source cubes. They are similar to views in relational databases.

Friday, August 28, 2009

What are the Designer tools for creating transformations?

Mapping Designer

Transformation Developer

Mapplet Designer

What are MOLAP cubes?

MOLAP stands for Multidimensional OLAP. In MOLAP cubes, the data aggregations and a copy of the fact data are stored in a multidimensional structure on the Analysis Server computer. MOLAP is best when extra storage space is available on the Analysis Server computer and the best query performance is desired. MOLAP local cubes contain all the necessary data for calculating aggregates and can be used offline. MOLAP cubes provide the fastest query response time and performance but require additional storage space for the extra copy of data from the fact table.

Thursday, August 27, 2009

How can you create or import a flat file definition into the Warehouse Designer?

You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag a flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica Server runs the session, it creates and loads the flat file.

What are ROLAP cubes?

ROLAP stands for Relational OLAP. In ROLAP cubes, a copy of the data from the fact table is not made, and the data aggregates are stored in tables in the source relational database. A ROLAP cube is best when there is limited space on the Analysis Server and query performance is not very important. ROLAP local cubes contain the dimensions and cube definitions, but aggregates are calculated when they are needed. ROLAP cubes require less storage space than MOLAP and HOLAP cubes.

Wednesday, August 26, 2009

Which transformation do you need when using COBOL sources as source definitions?

The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.

What are HOLAP cubes?

HOLAP stands for Hybrid OLAP. A HOLAP cube combines ROLAP and MOLAP cube characteristics. It does not create a copy of the source data; however, data aggregations are stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best when storage space is limited but faster query responses are needed.

Tuesday, August 25, 2009

To provide support for mainframe source data, which files are used as source definitions?

COBOL files

What are cubes?

Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.

What are aggregations?

Aggregations are precalculated numeric data. By calculating and storing the answers to a query before users ask for it, the query processing time can be reduced. This is key in providing fast query performance in OLAP.
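
As a small illustration of the idea, the sketch below (hypothetical data, not tied to any particular OLAP engine) precalculates monthly sales totals once, so a later query reads the stored answer instead of rescanning every transaction:

```python
from collections import defaultdict

# Hypothetical transaction-level facts: (year, month, amount).
sales = [
    (2009, 1, 120.0), (2009, 1, 80.0),
    (2009, 2, 200.0), (2009, 3, 150.0),
]

# Precalculate the aggregation once and store the answers.
monthly_totals = defaultdict(float)
for year, month, amount in sales:
    monthly_totals[(year, month)] += amount

# A later query is answered from the stored aggregate,
# not by scanning the detail rows again.
print(monthly_totals[(2009, 1)])   # 200.0
```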

What are measures?

Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in. E.g. a sales fact table may contain a profit measure which represents profit on each sale.

What are fact tables?

A fact table is a table that contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys from the primary keys of related dimension tables.

What are dimensions?

Dimensions are categories by which summarized data can be viewed. E.g. a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year), Region dimension (profit by country, state, city), Product dimension (profit for product1, product2).
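
To make the fact table, measure and dimension vocabulary concrete, here is a minimal hypothetical star schema built with Python's sqlite3 module; all table and column names are illustrative only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Dimension tables: categories by which the facts are viewed.
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Fact table: foreign keys to the dimensions plus a numeric measure.
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    profit     REAL    -- the measure end users analyze
);
""")
con.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                [(1, "Jan", "Q1", 2009), (2, "Feb", "Q1", 2009)])
con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "product1"), (2, "product2")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0)])

# Profit viewed by the Time dimension (profit by month).
for row in con.execute("""
        SELECT t.year, t.month, SUM(f.profit)
        FROM fact_sales f JOIN dim_time t ON f.time_id = t.time_id
        GROUP BY t.year, t.month"""):
    print(row)
```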

What is OLTP?

OLTP stands for Online Transaction Processing. OLTP uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently, OLTP databases are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods like lunch hour, breaks, startup and close of work.

What is OLAP?

OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data. E.g. OLAP technology could provide management with fast answers to complex queries on their operational data or enable them to analyze their company's historical data for trends and patterns.

What is a data warehouse?

A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent. A data warehouse can also be viewed as a database for historical data from different functions within a company.

What are the benefits of data warehousing?

Data warehouses are designed to perform well with aggregate queries running on large amounts of data.
The structure of data warehouses is easier for end users to navigate, understand and query against unlike the relational databases primarily designed to handle lots of transactions.
Data warehouses enable queries that cut across different segments of a company's operation. E.g. production data could be compared against inventory data even if they were originally stored in different databases with different structures.
Queries that would be complex in very normalized databases could be easier to build and maintain in data warehouses, decreasing the workload on transaction systems.
Data warehousing is an efficient way to manage and report on data that is from a variety of sources, non uniform and scattered throughout a company.
Data warehousing is an efficient way to manage demand for lots of information from lots of users.
Data warehousing provides the capability to analyze large amounts of historical data for nuggets of wisdom that can provide an organization with competitive advantage.

What is a data mart?

A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.

What is the approximate size of a data warehouse?

You can estimate the approximate size of a data warehouse made up of only fact and dimension tables by estimating the approximate size of the fact tables and ignoring the sizes of the dimension tables.

To estimate the size of the fact table in bytes, multiply the size of a row by the number of rows in the fact table. A more exact estimate would include the data types, indexes, page sizes, etc. An estimate of the number of rows in the fact table is obtained by multiplying the number of transactions per hour by the number of hours in a typical work day, then multiplying the result by the number of days in a year, and finally multiplying this result by the number of years of transactions involved. Divide the resulting size in bytes by 1024 to convert to kilobytes, and by 1024 again to convert to megabytes.

E.g. a data warehouse will store facts about the help provided by a company's product support representatives. The fact table is made up of a composite key of 7 indexes (int data type) including the primary key. The fact table also contains 1 measure of time (datetime data type) and another measure of duration (int data type). 2000 product incidents are recorded each hour in a relational database. A typical work day is 8 hours and support is provided for every day in the year. What will be the approximate size of this data warehouse in 5 years?

First calculate the approximate size of a row in bytes (int data type = 4 bytes, datetime data type = 8 bytes):

size of a row = size of all composite indexes (add the size of all indexes) + size of all measures (add the size of all measures).

Size of a row (bytes) = (4 * 7) + (8 + 4).
Size of a row (bytes) = 40 bytes.

Number of rows in fact table = (number of transactions per hour) * (number of hours in a work day) * (number of days in a year)
Number of rows in fact table = (2000 product incidents per hour) * (8 hours) * (365 days in a year)
Number of rows in fact table = 2000 * 8 * 365 = 5,840,000

Size of fact table (bytes per year) = (number of rows in fact table) * (size of a row)
Size of fact table (bytes per year) = 5,840,000 * 40 = 233,600,000
Size of fact table (megabytes per year) = 233,600,000 / (1024 * 1024) ≈ 222.78 MB
Size of fact table (megabytes for 5 years) = (233,600,000 * 5) / (1024 * 1024) ≈ 1113.89 MB
Size of fact table (gigabytes for 5 years) = 1113.89 / 1024 ≈ 1.09 GB
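
The same arithmetic can be checked with a short script; the numbers below are taken directly from the example above:

```python
# Row size: 7 int keys (4 bytes each) + 1 datetime (8 bytes) + 1 int measure (4 bytes).
row_bytes = 4 * 7 + 8 + 4                 # 40 bytes

# Rows per year: 2000 incidents/hour * 8 hours/day * 365 days/year.
rows_per_year = 2000 * 8 * 365            # 5,840,000 rows

bytes_per_year = rows_per_year * row_bytes           # 233,600,000 bytes
mb_5_years = bytes_per_year * 5 / (1024 * 1024)      # ~1113.89 MB
gb_5_years = mb_5_years / 1024                       # ~1.09 GB

print(f"{mb_5_years:.2f} MB over 5 years = {gb_5_years:.2f} GB")
```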

Monday, August 24, 2009

In how many ways can you update a relational source definition, and what are they?

Two ways:

1. Edit the definition

2. Reimport the definition

Where should you place the flat file to import the flat file definition into the Designer?

Place it in a local folder.

Saturday, August 22, 2009

What are the types of External Loader available with Informatica?

For Oracle: SQL*Loader. For Teradata: TPump and MultiLoad.

If you have a rank index for the top 10, but you pass only 5 records, what will be the output of such a Rank transformation?

If you pass only 5 rows to the Rank transformation, it will rank only those 5 records based on the rank port.

A related note on the Update Strategy transformation: if you are using an Update Strategy transformation in any of your mappings, then in the session properties you have to set Treat Source Rows As to Data Driven. If you select Insert, Update or Delete instead, the Informatica Server will not consider the Update Strategy transformation when performing any database operations.

Alternatively, you can use the session-level options instead of an Update Strategy transformation in the mapping: select Update in Treat Source Rows As and enable the Update Else Insert option. This does the same job as an Update Strategy transformation, but be sure to have a primary key on the target table.

Friday, August 21, 2009

How will you create a header and footer in the target using Informatica?

If the focus is on flat files, you can set the header and footer in the file properties while creating the mapping, or at the session level in the session properties.

What are mapping parameters and variables, and in which situations can we use them?

If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables and define the values in a parameter file. Then we can edit the parameter file to change the attribute values. This makes the process simple.

Mapping parameter values remain constant. If we need to change the parameter value, we need to edit the parameter file.

But the value of a mapping variable can be changed by using variable functions. If we need to increment the attribute value by 1 after every session run, we can use mapping variables.

With a mapping parameter we need to manually edit the attribute value in the parameter file after every session run.

Thursday, August 20, 2009

Which is better among connected lookup and unconnected lookup transformations in Informatica or any other ETL tool?

If the source is well defined, you can use a connected lookup; if the source is not well defined or comes from a different database, you can go for an unconnected lookup. That is how we use them.

Tuesday, August 18, 2009

Explain recovering sessions.

If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.

Monday, August 17, 2009

What is a time dimension? Give an example.

In a relational data model, for normalization purposes, year lookup, quarter lookup, month lookup, and week lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called the TIME DIMENSION for performance and for slicing data.

This dimension helps to find sales on a daily, weekly, monthly and yearly basis. We can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
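
In practice a time dimension is often just generated by a script, one row per day with the attributes used for slicing. A minimal sketch (the column names are illustrative):

```python
from datetime import date, timedelta

start, end = date(2009, 1, 1), date(2009, 12, 31)
time_dim = []
d = start
while d <= end:
    time_dim.append({
        "date": d.isoformat(),
        "day_of_week": d.strftime("%A"),
        "week": int(d.strftime("%W")),
        "month": d.month,
        "quarter": (d.month - 1) // 3 + 1,
        "year": d.year,
    })
    d += timedelta(days=1)

print(len(time_dim), time_dim[0])   # 365 rows; first row of the TIME dimension
```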

Sunday, August 16, 2009

Can you copy a session to a different folder or repository?

Yes, you can copy a session to a different folder or repository, provided the target folder or repository contains the mapping used by that session. In addition, you can copy the workflow from the Repository Manager; this will automatically copy the mapping, associated sources, targets and session to the target folder.

Discuss the advantages and disadvantages of star and snowflake schemas.

In a star schema there is no relation between any two dimension tables, whereas in a snowflake schema there can be relations between the dimension tables, because the dimensions are normalized into multiple related tables.

Saturday, August 15, 2009

What is the difference between the Stored Procedure transformation and the External Procedure transformation?

In the case of the Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In the External Procedure transformation, the procedure or function is executed outside the data source, i.e. you need to build it as a DLL to access it in your mapping. No database connection is needed in the case of the External Procedure transformation.

Friday, August 14, 2009

How do you use mapping parameters, and what is their use?

In the Designer you will find the mapping parameters and variables options, and you can assign values to them in the Designer. Coming to their use: suppose you are doing incremental extractions daily and your source system contains a day column. Every day you would have to go to that mapping and change the day so that the particular day's data is extracted, which is tedious manual work. This is where mapping parameters and variables come in: once you assign a value to a mapping variable, it can change between session runs.
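
The following is only a conceptual sketch in Python of the behaviour described above (a value that persists and advances between runs); it does not use Informatica's parameter file syntax, and the file name and field are made up:

```python
import json
from datetime import date, timedelta
from pathlib import Path

# Stand-in for the persisted mapping-variable value between session runs.
STATE_FILE = Path("last_extraction_day.json")

def next_extraction_day() -> str:
    """Return the day to extract, advancing it on every run."""
    if STATE_FILE.exists():
        last = date.fromisoformat(json.loads(STATE_FILE.read_text())["day"])
        day = last + timedelta(days=1)
    else:
        day = date.today()                      # first run: start from today
    STATE_FILE.write_text(json.dumps({"day": day.isoformat()}))
    return day.isoformat()

# Each "session run" picks up the next day automatically,
# so nobody has to edit the mapping (or a file) by hand.
print("Extract rows WHERE day =", next_extraction_day())
```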

Thursday, August 13, 2009

In dimensional modeling, is the fact table normalized or denormalized, in the case of a star schema and in the case of a snowflake schema?

In dimensional modeling:

Star schema: a single fact table is surrounded by a group of dimension tables made up of denormalized data.

Snowflake schema: a single fact table is surrounded by a group of dimension tables made up of normalized data.

The star schema (sometimes referred to as a star join schema) is the simplest data warehouse schema, consisting of a single fact table with a compound primary key, with one segment for each dimension and with additional columns of additive, numeric facts. The star schema makes multidimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management systems in organizations today, implementing multidimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources are likely relational databases. Another reason for using a star schema is its ease of understanding. Fact tables in a star schema are mostly in third normal form (3NF), but dimension tables are in denormalized second normal form (2NF). If you normalize the dimension tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise: you need complex queries and business users cannot easily understand the meaning of the data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product-category table, and a product-manufacturer table in a snowflake schema (see the sketch below). While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
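
The product-dimension example in the last paragraph can be sketched as DDL; this uses Python's sqlite3 purely for illustration, and the table and column names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Star schema: one denormalized product dimension.
con.executescript("""
CREATE TABLE dim_product_star (
    product_id        INTEGER PRIMARY KEY,
    product_name      TEXT,
    category_name     TEXT,     -- repeated for every product in the category
    manufacturer_name TEXT      -- repeated for every product of the manufacturer
);
""")

# Snowflake schema: the same dimension normalized into three tables,
# which removes the repetition but adds two more joins to every query.
con.executescript("""
CREATE TABLE dim_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE dim_category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (
    product_id      INTEGER PRIMARY KEY,
    product_name    TEXT,
    category_id     INTEGER REFERENCES dim_category(category_id),
    manufacturer_id INTEGER REFERENCES dim_manufacturer(manufacturer_id)
);
""")
```
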
We are using an Update Strategy transformation in a mapping. How can we know whether the insert, update, reject or delete option has been selected during the running of the session in Informatica?

In the Designer, while creating the Update Strategy transformation, uncheck "Forward Rejected Rows". Any rejected rows will then automatically be written to the session log file.

Whether rows were updated or inserted can be known only by checking the target file or table.

Wednesday, August 12, 2009

What is a hash table in Informatica?

In hash partitioning, the Informatica Server uses a hash function to group rows of data among partitions. The Informatica Server groups the data based on a partition key. Use hash partitioning when you want the Informatica Server to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
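
Conceptually, hash partitioning applies a hash function to the partition key so that all rows with the same key value land in the same partition. A toy sketch of the idea (this is not Informatica's actual hash function):

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(partition_key: str) -> int:
    # A deterministic hash keeps all rows with the same key together.
    return zlib.crc32(partition_key.encode()) % NUM_PARTITIONS

rows = [("item42", 10.0), ("item7", 3.5), ("item42", 2.0)]   # (item_id, quantity)
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for item_id, qty in rows:
    partitions[partition_for(item_id)].append((item_id, qty))

# Both "item42" rows end up in the same partition, so a per-partition
# sort or aggregation by item ID sees the complete group.
print(partitions)
```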

Tuesday, August 11, 2009

What are the Joiner caches?

When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows.

After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
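
The behaviour described above is essentially a hash join: cache the master side first, then stream the detail side against it. A simplified sketch in plain Python (the row layouts are made up):

```python
# Master rows are read first and cached, keyed by the join column.
master_rows = [(1, "Books"), (2, "Music")]                        # (dept_id, dept_name)
detail_rows = [(101, 1, 9.99), (102, 2, 14.99), (103, 1, 4.99)]   # (order_id, dept_id, amount)

# "Index cache": join key -> positions; "data cache": the cached master rows.
index_cache = {}
data_cache = []
for row in master_rows:
    data_cache.append(row)
    index_cache.setdefault(row[0], []).append(len(data_cache) - 1)

# Detail rows are then streamed and joined against the caches.
for order_id, dept_id, amount in detail_rows:
    for pos in index_cache.get(dept_id, []):
        _, dept_name = data_cache[pos]
        print(order_id, dept_name, amount)
```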

Sunday, August 9, 2009

How can you recover a session in sequential batches?

If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.

To recover sessions in sequential batches configured to stop on failure:

1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.

If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous session. If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.

Saturday, August 8, 2009

Data Warehousing - Introduction



1. Explain the evolution of database technology to data mining.
2. Describe the steps involved in data mining as a process of knowledge discovery.
3. How is a data warehouse different from a database? How are they similar?
4. Briefly describe different advanced database systems and their applications.
5. Define the different data mining functionalities.
6. Define interestingness, and explain the issues related to interestingness.
7. Explain the classification of data mining systems.
8. Describe the issues pertaining to data mining systems.