1. List the types of statistics collected for monitoring the functioning of the data
warehouse.
2. Describe any six different types of action for growth planning prompted by the
collected statistics.
3. How do the statistics help in fine-tuning the data warehouse?
4. Do you think publishing the statistics and similar data for the users is helpful? If
so, why?
5. What are the three main subjects in user training content? Why are these essential?
6. Describe any four major tasks needed to prepare a training program.
7. What are the responsibilities of the training administrator?
8. Do you think a multitier user support structure is suitable for the data warehouse
environment? What are the alternatives?
9. What role can the company’s intranet play in training and user support?
10. List any five factors to be part of ETL management.
Wednesday, November 18, 2009
Evaluating Vendor Solutions
First and foremost, determine the functions in your data warehouse that absolutely
need vendor tools and solutions.
For each type of product you need, carefully list the features that are expected. Divide
the features into groups by importance—high, medium, and low. Use these
groups of features to grade the products you are considering.
Allocate enough time to research available solutions and vendors thoroughly.
If you try to incorporate solutions from too many different vendors, you must be
prepared to face serious challenges of incompatibilities and restrictions for integration.
Stay with two or three vendors whose products are most appropriate for your
needs.
Metadata is a key component of a data warehouse. Ensure that the vendor products
you choose can handle metadata satisfactorily.
The standing and stability of the vendor are as important as the effectiveness of
the products themselves. Even when the products are suitable for your environment,
if you are concerned about the staying power of the vendor, have second thoughts about
selecting those products.
Never rely solely on vendor demonstrations as the basis for your selection, nor
should you check only the references furnished by the vendors themselves.
Test the tools and products in your environment, with subsets of your own data.
Arrange for both user representatives and IT members of the project team to test the
products, jointly and independently.
Establish a definitive method for comparing and scoring competing products. You
may devise a point system to score the various features you are looking for in a
product type, as in the sketch below.
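As a concrete illustration, here is a minimal Python sketch of such a point system; the feature names, importance groups, weights, and raw scores are hypothetical, not drawn from any real evaluation.

```python
# A hypothetical point system for grading competing vendor products.
# Feature names, importance groups, and raw 0-5 scores are invented examples.
WEIGHTS = {"high": 3, "medium": 2, "low": 1}

products = {
    "Product A": {"metadata support": ("high", 4), "ad hoc queries": ("high", 5),
                  "batch reporting": ("medium", 3), "export formats": ("low", 2)},
    "Product B": {"metadata support": ("high", 5), "ad hoc queries": ("high", 3),
                  "batch reporting": ("medium", 4), "export formats": ("low", 5)},
}

def total_score(features):
    # Weighted sum: importance weight multiplied by the raw feature score.
    return sum(WEIGHTS[importance] * score for importance, score in features.values())

for name, features in sorted(products.items(), key=lambda item: -total_score(item[1])):
    print(f"{name}: {total_score(features)} points")
```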
The success of your data warehouse rides on the end-user tools. Pay special attention
to the choice of the end-user tools. You may compromise a bit on the other
types, but not on the end-user tools.
Most of your end-users will be new to data warehousing. Good, intuitive, easy-to-use
tools go a long way toward winning them over.
Users like tools that seamlessly incorporate online queries, batch reporting, and data
extraction for analysis.
Friday, September 18, 2009
Which transformation should we use to normalize the COBOL and relational sources?
Normalizer Transformation
When you drag the COBOL source into the Mapping Designer workspace, the Normalizer transformation automatically appears, creating input and output ports for every column in the source.
Thursday, September 17, 2009
Difference between static cache and dynamic cache
Static cache:
You cannot insert rows into or update the cache.
The Informatica server returns a value from the lookup table or cache when the lookup condition is true. When the condition is not true, it returns the default value for connected transformations and NULL for unconnected transformations.
Dynamic cache:
You can insert rows into the cache as you pass rows to the target.
The Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or the target table, and you can pass these rows to the target table.
A rough sketch of this behavioral difference follows.
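A minimal sketch of the difference, using plain Python dictionaries as stand-ins for the lookup cache; this is an analogy only, not Informatica's implementation.

```python
# Plain-dictionary analogy of the two cache types; not Informatica's implementation.

def static_lookup(cache, key, default=None):
    # Static cache: read-only; a miss simply returns the default value.
    return cache.get(key, default)

def dynamic_lookup(cache, key, row):
    # Dynamic cache: a miss inserts the new row into the cache so it can also
    # be routed to the target table as an insert.
    if key in cache:
        return cache[key], "already in cache/target"
    cache[key] = row
    return row, "insert into cache and target"

cache = {101: {"cust_id": 101, "name": "Acme"}}
print(static_lookup(cache, 999))                      # None: cache unchanged
print(dynamic_lookup(cache, 999, {"cust_id": 999}))   # row inserted into the cache
```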
Wednesday, September 16, 2009
What are the types of lookup caches?
Persistent cache: You can save the lookup cache files and reuse them the next time the Informatica server processes a lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table, you can configure the lookup transformation to rebuild the lookup cache.
Static cache: You can configure a static, or read-only, cache for any lookup table. By default the Informatica server creates a static cache. It caches the lookup table and lookup values in the cache for each row that comes into the transformation. When the lookup condition is true, the Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into the cache and the target, you can create a lookup transformation that uses a dynamic cache. The Informatica server dynamically inserts data into the target table.
Shared cache: You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping.
Tuesday, September 15, 2009
What is meant by lookup caches?
The Informatica server builds a cache in memory when it processes the first row of data in a cached lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
Monday, September 14, 2009
Differences between connected and unconnected lookup?
Connected lookup:
Receives input values directly from the pipeline.
You can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping.
Supports user-defined default values.
Unconnected lookup:
Receives input values from the result of a :LKP expression in another transformation.
You can use a static cache only.
The cache includes all lookup output ports in the lookup condition and the lookup/return port.
Does not support user-defined default values.
Sunday, September 13, 2009
Why use the Lookup transformation?
To perform the following tasks (a small sketch of the first two follows the list):
Get a related value. For example, your source table includes the employee ID, but you want to include the employee name in your target table to make your summary data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.
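As a rough illustration of the first two tasks, here is a small Python sketch that enriches source rows with a looked-up employee name and derives a calculated net-sales value; the table contents and column names are invented.

```python
# Toy example: enrich source rows with a looked-up employee name (related value)
# and derive net sales (a calculated value not stored in the source).
employee_names = {10: "Lee", 20: "Patel"}        # stand-in for the lookup table

source_rows = [
    {"employee_id": 10, "gross_sales": 500.0, "sales_tax": 40.0},
    {"employee_id": 20, "gross_sales": 300.0, "sales_tax": 24.0},
]

for row in source_rows:
    row["employee_name"] = employee_names.get(row["employee_id"])   # related value
    row["net_sales"] = row["gross_sales"] - row["sales_tax"]        # calculated value
    print(row)
```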
Saturday, September 12, 2009
What is the Lookup transformation?
Use a Lookup transformation in your mapping to look up data in a relational table, view, or synonym.
The Informatica server queries the lookup table based on the lookup ports in the transformation. It compares the lookup transformation port values to the lookup table column values based on the lookup condition.
Wednesday, September 9, 2009
What are the settings that you use to configure the Joiner transformation?
Master and detail source
Type of join
Condition of the join
Tuesday, September 8, 2009
Under which conditions can we not use the Joiner transformation (limitations of the Joiner transformation)?
You cannot use a Joiner transformation in the following situations:
Both pipelines begin with the same original data source.
Both input pipelines originate from the same Source Qualifier transformation.
Both input pipelines originate from the same Normalizer transformation.
Both input pipelines originate from the same Joiner transformation.
Either input pipeline contains an Update Strategy transformation.
Either input pipeline contains a connected or unconnected Sequence Generator transformation.
Monday, September 7, 2009
What are the differences between the Joiner transformation and the Source Qualifier transformation?
You can join heterogeneous data sources in a Joiner transformation, which you cannot achieve in a Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation, whereas you do not need matching keys to join two sources in a Joiner transformation.
Two relational sources must come from the same data source in a Source Qualifier; with a Joiner you can also join relational sources that come from different data sources.
Sunday, September 6, 2009
What is the aggregate cache in the Aggregator transformation?
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
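A rough sketch of what the Aggregator does conceptually while rows stream through it (group-by accumulation in an in-memory cache); the disk-overflow behavior is only noted in a comment, and the data is invented.

```python
from collections import defaultdict

# Rows stream through; aggregates accumulate in an in-memory "cache" keyed by
# the group-by port. (The real server can also spill overflow to cache files.)
sales = [("East", 100.0), ("West", 250.0), ("East", 50.0)]

cache = defaultdict(float)
for region, amount in sales:
    cache[region] += amount          # totals are only final after the last row

print(dict(cache))                   # {'East': 150.0, 'West': 250.0}
```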
Saturday, September 5, 2009
Can you use the mapping parameters or variables created in one mapping in any other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
Can you use the mapping parameters or variables created in one mapping in another mapping?
No.
We can use mapping parameters or variables in any transformation of the same mapping or mapplet in which the mapping parameters or variables were created.
Friday, September 4, 2009
What are mapping parameters and mapping variables?
A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, and then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change throughout the session. The Informatica server saves the value of a mapping variable to the repository at the end of each session run and uses that value the next time you run the session.
Thursday, September 3, 2009
What are the unsupported repository objects for a mapplet?
COBOL source definitions
Joiner transformations
Normalizer transformations
Non-reusable Sequence Generator transformations
Pre- or post-session stored procedures
Target definitions
PowerMart 3.5-style LOOKUP functions
XML source definitions
IBM MQ source definitions
Wednesday, September 2, 2009
What are the methods for creating reusable transformations?
There are two methods:
1. Design it in the Transformation Developer.
2. Promote a standard transformation from the Mapping Designer. After you add a transformation to the mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you can demote it to a standard transformation at any time.
If you change the properties of a reusable transformation in a mapping, you can revert to the original reusable transformation properties by clicking the Revert button.
Tuesday, September 1, 2009
What are reusable transformations?
Reusable transformations can be used in multiple mappings. When you need to incorporate such a transformation into a mapping, you add an instance of it to the mapping. Later, if you change the definition of the transformation, all instances of it inherit the changes. Since each instance of a reusable transformation is a pointer to that transformation, you can change the transformation in the Transformation Developer and its instances automatically reflect those changes. This feature can save you a great deal of work.
Monday, August 31, 2009
In how many ways can you create ports?
Two ways:
1. Drag the port from another transformation.
2. Click the Add button on the Ports tab.
What is the PivotTable® Service?
This is the primary component that connects clients to the Microsoft® SQL Server™ 2000 Analysis Server. It also provides the capability for clients to create local offline cubes, using it as an OLAP server. PivotTable® Service does not have a user interface; the clients using its services have to provide their own user interface.
Sunday, August 30, 2009
What are connected and unconnected transformations?
An unconnected transformation is not connected to other transformations in the mapping. A connected transformation is connected to other transformations in the mapping.
What are offline OLAP cubes?
These are OLAP cubes created by clients, end users or third-party applications accessing a data warehouse, relational database or OLAP cube through the Microsoft® PivotTable® Service. E.g. Microsoft® Excel™ is very popular as a client for creating offline local OLAP cubes from relational databases for multidimensional analysis. These cubes have to be maintained and managed by the end users who have to manually refresh their data.
Saturday, August 29, 2009
What are active and passive transformations?
An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it
What are virtual cubes?
These are combinations of one or more real cubes and require no disk space to store them. They store only the definitions and not the data of the referenced source cubes. They are similar to views in relational databases.
Friday, August 28, 2009
What are the Designer tools for creating transformations?
Mapping designer
Transformation developer
Mapplet designer
What are MOLAP cubes?
MOLAP Cubes: stands for Multidimensional OLAP. In MOLAP cubes the data aggregations and a copy of the fact data are stored in a multidimensional structure on the Analysis Server computer. It is best when extra storage space is available on the Analysis Server computer and the best query performance is desired. MOLAP local cubes contain all the necessary data for calculating aggregates and can be used offline. MOLAP cubes provide the fastest query response time and performance but require additional storage space for the extra copy of data from the fact table.
Thursday, August 27, 2009
How can you create or import a flat file definition into the Warehouse Designer?
You cannot create or import a flat file definition into the Warehouse Designer directly. Instead, you must analyze the file in the Source Analyzer and then drag it into the Warehouse Designer. When you drag the flat file source definition into the Warehouse Designer workspace, the Warehouse Designer creates a relational target definition, not a file definition. If you want to load to a file, configure the session to write to a flat file. When the Informatica server runs the session, it creates and loads the flat file.
What are ROLAP cubes?
ROLAP Cubes: stands for Relational OLAP. In ROLAP cubes a copy of data from the fact table is not made, and the data aggregates are stored in tables in the source relational database. A ROLAP cube is best when there is limited space on the Analysis Server and query performance is not very important. ROLAP local cubes contain the dimensions and cube definitions, but aggregates are calculated when they are needed. ROLAP cubes require less storage space than MOLAP and HOLAP cubes.
Wednesday, August 26, 2009
Which transformation do you need when using COBOL sources as source definitions?
The Normalizer transformation, which is used to normalize the data, since COBOL sources often consist of denormalized data.
What are HOLAP cubes?
HOLAP Cubes: stands for Hybrid OLAP. A HOLAP cube has a combination of ROLAP and MOLAP cube characteristics. It does not create a copy of the source data; however, data aggregations are stored in a multidimensional structure on the Analysis Server computer. HOLAP cubes are best when storage space is limited but faster query responses are needed.
Tuesday, August 25, 2009
What are cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.
What are aggregations?
Aggregations are precalculated numeric data. By calculating and storing the answers to a query before users ask for it, the query processing time can be reduced. This is key in providing fast query performance in OLAP.
What are measures?
Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in. E.g. a sales fact table may contain a profit measure which represents profit on each sale.
What are fact tables?
A fact table is a table that contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys from the primary keys of related dimension tables.
What are dimensions?
Dimensions are categories by which summarized data can be viewed. E.g. a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year), Region dimension (profit by country, state, city), Product dimension (profit for product1, product2).
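To tie these definitions together, here is a small hedged sketch of a fact table, a time dimension, and an aggregation of a profit measure by quarter; the table contents and key values are invented.

```python
# Invented star-schema fragment: a fact table holding a profit measure and
# foreign keys, a time dimension, and an aggregation of profit by quarter.
time_dim = {1: {"month": "Jan", "quarter": "Q1"},
            2: {"month": "Feb", "quarter": "Q1"},
            4: {"month": "Apr", "quarter": "Q2"}}

fact_sales = [            # (time_key, product_key, profit)
    (1, 100, 120.0),
    (2, 100, 80.0),
    (4, 200, 200.0),
]

profit_by_quarter = {}
for time_key, _product_key, profit in fact_sales:
    quarter = time_dim[time_key]["quarter"]        # join the fact row to the dimension
    profit_by_quarter[quarter] = profit_by_quarter.get(quarter, 0.0) + profit

print(profit_by_quarter)   # {'Q1': 200.0, 'Q2': 200.0}
```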
What is OLTP?
OLTP stands for Online Transaction Processing. OLTP uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently, OLTP databases are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods like lunch hour, breaks, startup and close of work.
What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (fact and dimension tables) to enable multidimensional viewing, analysis and querying of large amounts of data. E.g. OLAP technology could provide management with fast answers to complex queries on their operational data or enable them to analyze their company's historical data for trends and patterns.
What is a data warehouse?
A data warehouse is a collection of data marts representing historical data from different operations in the company. This data is stored in a structure optimized for querying and data analysis as a data warehouse. Table design, dimensions and organization should be consistent throughout a data warehouse so that reports or queries across the data warehouse are consistent. A data warehouse can also be viewed as a database for historical data from different functions within a company.
What are the benefits of data warehousing?
Data warehouses are designed to perform well with aggregate queries running on large amounts of data.
The structure of data warehouses is easier for end users to navigate, understand and query against unlike the relational databases primarily designed to handle lots of transactions.
Data warehouses enable queries that cut across different segments of a company's operation. E.g. production data could be compared against inventory data even if they were originally stored in different databases with different structures.
Queries that would be complex in very normalized databases could be easier to build and maintain in data warehouses, decreasing the workload on transaction systems.
Data warehousing is an efficient way to manage and report on data that is from a variety of sources, non uniform and scattered throughout a company.
Data warehousing is an efficient way to manage demand for lots of information from lots of users.
Data warehousing provides the capability to analyze large amounts of historical data for nuggets of wisdom that can provide an organization with competitive advantage.
What is a data mart?
A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller than the corporate data warehouse.
What is the approximate size of a data warehouse?
You can estimate the approximate size of a data warehouse made up of only fact and dimension tables by estimating the approximate size of the fact tables and ignoring the sizes of the dimension tables.
To estimate the size of the fact table in bytes, multiply the size of a row by the number of rows in the fact table. A more exact estimate would include the data types, indexes, page sizes, etc. An estimate of the number of rows in the fact table is obtained by multiplying the number of transactions per hour by the number of hours in a typical work day and then multiplying the result by the number of days in a year and finally multiply this result by the number of years of transactions involved. Divide this result by 1024 to convert to kilobytes and by 1024 again to convert to megabytes.
E.g. A data warehouse will store facts about the help provided by a company’s product support representatives. The fact table is made up of a composite key of 7 indexes (int data type) including the primary key. The fact table also contains 1 measure of time (datetime data type) and another measure of duration (int data type). 2000 product incidents are recorded each hour in a relational database. A typical work day is 8 hours and support is provided for every day in the year. What will be the approximate size of this data warehouse in 5 years?
First calculate the approximate size of a row in bytes (int data type = 4 bytes, datetime data type = 8 bytes):
Size of a row = size of all composite indexes (add the size of all indexes) + size of all measures (add the size of all measures)
Size of a row (bytes) = (4 * 7) + (8 + 4) = 40 bytes
Number of rows in fact table = (number of transactions per hour) * (8 hours) * (365 days in a year)
Number of rows in fact table = 2000 * 8 * 365 = 5,840,000
Size of fact table (1 year) = (number of rows in fact table) * (size of a row)
Size of fact table (bytes per year) = 5,840,000 * 40 = 233,600,000
Size of fact table (megabytes per year) = 233,600,000 / (1024 * 1024) ≈ 222.78 MB
Size of fact table (megabytes for 5 years) = (233,600,000 * 5) / (1024 * 1024) = 1,113.89 MB
Size of fact table (gigabytes) = 1,113.89 / 1024 ≈ 1.09 GB
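The same arithmetic as a short script, reproducing the worked example above.

```python
# The worked example above, recomputed in code.
INT_BYTES, DATETIME_BYTES = 4, 8

row_bytes = 7 * INT_BYTES + DATETIME_BYTES + INT_BYTES   # 28 + 12 = 40 bytes per row
rows_per_year = 2000 * 8 * 365                           # incidents/hour * hours/day * days/year
bytes_for_5_years = row_bytes * rows_per_year * 5

print(f"Row size: {row_bytes} bytes")
print(f"Rows per year: {rows_per_year:,}")
print(f"Size for 5 years: {bytes_for_5_years / 1024**2:.2f} MB "
      f"({bytes_for_5_years / 1024**3:.2f} GB)")
```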
Monday, August 24, 2009
In how many ways can you update a relational source definition, and what are they?
Two ways:
1. Edit the definition.
2. Re-import the definition.
Where should you place the flat file to import the flat file definition into the Designer?
Place it in a local folder.
Sunday, August 23, 2009
While importing a relational source definition from a database, what source metadata do you import?
Source name
Database location
Column name
Data types
Key constraints
Saturday, August 22, 2009
What are the types of External Loader available with Informatica?
If you have a rank index for the top 10, but you pass only 5 records, what will be the output of such a Rank transformation?
If you are using an Update Strategy transformation in any of your mappings, then in the session properties you have to set Treat Source Rows As to Data Driven. If you select insert, update, or delete, the Informatica server will not consider the Update Strategy transformation when performing any database operations.
Otherwise, you can use the session-level options instead of an Update Strategy transformation in the mapping: just select Update in Treat Source Rows As along with the Update Else Insert option. This does the same job as an Update Strategy transformation, but be sure to have a primary key in the target table.
2) For Oracle: SQL*Loader. For Teradata: TPump and MultiLoad.
3) If you pass only 5 rows to the Rank transformation, it will rank only those 5 records based on the rank port.
Friday, August 21, 2009
How will you create a header and footer in the target using Informatica?
If the focus is on flat files, you can set this in the file properties while creating a mapping, or at the session level in the session properties.
What are mapping parameters and variables? In which situations can we use them?
If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables and define the values in a parameter file. Then we can edit the parameter file to change the attribute values. This makes the process simple.
Mapping parameter values remain constant. If we need to change the parameter value, then we need to edit the parameter file.
But the value of a mapping variable can be changed by using a variable function. If we need to increment the attribute value by 1 after every session run, then we can use mapping variables.
With a mapping parameter, we need to manually edit the attribute value in the parameter file after every session run.
Thursday, August 20, 2009
Which is better between connected and unconnected lookup transformations in Informatica or any other ETL tool?
If you have a well-defined source you can use a connected lookup; if the source is not well defined or comes from a different database, you can go for an unconnected lookup.
That is how we use them.
Tuesday, August 18, 2009
Explain recovering sessions.
If you stop a session or if an error causes a session to stop, refer to the session and error logs to determine the cause of failure. Correct the errors, and then complete the
session. The method you use to complete the session depends on the properties of the mapping, session, and Informatica Server configuration.
Use one of the following methods to complete the session:
· Run the session again if the Informatica Server has not issued a commit.
· Truncate the target tables and run the session again if the session is not recoverable.
· Consider performing recovery if the Informatica Server has issued at least one commit.
Monday, August 17, 2009
What is a time dimension? Give an example.
In a relational data model, for normalization purposes, year lookup, quarter lookup, month lookup, and week lookup tables are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called the TIME DIMENSION for performance and for slicing data.
This dimension helps to find the sales done on a daily, weekly, monthly, and yearly basis. We can do trend analysis by comparing this year's sales with the previous year's, or this week's sales with the previous week's.
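A hedged sketch of generating rows for such a time dimension in Python; the column set (date key, week, month, quarter, year) is a typical example rather than a prescribed layout.

```python
from datetime import date, timedelta

# Generate one time-dimension row per calendar day with common rollup attributes.
def time_dimension(start, end):
    day = start
    while day <= end:
        yield {
            "date_key": int(day.strftime("%Y%m%d")),
            "full_date": day.isoformat(),
            "week": day.isocalendar()[1],
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        }
        day += timedelta(days=1)

for row in time_dimension(date(2009, 1, 1), date(2009, 1, 3)):
    print(row)
```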
Sunday, August 16, 2009
Can you copy a session to a different folder or repository?
Yes. In addition, you can copy the workflow from the Repository Manager. This will automatically copy the mapping, associated sources, targets, and session to the target folder.
Discuss the advantages & Disadvantages of star & snowflake schema?
In a STAR schema there is no relation between any two dimension tables, whereas in a SNOWFLAKE schema there is a possible relation between the dimension tables.
Saturday, August 15, 2009
What is the difference between the Stored Procedure transformation and the External Procedure transformation?
In the case of the Stored Procedure transformation, the procedure is compiled and executed in a relational data source; you need a database connection to import the stored procedure into your mapping. In the External Procedure transformation, the procedure or function is executed outside the data source; that is, you need to build it as a DLL to access it in your mapping. There is no need for a database connection in the case of the External Procedure transformation.
Friday, August 14, 2009
How do you use mapping parameters, and what is their use?
In the Designer you will find the mapping parameters and variables options, and you can assign values to them there. Coming to their uses: suppose you are doing incremental extractions daily and your source system contains a day column. Every day you would otherwise have to open the mapping and change the day so that the particular data is extracted, which is tedious manual work. That is where mapping parameters and variables come in: once you assign a value to a mapping variable, it is carried over and changed between session runs. A rough analogy in code follows.
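As a loose analogy (not Informatica's parameter-file syntax), the sketch below persists a "last extracted day" between runs the way the repository persists a mapping variable, so each run extracts only the new rows; the state file name and source rows are invented.

```python
import json
import os

# Hypothetical state file standing in for the repository that stores the
# mapping variable's value between session runs.
STATE_FILE = "last_run_state.json"

source_rows = [                       # pretend source table with a day column
    {"day": "2009-08-13", "amount": 10},
    {"day": "2009-08-14", "amount": 20},
]

def load_last_day(default="2009-08-01"):
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["last_day"]
    return default

last_day = load_last_day()
new_rows = [r for r in source_rows if r["day"] > last_day]     # incremental extract
print(f"Extracting {len(new_rows)} rows newer than {last_day}")

if new_rows:                          # persist the new high-water mark for the next run
    with open(STATE_FILE, "w") as f:
        json.dump({"last_day": max(r["day"] for r in new_rows)}, f)
```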
Thursday, August 13, 2009
In dimensional modeling, is the fact table normalized or denormalized, in the case of a star schema and in the case of a snowflake schema?
In dimensional modeling:
Star schema: a single fact table is surrounded by a group of dimension tables composed of denormalized data.
Snowflake schema: a single fact table is surrounded by a group of dimension tables composed of normalized data.
The star schema (sometimes referenced as a star join schema) is the simplest data warehouse schema, consisting of a single fact table with a compound primary key, with one segment for each dimension and with additional columns of additive, numeric facts. The star schema makes multidimensional database (MDDB) functionality possible using a traditional relational database. Because relational databases are the most common data management system in organizations today, implementing multidimensional views of data using a relational database is very appealing. Even if you are using a specific MDDB solution, its sources are likely relational databases. Another reason for using a star schema is its ease of understanding. Fact tables in a star schema are mostly in third normal form (3NF), but dimension tables are in denormalized second normal form (2NF). If you normalize the dimension tables, they look like snowflakes (see snowflake schema) and the same problems of relational databases arise: you need complex queries and business users cannot easily understand the meaning of the data. Although query performance may be improved by advanced DBMS technology and hardware, highly normalized tables make reporting difficult and applications complex.
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy; that is, the dimension data is grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product-category table, and a product-manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.
We are using an Update Strategy transformation in a mapping; how can we know whether the insert, update, reject, or delete option has been applied while the session runs in Informatica?
In the Designer, while creating the Update Strategy transformation, uncheck the "Forward Rejected Rows" option. Any rejected rows will then automatically be written to the session log file.
Updates or inserts are known only by checking the target file or table.
Wednesday, August 12, 2009
What is a hash table in Informatica?
In hash partitioning, the Informatica Server uses a hash function to group rows of data among partitions. The Informatica Server groups the data based on a partition key. Use hash partitioning when you want the Informatica Server to distribute rows to the partitions by group. For example, you need to sort items by item ID, but you do not know how many items have a particular ID number.
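A minimal sketch of the idea behind hash partitioning; the partition count and keys are arbitrary examples.

```python
# Rows are routed by hashing the partition key, so all rows with the same
# item ID land in the same partition even if the count per ID is unknown.
NUM_PARTITIONS = 4

def partition_for(item_id):
    return hash(item_id) % NUM_PARTITIONS

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for item_id in ["A-101", "B-202", "A-101", "C-303"]:
    partitions[partition_for(item_id)].append(item_id)

print(partitions)   # the two "A-101" rows always share a partition
```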
Tuesday, August 11, 2009
What are the Joiner caches?
When a Joiner transformation occurs in a session, the Informatica server reads all the records from the master source and builds index and data caches based on the master rows.
After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
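Conceptually this is a hash join; the sketch below mimics it with a dictionary built from the master rows (an analogy, not the server's actual cache structures), using invented department and employee rows.

```python
# Master rows are read first and cached by the join key; detail rows are then
# streamed through and matched against that cache.
master_rows = [{"dept_id": 1, "dept": "Sales"}, {"dept_id": 2, "dept": "IT"}]
detail_rows = [{"dept_id": 1, "emp": "Lee"},
               {"dept_id": 2, "emp": "Patel"},
               {"dept_id": 3, "emp": "Kim"}]

master_cache = {row["dept_id"]: row for row in master_rows}   # built before the join

for detail in detail_rows:                                    # detail side streams through
    master = master_cache.get(detail["dept_id"])
    if master:                                                 # normal (inner) join
        print({**master, **detail})
```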
Sunday, August 9, 2009
How can you recover a session in sequential batches?
If you configure a session in a sequential batch to stop on failure, you can run recovery starting with the failed session. The Informatica Server completes the session and then runs the rest of the batch. Use the Perform Recovery session property.
To recover sessions in sequential batches configured to stop on failure:
1. In the Server Manager, open the session property sheet.
2. On the Log Files tab, select Perform Recovery, and click OK.
3. Run the session.
4. After the batch completes, open the session property sheet.
5. Clear Perform Recovery, and click OK.
If you do not clear Perform Recovery, the next time you run the session, the Informatica Server attempts to recover the previous session.
If you do not configure a session in a sequential batch to stop on failure, and the remaining sessions in the batch complete, recover the failed session as a standalone session.
Saturday, August 8, 2009
Data Warehousing - Introduction
1. Explain the evolution of database technology to data mining.
2. Describe the steps involved in data mining as a process of knowledge discovery.
3. How is a data warehouse different from a database? How are they similar?
4. Briefly describe different advanced database systems and their applications.
5. Define different data mining functionalities.
6. Define interestingness, and explain the issues related to interestingness.
7. Explain the classification of data mining systems.
8. Describe the issues pertaining to data mining systems.
Thursday, July 2, 2009
What kinds of data belong in a data warehouse?
Data that comes from your mainframe or client/server computing systems, data that you use to manage your business, or any type of data that has value to your business. The idea behind the data warehouse is to capture all types of data into a central location. Once this is done you have the ability to link different types of data together and turn that data into valuable information that can be used for your business needs, analysis, discovery and planning
Monday, June 29, 2009
Why would I want to access the data warehouse when I have a mainframe computing system?
Your computing system is set up to handle subject-specific, day-to-day business and transaction processing, such as payroll or course registration. The reports created in this type of system are specific to the subject matter. The benefits of putting your data into the data warehouse include:
Merging subject specific data together to create information
Standardizing data across the University
Improving turnaround time for reporting
Lowering costs because you can produce your own reports instead of costly, centrally printed and distributed mainframe reports
Sharing data or allowing others to easily access your data will free staff from the tasks of extracting data and reporting for other departments or colleges
Sunday, June 28, 2009
Definitions of OLAP on the Web:
- On-line retrieval and analysis of data to reveal business trends and statistics not directly visible in the data directly retrieved from a data warehouse. Also known as multidimensional analysis.
www.georgetown.edu/uis/ia/dw/GLOSSARY0816.html - Online Analytical Processing or OLAP provides multidimensional, summarized views of business data and is used for reporting, analysis, modeling and planning for optimizing the business. SQL Server supports OLAP via Analysis Services. Analysis Services can be used to work with data warehouses or data marts designed for sophisticated enterprise intelligence systems. These systems process queries required to discover trends and analyze critical factors. ...
www.yukonxml.com/Glossary/ - Online Analytical Processing
www.dmreview.com/resources/glossary.cfm - On-line analytical processing. OLAP is defined as providing fast access to shared multi-dimensional data. OLAP is a term used to generically refer to software and applications that provide users with the ability to store and access data in multi-dimensional cubes. (See also Multi-Dimensional Analysis and Cubes.)
www.oranz.co.uk/glossary_text.htm - On-Line Analytical Processing. A category of applications and technologies for collecting, managing, processing and presenting multidimensional data for analysis and management purposes.
www.olapreport.com/glossary.htm - On-line Analytical Processing. Information analysis that passes the FASMI test: Fast Analysis of Shared Multidimensional Information.
www.intoweb.co.za/it-terms.php - On-Line Analytical Processing. OLAP should be designed to provide end users a quick way of slicing and dicing the data.
www.1keydata.com/datawarehousing/glossary1.html - OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points-of-view.
www.creotec.com/index.php - “Drilling down” on various data dimensions to gain a more detailed view of the data. For instance, a user might begin by looking at North American sales and then drill down on regional sales, then sales by state, and then sales by major metro area. Enables a user to view different perspectives of the same data to facilitate decision-making.
it.csumb.edu/departments/data/glossary.html - On-Line Analytical Processing, originally introduced in 1994 in a paper by EF Codd, is a decision support counterpart to On-Line Transaction Processing. OLAP allows users to derive information and business intelligence from Data Warehouse systems by providing tools for querying and analyzing the information in the Warehouse. In particular, OLAP allows multidimensional views and analysis of the data for decision support processes.
planning.ucsc.edu/irps/dwh/DWHGLOSS.HTM - (online analytical processing) A technology that uses multidimensional structures to provide rapid access to data for analysis. OLAP data sets for Project Server are stored in SQL Server tables and managed with Project Web Access and SQL Server Analysis Services. See also: cube
msdn.microsoft.com/library/en-us/pjsdk/html/pjsdkCubesGlossary_HV01096283.asp - A multidimensional, multi-user, client/server computing environment for users who need to analyze consolidated enterprise data in real time. OLAP systems feature zooming, data pivoting, complex calculations, trend analyses, and data modeling.
www-306.ibm.com/software/data/db2/db2olap/docs/v82docs/miner82/glossary.htm - Software that assists in fast analysis of data stored in databases or data warehouses across multiple dimensions.
home.att.net/~c-and-d/Definitions.htm - OLAP is an acronym for online analytical processing. It is an approach to quickly provide the answer to complex database queries. It is used in business reporting for sales, marketing, management reporting, data mining and similar areas. Some people have suggested that an alternative and perhaps more descriptive term to describe the concept of OLAP is Fast Analysis of Shared Multidimensional Information, or FASMI.
en.wikipedia.org/wiki/OLAP
Saturday, June 27, 2009
Definitions of decision tree on the Web:
A listing of all the possible outcomes of an expression. Typically decision trees are for boolean expressions.
www.oasismanagement.com/frames/TECHNOLOGY/GLOSSARY/d.html - A graphical representation of all possible outcomes and the paths by which they may be reached; often used in classification tasks. The top layer consists of input nodes (eg, meteorological observations and data). Decision nodes determine the order of progression through the graph. The leaves of the tree are all possible outcomes or classifications, while the root is the final outcome (for example, a weather prediction or climate classification). ...
amsglossary.allenpress.com/glossary/browse - A decision analysis tool that represents multiple collections of rules in the form of branches on a tree, that lead to a larger value or class.
www.bridgefieldgroup.com/glos2.htm - form of flow diagram in which readers are routed according to their response to questions, usually consisting of graphic rectangles and diamonds connected with lines and arrows.
www.techcommunicators.com/dkmanual/glossary.html - A predictive model based on a branching series of tests. Each test examines the value of a single column in the data and uses it to determine the next test to apply. The results of all tests determine which label to predict.
www.purpleinsight.com/downloads/docs/visualizer_tutorial/glossary/go01.html - A diagram consisting of nodes and branches that depicts the information for a decision problem. (page 790)
highered.mcgraw-hill.com/sites/0072470267/student_view0/chapter17/glossary.html - In decision theory (for example risk management), a decision tree is a graph of decisions and their possible consequences, (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed in order to help with making decisions.
en.wikipedia.org/wiki/Decision_tree