Tuesday, August 25, 2009

What is the approximate size of a data warehouse?

You can estimate the approximate size of a data warehouse made up of only fact and dimension tables by estimating the approximate size of the fact tables and ignoring the sizes of the dimension tables.

To estimate the size of the fact table in bytes, multiply the size of a row by the number of rows in the fact table. A more exact estimate would include the data types, indexes, page sizes, etc. An estimate of the number of rows in the fact table is obtained by multiplying the number of transactions per hour by the number of hours in a typical work day and then multiplying the result by the number of days in a year and finally multiply this result by the number of years of transactions involved. Divide this result by 1024 to convert to kilobytes and by 1024 again to convert to megabytes.

E.g. A data warehouse will store facts about the help provided by a company’s product support representatives. The fact table is made of up of a composite key of 7 indexes (int data type) including the primary key. The fact table also contains 1 measure of time (datetime data type) and another measure of duration (int data type). 2000 product incidents are recorded each hour in a relational database. A typical work day is 8 hours and support is provided for every day in the year. What will be approximate size of this data warehouse in 5 years?

First calculate the approximate size of a row in bytes (int data type = 4 bytes, datetime data type = 8 bytes):

size of a row = size of all composite indexes (add the size of all indexes) + size of all measures (add the size of all measures).

Size of a row (bytes) = (4 * 7) + (8 + 4).
Size of a row (bytes) = 40 bytes.

Number of rows in fact table = (number of transactions per hour) * (8
hours) * (365 days in a year).
Number of rows in fact table = (2000 product incidents per hour) * (8
Hours ) * (365 days in a year).
Number of rows in fact table = 2000 * 8 * 365
Number of rows in fact table = 5840000
Size of fact table (1 year) = (Number of rows in fact table) * (Size of a
row)

Size of fact table (bytes per year) = 5840000 * 40
Size of fact table (bytes per year) = 233600000.
Size of fact table (megabytes per year) = 233600000 / (1024*1024)
Size of fact table (in megabytes for 5 years) =
(23360000 * 5) / (1024 *1024)
Size of fact table (megabytes) = 1113.89 MB
Size of fact table (gigabytes) = 1113.89 / 1024
Size of fact table (gigabytes) = 1.089 GB

No comments: