
Differences between OLTP and OLAP

Online Transaction Processing (OLTP) system:
- In an OLTP system, the tables are in a normalized structure.
- Because the tables are normalized, DML operations such as updates and deletes are faster.
- Simple queries are run against the OLTP system.
- An OLTP system contains only current data. Since it does not hold historical data, it cannot be used for reporting.

Online Analytical Processing (OLAP) system:
- In an OLAP system, the tables are in a denormalized structure.
- Because the tables are denormalized, fewer joins between tables are needed and hence SELECT statements are faster.
- Complex queries are run against the OLAP system.
- An OLAP system contains historical data and can be used for reporting purposes.
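A minimal sketch of the contrast, using Python's built-in sqlite3 and hypothetical tables (orders for OLTP, sales_fact for OLAP) chosen only for illustration: the OLTP side touches a single current row with simple DML, while the OLAP side aggregates denormalized history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP-style: normalized table, current data only, fast row-level DML.
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
cur.execute("INSERT INTO orders VALUES (1001, 'OPEN', 250.0)")
cur.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1001")  # simple transactional update

# OLAP-style: denormalized fact table holding history, queried with aggregations.
cur.execute("CREATE TABLE sales_fact (region TEXT, order_year INTEGER, amount REAL)")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                [("EMEA", 2022, 120.0), ("EMEA", 2023, 300.0), ("APAC", 2023, 80.0)])
for row in cur.execute("""SELECT region, order_year, SUM(amount) AS total_sales
                          FROM sales_fact
                          GROUP BY region, order_year
                          ORDER BY total_sales DESC"""):
    print(row)
```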

Differences between the Star and Snowflake schemas in Dimensional Modeling

Star Schema:
1. In a star schema, the fact table sits at the center and is connected directly to the dimension tables.
2. The tables are completely denormalized in structure.
3. SQL query performance is good because fewer joins are involved.
4. Data redundancy is high, so it occupies more disk space.

Snowflake Schema:
1. A snowflake schema is an extension of the star schema in which the dimension tables are further connected to one or more additional dimension tables.
2. The tables are partially denormalized in structure.
3. SQL query performance is somewhat lower than in a star schema because more joins are involved.
4. Data redundancy is low, so it occupies less disk space than a star schema.
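A hedged sketch of the join difference, again with sqlite3 and hypothetical tables (sales_fact, product_dim, category_dim): the same "sales by category" question needs one join in the star layout and an extra join once the category is split into its own snowflaked dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Star schema: the fact table joins directly to one denormalized product dimension.
cur.execute("CREATE TABLE sales_fact (product_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE product_dim (product_id INTEGER, product_name TEXT, category_name TEXT)")
cur.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(1, 100.0), (1, 50.0)])
cur.execute("INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware')")

star = """SELECT d.category_name, SUM(f.amount)
          FROM sales_fact f JOIN product_dim d ON f.product_id = d.product_id
          GROUP BY d.category_name"""
print(cur.execute(star).fetchall())  # one join

# Snowflake: the category moves to its own table, so the same question needs a second join.
cur.execute("CREATE TABLE product_dim_sf (product_id INTEGER, product_name TEXT, category_id INTEGER)")
cur.execute("CREATE TABLE category_dim (category_id INTEGER, category_name TEXT)")
cur.execute("INSERT INTO product_dim_sf VALUES (1, 'Widget', 10)")
cur.execute("INSERT INTO category_dim VALUES (10, 'Hardware')")

snowflake = """SELECT c.category_name, SUM(f.amount)
               FROM sales_fact f
               JOIN product_dim_sf d ON f.product_id = d.product_id
               JOIN category_dim c ON d.category_id = c.category_id
               GROUP BY c.category_name"""
print(cur.execute(snowflake).fetchall())  # two joins
```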

DWH Staging Area Importance

The Data Warehouse staging area is a temporary location where data from the source systems is pulled in. A staging area is mainly required in a data warehousing architecture for timing reasons: in short, all required data must be available before it can be integrated into the data warehouse. Due to varying business cycles, data processing cycles, hardware and network resource limitations, and geographical factors, it is not feasible to extract all the data from all operational databases at exactly the same time. Another reason for having a staging area is to collect data from heterogeneous databases and process it there. Not all businesses require a Data Warehouse staging area; for many, it is feasible to use ETL to copy data directly from the operational databases into the data warehouse.
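A minimal sketch of the timing argument, assuming two hypothetical source extracts (orders and customers) that arrive on different schedules: each is landed in its own staging table as it arrives, and integration runs only once everything required is present.

```python
import sqlite3

staging = sqlite3.connect(":memory:")

def land_in_staging(conn, table, rows, columns):
    """Load one source extract into its own staging table, whenever it arrives."""
    conn.execute(f"DROP TABLE IF EXISTS stg_{table}")
    conn.execute(f"CREATE TABLE stg_{table} ({', '.join(columns)})")
    conn.executemany(f"INSERT INTO stg_{table} VALUES ({', '.join('?' * len(columns))})", rows)

# Source extracts arrive at different times; each is staged independently.
land_in_staging(staging, "orders", [(1, 101, 250.0)], ["order_id", "customer_id", "amount"])
land_in_staging(staging, "customers", [(101, "Acme Corp")], ["customer_id", "customer_name"])

# Integration into the warehouse happens only after all required staging tables are loaded.
required = {"stg_orders", "stg_customers"}
loaded = {r[0] for r in staging.execute("SELECT name FROM sqlite_master WHERE type='table'")}
if required <= loaded:
    rows = staging.execute("""SELECT o.order_id, c.customer_name, o.amount
                              FROM stg_orders o JOIN stg_customers c
                              ON o.customer_id = c.customer_id""").fetchall()
    print(rows)  # ready to load into the warehouse fact table
```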

Parameter Files in DVO

We can use a parameter file in Informatica DVO while defining the WHERE clause of either a table pair or a single-table test. However, there are certain rules for using one. We must add the parameters to the parameter file before we can add them to a table-pair or single-table definition. The parameters in the parameter file must be in the format $$<parameter name>=value. We need to place the parameter file in a location that is accessible by the Data Validation Option Client or the DVOCmd instance that runs the tests. If we specify the location of the parameter file in the Data Validation Option Client, we enter the parameter file location as a network path. For example, if the Informatica services run on Windows, use the following syntax: \\<host name of the Informatica services machine>\<shared folder path>\<parameter file name>. If we specify the parameter file on a UNIX machine, the Data Validation Option Client on Windows cannot access the parameter file. In this case, we need to install D
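A hedged illustration of what such a parameter file and a WHERE clause referencing it might look like; the parameter name ($$LOAD_DATE), its value, and the column name are assumptions for illustration, not taken from the product documentation.

```python
# Hypothetical parameter file contents in the $$<parameter name>=value format.
param_file_contents = "$$LOAD_DATE=2024-01-31\n"

with open("dvo_params.txt", "w") as f:
    f.write(param_file_contents)

# In the table-pair or single-table definition, the WHERE clause would then reference the parameter.
where_clause = "load_date = '$$LOAD_DATE'"
print(where_clause)
```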

ETL Testing

A data warehouse is a relational database that is a subject-oriented, integrated, time-variant and nonvolatile collection of data used to support the strategic decision-making process.

Subject Oriented: Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not cha
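A small sketch of the example question above ("Who was our best customer for this item last year?") against a hypothetical sales table, using sqlite3; the table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, item TEXT, sale_year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("Acme Corp", "Widget", 2023, 500.0),
    ("Globex",    "Widget", 2023, 750.0),
    ("Acme Corp", "Widget", 2022, 900.0),
])

# "Who was our best customer for this item last year?"
best = conn.execute("""SELECT customer, SUM(amount) AS total
                       FROM sales
                       WHERE item = 'Widget' AND sale_year = 2023
                       GROUP BY customer
                       ORDER BY total DESC
                       LIMIT 1""").fetchone()
print(best)  # ('Globex', 750.0)
```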

Informatica DVO

What is Data Validation Option, and what can be tested? To answer these two questions: Data Validation Option is an ETL testing tool that comes along with PowerCenter and enables us to test and validate data. Data Validation Option reads table definitions from PowerCenter metadata repositories and checks the data for inconsistencies. It verifies that moved or transformed data is complete and accurate and has not been changed because of errors in the movement or transformation process. It will not check the correctness of transformations or mappings designed in Informatica.

Data Validation stages: A typical workflow, or set of stages, for data validation consists of multiple tasks:
1. Data Validation Option reads one or more PowerCenter metadata repositories.
2. We define the validation rules (creation of tests) in Data Validation Option.
3. We run the rules (tests) to ensure the data conforms to the validation rules. When we do this, Data Validation Option performs the following tasks: 1.
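To make the idea of a validation rule concrete, here is a simplified, tool-agnostic sketch of what a table-pair comparison amounts to (row counts match and no keys are missing in the target), written in plain Python with sqlite3 rather than in DVO's own test definitions; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE tgt_customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO src_customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO tgt_customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

def table_pair_tests(conn, src, tgt, key):
    """Simplified table-pair checks: counts match and no source keys are missing in the target."""
    results = {}
    src_count = conn.execute(f"SELECT COUNT(*) FROM {src}").fetchone()[0]
    tgt_count = conn.execute(f"SELECT COUNT(*) FROM {tgt}").fetchone()[0]
    results["count_match"] = (src_count == tgt_count)
    missing_in_tgt = conn.execute(
        f"SELECT COUNT(*) FROM {src} s LEFT JOIN {tgt} t ON s.{key} = t.{key} WHERE t.{key} IS NULL"
    ).fetchone()[0]
    results["no_missing_rows"] = (missing_in_tgt == 0)
    return results

print(table_pair_tests(conn, "src_customers", "tgt_customers", "id"))
```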

Test Scenarios to be considered during each phase of E-T-L process till Reporting

During the Extraction Process:
- Extraction scripts are extracting all the attributes/columns required as per the extraction design.
- Extraction scripts have access to all the source systems from which data needs to be extracted.
- The extracted data format (flat file/complex file/table) is as per the interface specification design.
- Extraction scripts are extracting complete data (full dump/delta) as per the Changed Data Capture (CDC) design.
- Appropriate audit/error logs for extraction are getting generated.
- The extraction process finishes within the expected time frame.

During the Transformation Process:
- All the data required for loading has been extracted and is in the correct format.
- Attribute-level source-to-target mapping is as per the mapping design.
- Business rules have been implemented correctly as per the mapping design.
- There is no data leakage during the various transformation phases in the workflow.
- Data being rejected for any reason
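A hedged sketch of one transformation-phase check from the list above, namely verifying that a business rule was implemented as per the mapping design: the test re-applies the rule independently and compares the result with the loaded target rows. The rule used here (net_amount = amount * (1 - discount)) and the field names are assumptions for illustration only.

```python
# Source rows as extracted from the operational system (hypothetical fields).
source_rows = [
    {"order_id": 1, "amount": 100.0, "discount": 0.10},
    {"order_id": 2, "amount": 200.0, "discount": 0.00},
]

# Rows as they landed in the target after the ETL transformation.
target_rows = {
    1: {"order_id": 1, "net_amount": 90.0},
    2: {"order_id": 2, "net_amount": 200.0},
}

def expected_net_amount(row):
    """Independent re-implementation of the mapping rule, used as the test oracle."""
    return round(row["amount"] * (1 - row["discount"]), 2)

mismatches = [
    (row["order_id"], expected_net_amount(row), target_rows[row["order_id"]]["net_amount"])
    for row in source_rows
    if expected_net_amount(row) != target_rows[row["order_id"]]["net_amount"]
]
print("Mismatched rows:", mismatches)  # an empty list means the rule was applied correctly
```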

BIG DATA TESTING POINT OF VIEW

1. Overview
As organizations adopt "Big Data" as their data analytics solution, they are finding it difficult to define a robust testing strategy and to set up an optimal test environment for Big Data. This is mostly due to a lack of knowledge and understanding of Big Data testing. Big Data involves processing huge volumes of structured and unstructured data across different nodes using frameworks and languages such as MapReduce, Hive and Pig. A robust testing strategy needs to be defined well in advance in order to ensure that the functional and non-functional requirements are met and that the data conforms to acceptable quality. In this document we intend to define recommended test approaches for testing Big Data projects.

2. Definition
We are living in the data age. Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years a