Data Warehouse Testing
A data warehouse:
- is a relational database that is designed for query and analysis rather than for transaction processing.
- usually contains historical data derived from the transaction data, also data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
3.is subject-oriented,
integrated, time-variant and non-volatile collection of data in support of
management's decision making process. 
- Subject-Oriented: A data warehouse can be used to analyze a particular subject area.
- Integrated: A data warehouse integrates data from multiple data sources.
- Non volatile: Data once loaded is usually never changed.
- Time-Variant: Historical data is kept in a data warehous
- an extraction, transformation, and loading (ETL),
- an online analytical processing (OLAP) engine,
- client analysis tools, and other applications that manage the process of gathering data and delivering it to business users.
- A data warehouse is optimized to perform well for a wide variety of possible query operations.
- A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques.
- Data warehouses use de-normalized schemas (star schema) to optimize query performance.
- A data warehouse query scans millions of rows.
Data warehouse Architecture:
Data warehouses and their architectures vary depending upon the specifics of an organization's situation depending on its data modeling, security, reference data management, staging area etc.
1. Data Warehouse Architecture (Basic):- Data from different sources are transformed in the data warehouse directly. This data, once processed is used by users for the reporting and analysis thereafter.
2. Data Warehouse Architecture (with a Staging Area):- Data from different sources get processed in a place called staging area before it reaches the warehouse.
3. Data Warehouse Architecture (with a Staging Area and Data Marts):- Data mart is created on top of the data warehouse to cater to different business function areas or geographies. The data that are accessed by end users in the front end applications are fed by the data mart.
Terminology used in Data Warehouse:
Meta Data: - Data that describes data and other structures, such as objects, business rules, and processes. A repository contains metadata.
Summary Data: - A Summary Data Report offers you data about workloads and filters, and then generates recommendations.
Data Mart: - A data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, the data can be derived from an enterprise-wide data warehouse. In an independent data mart, data can be collected directly from sources.
Staging Area: - A place where data is processed before entering the warehouse.
Data Mining: - Data mining (or data discovery) is the process of autonomously extracting useful information or knowledge (“actionable assets”) from large data stores or sets.
Different Types of Tables in Data Warehouse
Dimension table (lookup or reference tables.)
Dimension tables describe the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Fact Table
A table in a star schema that contains facts. A fact table might contain either detail level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).
Aggregation Table
The process of consolidating data values into a single value. The data can then be referred to as aggregate data. Aggregation is synonymous with summarization, and aggregate data is synonymous with summary data.
Level value table
A database table that stores the values or data for the levels you created as part of your dimensions and hierarchies.
Phases in testing of Data Warehouses:
Extraction Transformation & Loading:
Business functionality testing, Performance testing and Data Quality testing are required in this phase.
Data Load:
Configuration testing is required in this phase.
Data Loaded:
Availability / Performance testing, Security testing, Recovery testing and Data Quality testing are required in this phase.
Incremental Data Addition:
Regression testing required in the phase.
End to End Business Process:
Interoperability, API and Reports Testing are required in this phase
Data Warehouse Functional Validations:
Standard Validation
- Data Validation: This evaluates and makes sure that a correctness of the data in the warehouse
- Count Validation: This refers to running database queries on a given table or related tables to get the total count of a given transaction or business scenario
- End to End testing: End-to-end testing here refers to validating the data as they pass various stages of the system
- Straight/Direct move: Data is directly moved from source to target system without applying any transformation rule
- Data transformation: Validates whether the transformation is in line with the business rules applied to the source data base or not
- Look up validation: Positive and negative test cases will be created to verify ETL process and lookup validations rules with valid and invalid data respectively.
- Defaulting: The fields in Source (to be defaulted) having invalid values w.r.t. the requirements in Target get defaulted to some appropriate value during ETL run.
- Signage: Various General Ledger amounts are assigned appropriate sign (based on certain rules) before being populated in Target tables.
- Translation: Usually projects may need to translate the currency from one format to another.
- Filtering: Some records have filtering criteria attached to them. The filtered out record will be left behind in the Source. It helps in maximizing the performance of the query in the database and retrieves only required data.
- Average Balance Calculation: The average balance is usually computed every weekend or month end.
- Data integrity validation: Testing to validate the exactness of the data stored. Testing conducted on a periodic basis as data get changed over a period of time in database.
- External field validation: Validates whether the database is able to perform the same function/operation for newly added fields or existing fields in a table or not.
- Load Testing –Incremental load, bulk load and full load
- Performance testing
- Volume testing
Testing Strategies: - Customized test Cases -> Aggregation Validation -> Automation à Data Sampling -> 100% Verification
1. Data Sampling is applicable for Straight hold and homogeneous data. Pros: Time Effective is an advantage, Cons: Precision cannot be high.
2. Automating Data Warehouse is applicable for Straight loads that require exhaustive testing. Pros: High confidence level and more Coverage of scenarios. Cons: Heavily dependent on source target mapping. Higher elapsed time and COST poor customer buying. All scenarios cannot be automated
3. Customized test cases are applicable for Complex data load and hetero generous data load. Pros: Mitigation of business risks. Cons: Good business knowledge required
Challenges in Data Warehouse testing
- Data Warehouse Testing needs to deal with large volumes of data
- Complex transformation logic necessitates high skill levels to design tests with optimal coverage
- Higher Cost of Quality as the impact of defect slippage is high
- Scattered sources of input data and different frequencies of updates
- High turnaround time leading to delay in time to market
- Defining the scope of testing is very important in every project
- In DWH testing projects, estimation is a big challenge
- How to ensure data consistency during transfer from flat file format to dimensional and Identification of Negative Test Scenarios
- Execution time, Huge error logs, Test result verification, Domain Knowledge and Knowledge on SQL
- System Constraints:- Database Size, Slow performance and automation issues
- Environmental Setup Challenges :- Setting up of Source Systems, Staging Areas and DWH and Setting up ETL mappings for the DWH
- Critical Success factors of data warehouse testing
- The right data populated in the database?
- Is the data good enough for the customer to make a strategic business decision
- The current business scenarios have to be tested.
- Future scenarios based on the business goals of the company need to be envisaged and tested.
- Quantity/Size of the data migrated
- One of the major issues in testing of migration project is the size of the data involved, which runs to millions of records especially if historical data load is present.
 
 
Thank you for providing helpful information. I will must be share your blog in my close friends and others who are needed.
ReplyDeleteChicago food grade warehouse
Thanks for the much helpful Bert Bevis...
DeletePlease let me know if you need of any help and can even undertake some projects .Thank you.
DeleteYour company is one of the most reliable data warehouse services companies which works according to the functions and requirements of the data warehouse management.
ReplyDelete
ReplyDeleteNice information, this is will helpfull a lot, Thank for sharing, Keep do posting i like to follow this informatica online training
informatica online course
informatica bdm training
informatica developer training
It is amazing to visit your site. Thanks for sharing this information, this is useful to me...
ReplyDeleteETL Testing Training Hyderabad
ETL Testing Online Course
Really interesting blog you shared here. Thanks for sharing it.
ReplyDeleteERP Software Companies