
BIG DATA TESTING POINT OF VIEW

1. Overview

As organizations adopt "Big Data" as their data analytics solution, they find it difficult to define a robust testing strategy and to set up an optimal test environment for Big Data. This is mostly due to a lack of knowledge and understanding of Big Data testing. Big Data involves processing huge volumes of structured and unstructured data across different nodes using frameworks and languages such as MapReduce, Hive, and Pig.

A robust testing strategy needs to be defined well in advance in order to ensure that the functional and non-functional requirements are met and that the data conforms to acceptable quality.

In this document we define recommended test approaches for testing Big Data projects.

2. Definition

We are living in the data age. Every day, we create 2.5 quintillion bytes of data: so much that 90% of the data in the world today has been created in the last two years a...
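To make the processing model concrete, here is a minimal sketch of the map/reduce idea in plain Python (a word count, the classic MapReduce example). This is an illustration of the programming model only, not Hadoop API code; the record data is hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle/reduce: group pairs by key and sum the counts per word
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Hypothetical input records
records = ["big data testing", "big data strategy"]
word_counts = reduce_phase(map_phase(records))
```

In a real cluster the map and reduce phases run in parallel across nodes, which is exactly why a test strategy must verify data quality at each stage rather than only at the end.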

How to avoid problematic and unwanted connections while refreshing the Connections in DVO Repository

We can refresh the folder list, the Sources and Targets folders, and the connections in the DVO repository; however, the whole refresh process consumes a lot of time. To save time, we can explicitly follow certain steps to skip unwanted connections from being refreshed from the PowerCenter repository into the DVO repository. Here are the steps we can follow to do so:

§ Close the Data Validation Option Client
§ Create a text file and name it ‘ExcludeConns.txt’
§ Add the file to the root directory where Data Validation Option is installed. For example, add the ‘ExcludeConns.txt’ file to the following directory: C:\Program Files\Informatica9.5\DVO952
§ Edit the file and add the names of the connections that are to be excluded from importing/refreshing
§ Open the Data Validation Option Client
§ Refresh the connections again

Note:
1) We can run the pmrep listConnections command to ...
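The file-creation steps above can be scripted. The sketch below writes an ExcludeConns.txt with one connection name per line; the connection names here are hypothetical examples, and the install directory is the example path from the steps above.

```python
from pathlib import Path

# Hypothetical connection names to exclude from the DVO refresh
excluded_connections = ["LEGACY_ORACLE_CONN", "TEMP_FLATFILE_CONN"]

# Example DVO install directory from the steps above (adjust to your install)
dvo_install_dir = Path(r"C:\Program Files\Informatica9.5\DVO952")
exclude_file = dvo_install_dir / "ExcludeConns.txt"

def write_exclude_file(path, connections):
    # Write one connection name per line
    path.write_text("\n".join(connections) + "\n")
```

Run `write_exclude_file(exclude_file, excluded_connections)` with the DVO client closed, then reopen the client and refresh the connections.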

Maintain Multiple Data Validation Option instances

Maintain Multiple Informatica DVO instances
========================================
We can create multiple instances of Data Validation Option, such as a Dev, SIT, or Prod instance, with a single DVO installation. By default, Data Validation Option (DVO) is installed in the C:\Program Files\DVOSoft directory, and the Data Validation Option configuration files are stored in the C:\Documents and Settings\ \DataValidator directory. If you want to maintain multiple instances of DVO (Dev, QA, and Prod), we can do the following:

Step 1: Delete the contents of the C:\Documents and Settings\ \DataValidator\* directory.
Step 2: Create Dev, QA, and Prod folders in the same directory:
C:\Documents and Settings\ \DataValidator\Dev
C:\Documents and Settings\ \DataValidator\Qa
C:\Documents and Settings\ \DataValidator\Prod
Step 3: Start DataValidator with the following command:
DataValidator.exe “C:\Documents and Settings\ \DataValidator\Dev”
Dat...
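Steps 1 and 2 can be sketched as a small script. The base path is parameterized because the document's example path contains an elided Windows user name; the instance names come from the steps above.

```python
from pathlib import Path

def prepare_instances(base, instances=("Dev", "Qa", "Prod")):
    # Create one configuration folder per DVO instance under the base
    # DataValidator directory (Step 2 above). Each instance is then
    # launched by pointing DataValidator.exe at its folder (Step 3), e.g.:
    #   DataValidator.exe "<base>\Dev"
    for name in instances:
        (Path(base) / name).mkdir(parents=True, exist_ok=True)
    return [str(Path(base) / name) for name in instances]
```

Keeping one folder per instance means each environment (Dev, QA, Prod) gets its own isolated DVO configuration while sharing a single installation.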

Enabling debug logging in DVO

HOW TO: Enable debug logging in DVO
================================
To enable debug logging in Data Validation Option (DVO), edit the log4j.properties file under the DVO installation directory and change the logging level from info to debug.

Steps:
====
Go to the C:\Program Files (x86)\DVOSoft914\config directory
Edit the log4j.properties file
Change log4j.rootLogger from info to debug: log4j.rootLogger=debug, stdout
Save and close the file
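The change amounts to a one-line edit in log4j.properties, which should look roughly like this afterwards (the stdout appender name is taken from the line quoted above):

```properties
# Before: log4j.rootLogger=info, stdout
# After, with debug logging enabled:
log4j.rootLogger=debug, stdout
```

Remember to revert the level to info once troubleshooting is done, since debug logging can grow the log files quickly.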

Dimensional Modelling in DWH

Data Warehouse Dimensional Modelling (Types of Schemas)

There are four types of schemas:

Star Schema: A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex: a simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.

Snowflake Schema: A snowflake schema is an enhancement of the star schema in which the dimension tables are normalized into additional related tables. Snowflake schemas are useful when there are low-cardinality attributes in the dimensions.

Galaxy Schema: A galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.

Fact Constellation Schema: The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy like territory, region, country, state and city...