Surrogate Key



A surrogate key may also be called as:
  • synthetic key
  • entity identifier
  • system-generated key
  • database sequence number
  • fact less key
  • technical key
  • arbitrary unique identifier
-It is the system generated key which cannot be edited by the user.
-It is the primary key of the dimension table in warehouse.
-It is nothing but the sequence which generates the numbers for primary key column
.

Surrogate keys are system generated keys. They are integers.

Surrogate keys are extremely useful when having type 2 data (i.e. storing historical information)
For ex: Consider one has a table in which a person and his location are stored. Now when his location is changed and we want to keep a historical record of the same it is stored with a surrogate key that will help us to uniquely identify the record. This is also a reason that OLTP keys are not used in the warehouse and a separate dimension or surrogate key is maintained
.

Surrogate keys are the system generated keys and used to eliminate the redundancy in the primary keys. So they are always integer numbers.
Generate surrogate key
Goal: Fill in a data warehouse dimension table with data which comes from different source systems and assign a unique record identifier (surrogate key) to each record.
Scenario overview and details   : To illustrate this example, we will use two made up sources of information to provide data about customer’s dimension. Each extract contains customer records with a business key (natural key) assigned to it.

In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand and independent from changes in source system than using a business key. Also, if a surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major efforts.

Surrogate key generation mechanism may vary depending on the requirements; however the inputs and outputs usually fit into the design shown below:

Inputs:
- an input represented by an extract from the source system
- data warehouse table reference for identifying the existing records
- maximum key lookup

Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records 


How to test the surrogate key?

usually test with the following way

1. total count = max(sgk) - min(sgk) +1

2. select sgk from table group by sgk having count(*)>1; 


Comments

  1. Hi mahesh , can you explain what is the above two syntax do -

    1. total count = max(sgk) - min(sgk) +1
    2. select sgk from table group by sgk having count(*)>1;

    ReplyDelete

Post a Comment

Popular posts from this blog

Data Migration Testing Startegy

Informatica DVO

ETL Testing