Surrogate Key
A surrogate key
may also be called as:
- synthetic key
- entity identifier
- system-generated key
- database sequence number
- fact less key
- technical key
- arbitrary unique identifier
-It
is the system generated key which cannot be edited by the user.
-It is the primary key of the dimension table in warehouse.
-It is nothing but the sequence which generates the numbers for primary key column.
-It is the primary key of the dimension table in warehouse.
-It is nothing but the sequence which generates the numbers for primary key column.
Surrogate
keys are system generated keys. They are integers.
Surrogate keys are extremely useful when having type 2 data (i.e. storing historical information)
For ex: Consider one has a table in which a person and his location are stored. Now when his location is changed and we want to keep a historical record of the same it is stored with a surrogate key that will help us to uniquely identify the record. This is also a reason that OLTP keys are not used in the warehouse and a separate dimension or surrogate key is maintained.
Surrogate keys are extremely useful when having type 2 data (i.e. storing historical information)
For ex: Consider one has a table in which a person and his location are stored. Now when his location is changed and we want to keep a historical record of the same it is stored with a surrogate key that will help us to uniquely identify the record. This is also a reason that OLTP keys are not used in the warehouse and a separate dimension or surrogate key is maintained.
Surrogate
keys are the system generated keys and used to eliminate the redundancy in the
primary keys. So they are always integer numbers.
Generate surrogate key
Goal: Fill in a data warehouse
dimension table with data which comes from different source systems and assign
a unique record identifier (surrogate key) to each record.
Scenario
overview and details : To illustrate this
example, we will use two made up sources of information to provide data about customer’s
dimension. Each extract contains customer records with a business key (natural
key) assigned to it.
In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand and independent from changes in source system than using a business key. Also, if a surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major efforts.
Surrogate key generation mechanism may vary depending on the requirements; however the inputs and outputs usually fit into the design shown below:
In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.
A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand and independent from changes in source system than using a business key. Also, if a surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major efforts.
Surrogate key generation mechanism may vary depending on the requirements; however the inputs and outputs usually fit into the design shown below:
Inputs:
- an input represented by an extract from the source system
- data warehouse table reference for identifying the existing records
- maximum key lookup
Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new records
How to test the surrogate key?
usually test with the following way
1. total count = max(sgk) - min(sgk) +1
2. select sgk from table group by sgk having count(*)>1;
1. total count = max(sgk) - min(sgk) +1
2. select sgk from table group by sgk having count(*)>1;
Hi mahesh , can you explain what is the above two syntax do -
ReplyDelete1. total count = max(sgk) - min(sgk) +1
2. select sgk from table group by sgk having count(*)>1;