Generating Unique IDs in a Distributed Environment at High Scale:
Recently I was working on a project that required unique IDs in a distributed environment; we used them as primary keys in our databases. On a single server, generating unique IDs is easy: Oracle provides sequences (a counter that is incremented for each new ID), and most other SQL databases offer an auto-increment primary key column.
In SQL, we can declare it when creating the table (MySQL syntax is shown here; the exact keyword varies between databases).
CREATE TABLE example (
primary_key INT AUTO_INCREMENT PRIMARY KEY,
...
);
In Oracle, we create a sequence and use it when inserting into the table.
CREATE SEQUENCE seq_example
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10;
INSERT INTO example (primary_key)
VALUES (seq_example.nextval);
On a single server, it's pretty easy to generate a primary key. In a distributed environment it becomes a problem, because the key must be unique across all the nodes. Let’s see how we can do that.
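To make the problem concrete, here is a minimal sketch of what goes wrong when every node simply keeps its own counter. The `LocalCounterNode` class is a hypothetical stand-in for an application node, not part of any real system.

```python
import itertools


class LocalCounterNode:
    """Hypothetical node that numbers rows with its own private counter."""

    def __init__(self, name):
        self.name = name
        # Each node starts counting at 1, unaware of the other nodes.
        self._counter = itertools.count(start=1)

    def next_id(self):
        return next(self._counter)


node_a = LocalCounterNode("a")
node_b = LocalCounterNode("b")

ids_a = [node_a.next_id() for _ in range(3)]
ids_b = [node_b.next_id() for _ in range(3)]

# Both nodes hand out the same values, so the keys clash
# as soon as the rows land in a shared store.
collisions = set(ids_a) & set(ids_b)
print(ids_a, ids_b, collisions)
```

Every ID produced by one node is also produced by the other, which is exactly what a primary key must never allow.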
There are a few approaches, each with its own pros and cons, so let’s go through them one by one.
Database Ticket Servers:
These are centralized auto-increment servers that respond with a unique ID whenever a node requests one. The problem with this approach is that the ticket server is a single point of failure: every node depends on it, so if it goes down, none of the nodes can get new IDs and processing stops.
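The idea can be sketched in a few lines. This is only an in-process simulation under my own assumptions: the `TicketServer` class and the `worker` function are hypothetical, threads stand in for nodes, and a real ticket server would sit behind a database or a network call.

```python
import threading


class TicketServer:
    """In-process stand-in for a central ticket server (hypothetical)."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def next_id(self):
        # The lock plays the role of the single database row or counter
        # that every request has to go through.
        with self._lock:
            ticket = self._next
            self._next += 1
            return ticket


def worker(server, out, count):
    """Simulates one node requesting `count` IDs from the server."""
    for _ in range(count):
        out.append(server.next_id())


server = TicketServer()
results = [[] for _ in range(4)]
threads = [
    threading.Thread(target=worker, args=(server, results[i], 1000))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_ids = [ticket for node_ids in results for ticket in node_ids]
print(len(all_ids), len(set(all_ids)))
```

Every ID is unique because all requests are serialized through one place, and that same property is what makes the server a bottleneck and a single point of failure.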
UUID:
UUIDs are 128-bit numbers, usually written as 32 hexadecimal digits, that are unique for all practical purposes: the chance of the same UUID being generated twice is negligible. What a UUID contains depends on its version. A version 1 UUID combines the network address of the host that generated it, a timestamp (the time at which it was generated), and a clock sequence component that is often randomly initialized, while a version 4 UUID consists almost entirely of random bits.
According to Wikipedia, regarding the probability of duplicates in random UUIDs:
Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.
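As a rough sanity check of that figure, the birthday approximation can be evaluated directly. This sketch assumes version 4 UUIDs with 122 random bits (128 bits minus the 6 fixed version and variant bits); the function name is my own.

```python
import math

# Assumption: version 4 UUIDs, which carry 122 random bits.
RANDOM_BITS = 122


def uuids_for_collision_probability(p, bits=RANDOM_BITS):
    """Approximate number of random IDs needed before the chance of at
    least one collision reaches p.

    Birthday approximation: n ~ sqrt(2 * 2**bits * ln(1 / (1 - p))).
    """
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)
seconds_per_year = 365.25 * 24 * 3600
years_at_1e9_per_second = n_half / 1e9 / seconds_per_year

print(f"{n_half:.3e} UUIDs, about {years_at_1e9_per_second:.0f} years at 1e9 per second")
```

Under these assumptions the result is a few times 10^18 UUIDs, which is on the order of a century at one billion UUIDs per second, in line with the quoted estimate.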
- UUIDs do not require coordination between nodes and can be generated independently.
But the problem with UUIDs is that they are large and do not index well: at 128 bits each, they make indexes bigger, which affects query performance.
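Here is a short sketch using Python's standard `uuid` module, showing that no coordination is needed and how large the result is. The variable names are my own.

```python
import uuid

# Version 4: built from random bits, no coordination with other nodes.
random_id = uuid.uuid4()

# Version 1: built from the host's network address and a timestamp.
time_based_id = uuid.uuid1()

print(random_id, "version", random_id.version)
print(time_based_id, "version", time_based_id.version)

# 16 bytes in binary form, 36 characters in the usual text form,
# compared with 8 bytes for a 64-bit integer key.
print(len(random_id.bytes), len(str(random_id)))
```

Storing the 16-byte binary form rather than the 36-character string keeps the index smaller, but it is still twice the size of a 64-bit integer key.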
Twitter Snowflake:
Twitter Snowflake is a dedicated network service for generating 64-bit unique IDs at high scale with some simple guarantees.
The IDs are made up of the following components:
- Epoch timestamp in millisecond precision: 41 bits (gives us about 69 years with a custom epoch)
- Machine id: 10 bits (gives us up to 1024 machines)
- Sequence number: 12 bits (a local counter per machine that rolls over after 4096 IDs within the same millisecond)
- The extra 1 bit is left unused; it is the sign bit, kept at 0 so that the ID stays positive in a signed 64-bit integer.
So the generated ID is 64 bits, which addresses the size and latency problems, but it also introduces a new one: extra servers to run and maintain.
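The bit layout above can be sketched as a small generator. This is a simplified illustration, not Twitter's implementation: the class name, the `decode` helper, and the custom epoch (2020-01-01 UTC) are assumptions of mine, and machine ids are passed in by hand rather than assigned by a coordination service.

```python
import threading
import time

# Assumption: a custom epoch of 2020-01-01 00:00:00 UTC, in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000

MACHINE_ID_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_ID_BITS) - 1  # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1      # 4095


class SnowflakeGenerator:
    """Simplified Snowflake-style generator (illustrative sketch)."""

    def __init__(self, machine_id):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the per-machine counter.
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Counter rolled over: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            # Layout: 41-bit timestamp | 10-bit machine id | 12-bit sequence.
            return (
                (timestamp << (MACHINE_ID_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake_id):
    """Splits an ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    timestamp_ms = (snowflake_id >> (SEQUENCE_BITS + MACHINE_ID_BITS)) + CUSTOM_EPOCH_MS
    return timestamp_ms, machine_id, sequence


generator = SnowflakeGenerator(machine_id=5)
ids = [generator.next_id() for _ in range(10_000)]
print(ids[0], decode(ids[0]))
```

Because the timestamp occupies the high bits, IDs from one generator sort by creation time, and two machines never collide as long as their machine ids differ.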
That’s it
Happy Learning.