Replication – How Is It Done In Distributed Systems?

February 3, 2022 by varunshrivastava

Replication simply means keeping a copy of the same data on multiple machines connected via a network. These machines may be spread across different parts of the world and are not necessarily in the same location.

There are many benefits of having this redundancy that you will see later in this article.

The objective of this article is to provide some insight into the different ways data replication is done in a distributed system. As you will see, replication is not easy, and engineers have to make many trade-offs along the way depending on the system’s use case.

Table of Contents

  • Why Replication Is Important?
  • What’s So Difficult?
  • Leaders and Followers
    • How can we make all replicas process every write?
    • Synchronous VS Asynchronous Replication
      • Advantage
      • Disadvantage
  • Conclusion

Why Replication Is Important?

There are many reasons to replicate data. Some of the important ones are:

  • To Increase Availability. Parts of your system can stop working at any time (hardware failure, network failure, natural hazards, and so on), and having replicas helps the system keep working and stay available.
  • To Reduce Latency. Replication keeps data geographically close to the users, so the time required for data to reach a user’s machine is minimal.
  • To Increase Read Throughput. Replication scales out the number of machines that can serve read queries, which increases the read throughput of the system.

What’s So Difficult?

You may ask: it’s just replication, what could be so difficult?

This is a good question. Copying data is not very difficult in itself; the difficulty comes from the nature of the data.

If the data were static and never changed, replication would be simple: copy it once to every machine and you are done. But in the real world data keeps changing, and keeping those changes in sync across multiple servers requires a lot of thought.

The following three are the most popular techniques/algorithms to do replication:

  • Single-Leader Replication
  • Multi-Leader Replication
  • Leaderless Replication

Each has its own pros and cons, and the choice depends on the system’s use case.

Leaders and Followers

I defined replication as storing copies of the same data on multiple machines. But then the inevitable question arises – How do we ensure that all the data is copied successfully across all servers/replicas?

The answer is simple: every write needs to be processed by every replica. Otherwise, the replicas will no longer contain the same data and will become useless.

Then comes the question of How.

How can we make all replicas process every write?

There are many ways to do this, but the most common solution that I know of is leader-based replication.

The process goes as follows:

  • One of the replicas is designated the leader. Every write request from a client is sent to the leader. The leader first writes the data to its own local storage.
  • The other replicas are called followers. As soon as the leader finishes writing data to its local storage, it sends the change to all its followers as part of the replication log or change stream (a subject in itself :D). Each follower takes this replication log and applies the changes in the same order in which they were processed by the leader.
  • Read queries from clients can be served by either the leader or the followers, but writes are only processed by the leader. In other words, the followers are read-only for the outside world.
Figure: Leader-based (master-slave) replication
Reference: Martin Kleppmann’s Designing Data-Intensive Applications
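
The three steps above can be sketched in a few lines of Python. This is only an in-memory toy, not how any particular database implements it: the class names `Leader` and `Follower`, the `applied` counter, and the idea of shipping the whole log on every write are assumptions made for illustration, and the "network" is just a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Read-only replica that replays the leader's log in order."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply(self, log):
        # Replay only the entries this follower has not seen yet,
        # in the same order the leader processed them.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)

    def read(self, key):
        return self.data.get(key)


@dataclass
class Leader:
    """The only replica that accepts writes."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    followers: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value            # 1. write to local storage
        self.log.append((key, value))     # 2. record the change in the replication log
        for follower in self.followers:   # 3. send the log to every follower
            follower.apply(self.log)

    def read(self, key):
        return self.data.get(key)


f1, f2 = Follower(), Follower()
leader = Leader(followers=[f1, f2])
leader.write("username", "alice")
leader.write("username", "bob")

# Reads can be served by the leader or by any follower.
print(leader.read("username"), f1.read("username"), f2.read("username"))
```

Because every follower replays the same log in the same order, all three reads return the latest value. A real system would send only the new log entries over the network and would have to deal with followers that are slow or offline.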

Another obvious question that comes after looking at this image is – Is this replication taking place synchronously or asynchronously?

Let’s tackle this question now.

Synchronous VS Asynchronous Replication

Let’s walk through the above image and see what happens next when the user updates the username.

The update request sent by the client reaches the leader replica. Upon receiving the request, the leader writes the data to its local storage and then forwards the data change to its followers. Eventually, the leader notifies the client that the data update was successful.

Let’s map out the communication that is taking place. In the following example image (time flows from top to bottom), the replication to follower 1 is synchronous. Synchronous means that the leader waits for the response to come back from follower 1 before reporting success to the user and before committing (making the write visible to other users).

On the other hand, replication to follower 2 is asynchronous: the leader sends the message but doesn’t wait for a response from the follower.

Leader based replicas with one sync and one async

As you can see in the diagram, there is a substantial delay before follower 2 processes the message. Replication is usually very fast (less than a second), but there is no guarantee. A follower can fall behind the leader by several minutes, depending on network conditions, load on the server, and many other factors.
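
The difference between the two followers can be sketched as follows. This is a simplified, single-threaded model and not a real implementation: the `pending` queue and the `deliver_pending` method are hypothetical stand-ins for a network that delivers messages to the asynchronous follower at some later time.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Leader(Replica):
    def __init__(self, sync_follower, async_follower):
        super().__init__()
        self.sync_follower = sync_follower
        self.async_follower = async_follower
        # Changes sent to the async follower but not yet applied by it.
        self.pending = deque()

    def write(self, key, value):
        self.apply(key, value)                # write to local storage
        self.sync_follower.apply(key, value)  # wait for the synchronous follower
        self.pending.append((key, value))     # send to the async follower, do not wait
        return "ok"                           # only now report success to the client

    def deliver_pending(self):
        # Stands in for the network eventually delivering the queued changes.
        while self.pending:
            self.async_follower.apply(*self.pending.popleft())


sync_f, async_f = Replica(), Replica()
leader = Leader(sync_f, async_f)

status = leader.write("username", "alice")
before = async_f.data.get("username")  # the async follower has not caught up yet
leader.deliver_pending()
after = async_f.data.get("username")   # now it has

print(status, sync_f.data.get("username"), before, after)
```

When the client receives "ok", the synchronous follower already has the new username, while the asynchronous follower only sees it once the queued change is delivered. How long that takes is exactly the replication lag described above.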

There are both advantages and disadvantages with the above setup.

Advantage

The benefit of having a synchronous replica is that at least one follower is guaranteed to have an up-to-date copy of the data, because every write is confirmed by both the leader and that follower before it is reported as successful. This keeps that follower consistent with the leader as well: if the leader suddenly fails, there is still one replica holding every acknowledged write, so no acknowledged data is lost.

Disadvantage

The leader now has to wait for the follower to finish writing the data. If there is a network partition and the follower is not reachable, the write cannot be processed, and the leader has to block all writes until the follower is available again. So it is crucial that the synchronous follower stays up and reachable whenever the leader is.
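
Here is a rough sketch of that failure mode. The `reachable` flag and the use of `TimeoutError` are assumptions for illustration: a real leader would typically block or retry, and the timeout here simply stands in for "the follower never acknowledged".

```python
class SyncFollower:
    def __init__(self):
        self.data = {}
        self.reachable = True

    def apply(self, key, value):
        if not self.reachable:
            # A real leader would block or retry; a timeout stands in for that here.
            raise TimeoutError("synchronous follower did not acknowledge")
        self.data[key] = value


class Leader:
    def __init__(self, follower):
        self.follower = follower
        self.committed = {}

    def write(self, key, value):
        self.follower.apply(key, value)  # must succeed before the commit
        self.committed[key] = value      # make the write visible
        return "ok"


follower = SyncFollower()
leader = Leader(follower)
leader.write("username", "alice")  # succeeds while the follower is reachable

follower.reachable = False         # simulate a network partition
try:
    leader.write("username", "bob")
    outcome = "ok"
except TimeoutError:
    outcome = "rejected"

print(outcome, leader.committed["username"])
```

The second write is rejected and the committed value stays at the last write the follower confirmed. The leader itself is perfectly healthy, yet it cannot accept writes, which is the availability cost of synchronous replication.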

In practice, leader-based replication is often configured to be asynchronous. In that case, if the leader fails, any write that has not yet been replicated is lost, even though the client has already been told that its data was persisted successfully.
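
A small simulation makes this loss concrete. The names `AsyncLeader`, `catch_up`, and `max_entries` are hypothetical, and replication lag is modelled simply by letting the follower copy only some of the leader's log before the crash.

```python
class AsyncLeader:
    def __init__(self):
        self.log = []

    def write(self, entry):
        self.log.append(entry)
        return "ok"  # acknowledged before any follower has the entry


class Follower:
    def __init__(self):
        self.log = []

    def catch_up(self, leader, max_entries):
        # Copy at most max_entries new entries, modelling replication lag.
        start = len(self.log)
        self.log.extend(leader.log[start:start + max_entries])


leader, follower = AsyncLeader(), Follower()

# The client gets "ok" for all three writes.
acked = [e for e in ["w1", "w2", "w3"] if leader.write(e) == "ok"]

follower.catch_up(leader, max_entries=2)  # the follower is one write behind

# The leader crashes here and the follower is promoted with whatever it has.
lost = [e for e in acked if e not in follower.log]
print(lost)  # ['w3']
```

The write `w3` was acknowledged to the client but never reached the follower, so it disappears when the follower takes over.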

Does it sound like a good option to weaken durability for availability?

Well, this is where the engineers have to make the trade-offs depending on the use case of the system.

Sometimes data durability is more important than availability, and then synchronous replication makes more sense. At other times availability matters more, and asynchronous replication is the better fit.

Conclusion

Data replication is an area of continuous research in distributed systems. The central question is how to make systems available and consistent at the same time. The CAP theorem comes into the picture here, but we can still strive for the best. This was a small introductory article on replication. Many more challenges come with replication, like setting up a new follower, handling outages, follower failure, leader failure, leader election, and so on, but it’s very exciting to understand and implement these algorithms and tweak them to see different results 🙂

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.

~ Douglas Adams

Let me know your thoughts on the same.

Filed Under: Programming, Technology Tagged With: distributed-systems, replication
