Skip to main content

Command Palette

Search for a command to run...

Sharding

What if Read Replication isn't enough...

Published
1 min read
Sharding
T

Just a guy who loves to write code and watch anime.

This is the second article of my Distributed Systems series. If you haven't read the previous one, here: Read Replication.

Sharding

What if things get too tough for the leader database (referring back to read replication)?

Writing and distributing the data to all the follower databases, here we've to come up with a different solution.

Sharding is splitting up the workload into multiple read replicas. For an instance, if we have users with names that begin with the letters A to Z, we want to have 4 different replicas, responsible for a different set of users. One replica for example would be responsible for users whose names begin with A to D.

Complexity & Limitations

Sharding fits nicely for key-value stores, but what if you're not a key-value store?

If you don't luck out on being able to shard every single query, life can get a little hard, to the point Sharding just won't work for you (then you'd have to look for something else).

Visualizations

Screenshot from 2022-09-02 16-17-22.png

S

Nice and concise article. I would like to add a few points:

  • Before sharding, we should definitely consider vertical partitioning.

  • Sharding fits nicely for key-value stores, but what if you're not a key-value store?

Actually, it doesn't matter. We just need to generate a hash from the given key(s) to a range [0,numberOfShards-1]. For example, you can have a table with a composite Primary Key as (employeeId (int), department (string)). Even in the above scenario, we can hash it to an integer and then take modulo with numberOfShards.

Important ⚠️: We need to think a bit about the hash function and numberOfShards value and ensure that data gets evenly spread.

  • In case you've queries which require reading from multiple shards very often, then you need to redesign the data model. Or maybe perform some data duplication. It's solely an issue with access patterns and data modelling and not precisely with sharding.
  • Important: Because of sharding, transactions won't work if we're writing across multiple shards.