Why we need exponential backoff with jitter when retrying

The problem

You want to retry an operation, e.g. a network request. If you retry immediately and repeatedly, you can overload the server and end up getting rate limited (429 Too Many Requests).

This leads us to exponential backoff: after every failed attempt, we wait longer before retrying.

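Note: You shouldn't retry every type of error. Most 4xx errors (e.g. 400 or 404) mean the request itself is bad, so retrying the same request won't help. The exceptions are codes like 429 (Too Many Requests) and 408 (Request Timeout), which are worth retrying after a wait.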
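As a rough sketch of that rule, a status check could look like the following. The helper name isRetryableStatus and the exact set of codes are assumptions for illustration; which codes are safe to retry depends on the API you are calling.

```typescript
// Hypothetical helper: decides whether an HTTP status code is worth retrying.
// The set of transient 4xx codes below is an assumption, adjust it to your API.
const RETRYABLE_4XX: ReadonlySet<number> = new Set([408, 429]);

function isRetryableStatus(status: number): boolean {
  // 5xx: the server failed, so the same request may succeed later
  if (status >= 500 && status <= 599) {
    return true;
  }
  // 408 (Request Timeout) and 429 (Too Many Requests) are transient 4xx errors
  return RETRYABLE_4XX.has(status);
}
```

With this, isRetryableStatus(429) and isRetryableStatus(503) return true, while isRetryableStatus(400) returns false.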

You can think of exponential backoff as multiplying the wait time by a constant factor after each retry. With a factor of 2 and a starting delay of 1 second:

  • 1st retry: 1 second

  • 2nd retry: 2 seconds

  • 3rd retry: 4 seconds

  • 4th retry: 8 seconds
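The schedule above can be reproduced with a one-line formula: base delay times factor to the power of (retry number minus 1). This is a minimal sketch; backoffDelayMs and its default values are hypothetical names chosen to match the list.

```typescript
// Hypothetical helper: delay before the nth retry (1-based), without jitter.
function backoffDelayMs(retry: number, baseMs = 1000, factor = 2): number {
  return baseMs * Math.pow(factor, retry - 1);
}

// Retries 1 to 4 give 1000, 2000, 4000 and 8000 milliseconds
const schedule = [1, 2, 3, 4].map((retry) => backoffDelayMs(retry));
console.log(schedule);
```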

However, this is not enough. It's good, but we could do better.

If you think about it, if 1000 clients all make API requests at the same time and fail together, they will all retry at the same time too.

  • 1st retry: 1 second - 1000 requests

  • 2nd retry: 2 seconds - 1000 requests

  • 3rd retry: 4 seconds - 1000 requests

  • 4th retry: 8 seconds - 1000 requests

This creates a new traffic spike which can overwhelm the server. Mind you, it could easily be more than 1000 requests at the same time in the real world.

This problem is called the thundering herd problem. It's a weird name, but it comes from nature, where herds stampede when they hear thunder. A stampede is a large crowd rushing in the same direction all at once. Anyway, the name isn't important here.

The solution is to add jitter so that clients don't all retry at the same time. Jitter means adding an element of randomness to the retry times. One client might retry after 1.5 seconds, while another might retry after 2.5 seconds.
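To see the effect, here is a small simulation of 1000 clients waiting for their 2nd retry, once without jitter and once with it. It is only a sketch: jitteredDelayMs is a hypothetical helper, and the 0.5 jitter factor is picked just to make the spread easy to see.

```typescript
// Hypothetical helper: delay for one client on a given retry.
// `rand` is injectable so the behaviour can be checked; it defaults to Math.random.
function jitteredDelayMs(
  baseMs: number,
  jitterFactor: number,
  rand: () => number = Math.random
): number {
  // Adds a random extra wait between 0 and baseMs * jitterFactor
  return Math.floor(baseMs + rand() * baseMs * jitterFactor);
}

// 1000 clients that failed at the same moment, each waiting for its 2nd retry (2000ms)
const clients = 1000;
const withoutJitter = Array.from({ length: clients }, () => jitteredDelayMs(2000, 0));
const withJitter = Array.from({ length: clients }, () => jitteredDelayMs(2000, 0.5));

// Without jitter there is exactly 1 distinct delay: everyone retries together
console.log("distinct delays without jitter:", new Set(withoutJitter).size);
// With jitter the delays are spread between 2000ms and just under 3000ms
console.log("distinct delays with jitter:", new Set(withJitter).size);
```

The server now sees the 1000 retries trickle in over roughly a second instead of arriving in a single burst.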

Code

// Types for our retry configuration
interface RetryConfig {
  // Base delay in milliseconds e.g. 1000ms = 1 second
  baseDelay: number;

  // Maximum delay in milliseconds e.g. 30000ms = 30 seconds
  maxDelay: number;

  // Maximum number of attempts in total, counting the first call e.g. 5
  maxRetries: number;

  // Factor to determine jitter range (0-1) e.g. 0.2
  jitterFactor: number;
}

// Calculate delay with exponential backoff and jitter
// `attempt` is the 1-based retry number: 1 for the first retry, 2 for the second, ...
function calculateBackoffDelay(attempt: number, config: RetryConfig): number {
  // Exponential part: baseDelay * 2^(attempt - 1), capped at maxDelay by Math.min
  // e.g. with baseDelay 1000ms, retries 1 to 4 wait 1000ms, 2000ms, 4000ms, 8000ms
  // For the 3rd retry: 1000 * 2^2 = 4000ms = 4 seconds
  const exponentialDelay = Math.min(
    config.maxDelay,
    config.baseDelay * Math.pow(2, attempt - 1)
  );

  // Jitter part: a random extra wait between 0 and jitterFactor * exponentialDelay
  // e.g. 4000ms * 0.2 = 800ms
  // Math.random() generates a random floating point number between 0 and 1 (will never be 1)
  // e.g. 0.5 * 800ms = 400ms
  const jitterRange = exponentialDelay * config.jitterFactor;
  const jitter = Math.random() * jitterRange;

  // Add the jitter to the exponential delay
  // e.g. 4000ms + 400ms = 4400ms
  // The cap is applied before the jitter, so the result can exceed maxDelay by up to jitterRange
  // For every retry this would be different
  return Math.floor(exponentialDelay + jitter);
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  config: RetryConfig
): Promise<T> {
  // Keep track of how many attempts we've made
  let attempt = 0;

  while (true) {
    try {
      // To be clear, return stops the loop
      return await operation();
    } catch (error) {
      attempt++;

      // If we've made too many attempts, throw an error
      if (attempt >= config.maxRetries) {
        throw new Error(`Failed after ${attempt} attempts: ${error}`);
      }

      // Calculate the delay
      const delay = calculateBackoffDelay(attempt, config);
      console.log(`Attempt ${attempt} failed. Retrying in ${delay}ms...`);

      // Wait for the delay
      // Calling setTimeout on its own only schedules a callback and the loop would carry on immediately
      // Wrapping it in a Promise and using "await" pauses here until the timer fires
      // Only then do we go around the loop for the next retry
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Placeholder operation so the example runs on its own
// Swap this for your real request; here it fails randomly about half the time
async function makeApiRequest(): Promise<string> {
  if (Math.random() < 0.5) {
    throw new Error("Simulated network failure");
  }
  return "ok";
}

// Example usage
async function main() {
  const retryConfig: RetryConfig = {
    baseDelay: 1000, // Start with 1 second delay
    maxDelay: 30000, // Max 30 seconds delay
    maxRetries: 5, // Try up to 5 times

    // Moderate jitter factor is around 0.2-0.3
    // Depending on your use case, you might want to tweak this
    // Higher jitter factor means more randomization
    jitterFactor: 0.2, // Add up to 20% random jitter
  };

  try {
    const result = await retryWithBackoff(makeApiRequest, retryConfig);
    console.log("Final result:", result);
  } catch (error) {
    console.error("All retry attempts failed:", error);
  }
}

main();
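One thing the code above does not do is skip retries for errors that will never succeed, such as a bad request. Below is a minimal, self-contained sketch of how a retry loop could take that into account. The names httpError, isRetryable and retryIf are hypothetical, and treating only 429 and 5xx as retryable is an assumption, not a rule.

```typescript
// Hypothetical helper: builds an Error tagged with an HTTP status code.
function httpError(status: number): Error & { status: number } {
  return Object.assign(new Error(`HTTP ${status}`), { status });
}

// Assumption: 429 and 5xx are worth retrying, everything else is not.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

// Retry loop that gives up immediately on non-retryable errors.
async function retryIf<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts: number,
  delayMs: (attempt: number) => number
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Stop if we are out of attempts or the error can never succeed on retry
      if (attempt >= maxAttempts || !shouldRetry(error)) {
        throw error;
      }
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs(attempt)));
    }
  }
}
```

In retryWithBackoff, the same check would go at the top of the catch block, before the delay is calculated, and calculateBackoffDelay could be passed in as the delay function.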