Foundations of Data Systems Explained: DDIA (1/3)

Introduction

Modern applications handle more data, serve more users, and must run across more machines than ever before. But what makes a data system truly reliable, scalable, and maintainable?

Designing Data-Intensive Applications by Martin Kleppmann offers a deep exploration of how data systems work under the hood. It is not just a book about databases; it is a blueprint for building systems that can grow, adapt, and perform under pressure.

Part I of the book, "Foundations of Data Systems," sets the stage by introducing the fundamental building blocks of backend infrastructure. This part is about understanding the core trade-offs and architectural decisions behind every storage engine, query language, and data format we choose.

This article distills the key lessons from Chapters 1 to 4:

  • What reliability and scalability really mean in practice
  • How different data models affect flexibility and performance
  • The internal workings of storage engines like B-Trees and LSM-Trees
  • How data encoding and schema evolution shape the future of your system

Whether you’re an engineer designing your first database schema or an architect managing distributed pipelines, these foundations will give you the vocabulary, mental models, and examples to build better systems.

Let’s dive in.


Chapter 1: Reliable, Scalable, and Maintainable Applications

Reliability

A reliable system continues to function correctly even when hardware or software fails. Reliability is about minimizing downtime and data loss, and ensuring the system can recover gracefully from unexpected events.

Key Strategies:

  • Redundancy: Replicate data and services across nodes, racks, or regions to avoid single points of failure.
  • Fault Isolation: Design so that a failure in one component does not cascade to others (e.g., circuit breakers, bulkheads).
  • Automated Recovery: Use backups, snapshots, and automated failover to restore service quickly.
  • Idempotence: Ensure operations can be safely retried without side effects (see the sketch below).
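
A minimal sketch of the idempotence idea in Python, using a client-supplied idempotency key so that retries replay the original result instead of applying the operation twice (all names here are illustrative, not any specific library):

import uuid

# Hypothetical store of results keyed by idempotency key; a real system would
# persist this, e.g. in the same database transaction as the charge itself.
processed = {}

def charge(idempotency_key, account, amount):
    """Apply a charge at most once, no matter how many times it is retried."""
    if idempotency_key in processed:
        return processed[idempotency_key]      # replay the earlier outcome
    receipt = f"charged {account} {amount}"
    processed[idempotency_key] = receipt       # record before acknowledging
    return receipt

key = str(uuid.uuid4())
assert charge(key, "acct-1", 100) == charge(key, "acct-1", 100)  # retry is safe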

Redundancy and Failover

User
 |
 v
Load Balancer
 |         |
 v         v
App1     App2
  \       /
   v     v
  Database Cluster
   |   |   |
   v   v   v
 Rep1 Rep2 Rep3

Real-World Example:

  • Netflix replicates its services and data across multiple AWS regions to remain available during outages.
  • Google Spanner uses multi-region replication and TrueTime for global consistency.

Scalability

Scalability is the ability of a system to handle increased load by adding resources. It’s not just about handling more users, but also about maintaining performance as demand grows.

Types of Scaling:

  • Vertical Scaling: Add more power (CPU, RAM) to a single machine.
  • Horizontal Scaling: Add more machines to distribute the load (preferred for large systems).

Horizontal vs Vertical Scaling

Vertical Scaling:
  [App Server]
       |
   [Database]

Horizontal Scaling:
  [App1]   [App2]   [App3]
      \     |     /
      [Shared Database]

Scalability Metrics:

  • Throughput: Requests per second the system can handle.
  • Latency: Time taken to process a request (use percentiles such as the 95th and 99th; see the sketch after this list).
  • Error Rate: Percentage of failed requests.
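
To make the latency metric concrete, here is a small Python sketch that computes p50/p95/p99 from a list of measured request latencies (the data here is simulated; in practice the numbers come from your metrics pipeline):

import math
import random

# Simulated request latencies in milliseconds, standing in for real measurements.
latencies = sorted(random.gauss(80, 30) for _ in range(10_000))

def percentile(sorted_values, p):
    """Nearest-rank percentile: the value below which p% of requests complete."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

print(f"p50={percentile(latencies, 50):.1f} ms  "
      f"p95={percentile(latencies, 95):.1f} ms  "
      f"p99={percentile(latencies, 99):.1f} ms")

Averages hide the tail: the 95th and 99th percentiles show what your slowest requests look like, which is often what matters most for user experience.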

Scaling Approaches

Approach            Pros                         Cons
Vertical Scaling    Simple, no code changes      Hardware limits, downtime
Horizontal Scaling  No single point of failure   More complex, needs sharding

Real-World Example:

  • Instagram uses Redis and Memcached to cache frequent queries and reduce pressure on its databases.
  • Amazon DynamoDB automatically shards data across many servers for massive scale.

Maintainability

Maintainability is about how easily a system can be changed, debugged, and extended. Systems must evolve as requirements change, so maintainability is crucial for long-term success.

Best Practices:

  • Modular Design: Break systems into independent, well-defined components.
  • Observability: Use logs, metrics, and traces to understand system behavior.
  • Automation: Automate testing, deployment, and recovery to reduce human error.
  • Documentation: Keep code and system documentation up to date.

Observability Stack

[Application]
   |   |   |
   |   |   +---> [Tracing]
   |   +-------> [Metrics]
   +-----------> [Logging]
         |   |   |
         +---+---+
             |
     [Monitoring Dashboard]

Maintainability Practices

Practice        Benefit                        Example Tool
Modular Design  Easier to change/test          Microservices, SOA
Observability   Faster debugging               Prometheus, Grafana
Automation      Fewer manual errors            CI/CD, Ansible
Documentation   Onboarding, knowledge sharing  Wikis, READMEs

Real-World Example:

  • Google SRE teams invest in automation and observability to reduce toil and improve system clarity.
  • Facebook uses automated canary releases and rollbacks to maintain reliability during deployments.

Chapter 1 Key Takeaways

This chapter sets the stage for understanding modern data systems by introducing three essential attributes:

  • Reliability: The system should continue to function correctly even when hardware fails, software has bugs, or human operators make mistakes. Techniques like replication, fault tolerance, and idempotence help mitigate risks.

  • Scalability: A system should handle growth, whether in data volume, traffic, or complexity, without suffering performance degradation. This requires clear definitions of load and performance, plus design strategies like partitioning and caching.

  • Maintainability: Systems evolve, and maintainable systems make this evolution easy. Characteristics of maintainability include good observability, clear abstractions, and the ability to change behavior with minimal risk or downtime.

Together, these principles create a framework for evaluating and designing robust, efficient, and adaptable systems.


Chapter 2: Data Models and Query Languages

Relational Model Versus Document Model

Relational databases store data in structured tables with fixed schemas, while document databases (like MongoDB) use nested, schema-less JSON documents.

Relational Example:

-- Orders and Customers in SQL
SELECT orders.id, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id;

Document Example:

{
  "order_id": 1001,
  "customer": {
    "id": "cust123",
    "name": "Alice"
  },
  "items": [
    {"product": "Widget", "qty": 2},
    {"product": "Gadget", "qty": 1}
  ]
}

Relational vs Document Model

Relational:
+-------------+      +---------------+
| Orders      |      | Customers     |
|-------------|      |---------------|
| id          |----->| id            |
| customer_id |      | name          |
| ...         |      | ...           |
+-------------+      +---------------+

Document:
+----------------------+
| Order Document       |
|----------------------|
| order_id             |
| customer:            |
|   id                 |
|   name               |
| items: [             |
|   {product, qty},    |
|   ...                |
| ]                    |
+----------------------+

Relational vs Document

Feature        Relational DBs      Document DBs
Schema         Fixed, normalized   Dynamic, flexible
Queries        Declarative (SQL)   Embedded (MongoQL)
Relationships  JOINs               Embedded or referenced
Scalability    Manual sharding     Built-in sharding
Use Case       OLTP, analytics     Content, catalogs

Real-World Example:

  • MongoDB is used for content management systems and catalogs where flexible schemas are needed.
  • PostgreSQL and MySQL are used for financial and transactional systems requiring strong consistency.

The Birth of NoSQL

NoSQL databases emerged to meet the needs of large-scale applications where traditional relational databases struggled:

  • Amazon Dynamo: key-value store with eventual consistency
  • Google Bigtable: wide-column store
  • MongoDB: document store with dynamic schemas

NoSQL systems prioritized horizontal scalability, high availability, and flexible schemas.


The Object-Relational Mismatch

Relational tables are flat, but application code often uses deeply nested objects. This mismatch leads to inefficiencies:

  • ORMs try to bridge the gap but can be leaky abstractions
  • Document databases naturally align with object-oriented code

Many-to-One and Many-to-Many Relationships

Relational databases use normalized tables and JOINs:

SELECT * FROM books
JOIN book_authors ON books.id = book_authors.book_id
JOIN authors ON book_authors.author_id = authors.id;

In document databases, you can embed data:

{
  "title": "DDIA",
  "authors": [
    {"id": 1, "name": "Kleppmann"},
    {"id": 2, "name": "Other"}
  ]
}

Use embedding for static data; reference for large or frequently updated sub-objects.
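
To make the guideline concrete, here is the same order modeled both ways as plain Python dicts (illustrative document shapes, not any particular database's API):

# Embedded: order items live inside the order and are read and written with it.
order_embedded = {
    "order_id": 1001,
    "items": [{"product": "Widget", "qty": 2}, {"product": "Gadget", "qty": 1}],
}

# Referenced: the customer is shared and updated independently, so the order
# stores only an id and the profile lives in its own collection/table.
customers = {"cust123": {"name": "Alice", "email": "alice@example.com"}}
order_referenced = {"order_id": 1001, "customer_id": "cust123"}

customer = customers[order_referenced["customer_id"]]   # one extra lookup at read time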


Are Document Databases Repeating History?

Hierarchical databases from the 1960s (like IMS) stored data in tree structures, which made querying rigid and brittle. Document databases risk similar pitfalls:

  • Efficient for fixed access patterns
  • Less composable and general-purpose than relational models

The relational model won due to its data independence and declarative power.


Relational Versus Document Databases Today

Relational and document databases each have strengths and trade-offs, and neither is a universal solution. Relational databases are built around normalization, which avoids redundancy and supports flexible, ad hoc queries using JOINs. This makes them ideal for applications where relationships between entities are complex or frequently queried in different ways (e.g., financial systems, analytics).

Document databases, on the other hand, store data as nested documents (often JSON), making them a natural fit for aggregate-oriented data, where you typically fetch or update an entire object at once (e.g., a blog post with comments, a user profile with settings). Embedding related data inside a document can improve read performance and reduce the need for joins, but can also lead to duplication and update anomalies if the same data appears in many places.

Embedding vs Referencing:

  • Embed data when it is accessed together and does not change independently (e.g., order items inside an order).
  • Reference data when it is large, shared, or updated independently (e.g., user profiles referenced from orders).

Limitations:

  • Document databases are less suited for queries that need to join across many collections or require flexible, ad hoc reporting. They work best when access patterns are well known and data can be grouped into aggregates.
  • Relational databases can be harder to scale horizontally, but their query flexibility and data integrity features remain unmatched for many use cases.

Modern Practice: Many organizations use both models: relational databases for core transactional data, and document or NoSQL stores for content, logs, or user-generated data. The choice depends on your application’s data access patterns, consistency needs, and scalability requirements.


Query Languages for Data

Query languages allow users to retrieve and manipulate data. The most common is SQL, but others exist for different models.

Declarative Queries on the Web

Declarative query languages let you specify what data you want, not how to get it. The database or engine figures out the best way to execute your request. This is in contrast to imperative code, where you specify the exact steps.

SQL (Relational):

SELECT * FROM users WHERE age > 30;

MongoDB (Document):

db.users.find({ age: { $gt: 30 } })

GraphQL (APIs):

query {
  user(id: "1") {
    name
    posts { title }
  }
}

SPARQL (RDF):

SELECT ?book ?author WHERE {
  ?book <http://purl.org/dc/elements/1.1/creator> ?author
}

Declarative queries are powerful because they allow the underlying system to optimize execution, use indexes, and parallelize work. This abstraction is a key reason for the success of SQL and similar languages.

MapReduce Querying

MapReduce is a programming model for processing large datasets in parallel across many machines. It was popularized by Google and implemented in open source by Hadoop.

  • Map step: Each input record is processed to produce key-value pairs.
  • Reduce step: All values for the same key are combined.

Example: Word Count

# Map step: emit (word, 1) for each word in a document
def map_fn(document):
    for word in document.split():
        yield (word, 1)

# Reduce step: sum all counts for each word
# (the framework groups the emitted pairs by key before calling reduce)
def reduce_fn(word, counts):
    yield (word, sum(counts))

MapReduce is less expressive than SQL, but it scales to massive datasets and is fault-tolerant. Many modern data platforms (like Spark) build on these ideas, but offer more flexible APIs.

Query Languages

Model       Language  Example Syntax      Use Case
Relational  SQL       SELECT … FROM …     OLTP, analytics
Document    MongoQL   db.coll.find({…})   Content, catalogs
Graph       Cypher    MATCH (a)-[]->(b)   Social, recommendations
RDF         SPARQL    SELECT … WHERE …    Semantic web, research

Graph-Like Data Models

Graph databases represent entities as nodes and relationships as edges. This model is ideal for use cases like social networks, recommendation engines, and fraud detection, where relationships are first-class citizens and traversal is frequent.

Property Graphs

A property graph is a graph model in which nodes and edges can carry arbitrary key-value properties; it is used by Neo4j and similar systems. A property graph consists of:

  • Nodes: entities (e.g., users, products)
  • Edges: relationships (e.g., follows, purchased)
  • Properties: key-value pairs on both nodes and edges

(Alice)---FRIEND--->(Bob)---FRIEND--->(Carol)
  |
  +---PURCHASED--->(Widget)

This diagram shows a property graph: Alice is friends with Bob, who is friends with Carol. Alice also purchased a Widget. Nodes represent entities, and edges represent relationships. This structure is ideal for social networks or recommendations.

Property graphs are used when you need to model and query complex, interconnected data with rich relationships.

The Cypher Query Language

Cypher is a declarative graph query language used in Neo4j. It allows expressive pattern matching and traversal.

Example: Find Alice’s friends

MATCH (a:User)-[:FRIEND]->(b:User)
WHERE a.name = 'Alice'
RETURN b.name;

This Cypher query finds all users who are friends with Alice. Cypher’s pattern-matching syntax makes it easy to express traversals in a graph database.

Cypher is ideal for queries that follow paths or patterns in a graph, such as recommendations or social connections.

Graph Queries in SQL

SQL is not naturally suited for recursive or deep relationship traversals, but recursive Common Table Expressions (CTEs) can simulate it.

Example: Find all friends-of-friends

WITH RECURSIVE friends(id, depth) AS (
  SELECT id, 0 FROM users WHERE id = 1
  UNION
  SELECT u.id, depth + 1
  FROM users u
  JOIN friendships f ON u.id = f.friend_id
  JOIN friends fr ON f.user_id = fr.id
)
SELECT * FROM friends;

This SQL example uses a recursive CTE to traverse the friendship graph starting from user 1 (as written it follows friendships to any depth; add a filter such as WHERE depth <= 2 to stop at friends-of-friends). While possible, such queries are more verbose and less efficient than in a native graph database.

This approach works for limited depth traversal but lacks graph-native optimizations.

Triple-Stores and SPARQL

Triple stores (e.g., Apache Jena, Virtuoso) are used in semantic web contexts. Data is stored as RDF triples: subject, predicate, object.

RDF (Resource Description Framework): A standard for representing data as triples (subject, predicate, object), widely used in the semantic web.

Example RDF triple:

<Bob> <knows> <Alice>

This triple expresses that Bob knows Alice, using the subject-predicate-object format common in semantic web and knowledge graph systems.

SPARQL is a query language for RDF data, designed to retrieve and manipulate data stored in Resource Description Framework format.

SPARQL query:

SELECT ?friend WHERE {
  <Bob> <knows> ?friend
}

This SPARQL query retrieves all friends of Bob from an RDF triple store.

Triple-stores and SPARQL are used for knowledge graphs, linked data, and applications that require flexible, schema-less relationships.

The Foundation: Datalog

Datalog is a declarative logic programming language, similar to Prolog, that is especially good for expressing recursive relationships in data.

Example: Ancestor relationship

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

This Datalog program defines the ancestor relationship recursively: X is an ancestor of Y if X is a parent of Y, or if X is a parent of Z and Z is an ancestor of Y. Datalog is valued for its clarity and ability to express recursive relationships concisely.

Modern graph engines and some databases use Datalog-like languages for advanced queries.


Chapter 2 Key Takeaways

  • Relational, document, graph, and RDF models serve different needs
  • Choose based on data structure, query flexibility, and performance
  • No model is superior; all are trade-offs in modeling and scaling

Feature                Relational (SQL)     Document (NoSQL)       Graph (Property/RDF)
Schema                 Fixed, normalized    Flexible, schema-less  Flexible, typed labels
Query Language         SQL                  MongoQL, etc.          Cypher, SPARQL, Datalog
Relationship Modeling  JOINs                Embedding/Reference    Native edge traversal
Best Use Case          OLTP, finance        CMS, catalogs, APIs    Social networks, graphs
Scaling                Vertical/horizontal  Horizontal             Horizontal
Complexity             Medium               Low                    High
Flexibility            Medium               High                   Very High

Understanding data models deeply helps you choose the right tool and scale confidently.


Chapter 3: Storage and Retrieval

How do databases store data on disk and retrieve it when queried?

This chapter dives into the internals of storage engines: the part of a database that handles storing data to disk and reading it back. Understanding these fundamentals helps you choose the right database for your workload and debug performance issues.


Data Structures That Power Your Database

At the simplest level, a database could be just two bash functions:

#!/bin/bash

db_set () {
    echo "$1,$2" >> database
}

db_get () {
    grep "^$1," database | sed -e "s/^$1,//" | tail -n 1
}

This append-only log has excellent write performance (O(1)) but terrible read performance (O(n)). Real databases use sophisticated data structures to balance these trade-offs.

Hash Indexes

Hash indexes use in-memory hash maps to store byte offsets where values are stored on disk. This is the approach used by Bitcask (Riak’s default storage engine).

How Hash Indexes Work:

In-Memory Hash Map:
+----------+----------+
|   Key    |  Offset  |
+----------+----------+
| "user1"  |     0    |
| "user2"  |    64    |
| "user3"  |   128    |
+----------+----------+

Disk (Append-Only Log):
+--------------------------------------+
| 0:   user1,{id:1,name:"Alice"}       |
| 64:  user2,{id:2,name:"Bob"}         |
| 128: user3,{id:3,name:"Charlie"}     |
+--------------------------------------+

Implementation Details (a code sketch follows the write and read steps below):

Write Process:

  1. Append new record to end of log file
  2. Update hash map with new offset
  3. If updating existing key, just update offset (old data becomes dead)

Read Process:

  1. Look up key in hash map (O(1))
  2. Seek to offset on disk
  3. Read value
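
Below is a minimal, Bitcask-style sketch of these two paths in Python: an in-memory dict maps each key to the byte offset of its latest record in an append-only log file. It is illustrative only; a real engine adds segments, checksums, compaction, and crash recovery for the index.

class HashIndexDB:
    def __init__(self, path="database.log"):
        self.path = path
        self.index = {}                        # key -> byte offset of latest record
        open(self.path, "ab").close()          # make sure the log file exists
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the log once on startup to rebuild the in-memory hash map."""
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                key = line.split(b",", 1)[0].decode()
                self.index[key] = offset
                offset += len(line)

    def set(self, key, value):
        with open(self.path, "ab") as f:         # 1. append to the end of the log
            self.index[key] = f.tell()           # 2. point the hash map at the new record
            f.write(f"{key},{value}\n".encode())

    def get(self, key):
        offset = self.index.get(key)             # 1. O(1) lookup in memory
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                       # 2. one seek on disk
            line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]             # 3. read the value

db = HashIndexDB()
db.set("user1", '{"id":1,"name":"Alice"}')
print(db.get("user1"))                           # {"id":1,"name":"Alice"}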

Compaction: Periodically merge log segments, keeping only the latest value for each key:

Before Compaction:
Segment 1: user1,v1 | user2,v1 | user1,v2
Segment 2: user3,v1 | user1,v3 | user2,v2

After Compaction:
Merged: user1,v3 | user2,v2 | user3,v1

Hash Index Trade-offs:

Pros                                    Cons
✅ Fast writes (append-only)            ❌ Hash map must fit in memory
✅ Fast reads (one disk seek)           ❌ No range queries
✅ Simple crash recovery                ❌ Poor memory locality
✅ Built-in compression via compaction  ❌ Hash collisions can degrade performance

Real-world usage: Riak (Bitcask), some Redis persistence modes

SSTables and LSM-Trees

Sorted String Tables (SSTables) improve upon hash indexes by keeping keys sorted within each segment. This enables range queries and more efficient merging.

SSTable Structure:

SSTable File:
+------------------------------------------+
| Header: min_key=alice, max_key=zoe       |
+------------------------------------------+
| Index: alice->0, bob->32, charlie->64    |
+------------------------------------------+
| Data:                                    |
|   alice,{age:25,city:"NYC"}              |
|   bob,{age:30,city:"SF"}                 |
|   charlie,{age:28,city:"LA"}             |
+------------------------------------------+

Log-Structured Merge Trees (LSM-Trees):

LSM-Trees use SSTables as building blocks in a multi-level architecture:

LSM-Tree Architecture:
                   Writes
                     |
                     v
+-----------------------------------------+
| MemTable (Red-Black Tree in memory)     |  <- Fast writes
+-----------------------------------------+
                     | flush (when full)
                     v
+-----------------------------------------+
| Level 0 SSTables (on disk)              |  <- May overlap
+-----------------------------------------+
                     | compact
                     v
+-----------------------------------------+
| Level 1 SSTables (non-overlapping)      |  <- Sorted runs
+-----------------------------------------+
                     | compact
                     v
+-----------------------------------------+
| Level 2+ SSTables (larger, sorted)      |  <- Long-term storage
+-----------------------------------------+

LSM-Tree Operations:

Write Path:

  1. Write to Write-Ahead Log (WAL) for durability
  2. Insert into MemTable (in-memory sorted structure)
  3. When MemTable fills, flush to Level 0 as SSTable
  4. Background compaction merges SSTables between levels

Read Path:

  1. Check MemTable first
  2. Check Level 0 SSTables (newest to oldest)
  3. Check higher levels using sparse index
  4. Use Bloom filters to avoid unnecessary disk reads

Compaction Strategy (a merge sketch follows this list):

  • Size-tiered: Merge SSTables of similar size
  • Leveled: Maintain non-overlapping ranges per level (used by LevelDB/RocksDB)
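
The heart of compaction is a merge of sorted runs in which newer segments win for duplicate keys. A stripped-down Python sketch of that rule (not tied to any particular engine's on-disk format):

def merge_sstables(*segments):
    """Each segment is a list of (key, value) pairs sorted by key,
    passed from oldest to newest; later segments overwrite earlier ones."""
    latest = {}
    for segment in segments:                  # oldest first
        for key, value in segment:
            latest[key] = value               # newer value replaces older one
    return sorted(latest.items())             # emit one sorted, merged run

old_segment = [("alice", "v1"), ("bob", "v1")]
new_segment = [("alice", "v2"), ("carol", "v1")]
print(merge_sstables(old_segment, new_segment))
# [('alice', 'v2'), ('bob', 'v1'), ('carol', 'v1')]

A production engine streams this merge (for example with a heap over segment iterators) instead of building a dict in memory, but the keep-the-newest-value rule is the same.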

LSM-Tree Trade-offs:

Advantages                     Disadvantages
✅ High write throughput       ❌ Read amplification (check multiple SSTables)
✅ Good compression ratios     ❌ Write amplification (compaction overhead)
✅ Range query support         ❌ Compaction can impact performance
✅ Handles high write volumes  ❌ More complex than B-Trees

Real-world usage: Cassandra, HBase, LevelDB, RocksDB, Elasticsearch

B-Trees

B-Trees are the most widely used indexing structure, found in almost all relational databases and many non-relational ones.

B-Tree Structure:

B-Trees maintain balance by ensuring all leaf nodes are at the same depth:

B-Tree Example (branching factor = 3):

                    [    J    ]             <- Root (level 2)
                   /           \
              [  C   F  ]    [  P   S  ]    <- Internal nodes (level 1)
             /    |    \     /    |    \
        [A,B]  [D,E]  [G,H,I] [K,L,M] [Q,R] [T,U,V]   <- Leaf nodes (level 0)

Key Properties:

  • Each node has n keys and n+1 child pointers
  • Keys within a node are sorted
  • All leaves are at the same level
  • Each node is typically one disk page (4KB-64KB)

B-Tree Operations:

Search Algorithm:

# Walk down from the root: keys within a node are sorted, and children[i]
# covers the key range between keys[i-1] and keys[i].
def search(node, key):
    if node.is_leaf():
        return node.find(key)

    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1

    return search(node.children[i], key)

Insert with Node Splitting:

Before Insert (node full):
[A, C, E, G]

Insert D:
Split into: [A, C] and [E, G]
Promote: D to parent

Result:
     [D]
   /     \
[A, C]  [E, G]

Range Query:

SELECT * FROM users WHERE age BETWEEN 25 AND 35;

  1. Navigate to leaf containing age=25
  2. Scan leaves left-to-right until age>35
  3. Follow leaf-to-leaf pointers (siblings)

B-Tree Optimizations:

Copy-on-Write (MVCC): Instead of updating pages in-place, create new page versions:

Version 1: [A, B, C]
               | update B
               v
Version 2: [A, B', C]  (new page)

Write-Ahead Logging: Record changes before applying them:

WAL: "Update page 42: B -> B'"
Then: Apply change to page 42

Comparing B-Trees and LSM-Trees:

This is one of the most important trade-offs in database design:

Performance Characteristics:

Workload Type  B-Trees              LSM-Trees                    Winner
Read-heavy     Fast (1-2 seeks)     Slower (multiple SSTables)   B-Trees
Write-heavy    Slower (random I/O)  Fast (sequential writes)     LSM-Trees
Range scans    Excellent            Good                         B-Trees
Point lookups  Very good            Good (with Bloom filters)    B-Trees

Resource Usage:

Resource             B-Trees          LSM-Trees
Write amplification  2x (WAL + page)  5-10x (compaction)
Read amplification   1x               5-10x (multiple levels)
Space amplification  Low              Medium (fragmentation during compaction)
CPU usage            Lower            Higher (compaction)

Operational Considerations:

B-Trees:

  • More predictable performance
  • Simpler to reason about
  • Better for OLTP workloads
  • Used by: PostgreSQL, MySQL, SQL Server, Oracle

LSM-Trees:

  • Better write throughput
  • More complex tuning (compaction strategies)
  • Better for analytics/logging workloads
  • Used by: Cassandra, HBase, ClickHouse, Elasticsearch

Other Indexing Structures:

R-Trees (Spatial Indexes): For geographic and multi-dimensional data:

Geographic Bounding Boxes:
+---------------------------+
| A     +-----+             |
|   +---|  B  |---+         |
|   |   +-----+   |    C    |
|   |      D      |         |
|   +-------------+         |
+---------------------------+

Use cases: PostGIS, MongoDB geospatial queries, game engines

Full-text Search Indexes: Inverted indexes map terms to documents:

Term -> Document IDs
"database" -> [1, 3, 7, 12]
"storage" -> [1, 5, 8]
"engine" -> [3, 7, 9, 15]

Use cases: Elasticsearch, Solr, database FTS features

Memory-only Structures: Skip Lists: Probabilistic alternative to B-Trees

Level 3:  1 --------------------------> 9
Level 2:  1 -------------> 6 ---------> 9
Level 1:  1 -----> 4 ----> 6 ----> 8 -> 9
Level 0:  1 -> 2 -> 4 -> 6 -> 7 -> 8 -> 9

Use cases: Redis sorted sets, LevelDB MemTable


Transaction Processing or Analytics?

Modern applications typically need both Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) capabilities.

OLTP Characteristics

-- Typical OLTP queries (simple, fast)
INSERT INTO orders (customer_id, product_id, quantity) 
VALUES (12345, 'widget-A', 2);

UPDATE inventory SET quantity = quantity - 2 
WHERE product_id = 'widget-A';

SELECT * FROM customers WHERE customer_id = 12345;

Properties:

  • Small number of records per query
  • Random access patterns
  • Interactive response times (< 100ms)
  • High concurrency

OLAP Characteristics

-- Typical OLAP query (complex, analytical)
SELECT 
  region,
  product_category,
  SUM(revenue) as total_revenue,
  AVG(customer_satisfaction) as avg_satisfaction
FROM sales_fact sf
JOIN dim_product dp ON sf.product_id = dp.product_id
JOIN dim_customer dc ON sf.customer_id = dc.customer_id
JOIN dim_time dt ON sf.date_id = dt.date_id
WHERE dt.year = 2024 AND dt.quarter = 'Q1'
GROUP BY region, product_category
ORDER BY total_revenue DESC;

Properties:

  • Scan millions of records
  • Sequential access patterns
  • Batch processing (seconds to hours)
  • Read-heavy workload

OLTP vs OLAP Comparison

Aspect            OLTP                            OLAP
Main pattern      Read/write individual records   Aggregate over large datasets
Query complexity  Simple                          Complex with JOINs
Dataset size      GB to TB                        TB to PB
Bottleneck        Random I/O, lock contention     Bandwidth, CPU
Data freshness    Real-time                       Minutes to hours delay
Examples          E-commerce checkout, banking    Business intelligence, reporting

Data Warehousing

Since OLTP and OLAP have different requirements, many organizations use separate systems:

OLTP Systems  ->  ETL Pipeline  ->  Data Warehouse (OLAP)
      |                 |                   |
  PostgreSQL      Apache Airflow        Snowflake
  MySQL           dbt                   BigQuery
  MongoDB         Spark                 Redshift

ETL vs ELT:

Extract-Transform-Load (ETL):

Source DB -> Transform (clean, aggregate) -> Load into warehouse

Extract-Load-Transform (ELT):

Source DB -> Load raw data -> Transform in warehouse

Modern cloud warehouses favor ELT because they can handle transformations at scale.

Stars and Snowflakes: Schemas for Analytics

Fact and Dimension Tables:

  • Fact table: Contains measurements/metrics (sales amount, quantity)
  • Dimension tables: Contain descriptive attributes (customer info, product details)

Star Schema:

        +-------------+
        | dim_product |
        +------+------+
               |
    +--------------+       +-------------+
    | dim_customer |       |  dim_time   |
    +-------+------+       +------+------+
            |                     |
         +--+---------------------+--+
         |        sales_fact         |   <- Central fact table
         +--+---------------------+--+
            |                     |
    +-------+-----+       +-------+-------+
    |  dim_store  |       | dim_promotion |
    +-------------+       +---------------+

Snowflake Schema: Like star schema but dimension tables are normalized into sub-dimensions:

dim_product -> dim_category -> dim_department

Trade-offs:

  • Star: Simpler queries, some redundancy
  • Snowflake: Normalized, more complex JOINs

Column-Oriented Storage

Row-oriented storage stores entire records together, while column-oriented storage groups values from the same column.

Row-oriented (traditional):

Disk blocks:
Block 1: [Row1: ID=1,Name="Alice",Age=25,City="NYC"]
         [Row2: ID=2,Name="Bob",Age=30,City="SF"]
Block 2: [Row3: ID=3,Name="Carol",Age=28,City="LA"]
         [Row4: ID=4,Name="Dave",Age=35,City="NYC"]

Column-oriented:

Disk blocks:
ID column:   [1, 2, 3, 4, 5, 6, ...]
Name column: ["Alice", "Bob", "Carol", "Dave", ...]
Age column:  [25, 30, 28, 35, 42, 29, ...]
City column: ["NYC", "SF", "LA", "NYC", "SF", ...]

Column Compression

Similar values compress extremely well:

Before compression:
City: ["NYC", "SF", "LA", "NYC", "SF", "NYC", "LA", "SF"]

Run-length encoding (little gain on this unsorted column; RLE pays off once the column is sorted):
NYC×1, SF×1, LA×1, NYC×1, SF×1, NYC×1, LA×1, SF×1

Dictionary encoding:
Dictionary: {0: "NYC", 1: "SF", 2: "LA"}
Values: [0, 1, 2, 0, 1, 0, 2, 1]

Bit-packed (one bitmap per dictionary value):
"NYC": 10010100 (positions where NYC appears)
"SF":  01001001
"LA":  00100010

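A toy Python version of dictionary and run-length encoding for the City column above (illustrative only, not a real columnar file format):

from itertools import groupby

city = ["NYC", "SF", "LA", "NYC", "SF", "NYC", "LA", "SF"]

# Dictionary encoding: map each distinct value to a small integer code.
dictionary = {value: code for code, value in enumerate(dict.fromkeys(city))}
codes = [dictionary[value] for value in city]

# Run-length encoding pays off once the column is sorted, as in a sorted column store.
runs = [(value, sum(1 for _ in group)) for value, group in groupby(sorted(city))]

print(dictionary)   # {'NYC': 0, 'SF': 1, 'LA': 2}
print(codes)        # [0, 1, 2, 0, 1, 0, 2, 1]
print(runs)         # [('LA', 2), ('NYC', 3), ('SF', 3)]
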
Sort Order in Column Storage

Unlike row stores, column stores can use different sort orders for different queries:

Sorted by (date, product_id):
Date:       [2024-01-01, 2024-01-01, 2024-01-02, ...]
Product:    [widget-A, widget-B, widget-A, ...]
Revenue:    [100, 150, 120, ...]

Alternative sort by (product_id, date):
Product:    [widget-A, widget-A, widget-B, ...]  
Date:       [2024-01-01, 2024-01-02, 2024-01-01, ...]
Revenue:    [100, 120, 150, ...]

Benefits of sorted order:

  • Enables efficient range queries
  • Improves compression (similar values grouped together)
  • Allows for early termination in aggregations

Challenge: Different queries benefit from different sort orders.

Solutions:

  • Store multiple sorted copies
  • Use techniques like late materialization
  • Maintain separate indexes for different access patterns

Writing to Column-Oriented Storage

Column stores optimize for reads, making writes more challenging:

LSM-Tree approach (used by ClickHouse; sketched after this list):

  1. Buffer writes in row-oriented format
  2. Periodically convert to columnar format
  3. Merge with existing column files
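
A toy illustration of that buffering step: rows accumulate in row form and are transposed into per-column arrays when flushed (an assumed structure for illustration, not ClickHouse's actual format):

from collections import defaultdict

row_buffer = [
    {"id": 1, "name": "Alice", "age": 25},
    {"id": 2, "name": "Bob", "age": 30},
]

def flush_to_columns(rows):
    """Transpose buffered rows into one array per column."""
    columns = defaultdict(list)
    for row in rows:
        for column_name, value in row.items():
            columns[column_name].append(value)
    return dict(columns)

print(flush_to_columns(row_buffer))
# {'id': [1, 2], 'name': ['Alice', 'Bob'], 'age': [25, 30]}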

Batch loading:

ETL Pipeline -> Staging (row format) -> Batch convert -> Column store

Most column stores are optimized for append-only workloads rather than transactional updates.

Aggregation: Data Cubes and Materialized Views

Materialized View: Pre-computed query results stored as tables:

-- Create materialized view
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT 
  DATE_TRUNC('month', order_date) as month,
  product_category,
  SUM(revenue) as total_revenue
FROM sales_fact sf
JOIN dim_product dp ON sf.product_id = dp.product_id
GROUP BY month, product_category;

Data Cube (OLAP Cube): Pre-aggregated data across multiple dimensions (a toy computation follows the listing):

3D Cube: Time Γ— Product Γ— Region
- Total sales by month
- Total sales by product  
- Total sales by region
- Total sales by month + product
- Total sales by month + region
- Total sales by product + region
- Total sales by month + product + region
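
A tiny Python sketch of cube-style pre-aggregation over made-up data; note how the number of aggregate tables grows with every combination of dimensions, which is exactly the storage trade-off listed below:

from collections import defaultdict
from itertools import combinations

facts = [
    {"month": "2024-01", "product": "widget", "region": "EU", "revenue": 100},
    {"month": "2024-01", "product": "gadget", "region": "US", "revenue": 150},
    {"month": "2024-02", "product": "widget", "region": "EU", "revenue": 120},
]
dimensions = ("month", "product", "region")

cube = {}
for size in range(len(dimensions) + 1):              # every subset of dimensions
    for dims in combinations(dimensions, size):
        totals = defaultdict(int)
        for fact in facts:
            totals[tuple(fact[d] for d in dims)] += fact["revenue"]
        cube[dims] = dict(totals)

print(cube[("month",)])   # {('2024-01',): 250, ('2024-02',): 120}
print(cube[()])           # {(): 370} -- the grand total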

Trade-offs:

  • ✅ Lightning-fast queries for pre-computed aggregates
  • ❌ Storage overhead (exponential in the number of dimensions)
  • ❌ Complex to maintain with data updates

Chapter 3 Key Takeaways

Storage Engine Trade-offs:

  • Hash indexes: Fast writes and reads, but no range queries and limited by memory
  • LSM-Trees: Excellent for write-heavy workloads, good compression
  • B-Trees: Superior for read-heavy workloads, excellent range queries

OLTP vs OLAP:

  • OLTP: Transaction processing, normalized schemas, real-time operations
  • OLAP: Analytics, star/snowflake schemas, batch processing with complex aggregations

Column Storage Advantages:

  • Dramatic compression improvements
  • Vectorized processing capabilities
  • Excellent for analytical queries
  • Challenges with writes and different sort orders

The Right Tool for the Job:

  • Choose storage engines based on your workload characteristics
  • Consider read/write ratios, query patterns, and consistency requirements
  • Modern systems often combine multiple storage technologies

Chapter 4: Encoding and Evolution

How do you handle data format changes when applications and databases are running in production?

This chapter addresses one of the most practical challenges in system design: evolving your data formats without breaking existing services. Whether you’re adding new fields to a JSON API or migrating between database schemas, understanding encoding and evolution is crucial for building resilient systems.

Formats for Encoding Data

Language-Specific Formats

Many programming languages have built-in support for encoding in-memory objects:

// Java serialization
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(bos);
out.writeObject(myObject);
byte[] serialized = bos.toByteArray();

# Python pickle
import pickle
serialized = pickle.dumps(my_object)
my_object = pickle.loads(serialized)

Problems with language-specific formats:

  • ❌ Tied to a particular programming language
  • ❌ Poor performance and bloated encoding
  • ❌ Security vulnerabilities (arbitrary code execution)
  • ❌ No forward/backward compatibility guarantees

JSON, XML, and Binary Variants

JSON Example:

{
  "userName": "Martin",
  "favoriteNumber": 1337,
  "interests": ["daydreaming", "hacking"]
}

XML Equivalent:

<Person>
  <userName>Martin</userName>
  <favoriteNumber>1337</favoriteNumber>
  <interests>
    <li>daydreaming</li>
    <li>hacking</li>
  </interests>
</Person>

JSON/XML Trade-offs:

Aspect            JSON          XML            Binary Formats
Human readable    ✅            ✅             ❌
Compact           ⚠️ Medium     ❌ Verbose     ✅
Schema support    ❌            ✅             ✅
Number precision  ❌ Ambiguous  ❌ Text-based  ✅ Typed
Unicode support   ✅            ✅             ✅
Parsing speed     ⚠️ Medium     ❌ Slow        ✅ Fast

JSON/XML Problems:

  • Number ambiguity: No distinction between integers and floats
  • Large size: Repeated field names, verbose syntax
  • No schema: Can’t enforce data structure

Binary Encoding:

MessagePack example (binary JSON):

JSON (81 bytes without whitespace):
{"userName": "Martin", "favoriteNumber": 1337, "interests": ["daydreaming", "hacking"]}

MessagePack (66 bytes):
83 a8 userName a6 Martin ae favoriteNumber cd 05 39 a9 interests 92 ab daydreaming a7 hacking

Saves ~18% space by eliminating repeated field names and using binary encoding for numbers.
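
If you want to reproduce the size difference yourself, here is a short Python sketch using the third-party msgpack package (assumed to be installed via pip install msgpack):

import json
import msgpack  # third-party package, assumed installed

record = {
    "userName": "Martin",
    "favoriteNumber": 1337,
    "interests": ["daydreaming", "hacking"],
}

as_json = json.dumps(record, separators=(",", ":")).encode()   # compact JSON
as_msgpack = msgpack.packb(record)

print(len(as_json), len(as_msgpack))        # the binary encoding is noticeably smaller
assert msgpack.unpackb(as_msgpack) == record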

Thrift and Protocol Buffers

Schema-driven binary encoding formats that generate code for multiple languages.

Protocol Buffers (Protobuf) Example:

Schema definition (.proto file):

syntax = "proto3";

message Person {
  string user_name = 1;
  int64 favorite_number = 2;
  repeated string interests = 3;
}

Generated code usage:

# Python
person = Person()
person.user_name = "Martin"
person.favorite_number = 1337
person.interests.extend(["daydreaming", "hacking"])

# Serialize to binary
binary_data = person.SerializeToString()

# Deserialize
person2 = Person()
person2.ParseFromString(binary_data)

Binary Encoding Details:

Protobuf binary format:

Field 1 (user_name): tag=1, type=string
Field 2 (favorite_number): tag=2, type=varint  
Field 3 (interests): tag=3, type=string (repeated)

Binary: 0a 06 4d 61 72 74 69 6e 10 b9 0a 1a 0b 64 61 79 64 72 65 61 6d 69 6e 67 1a 07 68 61 63 6b 69 6e 67

Key insight: Field names are not stored; only field tags (numbers) are.

Thrift vs Protobuf Comparison:

Feature            Thrift                           Protocol Buffers
Binary formats     BinaryProtocol, CompactProtocol  Single format
Schema evolution   Good                             Excellent
Language support   Fewer languages                  Many languages
RPC framework      Built-in                         gRPC (separate)
Optional/required  Explicit                         All optional (proto3)
Default values     Specified in schema              Type defaults

Avro

Avro takes a different approach: it embeds the schema with the data, enabling better schema evolution.

Avro Schema Example:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favoriteNumber", "type": ["null", "long"], "default": null},
    {"name": "interests", "type": {"type": "array", "items": "string"}}
  ]
}

Writer vs Reader Schema:

Key innovation: Avro supports different schemas for writing and reading:

Writer Schema (v1):           Reader Schema (v2):
- userName (string)           - userName (string)
- favoriteNumber (long)       - favoriteNumber (long)
                              - email (string, default: null)

The Avro library resolves differences between schemas automatically.
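
Conceptually, the resolution step looks like the following Python sketch: walk the reader's schema, take matching fields from the record as written, fill defaults for fields the writer did not have, and drop fields the reader does not know about. (This is a toy illustration with a hypothetical resolve() helper, not the actual Avro library API.)

def resolve(record, reader_fields):
    """reader_fields: (name, default) pairs taken from the reader's schema.
    Fields unknown to the reader are silently dropped."""
    return {name: record.get(name, default) for name, default in reader_fields}

written_with_v1 = {"userName": "Martin", "favoriteNumber": 1337}
reader_v2_fields = [("userName", None), ("favoriteNumber", None), ("email", None)]

print(resolve(written_with_v1, reader_v2_fields))
# {'userName': 'Martin', 'favoriteNumber': 1337, 'email': None}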

Schema Evolution in Avro:

Backward compatibility (new schema can read old data):

// Old schema
{"name": "favoriteNumber", "type": "long"}

// New schema (more permissive)
{"name": "favoriteNumber", "type": ["null", "long"], "default": null}

Forward compatibility (old schema can read new data):

// New schema
{"name": "email", "type": "string", "default": ""}

// Old schema ignores unknown fields

Avro vs Protobuf/Thrift:

Feature               Avro           Protobuf/Thrift
Schema storage        With data      Separately
Field identification  Field names    Tag numbers
Code generation       Optional       Required
Schema evolution      Very flexible  Good
Compactness           High           High
Dynamic typing        ✅             ❌

The Merits of Schemas

Advantages of Schema-Based Formats:

  1. Compactness: Binary encoding with no field names
  2. Documentation: Schema serves as always up-to-date documentation
  3. Validation: Catch errors at write time, not read time
  4. Code generation: Generate statically typed data structures
  5. Evolution: Maintain compatibility across versions

Schema Evolution Patterns:

Safe changes:

  • Add optional fields with defaults
  • Remove optional fields
  • Change field order (if using field names/tags)

Unsafe changes:

  • Change field types (e.g., int32 -> string)
  • Rename fields (without aliases)
  • Remove required fields
  • Add required fields without defaults

Example evolution sequence:

v1: {id, name}
v2: {id, name, email?}        # Add optional email
v3: {id, name, email?, age?}  # Add optional age  
v4: {id, name, email, age?}   # Make email required (unsafe!)

Modes of Dataflow

1. Dataflow Through Databases

The problem: Applications change faster than databases.

Timeline:
T1: App v1 writes data using schema v1
T2: App v2 deployed, can read v1 data?
T3: App v2 writes data using schema v2  
T4: App v1 still running, can read v2 data?

Solution strategies:

  • Migrations: Transform all existing data to new schema
  • Multi-version: Support reading multiple schema versions
  • Additive changes: Only add optional fields

Example database evolution:

-- v1 schema
CREATE TABLE users (id INT, name VARCHAR(50));

-- v2 migration  
ALTER TABLE users ADD COLUMN email VARCHAR(100) DEFAULT NULL;

-- v3 migration
ALTER TABLE users ADD COLUMN preferences JSON DEFAULT '{}';

2. Dataflow Through Services: REST and RPC

Service communication patterns:

REST (JSON over HTTP):
GET /api/users/123
Content-Type: application/json

{
  "id": 123,
  "name": "Alice",
  "email": "alice@example.com"
}

RPC (binary protocol):
UserService.getUser(user_id=123) -> User object

Web Services Comparison:

Aspect           REST              SOAP       gRPC
Protocol         HTTP/JSON         HTTP/XML   HTTP/2 + Protobuf
Schema           Informal/OpenAPI  WSDL       .proto files
Code generation  Optional          Required   Generated stubs
Performance      Medium            Low        High
Debugging        Easy              Hard       Medium
Browser support  Native            Complex    Limited

Remote Procedure Calls (RPC):

RPC tries to make network calls look like local function calls:

# Local call
user = get_user(123)

# RPC call (looks the same!)  
user = user_service.get_user(123)

Problems with RPC:

  • Network failures: Local calls don’t fail randomly
  • Latency: Network calls are ~1M times slower
  • Serialization: Objects don’t always serialize cleanly
  • Versioning: Harder to evolve than local APIs

RPC frameworks:

  • gRPC (Google): High performance, HTTP/2, streaming
  • Thrift: Cross-language, used at Facebook/Meta
  • Finagle (Twitter): Scala-based, circuit breakers
  • JSON-RPC: Simple, HTTP-based

Data Encoding and Evolution for RPC:

Versioning strategies:

  1. URL versioning:
     /api/v1/users/123
     /api/v2/users/123
  2. Header versioning:
     Accept: application/vnd.api+json;version=2
  3. Content negotiation:
     Accept: application/vnd.user.v2+json

Backward/forward compatibility:

  • Old clients must work with new servers
  • New clients should work with old servers
  • Use optional fields and sensible defaults

3. Message-Passing Dataflow

Asynchronous message passing combines benefits of RPC and databases:

Producer -> Message Broker -> Consumer
   (fire and forget)      (process later)
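
A minimal in-process stand-in for this pattern, using Python's standard queue module in place of a real broker such as Kafka or RabbitMQ:

import queue
import threading

broker = queue.Queue()           # stand-in for a real message broker

def producer():
    for i in range(3):
        broker.put({"type": "user_created", "user_id": i})   # fire and forget

def consumer():
    while True:
        message = broker.get()
        if message is None:      # sentinel: shut the consumer down
            break
        print("processed", message)

worker = threading.Thread(target=consumer)
worker.start()
producer()
broker.put(None)
worker.join()

The producer never waits for the consumer; the broker buffers messages, so either side can be restarted or redeployed independently.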

Message Brokers:

Popular message brokers:

  • Apache Kafka: High-throughput, persistent logs
  • RabbitMQ: Flexible routing, AMQP protocol
  • Amazon SQS: Managed queue service
  • Redis Pub/Sub: In-memory, simple

Message Broker Benefits:

Benefit      Description
Reliability  Broker acts as buffer against outages
Decoupling   Producers/consumers evolve independently
Flexibility  One message -> multiple consumers
Scalability  Horizontal scaling of consumers

Actor Model:

Actors are computational entities that communicate via messages:

Actor A --message--> Actor B
        <--response--

Actor properties:

  • Encapsulated state (no shared memory)
  • Asynchronous message passing
  • Location transparency (local or remote)

Frameworks: Akka (JVM), Orleans (.NET), Erlang/Elixir

Message Encoding in Distributed Actors:

Evolution challenges:

  • Rolling deployments mean mixed actor versions
  • Message formats must be backward/forward compatible
  • Consider using schemas (Protobuf, Avro) for actor messages

%% Erlang actor message evolution
%% v1 message
{user_created, UserId, Name}

%% v2 message (backward compatible)
{user_created, UserId, Name, Email}

Chapter 4 Key Takeaways

Encoding Format Selection:

  • Language-specific formats: Avoid for cross-language compatibility
  • JSON/XML: Human-readable but verbose and ambiguous for numbers
  • Binary formats (Protobuf, Thrift, Avro): Compact, fast, schema-driven

Schema Evolution Strategies:

  • Backward compatibility: New code can read old data
  • Forward compatibility: Old code can read new data
  • Best practices: Add optional fields, use defaults, avoid breaking changes

Dataflow Patterns:

  • Database dataflow: Handle schema migrations carefully, support multiple versions
  • Service dataflow: Use versioning strategies (URL, headers, content negotiation)
  • Message-passing: Enables loose coupling, supports schema evolution through brokers

The Evolution Challenge: Schema evolution is not just a technical problem; it's an organizational one. Plan for change from day one, use schemas that support evolution, and design systems that can gracefully handle mixed versions during deployments.


Part I Summary

Key takeaways from Part I:

  1. Reliability, Scalability, Maintainability are the fundamental goals of good system design
  2. Data models (relational, document, graph) shape how you think about problems
  3. Storage engines (B-Trees vs LSM-Trees) have different performance characteristics
  4. Encoding formats and schema evolution enable systems to change over time
  5. Dataflow patterns (databases, services, messages) each have different consistency and performance properties

Technology selection framework:

  • Understand your workload characteristics (read/write ratio, consistency needs)
  • Choose data models that match your access patterns
  • Select storage engines based on performance requirements
  • Plan for schema evolution from day one
  • Design dataflow to match your consistency and availability needs

Looking ahead to Part II: The foundations covered here prepare you for distributed systems challenges: replication, partitioning, transactions, and the difficulties of maintaining consistency across multiple machines.

"The future is already here; it's just not evenly distributed." Understanding these foundations helps you build systems that can adapt and scale as requirements change.
