THE ARCHITECTURAL MISMATCH
Why Legacy Systems Fail at Scale
Modern databases are marvels of engineering, but they were designed to solve a different problem. Forcing immutable, hyperscale data into these systems creates a fundamental architectural conflict, and that conflict is the root cause of the performance and cost problems described below.
The Two-Engine Problem
Every traditional database comprises two distinct parts: a Storage Engine that physically reads and writes data, and a Data Processing Engine that gives the database its "type" (SQL, Vector, Graph, etc.).
Crucially, these two engines are locked together, sharing the same memory space. This tight coupling is by design, as it's necessary for the complex transactional logic (ACID compliance, locking) that legacy use cases require. However, for simple, high-throughput reads on immutable data, this shared architecture becomes a massive bottleneck. The Data Processing Engine dictates how data is read, forcing an inefficient, multi-step process for every query.
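A minimal sketch of this coupling, using illustrative class and method names rather than any real database's internals, looks something like the following: the processing engine owns the storage engine, both live in the same process and memory space, and every query is forced through the same read-everything-then-filter path.

```python
# Illustrative sketch only: class and method names are assumptions,
# not any real database's API. It models the coupled query path the
# text describes.

class StorageEngine:
    """Physically reads and writes data files."""

    def read_file(self, path: str) -> bytes:
        with open(path, "rb") as f:
            return f.read()


class DataProcessingEngine:
    """Gives the database its 'type' (SQL, vector, graph, ...)."""

    def __init__(self, storage: StorageEngine):
        # Tight coupling: the processing engine holds the storage engine
        # directly and dictates how every read happens.
        self.storage = storage

    def query(self, path: str, predicate) -> list:
        raw = self.storage.read_file(path)           # read the whole file
        records = self._deserialize(raw)             # decode every record
        return [r for r in records if predicate(r)]  # filter only at the end

    def _deserialize(self, raw: bytes) -> list:
        # Placeholder for the format-specific decoder (Parquet, ORC, ...).
        return []
```

Because nothing sits between the two classes, there is no path that skips the full read and decode for a query that needs only a handful of records.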
The Multi-Round SerDes Tax
This locked architecture imposes a hidden, multi-round Serialization/Deserialization (SerDes) tax on every query.
Round 1: Deserializing the File
First, the system must read and deserialize the entire data file (e.g., Parquet, ORC) from storage into memory. This is a brute-force operation whose CPU and memory cost scales with the size of the file, not with the handful of records the query actually needs.
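As a rough illustration of Round 1, here is a minimal Python sketch using PyArrow (the file name `events.parquet` is hypothetical): the call below decodes every row group of the file into an in-memory table before any query logic runs.

```python
import pyarrow.parquet as pq

# Round 1: decode the entire file from its on-disk format into memory.
# Every row group and column is deserialized, even if the query
# ultimately needs only a few records.
table = pq.read_table("events.parquet")  # hypothetical file
print(f"Deserialized {table.num_rows} rows into memory")
```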
Round 2: Deserializing the Records
Once the file is in memory, the Data Processing Engine must then deserialize the individual records within it to apply its logic (e.g., the `WHERE` clause in a SQL query). Only after these two costly rounds of SerDes can the system finally filter the data. This two-round process is the primary driver of cost and latency in modern data platforms.
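Continuing the sketch above (the `user_id` column is again hypothetical), Round 2 materializes the decoded rows as language-level objects so the engine can evaluate its predicate, the programmatic equivalent of a `WHERE` clause applied only after both rounds of SerDes have been paid.

```python
# Round 2: turn the in-memory table into record objects and only then
# apply the filter, the equivalent of `WHERE user_id = 42`.
df = table.to_pandas()             # per-record deserialization into Python objects
matches = df[df["user_id"] == 42]  # the predicate runs last, after all SerDes work
print(f"Kept {len(matches)} of {len(df)} rows")
```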