LSM Tree NoSQL database - ByteByteGo

![](http://www.youtube.com/watch?v=I6jB0nM9SKU) This video explains the Log-Structured Merge (LSM) tree, a data structure used in NoSQL databases like Cassandra to achieve fast write speeds. Here's a breakdown of the key concepts: - **Comparison with B-trees:** Unlike relational databases that often use B-trees (optimized for reads but slower for updates due to random I/O), LSM trees prioritize write speed [00:41](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=41). - **Writing Data:** Incoming writes are first stored in an in-memory table called a memtable, which is sorted by key [01:09](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=69). When the memtable is full, it's written to disk as an immutable Sorted String Table (SSTable) using fast, sequential I/O [01:25](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=85), [01:33](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=93). - **Updates and Deletions:** Updates create new entries in the latest SSTable rather than overwriting old ones [01:59](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=119). Deletions are marked with a "tombstone" in the newest SSTable [02:18](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=138). - **Reading Data:** To find data, the system checks the memtable first, then the most recent SSTable, and continues through older SSTables. Searches within SSTables are efficient because they are sorted [02:41](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=161), [02:55](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=175). - **Compaction:** Over time, numerous SSTables can slow down reads and consume space with outdated entries. A background process merges and compacts these SSTables, removing old data and reducing the number of files to check for reads [03:05](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=185), [03:30](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=210). This is similar to the merge sort algorithm and organizes SSTables into levels, forming the "tree" structure [03:47](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=227), [04:44](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=284). - **Compaction Strategies:** Different strategies exist, such as size-tier compaction (better for writes) and leveled compaction (better for reads). Poor tuning of compaction can hurt performance [04:51](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=291), [05:27](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=327). - **Optimizations for Reads:** To improve read performance, production systems use techniques like summary tables (to skip unnecessary disk block searches) and Bloom filters (to quickly check if a key might exist in a level, reducing I/O for non-existent keys) [06:09](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=369), [06:30](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=390). - **Conclusion:** LSM tree-based NoSQL databases can achieve high write throughput with careful tuning, particularly of the compaction process [07:00](http://www.youtube.com/watch?v=I6jB0nM9SKU&t=420).