- Official link: https://15445.courses.cs.cmu.edu/fall2023/project1/
- Time spent: 11 days
- Leaderboard: 40 (2024-10-16)
This project consists of three subtasks: implementing the LRU-K algorithm, the Disk Scheduler, and the Buffer Pool Manager. The first two tasks took five days, and the last one took another five days. Overall, the main challenge was understanding the algorithms and class relationships; the function comments are quite helpful, and if you just follow them, you'll definitely pass `SampleTest`.
First is the LRU-K Replacer, which requires implementing an LRU-style algorithm that compares only the time of the K-th most recent access; the element whose K-th most recent access is earliest is evicted. If all elements have fewer than K accesses, the one with the earliest last access is evicted. The replacer uses a doubly linked list to store each node's access history and a hash table to map `frame_id` to nodes. Once you understand this, implementation is straightforward.
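Here is a minimal sketch of that eviction rule, assuming a simplified `FrameNode` with timestamps stored newest-first; `PickVictim` and the tie-break for frames with fewer than K accesses are illustrative assumptions, not the actual BusTub classes (the handout defines the exact rule).

```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_map>

using frame_id_t = int32_t;

struct FrameNode {
  std::deque<uint64_t> history;  // access timestamps, newest at the front
  bool evictable{false};
};

// Pick a victim frame: frames with fewer than k recorded accesses are evicted
// first (compared here by their most recent access, as described above; check
// the handout for the exact tie-break); otherwise the frame whose K-th most
// recent access is oldest wins.
std::optional<frame_id_t> PickVictim(
    const std::unordered_map<frame_id_t, FrameNode> &frames, size_t k) {
  std::optional<frame_id_t> victim;
  bool victim_inf = false;   // does the current victim have < k accesses?
  uint64_t victim_key = 0;

  for (const auto &[fid, node] : frames) {
    if (!node.evictable || node.history.empty()) {
      continue;
    }
    bool inf = node.history.size() < k;
    uint64_t key = inf ? node.history.front() : node.history[k - 1];
    bool better = !victim.has_value() ||
                  (inf && !victim_inf) ||
                  (inf == victim_inf && key < victim_key);
    if (better) {
      victim = fid;
      victim_inf = inf;
      victim_key = key;
    }
  }
  return victim;
}
```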
The second task is the `DiskScheduler`. This part is relatively easy: the `DiskScheduler` uses a worker thread to process read/write requests serially, with a queue to store the requests. This section taught me how to use `std::promise`.
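Below is a minimal sketch of that single-worker pattern, assuming a simplified `DiskRequest` that carries a `std::promise<bool>`; the class and member names are illustrative, not the exact BusTub API, and a `std::nullopt` sentinel is just one way to stop the worker.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

struct DiskRequest {
  bool is_write;
  char *data;
  int page_id;
  std::promise<bool> callback;  // fulfilled when the request completes
};

class SimpleDiskScheduler {
 public:
  SimpleDiskScheduler() : worker_([this] { Run(); }) {}

  ~SimpleDiskScheduler() {
    Schedule(std::nullopt);  // sentinel tells the worker to exit
    worker_.join();
  }

  void Schedule(std::optional<DiskRequest> r) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      queue_.push(std::move(r));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    while (true) {
      std::unique_lock<std::mutex> lk(mu_);
      cv_.wait(lk, [this] { return !queue_.empty(); });
      auto r = std::move(queue_.front());
      queue_.pop();
      lk.unlock();
      if (!r.has_value()) {
        return;  // shutdown sentinel
      }
      // ... perform the actual read/write via the disk manager here ...
      r->callback.set_value(true);  // wake up whoever waits on the future
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::optional<DiskRequest>> queue_;
  std::thread worker_;
};

// Example use (hypothetical): the caller keeps the future and blocks on it.
//   SimpleDiskScheduler sched;
//   std::promise<bool> done;
//   auto fut = done.get_future();
//   sched.Schedule(DiskRequest{/*is_write=*/false, buf, /*page_id=*/0, std::move(done)});
//   fut.get();  // returns once the worker has serviced the request
```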
The last part, of course, is the `BufferPoolManager`, which took me several days 🙈. The reason was that I didn't understand what a `Page` and a `Frame` actually were, or what `Page *pages_;` was storing. I managed to score 74 on Gradescope without understanding what `frame_id` was, which is pretty ridiculous 😂. To clarify: a frame is simply the location in memory where a page is stored; in code, the index into the `pages_` array is the `frame_id`, and that's it. The page itself can be anywhere, on disk or in memory. If you want to load a page into memory, you store it in one of the slots of the `pages_` array; in other words, you select a frame in memory and place the page there. You can see this in the `BufferPoolManager` constructor when initializing `free_list_`: the maximum `frame_id` is `pool_size_ - 1`, matching the size of `pages_` in the `BufferPoolManager`. Once you understand this, implementation is much easier if you follow the comments.
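A minimal sketch of that relationship, with simplified names mirroring the discussion above (the real `BufferPoolManager` has many more members and methods): the frames are just the slots of `pages_`, and `free_list_` starts out holding every `frame_id` from `0` to `pool_size_ - 1`.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

using frame_id_t = int32_t;
using page_id_t = int32_t;

struct Page { /* page data, pin count, dirty flag, ... */ };

class SimpleBufferPool {
 public:
  explicit SimpleBufferPool(size_t pool_size) : pool_size_(pool_size) {
    pages_ = new Page[pool_size_];  // one in-memory slot (frame) per element
    // Every slot starts out free, so frame_ids run from 0 to pool_size_ - 1.
    for (size_t i = 0; i < pool_size_; ++i) {
      free_list_.push_back(static_cast<frame_id_t>(i));
    }
  }
  ~SimpleBufferPool() { delete[] pages_; }

  // Loading a page means picking a frame (a slot in pages_) and reading the
  // page's bytes into it; page_table_ remembers which frame holds which page.
  // Page *FetchPage(page_id_t page_id);

 private:
  size_t pool_size_;
  Page *pages_;                                            // the frames themselves
  std::list<frame_id_t> free_list_;                        // frames holding no page
  std::unordered_map<page_id_t, frame_id_t> page_table_;   // page -> frame
};
```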
Leaderboard
I made some simple optimizations recently, which bumped my leaderboard rank from 497 to 40. Here’s a record of what I did:
- 2024-07-21 #Rank 497 - No optimization
- 2024-08-14 #Rank 267 - optim evict: The original `LRUKNode` could record history, but the history had no fixed size. I added an `AddHistory` method; when adding a history record, if the existing records exceed `k_`, I `pop_back` the last one. This way, when you need the K-th most recent access, you can just use `node.history_.back()` (see the first sketch after this list).
- 2024-08-31 #Rank 35 - add multiple io queue: Previously, the `DiskScheduler` only had a single `request_queue_`, which meant the consumer thread blocked the whole queue on every request. I split `request_queue_` into `bucket` queues, each with its own lock; when enqueuing a request, the queue and lock index is `page_id % bucket` (see the second sketch after this list). By reducing lock granularity, my rank improved significantly.
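The first sketch shows the bounded-history idea, assuming a simplified node type; `AddHistory` keeps at most `k_` timestamps, so the K-th most recent access (when it exists) is always `history_.back()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

struct BoundedHistoryNode {
  explicit BoundedHistoryNode(size_t k) : k_(k) {}

  void AddHistory(uint64_t timestamp) {
    history_.push_front(timestamp);  // newest access at the front
    if (history_.size() > k_) {
      history_.pop_back();           // drop anything older than the k-th access
    }
  }

  // With the trimming above, no traversal is needed during eviction:
  // the K-th most recent access is simply history_.back().
  std::list<uint64_t> history_;
  size_t k_;
};
```

The second sketch shows the bucketed request queue, assuming a hypothetical `kBuckets` constant and a stripped-down `Request` type; each bucket has its own mutex, so enqueues for different pages rarely contend on the same lock.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <queue>

constexpr size_t kBuckets = 8;  // assumed bucket count

struct Request { int page_id; /* data, promise, ... */ };

class ShardedRequestQueue {
 public:
  void Enqueue(Request r) {
    size_t idx = static_cast<size_t>(r.page_id) % kBuckets;  // page_id % bucket
    std::lock_guard<std::mutex> lk(locks_[idx]);
    queues_[idx].push(std::move(r));
  }

 private:
  std::array<std::mutex, kBuckets> locks_;
  std::array<std::queue<Request>, kBuckets> queues_;
};
```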