This repository contains a high-performance, custom-built Two-Level Segregated Fit (TLSF) memory allocator implementation written in C++. The TLSF algorithm is optimized for low-latency, deterministic memory management, making it ideal for real-time and embedded systems.
- Design Overview
- Core Operations
- Key Features
- Performance Metrics
- Prerequisites
- Build Instructions
- Usage Example
- License
- Acknowledgements
The TLSF allocator manages a pre-allocated memory pool, organizing it into a series of blocks. Each block is prefixed with a header and suffixed with a footer to store metadata such as size and allocation status. The memory pool itself is bounded by unallocatable sentinel blocks at the beginning and end, which act as boundary markers.
The entire memory pool is defined by special blocks at the beginning and end, which are not allocatable. These act as sentinel blocks or boundary markers for the allocator.
The diagram above shows the structure of the start-of-pool marker.
The diagram above shows the structure of the end-of-pool marker.
When memory is requested, the allocator finds or creates a block of the required size. This block is what the user receives a pointer to. A typical allocated block has the following structure:
- Tlsf Header & Footer: These contain critical metadata about the block, such as its size and state (allocated or free).
- User Area: This is the actual memory that the user's application can use.
- Padding: Padding is often added to ensure that the header, footer, and user area are correctly aligned in memory. This is crucial for performance and preventing issues on certain architectures.
The core of the TLSF allocator's speed is its O(1) lookup system, which relies on a two-level hierarchy of bitmaps and free lists. The system is based on the find-first-set (FFS) operation, which efficiently locates the first set bit in a bitmask.
To quickly find a free block of a suitable size, the TLSF allocator uses two bitmaps. The Bitmaps are at the core of the allocator's performance. The indices we get from them are used to get the allocatable free block in O(1) time.
-
First-Level (FL) Bitmap: A bit is set if at least one free block exists in the corresponding size class.
-
Second-Level (SL) Bitmap: For each first-level group, a second bitmap tracks free blocks in more granular subclasses.
By using a find-first-set (FFS) operation on these bitmaps, the allocator can locate the appropriate free list in constant time, allowing for extremely fast allocation and deallocation.
The actual free blocks are maintained within doubly linked lists, and the head of each list is stored in a 2D array of pointers. The indices derived from the FL and SL bitmaps are used to directly access the head of the correct free list. The indices (FL and SL) are calculated such that the free block at the head of the chosen doubly linked list is guaranteed to be equal to or greater than the required size, enabling a direct, O(1) constant-time block selection without the need to traverse the list.
-
Calculate required (FL,SL) indices from the requested size.
-
Use the bitmaps to find the smallest available free block.
-
Remove the block from its free list.
-
Mark it as allocated.
-
Split the block if it's much larger than the request, returning the remainder to the free lists.
-
Mark the block as free.
-
Check adjacent blocks for a free state.
-
If an adjacent block is free, coalesce (merge) them into one larger free block to reduce external fragmentation.
-
Insert the new, possibly larger, block into the appropriate free list based on its new size.
- Fast Allocation: The two-level segregated list structure allows for quick searching of free blocks.
- Hierarchical Free Lists: Uses a two-level bitmap and a 2D array of pointers to quickly locate free blocks based on their size.
- Minimal Fragmentation: Adjacent free blocks are automatically coalesced (merged) to reduce external fragmentation.
- Custom Memory Management: Handles memory blocks with custom headers and footers to manage block metadata, including size and status.
Benchmarking was done using Google Benchmark and you can see the code used to bench mark in benchmark folder.
Comparing this TLSF allocator against the standard library malloc() (the system allocator) is not a fair comparison due to fundamental architectural differences:
-
Scope of Management: This TLSF implementation manages its own pre-allocated memory pool and avoids all operating system overhead (like kernel system calls) during run-time allocation.
-
System Interaction: The standard malloc() is designed for general-purpose high throughput and often has non-deterministic latency. While most small allocations are handled rapidly in user-space,
malloc()must periodically rely on system calls (such asbrk/sbrkormmap) to acquire memory from the OS. These kernel operations cause significant, unpredictable overhead that a pre-allocated pool-based allocator is designed to eliminate. -
Conclusion: If this TLSF allocator were forced to make system calls for some memory request, its performance would also be substantially slower. The benefit of this allocator lies in its O(1) low-latency guarantees within its managed pool.
A fair comparison would be comparing it against other TLSF allocators or similar dedicated pool-based, low-latency allocators.
The performance metrics above were generated on GitHub-hosted runners (ubuntu-latest). These environments use shared, virtualized hardware and are subject to varying background loads.
Therefore, these results are not suitable for measuring absolute performance. They should only be used to monitor performance trends and detect significant regressions (slowdowns) introduced by new code changes.
If you want more accurate benchmarking you can clone this repo and build it and run the benchmarking on your machine and generate the graphs. I will add the instructions to run the benchmarking and to generate the performance graph later.
To build and run this project, you will need the following tools and dependencies installed on your system.
- C++ Toolchain: A modern C++ compiler supporting the C++17 standard or newer (e.g., GCC, Clang, or MSVC).
- Build System: CMake (Minimum Required Version: 3.14) is used as the meta-build system to generate native build files.
To demonstrate the allocator's functionality and correctness, the entire test suite can be executed locally using CTest.
- Clone the Repository:
git clone https://github.com/aka411/tlsf-memory-allocator.git cd tlsf-memory-allocator - Build the Project: (This compiles the allocator, the test executable, and the demo executable)
cmake -B build cmake --build build
- Run All Tests: (This executes the tests using CTest)
All tests must pass for a successful validation.
cd build ctest --verbose
To quickly see the allocator's public API in action and verify its performance claims (e.g.,
- Ensure you have completed the Build the Project step above.
- Execute the Demo:
The demo will print performance metrics to the console.
./build/bin/demo-app
Here is a simple example demonstrating how to initialize the allocator and use it.
#include "tlsf.h"
#include <iostream>
int main()
{
TlsfAllocator tlsfAllocator(1024);// Get 1KB of memory for the pool
void* ptr = tlsfAllocator.allocate(200); // get a pointer to block of size 200 bytes
if(ptr != nullptr)
{
std::cout<< "Successfully allocated 200 bytes of memory"<<std::endl;
tlsfAllocator.deallocate(ptr);
}
else
{
std::cout<< "Failed to allocat 200 bytes of memory"<<std::endl;
}
std::cout << "Press Enter key to Exit" << std::endl;
std::cin.get();// wait for user to press enter to avoid exiting fast
return 0;
}
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
Thanks for these awesome resources that were used during the development.
-
TLSF: a New Dynamic Memory Allocator for Real-Time Systems: This is the original research paper by M. Masmano; I. Ripoll; A. Crespo; J. Real that introduced the Two-Level Segregated Fit algorithm, providing the foundational concepts for O(1) constant-time memory management.
-
Two-Level Segregated Fit Memory Allocator (Rice Fields): Provided a detailed explanation and implementation guidance on the two-level binning system, including the linear-log bucketing approach.




