Malloc Implementation

  • Head manager used by the program

  • Library of functions used by the C programming language for dynamic memory allocation

  • Interface to sbrk() and mmap()

  • If we need more memory from the kernel we use sbrk() or mmap()

    • Break sbrk() and mmap() memory allocation into smaller chuncks

  • Easily ported to other languages


  • Each malloc implementation contains functions such as:

    • malloc(): Allocates a chunk of memory

    • realloc(): Decreases or increases amount of space allocated

    • free(): Frees the previously allocated chuck

    • calloc(): Initializes data as all o's

      • Specify an array of N elements, each with a defined size


  • Doug Lea's malloc implementation

  • Used by many Linux variants as the primary memory allocator chunks from freelists

  • The unlink() function removes chunks from a doubly linked list

  • The frontlink function inserts new chunks into a doubly linked list

  • unlink() is called by free() when an adjacent chunk is also unused:

    • Perform coalescing

    • "Holding Hands"

    • Then frontlink() is called to reinsert


What is paging?

  • Process of allowing indirect memory mapping

  • Linear addressing is mapped into fixed-sized pages:

    • Most commonly 4KB

    • Pages mapped into page tables with up to 1,024 entries

      • Page tables mapped into page directories

      • Page directories can hold p to 1,024 page tables

    • Linear address maps to page directory, table, and page offset

    • Translation Lookaside Buffers (TLB) hold frequently used page tables and entries

  • Page file may not be needed or used on 64-bit systems

  • Context switching and the Process Control (PCB)

    • Register values for each process are stored in the PCB and loaded during context switching.

Paging vs Swap space

  • Non-recently accessed pages are copied to disk

  • Page Faults

    • Occurs when the system attempts to access an address of memory at has been paged to disk

  • Relatively time-consuming

  • Swapping an entire process

  • Windows Memory Optimization

    • Pages out unused memory

#define unlink(P, BK, FD){\
FD = P->fd;\
/*FD = the pointer stored at chunck +8 */
BK = P -> bk; \
/* BK = the pointer stored at chunck + 12 */
FD->bk = BK; \
/* At FD + 12 write BK to set new bk pointer */
BK -> fd = FD; \
/* At BK +8 write FD to set new fd pointer */

Object Files

These are the sections we need to care about inside an elf binary, there are more sections than this you can see them with tools like readelf and objdump

  • Code Segment:

    • Fixed-size segment containing code, the compiled executable code segment, it hold instructions, it should not be writeable but it needs to be readable and executable.

  • Data Segments:

  • Fixed-sized segment containing initialized variables, it should be readable not writeable or executable.

  • BSS Segment:

    • Fixed-sized segment containing uninitialized variables, has to be writeable because initializes all of your variables

  • Heap:

    • Segment for dynamic and/or large memory requests, it needs to be writeable but it should not be executable because that will give potential for shellcode execution

  • Stack Segment:

    • Should be far away from the heap because they don't play nicely together because they're both dynamic and both need to be able to grow so we don't want them to collide,

    • Procedure stack for the process, it should be writeable but it should not be executable because that's why we have things like Data Execution Protection (DEP)


  • There are 128 bins and each of these bins have a doubly linked free-list associated with it with dlmalloc

  • Sorted by size:

    • < 512 bytes kept in a large number of small bins

    • > 512 bytes kept in remaining larger bins

  • Fastbins, are used for speed to optimize allocation. On windows we call these lookaside lists.

    • Small size up to 80 bytes

  • Never merged

  • Singly linked

    • No backward pointers

Bin Indexing

  • As stated in the malloc.c source code:

  • Indexing

  • Bins for sizes < 512 bytes contain chunks of all the same size, space 8 bytes apart. Larger bins are approximately logarithmically spaced:

    • 64 bins of size 8

    • 32 bins of size 64

    • 16 bins of size 512

    • 8 bins of size 4096

    • 4 bins of size 32768

    • 2 bins of size 26144

    • 1 bins of size what's left

The Wilderness

  • Chunck bordering the highest memory address:

    • Heaps grow up towards the stack * Call sbrk() to increase size and remains contiguous

  • The mmap() function can be used for noncontiguous requests:

    • Creation of new arenas

    • Threaded programs include multiple arenas


  • Based on dlmalloc and written by Wolfgram Gloger

  • Designed to support multiple threads

    • fork() versus threads

  • Original ptmalloc version publishes s part of glibc-2.3.x

  • ptmalloc(3) is the current version, altough, ptmalloc(2) is most common

tcmalloc and jemalloc

  • Thread-caching malloc(tcmalloc)

    • Developed by Google, as part of Google Performance Tools

    • A high-speed memory allocator

    • Has a heap checker to check for C++ memory leaks

  • Jason Evan's malloc (jemalloc):

    • Replaced phkmalloc on FreeBSD

    • Used by the Firefox browser, Facebook

    • Multithreading support

    • Each arena gets it's own processor

Tool: ltrace

  • Tool to intercept and record library calls

  • Freeware under the GNU Public License

  • Similar to the tool strace

    • strace is the successor to ltrace; however, ltrace is easier to read in some use cases

  • Useful for locating calls for memory allocations