Malloc Implementation
Head manager used by the program
Library of functions used by the C programming language for dynamic memory allocation
Interface to sbrk() and mmap()
If we need more memory from the kernel we use sbrk() or mmap()
Break sbrk() and mmap() memory allocation into smaller chuncks
Easily ported to other languages
Maloc
Each malloc implementation contains functions such as:
malloc(): Allocates a chunk of memory
realloc(): Decreases or increases amount of space allocated
free(): Frees the previously allocated chuck
calloc(): Initializes data as all o's
Specify an array of N elements, each with a defined size
Doug Lea's malloc implementation
Used by many Linux variants as the primary memory allocator chunks from freelists
The unlink() function removes chunks from a doubly linked list
The frontlink function inserts new chunks into a doubly linked list
unlink() is called by free() when an adjacent chunk is also unused:
Perform coalescing
"Holding Hands"
Then frontlink() is called to reinsert
What is paging?
Process of allowing indirect memory mapping
Linear addressing is mapped into fixed-sized pages:
Most commonly 4KB
Pages mapped into page tables with up to 1,024 entries
Page tables mapped into page directories
Page directories can hold p to 1,024 page tables
Linear address maps to page directory, table, and page offset
Translation Lookaside Buffers (TLB) hold frequently used page tables and entries
Page file may not be needed or used on 64-bit systems
Context switching and the Process Control (PCB)
Register values for each process are stored in the PCB and loaded during context switching.
Non-recently accessed pages are copied to disk
Page Faults
Occurs when the system attempts to access an address of memory at has been paged to disk
Relatively time-consuming
Swapping an entire process
Windows Memory Optimization
Pages out unused memory
#define unlink(P, BK, FD){\FD = P->fd;\/*FD = the pointer stored at chunck +8 */BK = P -> bk; \/* BK = the pointer stored at chunck + 12 */FD->bk = BK; \/* At FD + 12 write BK to set new bk pointer */BK -> fd = FD; \/* At BK +8 write FD to set new fd pointer */}
These are the sections we need to care about inside an elf binary, there are more sections than this you can see them with tools like readelf and objdump
Code Segment:
Fixed-size segment containing code, the compiled executable code segment, it hold instructions, it should not be writeable but it needs to be readable and executable.
Data Segments:
Fixed-sized segment containing initialized variables, it should be readable not writeable or executable.
BSS Segment:
Fixed-sized segment containing uninitialized variables, has to be writeable because initializes all of your variables
Heap:
Segment for dynamic and/or large memory requests, it needs to be writeable but it should not be executable because that will give potential for shellcode execution
Stack Segment:
Should be far away from the heap because they don't play nicely together because they're both dynamic and both need to be able to grow so we don't want them to collide,
Procedure stack for the process, it should be writeable but it should not be executable because that's why we have things like Data Execution Protection (DEP)
There are 128 bins and each of these bins have a doubly linked free-list associated with it with dlmalloc
Sorted by size:
< 512 bytes kept in a large number of small bins
> 512 bytes kept in remaining larger bins
Fastbins, are used for speed to optimize allocation. On windows we call these lookaside lists.
Small size up to 80 bytes
Never merged
Singly linked
No backward pointers
As stated in the malloc.c source code:
Indexing
Bins for sizes < 512 bytes contain chunks of all the same size, space 8 bytes apart. Larger bins are approximately logarithmically spaced:
64 bins of size 8
32 bins of size 64
16 bins of size 512
8 bins of size 4096
4 bins of size 32768
2 bins of size 26144
1 bins of size what's left
Chunck bordering the highest memory address:
Heaps grow up towards the stack * Call sbrk() to increase size and remains contiguous
The mmap() function can be used for noncontiguous requests:
Creation of new arenas
Threaded programs include multiple arenas
Based on dlmalloc and written by Wolfgram Gloger
Designed to support multiple threads
fork() versus threads
Original ptmalloc version publishes s part of glibc-2.3.x
ptmalloc(3) is the current version, altough, ptmalloc(2) is most common
Thread-caching malloc(tcmalloc)
Developed by Google, as part of Google Performance Tools
A high-speed memory allocator
Has a heap checker to check for C++ memory leaks
Jason Evan's malloc (jemalloc):
Replaced phkmalloc on FreeBSD
Used by the Firefox browser, Facebook
Multithreading support
Each arena gets it's own processor
Tool to intercept and record library calls
Freeware under the GNU Public License
Similar to the tool strace
strace is the successor to ltrace; however, ltrace is easier to read in some use cases
Useful for locating calls for memory allocations