Buffer Overflow

Buffer overruns related to the C programming language data integrity model were first recognized as early as 1973. The first well-known exploit of this vulnerability occurred in 1988. That year the well-documented and infamous Internet Worm shut down over 6,000 systems in just a few hours. It did so by exploiting an unchecked buffer filled by the gets() function call in the finger daemon (fingerd). Despite this history and simple preventive methods, the buffer overflow remains a significant and prominent computer security concern today. For example, buffer overflow problems are implicated in top ten vulnerability lists.

Buffer overruns left lingering in program code are often attributed to “lazy”, “sloppy”, or “uncaring” programmers. They are also blamed on modern compilers that fail to perform integrity or bounds checking on their source input or machine output instructions. However, these views may be too simplistic, and the root of the problem may run deeper. Despite their prevalence today, the buffer overflow vulnerability and attack remain only loosely and informally documented in the literature. From this one might conclude that the problem is generally not well understood. This may explain why overrun vulnerabilities continue to appear in new software applications.

Some clarification is necessary here. Indeed, the buffer overflow and its exploitation are well known. Unfortunately, “well known” and “well understood” are often two entirely different things. For instance, almost every book, article, or white paper on computer security that is worth its salt mentions the buffer overflow vulnerability and its enabling factors. Most even cite preventions or defenses against it. However, they typically avoid describing the intricate details of its causes and the mechanisms by which it is exploited. Nor do they explain these in terms that a novice or inexperienced programmer, system administrator, or computer security practitioner or principal can easily understand.


Buffer overflow vulnerabilities are often attributed to the combined effects of the permission-based security features of the UNIX operating system and defects in the C programming language. Nathan Smith assumed this premise in his 1997 study “Stack Smashing Vulnerabilities in the UNIX Operating System.” However, the vulnerability is not limited to C or UNIX. DilDog and David Litchfield each demonstrated that the exploit could be used effectively against Windows NT, in their respective papers “The Tao Of Windows Buffer Overflow” and “Exploiting Windows NT4 Buffer Overruns.” The paper “Windows NT Buffer Overflow’s Start to Finish” shows that the problem is present and exploitable within the MFC (Microsoft Foundation Classes) as well. Hence, this paper treats the buffer overflow as a problem independent of both language and operating system. Microsoft Corporation defines the buffer overflow as follows:

A buffer overflow attack is an attack in which a malicious user exploits an unchecked buffer in a program and overwrites the program code with their own data. If the program code is overwritten with new executable code, the effect is to change the program’s operation as dictated by the attacker. If overwritten with other data, the likely effect is to cause the program to crash.

Since Microsoft has produced its fair share of buffer overflow vulnerabilities over the years, it should be well versed in the problem, and we have little reason to doubt this description. Thus, we will use it as our working definition. But what exactly does this mean? And how do we get there?

To answer these questions adequately, we must begin by looking at high-level language code and the resulting machine code at the most basic level of the “hardware chain”. We must also investigate how the Intel 80386 processor architecture manages and uses memory.

A buffer overflow is very much like pouring ten ounces of water into a glass designed to hold eight ounces. The water overflows the rim of the glass, spilling out somewhere and creating a mess. Here, the glass represents the buffer and the water represents application or user data. Let’s look at a simple C/C++ code snippet that overruns a buffer:

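```c
/* Deliberately defective example: writes 16 characters into an
   8-character buffer. Do not use as-is. */
void authenticate(void)
{
    char buffer1[8];   /* room for exactly eight characters */
    int i;

    /* Programming error: the loop bound (16) exceeds the array size (8),
       so iterations 8 through 15 write past the end of buffer1. */
    for (i = 0; i < 16; i++)
        buffer1[i] = 'x';
}
```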

In this function we have a buffer capable of holding eight ASCII characters. A char occupies one byte, so the array itself occupies 8 bytes of memory, on a 32-bit system or any other. Through a programming error, the initialization loop then force-feeds 16 “x” characters into it. Obviously they are not all going to fit. Eight of them must spill over into some other memory area, just as the water overflows its glass. Notice there is no code in this function to check the bounds of the array or to prevent this programming error from occurring.

Strictly speaking, the C standard leaves such an out-of-bounds write undefined. In practice, an accidental overrun like this one usually does not present a security problem in itself. Often a segmentation fault occurs and the program terminates with a core dump. The buffer overflow itself really is that simple. As we shall soon see, though, identifying and exploiting the vulnerability complicates matters very quickly.

Types of Buffer Overflows

The literature identifies two primary types of buffer overrun: the stack overflow and the heap overflow. The stack overflow has two basic variations. The first involves overwriting (and thus changing) security-sensitive variables or control flags stored in memory adjacent to the unchecked buffer. The second and more common variation involves overwriting function pointers or a function’s saved return address. An attacker can use these to change program flow or gain elevated privileges within the operating system environment. The more complex heap overrun involves dynamic memory allocations, that is, memory allocated at run time by an application.

Structure and Management of Program Memory

Any application or program can logically be divided into two basic parts: text and data. Text is the actual read-only program code in machine-readable format. Data is the information that the text operates on as it executes instructions. The text segment resides in the lower area of a process’s memory, and several instances of the same program can share it.

Data, in turn, can be divided into three logical parts: static, stack, and heap data. These types differ in when and how the memory is allocated, and where it is located. When the operating system first loads an executable, the text segment is loaded into memory first, followed by the data segments. Figure 1 demonstrates these relationships.

Static data, located above and adjacent to the text segment, is known ahead of time, and its storage space is compiled into the program. This area is normally reserved for global variables and static C++ class members. Static data can be in either an initialized or an uninitialized state.

Heap data, located above and adjacent to static data, is allocated at run time by the C library functions malloc() and calloc() and by the C++ new operator. The heap grows up, from lower memory addresses toward higher ones.

The stack is an actual data structure in memory, accessed in LIFO (last-in, first-out) order. This memory segment sits above the heap and grows down, from higher memory addresses toward lower ones, so the two regions grow toward each other. Like heap data, stack data is allocated at run time. The stack acts as a “scratch pad” that temporarily holds a function’s parameters and local variables. It also holds the return address, which is the location of the instruction at which execution resumes when the function returns. This return address is of prime importance because it points to executable code. Whoever controls its value controls where the program goes next.


Although buffer overflows are a nightmare for any system administrator, we must bear in mind that ALL such attacks are preventable. The “disease” that allows them to persist may well be eradicated in the future. However, an effective vaccine must first be developed.