In this series of articles I am going to introduce a method of security that is rarely employed by individuals or companies, but provides great protection from both local and remote exploits. It is extremely good because it renders automated attack code, and even previously written hacking tools, useless. It has been in the arsenal for many years, but since the method used to perform it requires compiling the operating system kernel and applications from source, it has been impossible for closed-source operating systems like Windows, and difficult for open-source, such as Linux. However, newer tools are making it much easier to perform.
To understand how it works, consider the following maze and how it might be possible to program a robot to navigate it.
First a simple command to move ten feet forward which simply turns the wheel motors on for a given number of seconds. This can be called “MF” for move forward. A second command would be called “Turn Right” and would consist of turning the wheel motors on in such a way for a given number of seconds to rotate the robot right ninety degrees.
This can be called “TR” along with a similar one called “TL” for turn left. Now assuming each square is ten feet both in length and width to get the robot through the maze to stand in front of the green wall if it starts on the red dot and faces down would be:
MF MF TL MF TL MF TR MF TR MF MF TL MF MF
We now have a method to allow the robot to navigate all obstacles. Those simple commands get the robot through the maze to stand in front of the green wall. Suppose however, we change a single wall in the maze. It now looks like this
With even that simple change our algorithm to solve it fails. The algorithm not only depends on the maze being the same, but the robot being in the same spot and facing the same direction.
Suppose though we wanted to make sure we could solve the first maze no matter which direction the robot was facing. As long as we get multiple tries with the maze resetting between them, this algorithm works but we need to change our starting conditions.
Instead we do four attempts. First let’s call the current algorithm SOLVE. The we do the following as a main routine to solve the maze
TR TR SOLVE TR TR TR SOLVE
Notice that each time we try the solution we turn to right one, two, and three times. This eventually will orient the robot correctly for the solution to work with a maximum of three failed attempts if the robot starts facing left. Hacker tools often try multiple times changing offset values from a known list or simply incrementing through a set of values trying to find the correct one. Each wrong attempt might cause the process to crash, forcing the OS to restart it.
To finish the system we simply need to have a program to execute the commands and the data necessary to drive the routine. We would have to store in the robot’s computer memory the program and the data. It might be something like this table:
Now the computer would simply start running the code stored at row 4 and the robot would begin. At each attempt to solve the maze using SOLVE if it fails, the robot is reset to its original position and another attempt can be made. In this fashion, multiple attempts can be made allowing the SOLVE routine to eventually solve the problem. Of course, computers only use numbers so instead of calling the subroutines by number we will instead call them by number. Location 4 contains the main program which calls all the others. MF is stored at 0, TR at 1, TL at 3, and SOLVE at 3. The data the main routine will use to know which order to call the subroutines is stored at location 5.
Now we have the basis for a simple program to solve the first maze regardless of which direction the robot is facing. The whole system is an analogy for a computer program with the code and data stored in the table. Though very simplified, it represents a program in memory of a computer, and in this case if we think of the maze as an exploit tool to a vulnerability, we have a program that can use that vulnerability to break into a system.
A program calls functions both in external and internal libraries and the locations of these routines can all be calculated from an offset from some known location in the program. It is relative to a given position, perhaps 4,187 bytes forward or 3,209 bytes backwards. If the location of the code is known, data can be changed to break the system in such a way that execution jumps to an unintended location in the original program but instead to the location of the exploit code. There are several special locations for data a program uses including the call stack to store the return addresses in which to jump after execution of subroutines. Such techniques as stack buffer overflows work because this stack is also used to store parameters, variables scoped to the current function (local), and current instance (the this pointer). The call stack has many uses but knowing the exact state of this stack is important for a hacker to the call-return system of return-oriented programming (ROP). Knowing precisely where to place data on this system by causing an overrun is the key to such exploints. Code is simply specialized data to a computer system.
It all becomes very complex since code is local to the current program but also references code in the operating system and external libraries. External library code must be loaded into memory (or if already loaded the address looked up) and these addresses must be put in the program as it is loaded–called a “fixup.” The operating system must maintain a table of these routines in order to reuse them when executing programs. Like programs in memory, specialized files reside on a permanent storage device that contain not only the code for the program, but also all the data required to perform all the fixups to properly modify all external and relative internal addresses a program needs. Since a program can be loaded anywhere in memory (for modern operating systems) many addresses need to use a fixup. This process is called linking and is done dynamically as the operating system prepares the program for execution. The key takeaway is that for many hacking techniques these numbers must be known or found to switch execution from the intended code (the operating system or application code) to the hacker’s code. Very much like the maze, knowing exactly what to do in advance allowed the hacker to create code that could exploit a vulnerability in a system to compromise it. Like the program to solve the maze it is entirely dependent on the maze never changing.
Since all releases of a particular application or operating system are exactly the same, the code that can break one, can break all identical versions. If the above maze is exactly the same, the algorithm will always solve it, but even one change to the maze disables the program. If all the addresses that need fixed up were different for every system the program ran on, a hacker could only design a program to break a single computer. In other words, if each computer had its own maze it would be impossible for a hacker to create a program (at least a reasonably simple one) that could solve the maze. So, can we each have our own maze?
The answer is yes. For open source, this is fairly easy, simply compiling the code with a different version of the compiler is often enough. Reconfiguring the options of the program to exclude features not required for the system (support for hardware not present, network protocols not used, unnecessary features, etc.) or to include features not included in the default. All of these will create different link locations, a different location of the special data areas (call stack, heap), leading to different offsets in executable (for those more technical this means different values in the procedure link table and global offsets table in an ELF file for example). This will make a new maze by building an executable file unique to your system.
If you’re using a source distribution of Linux such as Gentoo, LFS, or my own Toucan Linux Project, you already have a unique kernel, unique libraries, and unique applications. If you’re using a binary distribution such as Ubtunu, RedHat, SUSE, almost all others, or Windows, you’re using the same maze that everyone else is using. Even recompiling the entire distribution will yield the same binaries because all options have already been chosen. You can install kernel source, modify options, and recompile it to at least protect the OS kernel.
But what about software that is closed source? In that case a program specifically designed to read the executable file format, examine the code, and rewrite it to use different offsets, switch the use of registers, examine and randomize the link tables. This is known as a binary rewriter. Many exist including DynamoRIO, Multiverse, Zipr++, RL-bin, and RevARM for Linux on X86-64 and ARM architectures. These can be used for anything from observation of a program operating for performance profiling, but also hardening. They are not easy to use but are fully capable of doing the job, but only in expert hands.For these you’ll need a senior developer, but because it requires specialized knowledge of executable file formats, dynamic linking, and assembly language, it’s a rare developer that can handle the work. These developers work in building developer toolchains and are very hard to come by.
In 2015 a company decided to address this issue and create an easy to use–actually automatic–binary rewriter with the purpose of modifying an executable file to make the program unique in terms of ROP metrics (offsets, registers, jump tables, etc.) explicitly for security hardening. The company is Polyverse and they are busy building a system to polymorph code to make it unique without changing functionality while minimizing performance degradation. Using their service you can build your own maze, one that is unique to your company and can be refreshed every 24 hours. Having your own unique polymorphic applications and operating system is great, but having it change daily is very important as well. Why? We’ll talk about what
mechanisms modern kernels are using to combat ROP attacks and how hackers have responded to bypass these mechanisms. We’ll also talk about what is needed in the future to solve these types of attacks, which are the majority of attacks we find in the wild .