Sunday 28 June 2020

Thread-stack isolation on RTEMS

After isolating memory blocks on RTEMS with ARMv7-A MMU, it is time we isolate thread stacks.

But, before we do that, there are some concepts about threads on RTEMS that we should understand.

Stack Allocation - In RTEMS, the stack allocation mechanism is user-configurable: the user can supply a custom mechanism for allocating thread stacks for their BSP. This is done by defining the stack allocator configuration options (CONFIGURE_TASK_STACK_ALLOCATOR_INIT, CONFIGURE_TASK_STACK_ALLOCATOR, and CONFIGURE_TASK_STACK_DEALLOCATOR) with the corresponding functions in the application code.
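To make the idea of pluggable stack allocation concrete, here is a minimal sketch of an init/allocate/free hook triple backed by a simple bump allocator. Everything prefixed with demo_ is a hypothetical name invented for this illustration, not an RTEMS API, and the fixed 16 KB pool is an assumption chosen to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BSP stack pool; none of these names are RTEMS APIs. */
#define DEMO_POOL_SIZE 16384u
#define DEMO_PAGE_SIZE 4096u /* one small page, so every stack can get its own permissions */

static _Alignas(4096) uint8_t demo_pool[DEMO_POOL_SIZE];
static size_t demo_pool_used;

/* Init hook: called once before any stack is handed out. */
static void demo_stack_allocate_init(size_t stack_space_size)
{
  assert(stack_space_size <= DEMO_POOL_SIZE);
  demo_pool_used = 0;
}

/* Allocate hook: returns a page-aligned stack, or NULL when the pool is exhausted. */
static void *demo_stack_allocate(size_t stack_size)
{
  size_t rounded;

  if (stack_size == 0 || stack_size > DEMO_POOL_SIZE) {
    return NULL;
  }
  /* Round up to whole pages so two stacks never share a page. */
  rounded = (stack_size + DEMO_PAGE_SIZE - 1u) & ~(size_t) (DEMO_PAGE_SIZE - 1u);
  if (rounded > DEMO_POOL_SIZE - demo_pool_used) {
    return NULL;
  }
  demo_pool_used += rounded;
  return &demo_pool[demo_pool_used - rounded];
}

/* Free hook: a bump allocator cannot reuse memory, so this is a no-op. */
static void demo_stack_free(void *stack)
{
  (void) stack;
}
```

Rounding every request up to a whole page is the design choice that matters for isolation: permissions are applied per page, so a stack that shared a page with another stack could not be protected independently.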

Context Initialization - Each thread has its own set of registers (stack pointer, program counter) and relevant attributes (thread ID) that make up its execution context and must be assigned during thread initialization. In RTEMS, context initialization is done through the CPU-specific _CPU_Context_Initialize() function. The Context_Control structure stores the thread registers and related attributes; it is initialized by a call to _CPU_Context_Initialize().

Context_Control structure
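As a rough illustration of what context initialization does, the sketch below fills in a trimmed-down context record. The demo_context type and demo_context_initialize() are hypothetical; the real ARM Context_Control has more members and a different initialization signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down context; the real ARM Context_Control has more members. */
typedef struct {
  uint32_t register_r4_to_r11[8]; /* callee-saved registers */
  uint32_t register_sp;           /* stack pointer */
  uint32_t register_lr;           /* return address: where the thread starts running */
  uint32_t thread_id;
} demo_context;

static void demo_context_initialize(
  demo_context *ctx,
  uint32_t      stack_base,
  uint32_t      stack_size,
  uint32_t      entry_point,
  uint32_t      thread_id
)
{
  size_t i;

  assert(ctx != NULL && stack_size >= 8u);
  for (i = 0; i < 8; ++i) {
    ctx->register_r4_to_r11[i] = 0;
  }
  /* ARM stacks grow downwards: start at the top, 8-byte aligned as the AAPCS requires. */
  ctx->register_sp = (stack_base + stack_size) & ~UINT32_C(7);
  ctx->register_lr = entry_point;
  ctx->thread_id = thread_id;
}
```

The first context switch to such a thread loads the saved stack pointer and jumps to the saved return address, which is why setting those two fields is enough to start a thread.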

Context switch and restoration - In any context switch we need to save the register state of the executing thread, including its stack pointer and the address at which it will resume, and then switch to the heir thread by loading the heir's saved registers, including its stack pointer and program counter. In RTEMS this is done through CPU-specific assembly code. During a context switch, we save the registers of the executing thread, load the register values of the heir thread, and continue in _Thread_Do_Dispatch() by loading the program counter with the saved return address. For restoration, we simply load the registers from the context structure whose address is passed in the 'R0' register and, for a newly started thread, branch to the _Thread_Handler() function.

Context-Switching Code

Isolating Thread Stacks -

Setting memory attributes dynamically - We have already isolated memory blocks in the previous post by setting different access permissions for different memory regions. Now we need a high-level mechanism for isolating these blocks so that the same framework can be used across architectures. The header cpukit/include/rtems/score/memorymanagement.h declares the set of APIs that can be implemented for the target architecture to set the memory attributes of an address range. Here, we will see its implementation for the ARMv7 MMU.

Flag Translation - At the high level we pass generic memory attributes to the memory_attributes_set() function. These attributes are then translated to architecture-specific values for the BSP's architecture. For example, if we pass the 'READ/WRITE' flag to the function, at the architecture level it must translate to the right bit combination and position of the access-permission bits in the page table entry. This is why we map the high-level flags to the low-level implementation by defining memory_translate_flags() for the target architecture. You can check the implementation for ARM MMU here.

Allocating protected stacks - As discussed above, RTEMS lets us plug in a custom stack allocation mechanism. This is useful to us, as we can use it to allocate stacks with specified memory permissions. Another benefit, as we will see further ahead in the discussion, is that we can register each allocated stack in a chain (doubly linked list) to track the stacks effectively. We define our custom stack allocation mechanism in bsps/shared/start/stackalloc.c. Once this is defined, the user should always configure the application as discussed earlier to allocate protected stacks.

Tracking protected stacks - We must keep track of all the allocated stacks and their memory access attributes because, during a context switch, we need to unset the memory attributes of the current stack and set the memory attributes of the heir stack. This is only possible if we know the stack attributes of both the currently executing thread and the heir thread.

Protected stack attributes - Every allocated stack has some attributes that we need to track for setting memory attributes for the stack space. These include the stack size, stack address, access flags, execution status, and in the case of stack-sharing, shared stack attributes. The stack-management APIs are declared in cpukit/include/rtems/score/stackmanagement.h. The stack management structure looks like this -
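As a rough sketch of a record holding the attributes listed above, the following uses hypothetical names rather than the declarations from stackmanagement.h. The helper shows the kind of query such a record makes possible: deciding whether an address falls inside a given stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute record; field names are assumptions for illustration. */
typedef struct {
  uintptr_t stack_address; /* lowest address of the stack area */
  size_t    size;          /* size of the stack area in bytes */
  uint32_t  access_flags;  /* generic access flags, e.g. read/write */
  bool      current_stack; /* true while the owning thread is executing */
} demo_stack_attr;

/* True if addr lies inside the stack described by s. */
static bool demo_stack_contains(const demo_stack_attr *s, uintptr_t addr)
{
  assert(s != NULL);
  return addr >= s->stack_address && addr - s->stack_address < s->size;
}
```

Comparing the offset against the size, rather than computing the end address, avoids overflow when a stack sits at the very top of the address space.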

Adding allocated stacks to a chain - A very simple way of tracking stacks is by adding each allocated stack to a linked list. In RTEMS, we already have chains that are implemented as a doubly-linked list. We can set the current_stack attribute to 'true' for the most recently allocated stack and set all the other nodes to 'false' and append the stack attribute structure to the list. This way we can keep track of all the allocated stacks and their allocation status. The implementation of adding stack attributes to a chain can be found here.
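The chain bookkeeping described above can be sketched with a small self-contained doubly linked list. This is not the RTEMS chain API; demo_chain, demo_node, and the functions below are hypothetical names that only mirror the idea of appending a node and marking it as the current stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain node that embeds the stack attributes. */
typedef struct demo_node {
  struct demo_node *next;
  struct demo_node *prev;
  void             *stack_address;
  size_t            size;
  bool              current_stack;
} demo_node;

/* Circular list with a sentinel head node. */
typedef struct {
  demo_node head;
} demo_chain;

static void demo_chain_init(demo_chain *c)
{
  c->head.next = &c->head;
  c->head.prev = &c->head;
}

/* Append n at the tail and make it the only node marked as current. */
static void demo_chain_append_current(demo_chain *c, demo_node *n)
{
  demo_node *it;

  for (it = c->head.next; it != &c->head; it = it->next) {
    it->current_stack = false;
  }
  n->current_stack = true;
  n->prev = c->head.prev;
  n->next = &c->head;
  c->head.prev->next = n;
  c->head.prev = n;
}

/* Return the node marked as current, or NULL if there is none. */
static demo_node *demo_chain_find_current(demo_chain *c)
{
  demo_node *it;

  for (it = c->head.next; it != &c->head; it = it->next) {
    if (it->current_stack) {
      return it;
    }
  }
  return NULL;
}
```

Clearing the flag on every append is linear in the number of stacks; that is acceptable for a sketch, though a real implementation might instead remember a pointer to the current node.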

Context initialization of protected stacks - We must register the stack attributes of a particular thread in its Context_Control structure because the members of this structure are saved and restored during a context switch. We call prot_stack_context_initialize() from _CPU_Context_Initialize() to register the stack attributes in the control structure.

Context switching of protected stacks - For switching the context of protected stacks, we follow the pre-existing model in RTEMS. We save the relevant registers and attributes and call the _Thread_Do_Dispatch() function by loading the program counter with the address of the function. The only difference is that, for protected stacks, we also call the prot_stack_context_switch() function from the assembly code, passing the stack attribute structure as a parameter; this function unsets the current memory attributes. We load the parameter into the 'R0' register with an 'LDR' instruction, specifying the proper offset into the context control structure.

Context restoration - Context restoration follows the same approach as switching. We restore the relevant registers and attributes and call the _Thread_Handler() function by loading the program counter with the address of the function. The difference here is that we call prot_stack_context_restore(), which sets the memory attributes of the thread stack and marks the current_stack attribute as 'true'. We pass the stack attributes to the function by loading the parameter into the 'R0' register with an 'LDR' instruction, specifying the proper offset into the context control structure.
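The effect of the switch and restore hooks can be modelled in plain C. In this sketch the page table state is replaced by a single field per stack, and all demo_ names are hypothetical; the real prot_stack_context_switch() and prot_stack_context_restore() operate on the MMU and are called from assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum demo_perm { DEMO_PERM_NONE, DEMO_PERM_READ_WRITE };

/* Hypothetical protected stack; mapped_as stands in for the page table state. */
typedef struct {
  uintptr_t      base;
  size_t         size;
  enum demo_perm mapped_as;
  bool           current_stack;
} demo_prot_stack;

/* Switch-out hook: revoke access to the stack of the thread that stops running. */
static void demo_context_switch(demo_prot_stack *executing)
{
  assert(executing != NULL);
  executing->mapped_as = DEMO_PERM_NONE;
  executing->current_stack = false;
}

/* Restore hook: grant access to the stack of the thread that starts running. */
static void demo_context_restore(demo_prot_stack *heir)
{
  assert(heir != NULL);
  heir->mapped_as = DEMO_PERM_READ_WRITE;
  heir->current_stack = true;
}

/* Would an access to addr through this stack mapping succeed? */
static bool demo_can_access(const demo_prot_stack *s, uintptr_t addr)
{
  return s->mapped_as == DEMO_PERM_READ_WRITE
    && addr >= s->base
    && addr - s->base < s->size;
}
```

After a switch from thread A to thread B, only B's stack is accessible, which is exactly the property that makes a stray access to a dormant thread's stack fault.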

This completes our thread isolation implementation. Clone and build this repo to try out the implementation with various cases where you access the stack address of a dormant thread from an executing thread. Be ready to have the OS throw exceptions your way!

Note - This implementation has only been tested with POSIX threads. Classic RTEMS threads have not yet been tested, and the implementation may leak memory when trying to isolate them.

Saturday 27 June 2020

Isolating two blocks of memory and living to tell the tale!

In the last post, we discussed some of the high-level ideas of thread-stack protection and details of the implementation using an MMU. This time, we will be isolating blocks of memory using the MMU on RTEMS. But first things first.

What do we mean by isolating two blocks of memory?

Suppose we have two blocks of memory, A and B. We want to allow both read and write operations on A but only read operations on B. If we can use an MMU to build a system where we can change the values in block A but get an exception as soon as we try to write to block B, we have isolated the two blocks: they have been assigned different access permissions, and an operation that works on block A raises an exception on block B.

What do we plan to achieve by isolating memory blocks?

Isolating memory blocks will be one of the fundamental stepping stones on the way to thread-stack isolation. If we think about it, thread-stack protection in its most watered-down form is isolating memory blocks (the memory blocks, in this case, being the stack address spaces) from each other.

Isolating these memory blocks will give us a framework upon which we can implement the other complexities of thread-stack isolation (stack allocation, context switching, etc.).

Now, with that out of the way, we should focus on the details of the implementation and start getting our hands dirty!

Choice of processor architecture -

We will implement our memory isolation on an ARM-based MMU, in particular that of ARMv7-A. There are two reasons for this choice. Several popular boards (BeagleBone, Raspberry Pi, Xilinx Zynq) are based on this architecture. More importantly, RTEMS already has support for initialization, page table, and page entry setup for the ARMv7-A MMU. This means we don't have to code everything from scratch and can simply use the existing support to isolate the memory blocks.

Paging levels -

At this point, we need to understand the concept of levels of translation that a virtual memory address goes through so that it represents an address in the physical memory.

By setting up page table entries appropriately, we can have memory regions of a 'particular type' that range in size from 16 MB to 4 KB. So why do we need memory regions of such varying sizes?

There are some cases where we need a large chunk of memory (e.g. the heap region) to have the same memory access permissions, and at other times we need fine control over small memory regions (e.g. thread stacks).

The sections and supersections need only one level of paging, whereas the small and large page addresses are translated using two levels of paging.
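The difference between one and two levels of translation shows up in how a 32-bit virtual address is split. The helpers below are a small sketch for the ARMv7-A short-descriptor format with TTBCR.N set to 0; the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* First-level table index: VA[31:20], one of 4096 entries covering 1 MB each. */
static uint32_t demo_l1_index(uint32_t va)
{
  return va >> 20;
}

/* Section (one level): the rest of the address, VA[19:0], is the offset. */
static uint32_t demo_section_offset(uint32_t va)
{
  return va & UINT32_C(0x000FFFFF);
}

/* Small page (two levels): VA[19:12] indexes the 256-entry second-level table. */
static uint32_t demo_l2_index(uint32_t va)
{
  return (va >> 12) & UINT32_C(0xFF);
}

/* Small page (two levels): VA[11:0] is the offset inside the 4 KB page. */
static uint32_t demo_small_page_offset(uint32_t va)
{
  return va & UINT32_C(0xFFF);
}
```

For the address 0x12345678, a section mapping uses first-level entry 0x123 and offset 0x45678, while a small-page mapping uses the same first-level entry, second-level entry 0x45, and offset 0x678.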

ARMv7-A MMU configuration for memory isolation -
Let us first understand which configurations we need to set up for an MMU to isolate memory blocks -

Initialising the MMU (duh!)
Setting up the page table entries for accessing physical memory
Assigning proper access permissions in the page table entries for memory operations

A key player in controlling the MMU is the CP15 register set. Coprocessor 15 controls cache configuration and management, system performance, and, more importantly, the memory management unit. All the registers that control MMU configuration belong to the CP15 register set.

In the previous post, we had gained a general idea of accessing memory and setting up their access permission using access flags in an MMU. Now let us look at the ARMv7-A specific details of performing those operations.

Page table base setup - The page table base address in the v7-A MMU is stored in either TTBR0 or TTBR1 (Translation Table Base Register 0 or 1). The choice is made by setting bits[2:0] (the N field) of the TTBCR (Translation Table Base Control Register). As you may have observed, we can set these bits to different values to split the address space between the two registers in different ways, but that is a topic for another day😉. Right now, we will set the bits to 0, which means TTBR0 holds the address of the page table base for the whole address space.
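For the curious, the selection rule between the two base registers is short enough to sketch. The function below is a plain-C model of the architectural rule for the short-descriptor format, not code that touches CP15; demo_select_ttbr() is a name made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 0 if TTBR0 translates va, 1 if TTBR1 does.
 * ttbcr_n is the value of TTBCR bits[2:0].
 */
static int demo_select_ttbr(uint32_t ttbcr_n, uint32_t va)
{
  assert(ttbcr_n <= 7u);

  if (ttbcr_n == 0u) {
    return 0; /* N == 0: TTBR0 covers the whole 4 GB address space */
  }
  /* N > 0: TTBR0 is used only when the top N bits of the address are all zero. */
  return (va >> (32u - ttbcr_n)) == 0u ? 0 : 1;
}
```

With N set to 0, as in this post, every address goes through TTBR0; with N set to 1, the lower 2 GB would use TTBR0 and the upper 2 GB TTBR1.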

Page table entries setup - For filling up the entries in a page table, we first have to determine the page size for the memory region. In the v7-A architecture, depending upon the size of the region that is addressed by a single page table entry, we have supersections (16 MB regions), sections (1 MB), large pages (64 KB), and small pages (4 KB). We will use 1 MB sections: they are easier to implement than small or large pages, which need second-level tables, and they do not take up as much address space per entry as supersections. The page table descriptor or entry format for a v7-A MMU is -

For our aim of isolating memory we need to focus only on the 'Section base address' bits, the AP (access-permission) bits, and bits[1:0], which select the descriptor type. As we discussed above, we need only a single level of translation for sections.
We set bits[1:0] to 0b10 for section translation. The AP[1:0] bits are set to 01 for region A (read/write permission) and 11 for region B (read-only permission); in the ARMv7-A short-descriptor format the read-only encodings additionally need AP[2] set, so region B ends up with AP[2:0] = 0b111.
Bits[31:20] of the page table entry provide the section base address, and the offset is provided by the lower 20 bits of the VA (bits[19:0]). The translation flow for a section looks something like this -
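The section translation described above can be sketched in a few lines of C. This models the table walk in software on an ordinary array; the demo_ names are hypothetical, and the sketch assumes the plain section format without the PXN variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECT_TYPE  UINT32_C(0x2)        /* bits[1:0] = 0b10: section */
#define DEMO_SECT_SUPER (UINT32_C(1) << 18)  /* bit[18] set: supersection */
#define DEMO_SECT_AP0   (UINT32_C(1) << 10)  /* AP[0] */
#define DEMO_SECT_BASE  UINT32_C(0xFFF00000) /* bits[31:20]: section base address */
#define DEMO_SECT_OFFS  UINT32_C(0x000FFFFF) /* VA[19:0]: offset inside the section */

/* Build a section descriptor for the 1 MB region that contains pa. */
static uint32_t demo_make_section(uint32_t pa, uint32_t ap_bits)
{
  return (pa & DEMO_SECT_BASE) | ap_bits | DEMO_SECT_TYPE;
}

/* Walk a 4096-entry first-level table; returns 0 on success, -1 on a fault. */
static int demo_translate_section(const uint32_t *ttb, uint32_t va, uint32_t *pa)
{
  uint32_t desc;

  assert(ttb != NULL && pa != NULL);
  desc = ttb[va >> 20]; /* VA[31:20] selects the first-level entry */
  if ((desc & 0x3u) != DEMO_SECT_TYPE || (desc & DEMO_SECT_SUPER) != 0u) {
    return -1; /* not a section descriptor */
  }
  *pa = (desc & DEMO_SECT_BASE) | (va & DEMO_SECT_OFFS);
  return 0;
}
```

Mapping virtual section 0x123 to physical base 0x40000000 turns the address 0x12345678 into 0x40045678: the top 12 bits come from the descriptor and the lower 20 bits pass through unchanged.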

Memory isolation on RTEMS -

We need to understand some important ARM v7-A MMU implementation related concepts in RTEMS to isolate memory blocks -

MMU initialization - The MMU initialization code is almost the same for all ARMv7-A supporting BSPs in RTEMS. The header bsps/arm/include/bsp/arm-cp15-start.h has arm_cp15_start_setup_mmu_and_cache() and arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache(), which start the MMU and set up the page tables for the specified memory regions.
Specifying memory regions - Each BSP has its own specified address space where various data (.bss, .text, etc.) should be placed. These regions also need their own access permissions, which are specified by the mmu_config_table[]. In our case, we will be using the Zynq BSP.

The ARMV7_CP15_START_DEFAULT_SECTIONS macro has the address space details of the default sections like .bss, .text, and .rodata.

This table is passed into the arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache() which sets up the translation table entry for the specified memory regions.
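To show what setting up the translation table from such a config table amounts to, here is a simplified software model. The demo_section_config type and demo_fill_translation_table() are hypothetical stand-ins written for this post's explanation; they are not the RTEMS functions, and the sketch assumes an identity mapping with 1 MB sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_L1_ENTRIES 4096u

/* Hypothetical config entry: a half-open address range plus descriptor flags. */
typedef struct {
  uint32_t begin; /* first address of the region */
  uint32_t end;   /* one past the last address of the region */
  uint32_t flags; /* descriptor type and access-permission bits */
} demo_section_config;

/* Fill an identity-mapped first-level table from the config entries. */
static void demo_fill_translation_table(
  uint32_t                  *ttb,
  const demo_section_config *cfg,
  size_t                     count
)
{
  size_t c;
  uint32_t i;

  assert(ttb != NULL && (cfg != NULL || count == 0));

  for (i = 0; i < DEMO_L1_ENTRIES; ++i) {
    ttb[i] = 0; /* bits[1:0] = 0b00: any access faults */
  }
  for (c = 0; c < count; ++c) {
    uint32_t first;
    uint32_t last;

    if (cfg[c].end <= cfg[c].begin) {
      continue; /* empty region */
    }
    first = cfg[c].begin >> 20;
    last = (cfg[c].end - 1u) >> 20; /* last 1 MB section touched by the region */
    for (i = first; i <= last; ++i) {
      ttb[i] = (i << 20) | cfg[c].flags; /* virtual address == physical address */
    }
  }
}
```

Because permissions are set per 1 MB section here, a region that covers even a single byte of a section claims the whole section, so regions with different permissions must start on 1 MB boundaries.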

Linker defines - Now we have defined the memory attributes of the various address spaces, but how do we actually place the default sections into the defined memory regions? This is done with the help of the linker defines and the linker scripts. The linker defines turn the defined memory regions into linker symbols.

The linker script places these symbols into the specified memory regions.

Now we just have to allocate two memory regions, place them in the mmu_config_table with the specified permission and pass them into arm_cp15_start_setup_translation_table_and_enable_mmu_and_cache(). When we try to write to memory region B with the above configuration, we get this -

We first set the permission for region A as read/write and then read-only for region B. We successfully complete the write operation on region A, but as soon as we try to write to region B, the OS throws an exception.

