In the previous post, we discussed a primitive mechanism for isolating thread stacks. That mechanism has some inherent flaws, which this post discusses along with their solutions.
Broadly, there are two flaws with the previous implementation -
Memory entries being set for 1 MB sections - The ARMv7 MMU code that changes the memory entries works on 1 MB sections, and this causes issues. Suppose we have two thread stacks, T1 and T2. If the application writer does not explicitly state the stack size, RTEMS allocates 8K bytes to a stack. On switching from T1 to T2, we set the memory entries of T2 and unset those of T1. The problem is that we are actually unsetting the memory attributes of the entire 1 MB section, which may contain global R/W data that is used by T2. This causes unnecessary fatal exceptions whenever we try to access global data from T2.
Solution - The solution to the problem is pretty simple, but as we will see, the implementation poses some subtle problems. We should set/unset the memory entries for only those regions that contain the thread stack, i.e. if we have stacks of size 8K, then we should set/unset the memory entries of these regions only. This requires finer-grained control: we need multilevel page tables (2 levels for 4K pages). RTEMS, in fact, provides support for 2-level page tables.
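To make the granularity problem concrete, here is a minimal sketch that checks whether two addresses fall under the same translation entry, once for 1 MB sections and once for 4K pages. The helper names and the sample addresses used with it are hypothetical and only serve as illustration; they are not taken from the RTEMS sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_SHIFT    20u /* 1 MB sections: 2^20 bytes per entry */
#define SMALL_PAGE_SHIFT 12u /* 4K small pages: 2^12 bytes per entry */

/* Index of the mapping unit (section or page) that contains addr. */
static uint32_t unit_index(uint32_t addr, unsigned shift)
{
  assert(shift < 32u);
  return addr >> shift;
}

/* True if both addresses are covered by the same translation entry. */
static bool same_entry(uint32_t a, uint32_t b, unsigned shift)
{
  return unit_index(a, shift) == unit_index(b, shift);
}
```

With a stack at 0x0fbf9b70 and some global data at a made-up address such as 0x0fb12340, same_entry() reports true for SECTION_SHIFT and false for SMALL_PAGE_SHIFT: unsetting the stack's 1 MB section also takes the global data away, while unsetting its 4K pages does not.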
The problem lies in the fact that for the Xilinx Zynq BSP, the translation table base is set at 0x100000 by the linker script and extends up to 0x104000 for section-based pages (16K in size). For small pages, however, it extends up to 0x504000 (4 MB + 16 KB in size). This can conflict with other data regions (.text, .bss, etc.) that are placed in this address space, in which case setting up the translation table for small pages fails. This is a BSP-specific problem and depends on how the linker script sets up the address space of a particular BSP. We will thus have to change the linker script to place the translation table in an address range where it does not conflict with other memory regions. We can borrow from the realview_pbx_a9 BSP, which already supports 4K pages, to modify the linker script according to our needs. Here is a snippet -
Tailoring our linker script according to the above snippet solves our problem and now we can set up translation tables for 4K pages. Now we can set/unset memory entries for our thread stacks without worrying about other memory regions, or maybe not 😏?
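The table end addresses quoted earlier for the Zynq BSP can be checked with a little arithmetic. The sketch below assumes the ARMv7 short-descriptor format (4096 first-level entries of 4 bytes each, and 256 second-level entries of 4 bytes per 1 MB) and, as a worst case, one second-level table for every first-level entry, placed directly after the first-level table. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L1_ENTRIES 4096u /* one first-level entry per 1 MB of the 4 GB space */
#define L2_ENTRIES 256u  /* one second-level entry per 4K page in 1 MB */
#define ENTRY_SIZE 4u    /* bytes per descriptor */

/* Size of the first-level table: 16K. */
static uint32_t l1_table_size(void)
{
  return L1_ENTRIES * ENTRY_SIZE;
}

/* Size of all second-level tables if every section is split: 4 MB. */
static uint32_t l2_tables_size(void)
{
  return L1_ENTRIES * L2_ENTRIES * ENTRY_SIZE;
}

/* End address of the translation table area starting at base. */
static uint32_t tt_end(uint32_t base, bool small_pages)
{
  /* A full first-level table has to be 16K-aligned. */
  assert((base & (l1_table_size() - 1u)) == 0u);
  return base + l1_table_size() + (small_pages ? l2_tables_size() : 0u);
}
```

For a base of 0x100000, tt_end() gives 0x104000 with sections only and 0x504000 with small pages, i.e. the table area grows from 16K to 4 MB + 16 KB.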
Allocated stacks are not page-aligned - As discussed in the previous post, we use a custom stack allocator, defined by the application, to allocate thread stacks from the workspace and set the memory entries of the stack. The stacks allocated from the workspace are not page-aligned with respect to 4K pages. In practice, this means that the stack address is, for example, 0xfbf9b70 instead of 0xfbf9000. How is this a problem for us?
When we set the memory entries for 4K pages, the entries are set per page, i.e. we have entry E1 for 0xfbf9000-..a000 and entry E2 for 0xfbfa000-..b000. Now, when we get stack addresses from the workspace, it is possible that stack S1 ranges from 0xfbf9b70 to 0xfbfbb70 (8K size) and S2 ranges from 0xfbf7b60 to 0xfbf9b60. So when we unset the memory entries of S2 (which ends at 0xfbf9b60) during a context switch and set the entries of S1 (which begins at 0xfbf9b70), we end up setting the memory entry for the entire page 0xfbf9000-..a000 (as entries are set per page). This leaves the part of stack S2 that lies in that page still mapped in, and we do not achieve perfect stack isolation.
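The overlap between S1 and S2 can be checked mechanically. Here is a minimal sketch that rounds each stack range out to the 4K pages it touches and tests whether two stacks share a page. The stack_range type and the helper names are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_PAGE_SIZE 0x1000u /* 4K small pages */

/* Half-open address range [begin, end) occupied by a stack. */
typedef struct {
  uint32_t begin;
  uint32_t end;
} stack_range;

/* Base address of the first page touched by the stack. */
static uint32_t first_page(stack_range s)
{
  return s.begin & ~(STACK_PAGE_SIZE - 1u);
}

/* Base address of the last page touched by the stack. */
static uint32_t last_page(stack_range s)
{
  assert(s.end > s.begin);
  return (s.end - 1u) & ~(STACK_PAGE_SIZE - 1u);
}

/* True if at least one page holds bytes of both stacks. */
static bool share_a_page(stack_range a, stack_range b)
{
  return first_page(a) <= last_page(b) && first_page(b) <= last_page(a);
}
```

For S1 = [0xfbf9b70, 0xfbfbb70) and S2 = [0xfbf7b60, 0xfbf9b60), the last page of S2 and the first page of S1 are both 0xfbf9000, so share_a_page() returns true. For page-aligned stacks of the same size, such as [0xfbf9000, 0xfbfb000) and [0xfbf7000, 0xfbf9000), it returns false.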
Solution - Since the memory entries are set per page, if we allocate page-aligned stacks we can set/unset the memory entries of exactly the required region. In RTEMS we can allocate memory aligned to a given byte boundary using _Heap_Allocate_aligned_with_boundary(). We set the alignment to 4096, as we want a 4K-aligned address. Note that this allocation is done in the custom stack allocator.
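As a rough illustration of what the custom stack allocator has to guarantee, here is a portable sketch that uses the C11 aligned_alloc() as a stand-in for the RTEMS heap call; it is not the actual RTEMS implementation. The names allocate_stack(), free_stack() and round_up_to_page() are hypothetical. Rounding the size up to a whole number of pages is an extra assumption made here, so that the last page of one stack is never shared with whatever is allocated next.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_ALIGNMENT 4096u /* 4K small pages */

/* Round size up to the next multiple of the page size. */
static size_t round_up_to_page(size_t size)
{
  return (size + STACK_ALIGNMENT - 1u) & ~((size_t) STACK_ALIGNMENT - 1u);
}

/* Stand-in for the custom stack allocator: returns a 4K-aligned block. */
static void *allocate_stack(size_t size)
{
  assert(size > 0u);
  return aligned_alloc(STACK_ALIGNMENT, round_up_to_page(size));
}

/* Counterpart that releases a stack obtained from allocate_stack(). */
static void free_stack(void *stack)
{
  free(stack);
}
```

A stack obtained this way starts on a page boundary and spans whole pages, so setting or unsetting its entries touches no page that belongs to another stack.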