Few developers write assembly code because it can be a daunting task, but those who do tend to enjoy it. Assembly language is a low-level programming language that maps closely to the machine code a processor executes directly. To assemble a program, we write the instructions in assembly form and, following the processor manual, encode each instruction into bytes of binary data. Disassembly is the opposite process: the bytes are parsed back into assembly instructions.
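To make the encoding step concrete, here is a minimal sketch that hand-encodes a single 32-bit ARM instruction, mov r1, #200, the way an assembler would. The helper name encode_mov_imm is made up for this illustration, and it covers only the simplest case (condition "always", an 8-bit immediate, no rotation); a real assembler handles far more.

```python
import struct

def encode_mov_imm(rd, imm8):
    """Encode ARM (A32) `mov rd, #imm8` for the simplest case only.

    Hypothetical helper for illustration: condition AL, immediate operand,
    no rotation, flags not updated.
    """
    if not 0 <= rd <= 15:
        raise ValueError("rd must be a register number from 0 to 15")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("this sketch only supports 8-bit immediates")
    cond = 0xE    # bits 31-28: AL (always execute)
    opcode = 0xD  # bits 24-21: MOV in the data-processing format
    word = (cond << 28) | (1 << 25) | (opcode << 21) | (rd << 12) | imm8
    # ARM code in this tutorial is little-endian, so pack the word that way
    return struct.pack("<I", word)

encoded = encode_mov_imm(1, 200)
print(encoded.hex())  # c810a0e3
```

The four bytes c8 10 a0 e3 are the little-endian form of the 32-bit word 0xe3a010c8.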
There are many processor architectures, each with its own instruction set, and a processor can execute only instructions from its own set. To run code built for one architecture on another, you need an emulator that interprets or translates that code so the host machine can execute it. Reverse engineering and testing devices such as routers often require assembling, disassembling, and emulating code for different architectures.
In this Python tutorial, we will use the Keystone Engine, Capstone Engine, and Unicorn Engine Python frameworks to assemble, disassemble, and emulate ARM assembly code. These three frameworks can handle different processor architectures, including x86, MIPS, SPARC, and many more.
Install Libraries
The Keystone Python package is a multi-architecture assembler framework. To install it, use the following pip command:
pip install keystone-engine
The capstone Python package is a disassembly engine, and it can be installed using the following pip command:
pip install capstone
The unicorn Python package is a multi-architecture CPU emulator framework that is compatible with the keystone and capstone frameworks. Install it with the following pip command:
pip install unicorn
Assembling ARM
As discussed above, this tutorial uses ARM assembly code. The example we will assemble is a short ARM program that adds two numbers.
Import modules
Let's start by importing the required modules.
# unicorn module to emulate ARM code
from unicorn import Uc, UC_ARCH_ARM, UC_MODE_ARM, UcError
# Access Register R0, R1 and R2 for ARM
from unicorn.arm_const import UC_ARM_REG_R0, UC_ARM_REG_R1, UC_ARM_REG_R2
# Keystone module to assemble ARM code
from keystone import Ks, KS_ARCH_ARM, KS_MODE_ARM, KsError
Now let's write the ARM code that adds the two numbers held in the r1 and r2 registers and stores the result in the r0 register.
ARM_CODE = """
mov r1, 200 // Move the number 200 into register r1
mov r2, 40 // Move the number 40 into register r2
add r0, r1, r2 // Add r1 and r2 and store the result in r0
"""
Now let's assemble the above ARM code to bytecode using keystone module methods.
print("Assembling Process begin/....")
try:
    # Initialize the Ks object for the ARM architecture
    ks_obj = Ks(KS_ARCH_ARM, KS_MODE_ARM)
    # Assemble the above ARM_CODE
    arm_arr_int_bytes, number_of_instructions = ks_obj.asm(ARM_CODE)
    # Convert the list of integer byte values to a bytes object
    arm_bytecode = bytes(arm_arr_int_bytes)
    print("Assembling Process complete successfully")
    print("Total Number of instructions:", number_of_instructions)
    print("The ARM bytecode is:", arm_bytecode, "\n")
except KsError as error_msg:
    print("Assembling Process failed")
    print(f"KS Error: {error_msg}")
    exit("Check the ARM code again")
The keystone asm() method assembles the ARM code and returns a list of integer byte values, arm_arr_int_bytes, along with the number of statements it assembled, number_of_instructions. Now that we have the bytecode for our ARM code in arm_bytecode, we can emulate its execution on an ARM processor with the Python unicorn emulator methods.
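The conversion from the integer list to a bytes object can be tried on its own, without Keystone installed. In the sketch below, the list literal is an assumption: it is reconstructed from the bytecode shown in this tutorial's output, not captured from a live asm() call.

```python
# Integer byte values, reconstructed from the bytecode printed in this
# tutorial (assumed here, not taken from a live asm() call)
arm_arr_int_bytes = [200, 16, 160, 227, 40, 32, 160, 227, 2, 0, 129, 224]

# bytes() accepts any iterable of integers in the range 0-255
arm_bytecode = bytes(arm_arr_int_bytes)
print(arm_bytecode)

# Group the hex dump into 4-byte chunks (the sep arguments need Python 3.8+)
grouped_hex = arm_bytecode.hex(" ", 4)
print(grouped_hex)

# Instructions in ARM mode are a fixed 4 bytes long
encoded_instruction_count = len(arm_bytecode) // 4
print(encoded_instruction_count)
```

Because every instruction in ARM mode occupies 4 bytes, the 12 bytes here hold three encoded instructions.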
# Initial memory address for emulation
Initial_ADDRESS = 0x1000000
print("Emulating Process begin")
try:
    # Initialize the Unicorn emulator for ARM architecture
    uc_em = Uc(UC_ARCH_ARM, UC_MODE_ARM)
    # Map 1 MB of memory for emulation (address and size must be 4 KB-aligned)
    uc_em.mem_map(Initial_ADDRESS, 1024 * 1024)
    # Write the ARM bytecode at the initial address
    uc_em.mem_write(Initial_ADDRESS, arm_bytecode)
    # Run the emulation from the first byte to the end of the bytecode
    uc_em.emu_start(Initial_ADDRESS, Initial_ADDRESS + len(arm_bytecode))
    print("Emulation Process Completed")
    # Read the R0 register
    r0 = uc_em.reg_read(UC_ARM_REG_R0)
    print("The Value stored in the R0(r1+r2) register is:", r0)
except UcError as error_msg:
    print(f"Emulating Process failed {error_msg}")
The above code emulates the ARM bytecode within 1 MB of mapped memory. When you execute the above code, you will see the following output.
Output
Assembling Process begin/....
Assembling Process complete successfully
Total Number of instructions: 5
The ARM bytecode is: b'\xc8\x10\xa0\xe3( \xa0\xe3\x02\x00\x81\xe0'
Emulating Process begin
Emulation Process Completed
The Value stored in the R0(r1+r2) register is: 240
From the above output, you can see that the r0 register stores the value 240, which is the sum of r1 (200) and r2 (40).
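To see conceptually what an emulator does with these bytes, the following sketch runs the same 12 bytes through a tiny fetch-decode-execute loop in plain Python. It is only an illustration and not how Unicorn is implemented: the run function is hypothetical, ignores condition codes, and understands just the two instruction forms used in this tutorial (mov with an 8-bit immediate and add with a plain register operand).

```python
import struct

CODE = b'\xc8\x10\xa0\xe3( \xa0\xe3\x02\x00\x81\xe0'

def run(code):
    """Toy interpreter for the mov/add subset used in this tutorial."""
    regs = [0] * 16  # r0-r15, all starting at zero
    # Fetch: each instruction is one little-endian 32-bit word
    for (word,) in struct.iter_unpack("<I", code):
        # Decode the fields of the data-processing format
        is_immediate = (word >> 25) & 1
        opcode = (word >> 21) & 0xF
        rn = (word >> 16) & 0xF
        rd = (word >> 12) & 0xF
        if is_immediate:
            if (word >> 8) & 0xF:
                raise NotImplementedError("rotated immediates")
            operand2 = word & 0xFF
        else:
            if (word >> 4) & 0xFF:
                raise NotImplementedError("shifted register operands")
            operand2 = regs[word & 0xF]
        # Execute
        if opcode == 0xD:    # MOV
            regs[rd] = operand2
        elif opcode == 0x4:  # ADD, wrapping at 32 bits
            regs[rd] = (regs[rn] + operand2) & 0xFFFFFFFF
        else:
            raise NotImplementedError(f"opcode {opcode:#x}")
    return regs

registers = run(CODE)
print("r0 =", registers[0])  # r0 = 240
```

A real emulator such as Unicorn also models memory, flags, the program counter, and the full instruction set, which is why the tutorial delegates this work to the framework.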
Disassembling in Python
So far, we have only discussed how to use the Python keystone and unicorn frameworks to assemble ARM code and emulate it. Now let's disassemble the ARM bytecode back into assembly code using the Python capstone framework. In the assembling example above, we assembled ARM code that adds two numbers into bytecode; we will now take that same bytecode and parse it back into ARM assembly instructions.
The ARM bytecode is: b'\xc8\x10\xa0\xe3( \xa0\xe3\x02\x00\x81\xe0'
# Python program to disassemble the ARM bytecode
from capstone import Cs, CS_ARCH_ARM, CS_MODE_ARM
CODE = b'\xc8\x10\xa0\xe3( \xa0\xe3\x02\x00\x81\xe0'
# Initialize the capstone object for the ARM architecture and mode
md = Cs(CS_ARCH_ARM, CS_MODE_ARM)
# Disassemble CODE, treating 0x1000 as the address of the first instruction
for instruction in md.disasm(CODE, 0x1000):
    print(instruction.mnemonic, instruction.op_str)
Output
mov r1, #0xc8
mov r2, #0x28
add r0, r1, r2
In the above code, we first initialize the capstone object md with Cs(CS_ARCH_ARM, CS_MODE_ARM), which sets up the disassembler for the ARM architecture and mode. Then the md.disasm(CODE, 0x1000) call disassembles the ARM bytecode, treating 0x1000 as the address of the first instruction, and we print each instruction. The output shows the same ARM instructions that add the values in the r1 and r2 registers and store the result in the r0 register. The immediates are printed in hexadecimal, so 0xc8 is 200 and 0x28 is 40.
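For a sense of the work Capstone does for us, here is a minimal hand-written decoder for the same bytes. It is a sketch under strong assumptions: the disassemble function and MNEMONICS table are hypothetical, they recognize only the mov and add forms used in this tutorial, and the base address argument plays the same role as the 0x1000 passed to the disassembler above.

```python
import struct

CODE = b'\xc8\x10\xa0\xe3( \xa0\xe3\x02\x00\x81\xe0'

# Only the two data-processing opcodes that appear in this tutorial
MNEMONICS = {0x4: "add", 0xD: "mov"}

def disassemble(code, base=0x1000):
    """Toy disassembler: returns one 'address: text' line per 4-byte word."""
    lines = []
    for index, (word,) in enumerate(struct.iter_unpack("<I", code)):
        opcode = (word >> 21) & 0xF
        rn = (word >> 16) & 0xF
        rd = (word >> 12) & 0xF
        if opcode not in MNEMONICS:
            raise NotImplementedError(f"opcode {opcode:#x}")
        # Bit 25 selects an immediate operand; otherwise it is a register
        if (word >> 25) & 1:
            operand2 = f"#{word & 0xFF:#x}"
        else:
            operand2 = f"r{word & 0xF}"
        if MNEMONICS[opcode] == "mov":
            text = f"mov r{rd}, {operand2}"
        else:
            text = f"{MNEMONICS[opcode]} r{rd}, r{rn}, {operand2}"
        # Each instruction sits 4 bytes after the previous one
        lines.append(f"{base + 4 * index:#x}: {text}")
    return lines

listing = disassemble(CODE)
for line in listing:
    print(line)
```

The instruction text matches the output shown earlier, with each line prefixed by the address that instruction would have if the code were loaded at 0x1000.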
Conclusion
In this Python tutorial, we walked through assembling, disassembling, and emulating ARM assembly code in Python. With the Keystone, Capstone, and Unicorn engine Python frameworks, you can also work with other processor architectures and their instruction sets.