Showing posts with label dex. Show all posts

Wednesday, July 1, 2009

New DEX assembler/disassembler pair

Dedexer has competition! Not one but two new tools at once: smali and baksmali are a DEX assembler/disassembler pair. The programs are brand new and, like Dedexer, they are based on a Jasmin-like format. I tried them only briefly. The disassembler, baksmali, worked correctly and indeed produced output similar to dedexer's, but I could not compile baksmali's output back to DEX format with the assembler, smali. The disassembler does not handle ODEX files either.

The initiative is valuable nonetheless, and I encourage every reverse engineer out there to give the new tools a try.

Tuesday, February 3, 2009

Optimized DEX files

One of the less publicized abilities of the Dalvik virtual machine is that it can execute "optimized" DEX files (ODEX files). "Optimizing" a DEX file speeds up its execution but also ties it to the hardware platform on which the optimization was performed. ODEX optimization relies on unsafe bytecode instructions. Unsafe instructions are much faster to execute; on the other hand, malicious use of these instructions can crash the virtual machine. Dalvik has an additional layer of security in the process separation and file permissions of the underlying Linux platform, so this price is not that high. It is important to note, however, that much of the security normally associated with the Java virtual machine, e.g. in J2ME, is now provided by the underlying Linux platform.

I remember the resistance when I proposed unsafe instructions for garbage collection performance optimization on mobile platforms. The critics argued that 1. Java bytecode must not be modified and 2. bytecode safety must be preserved at all costs. Seemingly, the Dalvik designers have done away with both orthodoxies.

The optimization is carried out on the target platform, by the target platform's virtual machine. The command to invoke the optimizer is called dexopt. There is no point in invoking this command on the emulator as it would create an optimized DEX that can be executed only on the emulator. The most important transformations are the replacement of the method indexes with vtable indexes and the field indexes with memory offsets. So instead of invoking a method by name, it is invoked by vtable index using a set of special instructions not specified in the Dalvik instruction set.

For example:

invoke-virtual-quick {v1,v2},vtable #0x3b

(Note: if anyone reading this knows how the vtable index is computed, please drop me a comment or mail :-))

Also, for fields:

iget-object-quick v0,v2,[obj+0x100]
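
To make the idea of replacing a method reference with a vtable index more concrete, here is a minimal sketch of the textbook single-inheritance vtable layout: inherited slots come first, an override reuses the slot of the method it overrides, and new methods are appended. This is only an assumption about the general technique, not a confirmed description of how dexopt lays out its tables. The names `build_vtable` and `quicken` and the classes `Base` and `Derived` are hypothetical.

```python
def build_vtable(super_vtable, declared_methods):
    """Build a toy vtable as a list of (method_name, implementing_class).

    Assumed layout (generic textbook model, not verified against Dalvik):
    inherited slots keep their index, overrides replace the inherited
    entry in place, and new methods are appended at the end.
    """
    vtable = list(super_vtable)
    slot_of = {name: index for index, (name, _) in enumerate(vtable)}
    for name, owner in declared_methods:
        if name in slot_of:
            vtable[slot_of[name]] = (name, owner)  # override keeps the slot
        else:
            slot_of[name] = len(vtable)            # new method gets a new slot
            vtable.append((name, owner))
    return vtable


def quicken(vtable, method_name):
    """Replace a symbolic method name with its vtable index."""
    for index, (name, _) in enumerate(vtable):
        if name == method_name:
            return index
    raise KeyError(method_name)


# Hypothetical class hierarchy used only for illustration.
base = build_vtable([], [("toString", "Base"), ("run", "Base")])
derived = build_vtable(base, [("run", "Derived"), ("stop", "Derived")])

index = quicken(derived, "run")
print(index, derived[index])
```

In this toy model, going from an index back to a method name requires the vtable of the receiver's class, and therefore its whole superclass chain. That would be consistent with the difficulty of naming the target of a "quick" instruction from the ODEX file alone.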

The dedexer tool was updated and now provides limited support for ODEX files. Sadly, I was not able to figure out how to calculate method or field names from vtable indexes and byte offsets, which clearly limits the usefulness of the disassembled ODEX sources. While I was at it, I also implemented debug information processing that generates line number and local variable information like this:

.var 0 is intent Landroid/content/Intent; from l144ac to l144c8
const-class v3,com/android/vending/SubCategoryListActivity
const-string v2,"android.intent.action.VIEW"
.line 180

The Dalvik opcode list was also updated with the description of the "quick" instructions.

Friday, January 9, 2009

Disassembling DEX files

One of the most remarkable features of the Dalvik virtual machine (the workhorse under the Android system) is that it does not use Java bytecode. Instead, a homegrown format called DEX was introduced, and not even the bytecode instructions are the same as Java bytecode instructions. There was some discussion about whether this makes Dalvik a Java virtual machine at all. My personal opinion is that this is a religious and legal dispute. Dalvik opcodes are clearly designed to support only the Java language. Compiling programs written in a language other than Java to Dalvik bytecode is certainly possible, as has been demonstrated with Java bytecode, but neither Java bytecode nor Dalvik bytecode makes any effort to support any language other than Java. This is in contrast with the .Net virtual machine, where at least a claim has been made that the VM supports multiple languages, even though there are always limitations in any virtual machine that prevent running a particular language on it.

Android comes with a disassembler called dexdump. The location of this tool is not intuitive: it runs on the Linux platform that hosts Android. Launch the emulator and issue the following commands:

adb shell
dexdump

In order to use the tool, one has to move the DEX file to the Android platform (e.g. adb push in case of the emulator). Then one can say:

dexdump -d classes.dex

The output of this tool is not very easy to use, however. Take, for example, its output for the bytecode compiled from a switch statement:


000418: 2b02 0c00 0000 |0000: packed-switch v2, 0000000c // +0000000c
00041e: 12f0 |0003: const/4 v0, #int -1 // #ff
000420: 0f00 |0004: return v0
000422: 1220 |0005: const/4 v0, #int 2 // #2
000424: 28fe |0006: goto 0004 // -0002
000426: 1250 |0007: const/4 v0, #int 5 // #5
000428: 28fc |0008: goto 0004 // -0004
00042a: 1260 |0009: const/4 v0, #int 6 // #6
00042c: 28fa |000a: goto 0004 // -0006
00042e: 0000 |000b: nop // spacer
000430: 0001 0300 faff ffff 0500 0000 0700 ... |000c: packed-switch-data (10 units)


The jump table used by the packed-switch instruction is not disassembled at all; it is not even dumped entirely. The same problem applies to fill-array-data tables, and there are further restrictions.
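
For readers who want to decode such a jump table by hand, here is a minimal sketch that follows the packed-switch payload layout described in the Dalvik bytecode documentation: a 16-bit ident (0x0100), a 16-bit entry count, a 32-bit first key, then one 32-bit relative target per entry, all little-endian. The visible start of the dump above (00 01 03 00) matches an ident of 0x0100 and a size of 3. The payload in the example is constructed by the code itself and is not the truncated dump; its first key of 0 and targets of 5, 7 and 9 are assumptions chosen to mirror the branch addresses in the listing. The function name `decode_packed_switch` is hypothetical.

```python
import struct

PACKED_SWITCH_IDENT = 0x0100


def decode_packed_switch(payload, switch_addr):
    """Map each case key to an absolute target address (in 16-bit code units).

    Targets in the payload are relative to the address of the
    packed-switch instruction, so switch_addr is added back.
    """
    ident, size, first_key = struct.unpack_from("<HHi", payload, 0)
    if ident != PACKED_SWITCH_IDENT:
        raise ValueError("not a packed-switch payload: ident=%#06x" % ident)
    targets = struct.unpack_from("<%di" % size, payload, 8)
    return {first_key + i: switch_addr + rel for i, rel in enumerate(targets)}


# Illustrative payload built here, NOT copied from the truncated dump:
# 3 entries, first key 0 (assumed), relative targets 5, 7 and 9.
sample = struct.pack("<HHiiii", PACKED_SWITCH_IDENT, 3, 0, 5, 7, 9)

for key, address in decode_packed_switch(sample, 0).items():
    print("case %d -> %04x" % (key, address))
```

A disassembler can then emit one label per target address, which is essentially what a readable listing needs.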

I decided therefore to create a more comfortable disassembler and here is the first cut.

Access the dedexer project's page on SourceForge.

This tool is easier to use than dexdump for many reasons. For starters, it is a standard Java program that runs on the usual JVMs. Its output format is much more readable and is familiar to those who know the Jasmin syntax. For example, the previous fragment is disassembled like this by dedexer:


.method public calc1(I)I
packed-switch v2,0
ps418_422 ; case 0
ps418_426 ; case 1
ps418_42a ; case 2
default: ps418_default
ps418_default:
const/4 v0,15
l420:
return v0
ps418_422:
const/4 v0,2
goto l420
ps418_426:
const/4 v0,5
goto l420
ps418_42a:
const/4 v0,6
goto l420
nop
.end method
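
One detail worth noting when comparing the two listings: dexdump renders the default-branch constant as -1, while the dedexer fragment shows 15. The const/4 instruction (opcode 0x12) packs a 4-bit destination register and a 4-bit signed literal into its second byte, so the nibble 0xf reads as -1 when sign-extended and as 15 when taken as unsigned. The sketch below decodes the four const/4 units from the dexdump listing under that format. The function name `decode_const4` is hypothetical.

```python
CONST_4_OPCODE = 0x12


def decode_const4(unit):
    """Decode one const/4 code unit given as two bytes in file order.

    Byte 0 is the opcode; byte 1 holds the register in its low nibble
    and a signed 4-bit literal in its high nibble.
    """
    opcode, operands = unit[0], unit[1]
    if opcode != CONST_4_OPCODE:
        raise ValueError("not a const/4 instruction: opcode=%#04x" % opcode)
    register = operands & 0x0F
    literal = operands >> 4
    if literal >= 8:       # sign-extend the 4-bit value
        literal -= 16
    return register, literal


# The four const/4 units that appear in the dexdump listing above.
for hex_unit in ("12f0", "1220", "1250", "1260"):
    register, literal = decode_const4(bytes.fromhex(hex_unit))
    print("%s: const/4 v%d, %d" % (hex_unit, register, literal))
```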


In addition, an individual file is created for each class, along with a directory structure that mirrors the package structure.

This is not a full decompiler, however. One has to know the Dalvik opcodes in order to work with the tool. This opcode list has been extended and maintained as dedexer was developed and is now in sync with the disassembler. You will see some unknown opcodes in the list. I have not encountered those instructions "out in the wild" and the disassembler does not recognize them either. If you see any of those, send me the DEX file so that I can analyse it!

This is a simple tool and is not without limitations. The most painful one is that the tool does not process the debug and annotation information in the DEX file. The array data dump could also be better. I am sure that the feature most people would like to see is a bridge toward Java class files, but that is far away. Jasmin will be able to generate Java class files once the backward conversion from Dalvik opcodes to Java bytecode is provided, but that's a complex task, so don't hold your breath. The release condition I set for myself was that the tool be able to disassemble the DEX file in framework.jar. It is able to, so I guess the tool may be of use to others too. Enjoy!

Saturday, December 13, 2008

The Dalvik opcodes

I wanted to continue with my adventures with the Android test framework but I ran into some trouble. With pre-1.0 SDKs my solution in these cases was simple: take apart the SDK's android.jar and decompile the relevant classes. In the 1.0 SDK, however, all the classes in android.jar are just stubs, at least in the version on the PC filesystem. The real classes are in DEX format, on the emulated device's file system.

That's sad news because the DEX format is not particularly well documented. More exactly: undocumented. There are some descriptions floating on the Internet but they are obsolete and inaccurate. Conveniently, the dx tool in Android SDK has some less used options that effectively document this format.

Dx is the utility that turns Java class files into DEX files. Every Android developer uses it regularly, although not everybody may be aware of its existence because the tool is invoked by automatically generated make/ant files. Dx has an option that dumps the content of the DEX file in human-readable format while generating the DEX file. This is the batch script I use to get that dump:

set BASEDIR=
javac %1\*.java
dx --dex --verbose --verbose-dump --dump-to=%BASEDIR%\%1\dexdump.txt --output=%BASEDIR%\%1\classes.dex %BASEDIR%\%1

Put your Java files into a subdirectory (e.g. test1) and invoke the batch script with the name of the directory. Besides the familiar classes.dex, dexdump.txt will be generated. This dump file is so verbose that reverse engineering the DEX file format becomes a feasible project.

Using some test Java classes, the official Android opcode list and a lot of time, my first step was to document the Android opcodes. This is the bytecode the Dalvik virtual machine uses instead of Java bytecode. If you are familiar with Java bytecode, you will see that the opcode set is pretty similar. A significant difference is that the Dalvik opcodes are register-based while Java bytecode is stack-based.
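
The difference between the two models can be illustrated with a toy sketch that adds two integers on a stack machine and on a register machine. The mnemonics are borrowed from the real instruction sets (iload, iadd, ireturn on the Java side; add-int and return on the Dalvik side), but both interpreters are heavily simplified assumptions for illustration and do not reflect real encodings or semantics in full. The function names `run_stack` and `run_register` are hypothetical.

```python
def run_stack(code, local_vars):
    """Toy stack machine: operands are pushed, operated on, and popped."""
    stack = []
    for op, *args in code:
        if op == "iload":
            stack.append(local_vars[args[0]])
        elif op == "iadd":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "ireturn":
            return stack.pop()
    raise ValueError("code fell off the end without returning")


def run_register(code, registers):
    """Toy register machine: each instruction names its operands directly."""
    regs = list(registers)
    for op, *args in code:
        if op == "add-int":
            dest, left, right = args
            regs[dest] = regs[left] + regs[right]
        elif op == "return":
            return regs[args[0]]
    raise ValueError("code fell off the end without returning")


# Stack style: four instructions, operands travel through the stack.
stack_code = [("iload", 0), ("iload", 1), ("iadd",), ("ireturn",)]
# Register style: two instructions, operands are addressed as v1 and v2.
register_code = [("add-int", 0, 1, 2), ("return", 0)]

print(run_stack(stack_code, [2, 3]))
print(run_register(register_code, [0, 2, 3]))
```

The register version needs fewer instructions for the same computation, at the cost of each instruction carrying its operand addresses.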

Click here to access the Dalvik opcode list.

The next step will be to put together a DEX disassembler. That will take some time, see you in 2009 with that!