Saturday, December 13, 2008

The Dalvik opcodes

I wanted to continue my adventures with the Android test framework, but I ran into some trouble. With pre-1.0 SDKs, my solution in these cases was simple: take apart the SDK's android.jar and decompile the relevant classes. In the 1.0 SDK, however, all the classes in android.jar are just stubs, at least in the version on the PC filesystem. The real classes are in DEX format, on the emulated device's filesystem.

That's sad news, because the DEX format is not particularly well documented. More exactly: it is undocumented. There are some descriptions floating around the Internet, but they are obsolete and inaccurate. Conveniently, the dx tool in the Android SDK has some lesser-used options that effectively document this format.

Dx is the utility that turns Java class files into DEX files. Every Android developer uses it regularly, although not everybody may be aware of it, because the tool is invoked by automatically generated make/Ant files. Dx has an option that dumps the content of the DEX file in human-readable format while generating it. This is the batch script I use to get that dump:

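rem Run from the directory that contains the test subdirectory; %CD% is its absolute path
set BASEDIR=%CD%
rem javac writes the .class files next to the sources in %1
javac %1\*.java
rem Build %1\classes.dex and write the human-readable dump to %1\dexdump.txt
dx --dex --verbose --verbose-dump --dump-to=%BASEDIR%\%1\dexdump.txt --output=%BASEDIR%\%1\classes.dex %BASEDIR%\%1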

Put your Java files into a subdirectory (e.g. test1) and invoke the batch script with the name of that directory. Besides the familiar classes.dex, a dexdump.txt file will be generated. This dump is so verbose that reverse engineering the DEX file format becomes a feasible project.
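The dump is the easiest way in, but the raw classes.dex can also be poked at directly. As a starting point, here is a minimal Java sketch that reads the fixed 0x70-byte header and verifies its Adler-32 checksum. The field offsets are an assumption taken from the DEX format description that ships in dalvik/docs of the Android sources, not something derived from the dump; the class name DexHeader and the printed layout are made up for illustration, and only a few header fields are read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Adler32;

// Hypothetical helper: reads a subset of the DEX header (layout assumed from dalvik/docs).
public class DexHeader {
    static final int HEADER_SIZE = 0x70;           // fixed header length in bytes
    static final int ENDIAN_CONSTANT = 0x12345678; // endian_tag of a little-endian file

    public final String magic;  // 8 bytes, e.g. "dex\n035\0"
    public final long checksum; // Adler-32 over everything after the checksum field
    public final int fileSize, headerSize, endianTag;
    public final int stringIdsSize, typeIdsSize, methodIdsSize, classDefsSize;

    public DexHeader(byte[] data) {
        if (data.length < HEADER_SIZE) {
            throw new IllegalArgumentException("file too short for a DEX header");
        }
        // Multi-byte header fields are little-endian unsigned 32-bit values.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        magic = new String(data, 0, 8, StandardCharsets.ISO_8859_1);
        checksum = buf.getInt(8) & 0xFFFFFFFFL;
        // Offsets 12..31 hold the 20-byte SHA-1 signature; skipped here.
        fileSize = buf.getInt(32);
        headerSize = buf.getInt(36);
        endianTag = buf.getInt(40);
        stringIdsSize = buf.getInt(56);
        typeIdsSize = buf.getInt(64);
        methodIdsSize = buf.getInt(88);
        classDefsSize = buf.getInt(96);
    }

    // Recomputes Adler-32 over bytes 12..end and compares it with the stored value.
    public boolean checksumMatches(byte[] data) {
        Adler32 adler = new Adler32();
        adler.update(data, 12, data.length - 12);
        return adler.getValue() == checksum;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "classes.dex"));
        DexHeader h = new DexHeader(data);
        System.out.println("magic:       " + h.magic.replace("\n", "\\n").replace("\0", "\\0"));
        System.out.println("checksum ok: " + h.checksumMatches(data));
        System.out.println("file size:   " + h.fileSize + " (actual " + data.length + ")");
        System.out.println("endian tag:  " + (h.endianTag == ENDIAN_CONSTANT ? "little-endian" : "unexpected"));
        System.out.println("strings:     " + h.stringIdsSize);
        System.out.println("types:       " + h.typeIdsSize);
        System.out.println("methods:     " + h.methodIdsSize);
        System.out.println("classes:     " + h.classDefsSize);
    }
}
```

Running it as java DexHeader test1\classes.dex after the batch script gives a quick sanity check of the file before digging into the dump; the counts it prints can be compared with the string, type, method and class sections listed in dexdump.txt.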

Using some test Java classes, the official Android opcode list and a lot of time, my first step was to document the Android opcodes. This is the bytecode that the Dalvik virtual machine uses instead of Java bytecode. If you are familiar with Java bytecode, you will see that the two opcode sets are pretty similar. The most significant difference is that Dalvik opcodes are register-based, while Java bytecode is stack-based.
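To make the register/stack difference concrete, here is a toy Java illustration (no real bytecode is involved) of how c = a + b is evaluated in each style. The comments name the JVM instructions (iload, iadd, istore) and the Dalvik add-int instruction that each step mirrors; the local-variable and register numbers are arbitrary, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackVsRegister {
    // Stack style: operands travel through an operand stack,
    // mirroring the JVM sequence iload_1, iload_2, iadd, istore_3.
    static int stackMachine(int a, int b) {
        int[] locals = new int[4];
        locals[1] = a;
        locals[2] = b;
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(locals[1]);                 // iload_1: push first operand
        stack.push(locals[2]);                 // iload_2: push second operand
        stack.push(stack.pop() + stack.pop()); // iadd: pop two values, push their sum
        locals[3] = stack.pop();               // istore_3: pop result into a local
        return locals[3];
    }

    // Register style: one instruction names its destination and both sources,
    // mirroring Dalvik's add-int v2, v0, v1.
    static int registerMachine(int a, int b) {
        int[] v = new int[3];
        v[0] = a;
        v[1] = b;
        v[2] = v[0] + v[1]; // add-int v2, v0, v1
        return v[2];
    }

    public static void main(String[] args) {
        System.out.println(stackMachine(2, 3) + " " + registerMachine(2, 3)); // prints "5 5"
    }
}
```

The stack version needs four instructions because every operand has to be pushed and popped; the register version does the same work in one instruction that carries its register numbers as operands, which is why register-based code tends to have fewer, but wider, instructions.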

Click here to access the Dalvik opcode list.

The next step will be to put together a DEX disassembler. That will take some time; see you with it in 2009!
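As a taste of what the core loop of such a disassembler might look like, here is a minimal Java sketch that decodes a handful of Dalvik instructions from an array of 16-bit code units. The opcode values (0x00 nop, 0x01 move, 0x0e return-void, 0x0f return, 0x12 const/4, 0x90 add-int) and their instruction formats are assumptions based on the Dalvik bytecode documentation in the Android sources; the class name TinyDalvikDisasm is hypothetical, and a real disassembler needs the full opcode table to know the width of every instruction.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyDalvikDisasm {
    static List<String> disassemble(int[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0; // index of the current 16-bit code unit
        while (pc < code.length) {
            int unit = code[pc];
            int op = unit & 0xFF;         // opcode: low byte of the first code unit
            int hi = (unit >>> 8) & 0xFF; // high byte: registers or literal, depending on format
            switch (op) {
                case 0x00: // nop (format 10x)
                    out.add("nop");
                    pc += 1;
                    break;
                case 0x01: // move vA, vB (format 12x: B|A|op)
                    out.add("move v" + (hi & 0xF) + ", v" + (hi >>> 4));
                    pc += 1;
                    break;
                case 0x0e: // return-void (format 10x)
                    out.add("return-void");
                    pc += 1;
                    break;
                case 0x0f: // return vAA (format 11x: AA|op)
                    out.add("return v" + hi);
                    pc += 1;
                    break;
                case 0x12: // const/4 vA, #+B (format 11n: B is a signed 4-bit literal)
                    out.add("const/4 v" + (hi & 0xF) + ", #" + (((byte) hi) >> 4));
                    pc += 1;
                    break;
                case 0x90: { // add-int vAA, vBB, vCC (format 23x: AA|op CC|BB)
                    if (pc + 1 >= code.length) {
                        throw new IllegalArgumentException("truncated add-int at " + pc);
                    }
                    int next = code[pc + 1];
                    out.add("add-int v" + hi + ", v" + (next & 0xFF) + ", v" + ((next >>> 8) & 0xFF));
                    pc += 2;
                    break;
                }
                default:
                    // Without a full opcode table the instruction width is unknown, so stop.
                    out.add(String.format("unknown opcode 0x%02x at %d", op, pc));
                    return out;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // const/4 v0, #2; const/4 v1, #3; add-int v2, v0, v1; return v2
        int[] code = {0x2012, 0x3112, 0x0290, 0x0100, 0x020f};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```

For the sample code units in main, this prints const/4 v0, #2, then const/4 v1, #3, then add-int v2, v0, v1 and finally return v2, one per line. Table-driven decoding (opcode to format to mnemonic) would replace the switch once all opcodes are covered.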

13 comments:

Anonymous said...

Gabor, it's cool :) Thank you. I was planning to research the opcodes and the Dalvik format, but I have no time now because I have examinations :-D

Anonymous said...

Hmm, if you need the source for android.jar, why not just get it from http://source.android.com?

That said, we would certainly need a DEX disassembler... for hacking :)

Maybe it is simpler to convert back from dex to java and then use java disassemblers?

Gabor Paller said...

Anonymous said...:
"Maybe it is simpler to convert back from dex to java and then use java disassemblers?"

I had this idea too, and later on we may try it. The conversion between register-based and stack-based code is not trivial, however. It may turn out to be not so complex after all (the register-based code was itself converted from stack-based code, so converting it back is not the general case), but it definitely needs some research.
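A naive translation shows why the easy part is easy: if every Dalvik register is simply mapped to a JVM local variable slot, each register instruction expands into a load/operate/store sequence. The sketch below assumes int-only code and handles just four instructions; the Insn class and the Jasmin-style output strings are hypothetical. What it glosses over is the hard part: Dalvik registers are untyped, so a real converter has to infer whether a register holds an int, a float or an object reference at each point before it can pick iload, fload or aload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegToStack {
    // Hypothetical decoded Dalvik instruction: mnemonic plus register/literal operands.
    static final class Insn {
        final String op;
        final int[] args;

        Insn(String op, int... args) {
            this.op = op;
            this.args = args;
        }
    }

    // Shortest JVM instruction that pushes a small int constant.
    static String pushConst(int value) {
        if (value == -1) return "iconst_m1";
        if (value >= 0 && value <= 5) return "iconst_" + value;
        return "bipush " + value; // covers the rest of const/4's range (-8..7)
    }

    // Naive mapping: Dalvik register vN becomes JVM local variable slot N (ints only).
    static List<String> translate(List<Insn> code) {
        List<String> out = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case "const/4": // vA = #literal
                    out.add(pushConst(insn.args[1]));
                    out.add("istore " + insn.args[0]);
                    break;
                case "move": // vA = vB
                    out.add("iload " + insn.args[1]);
                    out.add("istore " + insn.args[0]);
                    break;
                case "add-int": // vA = vB + vC
                    out.add("iload " + insn.args[1]);
                    out.add("iload " + insn.args[2]);
                    out.add("iadd");
                    out.add("istore " + insn.args[0]);
                    break;
                case "return": // return vA
                    out.add("iload " + insn.args[0]);
                    out.add("ireturn");
                    break;
                default:
                    throw new IllegalArgumentException("not handled in this sketch: " + insn.op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Insn> code = Arrays.asList(
                new Insn("const/4", 0, 2),
                new Insn("const/4", 1, 3),
                new Insn("add-int", 2, 0, 1),
                new Insn("return", 2));
        translate(code).forEach(System.out::println);
    }
}
```

The output is correct but clumsy (every result is stored and immediately reloaded); a decompiler like Jad would still have to cope with that, or a peephole pass could clean it up first.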

Gabor Paller said...

Maxim Yudin said...:
"Thank you. I was planning to research the opcodes and the Dalvik format, but I have no time now because I have examinations :-D"

Excuses, excuses ... :-)

Fuxoft said...

If you download the Android sources from source.android.com, there is rather nice documentation of the opcodes, the DEX file format and other VM internals in dalvik/docs.

Anonymous said...

Have you ever tried to use classes extending LaunchPerformanceBase, such as HelloWorldLaunchPerformance, NotepadLaunchPerformance or ContactLaunchPerformance? I ran into some problems when including ServiceManagerNative.

strazz said...

Interesting stuff :) I've posted some information on my blog about DEX bytecodes and whatnot. I'm actually working on a pseudo-decompiler...

Though dexdump works perfectly as is, the only problem is that it takes a little "guesswork" to properly reverse the dexdumps into actual Java. If you want to chat more about this, hit me up or check out the blog: http://www.strazzere.com/blog/

Gabor Paller said...

Hello, strazzere!

I am making good progress with the disassembler; I see no reason why it wouldn't be ready by early January. My approach is to emit Jasmin-like output, which is easier to read, and one day we can build on Jasmin to create the Java class files. Then a Java decompiler like Jad can finish the job.

For real Java decompilation, the hairiest task I can see is the transformation of register-based DEX bytecode into stack-based standard Java bytecode. This task looks complex, although I have not even started thinking about it.

I also left a comment on your site.

Anonymous said...

Oh my dear! Where can I find javac? I'm really amazed!

Anonymous said...

Hi,

I have this problem: "Conversion to Dalvik format failed with error 1". That is because I have imported some libraries into Android in order to use some of their classes.

I followed your great post to convert these libraries into Dalvik format. I finally obtained a new classes.dex from these libraries, but how can I use the code inside it? How can I import it into Android?

Can you help me?

Thank you so much

Sinisha Djukic said...

Has anyone tried converting Java -> Dalvik bytecode on the fly?

Anonymous said...

Hi,

Check out the following article:

http://xpandroid.blogspot.com/2009/11/how-to-crack-android-build-using.html

It's a good start.

Gabor Paller said...

The key is that you have to sign the app again.

The "hacked" app will carry your signature, which can effectively break some applications.

Here are some reasons why an application can break after having been signed again.