Sunday, October 18, 2009

Help needed

I decided to release a new version of dedexer, but I am not satisfied with it. The Holy Grail I am chasing is high-quality disassembly of ODEX files, and I intended to use the hint I received from Nenik. I extended dedexer with data flow analysis, so it now knows the types held in the Dalvik registers at any point in the execution of Android bytecode. If you ask the new version nicely (with the -r switch), it will even share this information with you. With this switch, a disassembled method looks like this:

.method public <init>(Ljava/lang/String;)V
.limit registers 4
; this: v2 (LLineReader;)
; parameter[0] : v3 (Ljava/lang/String;)
.catch java/io/IOException from lbba to lbda using lbdc
.line 18
invoke-direct {v2},java/lang/Object/<init> ; <init>()V
; v2 : LLineReader;
lbba:
.line 20
new-instance v0,java/io/FileInputStream
; v0 : Ljava/io/FileInputStream;
invoke-direct {v0,v3},java/io/FileInputStream/<init> ; <init>(Ljava/lang/String;)V
; v0 : Ljava/io/FileInputStream; , v3 : Ljava/lang/String;
iput-object v0,v2,LineReader.fis Ljava/io/FileInputStream;
; v0 : Ljava/io/FileInputStream; , v2 : LLineReader;
.line 21
new-instance v0,java/io/BufferedInputStream
; v0 : Ljava/io/BufferedInputStream;
iget-object v1,v2,LineReader.fis Ljava/io/FileInputStream;
; v1 : Ljava/io/FileInputStream; , v2 : LLineReader;
invoke-direct {v0,v1},java/io/BufferedInputStream/<init> ; <init>(Ljava/io/InputStream;)V
; v0 : Ljava/io/BufferedInputStream; , v1 : Ljava/io/FileInputStream;
iput-object v0,v2,LineReader.bis Ljava/io/BufferedInputStream;
; v0 : Ljava/io/BufferedInputStream; , v2 : LLineReader;
lbda:
.line 28
return-void
lbdc:
.line 23
move-exception v0
; v0 : Ljava/io/IOException;
goto lbda
.end method
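
To give an idea of what this kind of type tracking involves, here is a minimal Python sketch. It is not dedexer's actual code: the function name track_register_types is made up for illustration, it only understands new-instance and iget-object, and it handles straight-line code only (no merging of types at branch targets, no exception handlers).

```python
def track_register_types(instructions, initial):
    """Return a list with the register -> type map after each instruction.

    Toy forward data flow over straight-line code; hypothetical helper,
    not part of dedexer.
    """
    regs = dict(initial)
    snapshots = []
    for line in instructions:
        opcode, _, operands = line.partition(" ")
        args = [a.strip() for a in operands.split(",")]
        if opcode == "new-instance":
            # new-instance vX,some/Class  -> vX now holds Lsome/Class;
            regs[args[0]] = "L" + args[1] + ";"
        elif opcode == "iget-object":
            # iget-object vX,vY,Class.field Ltype;  -> vX now holds Ltype;
            regs[args[0]] = args[2].split()[-1]
        # Every other instruction leaves the register types unchanged here.
        snapshots.append(dict(regs))
    return snapshots


listing = [
    "new-instance v0,java/io/FileInputStream",
    "iput-object v0,v2,LineReader.fis Ljava/io/FileInputStream;",
    "new-instance v0,java/io/BufferedInputStream",
    "iget-object v1,v2,LineReader.fis Ljava/io/FileInputStream;",
]
start = {"v2": "LLineReader;", "v3": "Ljava/lang/String;"}
for instruction, types in zip(listing, track_register_types(listing, start)):
    print(instruction, types)
```

The starting map plays the role of the "this" and "parameter" comments at the top of the listing; a real analyser also has to follow branches and merge the register types where control flow joins.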

Great then, but where is the invoke-quick disassembly? Well, erm, I ran into problems. First of all, I could not figure out how an ODEX file stores the names of the other ODEX files it depends on. The names seem to sit in some sort of data structure at the end of the ODEX file, but its exact layout remains a mystery to me.
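
For reference, a rough Python sketch of how such a dependency block could be read. The layout used here is an assumption, not something established in this post: four little-endian 32-bit words (modification time, CRC, VM build number, entry count), followed for each entry by a 32-bit length, the null-terminated name, and a 20-byte SHA-1 signature. Check it against the Dalvik source (vm/analysis/DexOptimize.c, mentioned in the comments below) before relying on it. The function name parse_deps is hypothetical.

```python
import struct

SHA1_LEN = 20  # assumed size of the per-entry signature


def parse_deps(blob):
    """Parse a dependency block under the ASSUMED layout described above."""
    mod_when, crc, vm_build, count = struct.unpack_from("<4I", blob, 0)
    pos = 16
    names = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        if pos + length + SHA1_LEN > len(blob):
            raise ValueError("dependency entry runs past the end of the block")
        # The stored length is assumed to include the terminating null byte.
        names.append(blob[pos:pos + length].rstrip(b"\x00").decode("utf-8"))
        pos += length + SHA1_LEN  # skip the name and its signature
    return {"mod_when": mod_when, "crc": crc, "vm_build": vm_build, "names": names}
```

The block is passed in as a byte string; where it starts within the ODEX file has to come from the file's header, which this sketch does not cover.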

Second, in order to decode invoke-quick statements, iget-object-quick statements also need to be decoded, because the data flow analyser needs the types of the values they put into Dalvik registers. This instruction identifies the field it reads only by an offset, and I do not yet know how to map these offsets back to Java types.
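
The idea can be sketched in Python as follows, assuming a per-class table from offset to field were available. The table contents (the offsets 8 and 12) and the function name resolve_quick_field are invented for illustration; how to obtain the real offsets is exactly the open question.

```python
# Hypothetical layout table: class type -> {byte offset: (field name, field type)}.
# The offsets are made up; real values depend on how the VM lays out instances.
FIELD_LAYOUT = {
    "LLineReader;": {
        8: ("fis", "Ljava/io/FileInputStream;"),
        12: ("bis", "Ljava/io/BufferedInputStream;"),
    },
}


def resolve_quick_field(object_type, offset, layout=FIELD_LAYOUT):
    """Map (type of the object register, offset) back to (field name, field type).

    object_type is what the data flow analysis reports for the register that
    holds the object. Returns None when the class or the offset is unknown.
    """
    return layout.get(object_type, {}).get(offset)


# The object register holds a LineReader, so offset 8 would name the fis field
# under the made-up table above.
print(resolve_quick_field("LLineReader;", 8))
```

This is why the two problems are linked: the offset is only meaningful together with the type of the object register, and the resolved field type in turn feeds the register types needed to decode a later invoke-quick.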

I will try to make progress on these problems. Any help is appreciated.

And now some PR after the boring technical details.

I will give a short presentation about dedexer at the upcoming Android meetup in London. If you are interested in the tool and central London is within reach for you, let's meet there.

11 comments:

danfuzz said...

In the Dalvik source, check out vm/analysis/DexOptimize.c to see how the dependency info is written. Look for "deps" in the file.

Gabor Paller said...

Thanks, danfuzz, I found what I needed.

Any insight on how the offsets for iget-object-quick are calculated?

danfuzz said...

Sorry, I didn't see your followup question until just now. But it sounds like you figured it all out nonetheless. Great work!

Gabor Paller said...

Thanks, Dan, now it's your turn with the GC. :-)

danfuzz said...

Eep? I see the smiley, but I'm not sure if you have a real question (that I've similarly failed to see yet).

Gabor Paller said...

The point I was trying to make on the last slide of my presentation (last blog post) is that Dalvik's biggest weakness is the mark&sweep GC that the public version has (maybe there is already a better GC in Android but it is not visible from the outside world).

danfuzz said...

I thought that might be what you meant. In any case, yes, the existing heap code is not exactly the epitome of technical sophistication, and we are now working on a major revision of it. As always, I hope it will get to see the light of day sooner rather than later. However (also as always) I can make no promises about a delivery date.

Gabor Paller said...

You are probably aware of the Jikes RVM.

Jikes RVM

There are a lot of neat tricks in that software. Particularly in the heap/GC code.

IMHO, the success or the failure of the Android effort depends on the VM. It has to improve both in execution speed and in memory management if Android is to succeed.

Gabor Paller said...

Hah, what happened with the link?
http://jikesrvm.org/

danfuzz said...

Yep, I've actually been aware of the JikesRVM project since back when it was called Jalapeño. Indeed, they've done some interesting stuff. I didn't think the project was actively being developed anymore, but I see that I was mistaken. Thanks for the (re)pointer!

Gabor Paller said...

Well, at least in 2005 it was alive and well when I did that explicit deallocation project.
I liked the Jikes RVM pluggable GC architecture and that impressive collection of GCs to try.