ObjectWeb Consortium

ASMDEX primer

Introduction to the ASMDEX Bytecode Framework

What is ASMDEX?

ASMDEX is a bytecode manipulation library like ASM, but it handles the DEX bytecode used by Android executables. Only the core library and a tool that converts bytecode into the code generating it (asmdexifier) are available. The guiding principle while developing ASMDEX was to keep it very similar to ASM, in order to reduce the cost of porting tools written for Oracle Java bytecode to Android bytecode.

Differences between ASM and ASMDEX

Although the guiding principle was to keep ASMDEX close to ASM, there are still many differences that the user must keep in mind:

  1. The code unit is the application and not the class. An Android application is a ZIP archive, like a JAR file, but it contains a set of resources (most of them compiled XML files) and a single code file holding all the application code. This file is named classes.dex.
  2. There is a single constant pool in classes.dex, organized as a set of sorted tables. Any modification in a class may disturb this sorting and so affect the complete code unit.
    Although ASMDEX follows the visitor pattern while building an application, this is in fact not a single-pass process but a two-pass one: the second pass computes the actual indices of the items identified during the first pass.
  3. The Dalvik virtual machine that executes the bytecode is a register based machine and not a stack based machine.
    • Method arguments, local variables but also intermediate computations and parameters of called methods are all put in registers in the stack frame.
    • Method arguments are put in the registers with the highest numbers.
    • Although formally indistinguishable, registers fall into several classes, depending on their number:
      • The first 16 registers are available for most operations, especially when the operation requires several registers.
      • The next limit is 256. Most unary operations use this limit because instructions are coded in 16-bit units and the opcode takes 8 bits, which leaves 8 bits for a register number.
      • A small set of operations can address all 65536 potential registers of a method. Their opcode names usually contain '16'.
    • There are two ways to specify the arguments of a method call:
      • If the number of arguments is small (5 or fewer), one can use any arbitrary combination of the first 16 registers.
      • Otherwise, it is necessary to use a slice of consecutive registers.
    Because of all these constraints, it can be difficult to introduce new intermediate computations. Even adding one intermediate register is not easy: it shifts the method parameters, and may therefore move one of them beyond the 16-register limit that some operations require.
  4. Finally, there is a system constraint imposed by the virtual machine itself: because the code is a memory-mapped structure comprising the whole code of the application, it is not possible to modify the code of a class dynamically. In fact, because of the second constraint, it would be hard to modify a single class anyway without modifying the code of the other classes.
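
To see why a local change can ripple through the whole code unit (the second constraint), here is a minimal sketch in plain Java. It does not use ASMDEX: the class name SortedPoolDemo and the sample strings are assumptions made for illustration. It models one sorted table of the constant pool as a sorted set and shows that adding a single string changes the index of every entry that sorts after it.

```java
import java.util.TreeSet;

/** Toy model of one sorted constant-pool table (hypothetical, not ASMDEX code). */
final class SortedPoolDemo {
    private final TreeSet<String> strings = new TreeSet<>();

    void add(String s) {
        strings.add(s);
    }

    /** Index of s in the sorted table, or -1 if s is absent. */
    int indexOf(String s) {
        return strings.contains(s) ? strings.headSet(s).size() : -1;
    }

    public static void main(String[] args) {
        SortedPoolDemo pool = new SortedPoolDemo();
        pool.add("Lcom/example/Main;");
        pool.add("toString");
        int before = pool.indexOf("toString"); // 1
        pool.add("logCall");                   // sorts between the two existing entries
        int after = pool.indexOf("toString");  // 2
        System.out.println(before + " -> " + after); // prints "1 -> 2"
    }
}
```

Any instruction that referred to "toString" by index would have to be rewritten after the insertion, even in classes that were not otherwise modified.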

The last constraint limits the use of ASMDEX. Apart from that, the most severe programming constraint for the ASMDEX user is probably the third one. One way around it is to introduce as few changes as possible and to rely instead on external (but potentially generated) methods to perform the real work. A future release of ASMDEX may provide a generic register-allocating method visitor to simplify the development of transformations.
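
The register constraints from the third point can be made concrete with a small sketch. It is plain Java and not part of ASMDEX; the class RegisterLimits and its method names are assumptions made for illustration. It computes the narrowest register field (4, 8 or 16 bits) able to address a given register, and the number of the first parameter register, given that parameters occupy the highest registers.

```java
/** Illustrative helper for the Dalvik register limits (hypothetical, not ASMDEX code). */
final class RegisterLimits {

    /** Narrowest register field, in bits, that can encode register number reg. */
    static int fieldBits(int reg) {
        if (reg < 0 || reg > 0xFFFF) {
            throw new IllegalArgumentException("register out of range: " + reg);
        }
        if (reg < 16) return 4;   // usable by most operations
        if (reg < 256) return 8;  // usable by most unary operations
        return 16;                // only the few '16' variants
    }

    /** Number of the first parameter register: parameters occupy the highest registers. */
    static int firstParamRegister(int totalRegisters, int paramRegisters) {
        return totalRegisters - paramRegisters;
    }

    public static void main(String[] args) {
        int total = 16;
        int params = 3; // v13..v15 hold the parameters
        int last = firstParamRegister(total, params) + params - 1;
        System.out.println("v" + last + " needs " + fieldBits(last) + " bits"); // v15 needs 4 bits

        total++; // one scratch register added: parameters shift to v14..v16
        last = firstParamRegister(total, params) + params - 1;
        System.out.println("v" + last + " needs " + fieldBits(last) + " bits"); // v16 needs 8 bits
    }
}
```

In this example a single extra register pushes the last parameter out of the 16-register window, so any instruction that addressed it through a 4-bit field would no longer be encodable as is.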

A simple example

Here is the skeleton of a tool that logs some method calls according to a policy identifying the methods to check. To keep the tutorial short and generic, we describe neither the class implementing the policy nor the generation of the log methods.

The entry point

This is a very simple entry point. We assume that the classes.dex file has already been extracted from the APK. You may prefer to perform the change in place, but in either case you must sign the application again before you can install it.
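
As a hedged sketch of the extraction step, the following uses only java.util.zip from the JDK to read classes.dex out of an APK, which is a ZIP archive, as a byte array. The class name DexExtractor is an assumption made for illustration; it is not part of ASMDEX.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Reads classes.dex from an APK stream (hypothetical helper, not ASMDEX code). */
final class DexExtractor {

    /** Returns the bytes of the classes.dex entry, or null if the archive has none. */
    static byte[] extractClassesDex(InputStream apk) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(apk)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals("classes.dex")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zin.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

The returned bytes can be written to a file and used as the input of the entry point shown next.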

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
// Opcodes, ApplicationReader, ApplicationWriter and ApplicationVisitor come from the ASMDEX core library.

public class AnnotateCalls {
	public static void main(String args[]) {
		FileOutputStream os = null;
		try {
			int api = Opcodes.ASM4;
			File inFile;
			File outFile;
			... // Argument validation
			AnnotRulesManager rm = ...; // Rules to apply
			ApplicationReader ar = new ApplicationReader(api, inFile);
			ApplicationWriter aw = new ApplicationWriter();
			// Chain: reader -> annotating adapter -> writer
			ApplicationVisitor aa = new ApplicationAdapterAnnotateCalls(api, rm, aw);
			ar.accept(aa, 0);
			byte [] b = aw.toByteArray();
			os = new FileOutputStream(outFile);
			os.write(b);
		} catch (IOException e) { // recovery
		} finally {
			// cleanup
		}
	}
}

The application visitor

This element is new and specific to ASMDEX: it represents the complete code unit (classes.dex). We use the visitEnd method to emit the new log class; the system takes care of sorting the elements as required. logClassWriter relies on the rule manager to know which log methods it must define. You can use asmdexifier to derive the template of your log method from an existing class.

public class ApplicationAdapterAnnotateCalls extends ApplicationVisitor {
 	final LogClassWriter logClassWriter;
 	final AnnotRulesManager ruleManager;
 	
 	public ApplicationAdapterAnnotateCalls(int api, AnnotRulesManager rm, ApplicationVisitor av) {
 		super(api, av);
 		ruleManager = rm;
 		logClassWriter = new LogClassWriter(rm,av);
 	}
 	
 	@Override
 	public ClassVisitor visitClass(int access, String name, String [] signature, String superName, String [] interfaces) {
 		ClassVisitor cv = av.visitClass(access, name, signature, superName, interfaces);
 		ClassAdapterAnnotateCalls ca = new ClassAdapterAnnotateCalls(api, ruleManager,cv);
 		return ca;
 	}
 	@Override
 	public void visitEnd() {
 		logClassWriter.addLogClass();
 		av.visitEnd();
 	}
 }

The class visitor

There is nothing interesting in this one. It delegates all the work to the method visitor.

public class ClassAdapterAnnotateCalls extends ClassVisitor {
 
 	final private AnnotRulesManager ruleManager;
 
 	public ClassAdapterAnnotateCalls(int api, AnnotRulesManager ruleManager, ClassVisitor cv) {
 		super(api, cv);
 		this.ruleManager = ruleManager;
 	}
 	
 	@Override
 	public MethodVisitor visitMethod(int access, String name, String desc, String[] signature,
 			String[] exceptions) {
 		MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
 		MethodAdapterAnnotateCalls ma = new MethodAdapterAnnotateCalls(api, ruleManager, mv);
 		return ma;
 	}
 }
 

The method visitor

This is where the real work is done. The visitMethodInsn method, which visits method calls, is overridden to emit a call to a log method followed by the original call whenever the call should be logged. The decision is carried by the result of the call to the rule manager: null means that nothing must be done, and any other value is the name of a method in the generated log class. This log method is static and takes the same arguments as the controlled method, preceded by the receiver when the controlled method is not static.

  • Note that you cannot log a call to a constructor this way.
  • The syntax of method signatures in Dalvik differs from the one used in the Oracle JVM. A signature is coded as RA...A instead of (A...A)R, where A stands for the type descriptor of an argument and R for the type descriptor of the result.

public class MethodAdapterAnnotateCalls extends MethodVisitor {
 
 	final private AnnotRulesManager ruleManager;
 
 	public MethodAdapterAnnotateCalls(int api, AnnotRulesManager ruleManager, MethodVisitor mv) {
 		super(api, mv);
 		this.ruleManager = ruleManager;
 	}
 	
 	@Override
 	public void visitMethodInsn(int opcode, String owner, String name, String desc, int[] arguments) {
 		boolean isStatic;
 		String signature;
 		switch (opcode) {
 		case Opcodes.INSN_INVOKE_STATIC:
 		case Opcodes.INSN_INVOKE_STATIC_RANGE:
 			isStatic=true;
 			break;
 		default:
 			isStatic=false;
 		}
 		String logItName = ruleManager.log(owner,name,desc,isStatic);
 		if (logItName != null) {
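 			// The log method returns void; a non-static call also passes its receiver (owner) as first argument.
 			if (isStatic) signature = "V" + MethodSignature.popType(desc);
 			else signature = "V" + owner + MethodSignature.popType(desc);
 			// Dalvik invoke opcodes below 0x74 take an explicit register list; from 0x74 on they are the /range variants.
 			int opcodeStatic = (opcode < 0x74) ? Opcodes.INSN_INVOKE_STATIC : Opcodes.INSN_INVOKE_STATIC_RANGE;
 			mv.visitMethodInsn(opcodeStatic, LogClassWriter.LOG_CLASSNAME, logItName, signature, arguments);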
 		}
 		mv.visitMethodInsn(opcode, owner, name, desc, arguments);
 	}
 	
 }
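
To illustrate the signature format noted in the remarks above, here is a small sketch that rewrites a JVM-style method descriptor (A...A)R into the RA...A form. It is plain string manipulation; the names DescriptorConverter and toDexDescriptor are hypothetical and not part of ASMDEX, and the input is assumed to be a well-formed JVM descriptor.

```java
/** Converts JVM method descriptors to the RA...A form (hypothetical helper, not ASMDEX code). */
final class DescriptorConverter {

    /** Turns "(A...A)R" into "RA...A". */
    static String toDexDescriptor(String jvmDesc) {
        int close = jvmDesc.lastIndexOf(')');
        if (!jvmDesc.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not a JVM method descriptor: " + jvmDesc);
        }
        String args = jvmDesc.substring(1, close);   // argument descriptors, unchanged
        String result = jvmDesc.substring(close + 1); // result descriptor
        return result + args;
    }

    public static void main(String[] args) {
        System.out.println(toDexDescriptor("(Ljava/lang/String;I)V")); // VLjava/lang/String;I
        System.out.println(toDexDescriptor("()I"));                    // I
    }
}
```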
 

We need the list of parameter types without the result type, which is obtained by "popping" the result type from the front of the signature. Here is the relevant code from the MethodSignature class:

public static String popType(String desc) {
    return desc.substring(nextTypePosition(desc,0));
}

public static int nextTypePosition(String desc, int pos) {
    while(desc.charAt(pos) == '[') pos ++;
    if (desc.charAt(pos) == 'L') pos = desc.indexOf(';', pos);
    pos ++;
    return pos;
}
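
As a usage sketch, the two helpers above can be exercised on sample signatures in RA...A form. The wrapper class MethodSignatureDemo, the helper logSignature and the sample descriptors are assumptions made for illustration; the bodies of popType and nextTypePosition are copied from the listing above so that the block is self-contained, and logSignature mirrors the way the method visitor builds the signature of the log method.

```java
/** Exercises popType on sample signatures (demo wrapper, names are illustrative). */
final class MethodSignatureDemo {

    static String popType(String desc) {
        return desc.substring(nextTypePosition(desc, 0));
    }

    static int nextTypePosition(String desc, int pos) {
        while (desc.charAt(pos) == '[') pos++;                      // skip array dimensions
        if (desc.charAt(pos) == 'L') pos = desc.indexOf(';', pos);  // jump to the end of a class name
        pos++;
        return pos;
    }

    /** Signature of the static log method for a call, as built in the method visitor. */
    static String logSignature(String owner, String desc, boolean isStatic) {
        return isStatic ? "V" + popType(desc) : "V" + owner + popType(desc);
    }

    public static void main(String[] args) {
        System.out.println(popType("VLjava/lang/String;I"));   // Ljava/lang/String;I
        System.out.println(popType("[[ILjava/lang/Object;"));  // Ljava/lang/Object;
        System.out.println(popType("Ljava/lang/String;I"));    // I
        // Instance method of Lcom/example/Foo; returning int and taking a String
        System.out.println(logSignature("Lcom/example/Foo;", "ILjava/lang/String;", false));
        // prints VLcom/example/Foo;Ljava/lang/String;
    }
}
```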

Copyright © 1999-2009, OW2 Consortium | contact | webmaster | Last modified at 2016-12-23 11:07 AM