Wednesday, October 29, 2014

Android Application Designing for Performance

An Android application should be fast. Well, it's probably more accurate to say that it should be efficient. That is, it should execute as efficiently as possible in the mobile device environment, with its limited computing power and data storage, smaller screen, and constrained battery life.
In this blog I am providing best practices to follow to design for performance.

1. Avoid Creating Objects
Object creating in android is more expensive than creating object in Java. Try to avoid creating objects as possible as you do. More objects mean more work to GC. More works to GC means little "hiccups" in the user experience.

Generally speaking, avoid creating short-term temporary objects if you can. Fewer objects created mean less-frequent garbage collection, which has a direct impact on user experience.  If possible pool the objects, so it can be reused.

2.  Keep Blocking Operations Off the Main UI Thread
Use AsyncTask, Thread, IntentService, or Custom background service to do the heavy work. Use Loaders to simplify state management of long loading data, such as cursors. You cannot afford for your application to lag or freeze while some processing is going on. If an operation takes time and resources, offload that processing and perform it asynchronously so that your application remains responsive and the user can go about their business

3. Use Native Methods
 C/C++ code that easily runs 10-100x faster than doing the same thing in a Java loop.

4. Prefer Virtual Over Interface
Suppose you have a HashMap object. You can declare it as a HashMap or as a generic Map:

Map myMap1 = new HashMap();

HashMap myMap2 = new HashMap();
Which is better?

Conventional wisdom says that you should prefer Map, because it allows you to change the underlying implementation to anything that implements the Map interface. Conventional wisdom is correct for conventional programming, but isn't so great for embedded systems. Calling through an interface reference can take 2x longer than a virtual method call through a concrete reference.

If you have chosen a HashMap because it fits what you're doing, there is little value in calling it a Map. Given the availability of IDEs that refractor your code for you, there's not much value in calling it a Map even if you're not sure where the code is headed. (Again, though, public APIs are an exception: a good API usually trumps small performance concerns.)

5. Prefer Static Over Virtual
If you don't need to access an object's fields, make your method static. It can be called faster, because it doesn't require a virtual method table indirection. It's also good practice, because you can tell from the method signature that calling the method can't alter the object's state.

6. Avoid Internal Getters/Setters
In native languages like C++ it's common practice to use getters (e.g. i = getCount()) instead of accessing the field directly (i = mCount). This is an excellent habit for C++, because the compiler can usually inline the access, and if you need to restrict or debug field access you can add the code at any time.

On Android, this is a bad idea. Virtual method calls are expensive, much more so than instance field lookups. It's reasonable to follow common object-oriented programming practices and have getters and setters in the public interface, but within a class you should always access fields directly.

7. Declare Constants Final
Consider the following declaration at the top of a class:

static int intVal = 42;
static String strVal = "Hello, world!";
The compiler generates a class initializer method, called <clinit>, that is executed when the class is first used. The method stores the value 42 into intVal, and extracts a reference from the classfile string constant table for strVal. When these values are referenced later on, they are accessed with field lookups.

We can improve matters with the "final" keyword:
static final int intVal = 42;
static final String strVal = "Hello, world!";
The class no longer requires a <clinit> method, because the constants go into classfile static field initializers, which are handled directly by the VM. Code accessing intVal will use the integer value 42 directly, and accesses to strVal will use a relatively inexpensive "string constant" instruction instead of a field lookup.

Declaring a method or class "final" does not confer any immediate performance benefits, but it does allow certain optimizations. For example, if the compiler knows that a "getter" method can't be overridden by a sub-class, it can inline the method call.

You can also declare local variables final. However, this has no definitive performance benefits. For local variables, only use "final" if it makes the code clearer (or you have to, e.g. for use in an anonymous inner class).

8. Use Enhanced For Loop Syntax with Caution
The enhanced for loop (also sometimes known as "for-each" loop) can be used for collections that implement the Iterable interface. With these objects, an iterator is allocated to make interface calls to hasNext() and next(). With an ArrayList, you're better off walking through it directly, but for other collections the enhanced for loop syntax will be equivalent to explicit iterator usage.

Nevertheless, the following code shows an acceptable use of the enhanced for loop:


public class Foo {
    int mSplat;
    static Foo mArray[] = new Foo[27];

    public static void zero() {
        int sum = 0;
        for (int i = 0; i < mArray.length; i++) {
            sum += mArray[i].mSplat;
        }
    }

    public static void one() {
        int sum = 0;
        Foo[] localArray = mArray;
        int len = localArray.length;

        for (int i = 0; i < len; i++) {
            sum += localArray[i].mSplat;
        }
    }

    public static void two() {
        int sum = 0;
        for (Foo a: mArray) {
            sum += a.mSplat;
        }
    }
}

zero() retrieves the static field twice and gets the array length once for every iteration through the loop.

one() pulls everything out into local variables, avoiding the lookups.

two() uses the enhanced for loop syntax introduced in version 1.5 of the Java programming language. The code generated by the compiler takes care of copying the array reference and the array length to local variables, making it a good choice for walking through all elements of an array. It does generate an extra local load/store in the main loop (apparently preserving "a"), making it a teensy bit slower and 4 bytes longer than one().

To summarize all that a bit more clearly: enhanced for loop syntax performs well with arrays, but be cautious when using it with Iterable objects since there is additional object creation.

9. Avoid Enums
Enums are very convenient, but unfortunately can be painful when size and speed matter. For example, this:

public class Foo {
   public enum Shrubbery { GROUND, CRAWLING, HANGING }
}
turns into a 900 byte .class file (Foo$Shrubbery.class). On first use, the class initializer invokes the <init> method on objects representing each of the enumerated values. Each object gets its own static field, and the full set is stored in an array (a static field called "$VALUES"). That's a lot of code and data, just for three integers.

This:

Shrubbery shrub = Shrubbery.GROUND;
causes a static field lookup. If "GROUND" were a static final int, the compiler would treat it as a known constant and inline it.

The flip side, of course, is that with enums you get nicer APIs and some compile-time value checking. So, the usual trade-off applies: you should by all means use enums for public APIs, but try to avoid them when performance matters.

In some circumstances it can be helpful to get enum integer values through the ordinal() method. For example, replace:


for (int n = 0; n < list.size(); n++) {
    if (list.items[n].e == MyEnum.VAL_X)
       // do stuff 1
    else if (list.items[n].e == MyEnum.VAL_Y)
       // do stuff 2
}
with:

   int valX = MyEnum.VAL_X.ordinal();
   int valY = MyEnum.VAL_Y.ordinal();
   int count = list.size();
   MyItem items = list.items();

   for (int  n = 0; n < count; n++)
   {
        int  valItem = items[n].e.ordinal();

        if (valItem == valX)
          // do stuff 1
        else if (valItem == valY)
          // do stuff 2
   }

In some cases, this will be faster, though this is not guaranteed.


10. Use Package Scope with Inner Classes
Consider the following class definition:


public class Foo {
    private int mValue;

    public void run() {
        Inner in = new Inner();
        mValue = 27;
        in.stuff();
    }


    private void doStuff(int value) {
        System.out.println("Value is " + value);
    }

    private class Inner {
        void stuff() {
            Foo.this.doStuff(Foo.this.mValue);
        }
    }
}

The key things to note here are that we define an inner class (Foo$Inner) that directly accesses a private method and a private instance field in the outer class. This is legal, and the code prints "Value is 27" as expected.

The problem is that Foo$Inner is technically (behind the scenes) a totally separate class, which makes direct access to Foo's private members illegal. To bridge that gap, the compiler generates a couple of synthetic methods:


/*package*/ static int Foo.access$100(Foo foo) {
    return foo.mValue;
}
/*package*/ static void Foo.access$200(Foo foo, int value) {
    foo.doStuff(value);
}

The inner-class code calls these static methods whenever it needs to access the "mValue" field or invoke the "doStuff" method in the outer class. What this means is that the code above really boils down to a case where you're accessing member fields through accessor methods instead of directly. Earlier we talked about how accessors are slower than direct field accesses, so this is an example of a certain language idiom resulting in an "invisible" performance hit.

We can avoid this problem by declaring fields and methods accessed by inner classes to have package scope, rather than private scope. This runs faster and removes the overhead of the generated methods. (Unfortunately it also means the fields could be accessed directly by other classes in the same package, which runs counter to the standard OO practice of making all fields private. Once again, if you're designing a public API you might want to carefully consider using this optimization.)

11. Avoid Float
Before the release of the Pentium CPU, it was common for game authors to do as much as possible with integer math. With the Pentium, the floating point math co-processor became a built-in feature, and by interleaving integer and floating-point operations your game would actually go faster than it would with purely integer math. The common practice on desktop systems is to use floating point freely.

Unfortunately, embedded processors frequently do not have hardware floating point support, so all operations on "float" and "double" are performed in software. Some basic floating point operations can take on the order of a millisecond to complete.

Also, even for integers, some chips have hardware multiply but lack hardware divide. In such cases, integer division and modulus operations are performed in software — something to think about if you're designing a hash table or doing lots of math.

12. Some Sample Performance Numbers
To illustrate some of our ideas, here is a table listing the approximate run times for a few basic actions. Note that these values should NOT be taken as absolute numbers: they are a combination of CPU and wall clock time, and will change as improvements are made to the system. However, it is worth noting how these values apply relative to each other — for example, adding a member variable currently takes about four times as long as adding a local variable.

Action
Time
Add a local variable
1
Add a member variable
4
Call String.length()
5
Call empty static native method
5
Call empty static method
12
Call empty virtual method
12.5
Call empty interface method
15
Call Iterator:next() on a HashMap
165
Call put() on a HashMap
600
Inflate 1 View from XML
22,000
Inflate 1 LinearLayout containing 1 TextView
25,000
Inflate 1 LinearLayout containing 6 View objects
100,000
Inflate 1 LinearLayout containing 6 TextView objects
135,000
Launch an empty activity
3,000,000

13. Closing Notes
The best way to write good, efficient code for embedded systems is to understand what the code you write really does. If you really want to allocate an iterator, by all means use enhanced for loop syntax on a List; just make it a deliberate choice, not an inadvertent side effect.

Forewarned is forearmed! Know what you're getting into! Insert your favorite maxim here, but always think carefully about what your code is doing, and be on the lookout for ways to speed it up.