Wednesday, January 22, 2014

RenderScript in Android - why should you care?

Before Android 3.0, application development meant using either the SDK - which is a Java toolchain - or the NDK - which is a C/C++ toolchain based on the gcc compiler. Version 3.0 added a third option, RenderScript. RenderScript was positioned as an auxiliary toolchain for special tasks like image manipulation. In this post I will investigate whether it makes sense to use RenderScript for more mainstream coding problems.

If you search the web for RenderScript, you won't get too many hits. This is because RenderScript is really just Google's name for a technology based on two powerful software packages: the clang compiler and the LLVM "virtual machine". Clang is a C/C++ compiler front end and a strong challenger to gcc. RenderScript code is processed by clang and turned into LLVM bitcode (called bytecode in the rest of this post). LLVM is not a virtual machine in the strict sense, as it does not "execute" this bytecode. Rather, it provides a second compilation step that turns the LLVM bytecode into machine code for the target processor.

Much like ART, Android's new runtime, LLVM is used here as an ahead-of-time compiler. The second compilation step happens either on the developer's machine or directly on the device. In the first case the clang-LLVM toolchain generates code for three processor families (ARM, MIPS and Intel), so your APK file already contains native code generated from the RenderScript code for these processors and no further compilation is needed on the device. RenderScript code may, however, also run on graphics co-processors and other DSPs. In this case the on-device LLVM compiler takes the LLVM bytecode, which is also packaged into the APK file, and may generate code for these specialized processors at install time.

So if you write RenderScript instead of a native implementation built with the traditional NDK toolchain, you get the performance of native code, support for many processor architectures (including more exotic GPUs and DSPs) without any hassle, simpler integration with the Java code and much smoother integration with the Eclipse development environment.

RenderScript appeared in Android 3.0 and the supporting Java packages reside under the android.renderscript hierarchy. This is called native RenderScript. Google, however, backported the feature to Android 2.2 (and higher) and this backported version is now the recommended way to use it. Instead of being part of the platform, RenderScript support now resides in a support library under a separate package hierarchy. The functionality of android.renderscript and of the support library version is largely similar.

Even though the RenderScript runtime is general-purpose, Google positions the technology as a solution for specific tasks. The RenderScript tutorial, for example, concentrates mainly on the filtering runtime that executes a RenderScript function - a so-called "kernel" - on a large number of data items. This assumes a certain class of algorithm: one where each output item depends solely on the input and not on some intermediate result of the computation. Simple convolution filters (e.g. the 2D filters widely used for image filtering) are such algorithms, and they are therefore well suited to parallel execution on multiple CPU cores. We will see, however, that RenderScript can also be exploited to speed up algorithms that do not fit into this computation model.
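To make the computation model concrete, here is a minimal sketch in plain Java. It is not RenderScript code (real kernels are written in RenderScript's own C99-based language) and it uses Java 8 streams, so it is desktop Java rather than something taken from an Android project. The class name, the 3-tap blur and the forEach helper are all made up for illustration. The point is only that each output element is computed from the input alone, so the invocations can run in any order on any number of cores.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Plain-Java illustration of the "kernel" model; this is NOT the RenderScript API.
class KernelModelSketch {

    // Hypothetical kernel: a 3-tap box blur. The value for index i is
    // computed from the input array only, never from another output value.
    static float blurKernel(float[] in, int i) {
        int left = Math.max(i - 1, 0);                 // clamp at the left border
        int right = Math.min(i + 1, in.length - 1);    // clamp at the right border
        return (in[left] + in[i] + in[right]) / 3.0f;
    }

    // Runs the kernel once per element. Because the invocations are
    // independent, they may be spread over CPU cores in any order.
    static float[] forEach(float[] in) {
        float[] out = new float[in.length];
        IntStream.range(0, in.length)
                 .parallel()
                 .forEach(i -> out[i] = blurKernel(in, i));
        return out;
    }

    public static void main(String[] args) {
        float[] out = forEach(new float[] {3f, 6f, 9f, 12f});
        System.out.println(Arrays.toString(out)); // prints [4.0, 6.0, 9.0, 11.0]
    }
}
```

An algorithm in which out[i] needed out[i - 1] could not be written this way, which is exactly the limitation discussed above.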

In the next post I will present our benchmark program.


MichaelEGR said...

There is another option that needs mention in regard to parallel programming which is OpenCL. It has wide industry support at all levels from hardware vendors and is burgeoning on mobile / Android.

You definitely reference some serious concerns:
>This assumes a certain class of algorithm, i.e. when the output of the algorithm depends solely on the input and not on some intermediate result of the computation.

>We will see, however, that RenderScript can be exploited to speed up algorithms that do not fit into this computation model.

As things go Renderscript does not allow advanced parallel programming because it does not support features necessary to do advanced algorithm development in a work efficient manner. It can handle basic 2D processing of data, but anything more than that and things quickly go downhill. For instance it's impossible to do anything that requires parallel scan techniques which is the gateway to intermediate / advanced parallel algorithms.
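For readers unfamiliar with the term, a scan (prefix sum) is sketched below in plain Java; the class and method names are invented for this illustration. In the sequential form each output depends on the previous output, so it does not fit the one-kernel-call-per-element model. Parallel scan algorithms restructure the work into cooperating phases; desktop Java 8 ships one as Arrays.parallelPrefix, used here only as a stand-in for what a GPU scan would do.

```java
import java.util.Arrays;

// Illustration only: what a scan computes and why it is not a per-element kernel.
class ScanSketch {

    // Sequential inclusive scan. out[i] is built from out[i - 1] (kept in
    // "running"), i.e. from an intermediate result of the computation.
    static int[] inclusiveScan(int[] in) {
        int[] out = new int[in.length];
        int running = 0;
        for (int i = 0; i < in.length; i++) {
            running += in[i];
            out[i] = running;
        }
        return out;
    }

    // The same result computed by the JDK's parallel prefix implementation.
    static int[] parallelScan(int[] in) {
        int[] out = Arrays.copyOf(in, in.length); // parallelPrefix works in place
        Arrays.parallelPrefix(out, Integer::sum);
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(Arrays.toString(inclusiveScan(data))); // prints [1, 3, 6, 10]
        System.out.println(Arrays.toString(parallelScan(data)));  // prints [1, 3, 6, 10]
    }
}
```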

Another concern is the number of bugs inherent in the engineering process / culture surrounding Google and its roll out of features over time. Just check the Android issue tracker: tons of tools related issues are present for Renderscript and there are other hard bugs even in script intrinsics (issue # 61159 for instance). Fragmentation is alive and well within the confines of Renderscript.

As things go there is a bit of FUD coming from the Google side on the merits of Renderscript versus OpenCL. Google is suppressing OpenCL on its Nexus line of devices, but luckily various OEMs are exposing OpenCL on Android. Sony is one of those OEMs, but more will follow. All mobile GPU / SoC manufacturers support OpenCL, but it is up to the OEM to expose it. Sony devices are available now. I'm using the ODROID-XU as my OpenCL / Android test platform.

The next generation of mobile GPUs (Tegra K1 / Adreno 4xx) will support OpenCL to an even greater degree (even likely CUDA for NVIDIA SoCs). What I mean by that is that current generation mobile GPUs have work group limitations, but the next generation referenced above will not have these limitations, so advanced algorithms will be possible via OpenCL on mobile. There is of course also full profile OpenGL 4.x, and when it is exposed on mobile, which is imminent, there will also be compute shaders.

If one is going to delve into low level parallelism Renderscript is a high risk solution that doesn't address the scope of problems one really wants to solve regarding parallel algorithm development whereas OpenCL is likely going to be the winner in the mid / long run.

Gabor Paller said...

Thanks for your comment, Michael, it is a very valuable insight. I agree with you that RenderScript's parallelization features are naive, to put it lightly. That is going to be one of the conclusions of this series of posts (of which only the first has been published).

My approach is much simpler. I will claim that even if you don't exploit parallelization, you may be better off with RenderScript than with either Java or the NDK.

MichaelEGR said...

I should mention that I want to see OpenCL and Renderscript widely available on Android.

Renderscript is an interesting experiment, but IMHO reveals more about the psyche / culture within Google / Android team than providing the solution the larger development community is demanding / needs.

There should be a spectrum of computation possibilities on Android; Renderscript & OpenCL.

> I will claim that even if you don't exploit parallelization, you may be better off than either in Java or in NDK.

I am interested in seeing content that supports this claim as it seems like an extraordinary one which needs extraordinary evidence.

Doing multithreaded data crunching is not that hard in Java and the code one creates runs on the desktop and Android without a rewrite. In fact it's way easier to do development on the desktop and have everything instantly run on Android as well, with no porting. The executor framework and atomic CAS operations are enough for most processing. If moving a lot of data between threads, then utilizing the Disruptor or another customized queue implementation (not the ones in the java.util.concurrent API) leads to efficient Java / CPU based data crunching that runs everywhere.
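As a rough illustration of the executor-plus-CAS approach mentioned above, here is a minimal sketch that finds the maximum of an array. The class name, the helper names and the chunking scheme are assumptions made for this example, not code from any project discussed here. Each worker scans its own slice and then merges its local result into a shared AtomicInteger with a compare-and-set retry loop, so no locks are needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: chunked data crunching with an ExecutorService and a CAS merge step.
class ExecutorCasSketch {

    // Lock-free "keep the larger value": retry the compare-and-set until it
    // succeeds or another thread has already stored something at least as large.
    static void casMax(AtomicInteger target, int candidate) {
        int current = target.get();
        while (candidate > current && !target.compareAndSet(current, candidate)) {
            current = target.get();
        }
    }

    // Returns the maximum of data, or Integer.MIN_VALUE for an empty array.
    static int parallelMax(int[] data, int threads) {
        AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(t * chunk, data.length);
                final int to = Math.min(from + chunk, data.length);
                pending.add(pool.submit(() -> {
                    int local = Integer.MIN_VALUE;
                    for (int i = from; i < to; i++) {
                        local = Math.max(local, data[i]);
                    }
                    casMax(max, local); // one CAS merge per worker, not per element
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every slice and surface worker failures
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(parallelMax(new int[] {5, -2, 17, 9, 3}, 3)); // prints 17
    }
}
```

Nothing in this sketch is Android specific, which is the portability argument being made: the same class compiles for the desktop JVM and for Android.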

It also depends. Another major limitation of Renderscript is that the developer can't control where computation occurs. Renderscript doesn't allow one to choose execution on the CPU or GPU which breaks a major rule in data oriented design which is "know thy data and process it accordingly".

Overwhelmingly in most low level computation tasks it's very important to be able to control where computation occurs depending on the immediate data set being operated on for that frame of execution which may change every time the task is computed.

If you have a small data set, then computing on the CPU is faster since there are no data transfer costs, which generally penalize offloading small data sets to the GPU. If the data set is larger for the frame at hand, then one explicitly wants to choose the GPU for computation as the data transfer costs are offset by the speed of computation.

Proper low level implementations of many algorithms should choose frame by frame a Java / NDK algorithm or GPU / highly parallel computation.
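The per-frame choice described above can be sketched as follows. This is only an illustration in plain Java: the threshold value is a made-up placeholder that would have to be measured on each device, the class and method names are hypothetical, and a parallel stream merely stands in for the GPU / highly parallel path, since no real GPU API is used here.

```java
import java.util.Arrays;

// Sketch of choosing the execution path per frame, based on the data set size.
class FrameDispatchSketch {

    // Hypothetical cutoff; a real value would come from profiling the target device.
    static final int PARALLEL_THRESHOLD = 10_000;

    // Decision made again for every frame, since the data set size may change.
    static boolean useParallelPath(int elementCount) {
        return elementCount >= PARALLEL_THRESHOLD;
    }

    static long sumOfSquares(int[] frame) {
        if (useParallelPath(frame.length)) {
            // Stand-in for the offloaded path: worth its overhead only for large frames.
            return Arrays.stream(frame).parallel().mapToLong(v -> (long) v * v).sum();
        }
        // Small frame: a plain loop avoids any hand-off or transfer overhead.
        long sum = 0;
        for (int v : frame) {
            sum += (long) v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] {1, 2, 3})); // prints 14
    }
}
```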

It should be noted that with OpenCL 2.0 there is the ability to share data between CPU / GPU / CL device which will reduce / eliminate the overhead of transferring small and large data sets.

I'm interested to read your follow up posts though because no one has really provided a post / proof that choosing Renderscript beats Java / NDK based CPU implementations for performance or more importantly matters of convenience. On speed there should be no real difference though I suppose it's a pain to compare since one doesn't know where Renderscript is operating for a comparison. I put forth that dealing with all the tools problems with Renderscript, and learning all its caveats including possible bugs / hard fragmentation is more difficult and less maintainable than working with Java / NDK based solutions. The maintainable angle regards current and future maintenance as developers come and go working on a project or for a company.

There are also cross-platform concerns such that it's time efficient to develop on the desktop for many data crunching tasks compared to dealing with Android and the app deployment / startup costs for testing apps. Let alone the ability to deploy an OpenCL based solution everywhere from desktop, Android, and iOS. A public OpenCL API on iOS is imminent. Not having to rewrite computation algorithms for each platform (desktop, iOS, Android, etc.) is the reason folks choose the NDK in the first place.

I may be pressed for time in the coming months, but perhaps I can offer Java and OpenCL based implementations of whatever your Renderscript demos will encompass. I already have a portable framework to plug in the computation at hand.