Tuesday, May 24, 2016

RenderScript Intrinsics

Posted by R. Jason Sams, Android RenderScript Tech Lead

RenderScript has a very powerful ability called Intrinsics. Intrinsics are built-in functions that perform well-defined operations often seen in image processing. Intrinsics can be very helpful to you because they provide extremely high-performance implementations of standard functions with a minimal amount of code.

RenderScript intrinsics will usually be the fastest possible way for a developer to perform these operations. We’ve worked closely with our partners to ensure that the intrinsics perform as fast as possible on their architectures, often far beyond anything that can be achieved in a general-purpose language.

Table 1. RenderScript intrinsics and the operations they provide.


ScriptIntrinsicConvolve3x3, ScriptIntrinsicConvolve5x5
Performs a 3x3 or 5x5 convolution.

ScriptIntrinsicBlur
Performs a Gaussian blur. Supports grayscale and RGBA buffers and is used by the system framework for drop shadows.

ScriptIntrinsicYuvToRGB
Converts a YUV buffer to RGB. Often used to process camera data.

ScriptIntrinsicColorMatrix
Applies a 4x4 color matrix to a buffer.

ScriptIntrinsicBlend
Blends two allocations in a variety of ways.

ScriptIntrinsicLUT
Applies a per-channel lookup table to a buffer.

ScriptIntrinsic3DLUT
Applies a color cube with interpolation to a buffer.
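
To make two of the per-pixel operations in Table 1 concrete, here is a minimal plain-Java sketch of what a 4x4 color matrix and a per-channel lookup table do to a single RGBA pixel. It is not RenderScript code and says nothing about how the intrinsics are implemented. The class and method names (PixelOpsSketch, colorMatrix, lut) and the row-major matrix layout are assumptions made for illustration.

```java
// Plain-Java reference sketch of two per-pixel operations from Table 1.
// Not RenderScript code: all names here are hypothetical.
final class PixelOpsSketch {

    // Round a float and clamp it to the 0..255 range of an 8-bit channel.
    static int clamp(float v) {
        return Math.max(0, Math.min(255, Math.round(v)));
    }

    // Multiply a row-major 4x4 matrix by one RGBA pixel {r, g, b, a}.
    static int[] colorMatrix(float[][] m, int[] rgba) {
        int[] out = new int[4];
        for (int row = 0; row < 4; row++) {
            float sum = 0f;
            for (int col = 0; col < 4; col++) {
                sum += m[row][col] * rgba[col];
            }
            out[row] = clamp(sum);
        }
        return out;
    }

    // Replace each channel value with the entry from that channel's
    // 256-entry lookup table.
    static int[] lut(int[][] tables, int[] rgba) {
        int[] out = new int[4];
        for (int c = 0; c < 4; c++) {
            out[c] = tables[c][rgba[c]];
        }
        return out;
    }
}
```

A matrix that swaps the first and third rows of the identity exchanges the red and blue channels, and a table filled with 255 - i inverts a channel. An intrinsic applies the same kind of operation to every pixel of a buffer.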

Your application can use one of these intrinsics with very little code. For example, to perform a Gaussian blur, the application can do the following:

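RenderScript rs = RenderScript.create(theActivity);
ScriptIntrinsicBlur theIntrinsic = ScriptIntrinsicBlur.create(rs, Element.U8_4(rs));
Allocation tmpIn = Allocation.createFromBitmap(rs, inputBitmap);
Allocation tmpOut = Allocation.createFromBitmap(rs, outputBitmap);
theIntrinsic.setRadius(25.f);      // blur radius in pixels
theIntrinsic.setInput(tmpIn);      // source allocation
theIntrinsic.forEach(tmpOut);      // run the blur into the output allocation
tmpOut.copyTo(outputBitmap);       // copy the result back to the bitmap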

This example creates a RenderScript context and a Blur intrinsic. It then uses the intrinsic to perform a Gaussian blur with a 25-pixel radius on the allocation. The default implementation of blur uses carefully hand-tuned assembly code, but on some hardware it will instead use hand-tuned GPU code.
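
For readers curious what a Gaussian blur with a given radius computes, the following is a rough plain-Java sketch of the 1D case: it builds normalized Gaussian weights and applies them to one row of single-channel samples. It is a naive reference, not the hand-tuned code described above. The names (GaussianKernelSketch, weights, blurRow) are hypothetical, and sigma is left as a free parameter because this post does not say how the intrinsic maps radius to sigma.

```java
// Naive reference sketch of a 1D Gaussian blur. Not RenderScript code:
// names are hypothetical and sigma is an assumed free parameter.
final class GaussianKernelSketch {

    // Normalized Gaussian weights for taps -radius..radius.
    static float[] weights(int radius, float sigma) {
        float[] w = new float[2 * radius + 1];
        float sum = 0f;
        for (int i = -radius; i <= radius; i++) {
            w[i + radius] = (float) Math.exp(-(i * i) / (2.0 * sigma * sigma));
            sum += w[i + radius];
        }
        for (int i = 0; i < w.length; i++) {
            w[i] /= sum; // make the weights sum to 1 so brightness is preserved
        }
        return w;
    }

    // Blur one row of samples, clamping reads at the row edges.
    static float[] blurRow(float[] row, float[] w) {
        int radius = w.length / 2;
        float[] out = new float[row.length];
        for (int x = 0; x < row.length; x++) {
            float acc = 0f;
            for (int k = -radius; k <= radius; k++) {
                int src = Math.max(0, Math.min(row.length - 1, x + k));
                acc += row[src] * w[k + radius];
            }
            out[x] = acc;
        }
        return out;
    }
}
```

A 2D blur can be built by running blurRow over every row and then over every column. With a 25-pixel radius each output sample reads 51 inputs per pass, which gives a sense of why a tuned implementation matters.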

What do developers get from the tuning that we’ve done? On the new Nexus 7, running that same 25-pixel radius Gaussian blur on a 1.6 megapixel image takes about 176ms. A simpler intrinsic like the color matrix operation takes under 4ms. The intrinsics are typically 2-3x faster than a multithreaded C implementation and often 10x+ faster than a Java implementation. Pretty good for eight lines of code.

Figure 1. Performance gains with RenderScript intrinsics, relative to equivalent multithreaded C implementations.

Applications that need additional functionality can mix these intrinsics with their own RenderScript kernels. An example of this would be an application that is taking camera preview data, converting it from YUV to RGB, adding a vignette effect, and uploading the final image to a SurfaceView for display.
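
The two processing steps in that pipeline can be sketched per pixel in plain Java. This is only an illustration of the data flow, not RenderScript kernel code. The names (PreviewPipelineSketch, yuvToRgb, vignetteFactor, processPixel), the full-range BT.601 coefficients, and the quadratic vignette falloff are all assumptions; a real application would pick the conversion that matches its camera format.

```java
// Per-pixel sketch of "YUV to RGB, then vignette". Not RenderScript code:
// names, coefficients, and the falloff curve are assumptions.
final class PreviewPipelineSketch {

    // Round a float and clamp it to the 0..255 range of an 8-bit channel.
    static int clamp(float v) {
        return Math.max(0, Math.min(255, Math.round(v)));
    }

    // Step 1: convert one YUV sample (each 0..255) to {r, g, b}
    // using full-range BT.601 coefficients (assumed).
    static int[] yuvToRgb(int y, int u, int v) {
        float cb = u - 128f;
        float cr = v - 128f;
        return new int[] {
            clamp(y + 1.402f * cr),
            clamp(y - 0.344136f * cb - 0.714136f * cr),
            clamp(y + 1.772f * cb)
        };
    }

    // Step 2: brightness factor for pixel (x, y): 1.0 at the image
    // center, falling to 1 - strength at the corners.
    static float vignetteFactor(int x, int y, int width, int height, float strength) {
        float cx = (width - 1) / 2f;
        float cy = (height - 1) / 2f;
        float dx = x - cx;
        float dy = y - cy;
        float maxDist2 = cx * cx + cy * cy;
        float ratio = maxDist2 == 0f ? 0f : (dx * dx + dy * dy) / maxDist2;
        return 1f - strength * ratio;
    }

    // Both steps chained for a single pixel.
    static int[] processPixel(int yv, int u, int v, int x, int y,
                              int width, int height, float strength) {
        int[] rgb = yuvToRgb(yv, u, v);
        float f = vignetteFactor(x, y, width, height, strength);
        return new int[] { clamp(rgb[0] * f), clamp(rgb[1] * f), clamp(rgb[2] * f) };
    }
}
```

In the scenario described above, the first step is the kind of work an intrinsic covers, while the vignette is the kind of effect an application would write as its own kernel.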

In this example, we’ve got a stream of data flowing between a source device (the camera) and an output device (the display) with a number of possible processors along the way. Today, these operations can all run on the CPU, but as architectures become more advanced, using other processors becomes possible.

For example, the vignette operation can happen on a compute-capable GPU (like the ARM Mali T604 in the Nexus 10), while the YUV to RGB conversion could happen directly on the camera’s image signal processor (ISP). Using these different processors could significantly improve power consumption and performance. As more of these processors become available, future Android updates will enable RenderScript to run on them, and applications written for RenderScript today will begin to make use of those processors transparently, without any additional work for developers.

Intrinsics provide developers a powerful tool they can leverage with minimal effort to achieve great performance across a wide variety of hardware. They can be mixed and matched with general-purpose developer code, allowing great flexibility in application design. So next time you have performance issues with image manipulation, I hope you give them a look to see if they can help.