Comments on My life with Android :-): RenderScript in Android - the benchmark program
Blog author: Gabor Paller
Feed updated: 2023-08-10T13:35:15.093+02:00

2016-06-29T08:09:24.555+02:00
Hello sir, my name is Candra. I'm about to study mobile programming. I have to make a speech-to-text app that implements the dynamic time warping algorithm. Could you please give me some advice about what I have to do first, sir?
If so, here's my email, sir: candra07dinata06@gmail.com
Thank you, sir.
Anonymous

2014-01-28T02:22:00.593+01:00
Indeed, the optimizations provided for the Java side are basic. Marking the method static will help slightly, but it must also be noted that running the method once without any warm-up, as the current execution environment does, is essentially a bad micro-benchmark. Things approach or best 70% (66-72% improvement) with a warm-up before grabbing a result of the Java test. I haven't tested whether loop unrolling and removing the Math. methods will help, but it should be noted that doing so would likely slow down a one-time / single execution of the calculation method.

As far as JNI is concerned, one actually doesn't need to fiddle with the glue code. Using a build tool like jnigen, a standalone effort that is part of libgdx, one can easily write native methods inline in the Java code and automatically generate all platform resources, not just those for Android.
This solution trumps any serial Renderscript option since it's cross-platform.

I posit that a parallel CPU Java version will likely be twice as fast as the current Renderscript version, and a parallel OpenCL version upwards of 5-10 times faster than Renderscript.
MichaelEGR

2014-01-27T21:42:05.421+01:00
Thanks for the contribution, Michael. I re-evaluated the benchmark results; here are the new results, along with the updated benchmark program (http://mylifewithandroid.blogspot.hu/2014/01/renderscript-in-android-java.html). In short: RenderScript is still 2.3 times faster.

With regard to the NDK: there's a long-running discussion between the GCC and Clang communities about which compiler generates faster code. The reality is that the two compilers are head-to-head when it comes to the efficiency of the generated code. Clang, however, offers much faster compilation times. An NDK implementation is about as fast as RenderScript, but it is much faster to write a RenderScript function. The reasons: Clang/LLVM is better integrated into the SDK toolchain (one reason is that Clang compiles so quickly), and the SDK automatically generates all the glue code, while in the case of the NDK you have to fiddle with all the complexities of JNI.

Parallelization is another issue; I will return to that later.
Gabor Paller

2014-01-27T13:28:29.901+01:00
Thanks for the follow up blog post.

Unfortunately, like I said, extraordinary claims need extraordinary evidence, and you didn't deliver with obviously unoptimized Java code. With a few minor tweaks, without even getting really fancy, the core Java method performing the calculation can be reduced to ~27.5% of the execution time of your original code (3-4 times faster).

I'm not sure if you intentionally tried to make the Java code slow, but you raised some serious doubts by commenting out the System.arraycopy and putting in a for loop to copy data between two arrays, mind you, once again inside of an existing loop! Also, the completely unnecessary "signalCost" method, which is just a Math.abs() call, is called 3 times (twice in a nested for loop). You should know that calling methods in tight loops is extremely bad form for performance.

Here is the modified method:

http://pastebin.com/mfmHKkCb

Since you mention the Renderscript code having a 4-5 times improvement over your poorly written Java example, I posit that the ~8-10% improvement is simply not worth the loss of cross-platform compatibility and headache-free maintenance. The code is shorter and clearer, and doesn't require developers to bother with Renderscript and multiple files. If one wanted to improve upon it, then evaluating the NDK / C++ for a native cross-platform solution is certainly worthwhile. There are ways of writing one's C++ code inline in Java code, and this method would be a perfect example for such a solution.

Of course, the parallel solution is the real avenue to explore. Multithreaded Java code is not going to be that far behind C++. Once again, given the data set to be calculated, one would implement a multithreaded parallel CPU version of the algorithm and, where available, an OpenCL version to greatly speed it up, choosing between them depending on the size of the data to process. Do this once and you have an algorithm that runs cross-platform everywhere.

I am curious if you care to revisit your assumptions, as otherwise it seems confirmation bias is in play, at least for the moment.
MichaelEGR
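
Two of the points raised in this thread, timing only after a warm-up and copying arrays with System.arraycopy rather than an element-by-element loop, can be illustrated with a minimal Java sketch. The class name `WarmupSketch`, the methods `workload` and `averageNanos`, and the array contents are hypothetical and are not taken from the benchmark program under discussion; the sum of absolute differences is only a stand-in for the real calculation.

```java
public class WarmupSketch {
    // Accumulating results here makes it harder for the JIT to treat the
    // workload as dead code and skip it.
    static volatile long sink;

    // Hypothetical stand-in workload: bulk-copies a into scratch with
    // System.arraycopy, then sums |scratch[i] - b[i]| with Math.abs called
    // directly rather than through a separate helper method.
    public static long workload(int[] a, int[] b, int[] scratch) {
        System.arraycopy(a, 0, scratch, 0, a.length);
        long sum = 0;
        for (int i = 0; i < scratch.length; i++) {
            sum += Math.abs(scratch[i] - b[i]);
        }
        return sum;
    }

    // Runs the workload warmupRuns times untimed, then returns the average
    // nanoseconds per run over timedRuns timed runs (timedRuns must be > 0).
    public static long averageNanos(int warmupRuns, int timedRuns, int[] a, int[] b) {
        int[] scratch = new int[a.length];
        for (int i = 0; i < warmupRuns; i++) {
            sink += workload(a, b, scratch);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            sink += workload(a, b, scratch);
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        int[] a = new int[4096];
        int[] b = new int[4096];
        for (int i = 0; i < a.length; i++) {
            a[i] = i % 97;          // arbitrary deterministic test data
            b[i] = (i * 31) % 89;
        }
        // A single cold run versus an average taken after a warm-up phase.
        System.out.println("cold run ns:   " + averageNanos(0, 1, a, b));
        System.out.println("warmed run ns: " + averageNanos(10_000, 1_000, a, b));
    }
}
```

The absolute numbers depend on the device and runtime, so the sketch prints timings without claiming any particular ratio. For serious measurements a dedicated harness would be a better choice than hand-rolled timing loops.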