
Posted by Santiago Aboy Solanes – Software Engineer, Vladimír Marko – Software Engineer
The Android Runtime (ART) team has reduced compile time by 18% without compromising the compiled code or introducing any peak memory regressions. This improvement was part of our 2025 initiative to improve compile time without sacrificing memory usage or the quality of the compiled code.
Optimizing compile-time speed is crucial for ART. For example, when just-in-time (JIT) compiling, it directly impacts the efficiency of applications and overall device performance. Faster compilations reduce the time before the optimizations kick in, leading to a smoother and more responsive user experience. Additionally, for both JIT and ahead-of-time (AOT) compilation, improvements in compile-time speed translate to reduced resource consumption during the compilation process, benefiting battery life and device thermals, especially on lower-end devices.
Some of these compile-time speed improvements launched in the June 2025 Android release, and the rest will be available in the end-of-year Android release. Additionally, all Android users on versions 12 and above are eligible to receive these improvements through mainline updates.
Optimizing a compiler is always a game of trade-offs. You can't just get speed for free; you have to give something up. We set a very clear and challenging goal for ourselves: make the compiler faster, but do it without introducing memory regressions and, crucially, without degrading the quality of the code it produces. If the compiler is faster but the apps run slower, we have failed.
The only resource we were willing to spend was our own development time to dig deep, investigate, and find clever solutions that met these strict criteria. Let's take a closer look at how we work to find areas to improve, as well as how we find the right solutions to the various problems.
Finding worthwhile potential optimizations
Before you can begin to optimize a metric, you have to be able to measure it. Otherwise, you can never be sure whether you improved it or not. Luckily for us, compile-time speed is fairly consistent as long as you take some precautions, like using the same device for measurements before and after a change, and making sure you don't thermally throttle the device. On top of that, we also have deterministic measurements like compiler statistics that help us understand what is going on under the hood.
Since the resource we were sacrificing for these improvements was our own development time, we wanted to be able to iterate as fast as we could. This meant that we grabbed a handful of representative apps (a mix of first-party apps, third-party apps, and the Android operating system itself) to prototype solutions. Later, we verified that the final implementation was worthwhile with both manual and automated testing across a much wider set of apps.
With that set of hand-picked APKs, we could trigger a manual compile locally, get a profile of the compilation, and use pprof to visualize where we were spending our time.
Example of a profile's flame graph in pprof
The pprof tool is very powerful and allows us to slice, filter, and sort the data to see, for example, which compiler passes or methods are taking the most time. We won't go into detail about pprof itself; just know that the bigger the bar, the more compilation time it represents.
One of these views is the "bottom up" view, where you can see which methods are taking the most time. In the image below we can see a method called Kill accounting for over 1% of the compile time. Some of the other top methods will also be discussed later in this blog post.
Bottom up view of a profile
In our optimizing compiler there is a pass called Global Value Numbering (GVN). You don't have to worry about what it does as a whole; the relevant part is that it has a method called `Kill` that deletes some nodes according to a filter. This is time consuming, since it has to iterate through all the nodes and check them one by one. We noticed that there are some cases in which we know in advance that the check will be false, regardless of which nodes are alive at that point. In those cases we can skip iterating altogether, bringing the method from 1.023% of compile time down to ~0.3% and improving GVN's runtime by ~15%.
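To make the shape of that fix concrete, here is a minimal sketch with a cheap early exit in front of the per-node loop. `Node`, `ValueSet`, and the `side_effects` bitmask are simplified stand-ins invented for this post, not ART's actual GVN code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for ART's GVN value set; not the real code.
struct Node {
  uint64_t side_effects;  // bitmask of effects that invalidate this node
};

class ValueSet {
 public:
  // Removes every node whose cached value is invalidated by `side_effects`.
  void Kill(uint64_t side_effects) {
    // Fast path: if the filter can never match, the per-node check below
    // would be false for every live node, so skip the iteration entirely.
    if (side_effects == 0u) {
      return;
    }
    // Slow path: check each live node against the filter.
    nodes_.erase(std::remove_if(nodes_.begin(), nodes_.end(),
                                [side_effects](const Node& n) {
                                  return (n.side_effects & side_effects) != 0u;
                                }),
                 nodes_.end());
  }

 private:
  std::vector<Node> nodes_;
};
```

The per-node check still runs when it has to; the win comes from recognizing, before the loop even starts, the inputs for which every check is guaranteed to be false.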
Implementing worthwhile optimizations
We covered how to measure and how to detect where the time is being spent, but that is only the beginning. The next step is to optimize the time being spent compiling.
Usually, in a case like the `Kill` one above, we would take a look at how we iterate through the nodes and try to do it faster by, for example, doing things in parallel or improving the algorithm itself. In fact, that is what we tried at first, and only when we couldn't find anything did we have a "Wait a minute…" moment and see that the solution was to (in some cases) not iterate at all! When doing these kinds of optimizations, it's easy to miss the forest for the trees.
In other cases, we used a handful of different techniques, including:
using heuristics to decide whether an optimization will fail to produce worthwhile results and can therefore be skipped
using extra data structures to cache computed data
changing the existing data structures to get a speed boost
lazily computing results to avoid cycles in some cases
using the right abstraction – unnecessary features can slow down the code
avoiding chasing a frequently used pointer through many loads (see the sketch after this list)
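To illustrate that last point, here is a hypothetical before/after; every type and function name in it is invented for this example:

```cpp
#include <vector>

// Hypothetical types, invented purely for this illustration.
struct Allocator {};
struct Block {};
struct Graph {
  std::vector<Block*> blocks;
  Allocator* allocator;
  Allocator* GetAllocator() const { return allocator; }
};

void AnalyzeBlock(Block* block, Allocator* alloc);  // assumed to exist

// Before: calling graph->GetAllocator() inside the loop reloads the
// pointer on every iteration, because the compiler usually cannot prove
// that the loop body never writes through an aliasing pointer.
// After: hoist the load into a local once, outside the loop.
void ProcessBlocks(Graph* graph) {
  Allocator* alloc = graph->GetAllocator();  // single load
  for (Block* block : graph->blocks) {
    AnalyzeBlock(block, alloc);
  }
}
```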
How do we know if the optimizations are worth pursuing?
That's the neat part: you don't. After detecting that an area is consuming a lot of compile time and devoting development time to try to improve it, sometimes you just can't find a solution. Maybe there is nothing to do, or it would take too long to implement, regress another metric significantly, increase code base complexity, and so on. For every successful optimization you see in this blog post, know that there are many others that just didn't come to fruition.
If you are in a similar situation, try to estimate how much you will improve the metric by doing as little work as you can. This means, in order:
Estimating with metrics you have already collected, or just a gut feeling
Estimating with a quick and dirty prototype
Implementing a solution.
Don't forget to also estimate the drawbacks of your solution. For example, if you are going to rely on extra data structures, how much memory are you willing to use?
Without further ado, let's take a look at some of the changes we implemented.
We implemented a change to optimize a method called FindReferenceInfoOf. This method was doing a linear search of a vector to find an entry. We updated that data structure to be indexed by the instruction's id so that FindReferenceInfoOf would be O(1) instead of O(n). We also pre-allocated the vector to avoid resizing. We slightly increased memory, since we had to add an extra field that counted how many entries we inserted into the vector, but it was a small sacrifice to make, as peak memory didn't increase. This sped up our LoadStoreAnalysis pass by 34-66%, which in turn gives a ~0.5-1.8% compile time improvement.
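A minimal sketch of the shape of that change; `ReferenceInfo` and the surrounding class are invented for illustration:

```cpp
#include <cstddef>
#include <vector>

struct ReferenceInfo;  // hypothetical payload type

class LoadStoreAnalysisSketch {
 public:
  // Pre-allocate one slot per instruction so the vector never resizes.
  explicit LoadStoreAnalysisSketch(size_t num_instructions)
      : infos_(num_instructions, nullptr) {}

  // Before: a linear scan over a vector of entries, O(n).
  // After: the vector is indexed by instruction id, so lookup is O(1).
  ReferenceInfo* FindReferenceInfoOf(size_t instruction_id) const {
    return infos_[instruction_id];
  }

  void Insert(size_t instruction_id, ReferenceInfo* info) {
    if (infos_[instruction_id] == nullptr) {
      ++num_entries_;  // the extra counting field mentioned above
    }
    infos_[instruction_id] = info;
  }

 private:
  std::vector<ReferenceInfo*> infos_;
  size_t num_entries_ = 0;  // small memory cost; peak memory was unchanged
};
```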
We have a custom implementation of HashSet that we use in several places. Creating this data structure was taking a considerable amount of time, and we found out why. A few years ago, this data structure was used in only a few places that needed very big HashSets, and it was tweaked and optimized for that. Nowadays, however, it is mostly used the opposite way, with only a few entries and a short lifespan. This meant that we were wasting cycles creating a huge HashSet only to fill it with a few entries and discard it. With this change, compile time improved by ~1.3-2%. As an added bonus, memory usage decreased by ~0.5-1%, since we were no longer using such big data structures.
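The sketch below only illustrates the kind of sizing mismatch described above; the names and numbers are invented, not ART's real HashSet:

```cpp
#include <cstddef>
#include <memory>

template <typename T>
class HashSet {
 public:
  // Before: a default tuned years ago for a few very large sets, which
  // made every short-lived set pay for a big up-front allocation:
  //   static constexpr size_t kDefaultNumBuckets = 1024;
  //
  // After: start small by default; the rare large users can still pass
  // a bigger initial size explicitly.
  static constexpr size_t kDefaultNumBuckets = 8;

  explicit HashSet(size_t num_buckets = kDefaultNumBuckets)
      : num_buckets_(num_buckets),
        buckets_(std::make_unique<T[]>(num_buckets)) {}

 private:
  size_t num_buckets_;
  std::unique_ptr<T[]> buckets_;  // grown later only if actually needed
};
```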
We improved compile time by ~0.5-1% by passing data structures by reference to lambdas to avoid copying them around. This was something that was missed in the original review and sat in our codebase for years. It was thanks to looking at the profiles in pprof that we noticed these methods were creating and destroying a lot of data structures, which led us to investigate and optimize them.
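This is the classic capture-by-value pitfall; a minimal sketch with invented names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

void Use(const std::string& s);  // assumed to exist elsewhere

using Environment = std::vector<std::string>;  // stand-in for a heavy structure

// Before: capturing `env` by value copies the whole vector when the
// lambda is created and destroys the copy afterwards:
//   auto visit = [env](size_t i) { Use(env[i]); };
//
// After: capture by reference; no copy, no destruction.
void ForEachEntry(const Environment& env) {
  auto visit = [&env](size_t i) { Use(env[i]); };
  for (size_t i = 0; i < env.size(); ++i) {
    visit(i);
  }
}
```

Capturing by reference is only safe because the lambda never outlives `env`, which is exactly the kind of detail that is easy to miss in review.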
We sped up the pass that writes the compiled output by caching computed values, which translated to a ~1.3-2.8% total compile time improvement. Unfortunately, the extra bookkeeping was too much, and our automated testing alerted us to the memory regression. Later, we took a second look at the same code and implemented a new version which not only took care of the memory regression but also improved compile time by an extra ~0.5-1.8%! In this second change we had to refactor and reimagine how the pass should work in order to get rid of one of the two data structures.
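As a hedged illustration of the first version (the cached quantity and all names are invented, and as noted above the real pass was later restructured to drop one of its two data structures):

```cpp
#include <cstdint>
#include <unordered_map>

class OutputWriterSketch {
 public:
  uint32_t GetEncodedSize(uint32_t method_id) {
    // Return the cached value when we have already computed it.
    auto it = cache_.find(method_id);
    if (it != cache_.end()) {
      return it->second;
    }
    // Otherwise compute it once and remember it. This bookkeeping is the
    // part that initially regressed memory.
    uint32_t size = ComputeEncodedSize(method_id);  // expensive
    cache_.emplace(method_id, size);
    return size;
  }

 private:
  uint32_t ComputeEncodedSize(uint32_t method_id);  // assumed to exist
  std::unordered_map<uint32_t, uint32_t> cache_;
};
```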
We have a pass in our optimizing compiler that inlines function calls in order to get better performance. To choose which methods to inline, we use both heuristics before we do any computation, and final checks after doing the work but right before we finalize the inlining. If either of those detects that the inlining isn't worth it (for example, too many new instructions would be added), then we don't inline the method call.
We moved two checks from the "final checks" category to the "heuristics" category to estimate whether an inlining will succeed before we do any time-expensive computation. Since this is an estimate, it isn't perfect, but we verified that our new heuristics cover 99.9% of what was inlined before without affecting performance. One of these new heuristics was about the needed DEX registers (~0.2-1.3% improvement), and the other about the number of instructions (~2% improvement).
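A sketch of the overall decision flow; every name, limit, and helper below is an illustrative stand-in, not ART's actual inliner API:

```cpp
#include <cstddef>

struct MethodRef {};
struct CalleeGraph {};

constexpr size_t kMaxDexRegisters = 32;  // invented limit
constexpr size_t kMaxInstructions = 64;  // invented limit

size_t EstimatedDexRegisters(const MethodRef&);      // cheap estimates,
size_t EstimatedInstructionCount(const MethodRef&);  // assumed to exist
CalleeGraph BuildGraph(const MethodRef&);            // time-expensive
bool PassesFinalChecks(const CalleeGraph&);
void Inline(const CalleeGraph&);

bool TryInline(const MethodRef& callee) {
  // Heuristics: run before any expensive work. The two checks below used
  // to be final checks; moving them here skips building the callee's
  // graph when inlining would almost certainly be rejected anyway.
  if (EstimatedDexRegisters(callee) > kMaxDexRegisters) return false;
  if (EstimatedInstructionCount(callee) > kMaxInstructions) return false;

  CalleeGraph graph = BuildGraph(callee);  // the expensive part

  // Final checks: run on the real, fully built graph.
  if (!PassesFinalChecks(graph)) return false;

  Inline(graph);
  return true;
}
```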
We have a custom implementation of a BitVector that we use in several places. We replaced the resizable BitVector class with a simpler BitVectorView for certain fixed-size bit vectors. This eliminates some indirections and run-time range checks, and speeds up the construction of the bit vector objects.
Additionally, the BitVectorView class was templatized on the underlying storage type (instead of always using uint32_t like the old BitVector). This allows some operations, for example Union(), to process twice as many bits at a time on 64-bit platforms. The samples of the affected functions were reduced by more than 1% in total when compiling the Android OS. This was done across several changes [1, 2, 3, 4, 5, 6]
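A simplified view of what such a templatized, fixed-size view might look like; `BitVectorViewSketch` is an invented name and the real class differs in its details:

```cpp
#include <cstddef>
#include <cstdint>

template <typename WordT>
class BitVectorViewSketch {
 public:
  BitVectorViewSketch(WordT* storage, size_t num_words)
      : storage_(storage), num_words_(num_words) {}

  // Union processes one storage word per iteration, so instantiating the
  // view with WordT = uint64_t covers twice as many bits per iteration
  // as the old uint32_t-only implementation.
  void Union(const BitVectorViewSketch& other) {
    for (size_t i = 0; i < num_words_; ++i) {
      storage_[i] |= other.storage_[i];
    }
  }

 private:
  WordT* storage_;    // fixed size: no resizing, no run-time range checks
  size_t num_words_;
};
```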
If we talked in detail about all the optimizations, we would be here all day! If you are interested in some more of them, take a look at some of the other changes we implemented:
Our commitment to improving ART's compile-time speed has yielded significant improvements, making Android more fluid and efficient while also contributing to better battery life and device thermals. By diligently identifying and implementing optimizations, we have demonstrated that substantial compile-time gains are possible without compromising memory usage or code quality.
Our journey involved profiling with tools like pprof, a willingness to iterate, and sometimes even abandoning less fruitful avenues. The collective efforts of the ART team have not only reduced compile time by a noteworthy percentage, but have also laid the groundwork for future advancements.
All of these improvements will be available in the 2025 end-of-year Android update, and for Android 12 and above through mainline updates. We hope this deep dive into our optimization process provides valuable insights into the complexities and rewards of compiler engineering!


