Preventing mobile performance regressions with Maestro

Mobile performance vitals

In browsers there is already an industry-standard set of metrics for measuring performance – the Core Web Vitals – and while they are by no means perfect, they focus on the actual impact on the user experience. We wanted something similar for apps, so we adopted App Render Complete and Navigation Total Blocking Time as our two most important metrics.

  • App Render Complete (ARC) is the time from cold booting the app for an authenticated user to it being fully loaded and interactive, roughly equivalent to Time To Interactive in the browser.

  • Navigation Total Blocking Time (NTBT) is the time the application is blocked from processing code during the 2-second window after a navigation. It’s a proxy for overall responsiveness in lieu of something better, like Interaction to Next Paint.
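To make the definition concrete, here is a simplified sketch of how a Total-Blocking-Time-style metric can be computed for that 2-second window. The `LongTask` shape and the helper are illustrative, not our actual implementation; only the portion of each task beyond the 50 ms blocking threshold counts.

```typescript
// Illustrative only: compute an NTBT-style value from long-task entries
// recorded during the 2 seconds following a navigation.
interface LongTask {
  startTime: number; // ms, on the same clock as navigationStart
  duration: number;  // ms
}

const BLOCKING_THRESHOLD_MS = 50;
const WINDOW_MS = 2_000;

function navigationTotalBlockingTime(navigationStart: number, tasks: LongTask[]): number {
  const windowEnd = navigationStart + WINDOW_MS;
  return tasks
    // Only tasks that start inside the 2-second post-navigation window.
    .filter((t) => t.startTime >= navigationStart && t.startTime < windowEnd)
    // A task only counts as "blocking" for the part of its duration beyond 50 ms.
    .reduce((total, t) => total + Math.max(0, t.duration - BLOCKING_THRESHOLD_MS), 0);
}
```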

We still collect a slew of other metrics – such as render times, bundle sizes, network requests, frozen frames and memory usage – but they are indicators that tell us why something went wrong rather than how our users perceive our apps.

Their advantage over the more holistic ARC/NTBT metrics is that they are more granular and deterministic. For example, it’s much easier to reliably impact and detect that bundle size increased or that total bandwidth usage decreased, but such a change doesn’t automatically translate into a noticeable difference for our users.

Collecting metrics

In the end, what we care about is how our apps run on our users’ actual physical devices, but we also want to know how an app performs before we ship it. For this we leverage the Performance API (via react-native-performance) that we pipe to Sentry for Real User Monitoring, and in development this is supported out of the box by Rozenite.
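As a rough illustration of that wiring (the details in our apps differ), react-native-performance exposes the familiar mark/measure API, and an observer can forward finished measures to whatever Real User Monitoring backend you use. The `reportMeasureToSentry` helper below is a hypothetical stand-in for the actual Sentry call.

```typescript
import performance, { PerformanceObserver } from 'react-native-performance';

// Hypothetical stand-in for however measurements are reported to Sentry.
declare function reportMeasureToSentry(name: string, durationMs: number): void;

// Forward every finished measure to Real User Monitoring.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    reportMeasureToSentry(entry.name, entry.duration);
  }
});
observer.observe({ entryTypes: ['measure'] });

// Mark the interesting points of a flow...
performance.mark('appRenderStart');
// ...and later, once the screen is fully loaded and interactive:
performance.mark('appRenderComplete');
performance.measure('app_render_complete', 'appRenderStart', 'appRenderComplete');
```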

But we also wanted a reliable way to benchmark and compare two different builds to know whether our optimizations move the needle or new features regress performance. Since Maestro was already used for our end-to-end test suite, we simply extended it to also collect performance benchmarks in certain key flows.
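Conceptually, the benchmark harness is little more than running the same Maestro flow repeatedly and gathering the metrics each run produces. A minimal sketch is below; the flow name, the metrics file path and its JSON shape are made up for illustration, not part of Maestro itself.

```typescript
import { execFileSync } from 'node:child_process';
import { readFileSync } from 'node:fs';

// Run one Maestro flow, then read back the metrics the app wrote during the run.
function runBenchmarkOnce(flowFile: string, metricsPath: string): Record<string, number> {
  execFileSync('maestro', ['test', flowFile], { stdio: 'inherit' });
  return JSON.parse(readFileSync(metricsPath, 'utf8'));
}

// Repeat the flow to build up a sample per metric for later statistics.
function collectSamples(flowFile: string, metricsPath: string, runs: number): Record<string, number[]> {
  const samples: Record<string, number[]> = {};
  for (let i = 0; i < runs; i++) {
    for (const [name, value] of Object.entries(runBenchmarkOnce(flowFile, metricsPath))) {
      (samples[name] ??= []).push(value);
    }
  }
  return samples;
}
```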

To adjust for flukes we ran the same flow many times on different devices in our CI and calculated statistical significance for each metric. We were now able to compare each Pull Request to our main branch and see how it fared performance-wise. Surely, performance regressions were a thing of the past.
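The significance check itself can be as simple as a two-sample test per metric. Here is a sketch using Welch's t-statistic on the per-run samples; our actual analysis may differ, but the idea is the same.

```typescript
// Welch's t-statistic for comparing the same metric between two builds.
// A |t| comfortably above ~2, with a reasonable number of runs per side,
// suggests a real difference rather than run-to-run noise.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
}

function welchT(baseline: number[], candidate: number[]): number {
  const standardError = Math.sqrt(
    sampleVariance(baseline) / baseline.length + sampleVariance(candidate) / candidate.length,
  );
  return (mean(candidate) - mean(baseline)) / standardError;
}
```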

Reality check

In practice, this didn’t have the outcomes we had hoped for, for a few reasons. First, we saw that the automated benchmarks were mainly used when developers wanted validation that their optimizations had an effect – which in itself is important and highly valuable – but this was typically after we had seen a regression in Real User Monitoring, not before.

To address this we started running benchmarks between release branches to see how they fared. While this did catch regressions, they were typically hard to address as there was a full week of changes to go through – something our release managers simply weren’t able to do in every instance. Even if they found the cause, simply reverting often wasn’t a possibility.

On top of that, the App Render Complete metric was network-dependent and non-deterministic, so if the servers were under extra load that hour or a feature flag happened to be turned on, it would affect the benchmarks even though the code didn’t change, invalidating the statistical significance calculation.

Precision, specificity and variance

We had to go back to the drawing board and reconsider our strategy. We had three major challenges:

  1. Precision: Even if we could detect that a regression had occurred, it was not clear to us what change caused it.

  2. Specificity: We wanted to detect regressions caused by changes to our mobile codebase. While catching user-impacting regressions in production is crucial whatever their cause, the opposite is true for pre-production, where we want to isolate our own changes as much as possible.

  3. Variance: For reasons mentioned above, our benchmarks simply weren’t stable enough between each run to confidently say that one build was faster than another.

The solution to the precision problem was simple: run the benchmarks for every merge, so that we could see on a time-series graph exactly when things changed. This was mainly an infrastructure problem, but thanks to optimized pipelines, build processes and caching we were able to cut the total time from merge to benchmarks being ready down to about 8 minutes.

When it comes to specificity, we needed to cut out as many confounding factors as possible, with the backend being the main one. To achieve this we first record the network traffic and then replay it during the benchmarks, including API requests, feature flags and websocket data. Additionally, the runs were spread out across even more devices.
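A heavily simplified sketch of the replay idea for HTTP traffic is below. The recording format and lookup key are made up for illustration, and the real setup also has to cover feature flags and websocket data.

```typescript
// Serve previously recorded responses instead of hitting real backends, so
// benchmark runs are unaffected by server load or remote configuration changes.
interface RecordedResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type Recording = Record<string, RecordedResponse>; // keyed by "METHOD URL"

export function installReplayFetch(recording: Recording): void {
  globalThis.fetch = async (input: RequestInfo | URL, init?: RequestInit): Promise<Response> => {
    const url =
      typeof input === 'string' ? input : input instanceof URL ? input.href : input.url;
    const method = (init?.method ?? 'GET').toUpperCase();
    const recorded = recording[`${method} ${url}`];
    if (!recorded) {
      throw new Error(`No recorded response for ${method} ${url}`);
    }
    return new Response(recorded.body, { status: recorded.status, headers: recorded.headers });
  };
}
```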

Together, these changes also contributed to solving the variance problem, in part by reducing it, but also by increasing the sample size by orders of magnitude. Just like in production, a single sample never tells the whole story, but by looking at all of them over time it was easy to see trend shifts that we could attribute to a range of 1-5 commits.

Alerting

As mentioned above, simply having the metrics isn’t enough, as any regression needs to be actioned quickly, so we needed an automated way to alert us. At the same time, if we alerted too often or incorrectly due to the inherent variance, the alerts would simply be ignored.

After trialing more esoteric models like Bayesian online changepoint detection, we settled on a much simpler moving average: when a metric regresses by more than 10% for at least two consecutive runs, we fire an alert.
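The check itself fits in a few lines. The sketch below uses the 10% threshold and two-consecutive-run rule described above; the length of the moving-average window is illustrative.

```typescript
const REGRESSION_THRESHOLD = 1.1; // alert on a >10% regression
const CONSECUTIVE_RUNS = 2;
const WINDOW = 10; // illustrative moving-average window length

// For metrics where higher is worse: alert when each of the latest two runs is
// more than 10% above the moving average of the runs that preceded it.
function shouldAlert(history: number[]): boolean {
  if (history.length < WINDOW + CONSECUTIVE_RUNS) return false;
  return Array.from({ length: CONSECUTIVE_RUNS }, (_, i) => {
    const index = history.length - CONSECUTIVE_RUNS + i;
    const window = history.slice(index - WINDOW, index);
    const baseline = window.reduce((a, b) => a + b, 0) / window.length;
    return history[index] > baseline * REGRESSION_THRESHOLD;
  }).every(Boolean);
}
```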

Next steps

While detecting and fixing regressions before a release branch is cut is fantastic, the holy grail is to prevent them from getting merged in the first place.

What’s stopping us from doing this at the moment is twofold: on the one hand, running this for every commit in every branch requires even more capacity in our pipelines; on the other, we need enough statistical power to tell whether there was an effect or not.

The two are antagonistic: with the same budget to spend, running benchmarks for more commits means spreading each one across fewer devices and runs, which reduces statistical power.

The trick we intend to apply is to spend our resources smarter – since the expected effect varies, so can our sample size. Essentially, for changes with a big impact we can do fewer runs, and for changes with a smaller impact we do more runs.
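A standard power-analysis approximation captures the intuition: the number of runs needed grows with the square of how small an effect you want to detect. The sketch below assumes roughly 95% confidence and 80% power; the exact numbers are illustrative, not our production configuration.

```typescript
// Approximate runs needed per build to detect a given relative effect, given
// the metric's run-to-run coefficient of variation (relative standard deviation).
const Z_ALPHA = 1.96; // two-sided 95% confidence
const Z_BETA = 0.84;  // 80% power

function runsNeeded(relativeEffect: number, coefficientOfVariation: number): number {
  return Math.ceil(
    (2 * (Z_ALPHA + Z_BETA) ** 2 * coefficientOfVariation ** 2) / relativeEffect ** 2,
  );
}

// Detecting a 10% change in a metric with 15% run-to-run noise is cheap,
// while resolving a 2% change in the same metric is not.
runsNeeded(0.1, 0.15);  // ≈ 36 runs per build
runsNeeded(0.02, 0.15); // ≈ 882 runs per build
```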

Making mobile performance regressions observable and actionable

By combining Maestro-based benchmarks, tighter control over variance, and pragmatic alerting, we have moved performance regression detection from a reactive exercise to a systematic, near-real-time signal.

While there is still work to do to stop regressions before they are merged, this approach has already made performance a first-class, continuously monitored concern – helping us ship faster without getting slower.

Explore open engineering roles at Kraken

