How to improve tests for a better comparison of results

Test stability

Tests must be reproducible. Check that the GDSL code follows these good practices:

  • The end of page (or view) loading must be detected with a wait-type GDSL command: wait for an id, a text, or a description on the page. Do not use the GDSL “pause” command to consider that a page is loaded (see the sketch after this list).

  • Coordinate-based clicks (clickByXY) should be avoided whenever possible. If you have no choice but to use one, you need to verify the success of the click action with a wait-type GDSL command.

  • If/then conditions should be avoided whenever possible. If they are used, measurement steps included in the conditions must be compared carefully between analyses.
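To illustrate these practices, here is a minimal GDSL sketch. The exact command names and parameters (waitUntilId, waitUntilText, the timeouts in milliseconds, the element id “product_list”, the text “Order confirmed” and the coordinates) are indicative assumptions, not a definitive reference; check the GDSL documentation for the exact syntax available in your version.

Instead of assuming the page is loaded after a fixed delay:

    pause,5000

wait for an element that proves the page is actually displayed:

    waitUntilId,product_list,10000

If a coordinate click really cannot be avoided, control its success with a wait:

    clickByXY,540,960
    waitUntilText,Order confirmed,10000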

Consistency of the measurements

Measurements need to be verified by following this guide:

https://greenspector.atlassian.net/wiki/spaces/DOCUMENTATION/pages/40697893

If the back-end platform is unstable (which is common on development platforms), you can run many iterations so that at least some succeed; iterations with errors are ignored. However, we advise you to improve the stability of your environment to improve the testability of your application.

Your application can be “unstable” in terms of resource consumption, for example if it performs many network requests or CPU-intensive processing. In this case, we advise you to increase the number of iterations (>5) and not to deactivate any iterations. Above all, work on the sobriety of your solution: instability in the measurements reflects potential instability on the user's device.

If you want to run microbenchmarks (for instance, comparing frameworks or coding best practices), the differences can fall within the measurement's margin of error. We advise you to increase the number of iterations (>10).
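As a rule of thumb, if the iteration measurements are roughly independent and identically distributed, the margin of error of the mean shrinks with the square root of the number of iterations n:

    margin of error ≈ t × s / √n

where s is the standard deviation across iterations and t is the confidence coefficient. Going from 5 to 10 iterations therefore reduces the margin of error by about a factor of √2, i.e. roughly 30%.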

Campaign coherence

The measurements on a given version need to be coherent. This means that:

  • If you debugged the tests on a version, that version should not be used to analyze the results; create a dedicated version for the measurements.

  • Avoid launching campaigns too far apart in time

Reference

For comparisons between versions, the reference steps need to be as close as possible.

Measure with your own device

You must:

  • Prepare the device

https://greenspector.atlassian.net/wiki/spaces/DOCUMENTATION/pages/147456001

  • Check the stability of your device with the reference step. If the measurements of an iteration are too high, you can disable all steps related to that iteration.

Measure with a device on the Test Bench

The stability of the Test Bench device is under control. However, in some cases the reference may not be stable enough. For example, if you want to compare Ecoscores, the scores may be subject to threshold effects that cause them to fluctuate. This is a known limitation that we are investigating in our R&D (more thresholds will be added to the scoring). In the meantime, you can apply the following solutions:

  • If the measurements of an iteration are too high, you can disable the reference step

  • You can add more iterations