The library call omp_get_num_procs is used to obtain the number of hardware threads available to the current process. The number of workers is initially set to twice that and then successively reduced by a factor of two until only one worker is left. Since my phone has 8 available cores, the timings for each computational test are taken for 16 workers, 8 workers, 4, 2 and finally one worker. After that only the fastest is reported, along with how many workers were used for that timing.
I have no idea what the flex scheduling is doing, but if it is over-provisioning, would 2 workers run on a single core and 4 workers on just two cores, ignoring the other 7 and 6 cores respectively?
Given it didn't alter the temperature of the phone, and the phone can actually get really nice and toasty as a great hand warmer, the poor thing was just ticking over.
And how would you compile for and use the A76s when you have to be compatible with the A55?
I have no idea here.
Usually taking the number of workers equal to the number of available hardware threads is fastest. Sometimes over-provisioning is faster, though it's also possible that ignoring the little cores and running with fewer workers than the number of available hardware threads is faster.
Since your phone has 8 cores, it seems strange to me that only 2 workers would be optimal for the prime sieve.
I wonder what value omp_get_num_procs actually returns on your phone.
Statistics: Posted by ejolson — Mon Feb 26, 2024 3:03 am