Ordinary least squares regression has conditions under which it performs optimally. The sampling error should be normally distributed, although it can still work well for distributions that only vaguely approximate normal (e.g. I've never had issues with uniformly distributed errors). The variance of the error should also be constant (i.e. the same error distribution throughout the dataset), and a particular failure scenario is when the variance is proportional to the dependent variable. That is clearly what we have here: the sampling error is substantially higher when the execution time is higher.
In my experience with simple non-linear regression this scenario tends to produce fits that under-estimate super-linearity. It might be why some people report that the n^3 case looks more like an exponent of 2.something, and it could also swamp the subtle difference between linear and log-linear. Intuitively, errors in the largest measurements dominate the fit (because they get squared), which isn't ideal when we have lots of smaller datapoints to make use of too (notwithstanding that the very smallest datapoints are less useful for distinguishing super-linear functions asymptotically, while being the most useful for distinguishing the constant case from the logarithmic one).
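If you want to reproduce the scenario yourself, here is a minimal sketch (assuming numpy/scipy; the constants and seed are arbitrary) that fits y = c * n^k by unweighted least squares to synthetic cubic data whose noise scales with the true value. The exact estimate depends on the noise realisation, so treat it as a way to experiment rather than a proof of the bias:

```python
# Synthetic demo: heteroscedastic noise (sd proportional to the true value)
# on top of a cubic, fitted by ordinary (unweighted) least squares.
import numpy as np
from scipy.optimize import curve_fit

def power(n, c, k):
    return c * n ** k

rng = np.random.default_rng(0)
n = np.linspace(100, 5000, 200)
true = 1e-7 * n ** 3                      # "true" execution time, k = 3
y = true + rng.normal(scale=0.3 * true)   # noise grows with the value itself

(c_hat, k_hat), _ = curve_fit(power, n, y, p0=(1e-7, 2.0), maxfev=10000)
print(f"estimated exponent: {k_hat:.3f} (true value 3)")
```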
There are ways of properly accounting for non-constant variance, such as weighted least squares (Weighted least squares - Wikipedia), but ultimately we'd need to do some deeper analytical work to model the variance, and we'd then become more reliant on other assumptions such as the normal shape of the errors.
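For reference, a weighted fit on the same synthetic data might look like the sketch below. It assumes (as a working model) that each measurement's standard deviation is proportional to its value, and it cheats by building the weights from the known true values; in practice you'd have to estimate them, e.g. from the measurements or the fitted values, which is exactly the extra modelling work mentioned above:

```python
# Weighted variant of the previous demo: curve_fit's `sigma` argument
# down-weights the noisy large-n points (weights proportional to 1/sigma^2).
import numpy as np
from scipy.optimize import curve_fit

def power(n, c, k):
    return c * n ** k

rng = np.random.default_rng(0)
n = np.linspace(100, 5000, 200)
true = 1e-7 * n ** 3
y = true + rng.normal(scale=0.3 * true)

(c_w, k_w), _ = curve_fit(power, n, y, p0=(1e-7, 2.0),
                          sigma=0.3 * true,       # known only in this demo
                          absolute_sigma=False, maxfev=10000)
print(f"weighted estimate of the exponent: {k_w:.3f}")
```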
Here we don't know up-front how the sampling error is distributed. The example graphic in the problem statement shows a very extreme spike of outliers around n=3800, for instance (garbage collector kicked in?). I couldn't see that kind of error in the test cases, but nevertheless: given that we don't need to estimate constant factors accurately but only to fit one of a few pre-determined functions, and that, as @VizGhar noted, the non-constant cases appear to pass through the origin, I've chosen to make no further assumptions and picked what should be a very robust approach based on repeated median regression (Repeated median regression - Wikipedia).
Rather than calculating a linear slope for each pair of datapoints, I test which complexity function best matches each pair by scaling the 1st datapoint up to the 2nd's n under each candidate function and taking the minimum absolute difference from the observed execution time (this isn't commutative, so best_fit(a, b) is not necessarily the same as best_fit(b, a); I don't think that matters much, but it could be done a few other ways too). Because there is a natural ordering on the complexity functions, we can take medians over the results of these tests and effectively zero in on the median pair of datapoints, the ones with the least error that most accurately represent the complexity function.
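In rough Python, the pairwise vote plus the nested medians look like the sketch below. The candidate list, the exponential scaling constant, and the function names are illustrative choices, not the actual solution code:

```python
# Sketch of the pairwise best-fit vote and the repeated median over its results.
# `samples` is a list of (n, time) pairs.
import math
import statistics

CANDIDATES = [  # natural ordering: list index doubles as an ordinal rank
    ("constant",    lambda n: 1.0),
    ("log",         lambda n: math.log(n)),
    ("linear",      lambda n: float(n)),
    ("loglinear",   lambda n: n * math.log(n)),
    ("quadratic",   lambda n: float(n) ** 2),
    ("cubic",       lambda n: float(n) ** 3),
    ("exponential", lambda n: 2.0 ** (n / 100.0)),  # scaled to stay finite; illustrative
]

def best_fit(a, b):
    """Rank of the candidate that best predicts b's time by scaling a's time to b's n."""
    (na, ta), (nb, tb) = a, b
    errors = [abs(ta * f(nb) / f(na) - tb) for _, f in CANDIDATES]
    return errors.index(min(errors))

def classify(samples):
    # Repeated median over the ordinal ranks: for each point, the median vote
    # over all partners, then a final median over points.
    ranks = []
    for i, a in enumerate(samples):
        votes = [best_fit(a, b) for j, b in enumerate(samples) if j != i]
        ranks.append(statistics.median_low(votes))
    return CANDIDATES[statistics.median_low(ranks)][0]
```

Taking median_low keeps the result on the ordinal scale, so the output is always one of the named candidates rather than something in between.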
It appears to work well. In most cases there's unanimous agreement about the function at the top level even before the final median. By far the worst-performing test case is the final one, which reports like this before correctly picking Exponential in the final median:
SuperQuadratic, Cubic * 10, Exponential * 13,
Studying the datapoints for this case, it's clear that either the errors for small values are much larger than the underlying function value, or the assumption that the execution-time function passes through the origin is wrong (or both), and that causes those values to vote Cubic. I think the origin assumption could be relaxed by considering triplets of datapoints, calculating an intercept from the first 2 points and then testing against the 3rd (or something more exotic), but that would be slower (cubic vs quadratic in the number of points) and Blunder might relapse waiting for the answer. Still, I'm kinda curious what method was used to create the noise…
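For what it's worth, an untested sketch of that triplet idea (reusing the CANDIDATES list from the earlier sketch) could look like this: solve t = coeff * f(n) + intercept from two points, then score each candidate by how well it predicts the third. That drops the through-the-origin assumption at the cost of an O(m^3) loop over triplets instead of O(m^2) over pairs:

```python
def best_fit3(a, b, c):
    """Rank of the candidate whose two-point fit (slope + intercept) best predicts c."""
    (na, ta), (nb, tb), (nc, tc) = a, b, c
    errors = []
    for _, f in CANDIDATES:
        fa, fb = f(na), f(nb)
        if fa == fb:                       # degenerate (e.g. the constant candidate)
            coeff, intercept = 0.0, (ta + tb) / 2.0
        else:
            coeff = (tb - ta) / (fb - fa)
            intercept = ta - coeff * fa
        errors.append(abs(coeff * f(nc) + intercept - tc))
    return errors.index(min(errors))
```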