Blunder - Episode 3 - Puzzle discussion

I got 100%. This is the puzzle I felt the least confident about, so I just tried whatever and saw what happened.

In order to deal with the constant term from program initialization, I calculated the derivative of the input data, so the constant term can be ignored. Then I compared those derivatives against the derivatives of each ideal curve.

To compare the derivatives, I chose to divide one gradient by the other, which leads to errors with the O(1) curve because its gradient is zero. I hard-coded this case: if t hasn’t varied much, just output O(1).

Also, calculating the O(2^n) curve leads to overflow in all test cases except the last one, because n is large in the input. I added a check to ignore this curve when overflow happens.

I was happy that I did not need any special handling beyond that - no need to cut/remove data or detect outliers.
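In rough Python, the idea looks something like this (just a sketch, not my exact code - the candidate list, the flatness threshold and the scoring are illustrative):

```python
import math

# Candidate curves; comparing derivatives makes any constant offset irrelevant.
CANDIDATES = {
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: float(n) ** 2,
    "O(n^2 log n)": lambda n: float(n) ** 2 * math.log(n),
    "O(n^3)": lambda n: float(n) ** 3,
    "O(2^n)": lambda n: 2.0 ** n,  # overflows for large n
}

def guess_complexity(ns, ts, flat_threshold=0.1):
    # Hard-coded O(1) case: if t barely varies, its gradient is ~0 and the
    # ratio of gradients would be meaningless.
    if (max(ts) - min(ts)) < flat_threshold * max(ts):
        return "O(1)"

    # Discrete derivative of the measurements; the constant start-up term cancels.
    dts = [(ts[i + 1] - ts[i]) / (ns[i + 1] - ns[i]) for i in range(len(ns) - 1)]

    best, best_score = None, float("inf")
    for name, f in CANDIDATES.items():
        try:
            dfs = [(f(ns[i + 1]) - f(ns[i])) / (ns[i + 1] - ns[i])
                   for i in range(len(ns) - 1)]
        except OverflowError:
            continue  # ignore O(2^n) whenever it overflows
        # For the right curve, the measured/ideal derivative ratio is roughly constant.
        ratios = [dt / df for dt, df in zip(dts, dfs)]
        mean = sum(ratios) / len(ratios)
        score = sum((r / mean - 1.0) ** 2 for r in ratios)
        if score < best_score:
            best, best_score = name, score
    return best
```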

I saw “regression analysis” as a note to this problem, so I calculated fitted values for various functions (least squares method, as is usual in linear regression) and then the residual sum of squares (RSS). Then I chose the function with the least RSS. With this approach, I only have a problem with the “simple” example, where the answer should be O(1).

It seems that regression analysis alone is not appropriate for the first example (I can find other functions with a smaller RSS, which are therefore more optimal from this point of view). I solved this case separately with the ratio of the last and the first value (if it is small enough, it is an O(1) problem); otherwise I use regression to find the best function. Do you have some other way to recognize an O(1) problem?
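Roughly, the fit-and-compare step is something like this (a sketch only; the candidate list and `ratio_threshold` are illustrative):

```python
import math

CANDIDATES = {
    "O(log n)": lambda n: math.log(n),
    "O(n)": lambda n: float(n),
    "O(n log n)": lambda n: n * math.log(n),
    "O(n^2)": lambda n: float(n) ** 2,
    "O(n^3)": lambda n: float(n) ** 3,
    "O(2^n)": lambda n: 2.0 ** n,
}

def rss(ns, ts, f):
    # Least-squares fit of t ~ a * f(n): a = sum(f*t) / sum(f*f),
    # then the residual sum of squares of that fit.
    xs = [f(n) for n in ns]
    a = sum(x * t for x, t in zip(xs, ts)) / sum(x * x for x in xs)
    return sum((t - a * x) ** 2 for x, t in zip(xs, ts))

def pick(ns, ts, ratio_threshold=2.0):
    # Separate O(1) check: a small last-to-first ratio means "constant".
    if ts[-1] / ts[0] < ratio_threshold:
        return "O(1)"
    best, best_rss = None, float("inf")
    for name, f in CANDIDATES.items():
        try:
            r = rss(ns, ts, f)
        except OverflowError:  # 2^n blows up for large n
            continue
        if r < best_rss:
            best, best_rss = name, r
    return best
```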

1 Like

Damn, this puzzle messed with my head a lot, but I solved it anyway ^^

There’s a typo in the problem description: “Let’s determin the complexity of each algorithm and choose the most optimal one.”, where “determine” is misspelt.

1 Like

At first this task sounded easy. In the end it turned out that, if you do not know what you are doing, it is quite complicated. It took me 4 days to figure out that I needed to build a linear regression line for each complexity model from the given input data for N, compare it to the regression line of the given milliseconds, and choose the closest pair (milliseconds regression line; smallest error among the complexity models).
These videos helped me a lot: https://www.youtube.com/watch?v=ZkjP5RJLQF4&list=PLIeGtxpvyG-LoKUpV0fSY8BGKIMIdmfCi
Also I made several diagrams to help me visualize the problem: https://docs.google.com/spreadsheets/d/1ApMmM5T38RL5B0XR_pRsKWzQmdQqjIVC0ChFh5VSwkQ/edit?usp=sharing

This is an interesting find, and it matches the theory well. As n grows, the approximation by a constant multiple of the reference function fits better, because the fastest-growing component of the cost dominates more. In principle, trying to fit complexity measures to samples from small n is hoping for compliance in a region where it cannot be expected.

Nevertheless, my own current solution does exactly this, even over-prioritizing the smaller n…

It does not have to be bounded by k*n^2 for all n either, as long as there exists a finite n after which it is bounded for all larger n. (I suppose an equivalent criterion could be formulated that also adds a constant to make the function bounded for all n.)
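Spelled out, that criterion is just the usual definition of big-O (taking O(n^2) as the example):

```latex
f(n) \in O(n^2)
  \iff \exists\, k > 0,\ \exists\, n_0 \ \text{such that} \ f(n) \le k \, n^2 \ \text{for all } n \ge n_0
% equivalently, the variant with an added constant so the bound holds everywhere:
  \iff \exists\, k > 0,\ \exists\, c \ \text{such that} \ f(n) \le k \, n^2 + c \ \text{for all } n \ge 1
```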

No need for the “least squares” method… All functions basically start at [0,0] (except O(1)), and you can simply tell which complexity is used just by looking at the graph (average local growth):

  1. Compute the empirical orders of growth for a log-log plot (I didn’t quite get the Wikipedia article, but the first 3 minutes of this Coursera video helped a lot)
  2. Compute the average value of that growth
  3. TOP SECRET - observe what the averages give you :slight_smile:
  4. O(2^n) is the craziest one; it is resolved without averaging, since looking at the last entry of the order of growth is enough (see the sketch below)
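A tiny sketch of steps 1, 2 and 4 (assuming positive timings and increasing n; the thresholds are made up and merely stand in for the TOP SECRET part):

```python
import math

def growth_exponents(ns, ts):
    # Empirical order of growth between consecutive samples:
    # the slope of the log-log plot, b = log(t2/t1) / log(n2/n1).
    return [math.log(ts[i + 1] / ts[i]) / math.log(ns[i + 1] / ns[i])
            for i in range(len(ns) - 1)]

def classify(ns, ts):
    bs = growth_exponents(ns, ts)
    # O(2^n) is the odd one out: its log-log slope keeps growing with n,
    # so the last entry alone gives it away.
    if bs[-1] > 4:
        return "O(2^n)"
    avg = sum(bs) / len(bs)
    # Map the average slope to the closest candidate (illustrative thresholds).
    if avg < 0.5:
        return "O(1)" if avg < 0.05 else "O(log n)"
    if avg < 1.5:
        return "O(n)" if avg < 1.07 else "O(n log n)"
    if avg < 2.5:
        return "O(n^2)" if avg < 2.07 else "O(n^2 log n)"
    return "O(n^3)"
```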
4 Likes

Actually you just need linear least squares here (you fit a non-linear model, but it is linear in the unknown parameters). At worst you have a 2x2 matrix to invert; otherwise it is only matrix/vector multiplications.

The only issue with O(1) comes from overfitting noisy data with the additional regression parameter of the other complexity candidates. A penalty on the number of parameters does the trick.
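For illustration only (the `penalty` weight is an arbitrary choice): fit t ≈ a·g(n) + b by solving the 2x2 normal equations, then scale the RSS by the parameter count.

```python
def fit_rss_with_intercept(xs, ts):
    # Solve the 2x2 normal equations for the least-squares fit t ~ a*x + b.
    m = len(xs)
    sx, st = sum(xs), sum(ts)
    sxx = sum(x * x for x in xs)
    sxt = sum(x * t for x, t in zip(xs, ts))
    det = m * sxx - sx * sx
    a = (m * sxt - sx * st) / det
    b = (sxx * st - sx * sxt) / det
    return sum((t - (a * x + b)) ** 2 for x, t in zip(xs, ts))

def penalized_score(ns, ts, g, n_params, penalty=0.05):
    if g is None:                       # the O(1) candidate: fit only a constant
        mean = sum(ts) / len(ts)
        rss = sum((t - mean) ** 2 for t in ts)
    else:
        rss = fit_rss_with_intercept([g(n) for n in ns], ts)
    # Penalize extra parameters so noisy constant data is not "explained away"
    # by the slope term of a richer candidate.
    return rss * (1.0 + penalty * n_params)
```

The winning candidate is simply the one with the lowest penalized score.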

I spent a few hours trying to wrap my head around the least squares method without success, so I tried computing the averages of the empirical orders of growth like you suggested, and I was able to solve the problem 100% in about 5 mins and less than 20 lines.

1 Like

I took the approach of dividing the measured values by f(num) for each of the possible output functions and checking which one produced a dataset closest to a constant value, but I had problems with the n^3 validator. Removing outlier test values (those that were most different from the average) got me to 100%, but it required some tuning. Removing too many outliers would pass the n^3 validator but caused one of the others (n^2, IIRC) to fail.

Maybe the correct approach is to disregard all values outside some threshold of variance, like 1 standard deviation. Maybe the difference between n^2 and n^3 is really subtle compared to the random variation over the range of n values provided.
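Roughly what I mean, as a sketch (the `n_drop` count is exactly the part that needed tuning):

```python
def constancy_score(ns, ts, f, n_drop=3):
    # Divide each measurement by the candidate curve; for the right candidate
    # the ratios should be roughly constant.
    ratios = [t / f(n) for n, t in zip(ns, ts)]
    mean = sum(ratios) / len(ratios)
    # Drop the n_drop ratios that differ most from the average (the outliers).
    kept = sorted(ratios, key=lambda r: abs(r - mean))[:len(ratios) - n_drop]
    mean = sum(kept) / len(kept)
    var = sum((r - mean) ** 2 for r in kept) / len(kept)
    # Relative spread: lower means "closer to a constant value".
    return (var ** 0.5) / mean
```

The candidate function with the lowest score is the answer.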

I’m curious what response my program was giving originally when failing the n^3 validator. I’m betting it was n^2.

log-log is really powerful !!! Thanks for the idea, no need for regression, really elegant!
I was using only (x1, y1) and (xn, yn) to calculate the slope, but I could not pass the O(n^3) validator. Using the average slope instead solved the problem.
I guess the beginning or the end of the O(n^3) validator’s data has a lot of noise.

Thank you @VizGhar for the tip and the video !
It became so easy ^^

1 Like

Regular least squares regression has conditions under which it performs optimally. The sampling error should be normally distributed, although it can still work well for distributions that vaguely approximate normal (e.g. I’ve never had issues using it with uniformly distributed errors). The variance of the error should also be constant (i.e. it should be the same normal distribution throughout the dataset), and a particular failure scenario is when the variance is proportional to the dependent variable. This is what we clearly have here: the sampling error is substantially higher when the execution time is higher.

In my experience with simple non-linear regression, this scenario tends to lead to fits which under-estimate super-linearity. This might be why some report that the n^3 case looks more like an n^2.something complexity, and it could also overwhelm the subtle differences between linear and log-linear. Intuitively, the errors in the largest measurements dominate the fit (because they get squared), and that is not ideal when we have lots of smaller datapoints to make use of too (notwithstanding that the very smallest datapoints are less useful for distinguishing super-linear functions asymptotically; note, though, that the smallest datapoints are the most useful for distinguishing the constant from the logarithmic case).

There are ways of accurately accounting for non-constant variance, such as weighted least squares ( Weighted least squares - Wikipedia ), but ultimately we’d need to do some deeper analytical work to model the variance, and we would then become more reliant on other assumptions such as the normal shape of the errors.
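Just to illustrate the weighting idea: assuming the error’s standard deviation is roughly proportional to the measured time (so the weights become 1/t^2 - this is an assumption, not a fitted noise model), a weighted fit of t ≈ a·f(n) looks like this:

```python
def weighted_rss(ns, ts, f):
    # Weighted least squares for t ~ a * f(n), with weights w = 1/t^2, so each
    # point contributes its relative error rather than its absolute one.
    xs = [f(n) for n in ns]
    ws = [1.0 / (t * t) for t in ts]
    a = (sum(w * x * t for w, x, t in zip(ws, xs, ts))
         / sum(w * x * x for w, x in zip(ws, xs)))
    return sum(w * (t - a * x) ** 2 for w, x, t in zip(ws, xs, ts))
```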

Here we don’t know up-front how the sample error is distributed. The example graphic in the problem statement shows a very extreme spike of outliers around n=3800, for instance (garbage collector kicked in? :upside_down_face: ). I couldn’t see this kind of error in the test cases, but nevertheless, given that we don’t need to accurately calculate factors but only fit to very few pre-determined functions, and since, as @VizGhar noted, the non-constant cases appear to pass through the origin, I’ve chosen to make no further assumptions and picked what should be a very robust method based on repeated median regression ( Repeated median regression - Wikipedia ).

Rather than calculate a linear slope for each pair of datapoints, I test which complexity function best matches each pair by appropriately scaling the 1st datapoint to the 2nd’s n and taking the minimum absolute difference to the observed execution time (this approach isn’t commutative, so best_fit(a, b) is not necessarily the same as best_fit(b, a) - I don’t think that matters much, but it could be done a few other ways too). Because we have a natural ordering on the complexity functions, we can take medians over the results of these tests and effectively zero in on the median pair of datapoints which has the least error and most accurately represents the complexity function.
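In sketch form (not my exact code; the candidate list, the overflow clamp and the assumption that n ≥ 2 are simplifications):

```python
import math
from statistics import median_low

# Natural ordering from slowest- to fastest-growing (assumes n >= 2 so log(n) > 0).
CANDIDATES = [
    ("O(1)",       lambda n: 1.0),
    ("O(log n)",   lambda n: math.log(n)),
    ("O(n)",       lambda n: float(n)),
    ("O(n log n)", lambda n: n * math.log(n)),
    ("O(n^2)",     lambda n: float(n) ** 2),
    ("O(n^3)",     lambda n: float(n) ** 3),
    ("O(2^n)",     lambda n: 2.0 ** min(n, 1000)),  # cap the exponent to avoid float overflow
]

def best_fit(n1, t1, n2, t2):
    # Scale the 1st datapoint to the 2nd's n under each candidate and return the
    # index of the candidate with the smallest absolute error at the 2nd point.
    errors = [abs(t2 - t1 * f(n2) / f(n1)) for _, f in CANDIDATES]
    return min(range(len(CANDIDATES)), key=errors.__getitem__)

def pick(ns, ts):
    # Repeated-median flavour: a median of per-point medians of pairwise votes.
    per_point = []
    for i in range(len(ns)):
        votes = [best_fit(ns[i], ts[i], ns[j], ts[j])
                 for j in range(len(ns)) if j != i]
        per_point.append(median_low(votes))
    return CANDIDATES[median_low(per_point)][0]
```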

It appears to work well. In most cases there’s unanimous agreement about the function at the top level even before the final median. By far the worst performing test case is the final one, which reports like this before correctly picking Exponential in the final median:
SuperQuadratic, Cubic * 10, Exponential * 13,

Studying the datapoints for this case, it’s clear that the errors for small values are much larger than the underlying function value, and/or that the assumption that the execution-time function passes through the origin is wrong, and that causes those values to vote Cubic. I think the origin assumption could be relaxed by considering triplets of datapoints, calculating an intercept using the first 2 points and then testing against the 3rd (or something more exotic), but that would be slower (cubic vs quadratic) and Blunder might relapse waiting for the answer :grimacing: Still, I’m kind of curious what method was used to create the noise…

Validator troubles? Yes, n^3 hated me. Keep in mind that ratios with small x are more significant in least squares. Consider dropping small x values and any y values that are zero, because they are probably not great data.
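Something along these lines (the `min_n` cutoff is a guess):

```python
def clean(ns, ts, min_n=50):
    # Drop very small n (dominated by constant overhead and noise) and any zero
    # timings, which would break ratios, logs and least-squares weights.
    kept = [(n, t) for n, t in zip(ns, ts) if n >= min_n and t > 0]
    return [n for n, _ in kept], [t for _, t in kept]
```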