It’s not a problem. Thanks for trying. It’s really kind of you.
Solved in ~100 lines by reducing to a linear fit and computing the r^2 likelyhood. E.g fit y[i] to x[i]log(x[i]) if your trying to fit to nlog n. You can compute the r^2 likelyhood directly as a combination of (average x, average y average xy…) with a formula on wikipedia.
For O(1) I used a hack of if the last value is within 5 percent of the first value it’s O(1).
Solved it in ~20 lines. 3 real lines of codes, the rest is only a bunch of chained if/else
Hi Pierre, so would you like to share your solution with us ? What kind of algorithm, theorem did you us ?
Everything you need is explained here: https://en.wikipedia.org/wiki/Analysis_of_algorithms
@Skywalker_ Technically, O( n^2.1 ) is quite different to O(n^3)
@Elliot Ironically, I model it exactly A*(x^B) and found the MSE, B turn out to be somewhere between (2.5, 2.8)
I meant that any O(n^2.1) function is in O(n^3), nothing more.
Since all complexities follow a power rule (except the last one), you can just calculate the order of growth and you have your answer.
I had exactly the same issue, and sat to think a bit.
Measurement of complexity for low input values are notoriously unreliable, because of the fixed time spend in initialization that adds to the time actually spend running the algorithm (but becomes comparatively negligible for high values) . You can check this on the supplied test sets, with some Excel fiddling.
I did apply that reasoning and removed the first 10% of the data sets (5% works too). Magic, got 100%.
I have read all posts on the forum before trying this challenge because i had no idea. Then now, with also little searches on Internet, i can say I know much on linear regressions and least square methods
First thing i did with datas was put them in Excel and try to reduce the gap between mesures and theorical values. Calculate a simple average factor did the job and the result on a graphic was so good, showing that this puzzle was finally not so difficult. After that, coding was easy because i could refer to my Excel values to verify each intermediate result. It helped a lot.
To calculate gaps, i tried sum of all ratios (not good), then sum of all differences (better but not really), and finally sum of squares of differences that works fine, Thanks to mathematicians.
Validate 100% in IDE was easy because of forum : truncate values at beginning and end of datas.
Validate 100% in submission was not very easy because i validated IDE before using least squares and i tried a lot of things before using it.
So here is my solution :
- To validate 100% in IDE, I removed first data when calculating the average factor and removed 4 last datas when calculating the least square. Removing 10 datas everywhere works fine too.
- To validate 100% in submission, you just have to remove 1 data at the beginning of the average factor process. Removing 10 datas everywhere is working to. I think that removing 10% on each side as it is said in the forum was a gread idea.
- Calculate n^2.5 to n^9 instead of n^3 is working too but not without removing datas so i kept n^3
This puzzle was very interesting. Thanks to everybody for help and to Codingame for it is a great job.
I calculated an estimated X and Y for every O(…)
Then estimated X * O(…) + Y and created a distance function for each estimate
distance = (sum of (estimate/real))
Finally took the minimum distance, I had some issues too on n^3 in submission as I was taking only a subset of the data points (last 5 or 8 points) for distance function and managed to get it passed after taking all points, I guess it’s because on of the interpolations was very effective to match the last points (maybe they are quite close in this test)
Hi! Just for the kicks, I’ve applied some concepts from a Machine Learning course I’m doing. I’ve implemented a linear regression function, a cost function and a gradient descent implementation in order to minimize the parameters of my regression function.
I’ve defined a model for every graph shape and fine tuned my alpha (learning rate parameter) and number of iterations to best fit their corresponding solution.
I currently can predict every shape but some funny things happen in O(1) and O(n).
O(1) is being identified as O(log n). Take a look at the cost values for each regression after minimizing params…
This is for O(1) test case:
O(1),13000000 O(n),4395033193.940686 O(log n),12829302.137301734 O(n log n),4394469826.132628 O(n^2),7738738917.851379 O(n^2 log n),Infinity O(n^3),Infinity O(2^n),Infinity O(log n) << My result
They’re almost the same, although logn scores lower (best result) (I’m guessing because of that first 200000 value in the series)
This is for O(n) test case:
O(1),2836784129.6022005 O(n),116286.38013830909 O(log n),937202031.0230501 O(n log n),116277.304665206 O(n^2),709350118.6911637 O(n^2 log n),Infinity O(n^3),Infinity O(2^n),Infinity O(n log n) << My result
In this case the difference is even smaller…
My conclusion is that “according to the data”, the test is broken No, seriously… this is interesting
If I give O(n)'s model 100x more steps I can fine tune the params so that it gives a regression that fits better to the data and wins, but then, in other test cases fails because it fails to produce results in time
Reading you gave me the will to discover and do this puzzle.
I kept it simple, modelising curves and computing their sum of squared residuals. I always have a clear winner that way.
In essence, that’s what I’m doing. The cost function sums the cuadratic error between my hypothesis and the real curve.
Theoretically, the model with the lowest cost value is the one that better fits the data. I’ll try to work it manually, as you suggested, though.
I used another method that I don’t see mentioned here. I compute the integral (with trapezoid rule) of the function between [0,N/2] and between [0,N], where N is the maximal size. Then the ratio of the two integrals permits to decide.
Interesting. At first I looked for asymptotes and ratio of the mean value between the few first and last values, but it was enough only for the first half of tests.
I stopped this way at that point, but you made it work by integrating, I should think about it next time I face something similar.
Thanks a lot for that link, it makes the problem way easier to solve.
Fixed checking for n^2.8 instead of n^3, kinda tricky, but it must be due to not accurate enough input.
I don’t know what kind of data is used for testing, but it’s really tricky. I managed to solve this 100% by comparing definite integral of given complexity functions and sum of small areas under function’s curve. But i had serious problems with n^2 validator and had to cut first and last 10% of data to complete it. I guess my method is not optimal precision-wise(cause of value dispersions), but it should be one of the fastest.