Nintendo Challenge

The algorithm from the description is taking out.txt and printing in.txt.

1 Like

Can someone tell me if I am on the right track (and whether my unrolling is correct)?

A few definitions first ā€¦
Ā + means XOR
Ā Sum i=[x..y] term(i) means XOR repeated on term (like the Sigma) with i in [x,y]; if x>y then the generated Sum has no elements.
Ā * means bitwise AND

a_i [j] means the bit number j of a_i (a_i being the ith input), bit 0 is the rightmost bit (aka LSB 0, see https://en.wikipedia.org/wiki/Bit_numbering)
b_i [j] means the bit number j of b_i being the ith output (e.g. b_0 being the first output)

I.e.
a_0 [0] is the right most bit of a_0
a_0 [0] * a_1[0] means AND of 0th bit of a_0 and 0th bit of a_1 (and the result is 1 if and only if a_0[0] == a_1[0] == 1)

I unrolled the sample code for the case size=32 and I get:

b_0 = Sum m=[0..31] ( Sum k=[0..31-m]Ā Ā  a_0[m] * a_1[k] )
b_1 = Sum m=[1..31] ( Sum k=[32-m..31]Ā  a_0[m] * a_1[31-k] )

Unrolling for the bits ā€¦

b_0 [j] = Sum j=[0..31] ( Sum i=[0..j] a_0[i] * a_1[j-i] ) + ( Sum i=[j+1..31] a_0[i] * a_1[32+j-i] )
b_1 [j] = Sum j=[1..31] ( Sum i=[1..j] a_0[i] * a_1[j-i] ) + ( Sum i=[j+1..31] a_0[i] * a_1[32+j-i] )

Especially the bits part is astonishing and makes me doubt this result

I think your unrolling is wrong, itā€™s not miles away but itā€™s wrong. The first step is indeed to understand how it unrolls.

I completed it as well yesterday.
Once you understand what the problem is, itā€™s not very hard to solve and the algorithm was fun to implement!

I have completed it today. Takes around 130ms for the 256 bit test :smile:

Iā€™m very close - 95%. I pass all the test cases, but when I submit the second 32bit fails. There must be some special case I am missing. Did anyone else have this problem? If so, any clues?

The only things that comes to me is the ā€œlast bit set to one caseā€. On my random testings, i had a lot of frustration with those with some of my trials :slight_smile: Iā€™m not sure that the test cases do have those though (never bother to check out).

What i do for testing, is that i make a weak solver that is brute force, and very very easy to asses that he never goes wrong ā€¦ i do that with a 8 to 16 bits tests cases (wich would be 4 to 8 per challenge definition), and i run my programm against it to see if they have same output.*

Hope that helps, good luck !

1 Like

Thanks for your help. I did find a couple of bugs relating to multiple solutions but it didnā€™t affect the result. Iā€™ve got quite a lot of test cases which all seem OK. I think I may still have a problem in some cases if the number of solutions is very large, but canā€™t believe they would be mean enough to put this into the second 32 bit test.

I have also looked pretty closely at the bits at the edges of the range. I am reasonably sure that the solutions I do find are correct (by reapplying the Nintendo function), but perhaps there are cases where I am not finding all of them.

Iā€™ll post again if I ever track this down!

Thereā€™s not much topics on which Donald didnā€™t write ā€¦ :smile:

It took me some time to understand what was the real problem. For me, it was really fun to figure that out and code the solution.
That was a awesome puzzle.

LOL Not at all. I actually understand what this is about and I can solve the 32 bit test cases, but how in the world do I even go about for 256 bit one ?

Iā€™ve put lots of time into this. Probably 120+ hours so far. Learned a lot about math Iā€™d never even heard of before. So far solved the first 32 bit case. Just have to find the right first set of things to find the right second set of things at this point. Very difficult!

Once youā€™ve understood what the problem is, you simply have to Google for known algorithms to solve it and implement them the fastest way you could. The same implementation passes 32bit cases as well as 256bits without trouble.

Iā€™ve got a <300 LoC solution, using C++ and the standard library only and I havenā€™t really tried to minimize the size. 256bit test case runs in <60ms.

Thanks great Professor B! He is so genius to invent such an algorithmā€¦

I made it ! This challenge shows the power of mathematics ! :slight_smile:

15 ms is more than enough for the 256b test case ^^

your solution must be overoptimized. mine takes 22ms for 256b case, and there is only one point in the code where I have added about 5 lines for optimization purposes.

I didnā€™t work that much on optimization ; I just have an array of 64b words class with standard operations redefined (<< >> ^ ā€¦), and work with bits operations only (not for optimization purpose, but because the algorithms to use lent themselves well to it).

For 256b test case, I get times from 12 to 13 ms (and ~5 ms on my iMac with -O3 option) ; thats why Iā€™m surprised to see some people talking about 800+ ms. I guess there are not tons of algorithms to solve it ^^

I have pretty much the same as what you have described, but I used 32-bit unsigned integers, instead of 64-bit, just because they are simpler to work with, considering there are 3 32-bit testcases in the puzzle.
I have remade it with 64-bit, and it shows 19ms now.
looks like my class implementaton is not as fast as yours. or maybe you have found a way to shave about 20-25% of main algorithm iterations. hmm, now that I think about itā€¦

update:
I have made a few really basic optimizations, like replacing (x % WORD_BITS) with (x & WORD_BITS_MASK), and I have 16ms now. So it was just a matter of optimizing the most critical part of code over keeping the code nice and clean.

update 2:
well, now Iā€™m stuck 18-19 seconds. I tried different ways to speed it up, but so far only going from 32-bit to 64-bit gives noticeable results (~3ms faster). Well I have to admit that my implementation is not the fastest possible, then.

As for 800ms times, those might be some ad-hoc self-devised algorithm. Or maybe a poor implementation like using an int to store a boolean or something.

Maybe the fact that I keep the degree in my class helps to optimize ; I had to add some code to recalculate it from last position when the higher bit disappears (with ^ or derivative), but in most cases, it keeps me from doing lots of operations with 0 words.

Be careful, cerr logs and other unitary tests can slow a lot the whole thing ; my times are without any logs or checks.

I guess we wonā€™t be able to discuss it in more details without giving too many hints.