Binary Neural Network - Part 1: Question about sigmoid activation function

I’m attempting to solve the puzzle “Binary Neural Network - Part 1”. I have (in theory) finished the code. Some tests work, some fail.
In checking the weights, for debugging, I realized something that struck me as odd: If a node is connected to two inputs, and has, for example, weights of 0.729777 and 0.176734, I would expect it to pretty much follow the first one. However, just the second one being 1 results in an output of 0.544, which gets rounded to 1. This is an effect, of course, of the sigmoid activation function: 1/(1+exp(-a)), where a is the unnormalized output. It seems that this function only requires a very low unnormalized output to create a “high” (or 1) final output. I don’t know much about neural networks, so I wonder: Is this normal? Is this behavior to be expected? I suppose I could be calculating something wrong, but I don’t see what it could be.
Thank you for reading this, and I hope you might help me.

Your calculations are correct. If the unnormalized result is positive, then it will be interpreted as a 1. If it is negative, then it will be interpreted as a zero.
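As a quick numeric check of this point, here is a minimal sketch. The helper name `sigmoid` and the `>= 0.5` cutoff are illustrative choices, not taken from the puzzle statement.

```python
import math

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

# Only the second input is 1, so the weighted sum is just the second weight.
a = 0.729777 * 0 + 0.176734 * 1
out = sigmoid(a)
print(round(out, 3))            # 0.544
print(1 if out >= 0.5 else 0)   # 1, because a > 0 implies sigmoid(a) > 0.5
print(sigmoid(0.0))             # 0.5, the boundary between "0" and "1"
```

So the sign of the unnormalized value decides the rounded result, however small its magnitude.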

Question: Are you only rounding the result of “output” nodes, or are you applying a rounding operation to all nodes (including hidden nodes)? You should only round the output nodes. Other nodes should pass their unrounded (normalized) output to the next node(s) to the right.
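To illustrate why rounding hidden nodes matters, here is a small sketch of a forward pass. The weights are made up for the example, and the layout (one weight list per node, bias weight last) is an assumption of this sketch rather than something the puzzle prescribes.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(layers, inputs, round_hidden=False):
    """layers: list of layers; each node is a weight list whose last entry is the bias weight."""
    values = list(inputs)
    for depth, layer in enumerate(layers):
        is_output = depth == len(layers) - 1
        nxt = []
        for weights in layer:
            # zip stops at the last input, so the bias weight is added separately
            a = sum(w * v for w, v in zip(weights, values)) + weights[-1]
            out = sigmoid(a)
            if round_hidden and not is_output:
                out = 1.0 if out >= 0.5 else 0.0  # the mistake being illustrated
            nxt.append(out)
        values = nxt
    return values

def to_bits(outputs):
    """Round only the final outputs."""
    return [1 if o >= 0.5 else 0 for o in outputs]

layers = [
    [[0.0, 0.2]],    # hidden layer: one node (made-up weights)
    [[1.0, -0.6]],   # output layer: one node (made-up weights)
]
correct = to_bits(forward(layers, [0.0]))
wrong = to_bits(forward(layers, [0.0], round_hidden=True))
print(correct, wrong)  # [0] [1]
```

With these weights, rounding the hidden node flips the final answer, which is the kind of discrepancy that shows up as failing tests.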

  • danBhentschel

Oh, I see, so it works based on negative/positive values. I was thinking in terms of [0, 1].
I’m only rounding results for the display, the network itself works with unrounded values. As I write that, I realize that maybe I should be rounding the results before applying the backpropagation method… although doing that makes it fail the test cases it used to get right - which could mean something else is wrong, I suppose.
Are the normalized outputs o(k) of the output nodes supposed to be rounded before being passed back to the hidden nodes for delta calculations?

Currently, the following tests are failing:
-> Fun fact: if I use 5*trainingIterations, the first two cases work, so it could be that the network is not learning fast enough… which I don’t really know how it can happen. <- end of fun fact
1: Copycat - returns two 1’s (the weights are 0.7… to input and 0.1… for theta)
2: Opposite - returns two 1’s (the weight for the input is negative, which is good… but not good enough)
9: Average - returns 0 0 0 1
10: Two Hidden Layers - returns all 0’s

Here is the data I get for the weights:

Initial weights:
0.5138700783782965
0.17574130332830423

After each training (7 iterations, 2 data, 14 training runs):
0.5138700783782965
0.10828566275947528

0.5535647480620702
0.14798033244324904

0.5535647480620702
0.08123045991045925

0.5927824471415556
0.12044815898994463

0.5927824471415556
0.05442843721866626

0.6315331068292136
0.09317909690632425

0.6315331068292136
0.027911127379379244

0.6698251193473866
0.06620313989755225

0.6698251193473866
0.0017057435778143198

0.7076655681062552
0.03954619233668292

0.7076655681062552
-0.024164552564712748

0.7450604380849442
0.013230317413976213

0.7450604380849442
-0.04968037094214341

0.7820148064792496
-0.012726002547838032

Here are the normalized node outputs for each training round (0 and then 1):

0.5438225954215031
0.6507086793294747

0.5369277202074318
0.6535759644891805

0.5300756876340308
0.65638166688345

0.5232779344102069
0.6591355323275703

0.5165447426474057
0.6618458602265253

0.5098852598201687
0.6645196393251293

0.503307531107423
0.6671626788891555

And the final normalized outputs:
0.4968185422996499
0.68336702766874

Hope this helps.
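For anyone comparing against these numbers, the trace above appears to be reproducible with a single sigmoid node trained by the delta rule. In the sketch below, the learning rate of 0.5 is inferred from the size of the weight changes in the listing and is not stated in this thread; the variable names are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

ETA = 0.5  # assumed: inferred from the weight changes listed above
w_in, w_bias = 0.5138700783782965, 0.17574130332830423  # initial weights from the listing
training = [(0.0, 0.0), (1.0, 1.0)]  # Copycat: target equals input, 0 first and then 1

for _ in range(7):
    for x, target in training:
        out = sigmoid(w_in * x + w_bias)            # unrounded output
        delta = out * (1.0 - out) * (target - out)  # error term for a sigmoid output node
        w_in += ETA * delta * x
        w_bias += ETA * delta                       # bias input is fixed at 1

print(w_in, w_bias)  # close to 0.78201 and -0.01273
print([1 if sigmoid(w_in * x + w_bias) >= 0.5 else 0 for x, _ in training])  # [0, 1]
```

Note that the unrounded output is used in the delta calculation; rounding happens only when the final answer is displayed.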

  • danBhentschel

I am also solving the puzzle and get the same results for tests 9 and 10 (0001, all zeros).

Using my set of initialization values for the weights (see below) the output values (not rounded) for test 9 are (0.002573865129459424, 0.14260723897673383, 0.16852918770697206, 0.8232463310413956) and for test 10 all are around 0.245.
Tests 1 and 2 work as expected with the exact same output values for test 1, as listed above.

Things that I have tried so far:

  • varying the number of bias nodes (one for the network, one per hidden and output nodes, one per hidden and output layers), all producing the same output.
  • trying to solve cases 9 and 10 by varying the number of hidden layers and hidden nodes, and the number of iterations, rendering similar results.
    I pass validator tests 1-9, with only the last one not working.

I have also noticed that the Java input does not work properly and I needed to add an additional in.nextLine() after reading in all integer values. (I doubt this has something to do with tests 9 and 10 though, as I checked the input).

For the initialization of the weights I get the sequence 0.1757413148880005, 0.0903129415378226, 0.046414347741650874, 0.02385642627237507 …
If I add 0.5138700783782965 at index 0, I get the exact same result for test 1.

I am curious if anybody else has the same problem and would be happy to hear some suggestions.

Thanks.

Let’s see if this is helpful. For Test 9, there’s obviously an awful lot of data that I could give you. Here are some key milestones:

Weights before training

Layer 0 Node 0
0.5138700783782965
0.17574130332830423
0.3086515163577402
0.5345338869535057
0.9476279257552829
0.17173630146856247
0.7022311690739501
0.2264306811738902
0.49477344681265456
0.1247203196979688
0.08389880325826761
0.38964712125698436
0.2772257971936957
0.3680707194693716
0.9834371921529236
0.535397940098959
0.7656819032345349

Layer 0 Node 1
0.646473150535707
0.7671388111855549
0.7802369211708368
0.8229514224561636
0.15193229315426773
0.6254767405919157
0.3146848274928913
0.3469010807326534
0.9172044768543934
0.5197599365002289
0.40115420771816473
0.6067583833852589
0.7854021693511876
0.9315228801833106
0.8699210741882776
0.8665246995475724
0.674520347115826

Layer 0 Node 2
0.7583996005162594
0.5818934578364219
0.38924772403633584
0.35563473559712744
0.20023207375790555
0.8269268394573251
0.4159033142104295
0.4635219273453215
0.9791629970907992
0.12643645197452813
0.2126366990677252
0.9584513734832645
0.7374629344499963
0.4090564630036505
0.7801130669098874
0.7578992507224434
0.9568418436482744

Layer 0 Node 3
0.028096026288390172
0.31872752416819683
0.7569342049569516
0.24299497168650616
0.5895422145675598
0.04342443404878696
0.9560249671135214
0.31913313098211454
0.059359821052923714
0.4418761257277225
0.9150198455504234
0.5722473452669788
0.11883804254179729
0.5697709799603424
0.25204809347728646
0.49585787416242894
0.23673403134417442

Layer 1 Node 0
0.4769608911485229
0.40609315196335927
0.8729976121676143
0.42696332625437683
0.35821810288271777

Weights after one training iteration

Layer 0 Node 0
0.5138631380996322
0.16471440843451254
0.29668292785620726
0.5242009889573889
0.9448970579178643
0.1585097893613768
0.6892873981482288
0.22318864841620056
0.4846953755239628
0.11025889450512077
0.08065972408530163
0.38172534214071274
0.2686561669157504
0.3601554260141946
0.9715533452041712
0.5319256467213828
0.742311348060544

Layer 0 Node 1
0.6511378747911312
0.7720511542711591
0.7824469109930823
0.8259249293229264
0.15479240614523745
0.6276944479846109
0.3184425195416554
0.3475029946448822
0.9167113342494151
0.5247768846758138
0.4049463240147305
0.609688624244224
0.787208045923412
0.932547687341274
0.8723873804016029
0.8690633719265544
0.6837922532405741

Layer 0 Node 2
0.7484571714562052
0.5612919634207927
0.3605795918020662
0.3279540324707262
0.189790137652342
0.8061127502928597
0.3821487213648927
0.45752522886363084
0.9719583520742551
0.09734001486155096
0.1966387615186809
0.9448005621321711
0.7260784044214973
0.3997184966687958
0.7596564209245137
0.7427074440305291
0.8998599307623979

Layer 0 Node 3
0.01354327377378141
0.2916477954914482
0.73912033212648
0.2096252891315484
0.5741033759935515
0.019176234209632596
0.9339780292093736
0.3067235629604105
0.029624678185721065
0.41011302276399064
0.9081078987178246
0.5614028084898263
0.08982183799424773
0.5452934060521041
0.21316329297539316
0.4727653251392619
0.17232565752802995

Layer 1 Node 0
0.14610666190110919
-0.11556137592902982
0.39850006802085675
0.27947215890805926
-0.266966525191905

Weights after all training iterations

Layer 0 Node 0
0.5319061942634977
-0.458067502842384
0.4915197773172106
0.21827613203865398
1.1601775803628827
0.29440500509970624
0.7923437677186648
0.19327627117891075
0.4738448773463162
-0.14915872519150175
-0.5553635479709592
0.10391852745921809
0.09853292868633451
0.0071297709934608185
0.8109033150030133
0.3421953468194558
-0.41037465230635767

Layer 0 Node 1
0.7284605591034866
0.8607108421201679
0.8977140738852808
0.9178229544948123
0.27801816077719876
0.7367115494717716
0.49021916106872
0.45830040005775063
0.9467995581894103
0.7029031223655374
0.49836835354366343
0.6948069919714179
0.8911503837847996
0.9931223442438667
0.9540811203946675
0.9195803379221197
0.968812450627726

Layer 0 Node 2
0.7448146965050647
0.5463550006765606
0.3485644498749925
0.3044634889321021
0.22406047378300317
0.8030415555165012
0.3774675627868416
0.5452049197495463
0.9708935927445964
0.11386056030798249
0.2037324790935415
0.9509150604924208
0.7698947806676518
0.3995902239291382
0.7401036120036572
0.7191971006432387
0.9052397003545852

Layer 0 Node 3
1.7331219486924003
1.5059211380875968
1.8194993293603725
1.0349900697640984
1.1937827412926632
1.772783039366032
1.3787762517075595
2.0399376234021336
1.3750364414332243
0.8194773688909447
1.959394218917327
1.6113701241217657
1.3042625124436613
0.6051457537031166
1.2836333629488013
1.7608954795698548
-10.760467473044894

Layer 1 Node 0
1.2227750609911112
-2.3085664367313488
-1.1036968691571816
10.023432762374446
-2.871544462296784

Output before training

0.9234163451037934
0.9248494696279053
0.9247091475067545
0.9257523729747934

Output after one training iteration

0.6014626867145206
0.605273928088629
0.6034946560045894
0.6064469752835443

Output after all training iterations

0.02390042747686394
0.11046779833635939
0.7370635517892113
0.9637416082116796

  • danBhentschel

Thanks for the data. The problem was the initial values of the weights. When generating them, I was passing the normalized previous value to get the next one, instead of using the one before division by 2^31. In addition the first value generated was the one where the seed was passed (resulting in 0.17574 …), and not the normalization of the seed itself (thus omitting the first weight of 0.51387…).
Once I fixed this, I passed all tests and validators.
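The two mistakes described above can be sketched as follows. The multiplier 1103515245 and the modulus 2^31 are assumptions (the usual constants for this style of generator, and consistent with the weights quoted in this thread); only the increment 12345, the raw value 1103527590 and the weights themselves appear in the discussion. The divisor 2^31 - 1 is used because it is the one that reproduces the quoted weights.

```python
A = 1103515245      # assumed multiplier (not quoted in this thread)
C = 12345           # increment: added, not multiplied
M = 2 ** 31         # assumed modulus
X0 = 1103527590     # first raw value quoted in this thread

def initial_weights(count):
    """Return `count` weights; the raw integer, not the normalized value, feeds the next step."""
    weights = []
    x = X0
    for _ in range(count):
        weights.append(x / (M - 1))   # normalize a copy; the first weight comes from X0 itself
        x = (A * x + C) % M           # advance using the un-normalized integer
    return weights

print(initial_weights(3))
# approximately [0.51387, 0.17574, 0.30865]
```

Feeding the normalized value back into the generator, or skipping the first value, produces a different sequence and therefore different training results.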

The outputs I get for the final test are:

0.8486715424802767
0.15722586686909315
0.1221993210908292
0.03888068899288182
0.1772127140540704

I am a bit surprised as to how sensitive the training is with respect to the initial weights. For example if the initial values are properly generated but the first value is omitted, the last test won’t pass.

… … I feel like an idiot. I somehow missed that we were adding 12345, not multiplying. 100% completion now.
Thank you for the data, which proved quite useful for debugging (I had some minor things to iron out, nothing quite as serious as using the wrong operator). And most of all, thank you for this great puzzle, I had no idea it would be so much fun - not to mention possible for me - to create a neural network. Awesome! Checking out part 2 now.

I’m glad you got it working, and that you enjoyed it!

  • danBhentschel

Yeah. This was something that I put an awful lot of thought into. I specified the initial weights because of the following goals:

  1. I wanted a formula: do this, then this, and it’s guaranteed to work.
  2. I wanted the solution to be performance-friendly. I intentionally created a sub-optimal (performance-wise) implementation in Python so that I could ensure that performance considerations would not be a major factor in solving the puzzle.
  3. I wanted a test failure to be an indication that coding of the neural network was incorrect, rather than a training issue. I think I somewhat failed on this count.

Given these constraints, it made sense to hard-code all the variables (training data, learning rate, initial weights, etc.) into the puzzle description. Otherwise, it would be entirely possible to code a correct NN that would not be able to solve the puzzle, either because it would not be fast enough to learn in the allotted timeframe, or because it could get stuck in a local minimum.

I also specifically chose training data that would just barely work, in that it was just enough training to solve the problem at hand. This reinforces the concept of “if it doesn’t work, then you coded it wrong.”

Neural network concepts

A neural network can be viewed as a way to find a solution to a multi-dimensional function. The error function for the network can be viewed as an n-dimensional landscape, and the neural network continually moves its coefficients “downhill” towards lower error values.

I like to think of it like the classic arcade game, Marble Madness. Just keep rolling downhill until you get to the bottom of the surface. Unfortunately, some landscapes have small pits (local minima) in them that your “ball” (the NN) can get stuck in before it gets to the “real” bottom, which is presumably the correct solution.
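The “rolling downhill” picture can be tried out on a one-dimensional toy landscape. This is only an illustration of the idea: the function `f`, the step size and the starting points are made up for the sketch and have nothing to do with the puzzle’s actual error surface.

```python
def f(x):
    return x ** 4 - 3 * x ** 2 + x   # toy "landscape" with two valleys

def df(x):
    return 4 * x ** 3 - 6 * x + 1    # derivative of f

def descend(x, rate=0.01, steps=1000):
    """Plain gradient descent: keep stepping downhill from the starting point x."""
    for _ in range(steps):
        x -= rate * df(x)
    return x

left = descend(-2.0)    # rolls into the deeper valley near x = -1.30
right = descend(2.0)    # gets stuck in the shallower valley near x = 1.13
print(left, f(left))
print(right, f(right))
```

Both runs follow exactly the same rule; only the starting point differs, which is the same kind of sensitivity to initial weights discussed earlier in this thread.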

In order to avoid these potential pitfalls (haha) I predefined and pre-tested everything in the puzzle to ensure success. Binary Neural Network - Part 2 allows for more exploration of these concepts as you try to build a single NN that is flexible enough to solve various different problems.

  • danBhentschel

Yes, thanks. I see your point. The alternative ways of initializing the weights that I have looked up rely on random draws, which would not guarantee the same result every time.
I have tried some other approaches in the second part and they work most of the time, provided the right number of layers and range of hidden nodes are selected.

Please help me to understand the following:
Your first initial weight is 0.5138700783782965, which is the result of dividing X(0) by 2^31, right?
X(0) = 1103527590
2^31 = 2147483648
Why, then, does 1103527590 / 2147483648 result in 0.5138700783782965?
That value is the result of division by 2^31 - 1.
The same holds for all the following initial weights: all of them are divided by 2^31 - 1 (2147483647), not by 2^31 (2147483648).
What am I doing wrong?
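The difference between the two divisors is easy to check directly. A minimal sketch, using only the numbers quoted in this post:

```python
X0 = 1103527590

by_2_31 = X0 / 2 ** 31              # divisor 2147483648
by_2_31_minus_1 = X0 / (2 ** 31 - 1)  # divisor 2147483647

print(by_2_31)            # differs from the quoted weight from about the 10th decimal place
print(by_2_31_minus_1)    # agrees with the quoted 0.5138700783782965
```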

Sigh. You are correct. Thanks. I have modified it as follows:

  • danBhentschel

Hi,

I think you should add a note on how to interpret and display the final outputs. I tried to be smart and decided an output was a 1 if it was larger than ~0.62, which is 1/(1+exp(-0.5)), but apparently you should just be using the obvious threshold of 0.5.

(Edit: Actually I just made sense of it.)

This is a great puzzle. Good job! :+1:

Very educational puzzle, clear description, nice youtube links - so congrats, I liked this!!! :clap:

The puzzle itself is not ‘very hard’ at all, we just need to follow the instructions in the statement carefully. Not getting lost among such a high number of different variable names and array indexes was the main challenge. The ability to play around with the model in puzzle part 2 was also nice. I found some good parameters after 5 minutes of trial and error.

Interestingly, having a deeper NN or with more nodes per layer did not always improve the results, so in the end, modifying Eta and maxing out the # of training iterations within the timeout threshold proved the way to go for me.
Are there any rules of thumb for choosing the correct number of hidden layers, nodes per layer, or eta? Are more training runs always better, or can overfitting to noise kick in?
