Coding Games and Programming Challenges to Code Better
Send your feedback or ask for help here!
Created by @Westicles,validated by @ggrn,@Zeba and @alze.
If you have any issues, feel free to ping them.
Coding Games and Programming Challenges to Code Better
Send your feedback or ask for help here!
Created by @Westicles,validated by @ggrn,@Zeba and @alze.
If you have any issues, feel free to ping them.
First of all, great puzzle, I learned a lot trying to solve it so far!
I wrote a C# solution that passes the first three tests, but falls short of 5 nodes in Test 4 (1257 rather than 1262) and of 2 nodes in Test 5 (1501 vs 1503). Any ideas on what may be going on?
Rather than having an āend of wordā node, I equipped every node with an attribute āIsEndā. Two nodes are declared equal if they point to the same children through the same links and have the same āIsEndā status. In this way my DAWG is always 1 node shorter (it is missing the End node) but this is already accounted for in the numbers I quoted above. Am I missing something? Could there be more than one End nodes?
That sounds like it should work. Maybe try generating all words from your DAWG when you are done and see if any are missing or you have extra words?
Thanks for the help. I added a lookup function. It finds every word from the input list, while correctly leaving out similar words not in the list (like the ones obtained from those by adding a random character at the end, or removing the last character - unless these were already in the list, of course).
I also added a function that generates all words by using dps starting from the root node, and using that I correctly generate back the input list (and no other words).
Could we perhaps try and see if there are specific words that cause the discrepancy? For example I could share a list of the number of nodes after inserting each of the words in Test 5.
I made this from UnnamedCodinGamerās c++ code (Iām terrible at python):
3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,574,578,579,584,584,584,589,589,590,591,592,597,601,601,602,602,605,607,607,607,610,610,612,613,616,618,616,617,620,620,621,624,627,629,630,632,632,635,635,636,638,638,639,639,641,641,641,646,649,649,652,652,652,653,656,656,657,658,658,658,658,660,661,661,661,662,663,664,665,666,666,666,667,667,668,668,668,669,670,670,670,671,671,673,673,673,674,674,674,675,675,676,678,678,684,684,684,687,690,690,691,691,691,698,698,700,705,708,708,708,709,709,709,709,709,712,712,712,712,713,713,713,714,715,715,716,718,718,720,722,722,723,723,728,728,729,729,730,735,735,735,735,735,739,739,740,741,742,742,742,742,743,744,744,744,744,746,750,750,751,751,751,751,753,755,755,755,755,760,764,766,767,769,769,771,771,772,773,775,775,776,778,780,780,780,783,782,787,789,791,792,793,793,793,795,798,798,798,801,802,803,805,806,807,811,815,818,818,820,821,821,823,824,824,826,828,830,830,833,834,836,836,840,843,845,846,846,850,851,851,855,855,855,858,860,860,861,863,863,864,864,868,868,869,871,872,873,874,874,874,874,875,877,878,878,882,882,882,882,884,884,886,886,887,892,892,892,892,892,892,892,893,895,896,896,896,896,897,897,900,900,902,902,902,906,906,906,908,910,910,910,911,911,911,911,911,915,919,920,920,922,922,922,922,923,929,929,930,930,932,933,934,934,935,935,935,935,936,937,937,941,941,944,944,944,944,945,945,946,948,948,948,949,950,950,952,953,953,956,956,957,957,958,958,960,960,961,961,961,962,963,965,966,970,970,970,972,973,973,974,977,977,980,980,985,985,985,988,996,996,998,998,1001,1001,1001,1001,1005,1005,1005,1005,1009,1009,1010,1010,1010,1011,1011,1011,1013,1015,1015,1018,1018,1021,1021,1023,1023,1024,1026,1026,1026,1027,1028,1028,1029,1029,1031,1032,1034,1034,1035,1037,1038,1040,1040,1042,1042,1045,1045,1046,1048,1048,1050,1050,1051,1051,1051,1054,1054,1054,1056,1056,1057,1057,1057,1058,1062,1063,1063,1063,1064,1065,1065,1065,1066,1066,1067,1067,1067,1068,1070,1072,1072,1075,1075,1075,1081,1081,1087,1087,1090,1091,1094,1094,1095,1095,1095,1095,1097,1098,1101,1101,1101,1101,1102,1102,1102,1103,1104,1107,1108,1108,1108,1109,1109,1109,1111,1111,1112,1113,1113,1113,1113,1116,1116,1117,1118,1119,1120,1120,1122,1122,1122,1126,1126,1127,1130,1131,1134,1135,1136,1139,1139,1141,1141,1141,1142,1142,1142,1144,1145,1145,1148,1148,1149,1149,1151,1153,1154,1154,1157,1157,1159,1159,1159,1162,1163,1163,1164,1164,1164,1164,1167,1167,1167,1167,1168,1170,1170,1170,1172,1172,1175,1175,1176,1177,1177,1178,1181,1181,1182,1182,1183,1183,1184,1185,1186,1187,1187,1188,1188,1189,1191,1195,1195,1197,1197,1197,1199,1200,1200,1205,1205,1210,1211,1212,1213,1214,1215,1216,1216,1216,1217,1220,1221,1222,1223,1223,1224,1225,1227,1227,1228,1229,1229,1231,1231,1231,1231,1231,1232,1236,1241,1242,1243,1246,1247,1249,1249,1250,1251,1251,1251,1251,1251,1251,1251,1252,1252,1253,1253,1255,1257,1257,1257,1258,1258,1260,1260,1261,1263,1263,1263,1263,1264,1264,1265,1266,1266,1266,1266,1267,1267,1267,1267,1267,1267,1267,1269,1269,1269,1273,1274,1274,1274,1274,1275,1275,1276,1276,1276,1277,1278,1278,1279,1279,1282,1282,1285,1285,1287,1287,1291,1292,1297,1297,1298,1298,1300,1301,1301,1301,1301,1301,1302,1302,1304,1304,1307,1309,1309,1311,1311,1312,1312,1313,1314,1316,1316,1316,1317,1319,1319,1321,1321,1323,1323,1325,1325,1327,1328,1328,1331,1331,1332,1334,1335,1335,1336,1337,1339,1340,1342,1343,1344,1346,1347,1348,1349,1349,1349,1349,1349,1351,1352,1354,1356,1360,1362,1366,1366,1367,1367,1369,1369,1369,1372,1372,1375,1375,1376,1376,1380,1381,1381,1382,1382,1382,1383,1384,1385,1386,1386,1388,1389,1389,1389,1389,1389,1389,1391,1393,1394,1394,1394,1397,1398,1398,1398,1399,1400,1401,1401,1402,1403,1406,1406,1406,1409,1409,1412,1412,1413,1413,1413,1414,1414,1415,1415,1415,1415,1417,1418,1418,1418,1420,1421,1422,1422,1423,1426,1427,1429,1429,1430,1433,1434,1435,1436,1436,1437,1437,1438,1439,1439,1439,1442,1442,1444,1444,1445,1445,1447,1448,1448,1450,1451,1451,1451,1452,1452,1453,1457,1457,1457,1459,1460,1461,1461,1462,1462,1463,1463,1464,1465,1465,1465,1465,1465,1467,1467,1467,1467,1469,1472,1472,1473,1475,1476,1476,1477,1483,1483,1483,1484,1483,1483,1484,1484,1484,1484,1484,1488,1489,1489,1491,1492,1492,1493,1493,1493,1493,1493,1494,1494,1494,1496,1499,1499,1500,1503
Thanks! Here are my results:
The first difference occurs for the word at index 389, āemphasizeā after āemphasisā, which costs me 1 additional node (572->573) but 2 nodes to Unnamed (572->574).
The second difference occurs at 988, ārollā after āroleā, which costs me 0 nodes (1223->1223) and 1 node to Unnamed (1224->1225).
Iām not sure of how to use this information though, let me know if you have ideas.
3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,573,577,578,583,583,583,588,588,589,590,591,596,600,600,601,601,604,606,606,606,609,609,611,612,615,617,615,616,619,619,620,623,626,628,629,631,631,634,634,635,637,637,638,638,640,640,640,645,648,648,651,651,651,652,655,655,656,657,657,657,657,659,660,660,660,661,662,663,664,665,665,665,666,666,667,667,667,668,669,669,669,670,670,672,672,672,673,673,673,674,674,675,677,677,683,683,683,686,689,689,690,690,690,697,697,699,704,707,707,707,708,708,708,708,708,711,711,711,711,712,712,712,713,714,714,715,717,717,719,721,721,722,722,727,727,728,728,729,734,734,734,734,734,738,738,739,740,741,741,741,741,742,743,743,743,743,745,749,749,750,750,750,750,752,754,754,754,754,759,763,765,766,768,768,770,770,771,772,774,774,775,777,779,779,779,782,781,786,788,790,791,792,792,792,794,797,797,797,800,801,802,804,805,806,810,814,817,817,819,820,820,822,823,823,825,827,829,829,832,833,835,835,839,842,844,845,845,849,850,850,854,854,854,857,859,859,860,862,862,863,863,867,867,868,870,871,872,873,873,873,873,874,876,877,877,881,881,881,881,883,883,885,885,886,891,891,891,891,891,891,891,892,894,895,895,895,895,896,896,899,899,901,901,901,905,905,905,907,909,909,909,910,910,910,910,910,914,918,919,919,921,921,921,921,922,928,928,929,929,931,932,933,933,934,934,934,934,935,936,936,940,940,943,943,943,943,944,944,945,947,947,947,948,949,949,951,952,952,955,955,956,956,957,957,959,959,960,960,960,961,962,964,965,969,969,969,971,972,972,973,976,976,979,979,984,984,984,987,995,995,997,997,1000,1000,1000,1000,1004,1004,1004,1004,1008,1008,1009,1009,1009,1010,1010,1010,1012,1014,1014,1017,1017,1020,1020,1022,1022,1023,1025,1025,1025,1026,1027,1027,1028,1028,1030,1031,1033,1033,1034,1036,1037,1039,1039,1041,1041,1044,1044,1045,1047,1047,1049,1049,1050,1050,1050,1053,1053,1053,1055,1055,1056,1056,1056,1057,1061,1062,1062,1062,1063,1064,1064,1064,1065,1065,1066,1066,1066,1067,1069,1071,1071,1074,1074,1074,1080,1080,1086,1086,1089,1090,1093,1093,1094,1094,1094,1094,1096,1097,1100,1100,1100,1100,1101,1101,1101,1102,1103,1106,1107,1107,1107,1108,1108,1108,1110,1110,1111,1112,1112,1112,1112,1115,1115,1116,1117,1118,1119,1119,1121,1121,1121,1125,1125,1126,1129,1130,1133,1134,1135,1138,1138,1140,1140,1140,1141,1141,1141,1143,1144,1144,1147,1147,1148,1148,1150,1152,1153,1153,1156,1156,1158,1158,1158,1161,1162,1162,1163,1163,1163,1163,1166,1166,1166,1166,1167,1168,1169,1169,1171,1171,1174,1174,1175,1176,1176,1177,1180,1180,1181,1181,1182,1182,1183,1184,1185,1186,1186,1187,1187,1188,1190,1194,1194,1196,1196,1196,1198,1199,1199,1204,1204,1209,1210,1211,1212,1213,1214,1215,1215,1215,1216,1219,1220,1221,1222,1222,1223,1223,1225,1225,1226,1227,1227,1229,1229,1229,1229,1229,1230,1234,1239,1240,1241,1244,1245,1247,1247,1248,1249,1249,1249,1249,1249,1249,1249,1250,1250,1251,1251,1253,1255,1255,1255,1256,1256,1258,1258,1259,1261,1261,1261,1261,1262,1262,1263,1264,1264,1264,1264,1265,1265,1265,1265,1265,1265,1265,1267,1267,1267,1271,1272,1272,1272,1272,1273,1273,1274,1274,1274,1275,1276,1276,1277,1277,1280,1280,1283,1283,1285,1285,1289,1290,1295,1295,1296,1296,1298,1299,1299,1299,1299,1299,1300,1300,1302,1302,1305,1307,1307,1309,1309,1310,1310,1311,1312,1314,1314,1314,1315,1317,1317,1319,1319,1321,1321,1323,1323,1325,1326,1326,1329,1329,1330,1332,1333,1333,1334,1335,1337,1338,1340,1341,1342,1344,1345,1346,1347,1347,1347,1347,1347,1349,1350,1352,1354,1358,1360,1364,1364,1365,1365,1367,1367,1367,1370,1370,1373,1373,1374,1374,1378,1379,1379,1380,1380,1380,1381,1382,1383,1384,1384,1386,1387,1387,1387,1387,1387,1387,1389,1391,1392,1392,1392,1395,1396,1396,1396,1397,1398,1399,1399,1400,1401,1404,1404,1404,1407,1407,1410,1410,1411,1411,1411,1412,1412,1413,1413,1413,1413,1415,1416,1416,1416,1418,1419,1420,1420,1421,1424,1425,1427,1427,1428,1431,1432,1433,1434,1434,1435,1435,1436,1437,1437,1437,1440,1440,1442,1442,1443,1443,1445,1446,1446,1448,1449,1449,1449,1450,1450,1451,1455,1455,1455,1457,1458,1459,1459,1460,1460,1461,1461,1462,1463,1463,1463,1463,1463,1465,1465,1465,1465,1467,1470,1470,1471,1473,1474,1474,1475,1481,1481,1481,1482,1481,1481,1482,1482,1482,1482,1482,1486,1487,1487,1489,1490,1490,1491,1491,1491,1491,1491,1492,1492,1492,1494,1497,1497,1498,1501
Hi,
I just checked your examples. āemphasisā takes 10 nodes, āemphasisā and āemphasizeā together take 11 nodes. Your other example āroleā/ārollā also checks out at my side - āroleā takes 6 nodes and āroleā and ārollā together also take 6 nodes.
I do not know how the data above was generated. My implementation has two node counts - of the ones that have been added to the graph and the ones that have been minimized after being re-linked for conciseness. You can see how the algorithm works as shown here: Directed Acyclic Word Graphs - Part 1 - The Basics.
The numbers generated above are quite possibly implementation dependent, so it is possible that you come up with different ones or even that there is no correspondence between yours and the given ones.
I hope this helps.
Hi,
the data above was generated by running our two codes on sublists of the 1300-word list of Test 5. Element [[i]] is the number of nodes in a DAWG built out of the first [[i]] elements of the list, rather than the full list. These are well-defined DAWGs that just describe a smaller dictionary, so their minimum number of nodes should not be implementation dependent. I agree that these do not describe the intermediate steps of building the final 1300-word DAWG, which indeed can be implementation dependent. But I was still hoping that looking at those smaller sets of words may help fix the discrepancy.
I agree that āemphasisā takes 10 nodes and āemphasis, emphasizeā take 11. This is also what I get. What I checked in addition, is that if I build a DAWG with the 389 words preceding āemphasizeā (the last being āemphasisā) and then I build a separate DAWG with the same words plus āemphasizeā, the two DAWGs differ by 1 node in my implementation, but they seem to differ by 2 nodes in yours. I am not claiming that your result is wrong - perhaps the previous 388 words make a difference - but maybe it helps understanding what is going on. (Same for ārole, rollā - in my case the two DAWGs have the same number of nodes, in your case the second DAWG gains one extra node - but again, perhaps this is due to the 900+ words already in the DAWG).
I based my implementation on the very same algorithm, so it should be possible to understand what goes wrong. Actually, I closely followed the C# implementation given at the link you shared, although I think his code would not work directly (his nodes cannot be used as dictionary keys without overwriting equality and hash functions, and his function to determine the length of the common prefix seem to return 0 when a word exactly contains the previous one).
Thanks for replying.
My impression of the C# implementation was also that the common prefix sum method was bugged. I wrote my own version which made sense seeing what the algorithm was doing. I did not get into much details because I did not feel like learning C# specific stuff.
It seems to me that the identification of the nodes which is used to check for equality might make the difference. I based this part on the python implementation linked in the same article.
I was not bothered to check if the implementation itself works as a whole.
I can confirm what you noted about āemphasisā/āemphasizeā - there is really a difference of two nodes between DAWG to index 388 and 389, but as you mention the difference might come from the structure of the graph up to this point.
I would be happy to send you the code for reference in a private message. (if I do not get reprimanded by the CG staff :).
Thanks! I donāt feel thereās something wrong with sharing code in this case - weāre trying to see if the published test cases are correct, after all. If my solution is legit, this proves that there exists a DAWG with less nodes; but it could well be that my code misses some nodes and works just because of some loophole.
Iāll also try to write a solution in python following the one you mention. I found their explanation of the same algorithm less clear, but the code looks neat.
Hi again,
after trying a simple test suggested by Westicles (āanalysisā, āanalyzeā, āemphasisā, āemphasizeā) I agree that thereās something wrong with my solution, possibly something subtle about overriding hashcodes in c#.
EDIT: found the mistake: in the way I overrode the equality operator, my code was identifying nodes with multiple children A ā (B,C) with nodes with multiple generations, like A ā B ā C.
Thanks again Westicles and UnnamedCodingGamer for your time and patience!
How many nodes is need for this example?
Iām thinking with the number of people whoāve solved this I must be wrong but I canāt think why.
Doesnāt the initial example collapse to 5 nodes rather than 6?
(mermaid diagrams donāt seem to render but this should make it clear)
flowchart LR
SOURCE --> t --> a --> p --> s --> SINK
t --> o --> p
p --> SINK
Hi, you can see the graph for that one in the wiki link in the description
Labelled edges are important. Iād got it in my head that there was no difference between a labelled edge and a node with the same label but of course there is.
Cheers yep that makes it much clearer.
And the Woof tag.
i donāt get it
my result for test #2 is 143 instead of 204 - counting by hand yields the same - so i must have a wrong idea about the problem.
I wish there were some short testcases with words that differ much from each other - and donāt begin with the same letter.
what if the example included the word ālapsā would the result be 8 because i have to add a node for (ālā in the first place) AND an empty starting node ?
would still not explain my missing 61 results, of course . . .
How about trying this custom case?
Input
28
against
always
anyone
baby
budget
by
charge
degree
eight
energy
everyone
finger
generation
get
hang
history
legal
light
long
military
myself
party
tough
type
worry
yeah
year
yet
Output
86