[Community Puzzle] DAWG

Coding Games and Programming Challenges to Code Better

Send your feedback or ask for help here!

Created by @Westicles,validated by @ggrn,@Zeba and @alze.
If you have any issues, feel free to ping them.

Hey @Westicles I liked this puzzle ! I solved it multiple ways to test some ideas.

1 Like

First of all, great puzzle, I learned a lot trying to solve it so far!

I wrote a C# solution that passes the first three tests, but falls short of 5 nodes in Test 4 (1257 rather than 1262) and of 2 nodes in Test 5 (1501 vs 1503). Any ideas on what may be going on?

Rather than having an ā€œend of wordā€ node, I equipped every node with an attribute ā€œIsEndā€. Two nodes are declared equal if they point to the same children through the same links and have the same ā€œIsEndā€ status. In this way my DAWG is always 1 node shorter (it is missing the End node) but this is already accounted for in the numbers I quoted above. Am I missing something? Could there be more than one End nodes?

1 Like

That sounds like it should work. Maybe try generating all words from your DAWG when you are done and see if any are missing or you have extra words?

2 Likes

Thanks for the help. I added a lookup function. It finds every word from the input list, while correctly leaving out similar words not in the list (like the ones obtained from those by adding a random character at the end, or removing the last character - unless these were already in the list, of course).

I also added a function that generates all words by using dps starting from the root node, and using that I correctly generate back the input list (and no other words).

Could we perhaps try and see if there are specific words that cause the discrepancy? For example I could share a list of the number of nodes after inserting each of the words in Test 5.

2 Likes

I made this from UnnamedCodinGamerā€™s c++ code (Iā€™m terrible at python):

Test 5 nodes with each added word

3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,574,578,579,584,584,584,589,589,590,591,592,597,601,601,602,602,605,607,607,607,610,610,612,613,616,618,616,617,620,620,621,624,627,629,630,632,632,635,635,636,638,638,639,639,641,641,641,646,649,649,652,652,652,653,656,656,657,658,658,658,658,660,661,661,661,662,663,664,665,666,666,666,667,667,668,668,668,669,670,670,670,671,671,673,673,673,674,674,674,675,675,676,678,678,684,684,684,687,690,690,691,691,691,698,698,700,705,708,708,708,709,709,709,709,709,712,712,712,712,713,713,713,714,715,715,716,718,718,720,722,722,723,723,728,728,729,729,730,735,735,735,735,735,739,739,740,741,742,742,742,742,743,744,744,744,744,746,750,750,751,751,751,751,753,755,755,755,755,760,764,766,767,769,769,771,771,772,773,775,775,776,778,780,780,780,783,782,787,789,791,792,793,793,793,795,798,798,798,801,802,803,805,806,807,811,815,818,818,820,821,821,823,824,824,826,828,830,830,833,834,836,836,840,843,845,846,846,850,851,851,855,855,855,858,860,860,861,863,863,864,864,868,868,869,871,872,873,874,874,874,874,875,877,878,878,882,882,882,882,884,884,886,886,887,892,892,892,892,892,892,892,893,895,896,896,896,896,897,897,900,900,902,902,902,906,906,906,908,910,910,910,911,911,911,911,911,915,919,920,920,922,922,922,922,923,929,929,930,930,932,933,934,934,935,935,935,935,936,937,937,941,941,944,944,944,944,945,945,946,948,948,948,949,950,950,952,953,953,956,956,957,957,958,958,960,960,961,961,961,962,963,965,966,970,970,970,972,973,973,974,977,977,980,980,985,985,985,988,996,996,998,998,1001,1001,1001,1001,1005,1005,1005,1005,1009,1009,1010,1010,1010,1011,1011,1011,1013,1015,1015,1018,1018,1021,1021,1023,1023,1024,1026,1026,1026,1027,1028,1028,1029,1029,1031,1032,1034,1034,1035,1037,1038,1040,1040,1042,1042,1045,1045,1046,1048,1048,1050,1050,1051,1051,1051,1054,1054,1054,1056,1056,1057,1057,1057,1058,1062,1063,1063,1063,1064,1065,1065,1065,1066,1066,1067,1067,1067,1068,1070,1072,1072,1075,1075,1075,1081,1081,1087,1087,1090,1091,1094,1094,1095,1095,1095,1095,1097,1098,1101,1101,1101,1101,1102,1102,1102,1103,1104,1107,1108,1108,1108,1109,1109,1109,1111,1111,1112,1113,1113,1113,1113,1116,1116,1117,1118,1119,1120,1120,1122,1122,1122,1126,1126,1127,1130,1131,1134,1135,1136,1139,1139,1141,1141,1141,1142,1142,1142,1144,1145,1145,1148,1148,1149,1149,1151,1153,1154,1154,1157,1157,1159,1159,1159,1162,1163,1163,1164,1164,1164,1164,1167,1167,1167,1167,1168,1170,1170,1170,1172,1172,1175,1175,1176,1177,1177,1178,1181,1181,1182,1182,1183,1183,1184,1185,1186,1187,1187,1188,1188,1189,1191,1195,1195,1197,1197,1197,1199,1200,1200,1205,1205,1210,1211,1212,1213,1214,1215,1216,1216,1216,1217,1220,1221,1222,1223,1223,1224,1225,1227,1227,1228,1229,1229,1231,1231,1231,1231,1231,1232,1236,1241,1242,1243,1246,1247,1249,1249,1250,1251,1251,1251,1251,1251,1251,1251,1252,1252,1253,1253,1255,1257,1257,1257,1258,1258,1260,1260,1261,1263,1263,1263,1263,1264,1264,1265,1266,1266,1266,1266,1267,1267,1267,1267,1267,1267,1267,1269,1269,1269,1273,1274,1274,1274,1274,1275,1275,1276,1276,1276,1277,1278,1278,1279,1279,1282,1282,1285,1285,1287,1287,1291,1292,1297,1297,1298,1298,1300,1301,1301,1301,1301,1301,1302,1302,1304,1304,1307,1309,1309,1311,1311,1312,1312,1313,1314,1316,1316,1316,1317,1319,1319,1321,1321,1323,1323,1325,1325,1327,1328,1328,1331,1331,1332,1334,1335,1335,1336,1337,1339,1340,1342,1343,1344,1346,1347,1348,1349,1349,1349,1349,1349,1351,1352,1354,1356,1360,1362,1366,1366,1367,1367,1369,1369,1369,1372,1372,1375,1375,1376,1376,1380,1381,1381,1382,1382,1382,1383,1384,1385,1386,1386,1388,1389,1389,1389,1389,1389,1389,1391,1393,1394,1394,1394,1397,1398,1398,1398,1399,1400,1401,1401,1402,1403,1406,1406,1406,1409,1409,1412,1412,1413,1413,1413,1414,1414,1415,1415,1415,1415,1417,1418,1418,1418,1420,1421,1422,1422,1423,1426,1427,1429,1429,1430,1433,1434,1435,1436,1436,1437,1437,1438,1439,1439,1439,1442,1442,1444,1444,1445,1445,1447,1448,1448,1450,1451,1451,1451,1452,1452,1453,1457,1457,1457,1459,1460,1461,1461,1462,1462,1463,1463,1464,1465,1465,1465,1465,1465,1467,1467,1467,1467,1469,1472,1472,1473,1475,1476,1476,1477,1483,1483,1483,1484,1483,1483,1484,1484,1484,1484,1484,1488,1489,1489,1491,1492,1492,1493,1493,1493,1493,1493,1494,1494,1494,1496,1499,1499,1500,1503

Thanks! Here are my results:

The first difference occurs for the word at index 389, ā€œemphasizeā€ after ā€œemphasisā€, which costs me 1 additional node (572->573) but 2 nodes to Unnamed (572->574).

The second difference occurs at 988, ā€œrollā€ after ā€œroleā€, which costs me 0 nodes (1223->1223) and 1 node to Unnamed (1224->1225).

Iā€™m not sure of how to use this information though, let me know if you have ideas.

Test 5 nodes with each added word

3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,573,577,578,583,583,583,588,588,589,590,591,596,600,600,601,601,604,606,606,606,609,609,611,612,615,617,615,616,619,619,620,623,626,628,629,631,631,634,634,635,637,637,638,638,640,640,640,645,648,648,651,651,651,652,655,655,656,657,657,657,657,659,660,660,660,661,662,663,664,665,665,665,666,666,667,667,667,668,669,669,669,670,670,672,672,672,673,673,673,674,674,675,677,677,683,683,683,686,689,689,690,690,690,697,697,699,704,707,707,707,708,708,708,708,708,711,711,711,711,712,712,712,713,714,714,715,717,717,719,721,721,722,722,727,727,728,728,729,734,734,734,734,734,738,738,739,740,741,741,741,741,742,743,743,743,743,745,749,749,750,750,750,750,752,754,754,754,754,759,763,765,766,768,768,770,770,771,772,774,774,775,777,779,779,779,782,781,786,788,790,791,792,792,792,794,797,797,797,800,801,802,804,805,806,810,814,817,817,819,820,820,822,823,823,825,827,829,829,832,833,835,835,839,842,844,845,845,849,850,850,854,854,854,857,859,859,860,862,862,863,863,867,867,868,870,871,872,873,873,873,873,874,876,877,877,881,881,881,881,883,883,885,885,886,891,891,891,891,891,891,891,892,894,895,895,895,895,896,896,899,899,901,901,901,905,905,905,907,909,909,909,910,910,910,910,910,914,918,919,919,921,921,921,921,922,928,928,929,929,931,932,933,933,934,934,934,934,935,936,936,940,940,943,943,943,943,944,944,945,947,947,947,948,949,949,951,952,952,955,955,956,956,957,957,959,959,960,960,960,961,962,964,965,969,969,969,971,972,972,973,976,976,979,979,984,984,984,987,995,995,997,997,1000,1000,1000,1000,1004,1004,1004,1004,1008,1008,1009,1009,1009,1010,1010,1010,1012,1014,1014,1017,1017,1020,1020,1022,1022,1023,1025,1025,1025,1026,1027,1027,1028,1028,1030,1031,1033,1033,1034,1036,1037,1039,1039,1041,1041,1044,1044,1045,1047,1047,1049,1049,1050,1050,1050,1053,1053,1053,1055,1055,1056,1056,1056,1057,1061,1062,1062,1062,1063,1064,1064,1064,1065,1065,1066,1066,1066,1067,1069,1071,1071,1074,1074,1074,1080,1080,1086,1086,1089,1090,1093,1093,1094,1094,1094,1094,1096,1097,1100,1100,1100,1100,1101,1101,1101,1102,1103,1106,1107,1107,1107,1108,1108,1108,1110,1110,1111,1112,1112,1112,1112,1115,1115,1116,1117,1118,1119,1119,1121,1121,1121,1125,1125,1126,1129,1130,1133,1134,1135,1138,1138,1140,1140,1140,1141,1141,1141,1143,1144,1144,1147,1147,1148,1148,1150,1152,1153,1153,1156,1156,1158,1158,1158,1161,1162,1162,1163,1163,1163,1163,1166,1166,1166,1166,1167,1168,1169,1169,1171,1171,1174,1174,1175,1176,1176,1177,1180,1180,1181,1181,1182,1182,1183,1184,1185,1186,1186,1187,1187,1188,1190,1194,1194,1196,1196,1196,1198,1199,1199,1204,1204,1209,1210,1211,1212,1213,1214,1215,1215,1215,1216,1219,1220,1221,1222,1222,1223,1223,1225,1225,1226,1227,1227,1229,1229,1229,1229,1229,1230,1234,1239,1240,1241,1244,1245,1247,1247,1248,1249,1249,1249,1249,1249,1249,1249,1250,1250,1251,1251,1253,1255,1255,1255,1256,1256,1258,1258,1259,1261,1261,1261,1261,1262,1262,1263,1264,1264,1264,1264,1265,1265,1265,1265,1265,1265,1265,1267,1267,1267,1271,1272,1272,1272,1272,1273,1273,1274,1274,1274,1275,1276,1276,1277,1277,1280,1280,1283,1283,1285,1285,1289,1290,1295,1295,1296,1296,1298,1299,1299,1299,1299,1299,1300,1300,1302,1302,1305,1307,1307,1309,1309,1310,1310,1311,1312,1314,1314,1314,1315,1317,1317,1319,1319,1321,1321,1323,1323,1325,1326,1326,1329,1329,1330,1332,1333,1333,1334,1335,1337,1338,1340,1341,1342,1344,1345,1346,1347,1347,1347,1347,1347,1349,1350,1352,1354,1358,1360,1364,1364,1365,1365,1367,1367,1367,1370,1370,1373,1373,1374,1374,1378,1379,1379,1380,1380,1380,1381,1382,1383,1384,1384,1386,1387,1387,1387,1387,1387,1387,1389,1391,1392,1392,1392,1395,1396,1396,1396,1397,1398,1399,1399,1400,1401,1404,1404,1404,1407,1407,1410,1410,1411,1411,1411,1412,1412,1413,1413,1413,1413,1415,1416,1416,1416,1418,1419,1420,1420,1421,1424,1425,1427,1427,1428,1431,1432,1433,1434,1434,1435,1435,1436,1437,1437,1437,1440,1440,1442,1442,1443,1443,1445,1446,1446,1448,1449,1449,1449,1450,1450,1451,1455,1455,1455,1457,1458,1459,1459,1460,1460,1461,1461,1462,1463,1463,1463,1463,1463,1465,1465,1465,1465,1467,1470,1470,1471,1473,1474,1474,1475,1481,1481,1481,1482,1481,1481,1482,1482,1482,1482,1482,1486,1487,1487,1489,1490,1490,1491,1491,1491,1491,1491,1492,1492,1492,1494,1497,1497,1498,1501

1 Like

Hi,
I just checked your examples. ā€œemphasisā€ takes 10 nodes, ā€œemphasisā€ and ā€œemphasizeā€ together take 11 nodes. Your other example ā€œroleā€/ā€œrollā€ also checks out at my side - ā€œroleā€ takes 6 nodes and ā€œroleā€ and ā€œrollā€ together also take 6 nodes.
I do not know how the data above was generated. My implementation has two node counts - of the ones that have been added to the graph and the ones that have been minimized after being re-linked for conciseness. You can see how the algorithm works as shown here: Directed Acyclic Word Graphs - Part 1 - The Basics.
The numbers generated above are quite possibly implementation dependent, so it is possible that you come up with different ones or even that there is no correspondence between yours and the given ones.
I hope this helps.

1 Like

Hi,
the data above was generated by running our two codes on sublists of the 1300-word list of Test 5. Element [[i]] is the number of nodes in a DAWG built out of the first [[i]] elements of the list, rather than the full list. These are well-defined DAWGs that just describe a smaller dictionary, so their minimum number of nodes should not be implementation dependent. I agree that these do not describe the intermediate steps of building the final 1300-word DAWG, which indeed can be implementation dependent. But I was still hoping that looking at those smaller sets of words may help fix the discrepancy.

I agree that ā€œemphasisā€ takes 10 nodes and ā€œemphasis, emphasizeā€ take 11. This is also what I get. What I checked in addition, is that if I build a DAWG with the 389 words preceding ā€œemphasizeā€ (the last being ā€œemphasisā€) and then I build a separate DAWG with the same words plus ā€œemphasizeā€, the two DAWGs differ by 1 node in my implementation, but they seem to differ by 2 nodes in yours. I am not claiming that your result is wrong - perhaps the previous 388 words make a difference - but maybe it helps understanding what is going on. (Same for ā€œrole, rollā€ - in my case the two DAWGs have the same number of nodes, in your case the second DAWG gains one extra node - but again, perhaps this is due to the 900+ words already in the DAWG).

I based my implementation on the very same algorithm, so it should be possible to understand what goes wrong. Actually, I closely followed the C# implementation given at the link you shared, although I think his code would not work directly (his nodes cannot be used as dictionary keys without overwriting equality and hash functions, and his function to determine the length of the common prefix seem to return 0 when a word exactly contains the previous one).

1 Like

Thanks for replying.
My impression of the C# implementation was also that the common prefix sum method was bugged. I wrote my own version which made sense seeing what the algorithm was doing. I did not get into much details because I did not feel like learning C# specific stuff.
It seems to me that the identification of the nodes which is used to check for equality might make the difference. I based this part on the python implementation linked in the same article.
I was not bothered to check if the implementation itself works as a whole.
I can confirm what you noted about ā€œemphasisā€/ā€œemphasizeā€ - there is really a difference of two nodes between DAWG to index 388 and 389, but as you mention the difference might come from the structure of the graph up to this point.
I would be happy to send you the code for reference in a private message. (if I do not get reprimanded by the CG staff :).

2 Likes

Thanks! I donā€™t feel thereā€™s something wrong with sharing code in this case - weā€™re trying to see if the published test cases are correct, after all. If my solution is legit, this proves that there exists a DAWG with less nodes; but it could well be that my code misses some nodes and works just because of some loophole.

Iā€™ll also try to write a solution in python following the one you mention. I found their explanation of the same algorithm less clear, but the code looks neat.

1 Like

Hi again,
after trying a simple test suggested by Westicles (ā€œanalysisā€, ā€œanalyzeā€, ā€œemphasisā€, ā€œemphasizeā€) I agree that thereā€™s something wrong with my solution, possibly something subtle about overriding hashcodes in c#.

EDIT: found the mistake: in the way I overrode the equality operator, my code was identifying nodes with multiple children A ā†’ (B,C) with nodes with multiple generations, like A ā†’ B ā†’ C.

Thanks again Westicles and UnnamedCodingGamer for your time and patience!

3 Likes

How many nodes is need for this example?

Iā€™m thinking with the number of people whoā€™ve solved this I must be wrong but I canā€™t think why.

Doesnā€™t the initial example collapse to 5 nodes rather than 6?

(mermaid diagrams donā€™t seem to render but this should make it clear)

flowchart LR
    SOURCE --> t --> a --> p --> s --> SINK
    t --> o --> p
    p --> SINK

Hi, you can see the graph for that one in the wiki link in the description

Labelled edges are important. Iā€™d got it in my head that there was no difference between a labelled edge and a node with the same label but of course there is.

Cheers yep that makes it much clearer.

I think this problem deserves the Tries tag!

And the Woof tag. :blush:

1 Like

i donā€™t get it

my result for test #2 is 143 instead of 204 - counting by hand yields the same - so i must have a wrong idea about the problem.

I wish there were some short testcases with words that differ much from each other - and donā€™t begin with the same letter.

what if the example included the word ā€˜lapsā€™ would the result be 8 because i have to add a node for (ā€˜lā€™ in the first place) AND an empty starting node ?

would still not explain my missing 61 results, of course . . .

How about trying this custom case?

Input

28
against
always
anyone
baby
budget
by
charge
degree
eight
energy
everyone
finger
generation
get
hang
history
legal
light
long
military
myself
party
tough
type
worry
yeah
year
yet

Output

86