[Community Puzzle] DAWG

Coding Games and Programming Challenges to Code Better

Send your feedback or ask for help here!

Created by @Westicles,validated by @ggrn,@Zeba and @alze.
If you have any issues, feel free to ping them.

Hey @Westicles I liked this puzzle ! I solved it multiple ways to test some ideas.

1 Like

First of all, great puzzle, I learned a lot trying to solve it so far!

I wrote a C# solution that passes the first three tests, but falls short of 5 nodes in Test 4 (1257 rather than 1262) and of 2 nodes in Test 5 (1501 vs 1503). Any ideas on what may be going on?

Rather than having an “end of word” node, I equipped every node with an attribute “IsEnd”. Two nodes are declared equal if they point to the same children through the same links and have the same “IsEnd” status. In this way my DAWG is always 1 node shorter (it is missing the End node) but this is already accounted for in the numbers I quoted above. Am I missing something? Could there be more than one End nodes?

1 Like

That sounds like it should work. Maybe try generating all words from your DAWG when you are done and see if any are missing or you have extra words?

2 Likes

Thanks for the help. I added a lookup function. It finds every word from the input list, while correctly leaving out similar words not in the list (like the ones obtained from those by adding a random character at the end, or removing the last character - unless these were already in the list, of course).

I also added a function that generates all words by using dps starting from the root node, and using that I correctly generate back the input list (and no other words).

Could we perhaps try and see if there are specific words that cause the discrepancy? For example I could share a list of the number of nodes after inserting each of the words in Test 5.

2 Likes

I made this from UnnamedCodinGamer’s c++ code (I’m terrible at python):

Test 5 nodes with each added word

3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,574,578,579,584,584,584,589,589,590,591,592,597,601,601,602,602,605,607,607,607,610,610,612,613,616,618,616,617,620,620,621,624,627,629,630,632,632,635,635,636,638,638,639,639,641,641,641,646,649,649,652,652,652,653,656,656,657,658,658,658,658,660,661,661,661,662,663,664,665,666,666,666,667,667,668,668,668,669,670,670,670,671,671,673,673,673,674,674,674,675,675,676,678,678,684,684,684,687,690,690,691,691,691,698,698,700,705,708,708,708,709,709,709,709,709,712,712,712,712,713,713,713,714,715,715,716,718,718,720,722,722,723,723,728,728,729,729,730,735,735,735,735,735,739,739,740,741,742,742,742,742,743,744,744,744,744,746,750,750,751,751,751,751,753,755,755,755,755,760,764,766,767,769,769,771,771,772,773,775,775,776,778,780,780,780,783,782,787,789,791,792,793,793,793,795,798,798,798,801,802,803,805,806,807,811,815,818,818,820,821,821,823,824,824,826,828,830,830,833,834,836,836,840,843,845,846,846,850,851,851,855,855,855,858,860,860,861,863,863,864,864,868,868,869,871,872,873,874,874,874,874,875,877,878,878,882,882,882,882,884,884,886,886,887,892,892,892,892,892,892,892,893,895,896,896,896,896,897,897,900,900,902,902,902,906,906,906,908,910,910,910,911,911,911,911,911,915,919,920,920,922,922,922,922,923,929,929,930,930,932,933,934,934,935,935,935,935,936,937,937,941,941,944,944,944,944,945,945,946,948,948,948,949,950,950,952,953,953,956,956,957,957,958,958,960,960,961,961,961,962,963,965,966,970,970,970,972,973,973,974,977,977,980,980,985,985,985,988,996,996,998,998,1001,1001,1001,1001,1005,1005,1005,1005,1009,1009,1010,1010,1010,1011,1011,1011,1013,1015,1015,1018,1018,1021,1021,1023,1023,1024,1026,1026,1026,1027,1028,1028,1029,1029,1031,1032,1034,1034,1035,1037,1038,1040,1040,1042,1042,1045,1045,1046,1048,1048,1050,1050,1051,1051,1051,1054,1054,1054,1056,1056,1057,1057,1057,1058,1062,1063,1063,1063,1064,1065,1065,1065,1066,1066,1067,1067,1067,1068,1070,1072,1072,1075,1075,1075,1081,1081,1087,1087,1090,1091,1094,1094,1095,1095,1095,1095,1097,1098,1101,1101,1101,1101,1102,1102,1102,1103,1104,1107,1108,1108,1108,1109,1109,1109,1111,1111,1112,1113,1113,1113,1113,1116,1116,1117,1118,1119,1120,1120,1122,1122,1122,1126,1126,1127,1130,1131,1134,1135,1136,1139,1139,1141,1141,1141,1142,1142,1142,1144,1145,1145,1148,1148,1149,1149,1151,1153,1154,1154,1157,1157,1159,1159,1159,1162,1163,1163,1164,1164,1164,1164,1167,1167,1167,1167,1168,1170,1170,1170,1172,1172,1175,1175,1176,1177,1177,1178,1181,1181,1182,1182,1183,1183,1184,1185,1186,1187,1187,1188,1188,1189,1191,1195,1195,1197,1197,1197,1199,1200,1200,1205,1205,1210,1211,1212,1213,1214,1215,1216,1216,1216,1217,1220,1221,1222,1223,1223,1224,1225,1227,1227,1228,1229,1229,1231,1231,1231,1231,1231,1232,1236,1241,1242,1243,1246,1247,1249,1249,1250,1251,1251,1251,1251,1251,1251,1251,1252,1252,1253,1253,1255,1257,1257,1257,1258,1258,1260,1260,1261,1263,1263,1263,1263,1264,1264,1265,1266,1266,1266,1266,1267,1267,1267,1267,1267,1267,1267,1269,1269,1269,1273,1274,1274,1274,1274,1275,1275,1276,1276,1276,1277,1278,1278,1279,1279,1282,1282,1285,1285,1287,1287,1291,1292,1297,1297,1298,1298,1300,1301,1301,1301,1301,1301,1302,1302,1304,1304,1307,1309,1309,1311,1311,1312,1312,1313,1314,1316,1316,1316,1317,1319,1319,1321,1321,1323,1323,1325,1325,1327,1328,1328,1331,1331,1332,1334,1335,1335,1336,1337,1339,1340,1342,1343,1344,1346,1347,1348,1349,1349,1349,1349,1349,1351,1352,1354,1356,1360,1362,1366,1366,1367,1367,1369,1369,1369,1372,1372,1375,1375,1376,1376,1380,1381,1381,1382,1382,1382,1383,1384,1385,1386,1386,1388,1389,1389,1389,1389,1389,1389,1391,1393,1394,1394,1394,1397,1398,1398,1398,1399,1400,1401,1401,1402,1403,1406,1406,1406,1409,1409,1412,1412,1413,1413,1413,1414,1414,1415,1415,1415,1415,1417,1418,1418,1418,1420,1421,1422,1422,1423,1426,1427,1429,1429,1430,1433,1434,1435,1436,1436,1437,1437,1438,1439,1439,1439,1442,1442,1444,1444,1445,1445,1447,1448,1448,1450,1451,1451,1451,1452,1452,1453,1457,1457,1457,1459,1460,1461,1461,1462,1462,1463,1463,1464,1465,1465,1465,1465,1465,1467,1467,1467,1467,1469,1472,1472,1473,1475,1476,1476,1477,1483,1483,1483,1484,1483,1483,1484,1484,1484,1484,1484,1488,1489,1489,1491,1492,1492,1493,1493,1493,1493,1493,1494,1494,1494,1496,1499,1499,1500,1503

Thanks! Here are my results:

The first difference occurs for the word at index 389, “emphasize” after “emphasis”, which costs me 1 additional node (572->573) but 2 nodes to Unnamed (572->574).

The second difference occurs at 988, “roll” after “role”, which costs me 0 nodes (1223->1223) and 1 node to Unnamed (1224->1225).

I’m not sure of how to use this information though, let me know if you have ideas.

Test 5 nodes with each added word

3,6,11,12,12,15,16,22,24,25,30,34,37,37,40,40,44,45,48,48,49,51,60,60,66,66,67,71,73,73,75,78,81,81,84,84,86,87,93,94,95,95,98,101,105,105,107,108,109,114,118,123,123,124,124,125,127,129,129,130,134,137,137,138,139,140,142,144,148,148,150,151,152,155,158,160,160,162,162,162,164,167,167,171,171,172,175,177,177,178,179,181,184,185,187,189,190,192,196,198,198,198,202,202,203,205,205,206,207,207,210,212,212,212,212,212,213,215,217,219,219,224,225,225,227,229,229,231,233,234,234,235,235,236,236,237,238,239,239,239,243,245,245,246,250,250,252,253,254,254,256,256,256,257,258,258,262,262,262,265,267,268,268,269,269,273,274,274,277,277,277,278,278,278,280,283,283,283,285,288,290,291,293,294,294,296,297,297,302,302,302,303,303,303,303,304,310,311,316,318,320,321,323,323,326,328,329,329,330,332,332,332,332,333,334,335,336,337,341,345,345,345,350,351,351,353,353,353,354,354,358,358,360,362,362,363,367,368,368,372,374,376,376,377,378,378,383,386,386,388,391,394,394,395,397,398,401,401,406,406,408,413,413,415,416,416,419,419,419,422,425,425,426,426,430,430,431,431,431,434,436,437,438,438,441,441,445,445,446,446,448,450,450,453,453,455,455,455,456,459,459,459,461,461,461,463,463,465,465,465,467,467,469,471,472,473,475,476,480,480,482,482,484,484,486,486,490,491,493,494,495,497,500,500,500,502,505,505,508,510,510,510,514,514,518,521,524,525,525,525,527,527,529,529,529,530,531,532,534,534,534,534,538,540,540,541,541,543,543,543,544,544,545,546,546,546,547,547,550,551,552,553,554,556,559,563,566,569,572,573,577,578,583,583,583,588,588,589,590,591,596,600,600,601,601,604,606,606,606,609,609,611,612,615,617,615,616,619,619,620,623,626,628,629,631,631,634,634,635,637,637,638,638,640,640,640,645,648,648,651,651,651,652,655,655,656,657,657,657,657,659,660,660,660,661,662,663,664,665,665,665,666,666,667,667,667,668,669,669,669,670,670,672,672,672,673,673,673,674,674,675,677,677,683,683,683,686,689,689,690,690,690,697,697,699,704,707,707,707,708,708,708,708,708,711,711,711,711,712,712,712,713,714,714,715,717,717,719,721,721,722,722,727,727,728,728,729,734,734,734,734,734,738,738,739,740,741,741,741,741,742,743,743,743,743,745,749,749,750,750,750,750,752,754,754,754,754,759,763,765,766,768,768,770,770,771,772,774,774,775,777,779,779,779,782,781,786,788,790,791,792,792,792,794,797,797,797,800,801,802,804,805,806,810,814,817,817,819,820,820,822,823,823,825,827,829,829,832,833,835,835,839,842,844,845,845,849,850,850,854,854,854,857,859,859,860,862,862,863,863,867,867,868,870,871,872,873,873,873,873,874,876,877,877,881,881,881,881,883,883,885,885,886,891,891,891,891,891,891,891,892,894,895,895,895,895,896,896,899,899,901,901,901,905,905,905,907,909,909,909,910,910,910,910,910,914,918,919,919,921,921,921,921,922,928,928,929,929,931,932,933,933,934,934,934,934,935,936,936,940,940,943,943,943,943,944,944,945,947,947,947,948,949,949,951,952,952,955,955,956,956,957,957,959,959,960,960,960,961,962,964,965,969,969,969,971,972,972,973,976,976,979,979,984,984,984,987,995,995,997,997,1000,1000,1000,1000,1004,1004,1004,1004,1008,1008,1009,1009,1009,1010,1010,1010,1012,1014,1014,1017,1017,1020,1020,1022,1022,1023,1025,1025,1025,1026,1027,1027,1028,1028,1030,1031,1033,1033,1034,1036,1037,1039,1039,1041,1041,1044,1044,1045,1047,1047,1049,1049,1050,1050,1050,1053,1053,1053,1055,1055,1056,1056,1056,1057,1061,1062,1062,1062,1063,1064,1064,1064,1065,1065,1066,1066,1066,1067,1069,1071,1071,1074,1074,1074,1080,1080,1086,1086,1089,1090,1093,1093,1094,1094,1094,1094,1096,1097,1100,1100,1100,1100,1101,1101,1101,1102,1103,1106,1107,1107,1107,1108,1108,1108,1110,1110,1111,1112,1112,1112,1112,1115,1115,1116,1117,1118,1119,1119,1121,1121,1121,1125,1125,1126,1129,1130,1133,1134,1135,1138,1138,1140,1140,1140,1141,1141,1141,1143,1144,1144,1147,1147,1148,1148,1150,1152,1153,1153,1156,1156,1158,1158,1158,1161,1162,1162,1163,1163,1163,1163,1166,1166,1166,1166,1167,1168,1169,1169,1171,1171,1174,1174,1175,1176,1176,1177,1180,1180,1181,1181,1182,1182,1183,1184,1185,1186,1186,1187,1187,1188,1190,1194,1194,1196,1196,1196,1198,1199,1199,1204,1204,1209,1210,1211,1212,1213,1214,1215,1215,1215,1216,1219,1220,1221,1222,1222,1223,1223,1225,1225,1226,1227,1227,1229,1229,1229,1229,1229,1230,1234,1239,1240,1241,1244,1245,1247,1247,1248,1249,1249,1249,1249,1249,1249,1249,1250,1250,1251,1251,1253,1255,1255,1255,1256,1256,1258,1258,1259,1261,1261,1261,1261,1262,1262,1263,1264,1264,1264,1264,1265,1265,1265,1265,1265,1265,1265,1267,1267,1267,1271,1272,1272,1272,1272,1273,1273,1274,1274,1274,1275,1276,1276,1277,1277,1280,1280,1283,1283,1285,1285,1289,1290,1295,1295,1296,1296,1298,1299,1299,1299,1299,1299,1300,1300,1302,1302,1305,1307,1307,1309,1309,1310,1310,1311,1312,1314,1314,1314,1315,1317,1317,1319,1319,1321,1321,1323,1323,1325,1326,1326,1329,1329,1330,1332,1333,1333,1334,1335,1337,1338,1340,1341,1342,1344,1345,1346,1347,1347,1347,1347,1347,1349,1350,1352,1354,1358,1360,1364,1364,1365,1365,1367,1367,1367,1370,1370,1373,1373,1374,1374,1378,1379,1379,1380,1380,1380,1381,1382,1383,1384,1384,1386,1387,1387,1387,1387,1387,1387,1389,1391,1392,1392,1392,1395,1396,1396,1396,1397,1398,1399,1399,1400,1401,1404,1404,1404,1407,1407,1410,1410,1411,1411,1411,1412,1412,1413,1413,1413,1413,1415,1416,1416,1416,1418,1419,1420,1420,1421,1424,1425,1427,1427,1428,1431,1432,1433,1434,1434,1435,1435,1436,1437,1437,1437,1440,1440,1442,1442,1443,1443,1445,1446,1446,1448,1449,1449,1449,1450,1450,1451,1455,1455,1455,1457,1458,1459,1459,1460,1460,1461,1461,1462,1463,1463,1463,1463,1463,1465,1465,1465,1465,1467,1470,1470,1471,1473,1474,1474,1475,1481,1481,1481,1482,1481,1481,1482,1482,1482,1482,1482,1486,1487,1487,1489,1490,1490,1491,1491,1491,1491,1491,1492,1492,1492,1494,1497,1497,1498,1501

1 Like

Hi,
I just checked your examples. “emphasis” takes 10 nodes, “emphasis” and “emphasize” together take 11 nodes. Your other example “role”/“roll” also checks out at my side - “role” takes 6 nodes and “role” and “roll” together also take 6 nodes.
I do not know how the data above was generated. My implementation has two node counts - of the ones that have been added to the graph and the ones that have been minimized after being re-linked for conciseness. You can see how the algorithm works as shown here: Directed Acyclic Word Graphs - Part 1 - The Basics.
The numbers generated above are quite possibly implementation dependent, so it is possible that you come up with different ones or even that there is no correspondence between yours and the given ones.
I hope this helps.

1 Like

Hi,
the data above was generated by running our two codes on sublists of the 1300-word list of Test 5. Element [[i]] is the number of nodes in a DAWG built out of the first [[i]] elements of the list, rather than the full list. These are well-defined DAWGs that just describe a smaller dictionary, so their minimum number of nodes should not be implementation dependent. I agree that these do not describe the intermediate steps of building the final 1300-word DAWG, which indeed can be implementation dependent. But I was still hoping that looking at those smaller sets of words may help fix the discrepancy.

I agree that “emphasis” takes 10 nodes and “emphasis, emphasize” take 11. This is also what I get. What I checked in addition, is that if I build a DAWG with the 389 words preceding “emphasize” (the last being “emphasis”) and then I build a separate DAWG with the same words plus “emphasize”, the two DAWGs differ by 1 node in my implementation, but they seem to differ by 2 nodes in yours. I am not claiming that your result is wrong - perhaps the previous 388 words make a difference - but maybe it helps understanding what is going on. (Same for “role, roll” - in my case the two DAWGs have the same number of nodes, in your case the second DAWG gains one extra node - but again, perhaps this is due to the 900+ words already in the DAWG).

I based my implementation on the very same algorithm, so it should be possible to understand what goes wrong. Actually, I closely followed the C# implementation given at the link you shared, although I think his code would not work directly (his nodes cannot be used as dictionary keys without overwriting equality and hash functions, and his function to determine the length of the common prefix seem to return 0 when a word exactly contains the previous one).

1 Like

Thanks for replying.
My impression of the C# implementation was also that the common prefix sum method was bugged. I wrote my own version which made sense seeing what the algorithm was doing. I did not get into much details because I did not feel like learning C# specific stuff.
It seems to me that the identification of the nodes which is used to check for equality might make the difference. I based this part on the python implementation linked in the same article.
I was not bothered to check if the implementation itself works as a whole.
I can confirm what you noted about “emphasis”/“emphasize” - there is really a difference of two nodes between DAWG to index 388 and 389, but as you mention the difference might come from the structure of the graph up to this point.
I would be happy to send you the code for reference in a private message. (if I do not get reprimanded by the CG staff :).

2 Likes

Thanks! I don’t feel there’s something wrong with sharing code in this case - we’re trying to see if the published test cases are correct, after all. If my solution is legit, this proves that there exists a DAWG with less nodes; but it could well be that my code misses some nodes and works just because of some loophole.

I’ll also try to write a solution in python following the one you mention. I found their explanation of the same algorithm less clear, but the code looks neat.

1 Like

Hi again,
after trying a simple test suggested by Westicles (“analysis”, “analyze”, “emphasis”, “emphasize”) I agree that there’s something wrong with my solution, possibly something subtle about overriding hashcodes in c#.

EDIT: found the mistake: in the way I overrode the equality operator, my code was identifying nodes with multiple children A → (B,C) with nodes with multiple generations, like A → B → C.

Thanks again Westicles and UnnamedCodingGamer for your time and patience!

3 Likes

How many nodes is need for this example?

I’m thinking with the number of people who’ve solved this I must be wrong but I can’t think why.

Doesn’t the initial example collapse to 5 nodes rather than 6?

(mermaid diagrams don’t seem to render but this should make it clear)

flowchart LR
    SOURCE --> t --> a --> p --> s --> SINK
    t --> o --> p
    p --> SINK

Hi, you can see the graph for that one in the wiki link in the description

Labelled edges are important. I’d got it in my head that there was no difference between a labelled edge and a node with the same label but of course there is.

Cheers yep that makes it much clearer.