Review of existing puzzles

Hi All,

I’ve been somewhat active with community puzzles recently, and I’m concerned that CodinGame may not be providing the best user experience, especially for newcomers who may stumble upon puzzles that are misleading, miscategorized, or not on par with the rest of the site’s offerings. Puzzles are already removed when their rating drops below 3, but sometimes good puzzle ideas have problematic implementations that could be a whole lot better.

What I’d like to suggest is establishing other criteria for reviewing existing puzzles (a low bar, if you will), or applying a triple nomination scheme or something similar, so that critical feedback given after the initial 3 approvals can be taken into account. My specific suggestion is to identify such puzzles based on completion success rate and occasionally take one through the approval process again, to at least identify its weaknesses and decide whether there is a pressing enough need to do anything about them.

In some but not all cases, the best course of action may be to change the puzzle. Of course, we would want this to be rare, since it can be extremely disruptive. But with procedures in place, the negatives can be mitigated. For instance, it may make sense to recreate a puzzle under a similar name and deprecate the old one. I’d like to start talking about doing exactly that with Parse SQL Queries, an easy puzzle with a 12.5% success rate: less than half that of the next lowest, and far below the 65% bottom quartile for easy puzzles. With an average rating of 3.84 submitted by over 500 users who were able to solve it, the puzzle is in no danger of being removed.

I’m raising this as a discussion topic first because I don’t want to be too bold. Sometimes there are changes to be made that appear obvious to me but apparently not so to others, so I’d appreciate any feedback to align with community expectations.

If it weren’t for the rule about originality, my next course of action would be to replicate Parse SQL Queries and fix any design issues myself. That’s because obtaining permission to edit is cumbersome even when changes are relatively minor, and I have no reason to believe they would be minor here. Furthermore, with 500 solvers, most edits are out of the question. Adjusting the difficulty level may be an exception, but the success rate is lower than that of any medium puzzle, so that feels like a band-aid.

Any thoughts?

Cheers!
David Villa


Just a slight clarification first: a player can rate a puzzle after submitting their code once, even if the code doesn’t pass any validators! However, I assume most ratings are submitted by players who have successfully solved the puzzle being rated.

Parse SQL Queries is an interesting case because the author has removed their account, making it potentially impossible to obtain their permission for you to edit the puzzle. However, if you feel there are fundamental issues with the puzzle that need fixing, please proceed with the edits and leave a comment in the contribution area detailing what you’ve edited and why (I’ve done so myself, though usually for minor fixes). Alternatively, if you prefer to get feedback from fellow players before editing, you can post in the discussion thread of the puzzle here.

I’m not so sure about the idea of creating a new formal process to review / deprecate and recreate existing puzzles.

I fully agree that the difficulty level is miscategorised for some puzzles, but I’m hesitant to change it since difficulty can be highly subjective. I’d prefer a new system where difficulty is adjusted automatically based on success rate and the number of people who have solved the puzzle. This issue alone warrants starting a new forum topic.
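To make the idea of automatic adjustment a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical rule where a level is only suggested once a puzzle has enough solvers for its success rate to be meaningful; the cutoffs in `LEVEL_CUTOFFS` and the `min_solvers` default are illustrative values, not anything CodinGame actually uses.

```python
from typing import Optional

# Hypothetical cutoffs: the minimum success rate for each level, checked from
# easiest to hardest. These values are illustrative, not CodinGame's.
LEVEL_CUTOFFS = [
    ("easy", 0.5),
    ("medium", 0.25),
    ("hard", 0.125),
]


def suggest_difficulty(success_rate: float, solvers: int,
                       min_solvers: int = 100) -> Optional[str]:
    """Return a suggested difficulty, or None if too few players have solved it."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # With only a handful of solvers the success rate is too noisy to act on.
    if solvers < min_solvers:
        return None
    for level, cutoff in LEVEL_CUTOFFS:
        if success_rate >= cutoff:
            return level
    # Anything below the hardest cutoff lands in the top level.
    return "very hard"


print(suggest_difficulty(0.125, 500))  # hard
print(suggest_difficulty(0.70, 40))    # None (not enough solvers)
```

The minimum-solvers guard is the part that ties in the number of people who have solved the puzzle: without it, a brand-new puzzle could bounce between levels after a few attempts.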


Ah, thanks for the clarification. The 3.84 stars for Parse SQL Queries is based on 188 ratings, presumably most submitted after completing the puzzle, though we can’t really tell from the stats. Importantly, it is not at risk of removal. It would take 26 more 1-star ratings (as many 1-star ratings again as it has now, but consecutive) to bring it below the threshold of 3. So in this case there is a very apparent failure that the automated system has not caught and is not going to catch.

The idea of adjusting the difficulty level automatically is entirely relevant here. I would like to establish a process for handling these cases, and an automated system is still a process that needs defining. My worry about making it automatic is that we don’t yet specifically know why there are issues, or at least I don’t. You have more experience. If it turns out all we’re lacking is a clear delineation between difficulty levels, then I’d be all for automation.

Let me say what I’ve seen based on comments. Only in one case so far does it appear to be solely a matter of the puzzle being misclassified. That may be so for Parse SQL Queries as well, if it truly deserves to be all the way up at the hard level, which is doubtful. I will push forward with the investigation on the puzzle thread and summarize the results here. Yet there is a trend in the data that leads me to think this change in isolation would just mask some underlying issue. It may seem somewhat arbitrary, but the stats fit fairly well:

Most easy puzzles have a success rate better than 1/2. Only 5.6% of active easy puzzles do not.
Most medium puzzles have a success rate better than 1/4. Only 2.5% of active medium puzzles do not.
Most hard puzzles have a success rate better than 1/8. Only 2.9% of active hard puzzles do not.
Most very hard puzzles have a success rate better than 1/16. Only 6.5% of active very hard puzzles do not.
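The guideline above can be written as a simple check where the expected success rate halves with each level. The Python sketch below is a rough illustration, not an actual CodinGame tool: it flags puzzles that fall below their level’s threshold and computes the share of flagged puzzles per level. The function names and the sample puzzles (other than the 12.5% figure for Parse SQL Queries) are hypothetical.

```python
from collections import defaultdict

# Thresholds from the guideline: each level halves the expected success rate.
THRESHOLDS = {"easy": 1 / 2, "medium": 1 / 4, "hard": 1 / 8, "very hard": 1 / 16}


def below_threshold(level: str, success_rate: float) -> bool:
    """True if a puzzle's success rate falls below the guideline for its level."""
    return success_rate < THRESHOLDS[level]


def share_below(puzzles):
    """Fraction of puzzles per level that fall below the guideline.

    `puzzles` is an iterable of (name, level, success_rate) tuples.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for _name, level, rate in puzzles:
        totals[level] += 1
        if below_threshold(level, rate):
            flagged[level] += 1
    return {level: flagged[level] / totals[level] for level in totals}


# Hypothetical data, apart from the 12.5% success rate quoted for Parse SQL Queries.
sample = [
    ("Parse SQL Queries", "easy", 0.125),
    ("Easy puzzle A", "easy", 0.80),
    ("Easy puzzle B", "easy", 0.65),
    ("Hard puzzle C", "hard", 0.20),
]
print(share_below(sample))  # {'easy': 0.3333333333333333, 'hard': 0.0}
```

Run against the real list of active puzzles, the per-level shares would correspond to the percentages quoted above, and the flagged puzzles would be the candidates for another pass through the approval process.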

If we exclude the Netflix contest, which is intended to have a low success rate, and 2 puzzles that I would already have bumped up from easy if not for the issue of permissions, then these figures fall below 5% at all levels. One potential reason medium and hard puzzles fare better against these guidelines is that there are misclassifications in the other direction, whereas the very hard level is more guarded. It could also be that, on the whole, easy puzzles tend to be written by less experienced authors who let more flaws slip in. Regardless, it should be very instructive to have some guideline to follow.

Just one minor thought on this interesting thread: I think success rates are heavily influenced by which players try which puzzles. Many codingamers avoid medium, hard, and very hard puzzles (I myself regularly solve easy puzzles, but only start looking at medium and above when I really feel like it). So higher success rates among medium and hard puzzles might simply be because those puzzles are mostly attempted by more experienced codingamers.

Related to this, I’m not sure about basing difficulty levels on success rates. While beginning codingamers can currently try the easy category safely and have fun, a success-rate-based system would regularly give beginners the negative experience of not being able to solve a puzzle (for example, classifying a puzzle as hard based on a 10% success rate means 90% of players have a negative experience).