Hello,
I did this using dictionaries in Python and it was pretty straightforward.
Then I tried it in C++. I thought that, like with the other puzzles I’m doing in both languages, “translating” the same algorithm using each language’s tools would be quick. My mistake.
It took longer than I thought, because even with the library documentation on how to split a string or use a dictionary (map), which I had never used before, applying the concepts was not easy. Still, it worked… for the first 4 tests (with some small modifications).
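Roughly, the string-splitting part I mean is something like this (a quick sketch, not my actual code):

#include <algorithm>
#include <cctype>
#include <string>

// Sketch: pull the extension (lowercased) out of a file name,
// or return an empty string if there is no usable extension.
std::string extract_extension(const std::string& fname) {
    auto dot = fname.find_last_of('.');
    if (dot == std::string::npos) return "";      // no dot at all
    std::string ext = fname.substr(dot + 1);      // may be empty for "name."
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return ext;
}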
I got stuck on the 5th test (Large dataset), so I tried to make the program more efficient by removing unnecessary operations (like keeping the name of the file, etc.), but I still kept getting the message “Failure: Process has timed out. This may mean that your solution is not optimized enough to handle some cases.”
Surprisingly, after a lot of tries and frustration, I took a break. I came to the discussion forum, read some posts, and then went back to the puzzle page.
I swear I changed nothing in the code; I just ran the 5th test again… and it passed!
I submitted my code and got a 100% score, then I ran it again, and I got the Failure error…
Hello,
I am coding in JavaScript.
My code passes all the tests except the “limit size in extension” validator. Does anyone know what it is about? I read all the solutions in this thread but could not find anything that worked for me…
My code handles filenames that end with “.”, filenames that are just an extension (example: “.html”), as well as filenames without any dots.
It looks at a filename and reads the characters following the last “.” to extract the file extension.
I tried checking the length of the file extension, but it did not change anything.
I don’t get what I am missing here, can anyone help me?
thank you
The different “limit” validators are mostly for languages like C that have fixed-size character strings. You don’t have to handle the size limits yourself, because no out-of-limit extension will be sent to your code.
So you probably have a specific case that your code doesn’t handle and that only appears in this validator, unrelated to its title.
Oh I see, thank you for your answer !
So there is a specific case in this validator that my code fails, but there is no way for me to know which one it is, right? That’s a bit annoying…
In case anyone can spot what I am missing, here is what my code does:
Take the file extensions and the associated MIME types and put them in separate arrays.
Find the last dot in the filename (return UNKNOWN if there are no dots).
Take all the characters after the last dot (the extension) and put them in a variable (return UNKNOWN if there are no characters after the dot).
Turn the extension into a case-insensitive regular expression.
Check whether the regexp matches an entry in the array of file extensions (return UNKNOWN if no match is found).
Get the index of that match.
Return the content of the MIME-type array at that same index.
Repeat for all the filenames.
Thank you all
I’m not sure that regex is the simplest way to solve this puzzle, but why not. ^^
Are you sure about your regex generation?
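For instance, instead of building a regex at all, you could simply lowercase the extension and compare strings directly. A rough sketch (shown in C++, but the idea is the same in JavaScript; the names are made up):

#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers; the point is just: no regex, plain comparisons.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

std::string find_mime(const std::string& ext,
                      const std::vector<std::string>& extensions,
                      const std::vector<std::string>& mime_types) {
    std::string needle = to_lower(ext);
    for (std::size_t i = 0; i < extensions.size(); ++i) {
        if (to_lower(extensions[i]) == needle)
            return mime_types[i];   // both arrays share the same index
    }
    return "UNKNOWN";               // nothing matched
}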
This puzzle is definitely inconsistent with its timeouts. I wanted to implement a hash map on my own and I was stuck on this puzzle for three days because of the last test. There were multiple times when the Large dataset test passed and then, without any changes, failed immediately afterwards. One time it failed on a “clean” solution and then passed when I uncommented some helper debugging functions that iterated through the whole data structure and printed a lot of stuff, definitely slowing the program down.
In the end I found a slight workaround for C++: instead of using cout to output line by line while the program runs, you can store all of the output into a stringstream and then, at the end of the program, use cout just once to print it all. This saves enough time to get past the Large dataset; maybe someone who is struggling will find this useful.
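Something like this (quick sketch):

#include <iostream>
#include <sstream>

int main() {
    std::stringstream out;              // collect every answer here first

    // ... in the real loop, append one MIME type (or UNKNOWN) per file name ...
    out << "text/html" << '\n';
    out << "UNKNOWN" << '\n';

    std::cout << out.str();             // a single cout at the very end
    return 0;
}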
It seems that your hash map is simply not optimized.
In C++, using a map from the standard library, like std::map (or the hash-based std::unordered_map), easily passes all the tests.
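A rough sketch of the idea (the entries here are just examples; in practice the map is filled from the puzzle input):

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    // extension (lowercased) -> MIME type
    std::unordered_map<std::string, std::string> mime_by_ext{
        {"html", "text/html"},
        {"png",  "image/png"},
    };

    std::string ext = "png";            // extension of the current file name, lowercased
    auto it = mime_by_ext.find(ext);
    std::cout << (it == mime_by_ext.end() ? "UNKNOWN" : it->second) << '\n';
    return 0;
}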
Honestly, I’m not sure how I would optimize it further. From what I’ve read, a load factor of 0.6 is fine when using chaining, so I don’t know where the problem lies.
Hard to tell without any info, but you’re doing something wrong. I checked my old JavaScript solution, which uses a map too, and the last test poses no problem.
To be totally honest, I have never tried to implement a hash map myself; the native ones have always suited me until now… I simply note that implementations other than yours easily pass the largest tests, but I can hardly help you with your own implementation.
What info would you like? I don’t know if I can just start pasting code here, I’m pretty new to CG.
My main structure is a vector<vector<vector<string>>> hash_table (the second level of vectors holds the buckets/chains, and the innermost vector stores the extension and media type). (BTW, is there a cleaner way of writing such vectors? Because this does look a bit nightmarish.)
I’m using djb2 as the hashing function but I’ve changed the bit shift from 5 to 7 because it gives a better load factor in this case.
When adding an element, I use the extension as the key and push back the extension and media type (MT) into the vector (bucket) at the index given by the hashing function.
When looking up an element, I again use the extension to get an index and then linearly search the bucket at that index for a matching extension. The linear search here sounds bad, but with a load factor of 0.6 and a maximum bucket size of 6 it can’t be that bad.
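For reference, the hash function is basically the classic djb2; a quick sketch (mine only changes the shift from 5 to 7):

#include <cstddef>
#include <cstdint>
#include <string>

// Classic djb2: hash = hash * 33 + c, written as (hash << 5) + hash.
// (My variant uses a shift of 7 instead of 5, i.e. hash * 129 + c.)
std::uint64_t djb2(const std::string& key) {
    std::uint64_t hash = 5381;
    for (unsigned char c : key)
        hash = ((hash << 5) + hash) + c;
    return hash;
}

// Map the hash onto a bucket index for a table of bucket_count buckets.
std::size_t bucket_index(const std::string& key, std::size_t bucket_count) {
    return static_cast<std::size_t>(djb2(key) % bucket_count);
}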
Hi again !
I just tried a quick implementation of a hash table in C++ with djb2, and it passes all the tests and validators.
I don’t know how you coded yours, but I don’t use vectors: a struct for the extension and the MIME type, plus a pointer because I represent the buckets as linked lists, and a fixed-size array to hold all of that.
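Roughly this kind of structure (quick sketch; the table size here is just an example value):

#include <cstddef>
#include <string>

// Each bucket is a singly linked list of entries,
// stored in a fixed-size array of head pointers.
struct Entry {
    std::string extension;
    std::string mime_type;
    Entry* next;                        // next entry in the same bucket (chaining)
};

const std::size_t TABLE_SIZE = 1024;    // fixed size, picked arbitrarily for the sketch
Entry* table[TABLE_SIZE] = {};          // every bucket starts empty

void insert(std::size_t index, const std::string& ext, const std::string& mime) {
    table[index] = new Entry{ext, mime, table[index]};   // push at the front of the chain
}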
The last time I made a map (a long time ago) I remember using singly linked lists rather than vectors for the buckets; that way you don’t get that triple-vector chain and you save some memory. But with the amount of data in this puzzle, that shouldn’t change much tbh.
You can use pastebin or any other similar site to post a link to your code here.
Thank you both for your attention. Here is my solution.
I’m very curious what’s slowing this program down so much.
I considered a simple singly linked list for the buckets, but I figured it probably wouldn’t be any more optimized than vectors already are, so I didn’t change them.
This puzzle has been causing issues for C and C++ users for a while now. Not sure yet what the root issue is. We’ve discussed this with a few coders who had solved it a long time ago and whose code was no longer working. They suggested outputting the answer once instead of doing multiple couts, but it seems that’s what you’re doing here. I don’t know.
I have already had this problem, but when it happens, launching the test again can be enough to pass. Here the big test never passes.
@rradarr: vectors can be a problem, because they use a dynamically allocated array internally. So when you grow a vector (e.g. with push_back), the array may have to be reallocated and the data moved.
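If you keep the vectors, one possible mitigation is to reserve the capacity up front, so push_back never has to reallocate; roughly:

#include <string>
#include <vector>

int main() {
    std::vector<std::string> bucket;
    bucket.reserve(8);                  // pre-allocate: no reallocation while size() stays <= 8

    bucket.push_back("html");
    bucket.push_back("text/html");
    return 0;
}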
I modified your version to use an unordered_map instead of your own map, and sometimes (rarely but still) I get a timeout on the last test. So it probably comes down to what @TwoSteps wrote.
That aside, you’re using mutable strings in your code. Heresy, I say!
I can’t pass the “Correct extension cropping” and “Large dataset” validators and I have no idea why. Pretty frustrating ^^
Never mind: multiple “.” characters don’t matter.
Hm, could you explain what you mean and why it is bad?
Is it how I pass the string by reference to lowercase it, modifying it inside the function instead of returning a new one?
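That is, the difference between something like these two variants (sketch):

#include <algorithm>
#include <cctype>
#include <string>

// Variant 1: modify the caller's string in place (pass by non-const reference).
void to_lower_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
}

// Variant 2: take a copy, leave the caller's string untouched, return the result.
std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}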
I’ve solved the puzzle in plain C, using a vec-like structure (i.e. a resizable array) to hold all the MIME types.
It works in a poor O(Q*N): for every one of the Q file names it runs a linear search over the vec using strcmp() and calls printf(). No optimizations were needed, as it never times out.
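The core of it is just this kind of loop (a rough sketch, written here in C++-flavoured form; the array sizes are arbitrary):

#include <cstdio>
#include <cstring>
#include <vector>

// Brute force: every (extension, MIME type) pair sits in one growable array,
// and each file name triggers a linear scan -> O(Q * N) total.
struct MimeEntry {
    char extension[16];     // sizes picked arbitrarily for the sketch
    char mime_type[64];
};

void print_mime(const std::vector<MimeEntry>& types, const char* ext) {
    for (const MimeEntry& e : types) {
        if (std::strcmp(e.extension, ext) == 0) {   // assumes case was normalized earlier
            std::printf("%s\n", e.mime_type);
            return;
        }
    }
    std::printf("UNKNOWN\n");
}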