MIME Type puzzle discussion

It seems that your hash map is simply not optimized.
With C++, using a map from the standard library, like std::map, is enough to pass all the tests comfortably.

Honestly I’m not sure how I would optimize it further. From what I’ve read, a load factor of 0.6 is okay when using chaining, and I don’t know where the problem lies.

Hard to tell without any info, but you’re doing something wrong. I checked my old JavaScript solution, which uses a map too, and the last test poses no problem.

To be totally honest, I’ve never tried to implement a hash map myself; the native ones have always suited me until now… I simply notice that implementations other than yours pass the largest tests comfortably. But I can hardly help you with your implementation.

What info would you like? I don’t know if I can just start pasting code here; I’m pretty new to CG.
My main structure is a vector<vector<vector<string>>> hash_table (the second-level vectors are the buckets or chains, and the third-level vector stores the extension and media type). (BTW, is there a cleaner way of writing such vectors? Because this does look a bit nightmarish.)
I’m using djb2 as the hashing function, but I’ve changed the bit shift from 5 to 7 because it gives a better load factor in this case.
When adding an element I use the extension as the key and push back the extension and media type (MT) into a vector (a bucket) at the index provided by the hashing function.
When looking up an element I again use the extension to get an index and then linearly search the bucket at that index for a matching extension. The linear search here sounds bad, but with a load factor of 0.6 and a max bucket size of 6 it can’t be that bad.
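Roughly, in code, the setup looks like this (a simplified sketch of the description above, not my exact solution; the table size here is made up):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// hash_table[index] is a bucket; each entry is a {extension, media_type} pair.
const std::size_t TABLE_SIZE = 16384;   // made-up size, just for the sketch
std::vector<std::vector<std::vector<std::string>>> hash_table(TABLE_SIZE);

// djb2 with the shift bumped from 5 to 7, as described above.
std::size_t djb2(const std::string& key) {
    std::size_t h = 5381;
    for (unsigned char c : key)
        h = ((h << 7) + h) + c;         // h * 129 + c
    return h % TABLE_SIZE;
}

void insert(const std::string& ext, const std::string& mt) {
    hash_table[djb2(ext)].push_back({ext, mt});   // append {key, value} to the bucket
}

const std::string* lookup(const std::string& ext) {
    for (const auto& entry : hash_table[djb2(ext)])
        if (entry[0] == ext) return &entry[1];    // linear search within the bucket
    return nullptr;                               // not found -> caller prints UNKNOWN
}
```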

Hi again!
I just tried a quick implementation of a hash table in C++ with djb2 and it passes all the tests and validators.
I don’t know how you’ve coded yours, but I don’t use vectors: a struct for the extension and the MIME type, plus a next pointer because I represent the buckets as linked lists, and a fixed-size array to hold all of that.
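Something along these lines, roughly (a simplified sketch of what I described, not the exact code):

```cpp
#include <string>

// One entry per (extension, MIME type) pair; buckets are singly linked lists.
struct Node {
    std::string ext;
    std::string mime;
    Node* next;                      // next entry in the same bucket, or nullptr
};

const int TABLE_SIZE = 4096;         // fixed-size array of bucket heads (size is arbitrary here)
Node* table[TABLE_SIZE] = {};

unsigned long djb2(const std::string& key) {
    unsigned long h = 5381;
    for (unsigned char c : key)
        h = ((h << 5) + h) + c;      // classic djb2: h * 33 + c
    return h % TABLE_SIZE;
}

void insert(const std::string& ext, const std::string& mime) {
    unsigned long i = djb2(ext);
    table[i] = new Node{ext, mime, table[i]};   // push the new node at the head of its bucket
}

const std::string* lookup(const std::string& ext) {
    for (Node* n = table[djb2(ext)]; n; n = n->next)
        if (n->ext == ext) return &n->mime;
    return nullptr;
}
```

Insertion at the head keeps it O(1), and a lookup is just a short walk down one chain.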

Last time I made a map (a long time ago :stuck_out_tongue:) I remember using singly linked lists rather than vectors for the buckets; that way you don’t get that triple-nested vector and you save some memory. But with the amount of data in this puzzle, that shouldn’t change much, tbh.
You can use pastebin or any other similar site to post a link to your code here.

Thank you both for your attention :slight_smile: Here is my solution.
I’m very curious what’s slowing this program down so much.
I considered a simple singly linked list for the buckets, but I figured it probably wouldn’t be any faster than vectors already are, so I didn’t change them.

This puzzle has been causing issues for C and C++ users for a while now. Not sure yet what the root issue is. We’ve discussed this with a few coders who had solved it a long time ago and whose code was no longer working. They mentioned outputting the answer once instead of doing multiple couts, but it seems that’s what you’re doing here. I don’t know :frowning:
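For what it’s worth, “output the answer once” usually means buffering everything and doing a single write at the end, roughly like this (a sketch, not anyone’s actual submission):

```cpp
#include <iostream>
#include <sstream>

int main() {
    std::ios::sync_with_stdio(false);   // common C++ I/O speed-ups on CodinGame
    std::cin.tie(nullptr);

    std::ostringstream out;             // buffer every answer instead of one cout per file
    // ... read the MIME table and the Q file names, then for each file:
    // out << answer << '\n';
    std::cout << out.str();             // single write at the very end
    return 0;
}
```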

I’ve had this problem before, but when it happens, launching the test again can be enough to pass. Here the big test never passes.
@rradarr: vectors can be a problem, because they use a dynamically allocated array internally. So when a vector grows (when push_back exceeds its capacity) the array has to be reallocated and the data moved.
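If you stick with vectors, one possible mitigation (just a guess, not tested on this puzzle) is to reserve capacity up front so the early push_backs don’t reallocate:

```cpp
#include <string>
#include <vector>

int main() {
    std::vector<std::string> bucket;
    bucket.reserve(8);                  // pre-allocate: the first few push_backs
                                        // won't reallocate and move existing data
    bucket.push_back("wav");
    bucket.push_back("audio/x-wav");
}
```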

I modified your version to use an unordered_map instead of your own map, and sometimes (rarely but still) I get a timeout on the last test. So it probably comes down to what @TwoSteps wrote.
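(The swap itself is essentially just this, sketched from memory, so the names here are mine, not the ones from your code:)

```cpp
#include <string>
#include <unordered_map>

// The drop-in table: extension -> MIME type (sketch only).
std::unordered_map<std::string, std::string> mime_table;

void add_mapping(const std::string& ext, const std::string& mt) {
    mime_table[ext] = mt;               // insert or overwrite in O(1) on average
}
```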

That aside, you’re using mutable strings in your code, heresy I say :face_with_symbols_over_mouth:

I can’t pass the Correct extension cropping and Large dataset validators and I have no idea why. Pretty frustrating ^^

Never mind: multiple "." in a file name don’t matter.

Hm, could you explain what you mean and why it’s bad? :thinking:
Is it how I pass the string by reference to lowercase it and modify it in the function instead of returning it?

I’ve solved the puzzle in plain C using a vec-like structure (i.e. a resizable array) to hold all the MIME types.
It works in a poor O(Q*N), i.e. for each of the Q file names it does a linear search through the vec using strcmp() and calls printf(). No optimizations were needed, as it never times out.

Yes, I was referring to your to_lower method.
Strings have been considered immutable data for a long time now, and most languages won’t let you bypass that restriction. A few languages still allow it, like C/C++, but it’s generally considered bad practice.
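For comparison, a small sketch of the “return a copy” style (the helper name is hypothetical):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lowercased copy; the caller's string is left untouched.
std::string to_lower_copy(std::string s) {    // taken by value on purpose
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}
```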

I don’t get it… I pass every test except test 3.

It expects a .wav to be UNKNOWN.

Standard Output Stream:

array (
'wav' => 'audio/x-wav',
'mp3' => 'audio/mpeg',
'pdf' => 'application/pdf',
)

'a'
'a.wav'
audio/x-wav
'b.wav.tmp'
UNKNOWN
'test.vmp3'
UNKNOWN
'pdf'
'.pdf'
application/pdf
'mp3'
'report…pdf'
application/pdf
'defaultwav'
'.mp3.'
UNKNOWN
'final.'
UNKNOWN

Failure

Found:
audio/x-wav…
Expected:
UNKNOWN

Edit: I found the problem! I didn’t set UNKNOWN for the strings without a dot.
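For anyone else hitting this, the cropping has to treat the “no dot” case explicitly; a rough C++ sketch of that check (the helper name is mine):

```cpp
#include <string>

// Crop the extension, returning an empty string when there is no dot at all.
// Callers then map an empty/unknown extension to UNKNOWN.
std::string crop_extension(const std::string& name) {
    const auto dot = name.rfind('.');
    if (dot == std::string::npos) return "";   // "a", "pdf", "defaultwav" -> UNKNOWN
    return name.substr(dot + 1);               // "final." -> "" as well
}
```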

I think there is a bug in this problem. I am coding in VB.NET and in French.
The last test fails and says that a line requires an "application/pdf" result.
But there is no pdf file in the supplied dataset (nor pfd).

Edit: sorry, my bad. I didn’t know how to check the dataset properly, and I found my mistake.

I see pfd files in the dataset…

Sooo, is there an issue when it comes to C++ and the large dataset test? I fail it because it takes too much processing time; however, I can submit my solution and get 100%. Not sure how to make it more efficient. I used a map for the extensions and a vector for the files, and passed all the other tests. I also tried lowercasing only the extension. Any idea what could be taking up all the processing time?

Yes, it’s a known problem. Don’t worry about it if you can pass the validators.