It seems that your hash map is simply not optimized.
With C++, using a map from the standard library, like std::map (or std::unordered_map, which is the actual hash map), lets you pass all the tests comfortably.
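For illustration, here is a minimal sketch of that standard-library approach. The I/O layout (N and Q on the first line, then N extension/type pairs, then Q file names) is my assumption about the puzzle, not something stated in this thread:

    #include <algorithm>
    #include <cctype>
    #include <iostream>
    #include <string>
    #include <unordered_map>

    static std::string lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        return s;
    }

    int main() {
        int n, q;
        std::cin >> n >> q;
        std::unordered_map<std::string, std::string> mime;
        for (int i = 0; i < n; ++i) {
            std::string ext, type;
            std::cin >> ext >> type;
            mime[lower(ext)] = type;  // extensions are matched case-insensitively
        }
        std::cin.ignore();  // skip the newline left over before the getline loop
        for (std::string name; q-- > 0 && std::getline(std::cin, name); ) {
            size_t dot = name.rfind('.');
            // no dot at all (or a trailing dot) means no usable extension
            std::string ext = (dot == std::string::npos) ? "" : lower(name.substr(dot + 1));
            auto it = mime.find(ext);
            std::cout << (it == mime.end() ? "UNKNOWN" : it->second) << '\n';
        }
    }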
Honestly, I'm not sure how I would optimize it further. From what I've read, a load factor of 0.6 is fine when using chaining, so I don't know where the problem lies.
Hard to tell without any info, but you're doing something wrong. I checked my old JavaScript solution that uses a map too, and the last test poses no problem.
To be totally honest, I've never tried to implement a hash map myself; the native ones have always suited me until now… I simply notice that implementations other than yours pass the largest tests comfortably. But I can hardly help you with your own implementation.
What info would you like? I don't know if I can just start pasting code here; I'm pretty new to CG.
My main structure is a vector<vector<vector<string>>> hash_table (the second-level vectors are the buckets, or chains, and the third-level vector stores the extension and media type). (BTW, is there a cleaner way of declaring such vectors? Because this does look a bit nightmarish.)
I'm using djb2 as the hashing function, but I've changed the bit shift from 5 to 7 because it gives a better load factor in this case.
When adding an element, I use the extension as the key and push back the extension and media type (MT) into a vector (a bucket) at the index provided by the hashing function.
When looking up an element, I again hash the extension to get an index and then linearly search the bucket at that index for a matching extension. The linear search here sounds bad, but with a load factor of 0.6 and a max bucket size of 6 it can't be that bad.
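Putting the above together, the design reads roughly like this (a condensed sketch of what's described, not the actual submission; the table size is an assumption):

    #include <string>
    #include <vector>

    const size_t TABLE_SIZE = 1024;  // assumption: the real size isn't given above

    // djb2 with the bit shift bumped from 5 to 7, as described
    size_t djb2(const std::string& key) {
        size_t h = 5381;
        for (unsigned char c : key)
            h = ((h << 7) + h) + c;  // h * 129 + c instead of the usual h * 33 + c
        return h % TABLE_SIZE;
    }

    // outer vector: the table; middle vectors: buckets (chains);
    // inner vector<string>: {extension, media type}
    std::vector<std::vector<std::vector<std::string>>> hash_table(TABLE_SIZE);

    void add(const std::string& ext, const std::string& mt) {
        hash_table[djb2(ext)].push_back({ext, mt});
    }

    std::string lookup(const std::string& ext) {
        for (const auto& entry : hash_table[djb2(ext)])
            if (entry[0] == ext) return entry[1];  // linear search within the bucket
        return "UNKNOWN";
    }

As for a cleaner declaration: a small struct (say, struct Entry { std::string ext, mt; }) in place of the innermost vector<string> removes one level of nesting, which is also where the replies below are heading.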
Hi again!
I just tried a quick implementation of a hash table in C++ with djb2, and it passes all the tests and validators.
I don't know how you've coded yours, but I don't use vectors: a struct for the extension and the MIME type, plus a pointer, since I represent the buckets as linked lists, and a fixed-size array to contain all of that.
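In outline, that layout looks something like this (a sketch of what's described; the table size and how the index is computed are assumptions):

    #include <string>

    const int TABLE_SIZE = 256;  // assumption: the actual fixed size isn't stated

    // one node per {extension, MIME type} pair; buckets are singly linked lists
    struct Entry {
        std::string ext;
        std::string mime;
        Entry* next;
    };

    Entry* table[TABLE_SIZE] = {};  // fixed-size array of bucket heads

    // h would come from djb2(ext) % TABLE_SIZE or similar
    void add(const std::string& ext, const std::string& mime, size_t h) {
        table[h] = new Entry{ext, mime, table[h]};  // prepend to the chain
    }

    const Entry* find(const std::string& ext, size_t h) {
        for (const Entry* e = table[h]; e != nullptr; e = e->next)
            if (e->ext == ext) return e;
        return nullptr;  // caller prints UNKNOWN
    }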
The last time I made a map (a long time ago), I remember using singly linked lists rather than vectors for the buckets; that way you don't get that triple-vector chain, and you save some memory. But with the amount of data in this puzzle, that shouldn't change much, tbh.
You can put your code on Pastebin or any similar site and post the link here.
Thank you both for your attention! Here is my solution.
I'm very curious what's slowing this program down so much.
I considered a simple singly linked list for the buckets, but I figured it probably wouldn't be more optimized than vectors already are, so I didn't change them.
This puzzle has been causing issues for C and C++ users for a while now; I'm not sure yet what the root issue is. We've discussed it with a few coders who had solved it a long time ago and whose code was no longer working. They mentioned outputting the answer once instead of doing multiple couts, but it seems that's what you're doing here. I don't know.
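For reference, the "output the answer once" advice usually means buffering all the answers and writing them in a single shot, roughly like this (a generic sketch, not anyone's actual submission):

    #include <iostream>
    #include <sstream>

    int main() {
        std::ios::sync_with_stdio(false);  // common companion trick for fast I/O
        std::ostringstream out;
        // ... for each file name, append the answer instead of printing it:
        out << "application/pdf" << '\n';
        out << "UNKNOWN" << '\n';
        std::cout << out.str();  // one single write at the very end
    }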
I've had this problem too, but when it happens, launching the test again can be enough to pass. Here the big test never passes.
@rradarr: vectors can be a problem, because they use a dynamically allocated array internally. So when you resize a vector (which push_back does once it outgrows its capacity), the array has to be reallocated and the data moved.
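If that reallocation is indeed the culprit, reserving the capacity up front avoids it entirely; a minimal sketch (the count would be whatever Q the input provides):

    #include <iostream>
    #include <string>
    #include <vector>

    std::vector<std::string> read_files(int q) {
        std::vector<std::string> files;
        files.reserve(q);  // allocate once instead of growing via repeated push_back
        for (int i = 0; i < q; ++i) {
            std::string name;
            std::getline(std::cin, name);
            files.push_back(std::move(name));  // move instead of copying each string
        }
        return files;
    }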
I modified your version to use an unordered_map instead of your own map, and sometimes (rarely but still) I get a timeout on the last test. So it probably comes down to what @TwoSteps wrote.
That aside, you're using mutable strings in your code. Heresy, I say!
I can't pass the "Correct extension cropping" and "Large dataset" validators, and I have no idea why. Pretty frustrating ^^
Never mind: multiple '.' characters don't matter (only the last one counts).
Hm, could you explain what you mean and why it's bad?
Is it how I pass the string by reference to lowercase it, modifying it inside the function instead of returning it?
I've solved the puzzle in plain C, using a vec-like structure (i.e., a resizable array) to hold all the MIME types. It runs in a poor O(Q*N): for every one of the Q file names, it executes a linear search through the vec using strcmp() and calls printf(). No optimizations were needed, as it never times out.
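The same idea, sketched here in C++ for consistency with the rest of the thread (the names are mine, not the original C code):

    #include <string>
    #include <vector>

    struct Pair { std::string ext, mime; };

    // O(Q*N) overall: each of the Q lookups scans all N known types linearly
    std::string find_mime(const std::vector<Pair>& types, const std::string& ext) {
        for (const Pair& p : types)
            if (p.ext == ext) return p.mime;  // the strcmp() step
        return "UNKNOWN";
    }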
Yes, I was referring to your to_lower method.
Strings have been treated as immutable data for a long time now, and most languages won't even let you bypass that restriction. A few that allow it remain, like C/C++, but it's generally considered bad practice.
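The usual immutable-style alternative is a function that returns a fresh lowercased copy instead of mutating its argument; a sketch of the idiom (not your actual code):

    #include <algorithm>
    #include <cctype>
    #include <string>

    // takes the string by value, so the caller's copy is never touched
    std::string to_lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return std::tolower(c); });
        return s;
    }

    // usage: std::string ext = to_lower(raw_ext);  // raw_ext stays unchanged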
I don't get it… I pass every test except test 3.
It expects a .wav to be UNKNOWN:
Standard Output Stream:
array (
  'wav' => 'audio/x-wav',
  'mp3' => 'audio/mpeg',
  'pdf' => 'application/pdf',
)
'a'
'a.wav'
audio/x-wav
'b.wav.tmp'
UNKNOWN
'test.vmp3'
UNKNOWN
'pdf'
'.pdf'
application/pdf
'mp3'
'report…pdf'
application/pdf
'defaultwav'
'.mp3.'
UNKNOWN
'final.'
UNKNOWN
Failure
Found:
audio/x-wav…
Expected:
UNKNOWN
Edit: I found the problem! I didn't output UNKNOWN for the strings without a dot.
I think there is a bug in this problem. I am coding in VB.NET, in French.
The last test fails, saying that a line requires an "application/pdf" result.
But there is no pdf file in the supplied dataset (nor pfd).
Edit: sorry, my bad. I did not know how to check the dataset properly, and I found my mistake.
I see pfd files in the dataset…
So, is there an issue with C++ and the large dataset test? I fail it because it takes too much processing time, yet I can submit my solution and get 100%. I'm not sure how to make it more efficient: I used a map for the extensions and a vector for the files, and passed all the other tests. I also tried lowercasing only the extension. Any idea what could be taking up all the processing time?
Yes, it's a known problem. Don't worry about it if you can pass the validators.