MIME Type puzzle discussion

MrMoguro · April 22, 2018, 1:05am

The OS does indeed care about how you name your files. As you mentioned, on Linux-based systems filenames starting with a dot are hidden, for example .config.

But on Windows such a filename wouldn’t be allowed because it needs at least a base filename. The file extension indicates which program should open the file. If the file extension is missing, Windows asks every time which program should be used to open the file.

On Linux-based systems files don’t need a file extension because the system reads the file header and then decides which program should be used to open it.

Not to mention that some special characters are not allowed in filenames, depending on the OS.

I still believe that a filename such as .txt in this puzzle shouldn’t be identified as the text/plain MIME type, for the reasons I mentioned above. That is not how it works. If somebody can convince me that this works, then I take it all back.

chr-m · April 22, 2018, 1:16pm

You don’t get the distinction between the OS and the file manager.

MrMoguro · April 22, 2018, 1:48pm

I’ve looked through the topic and found that you actually can create a file with only a dot and a file extension. That changes everything. Sorry for being stupid. I should have searched through the topic before complaining. Thanks chr-m for the discussion!

chr-m · April 23, 2018, 10:15pm

I just realized something. In the article you linked they mention the Linux tool “mimetype” (see the sixth image). That is literally the program you are implementing in this puzzle. You can use it from the command line like this:

chrm@chrm-desktop[~]$ mkdir test

chrm@chrm-desktop[~]$ cd test/

chrm@chrm-desktop[~]$ touch image image.jpg image.jpg.png html text.html

chrm@chrm-desktop[~/test]$ ls -lasi
total 8
1980150 4 drwxrwxr-x  2 chrm chrm 4096 Apr 23 23:59 .
1444543 4 drwxr-xr-x 37 chrm chrm 4096 Apr 23 23:59 ..
1459634 0 -rw-rw-r--  1 chrm chrm    0 Apr 23 23:58 html
1459466 0 -rw-rw-r--  1 chrm chrm    0 Apr 23 23:58 image
1459629 0 -rw-rw-r--  1 chrm chrm    0 Apr 23 23:58 image.jpg
1459631 0 -rw-rw-r--  1 chrm chrm    0 Apr 23 23:58 image.jpg.png
1459635 0 -rw-rw-r--  1 chrm chrm    0 Apr 23 23:58 text.html

chrm@chrm-desktop[~/test]$ mimetype -b *
text/plain
text/plain
image/jpeg
image/png
text/html

MrMoguro · April 24, 2018, 2:04pm

Would a file named .txt under Linux be seen as a text file or, because of the dot, as a hidden file? Or maybe as a hidden text file?

eulerscheZahl · April 24, 2018, 2:26pm

eulerschezahl@euler-PC:~$ touch .txt
eulerschezahl@euler-PC:~$ mimetype -b .txt
text/plain

MrMoguro · April 24, 2018, 3:21pm

Pretty interesting.

@chr-m Are files without extensions identified as text files? Or did you create text files and strip the extension?

nicola · April 24, 2018, 3:45pm

In Linux, you can identify files with mimetype (which reads the extension) or with file (which reads the beginning of the file).
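
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not how the real mimetype or file tools are implemented; the tables and the function names guess_by_extension and guess_by_content are made up for illustration. Only the byte signatures for PNG, JPEG and PDF are standard values.

```python
# Hypothetical lookup table for illustration only; this is not the data
# used by the real `mimetype` tool.
EXTENSION_TYPES = {
    "jpg": "image/jpeg",
    "png": "image/png",
    "html": "text/html",
    "txt": "text/plain",
}

# Well-known leading byte signatures ("magic numbers") of three formats.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def guess_by_extension(filename):
    """Look only at the name: use whatever follows the last dot."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return None  # no dot at all, so no extension to go by
    return EXTENSION_TYPES.get(ext.lower())


def guess_by_content(data):
    """Look only at the bytes: compare the start of the file to known signatures."""
    for prefix, mime in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mime
    return None


print(guess_by_extension("image.jpg.png"))
print(guess_by_content(b"\xff\xd8\xff\xe0 rest of the file"))
```

The puzzle only ever asks for the first kind of lookup, which is why an empty file named image.jpg.png can still get a type.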

DavidRENAUD · May 12, 2018, 5:31pm

Hello,
I have a problem validating test 5.
Unless I am mistaken, the test fails on the file “.ico”, for which my code returns “image/vn…” while the test expects “UNKNOWN”.
Yet in test 3, the file “.pdf” is accepted as “application/pdf”.

Is this normal, or is my code wrong?

Thanks in advance.
David.

random_acts · July 16, 2018, 12:32am

There is definitely something wrong with the ‘Limit size in extensions’ validation test in C (most people who have encountered this are using C).

Only use the last ‘.’ - strrchr()
No extension is UNKNOWN.
None of the test cases have an extension longer than 10, but I still only hash at most 10 characters.
Then I tried 9 characters to account for the null terminator.
Then I tried 5 characters for fun.

The only way I could see validating this is to give:
file.verylongextension1 and file.verylongextension2
then both map to the mime-type associated with extension ‘verylongex’
Which my program will do.

I wondered whether they tried to register verylongextension1 and verylongextension2 to different mime-types, which would cause an ambiguous case. Do we return the first association, the second, neither, both? My guess is the second, because that’s how most built-in hash maps work (which C does not have).

Nope, that didn’t work either…

Why would you not provide a test for this validation case? Or allow us to see the validation?
Still 95% - moving on from this busted puzzle.

Rolf · July 16, 2018, 3:28pm

I did it in C, and it is an easy puzzle. So use stderr and print everything you are doing.

Asto · July 20, 2018, 3:09pm

It looks like this puzzle has incomplete or incorrect specifications. The rules say: “The extension of a file is defined as the substring which follows the last occurrence, if any, of the dot character within the file name.”. That is literally the only definition.
Now let’s look at test case 3.
So, if the reference table says “pdf application/pdf”, then the correct answer for the line “report…pdf” is “application/pdf”. But here is the response to this:
Failure
Found: applicat…
Expected: UNKNOWN

Please fix.

Rolf · July 20, 2018, 3:22pm

a -> UNKNOWN
a.wav -> audio/x-wav
b.wav.tmp -> UNKNOWN
test.vmp3 -> UNKNOWN
pdf -> UNKNOWN
.pdf -> application/pdf
mp3 -> UNKNOWN
report…pdf -> application/pdf
defaultwav -> UNKNOWN
.mp3. -> UNKNOWN
final. -> UNKNOWN
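
These results follow from the rule quoted above (the extension is whatever comes after the last dot, if there is a dot). A minimal Python sketch of that rule, checked against most of the names in the list: the function name mime_for is made up, the wav and pdf entries are taken from the list, and the mp3 entry is an assumption added only so that the mp3 cases are meaningful. Lowercasing the extension reflects the case-insensitivity discussed elsewhere in this thread.

```python
# wav and pdf come from the list above; the mp3 entry is an assumed value.
TABLE = {
    "wav": "audio/x-wav",
    "pdf": "application/pdf",
    "mp3": "audio/mpeg",
}


def mime_for(filename, table=TABLE):
    """Return the MIME type for the text after the last dot, or UNKNOWN."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return "UNKNOWN"  # no dot in the name, so no extension
    # An empty extension ("final.") is simply not in the table.
    return table.get(ext.lower(), "UNKNOWN")


for name in ["a", "a.wav", "b.wav.tmp", "test.vmp3", "pdf", ".pdf",
             "mp3", "defaultwav", ".mp3.", "final."]:
    print(name, "->", mime_for(name))
```

Using rpartition (or strrchr in C) avoids having to special-case names with several dots or a trailing dot.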

Wishmaster_89 · July 30, 2018, 7:29am

The answer to report…pdf in test case 3 is application/pdf, which is correct according to the definition of the problem.

gregpineau · July 30, 2018, 8:51pm

Hello, I have a small problem with test 5: every other test passes, but there is one error in this test and I can’t find the problem among the 9999 entries…

VaseSimion · August 16, 2018, 9:52am

Thanks a lot. Last year I stopped at this puzzle because it annoyed me and using a hash was too much work. Now I solved it in 20 minutes using a dictionary (last test case with the big data set).

Chfou · August 28, 2018, 10:00am

My solution passed all the tests, but it would have failed if one of the tests had a file named mp3.mp3 (I check whether there is no dot by comparing whether the first and last strings of a split(’.’) are the same).
Maybe add this case to one of the tests.
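
A small Python sketch of why that check breaks, assuming it works roughly as described (the function names are made up for illustration):

```python
def has_dot_by_split(name):
    """The check described above: first and last pieces differ."""
    parts = name.split(".")
    return parts[0] != parts[-1]


def has_dot(name):
    """A direct check that does not depend on what surrounds the dot."""
    return "." in name


for name in ["pdf", "a.wav", "mp3.mp3"]:
    print(name, has_dot_by_split(name), has_dot(name))
```

For mp3.mp3 the split gives two equal pieces, so the first check reports "no dot" even though there is one.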

kodetzaile · September 16, 2018, 8:45pm

Why is the correct solution to create a map with both all lowercase and all uppercase? If there is only one right way to write each mime-type (*) but I have to accept different cases, why is it wrong to accept them all (all lowercase or ALL UPPERCASE or WhATeveR)? Is there any real-world case that justifies this?

(*) It seems that the correct way is all lowercase (but it’s not accepted as a solution here):

etc

kodetzaile · September 16, 2018, 9:49pm

OK, found a solution that makes sense to me.

On one side we have a database/program that doesn’t follow the all-lowercase convention but expects to receive information exactly as given to us (let’s assume it’s in whaTEeVEr case), and on the other side we have the user (and we must accept any case used). To make them understand each other I have used two maps/dictionaries: one internal (mime as given to me to mime in lowercase) and one external (mime in lowercase to extension in lowercase).
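
For comparison, a simpler variant than the two maps described above: a single dictionary keyed by the lowercased extension, with the MIME type stored exactly as given. This is only a sketch under the assumption that table lines look like "pdf application/pdf"; build_table and lookup are made-up names.

```python
def build_table(lines):
    """Map lowercased extension -> MIME type exactly as written in the input."""
    table = {}
    for line in lines:
        ext, mime = line.split()
        table[ext.lower()] = mime  # key is normalised, value is left untouched
    return table


def lookup(filename, table):
    """Accept any letter case in the file name; return the MIME type as given."""
    _, dot, ext = filename.rpartition(".")
    if not dot:
        return "UNKNOWN"
    return table.get(ext.lower(), "UNKNOWN")


table = build_table(["HTML text/html", "Pdf Application/PDF"])
print(lookup("index.HtMl", table))
print(lookup("REPORT.PDF", table))
```

Only the key is normalised, so whatever case the table uses for the MIME type is what gets printed.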

Ozyx · October 23, 2018, 7:16pm

Impossible to validate large datasets with C. I’m using an array of struct with key/value and checking them one by one. Is this supposed to work or do you have to use a hash table? The difficulty spike compared to previous ones is a bit crazy…
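
Whether a one-by-one scan is fast enough depends on limits this thread doesn’t state, but a hash table removes the question. Below is a minimal sketch of a chained hash table, written in Python for brevity; the same layout (an array of buckets, each holding key/value pairs) carries over to C as an array of linked lists. The bucket count, the class name ChainedTable and the choice of the djb2 string hash are assumptions for illustration, not anything the puzzle prescribes.

```python
def djb2(key):
    """djb2 string hash, kept to 32 bits as a C unsigned int would be."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


class ChainedTable:
    def __init__(self, n_buckets=1024):  # bucket count is an arbitrary choice
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[djb2(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # same key again: last value wins
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Only the entries in one bucket are compared, not the whole table.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


table = ChainedTable()
table.put("pdf", "application/pdf")
table.put("wav", "audio/x-wav")
print(table.get("pdf"), table.get("ico", "UNKNOWN"))
```

Each lookup only walks one short bucket instead of every entry, which is where the speed-up on the large data set would come from.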