WebPagetest Forums

Full Version: Auto-detect the correct MIME type regardless of file extension or response mime-type
This might be a corner case to start with, but it would nevertheless be good to identify mismatches between the file extension or MIME type and the true file type.


If you go to http://www.httparchive.org/viewsite.php?pageid=9285021 and look at image requests by format, you would conclude that most of this site's images are PNG. Since WPT powers that data, let's look at the real waterfall and inspect the response MIME types, which again tell us that they are image/png:

http://www.webpagetest.org/result/130628...1/details/

However, if you download one of those files (say /department-bucket-folder/department_buckets/6-1-69-1-480x480-1370359903.png) and use the "file" command to look at the actual content of the file, you will see that it is a JPEG:

localhost:global pganti$ file test.png
test.png: JPEG image data, JFIF standard 1.01

You could even just run strings on the file and still guess that it is a JPEG.

The same is true of the other images. It is most likely caused by https://drupal.org/node/568772 or some variant, wherein the content owners just use a convenient extension that bears no relation to the content type.

My final ask here: how can we catch these kinds of mismatches in WPT, so that we can flag such content types independent of the MIME type and file extension? Would it be helpful, or too much to do?
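
As a rough illustration of the kind of check being asked for (a minimal Python sketch, not anything WPT is known to do), the type implied by the extension and by the Content-Type header can each be compared against a type detected from the response body. The function name find_mismatches and its parameters are hypothetical, and the detected type is assumed to come from a separate content-sniffing step.

```python
import mimetypes
from typing import List, Optional


def find_mismatches(url_path: str,
                    content_type_header: Optional[str],
                    detected_type: Optional[str]) -> List[str]:
    """Return human-readable flags for extension/header vs. content mismatches.

    detected_type is assumed to be the MIME type sniffed from the body,
    or None when the content could not be identified.
    """
    flags: List[str] = []
    if detected_type is None:
        return flags  # nothing to compare against

    # Drop parameters such as "; charset=..." from the header value.
    declared = None
    if content_type_header:
        declared = content_type_header.split(";", 1)[0].strip().lower()

    # Standard-library guess based purely on the file extension.
    by_extension, _encoding = mimetypes.guess_type(url_path)

    if declared and declared != detected_type:
        flags.append(f"Content-Type says {declared}, content looks like {detected_type}")
    if by_extension and by_extension != detected_type:
        flags.append(f"extension implies {by_extension}, content looks like {detected_type}")
    return flags


# The case from this post: a .png URL served as image/png that is really a JPEG.
example = find_mismatches(
    "/department-bucket-folder/department_buckets/6-1-69-1-480x480-1370359903.png",
    "image/png",
    "image/jpeg",
)
for flag in example:
    print(flag)
```

With the example from this post, both the header and the extension would be flagged, since both claim PNG while the content is JPEG.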
(06-29-2013 07:44 AM)pganti Wrote: [ -> ]My final ask here: how can we catch these kinds of mismatches in WPT, so that we can flag such content types independent of the MIME type and file extension? Would it be helpful, or too much to do?

For images it is reasonably easy to detect the correct image format by looking at the first few bytes of the response (GIF, PNG and JPEG all have unique signatures). It wouldn't be too hard to add another field that tracks the actual image type (though it's more of a feature request than a bug ;-) ).
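
The signature check described here could be sketched as follows in Python. This is only an illustration, not WPT's implementation; the function name sniff_image_type is hypothetical, and only the three formats mentioned above are covered.

```python
from typing import Optional

# Well-known leading byte signatures for the three formats.
JPEG_SIGNATURE = b"\xff\xd8\xff"
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image MIME type from the leading bytes of a response body.

    Returns None when the bytes match none of the known signatures.
    """
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(GIF_SIGNATURES):
        return "image/gif"
    return None


# A JFIF file such as the mislabeled "test.png" begins with these bytes.
jfif_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01"
print(sniff_image_type(jfif_header))
```

Only the start of the body is needed, so a check like this could run on the first chunk of a response without buffering the whole file.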

Images are probably the only file type where auto-detection is feasible though.
Thank you

(07-02-2013 10:56 AM)pmeenan Wrote: [ -> ]Images are probably the only file type where auto-detection is feasible though.

Agreed. In general, other than images, anything else that has a standard format with specific magic headers can be auto-detected, right?
In theory, though that really only opens up Flash or video. None of the text resources (JS, CSS, HTML) would be auto-detectable.
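
To make the Flash and video case concrete, here is a minimal Python sketch of the same magic-header idea. The function name sniff_media_type is hypothetical, the set of formats is an assumption chosen for illustration, and the returned MIME types are coarse guesses (for example, the EBML header is shared by WebM and Matroska, and the "ftyp" box appears in other ISO base media files besides MP4).

```python
from typing import Optional


def sniff_media_type(data: bytes) -> Optional[str]:
    """Guess a Flash/video MIME type from the leading bytes, or return None."""
    head = data[:12]
    # SWF: "FWS" (uncompressed), "CWS" (zlib) or "ZWS" (LZMA).
    if head[:3] in (b"FWS", b"CWS", b"ZWS"):
        return "application/x-shockwave-flash"
    # Flash Video container.
    if head[:3] == b"FLV":
        return "video/x-flv"
    # EBML header used by WebM and Matroska; reported as WebM here (assumption).
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "video/webm"
    # ISO base media files carry an "ftyp" box type at byte offset 4.
    if head[4:8] == b"ftyp":
        return "video/mp4"
    return None


# Text resources have no fixed signature, so they fall through to None.
print(sniff_media_type(b"CWS\x0a\x00\x00\x00\x00"))
print(sniff_media_type(b"function f() { return 1; }"))
```

The second call shows the limitation raised above: JavaScript, CSS and HTML begin with arbitrary text, so there is nothing reliable to match on.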