March 11, 2009

Identifying MIME using mime-util library


Mime-util is a very small, easy-to-use MIME detection library for Java. It can be used in any type of Java application and can detect MIME types from different sources such as File, InputStream, URLConnection, byte array, etc. Version 1.3 of the library was released recently, and I had a look at it.


The new release changes the package layout: eu.medsea.util is deprecated in favour of the new eu.medsea.mimeutil package, and the old package is still shipped for backward compatibility. MIME detection is based on the Unix magic.mime files, which are used by the Unix file command. On Unix, the library tries to read these files when detecting the MIME type. On other platforms, it uses an internal copy of the magic.mime file, which is available in the eu.medsea.mimeutil package.

Working with the library is very simple. I tried out a simple Swing application in which the user selects a file using JFileChooser, and the MIME type is detected and displayed. Assuming that you have the absolute path of the file, here is my code:



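// Needs java.io.BufferedInputStream, java.io.FileInputStream, java.util.Collection,
// java.util.Iterator, eu.medsea.mimeutil.MimeType and eu.medsea.mimeutil.MimeUtil.
FileInputStream fis = new FileInputStream(dataDisplayLabel.getText());
// Wrap the file stream: BufferedInputStream supports mark/reset, FileInputStream does not.
BufferedInputStream bis = new BufferedInputStream(fis);
MimeUtil mimeUtilObject = new MimeUtil();
log.warn("FileInputStream supports mark and reset: " + fis.markSupported());
log.warn("BufferedInputStream supports mark and reset: " + bis.markSupported());
log.info("Stream size: " + fis.available());
Collection coll = mimeUtilObject.getMimeTypes(bis);
log.info("Number of MIME types: " + coll.size());
Iterator itr = coll.iterator();
while (itr.hasNext()) {
    // The raw Iterator returns Object, so a cast to MimeType is required.
    MimeType mt = (MimeType) itr.next();
    log.info("Media type: " + mt.getMediaType());
    log.info("Sub Type: " + mt.getSubType());
}
// Closing the wrapper also closes the underlying FileInputStream.
bis.close();
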
One important point to note is that FileInputStream does not support the mark and reset methods. For MIME detection, you have to provide an InputStream that supports mark and reset. In this case, I have wrapped the file stream in a BufferedInputStream and passed it to the getMimeTypes method. All of the getMimeTypes methods return a Collection of the detected MIME types. You have to iterate over this collection and read the media type and subtype of each entry through separate methods.
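The mark and reset requirement can be tried out without the library at all. The following is a minimal sketch using only the JDK; the class MarkResetSketch and its helpers withoutMark, ensureMarkSupported and peek are hypothetical names made up for this illustration and are not part of mime-util. It wraps a stream that cannot rewind in a BufferedInputStream, reads a few leading bytes, and rewinds so that the same bytes can be read again.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MarkResetSketch {

    // Hypothetical stand-in for a stream without mark/reset support
    // (the base InputStream class reports markSupported() == false).
    static InputStream withoutMark(byte[] bytes) {
        final InputStream data = new ByteArrayInputStream(bytes);
        return new InputStream() {
            @Override
            public int read() throws IOException {
                return data.read();
            }
        };
    }

    // Wraps the stream in a BufferedInputStream only if it cannot rewind by itself.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to 'limit' leading bytes, then rewinds so the caller can read them again.
    static byte[] peek(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(limit);
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buffer, total, limit - total);
                if (n < 0) {
                    break; // end of stream reached before 'limit' bytes
                }
                total += n;
            }
            in.reset();
            return Arrays.copyOf(buffer, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream raw = withoutMark("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        InputStream safe = ensureMarkSupported(raw);
        System.out.println("raw supports mark/reset:  " + raw.markSupported());
        System.out.println("safe supports mark/reset: " + safe.markSupported());
        System.out.println("leading bytes:       " + new String(peek(safe, 4), StandardCharsets.US_ASCII));
        System.out.println("leading bytes again: " + new String(peek(safe, 4), StandardCharsets.US_ASCII));
    }
}
```

Because peek rewinds the stream, a detector that inspects the leading bytes in this way leaves the stream usable for whatever reads it afterwards.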

Even though this library will not be needed frequently, it is useful for validating files during upload or transfer, since it is not always safe to rely on the file extension alone. You can download the mime-util library from SourceForge.

3 comments:

smcardle said...

Hi Abdel,

Thanks for this blog on mime-util. I appreciate that this blog only covers the InputStream method but I would like to point out a few more features that your readers may find interesting or helpful.

The 1.3 release of the utility has an extendable MimeDetector strategy and by default there are 2 pre-registered MimeDetector(s) available. The first of these is the ExtensionMimeDetector, which uses property files to map file extensions to mime types. Users can add their own mappings in external files, allowing them to add new mappings or override existing mappings provided by the utility. These are case sensitive mappings, so the library knows the difference between MyClass.c and MyClass.C if both .c and .C are defined. This MimeDetector does not require an actual file to exist, as it operates only on the longest extension name, i.e. it knows the difference between myfile.gz and myfile.tar.gz.
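The longest-extension, case sensitive lookup described above can be sketched in a few lines of plain Java. This is only an illustration of the idea and not the library's implementation: the class ExtensionLookupSketch, its methods, and the MIME type strings used in main are all assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class ExtensionLookupSketch {

    // Case sensitive map from extension (without leading dot) to MIME type.
    private final Map<String, String> mappings = new HashMap<>();

    void register(String extension, String mimeType) {
        mappings.put(extension, mimeType);
    }

    // Tries the longest extension first ("tar.gz" before "gz").
    // Returns null when no registered extension matches.
    String lookup(String fileName) {
        // Drop any directory part so dots in folder names are ignored.
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        String name = fileName.substring(slash + 1);

        int dot = name.indexOf('.');
        while (dot >= 0 && dot < name.length() - 1) {
            String mimeType = mappings.get(name.substring(dot + 1));
            if (mimeType != null) {
                return mimeType;
            }
            dot = name.indexOf('.', dot + 1); // fall back to a shorter extension
        }
        return null;
    }

    public static void main(String[] args) {
        ExtensionLookupSketch detector = new ExtensionLookupSketch();
        // Illustrative mappings only; the real property files may differ.
        detector.register("gz", "application/x-gzip");
        detector.register("tar.gz", "application/x-gtar");
        detector.register("c", "text/x-c");
        detector.register("C", "text/x-c++");

        System.out.println(detector.lookup("myfile.tar.gz"));
        System.out.println(detector.lookup("myfile.gz"));
        System.out.println(detector.lookup("MyClass.c"));
        System.out.println(detector.lookup("MyClass.C"));
    }
}
```

Note that the lookup only works on the name, so no file has to exist on disk.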

The second MimeDetector is the MagicMimeMimeDetector which is the MimeDetector your blog example talks about and will use because no name exists in an InputStream to map extensions against. As you rightly pointed out this MimeDetector uses the Unix file(1) magic.mime files if they exist or the internally available copy if not. Again users can create new magic rules that can add to or override the existing rules. Our implementation also allows an extension to the general position mapping of magic values so it's possible to match for "SOME VALUE" somewhere within say the first 250 bytes of the file.
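To make the difference between a fixed-position magic rule and a ranged one concrete, here is a small sketch in plain Java. It is a simplified model rather than the library's matching code: the class MagicRangeSketch and the method containsWithin are hypothetical, and the sketch assumes that "within the first N bytes" means the match may start at any offset from 0 up to and including maxOffset.

```java
import java.nio.charset.StandardCharsets;

final class MagicRangeSketch {

    // Returns true if 'magic' occurs in 'data' starting at any offset in [0, maxOffset].
    // With maxOffset == 0 this degenerates to a classic fixed-position check at offset 0.
    static boolean containsWithin(byte[] data, byte[] magic, int maxOffset) {
        if (magic.length == 0) {
            return false;
        }
        // The match must also fit completely inside the data.
        int lastStart = Math.min(maxOffset, data.length - magic.length);
        for (int start = 0; start <= lastStart; start++) {
            int i = 0;
            while (i < magic.length && data[start + i] == magic[i]) {
                i++;
            }
            if (i == magic.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] magic = "SOME VALUE".getBytes(StandardCharsets.US_ASCII);
        byte[] data = "header bytes, then SOME VALUE, then the rest".getBytes(StandardCharsets.US_ASCII);

        System.out.println("fixed position 0: " + containsWithin(data, magic, 0));
        System.out.println("within 250 bytes: " + containsWithin(data, magic, 250));
    }
}
```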

There is a third MimeDetector shipped with the utility that uses the Opendesktop Shared MIME database. This MimeDetector can be switched on in your code just by constructing an instance, as in new OpendesktopMimeDetector(); that's it.

Creating new MimeDetector(s) is a breeze. All you need to do is extend the AbstractMimeDetector class and implement 3 abstract methods.

De-registering MimeDetector(s) is also trivial, so if you don't want the magic file matching then you can de-register this by name (the name is always the fully qualified class name).

The utility will iterate over each of the registered MimeDetector(s) and accumulate and normalise the results from each of them. These are then returned to the client as a Collection of MimeType objects.

Each MimeType instance has a specificity factor to help determine how specific this MimeType is. So if the returned Collection contains multiple MimeType objects and two of the MimeDetector(s) returned the same MimeType, its specificity value will be higher than the rest and it is probably the MimeType you would want to use.

As the returned Collection is actually an instance of a MimeTypeHashSet you could cast the returned Collection to this type or simply create it as a first class object to start with i.e. MimeTypeHashSet mimeTypes = MimeUtil.getMimeTypes(...); and then use mimeTypes.getMostSpecific();. If more than one MimeType shares the highest specificity value then the first in the list is returned.
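The accumulate-and-pick behaviour described in the last few paragraphs can be modelled with a short sketch. It uses plain strings instead of MimeType objects and counts how many detectors reported each type, which is a simplification of the specificity idea and not necessarily how the library computes the value. The class SpecificitySketch and its methods accumulate and mostSpecific are hypothetical names for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

final class SpecificitySketch {

    // Merges the results of all detectors. The value counts how many detectors
    // reported a type; insertion order is kept so ties can be broken by position.
    static Map<String, Integer> accumulate(List<List<String>> detectorResults) {
        Map<String, Integer> specificity = new LinkedHashMap<>();
        for (List<String> results : detectorResults) {
            // A detector counts at most once per type, even if it repeats itself.
            for (String mimeType : new LinkedHashSet<>(results)) {
                specificity.merge(mimeType, 1, Integer::sum);
            }
        }
        return specificity;
    }

    // Returns the type with the highest count; on a tie the first one wins.
    // Returns null when nothing was detected.
    static String mostSpecific(Map<String, Integer> specificity) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Integer> entry : specificity.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up detector outputs for illustration.
        List<List<String>> results = Arrays.asList(
                Arrays.asList("application/octet-stream", "application/zip"),
                Arrays.asList("application/zip"));

        Map<String, Integer> merged = accumulate(results);
        System.out.println(merged.keySet());
        System.out.println(mostSpecific(merged));
    }
}
```

Even with such a ranking, the result remains a best guess, so callers should treat the winning type as a hint rather than a guarantee.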

The MimeType class also has a toString() method, so you do not need to call the getMediaType() or getSubType() methods on the MimeUtil class unless you need them; a simple mimetype.toString() will suffice. The MimeType class has these methods available as well.

TIP: If you want a comma separated String representation of all of the MimeType(s) in the returned Collection just call toString() on the Collection.

Lastly, ALWAYS remember, mime type mapping is a "best guess" algorithm. You should NEVER depend on mime types being 100% correct.

Regards

Steve

Abdel Olakara said...

Thanks Steve,

I agree, I have left out MimeDetector and have covered only the InputStream method. This is just an introductory article.

daniel said...

I figured I'd help some other people trying to figure this out with a much cleaner solution:
--
MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.OpendesktopMimeDetector");

MimeType m = MimeUtil.getMostSpecificMimeType(MimeUtil.getMimeTypes(YOURFILE));

System.out.println(m.getMediaType());

MimeUtil.unregisterMimeDetector("eu.medsea.mimeutil.detector.OpendesktopMimeDetector");
--