media-annotations — ginger's thoughts

Recently, I was asked to review the W3C Media Annotations specifications as they are about to go into Last Call (a state that comes before the request for implementations at the W3C).

The W3C Media Annotations group has defined a set of metadata that they believe is representative and common for media resources. The ontology consists of the following fields:

ma:identifier: a URI or string to identify a resource
ma:title: a string providing the title of the resource
ma:language: a language code describing the language used in the resource
ma:locator: the URI at which the resource can be accessed
ma:contributor: a URI or string identifying the contributor and the nature of the contribution
ma:creator: a URI or string identifying an author
ma:createDate: a date of creation or publication of the resource
ma:location: a string or geo code identifying where the resource has been shot/recorded
ma:description: a string describing the content of the resource
ma:keyword: a word or word combination providing a topic, keyword or tag representing the resource
ma:genre: a string providing the genre of the resource
ma:rating: rating value, including the rating scale
ma:relation: a URI and string identifying a related resource and the relationship
ma:collection: a URI or string providing the name of a collection to which the resource belongs
ma:copyright: a URI or string with the copyright statement
ma:license: a string or URI with the usage license
ma:publisher: a string or URI with the publisher of the resource
ma:targetAudience: a URI and classification string providing the issuer of the classification and the classification value
ma:fragments: a list of string and URI values that identify media fragments and their type
ma:namedFragments: a list of string and URI values that provide names to media fragments
ma:frameSize: a width - height pair in pixels
ma:compression: a string providing the compression algorithm
ma:duration: a float to provide the resource duration in seconds
ma:format: a string providing the MIME type of the resource
ma:samplingrate: a float with the audio sampling rate
ma:framerate: a float with the video frame rate
ma:bitrate: a float providing the average bit rate in kbps
ma:numTracks: an int of the number of tracks

Note that some of these fields are not single values, but simple constructs of multiple values. They are thus more complex than the name-value pairs typically used in, e.g., HTML meta headers or Dublin Core. I regard this as an issue for implementations.
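To make the difference concrete, here is a minimal sketch of what flat name-value metadata looks like next to a few of the compound fields. The class names, attribute names, and sample values are assumptions made up for illustration; the specification does not prescribe them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contributor:
    """ma:contributor: who contributed, plus the nature of the contribution."""
    identifier: str              # a URI or a plain string
    role: Optional[str] = None   # hypothetical attribute name


@dataclass
class FrameSize:
    """ma:frameSize: a width/height pair in pixels."""
    width: int
    height: int


@dataclass
class Rating:
    """ma:rating: a rating value together with its scale."""
    value: float
    scale_min: float
    scale_max: float


# Flat name-value pairs, as in HTML meta headers or Dublin Core.
flat = {"title": "Example clip", "language": "en"}

# The same record with compound ma: values; each one needs its own type.
compound = {
    "ma:title": "Example clip",
    "ma:contributor": Contributor("Jane Doe", role="performer"),
    "ma:frameSize": FrameSize(640, 480),
    "ma:rating": Rating(4.0, scale_min=0.0, scale_max=5.0),
}
```

An implementation that only knows how to store strings would have to invent a serialisation for each of the compound types, which is the implementation burden I mean.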

The fields were chosen because they are typical of the metadata available about media resources. The media fragments fields are a bit dubious in this respect, but could be useful in the future.

The metadata is determined either from within the resource itself or from a metadata collection about the resource. As such, the document maps several existing metadata and media resource formats to this interface, amongst them:

As they didn’t have a mapping table for Ogg content, I offered the following:

MAWG	Relation	Ogg properties	How to do the mapping	Datatype
Descriptive Properties (Core Set)
Identification
ma:identifier	exact	Name	Name field in skeleton header (new)	String
ma:title	exact	Title	TITLE field in vorbiscomment header	String
	exact	Title	Title field in skeleton header (new)	String
	related	Album	ALBUM title in vorbiscomment header	String
ma:language	exact	Language	Language field in skeleton header (new)	language code
ma:locator	exact		file URI from system	URI
Creation
ma:contributor	exact	Artist, Performer	ARTIST and PERFORMER vorbiscomment headers	Strings
ma:creator	related	Organization	ORGANIZATION field in vorbiscomment header	String
ma:createDate	exact	Date	DATE field in vorbiscomment header	ISO date format
ma:location	exact	Location	LOCATION field in vorbiscomment header	String
Content description
ma:description	exact	Description	DESCRIPTION field in vorbiscomment header	String
ma:keyword	N/A
ma:genre	exact	Genre	GENRE field in vorbiscomment header	String
ma:rating	N/A
Relational
ma:relation	related	Version, Tracknumber	VERSION (version of a title), TRACKNUMBER (CD track) fields in vorbiscomment header	Strings
ma:collection	related	Album	ALBUM field of vorbiscomment header	String
Rights
ma:copyright	exact	Copyright	COPYRIGHT field of vorbiscomment header	String
ma:license	exact	License	LICENSE field of vorbiscomment header	String
Distribution
ma:publisher	related	Organization	ORGANIZATION field of vorbiscomment header	String
ma:targetAudience	more specific	Role	Role field of Skeleton header (new)	String
Fragments
ma:fragments	N/A
ma:namedFragments	N/A
Technical Properties
ma:frameSize	exact		extract from binary header of video track	int, int (width x height)
ma:compression	exact	Content-type	Content-type field of Skeleton header	MIME type
ma:duration	exact		calculate as duration = last_sample_time - first_sample_time of OggIndex header of skeleton	Float (or rather: rational - rational)
ma:format	exact	Content-type	Content-type field of Skeleton header	MIME type
ma:samplingrate	exact		calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header	Rational (or rather int / int)
ma:framerate	exact		calculate as granulerate = granulerate_numerator / granulerate_denominator of Skeleton header	Rational (or rather int / int)
ma:bitrate	exact		calculate as bitrate = length_of_segment / duration from OggIndex headers of skeleton	Float
ma:numTracks	exact	Tracknumber	TRACKNUMBER field of vorbiscomment header (track number on album)	Int

You will notice that the table marks 4 skeleton fields as “new”. These are proposed fields, so a bit of coding will be necessary to introduce them into software. The space for them already exists in the message header fields, so they won’t require a change to the skeleton format.
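The vorbiscomment rows and the calculated rows of the table could be sketched roughly as follows. This is only an illustration: the function names, the list-of-pairs input, and the sample values are assumptions, and for the bit rate I assume the segment length is given in bytes and that 1 kbps means 1000 bits per second.

```python
from fractions import Fraction

# Vorbiscomment rows of the table: field name -> [(ma property, relation), ...]
VORBISCOMMENT_TO_MA = {
    "TITLE": [("ma:title", "exact")],
    "ALBUM": [("ma:title", "related"), ("ma:collection", "related")],
    "ARTIST": [("ma:contributor", "exact")],
    "PERFORMER": [("ma:contributor", "exact")],
    "ORGANIZATION": [("ma:creator", "related"), ("ma:publisher", "related")],
    "DATE": [("ma:createDate", "exact")],
    "LOCATION": [("ma:location", "exact")],
    "DESCRIPTION": [("ma:description", "exact")],
    "GENRE": [("ma:genre", "exact")],
    "VERSION": [("ma:relation", "related")],
    "TRACKNUMBER": [("ma:relation", "related"), ("ma:numTracks", "exact")],
    "COPYRIGHT": [("ma:copyright", "exact")],
    "LICENSE": [("ma:license", "exact")],
}


def map_comments(comments):
    """Map (name, value) vorbiscomment pairs to {ma property: [(value, relation)]}.

    Vorbiscomment field names are case-insensitive and may repeat,
    hence the upper-casing and the list of values per property.
    """
    result = {}
    for name, value in comments:
        for ma_property, relation in VORBISCOMMENT_TO_MA.get(name.upper(), []):
            result.setdefault(ma_property, []).append((value, relation))
    return result


def granule_rate(numerator, denominator):
    """granulerate = granulerate_numerator / granulerate_denominator."""
    if denominator == 0:
        raise ValueError("granulerate denominator must be non-zero")
    return Fraction(numerator, denominator)


def duration_seconds(first_sample_time, last_sample_time):
    """duration = last_sample_time - first_sample_time."""
    return last_sample_time - first_sample_time


def bitrate_kbps(segment_length_bytes, duration):
    """bitrate = length_of_segment / duration, converted to kbps (assumed units)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return float(Fraction(segment_length_bytes * 8) / duration / 1000)


# Made-up sample values, for illustration only.
mapped = map_comments([("title", "Song"), ("ARTIST", "A"), ("artist", "B"), ("ALBUM", "LP")])
frame_rate = granule_rate(30000, 1001)
duration = duration_seconds(Fraction(0), Fraction(125))
bitrate = bitrate_kbps(1_000_000, duration)
```

Keeping the rates and times as rationals until the last step avoids rounding the frame rate, which is why the table gives the datatype as "Rational (or rather int / int)".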

In the second specification of the Media Annotations WG, the group offers a standard API to access (i.e. read) the defined fields. They also intend to create an API to write the fields, but I doubt that will be easy because of the vast number of file types they intend to support.

There is basically a single function that allows the extraction of metadata:

    MAObject[] getProperty(in DOMString propertyName,
                           in optional DOMString sourceFormat,
                           in optional DOMString subtype,
                           in optional DOMString language,
                           in optional DOMString fragment);

I proposed that it may be possible to include this in HTML5 as follows:

    interface HTMLMediaElement : HTMLElement {
      ...
      getter MAObject getProperty(in DOMString propertyName, in optional unsigned long trackIndex);
      ...
    }

This would either extract the property for a particular track in a media resource or for the complete resource if no track index is given. The only problem I see is that the returned object is different depending on the requested property - the MAObject is only a parent class for the returned object types. I am therefore not sure that this can be specified easily in HTML5.
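A rough sketch of the proposed behaviour, and of the typing problem, might look like this. Only MAObject, getProperty, propertyName and trackIndex come from the text above; the MediaResource class, the Title and FrameSize subclasses, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class MAObject:
    """Parent class only; every property returns some subclass of it."""


@dataclass
class Title(MAObject):
    value: str


@dataclass
class FrameSize(MAObject):
    width: int
    height: int


class MediaResource:
    """Hypothetical stand-in for a media element holding already-parsed metadata."""

    def __init__(self,
                 resource_props: Dict[str, MAObject],
                 track_props: List[Dict[str, MAObject]]):
        self._resource_props = resource_props
        self._track_props = track_props

    def get_property(self, property_name: str,
                     track_index: Optional[int] = None) -> Optional[MAObject]:
        # No track index: look at the complete resource.
        if track_index is None:
            return self._resource_props.get(property_name)
        # Otherwise look at one track (raises IndexError for an unknown track).
        return self._track_props[track_index].get(property_name)


resource = MediaResource(
    {"ma:title": Title("Example clip")},
    [{"ma:frameSize": FrameSize(640, 480)}, {}],
)

size = resource.get_property("ma:frameSize", 0)
if isinstance(size, FrameSize):   # the caller has to know the concrete type
    pixels = size.width * size.height
```

The isinstance check at the end is the awkward part: the declared return type says nothing about which fields the caller will actually find on the object.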

Overall I thought the specification was a nice piece of work. I am not sure I agree with all the chosen fields, but that is always an issue with metadata. The most important fields are there and that’s what matters.

I spent last week in France, near Cannes, at the W3C TPAC meeting. This is the one big meeting that the W3C has every year to bring together all (or most) of the technical working groups and other active groups at the W3C.

It was not my first time at a standards body meeting - I have been part of ISO/MPEG before and also of IETF, and spoken with people at IEEE and SMPTE. However, this time was different. I felt like I was with people who spoke my language. I also felt like my experience was valued and will help solve some of the future challenges for the Web. I am very excited to be an invited expert on the Media Fragments and Media Annotations working groups and to be able to provide input into HTML5.

In the Media Fragments working group we are developing a URI addressing scheme that enables direct linking to media fragments, in particular temporal and spatial segments. Experience from our earlier temporal URI scheme is one of the inputs to the scheme. Currently it looks likely that we will choose a scheme that has “#” in it and then require changes to browsers, Web proxies, and servers to enable delivery of media fragments.
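To give an idea of what addressing a temporal or spatial segment behind a “#” could look like, here is a small sketch. The syntax is an assumption: the group had not settled on one at this point, so the sketch borrows the t=start,end and xywh=x,y,w,h forms used in the later Media Fragments URI recommendation, and the function name and example URLs are made up.

```python
from urllib.parse import parse_qs, urlsplit


def parse_fragment(uri):
    """Extract an assumed temporal (t=) and spatial (xywh=) fragment from a URI."""
    params = parse_qs(urlsplit(uri).fragment)
    result = {}
    if "t" in params:
        # "10,20" -> start 10 s, end 20 s; a missing start means 0,
        # a missing end means "until the end of the resource".
        start, _, end = params["t"][0].partition(",")
        result["time"] = (float(start) if start else 0.0,
                          float(end) if end else None)
    if "xywh" in params:
        # "160,120,320,240" -> x, y, width, height in pixels
        x, y, w, h = (int(v) for v in params["xywh"][0].split(","))
        result["region"] = (x, y, w, h)
    return result


clip = parse_fragment("http://example.com/video.ogv#t=10,20")
crop = parse_fragment("http://example.com/video.ogv#xywh=160,120,320,240")
```

Since a fragment is not normally sent to the server, a client that understands this would have to translate it into something like a byte-range request, which is where the required changes to browsers, proxies, and servers come from.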

In the Media Annotations working group we are deciding upon an ontology to generically describe media resources - something based on Dublin Core but more extended and more appropriate for audio and video. We are currently looking at Adobe’s XMP specification.

As for HTML5 - there was not much of a discussion at the TPAC meeting about the audio and video elements (unless I missed it by attending the other groups). However, from some of the discussions it became clear to me that they are still at a very early stage of specification and much can be done to help define the general architecture of how to publish video and its metadata on the Web, help define JavaScript APIs and DOM models, and help define accessibility.

I actually gave a lightning talk about the next challenges of HTML5 video at TPAC (see my “video slides”) which points out the need for standard definitions of video structure and annotations together with an API to reach them. I had lots of discussions with people afterwards and also learnt a lot more about how to do accessibility for Web video. I should really write it up in an article…

Of course, I also met a lot of cool people at TPAC, amongst them Larry Masinter, Ian Hickson, and Tim Berners-Lee - past and new heroes of Web standards. :-) It was totally awesome and I am very grateful to Mozilla for sending me there and enabling me to learn more about the greater picture of video accessibility and the role it plays on the Web.


W3C Media Annotations API standard

W3C Technical Plenary / Advisory Committee Meetings Week 2008