Thursday, June 05, 2014

Why is eBook Metadata So Hard?


Today many continue to pour physical book content into the digital container and believe that they have addressed the digital opportunity. However, many have also adopted the same approach to contextual data: metadata, bibliographic records, blurbs and so on. It’s as if we think that what works in the physical world is equally applicable in the digital one.

Is this blind-faith approach down to a conscious decision to do as little as necessary and incur as little cost as possible, or is it a fundamental misunderstanding of the digital environment? Some would suggest that it’s like continuing to distribute AI (advance information) sheets like confetti via a fax machine in the age of email and the internet.

Way back in the eBook dark ages (the late nineties) we were part of the highly acclaimed publishing research series edited by Mark Bide and Mike Shatzkin, ‘Publishing in the 21st Century’. One of the reports we produced was simply titled ‘From N to X’ and was about the growing importance of Context (metadata) over Content. What was clear then was that discoverability was as important as, if not more important than, the content, and that finding that digital needle in the Internet haystack was going to be a challenge for all.

Interestingly, the industry chose to retain its physical taxonomy of classification and often applied restrictions that only made sense in the physical world. It is as if all we could see was physical shelves and we had to place the titles accordingly. ONIX, which did much to provide structure for the physical B2B supply chain, started to become increasingly irrelevant in the B2C ebook supply chain. The major ebook retailers all adopted their own classification taxonomies, some permitting many genre classifications, others only a few. Keywords became important, but the response was varied. The all-important blurb was often just the physical one, again merely poured into the digital description. Some indexed the first chapter, others the whole book, but searching on indexed words often became highly subjective and returned results irrelevant to the intended search. As seen with some generic search engines today, even your individual profile may mean you are presented with different search results from others.

Some battled with the semantic tagging of elements such as illustrations. Some sectors recognised the importance of citations and references and continued to support the services established to manage these in the physical world.

The point is that we have a very rich bank of potential information that can aid discoverability and qualification. The richest source of content metadata is the book itself. Some 95% of all contextual data is available via the book, with the remainder often available via hyperlinks.

The other source of information comes from our own profiles: our tastes, preferences and habits. Some will say that one of the reasons why social book sites work well is that they aid discoverability and recommendations and thereby add context. It’s little surprise therefore that Amazon bought Goodreads and has just launched its Twitter relationship. Book-recommending social services have mushroomed, but have they actually increased sales in the same proportion, or are they just an extension of today’s ‘me too’ social dating society?

We therefore need to rethink what we give away to promote titles, aid discoverability and add value to drive sales. Merely continuing to pour the same physical information into the digital boxes is not going to work. Equally, waiting for a white knight to provide the solution may well now carry a health warning.

