Making Knowledge Work

March 13, 2010

3D 21st Century Taxonomies

Filed under: LIKE — Tags: , — virginiahenry @ 8:08 pm

Fran Alexander has an enviable talent for taking the terror out of taxonomy.  Her pre-dinner talk made LIKE 11 (our first anniversary meeting) a thoroughly enjoyable and enlightening event.

She began by explaining that people have been organising ideas and making lists for thousands of years. By the time of classical Greece, taxonomies were familiar things, developed from lists as a way of representing knowledge.

And people have been predicting the death of taxonomy for almost as long.  It hasn’t happened yet, as our minds tend to like things to be organised – they understand order and narrower relationships.  The notion of zooming in on something you want is familiar and instinctive.

Even though we have a world of Google and free-text search, classification is still really useful.  Fran said she often gets asked, and wakes up at 3 in the morning thinking, “why don’t we just bung Google on the lot and forget it”.  But there are some very good reasons not to.

For one thing, free-text search fails to help with knowledge discovery.  It’s great if you know what you’re looking for, but not great if you’re not sure and you want to understand what’s known in a field.  You can wander about following links, but won’t get a sense of the field.  You’ll get random pathways that can be very interesting, but won’t end up building a body of knowledge and an overview of a set of subject matter.

Anyone who’s wasted an afternoon following random links and not answering the question they were trying to answer will understand that.  You tend to miss obscure, but important, unexpected, non-commercial things.

Other problems include disambiguation and misspelling. Google has phenomenal synonym control and thesauruses underpinning its searches, but they can’t help in smaller domains where you don’t have everyone in the world doing searches to process those results. And that ties in with the notion of ‘about-ness’. When you’re doing a search and looking for specific words, you won’t necessarily find things that are about that topic but don’t use those specific words. It’s the same case with audio-visual assets – you need to get a sense of what the asset’s about, and all sorts of metadata might not capture that. So you need some classification to help you.

Fran said the real killer for the BBC is comprehensiveness.  They’re expected to know everything they hold on a topic. So they can’t rely on a few keywords.

People ask her: “Can’t you just do some folksonomy? That’d be cheap – just free tagging.” But if folksonomies are going to be any use, you need to collapse them into taxonomies, because if you get too many people tagging too many things with too many different viewpoints and too many different words, you just end up with a lot of meaningless nothing. Fran found it interesting that some people say “taxonomies represent a single viewpoint, folksonomies represent everybody’s viewpoint”. Her response was to ask what viewpoint is represented in a folksonomy. And her answer: you don’t really know – there’s some kind of algorithm underneath it putting tags together, doing some disambiguation and weighting the tags. There are assumptions the software developers have made... and you don’t really know what’s going on. At least with a library classification you can see what it is and how it works. If it’s got a western bias, you can see that. If it was written for lawyers, you can see that.
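As a rough illustration of what collapsing a folksonomy into a taxonomy can mean in practice, here is a minimal Python sketch. The variant table, the preferred terms and the sample tags are all invented for illustration and were not part of Fran's talk. The idea is simply that free tags are normalised and folded into one preferred term each, while anything unrecognised is set aside for a human to review.

```python
from collections import Counter

# Hypothetical variant table: each free-text tag variant points at one
# preferred taxonomy term. In real life this would be maintained by people.
VARIANTS = {
    "nyc": "New York City",
    "new york": "New York City",
    "new york city": "New York City",
    "big apple": "New York City",
    "film": "Films",
    "films": "Films",
    "movies": "Films",
}


def collapse_tags(raw_tags):
    """Count tags per preferred term; keep unmapped tags aside for review."""
    counts = Counter()
    unmapped = Counter()
    for tag in raw_tags:
        key = " ".join(tag.lower().split())  # normalise case and whitespace
        if key in VARIANTS:
            counts[VARIANTS[key]] += 1
        else:
            unmapped[key] += 1
    return counts, unmapped


raw = ["NYC", "new york", "Movies", "film", "Big  Apple", "skyline"]
counts, unmapped = collapse_tags(raw)
print(dict(counts))    # tags folded into preferred terms
print(dict(unmapped))  # candidates for new taxonomy terms
```

The unmapped pile is where the folksonomy earns its keep: it shows which words users reach for that the taxonomy does not yet cover.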

But, she said, folksonomies are tremendously useful in helping us keep our taxonomies flexible. We’re not in the rigid, fixed world that classificationists of the 19th and early 20th centuries were faced with. Back then there was an assumption that you could build a classification because you had a sense of who your users were. They didn’t talk in terms of usability and findability as we do now. But classificationists like Cutter and Bliss were very interested in how people used libraries, how they looked for things in different ways, and how you could meet those different kinds of needs. They assumed there was an answer to that: you could set up your classification and it would be stable. And they were more or less right, because those systems stayed pretty stable for a long time, for all sorts of social, political and technological reasons. If you had spent a great deal of time and effort building a classification, say back in the 1950s, using pieces of paper and cards and writing on your books and so on, you weren’t going to say “ooh, not sure we did that bit of Biochemistry right, let’s go and reclassify all our books”. So classifications tended to be left alone: it was easier to get humans to understand the classification and adapt to that.

Nowadays, Fran said, there’s no reason why you couldn’t have ten pathways to the same digital asset. And there’s no reason not to quickly put another tag onto it, because the digital world is a totally different environment. Users are more demanding now; people do tend to be fickle – they want to use the terms they understand, they want to pick up terminology quickly and they want us to react to it.

The old-style linear project planning was great when you could say “I know who my users are, I know how they’re going to behave, I can go out and do my traditional requirements gathering, tick all my boxes and set up my system, and it’ll stay stable”. Businesses like you to do that, and IT people love it, because you can fix your costs, set your parameters and say what you’ll do. You assume the world is going to stay the same and stable. But with big projects, that doesn’t work. Things change so much between the point where you do your requirements gathering and the point of delivery that you’re almost not delivering the same thing any more. It’s a nightmare for the finance people and the suppliers, because how do you cost something that’s constantly changing? And it’s a problem for us: how do we go about building a taxonomy today that’s going to be relevant in five years’ time?

It’s very hard, said Fran, but there is a way. And that is to stop thinking of taxonomies as fixed classifications and to see them instead as organic and open entities that need to grow and change. One of the best ways we can make our taxonomies dynamic and open is to look at how we link them up with other taxonomies. Once you stop thinking of your taxonomy as a thing in its own right that sits in a silo and represents your knowledge, your view, your opinion, and look at it more like an application, an open port into your content repository, a navigation method into your content, then you can open it up to other taxonomies.

So you can get round the problems of “this is a taxonomy for lawyers, and this is a taxonomy for salespeople. This one is for marketing, and this for schoolchildren”. What you do is take all these taxonomies and look at a mapping methodology – you look at how you can map them together. By opening up your taxonomy, you immediately increase its range and the number of viewpoints it can serve. And, she said, it also means that a trendy new technical taxonomy, some new terminology or your folksonomic terms can be harvested and bolted on to your main taxonomy. So you’re not faced with major revisions. You look at a link point, a route in – your taxonomy’s open, so you can fix other bits on to it. Like a Meccano model of taxonomy.
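To make the bolting-on idea a little more concrete, here is a small Python sketch. Everything in it (the term names, the asset ids, the `MAPPINGS` table and the `find_assets` helper) is a made-up assumption for illustration, not a description of any system mentioned in the talk. It shows one way that several vocabularies could each provide a route to the same assets through a mapping onto a main taxonomy.

```python
# Hypothetical main taxonomy index: main term -> ids of assets classified there.
MAIN_INDEX = {
    "Employment law": {"asset-1", "asset-2"},
    "Primary education": {"asset-3"},
}

# Hypothetical bolt-on vocabularies, each mapped onto main taxonomy terms.
MAPPINGS = {
    "legal": {"Labour law": "Employment law"},
    "schools": {
        "Key Stage 1": "Primary education",
        "Workers' rights": "Employment law",
    },
}


def find_assets(vocabulary, term):
    """Resolve a term from a bolted-on vocabulary via its mapping.

    Terms with no mapping are tried directly against the main taxonomy.
    """
    main_term = MAPPINGS.get(vocabulary, {}).get(term, term)
    return sorted(MAIN_INDEX.get(main_term, set()))


# Two different viewpoints, one set of assets.
print(find_assets("legal", "Labour law"))
print(find_assets("schools", "Workers' rights"))
```

Adding a new viewpoint here means adding one more entry to the mapping table; the main taxonomy and the assets themselves are left alone.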

This means your building starts to become a dynamic process, because as you bolt bits on they will inform how your main taxonomy is working.   So through the mapping process you can improve areas of weakness in your main taxonomy, responding to user needs, because you’re bolting on bits that are popular and not worrying about bits that are less popular.  You can create a very dynamic and exciting search experience for people that way, because you give them different routes in, different options. You can even allow them to navigate around their folksonomic tag clouds that sit around your main taxonomy – opening up the 3-D navigation by setting up all sorts of relationships through your content repository, and looking at it in all sorts of different ways.

Fran said that semantic web and linked data technologies are starting to be really useful in this area. The basic principle is that semantic web and linked data languages, such as RDF, OWL and SKOS, give you a way of expressing your taxonomy.

Basically, these are the computer-coded bits, like XML, that sit around your taxonomy. They mean that if your taxonomy’s expressed in SKOS format and so is someone else’s, all sorts of automated mapping can happen programmatically. That takes a lot of the heavy lifting out of the taxonomy mapping process, so the idea of mapping two big data sets together becomes much more practicable.
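The sketch below gives a flavour of what automated mapping could look like once two taxonomies share a SKOS-style shape. It is a simplification under stated assumptions: plain Python dictionaries stand in for real RDF data, the concept identifiers and labels are invented, and `propose_matches` is a hypothetical helper. The property names `prefLabel`, `altLabel` and `closeMatch` are borrowed from SKOS. Matching on labels only produces candidates, so a person would still need to confirm each one.

```python
def normalise(label):
    """Fold case and collapse whitespace so labels compare loosely."""
    return " ".join(label.casefold().split())


def labels(concept):
    """All normalised labels (preferred and alternative) for one concept."""
    return {normalise(l) for l in [concept["prefLabel"], *concept.get("altLabel", [])]}


def propose_matches(scheme_a, scheme_b):
    """Propose skos:closeMatch candidates where any two labels coincide."""
    index = {}
    for uri_b, concept in scheme_b.items():
        for label in labels(concept):
            index.setdefault(label, set()).add(uri_b)
    proposals = set()
    for uri_a, concept in scheme_a.items():
        for label in labels(concept):
            for uri_b in index.get(label, ()):
                proposals.add((uri_a, "skos:closeMatch", uri_b))
    return sorted(proposals)


# Invented example schemes: an archive taxonomy and a website vocabulary.
ARCHIVE = {
    "archive:c1": {"prefLabel": "Fruit flies", "altLabel": ["Drosophila"]},
    "archive:c2": {"prefLabel": "Broadcasting"},
}
WEBSITE = {
    "web:t9": {"prefLabel": "Drosophila"},
    "web:t4": {"prefLabel": "Radio"},
}

for triple in propose_matches(ARCHIVE, WEBSITE):
    print(triple)
```

Because both sides expose the same kinds of label, the comparison needs no knowledge of how either taxonomy was built, which is the point of agreeing on a shared format.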

The reason Fran doesn’t think the semantic web will lead us into one great unified consciousness is the amount of negotiation to be done.  Data sets need to match up; agreement on metadata standards is needed.  But, she explained, semantic web linked data is working really well in domains like the biological sciences:  if someone’s doing experiments on fruit flies they can use data from someone else doing experiments on fruit flies.

In organisations there’s a lot of potential for this to have great benefits. At the BBC they’re looking at taking the archive taxonomy and expressing it in a linked data format, so that it can interact with the work of the people doing the public-facing website navigation. They’d then be able to pull in resources from the archive very easily, using their own terminology and their own web navigation systems and links.

Thinking in this way, Fran said, it quickly becomes obvious that you can surround your taxonomy with ontologies as well. Ontologies are made of lots of taxonomies joined together, so the ontology fits into the semantic web world and fits into taxonomies, because it provides horizontal navigation between bits of your taxonomy. It means you can dive off in all sorts of directions. That is tremendously exciting, and something we couldn’t really do with our old-style card classification systems, because the number of cards we would have needed would have been unthinkable.
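Here is a tiny Python sketch of the difference between vertical and horizontal navigation. The terms, the `studies` relationship and the helper names are all invented for illustration. The taxonomy supplies the broader-to-narrower links, and the ontology-style relationships supply the sideways jumps between branches.

```python
# Hypothetical taxonomy branches: term -> narrower terms (vertical links).
NARROWER = {
    "People": ["Scientists"],
    "Scientists": ["Geneticists"],
    "Organisms": ["Insects"],
    "Insects": ["Fruit flies"],
}

# Hypothetical ontology-style relationships joining separate branches.
RELATED = [
    ("Geneticists", "studies", "Fruit flies"),
]


def descend(term):
    """Vertical navigation: every term beneath `term`, depth-first."""
    found = []
    for child in NARROWER.get(term, []):
        found.append(child)
        found.extend(descend(child))
    return found


def sideways(term):
    """Horizontal navigation: typed links from `term` into other branches."""
    links = []
    for subject, relation, obj in RELATED:
        if subject == term:
            links.append((relation, obj))
        elif obj == term:
            links.append((relation + " (inverse)", subject))
    return links


print(descend("People"))
print(sideways("Geneticists"))
print(sideways("Fruit flies"))
```

With cards, every sideways link would have meant another physical card filed in another drawer; here it is one more row in a list.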

Fran gave an example of a really exciting project using linked data.  The Europeana project is creating cross-navigation of all sorts of cultural artefacts in museums, libraries and archives throughout Europe.  By mapping their taxonomies together they’re creating a single user-interface into all this data, immediately opening up all sorts of possibilities for researchers.  And the rest of us…..
