Solving Tag Pollution

Taxonomy has a big advantage over folksonomy: it’s immutable and therefore prone to fewer errors. But folksonomy has a big advantage over taxonomy: it’s personal and flexible, and therefore has context for the user. This is an attempt to mate the two systems, producing an offspring with the best genes of each.

Contextualizing Tags

If we analyze the tags associated with a piece of content, we can start to categorize them by context. By breaking them down into Global, Personal and System Tags, we can make sure that only the tags relevant to a particular context get used in that context.

Global Tags

By setting a threshold for the number of people who must tag something with a word or phrase before it can enter the global tag pool, we get an automatic system for filtering out errors and Personal Tags. In the diagram below, two people have to tag the item with the same phrase in order for it to enter the global pool. Brian’s typo tag [2.0] doesn’t make the pool, which reduces pollution.
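The threshold rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name global_tags, the sample tags, and the lowercase/trim normalization are all made up for the example and are not taken from the diagram.

```python
from collections import defaultdict

# Assumed threshold, matching the "two people" rule described above.
GLOBAL_THRESHOLD = 2


def global_tags(user_tags, threshold=GLOBAL_THRESHOLD):
    """Return the tags that enough distinct users applied to one item.

    user_tags maps a user name to the set of tags that user gave the item.
    """
    taggers = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            # Normalizing case and whitespace is an assumption of this sketch.
            taggers[tag.strip().lower()].add(user)
    return {tag for tag, users in taggers.items() if len(users) >= threshold}


# Hypothetical data: "caek" is a typo and "my birthday" is personal.
photo = {
    "John": {"cake", "my birthday"},
    "Laura": {"cake", "party"},
    "Brian": {"caek", "party"},
}

print(sorted(global_tags(photo)))  # ['cake', 'party']
```

Counting distinct users rather than raw tag occurrences means one person repeating a typo can never push it into the pool.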

Personal Tags

If only one person has used a particular tag for a piece of content, chances are that it is either an error or a tag personal to that person. In the diagram, you can see that John has tagged the photo with “my birthday” [1.0]. If Laura searched for photos of her birthday, she wouldn’t want John’s photo to show up, because the tag is only relevant to John. But when John searches for “my birthday”, the photo will still show up.
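A search that respects this rule might look something like the sketch below. The function name matches_search and the sample data are hypothetical, and the two-person threshold is assumed: an item matches when the query is a global tag on it, or when the searcher applied that tag personally.

```python
def matches_search(item_tags, searcher, query, threshold=2):
    """Decide whether one item should appear in a user's search results.

    item_tags maps a user name to the set of tags that user gave the item.
    """
    users_with_tag = {user for user, tags in item_tags.items() if query in tags}
    is_global = len(users_with_tag) >= threshold  # enough people agree
    is_own = searcher in users_with_tag           # the searcher's own tag
    return is_global or is_own


# Hypothetical data for John's photo.
photo = {
    "John": {"my birthday", "cake"},
    "Laura": {"cake"},
}

print(matches_search(photo, "John", "my birthday"))   # True
print(matches_search(photo, "Laura", "my birthday"))  # False
print(matches_search(photo, "Laura", "cake"))         # True
```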

System Tags

System Tags aren’t subject to the same threshold as tags entered by users, because they are added automatically by back-end processes. In the diagram, the system is tagging the photo with the date and camera model from the picture’s EXIF data. Automatic system tagging can also be tied to front-end semantics: if a user uploads a document to an online Project Management application, the context in which the user hits the upload button (say, inside client->projectname) can be used to determine the system tags.
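Both sources of System Tags can be sketched as plain functions. This assumes the EXIF data has already been read into a dictionary keyed by the standard EXIF field names DateTimeOriginal and Model. The function names, the "->" separator handling, and the sample values are made up for illustration.

```python
from datetime import datetime


def exif_system_tags(exif):
    """Build system tags from already-parsed EXIF fields (a plain dict)."""
    tags = set()
    taken = exif.get("DateTimeOriginal")
    if taken:
        # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
        taken_at = datetime.strptime(taken, "%Y:%m:%d %H:%M:%S")
        tags.add(taken_at.date().isoformat())
    model = exif.get("Model")
    if model:
        tags.add(model.strip())
    return tags


def context_system_tags(context_path, separator="->"):
    """Turn an upload context such as 'client->projectname' into tags."""
    parts = (part.strip() for part in context_path.split(separator))
    return {part for part in parts if part}


# Hypothetical inputs.
exif = {"DateTimeOriginal": "2007:03:14 18:22:05", "Model": "Canon EOS 400D"}
print(sorted(exif_system_tags(exif)))
print(sorted(context_system_tags("Acme->Website Redesign")))
```

Since these tags come from the system rather than from people, they would go straight into the item's tag set without being counted against the threshold.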

Bonus: Admin tags

If your system has trusted admin users, the tags they add should behave in the same way as System Tags, with no threshold.
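One way to express this, as a rough sketch: a tag enters the pool if enough people used it, or if any trusted admin used it. The function name pooled_tags, the admins set, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def pooled_tags(user_tags, admins, threshold=2):
    """Return tags that pass the threshold or were applied by an admin.

    user_tags maps a user name to the set of tags that user gave one item;
    admins is the set of trusted user names (hypothetical).
    """
    taggers = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            taggers[tag].add(user)
    return {
        tag
        for tag, users in taggers.items()
        if len(users) >= threshold or not users.isdisjoint(admins)
    }


# Hypothetical data: Dana is a trusted admin, John is a regular user.
item = {
    "John": {"cake"},
    "Dana": {"needs review"},
}

print(sorted(pooled_tags(item, admins={"Dana"})))  # ['needs review']
```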

tag pollution diagram