June 11, 2009
Faviki is involved in the development of a new open tagging format, Common Tag, together with AdaptiveBlue, DERI (NUI Galway), Freebase, Yahoo!, Zemanta, and Zigtag. This is the first time that so many web companies have come together from day one to introduce a tagging standard.
People use tags to organize, share and discover content on the Web. However, in the absence of a common tagging format, the benefits of tagging have been limited. Individual things like New York City are often represented by multiple tags (like “nyc”, “new_york_city”, and “newyork”), making it difficult to organize related content; and it is not always clear what a particular tag represents – does the tag “orange” represent the fruit or the color?
The Common Tag format was developed to address the current shortcomings of tagging and to help everyone, including end users, publishers, and developers, get more out of Web content. It is the outcome of an effort to find the easiest way for publishers to get more out of their content by semantically marking it up.
The Common Tag format is based on RDFa, a standard mechanism for placing structured content within HTML documents. The format uses the URIs of concepts defined on the Web to anchor the meaning of Tag objects. Common concepts can be found, among other places, in two big databases of structured content (or controlled vocabularies, as librarians call them): Freebase and DBpedia.
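To make the idea of anchoring tags to concept URIs concrete, here is a minimal sketch. The tag spellings come from the example above; the lookup table, the function name and the choice of a DBpedia URI for New York City are assumptions made for illustration, not part of the Common Tag specification.

```python
# Hypothetical lookup table: free-text tags mapped to one concept URI.
# The mapping is illustrative; a real system would look concepts up in
# Freebase or DBpedia rather than hard-code them.
CONCEPTS = {
    "nyc": "http://dbpedia.org/resource/New_York_City",
    "new_york_city": "http://dbpedia.org/resource/New_York_City",
    "newyork": "http://dbpedia.org/resource/New_York_City",
}


def group_by_concept(tags):
    """Group free-text tags under the concept URI they refer to."""
    groups = {}
    for tag in tags:
        uri = CONCEPTS.get(tag.lower())
        if uri is None:
            continue  # unknown tag: there is no concept to anchor it to
        groups.setdefault(uri, []).append(tag)
    return groups


groups = group_by_concept(["nyc", "NewYork", "new_york_city", "orange"])
for uri, tags in groups.items():
    print(uri, tags)
```

All three spellings end up under a single URI, which is what lets related content be organized together. The ambiguous tag "orange" is skipped here because the table has no concept for it.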
Common Tag is based on a small vocabulary defining:
- A class Tag, which holds the metadata provided by a Common Tag for a specific Resource.
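- Two properties: ctag:tagged, which links a Resource to a Tag, and ctag:means, which links a Tag to the concept it refers to.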
There are also a few subclasses and optional properties; see the whole specification for details. Developers are also free to use RDFa’s flexibility to extend the expressiveness of the Common Tag format.
An example of two tags indicating that the document is about Twitter (DBpedia URI) and Web 2.0 (Freebase URI):
<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://dbpedia.org/resource/Twitter" />
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://rdf.freebase.com/ns/en/web_2_0" />
</body>
Faviki has implemented the Common Tag format (check out the RDFa extracted from Faviki’s Semantic Web topic page), and we hope that our users will benefit from it as more publishers, developers and end users join in supporting the format.
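As a rough illustration of how a consumer might read Common Tag markup back out of a page, the sketch below uses Python's standard html.parser module. It is deliberately simplified and is not a real RDFa processor: it assumes the namespace prefix is literally "ctag" instead of resolving the xmlns declaration, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

SAMPLE = """
<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://dbpedia.org/resource/Twitter" />
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://rdf.freebase.com/ns/en/web_2_0" />
</body>
"""


class CommonTagExtractor(HTMLParser):
    """Collect the concept URI of every element typed as ctag:Tag.

    Assumption: the prefix is literally "ctag"; a full RDFa parser would
    resolve the prefix from the xmlns declaration instead.
    """

    def __init__(self):
        super().__init__()
        self.concepts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_tag = attributes.get("typeof") == "ctag:Tag"
        has_meaning = attributes.get("rel") == "ctag:means"
        resource = attributes.get("resource")
        if is_tag and has_meaning and resource:
            self.concepts.append(resource)


def extract_concepts(html):
    """Return the concept URIs found in the given markup, in document order."""
    parser = CommonTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.concepts


concepts = extract_concepts(SAMPLE)
for uri in concepts:
    print(uri)
```

For anything beyond a quick check, a proper RDFa library is the safer choice, since it handles prefixes, nesting and the optional Common Tag properties.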
September 23, 2008
Non-English pages have to be translated first, so Faviki asks the Google AJAX Language API for help. A great thing is that you don’t need to specify the original language; the API recognizes it automatically!
Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from the English Wikipedia, which are finally translated back into the user’s language (using DBpedia data about language connections).
So, the whole process looks like this (simplified version):
- Faviki fetches a web page and extracts the core text (without HTML and non-relevant content).
- Then it tries to figure out whether the content is in English. If it isn’t, it is sent to the Google Language API, which detects the original language automatically, translates the content into English and returns the translation.
- The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to the English Wikipedia; the article titles are used as semantic tags.
- If the user’s language is not English, the tags must be translated. Using the DBpedia “Links to Wikipedia Article” datasets, we can find the corresponding Wikipedia titles in any of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
- Finally, the suggested tags are offered to the user.
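The steps above can be sketched roughly as follows. This is not Faviki's actual code: the function and parameter names are hypothetical, the three external services are passed in as plain callables so the example runs without network access, and silently dropping tags that have no translation is an assumption the post does not confirm.

```python
from typing import Callable, Dict, List


def suggest_tags(
    page_text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    translate_to_english: Callable[[str], str],
    find_wikipedia_titles: Callable[[str], List[str]],
    language_links: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return suggested tags for already-extracted page text.

    is_english / translate_to_english stand in for the translation service,
    find_wikipedia_titles stands in for the content-analysis service, and
    language_links maps a language code to {English title: local title}.
    """
    # Translate only when the content is not already in English.
    text = page_text if is_english(page_text) else translate_to_english(page_text)

    # English Wikipedia article titles serve as the semantic tags.
    english_titles = find_wikipedia_titles(text)

    # English-speaking users get the titles as they are.
    if user_lang == "en":
        return english_titles

    # Otherwise map each title into the user's language.
    # Assumption: titles without a known translation are dropped.
    links = language_links.get(user_lang, {})
    return [links[title] for title in english_titles if title in links]


# Usage with stubbed services (all values below are made up for the demo).
demo = suggest_tags(
    "Berlin ist die Hauptstadt von Deutschland.",
    "de",
    is_english=lambda text: False,
    translate_to_english=lambda text: "Berlin is the capital of Germany.",
    find_wikipedia_titles=lambda text: ["Berlin", "Germany"],
    language_links={"de": {"Berlin": "Berlin", "Germany": "Deutschland"}},
)
print(demo)
```

Passing the services in as arguments keeps the flow readable and makes each step easy to swap or test in isolation.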
Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.
September 19, 2008
Faviki is periodically synchronized with Wikipedia and now contains a little less than a million new tags: around 300,000 new English tags and 669,600 new tags in other languages! That means there are currently 5.6 million tags in Faviki: 2.7 million English tags and 2.9 million tags from 13 other languages.
Since the September release and the multi-language tagging feature, you can tag in 14 different languages, and now there are 30% more non-English tags. After English, the largest languages are German (397.8K) and French (388.5K). The fastest growing languages are Italian (51.5% growth) and Polish (44.1%).
Wikipedia/DBpedia growth (values in thousands)
| Language | DBpedia 3.0* | DBpedia 3.1** | Growth | Growth (%) |
| Total (without English) | 2255.8 | 2925.4 | 669.6 | 29.68% |
| Total (with English) | 4655.8 | 5625.4 | 969.6 | 20.83% |
* January 2008; the Japanese version was built in November 2007
** June and July 2008
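The totals in the table can be checked with a few lines of arithmetic. The helper name below is only for illustration; the input figures are the table's own values, in thousands of tags.

```python
def growth(old, new):
    """Return (absolute growth, growth in percent) between two totals."""
    absolute = new - old
    percent = absolute / old * 100
    return round(absolute, 1), round(percent, 2)


# Totals from the table, in thousands of tags.
without_english = growth(2255.8, 2925.4)
with_english = growth(4655.8, 5625.4)

# English-only growth is the difference between the two rows.
english_growth = round(with_english[0] - without_english[0], 1)

print(without_english)  # (669.6, 29.68)
print(with_english)     # (969.6, 20.83)
print(english_growth)   # 300.0
```

The difference between the two rows gives 300 thousand new English tags, which matches the figure quoted in the text.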
[Chart: Number of non-English tags (values in thousands)]
[Chart: Non-English tags growth]
Faviki takes its tag information from the DBpedia datasets. DBpedia extracts structured data from Wikipedia, which is constantly growing. The latest release, DBpedia 3.1, came out recently and marks an increase of 27% over the previous version. The downloads are provided in N-Triples and CSV formats on this page.
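For readers who want to try the N-Triples downloads, here is a minimal sketch of turning language-link triples into a title lookup. The sample lines are made up, and the predicate and URI layout shown in them are assumptions rather than a description of the actual DBpedia 3.1 files, so check a real dump before relying on them. The regular expression only handles triples whose three terms are all URIs.

```python
import re
from urllib.parse import unquote

# Matches one N-Triples line whose subject, predicate and object are URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")

# Illustrative sample only; the real dataset layout may differ.
SAMPLE = """\
<http://dbpedia.org/resource/Germany> <http://dbpedia.org/property/wikipage-de> <http://de.wikipedia.org/wiki/Deutschland> .
<http://dbpedia.org/resource/New_York_City> <http://dbpedia.org/property/wikipage-de> <http://de.wikipedia.org/wiki/New_York_City> .
"""


def title_from_uri(uri):
    """Turn the last path segment of a URI into a readable title."""
    return unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")


def build_language_links(text):
    """Map English titles to the titles of the linked foreign articles."""
    links = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip literal-valued or malformed lines
        subject, _predicate, obj = match.groups()
        links[title_from_uri(subject)] = title_from_uri(obj)
    return links


links = build_language_links(SAMPLE)
print(links)
```

A full dump is large, so in practice the file would be streamed line by line instead of being held in one string.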
August 15, 2008
Zemanta is a platform for accelerating online content production by recognizing contextual content and instantly serving relevant images, smart links, keywords and text to the user. They recently launched an early release of their API, allowing web developers to use the Zemanta engine in their own applications. The API offers a simple RESTful interface that returns suggested images, articles, tags and links in a structured format (XML, JSON, etc.) for a given piece of text.
The fact that it suggests Wikipedia links, among others, was particularly interesting to me, so I tested it right away and found that it works very well. Now Faviki users don’t have to start from scratch, because tag suggestions are produced by analyzing the web page’s content. If a suggestion is OK, the user just needs to click the ‘+’ button and the tag is added.
However, do not expect it to do the entire job for you – you are the one who makes the final decision!
We also started using Zemanta on our blog (it can be deployed on all major content publishing platforms). It’s easy to use and it saves time, so we highly recommend it. Great job, Zemanta!
Many thanks to Andraž Tori for the support.
July 16, 2008
Having users insert correct tags is essential for the Faviki community, so we’ve implemented new features which will hopefully make tagging easier:
You have probably noticed that tag descriptions have existed on Faviki for some time now. More than a few of our users have complained that one often can’t figure out what a tag is about and that an additional description is needed. For example, what is the difference between “Color Theory” and “Color theory”? When you hold the mouse cursor over the tag’s name, a short explanation will show up in the bottom right corner.
We have noticed that some of our users frequently save several bookmarks in a row that relate to the same subject or are follow-ups to the same post. In that case the bookmarks share the same or similar tags. The recent tags are the tags that you entered for your previous bookmarks. Use the arrows to go through your bookmark history and click the ‘+’ button to add a tag.
Thanks to all of our users who have given us continuous feedback on the new features on Faviki.