As strong believers in semantic tagging (we wrote about it here and here), we are happy to announce that a big step toward realizing the idea was made today.

Faviki is involved in the development of the new open tagging format – Common Tag, together with AdaptiveBlue, DERI (NUI Galway), Freebase, Yahoo!, Zemanta, and Zigtag. This is the first time that so many web companies have come together from day one to introduce a tagging standard.

People use tags to organize, share and discover content on the Web. However, in the absence of a common tagging format, the benefits of tagging have been limited. Individual things like New York City are often represented by multiple tags (like “nyc”, “new_york_city”, and “newyork”), making it difficult to organize related content; and it is not always clear what a particular tag represents – does the tag “orange” represent the fruit or the color?

The Common Tag format was developed to address the current shortcomings of tagging and help everyone, including end users, publishers, and developers, get more out of Web content. It is the outcome of an effort to find the easiest way to let publishers get more out of their content by semantically marking it up.

The Common Tag format is based on RDFa, a standard mechanism for placing structured content within HTML documents. The format uses the URIs of concepts defined on the Web as a way of anchoring the meaning of Tag objects. Common concepts can be found, among other places, in two big databases of structured content (or controlled vocabularies, as librarians call them) – Freebase and DBpedia.

Common Tag is based on a small vocabulary defining:

  • A class Tag, which holds the metadata provided by a Common Tag for a specific Resource.
  • Two properties:
    • tagged (connects a document to the Tag)
    • means (connects the Tag to the concept’s URI)

There are also a few subclasses and optional properties; you can have a look at the whole specification. Developers may also use RDFa’s flexibility to extend the expressiveness of the Common Tag format.

An example of two tags indicating that the document is about Twitter (DBpedia URI) and Web 2.0 (Freebase URI):

<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
    <span typeof="ctag:Tag"
              rel="ctag:means" resource="http://dbpedia.org/resource/Twitter" />

    <span typeof="ctag:Tag"
              rel="ctag:means" resource="http://rdf.freebase.com/ns/en/web_2_0" />
</body>
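For publishers who generate pages programmatically, markup like the example above could be produced from a list of concept URIs. The following is a minimal sketch, not part of the Common Tag specification: the function name `common_tag_body` is hypothetical, and it only emits the `Tag` class with the `tagged` and `means` properties described above.

```python
from html import escape

# Namespace URI taken from the example markup above.
CTAG_NS = "http://commontag.org/ns#"


def common_tag_body(concept_uris):
    """Return a <body> element tagging the document with each concept URI.

    Hypothetical helper: one empty <span typeof="ctag:Tag"> is emitted per
    URI, linked to the concept through ctag:means.
    """
    spans = "\n".join(
        '    <span typeof="ctag:Tag"\n'
        f'              rel="ctag:means" resource="{escape(uri, quote=True)}" />'
        for uri in concept_uris
    )
    return (
        f'<body xmlns:ctag="{CTAG_NS}" rel="ctag:tagged">\n'
        f"{spans}\n"
        "</body>"
    )


if __name__ == "__main__":
    print(common_tag_body([
        "http://dbpedia.org/resource/Twitter",
        "http://rdf.freebase.com/ns/en/web_2_0",
    ]))
```

Escaping the URI before placing it in the `resource` attribute keeps the markup well-formed when a URI contains characters such as `&`.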

Faviki has implemented the Common Tag format (check out the RDFa extracted from the Faviki Semantic Web topic page), and we hope that our users will benefit from it as more publishers, developers and end users join in supporting the Common Tag format.

Google Code


Faviki is a featured project on Google Code for its creative use of the Google AJAX Language API!

This API allows you to translate and detect the language of blocks of text. Despite having the word “AJAX” in its name, the API can also be accessed from non-JavaScript environments.

What is it all about? As we wrote recently, Faviki uses the Zemanta API to suggest tags automatically. That’s OK for English pages, but what about other languages?

They have to be translated first, so Faviki asks the Google AJAX Language API for help 🙂 A great thing is that you don’t need to specify the original language; the API recognizes it automatically!

Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from the English Wikipedia, which are finally translated into the user’s language (using DBpedia data about language connections).

So, the whole process looks like this (simplified version):

  1. Faviki fetches a web page and extracts its core text (without HTML and non-relevant content).
  2. Then it tries to figure out whether the content is in English. If it isn’t, it is sent to the Google language API, which detects the original language automatically, translates it into English and returns the translation.
  3. The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to the English Wikipedia – their titles are used as semantic tags.
  4. If the user’s language is not English, the tags must be translated. Using the DBpedia “Links to Wikipedia Article” datasets, we can find the titles of the corresponding Wikipedia articles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
  5. Finally, the suggested tags are offered to the user.
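The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not Faviki's actual code: the three services are passed in as plain callables (`is_english`, `translate_to_english`, `find_wikipedia_titles`, all hypothetical names), the DBpedia language links are modelled as a nested dictionary, and tags without a title in the user's language are simply dropped, which the post does not specify.

```python
from typing import Callable, Dict, List


def suggest_tags(
    page_text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    translate_to_english: Callable[[str], str],
    find_wikipedia_titles: Callable[[str], List[str]],
    language_links: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return suggested tags for already-extracted page text (step 1 is done).

    language_links maps an English Wikipedia title to {language code: title},
    standing in for the DBpedia language-link datasets.
    """
    # Step 2: translate only when the text is not already in English.
    text = page_text if is_english(page_text) else translate_to_english(page_text)

    # Step 3: the analysis service returns English Wikipedia titles.
    english_titles = find_wikipedia_titles(text)

    # Step 4: map the titles into the user's language when it is not English.
    if user_lang == "en":
        return english_titles
    localized = []
    for title in english_titles:
        local_title = language_links.get(title, {}).get(user_lang)
        if local_title is not None:  # assumption: skip titles with no link
            localized.append(local_title)

    # Step 5: these are the tags offered to the user.
    return localized
```

Passing the services in as functions keeps the sketch independent of any particular API client and makes each step easy to replace or test with stand-ins.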

Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users to tag their bookmarks faster and more easily. These great services will continue to improve over time, so expect the suggested tags to get better, too.


A million new tags in Faviki

September 19, 2008

Faviki is periodically synchronized with Wikipedia and now contains a little less than a million new tags – around 300,000 new English tags and 669,600 new tags in other languages! That means that there are currently 5.6 million tags in Faviki – 2.7 million English tags and 2.9 million tags from 13 other languages.

Since the September release and the multi-language tagging feature, you can tag in 14 different languages, and now there are 30% more non-English tags. After English, the largest languages are German (397.8K) and French (388.5K). The fastest growing languages are Italian (51.5% growth) and Polish (44.1%).

Wikipedia/DBpedia growth (values in thousands)

Language              DBpedia 3.0*   DBpedia 3.1**   Growth   Growth (%)
English                     2400.0          2700.0    300.0       12.50%
German                       335.3           397.8     62.5       18.64%
French                       293.4           388.5     95.1       32.41%
Italian                      190.7           288.9     98.2       51.49%
Dutch                        223.0           288.3     65.3       29.28%
Polish                       179.7           259.0     79.3       44.13%
Portuguese                   178.7           248.3     69.6       38.95%
Spanish                      171.5           228.9     57.4       33.47%
Japanese                     164.6           202.3     37.7       22.90%
Russian                      117.1           153.6     36.5       31.17%
Swedish                      135.5           147.6     12.1        8.93%
Finnish                       96.1           115.0     18.9       19.67%
Norwegian                     86.9           104.5     17.6       20.25%
Chinese                       83.3           102.7     19.4       23.29%
Total (without Eng)         2255.8          2925.4    669.6       29.68%
Total (with Eng)            4655.8          5625.4    969.6       20.83%

* Jan 08, Japanese version was built in November 2007

** Jun & July 08
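For readers who want to check the growth columns, the arithmetic is the difference between the two releases and that difference as a percentage of the older count. The short sketch below recomputes a few rows from the table; the helper name `growth` is ours, and the values are in thousands, as in the table.

```python
def growth(old, new):
    """Return (absolute growth, growth in percent) between two tag counts."""
    diff = new - old
    return round(diff, 1), round(100 * diff / old, 2)


# (DBpedia 3.0, DBpedia 3.1) counts in thousands, copied from the table.
counts = {
    "German": (335.3, 397.8),
    "Italian": (190.7, 288.9),
    "Polish": (179.7, 259.0),
    "Total (without Eng)": (2255.8, 2925.4),
}

for language, (old, new) in counts.items():
    diff, percent = growth(old, new)
    print(f"{language}: +{diff}K ({percent:.2f}%)")
```

Italian and Polish come out at 51.49% and 44.13%, matching the figures quoted for the fastest growing languages.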

Number of non-English tags (values in thousands)

Non-English tags growth

Faviki uses the information about tags from DBpedia datasets. DBpedia extracts structured data from Wikipedia, which is constantly growing. The latest release, DBpedia 3.1, came out recently, marking an increase of 27% over the previous version. The downloads are provided as N-Triples and in CSV format on this page.


We’re excited to announce that Faviki has officially started to use the Zemanta API to suggest possible Wikipedia concepts.

Zemanta is a platform for accelerating on-line content production by recognizing contextual content and instantly serving relevant images, smart links, keywords and text to the user. Recently they launched an early release of the API and allowed web developers to use the Zemanta engine in their applications. The API gives developers a simple RESTful interface for getting suggested images, articles, tags and links in a structured format (XML, JSON, ...) for a given piece of text.

The fact that it suggests Wikipedia links, among others, was particularly interesting to me, so I tested it right away and found that it works very well. Now Faviki users don’t have to start from scratch, because tag suggestions are produced by analyzing the web page’s content. If a suggestion is OK, the user just needs to click on the ‘+’ button and the tag is added.

However, do not expect it to do the entire job for you – you are the one who makes the final decision!

We also started using Zemanta on our blog (it can be deployed on all major content publishing platforms). It’s easy to use and it saves time, so we highly recommend it. Great job, Zemanta!

Many thanks to Andraž Tori for the support.


Having users insert correct tags is essential for the Faviki community, so we’ve implemented new features which will hopefully make tagging easier:

Tag description

You have probably noticed that this feature has existed on Faviki for some time now. Quite a few of our users have complained that it is often hard to figure out what a tag is about and that an additional description is needed. For example, what is the difference between “Color Theory” and “Color theory”? When you hold the mouse cursor over the tag’s name in the bottom right corner, a short explanation will show up.

Faviki tag description example


Recent tags

We have noticed that some of our users frequently save a few bookmarks in a row that are related to the same subject or are follow-ups to the same post. In that case the bookmarks share the same or similar tags. The recent tags are the tags that you entered for your previous bookmarks. Use the arrows to go through your bookmark history and click on the ‘+’ button to add a tag.

Faviki recent tags example

Thanks to all of our users who have given us continuous feedback on the new features in Faviki.


What is it?

You have probably noticed the ‘G’ button on the right-hand side of the field for adding new tags, and of the ‘tags’ field in the search. That is the Google search button that we wrote about on our Help page here. However, we thought that this feature deserved its own post on the blog, because it has helped us find tags many, many times.

How does it work?

With the Google search button, you can search for tags as you would search for Wikipedia pages on Google. For instance, if you type in ‘apple’ and click on the Google button, the system will automatically add ‘wikipedia’, so your query will actually be ‘apple wikipedia’, and the search results will be retrieved from the domain en.wikipedia.org only.
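The query rewriting described here is simple enough to sketch. The snippet below is only an illustration: the function names are hypothetical, and restricting results with a `site:` operator is our assumption, since the post does not say how the domain restriction is passed to the search API.

```python
WIKIPEDIA_DOMAIN = "en.wikipedia.org"


def tag_query(term):
    """Append 'wikipedia' to what the user typed, as described above."""
    return f"{term.strip()} wikipedia"


def restricted_query(term, domain=WIKIPEDIA_DOMAIN):
    """Limit the query to one domain (assumption: via the site: operator)."""
    return f"{tag_query(term)} site:{domain}"


if __name__ == "__main__":
    print(tag_query("apple"))         # apple wikipedia
    print(restricted_query("apple"))  # apple wikipedia site:en.wikipedia.org
```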

Faviki google search api button

Experience has shown us that this way of finding tags can be quite helpful and time-saving. Sometimes it is hard to find the most appropriate tag in the autocomplete list, and Google is pretty clever when it comes to finding the most popular or representative tag for an acronym or ambiguous term, for instance. So the tag that you are looking for is often at the top of the list. To add it, just click on the ‘copy’ link.

Cases in which it beats the autocomplete list

  • Acronyms and their disambiguation:
    • EU = European Union
    • RHCP = Red Hot Chili Peppers
    • CSS = Cascading Style Sheets, Content Scramble System, Cansei de Ser Sexy
    • LCD = Liquid crystal display, Lacida, Lowest common denominator
    • SEO = Search engine optimization, Seasoned equity offering
    • RDF = Resource Description Framework, Robotech Defense Force, Radical Dance Faction
    • REST = Representational State Transfer
  • Ambiguous terms:
    • apple (fruit, digital technology corporation, Fiona Apple, bank…)
    • keyboard (computers, music, magazine…)
    • office (software, place where you work, series, film…)
    • flash (software, superhero, photography, song…)
  • Searching for the right term for the concept:
    • programming = Computer programming
    • baby = Infant
    • tiredness = Fatigue (medical)
    • moonlight sonata = Piano Sonata No. 14 (Beethoven)
    • rachmaninov = Sergei Rachmaninoff (note that in this case the search term does not even match the spelling of the tag)
  • When you know what you mean, but you don’t know or can’t remember what it is called:
    • belarus capital = Minsk
    • eu lead body = European Council
    • kaiser chiefs singer = Ricky Wilson (British musician)
  • If you wish to search for related tags or tags concerning a broad topic:
    • online social (Social network, Social software, Online identity, OpenSocial, Virtual community, Social bookmarking, Social computing)
    • vegetarian (Vegetarianism, Vegetarian cuisine, Vegetarian Society, World Vegetarian Day, Veganism)
    • olympic games (Olympic Games, Summer Olympic Games, Winter Olympic Games, Ancient Olympic Games, Youth Olympic Games)
  • If the tag contains non-English characters, and you don’t want to deal with them:
    • roisin murphy = Róisín Murphy
    • motorhead = Motörhead

Drawbacks

  • It works slightly differently from the autocomplete list, e.g. you have to click on the ‘copy’ link instead of on the tag name (which is a link to a Wikipedia page)
  • The search results list will also contain some Wikipedia pages which are not tags, such as pages whose names start with ‘Special:’, ‘Template:’, ‘User:’, ‘Wikipedia:’, ‘Help:’, ‘User talk:’, ‘Wikipedia talk:’, ‘Category:’. These are special Wikipedia pages and obviously cannot be used as tags, so you cannot add them.

We hope we’ll be able to fix these issues soon.
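One possible way to hide the special pages mentioned above is to drop every result whose title starts with one of the listed prefixes. This is a minimal sketch of that idea, not the fix Faviki has planned; the function names are hypothetical and the prefix list is copied from the second drawback.

```python
# Prefixes of special Wikipedia pages that cannot be used as tags.
NON_TAG_PREFIXES = (
    "Special:", "Template:", "User:", "Wikipedia:",
    "Help:", "User talk:", "Wikipedia talk:", "Category:",
)


def is_taggable(title):
    """Return True when the page title does not start with a special prefix."""
    return not title.startswith(NON_TAG_PREFIXES)


def filter_results(titles):
    """Keep only the search results that could be added as tags."""
    return [title for title in titles if is_taggable(title)]


if __name__ == "__main__":
    results = ["Apple", "Category:Apples", "Apple Inc.", "User talk:Example"]
    print(filter_results(results))  # ['Apple', 'Apple Inc.']
```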

Summary

Inserting correct tags is essential for using Faviki to its full potential. But finding the right tag is sometimes a bit tricky. We hope that the Google search API can make your tagging easier and more accurate.