[Image: Google Code. Image by Thomas Hawk via Flickr]

Faviki is a featured project on Google Code for its creative use of the Google AJAX Language API!

This API lets you detect the language of blocks of text and translate them. Despite having the word “AJAX” in its name, the API can also be accessed from non-JavaScript environments.

What is it all about? As we wrote recently, Faviki uses the Zemanta API to suggest tags automatically. That’s OK for English pages, but what about other languages?

They have to be translated first, so Faviki asks the Google AJAX Language API for help :) A great thing is that you don’t need to specify the original language; the API recognizes it automatically!

Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from English Wikipedia, which are finally translated into the user’s language (using DBpedia data about language connections).

So, the whole process looks like this (simplified version):

  1. Faviki fetches a web page and extracts its core text (without HTML and non-relevant content).
  2. Then it checks whether the content is in English. If it isn’t, the content is sent to the Google AJAX Language API, which detects the original language automatically, translates the content into English, and returns the translation.
  3. The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to English Wikipedia; the article titles are used as semantic tags.
  4. If the user’s language is not English, the tags must be translated. Using the DBpedia datasets “Links to Wikipedia Article”, we can find the corresponding Wikipedia titles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and the matching articles from Wikipedia in other languages.
  5. Finally, the suggested tags are offered to the user.
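To make the flow of steps 2 to 5 more concrete, here is a minimal Python sketch. It is not Faviki’s actual code: the function name `suggest_tags`, its parameters, and the shape of the `language_links` dictionary are all hypothetical. The three services are passed in as plain callables, so the sketch does not depend on any real API signature, and falling back to the English title when no language link exists is our assumption.

```python
from typing import Callable, Dict, List


def suggest_tags(
    text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    translate_to_english: Callable[[str], str],
    find_wikipedia_titles: Callable[[str], List[str]],
    language_links: Dict[str, Dict[str, str]],
) -> List[str]:
    """Suggest tags for page text that has already been extracted (step 1).

    All callables are hypothetical stand-ins:
    - is_english / translate_to_english play the role of the language API,
    - find_wikipedia_titles plays the role of the Zemanta analysis,
    - language_links maps an English title to {language code: local title},
      standing in for the DBpedia language-link data.
    """
    # Step 2: translate only when the content is not English.
    if not is_english(text):
        text = translate_to_english(text)

    # Step 3: English Wikipedia titles are used as semantic tags.
    titles = find_wikipedia_titles(text)

    # Step 4: map each title to the user's language where a link exists.
    # Assumption: keep the English title when no link is available.
    if user_lang != "en":
        titles = [language_links.get(t, {}).get(user_lang, t) for t in titles]

    # Step 5: these are the tags offered to the user.
    return titles
```

Passing the services in as arguments keeps the sketch easy to try with fake data, for example a lambda that always returns the same list of titles.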

Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.


What is it?

You have probably noticed the ‘G’ button on the right-hand side of the field for adding new tags, and of the ‘tags’ field in the search. That is the Google search button that we wrote about on our Help page here. However, we thought this feature deserved its own post on the blog, because it has helped us find tags many, many times.

How does it work?

With the Google search button, you can search for tags as you would search for Wikipedia pages on Google. For instance, if you type in ‘apple’ and click on the Google button, the system will automatically add ‘wikipedia’, so your query will actually be ‘apple wikipedia’, and search results will be retrieved from the domain en.wikipedia.org only.
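The query rewriting described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names `build_tag_query` and `keep_wikipedia_results` are hypothetical, and the domain restriction is shown as a filter over result URLs, which is not necessarily how the restriction is actually passed to Google.

```python
from typing import Iterable, List
from urllib.parse import urlparse

WIKIPEDIA_HOST = "en.wikipedia.org"


def build_tag_query(term: str) -> str:
    """Append 'wikipedia' to the typed term, e.g. 'apple' -> 'apple wikipedia'."""
    return f"{term.strip()} wikipedia"


def keep_wikipedia_results(urls: Iterable[str]) -> List[str]:
    """Keep only result URLs whose host is en.wikipedia.org."""
    return [u for u in urls if urlparse(u).hostname == WIKIPEDIA_HOST]
```

For the ‘apple’ example, `build_tag_query("apple")` gives the query ‘apple wikipedia’, and any result from another domain is dropped by `keep_wikipedia_results`.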

[Image: the Faviki Google search button]

Experience has shown us that this way of finding tags can be quite helpful and time-saving. Sometimes it is hard to find the most appropriate tag with the autocomplete list, and Google is pretty clever when it comes to finding the most popular or representative tag for an acronym or ambiguous term, for instance. So the tag you are looking for is often at the top of the list. To add it, just click on the ‘copy’ link.

Cases in which it beats the autocomplete list

  • Acronyms and their disambiguation:
    • EU = European Union
    • RHCP = Red Hot Chili Peppers
    • CSS = Cascading Style Sheets, Content Scramble System, Cansei de Ser Sexy
    • LCD = Liquid crystal display, Lacida, Lowest common denominator
    • SEO = Search engine optimization, Seasoned equity offering
    • RDF = Resource Description Framework, Robotech Defense Force, Radical Dance Faction
    • REST = Representational State Transfer
  • Ambiguous terms:
    • apple (fruit, digital technology corporation, Fiona Apple, bank…)
    • keyboard (computers, music, magazine…)
    • office (software, place where you work, series, film…)
    • flash (software, superhero, photography, song…)
  • Searching for the right term for the concept:
    • programming = Computer programming
    • baby = Infant
    • tiredness = Fatigue (medical)
    • moonlight sonata = Piano Sonata No. 14 (Beethoven)
    • rachmaninov = Sergei Rachmaninoff (note that in this case the term is not even spelled correctly)
  • When you know what you mean, but you don’t know or can’t remember what it is called:
    • belarus capital = Minsk
    • eu lead body = European Council
    • kaiser chiefs singer = Ricky Wilson (British musician)
  • If you wish to search for related tags or tags concerning a broad topic:
    • online social (Social network, Social software, Online identity, OpenSocial, Virtual community, Social bookmarking, Social computing)
    • vegetarian (Vegetarianism, Vegetarian cuisine, Vegetarian Society, World Vegetarian Day, Veganism)
    • olympic games (Olympic Games, Summer Olympic Games, Winter Olympic Games, Ancient Olympic Games, Youth Olympic Games)
  • If the tag contains non-English characters, and you don’t want to deal with them:
    • roisin murphy = Róisín Murphy
    • motorhead = Motörhead

Drawbacks

  • It works slightly differently from the autocomplete list, e.g. you have to click on the ‘copy’ link instead of on the tag name (which is a link to a Wikipedia page).
  • The search results list will also contain some Wikipedia pages which are not tags, like pages whose names start with ‘Special:’, ‘Template:’, ‘User:’, ‘Wikipedia:’, ‘Help:’, ‘User talk:’, ‘Wikipedia talk:’, ‘Category:’. These are special Wikipedia pages and obviously cannot be used as tags, so you cannot add them.

We hope we’ll be able to fix these issues soon.
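One possible way to handle the second drawback would be to drop the special pages before the list is shown. The Python sketch below is only an illustration, not a description of how Faviki works or will work: the names `NON_TAG_PREFIXES`, `is_taggable` and `filter_taggable` are hypothetical, and the prefix list is simply the one given above.

```python
from typing import Iterable, List

# Prefixes of special Wikipedia pages that cannot be used as tags
# (taken from the list in the Drawbacks section).
NON_TAG_PREFIXES = (
    "Special:",
    "Template:",
    "User:",
    "Wikipedia:",
    "Help:",
    "User talk:",
    "Wikipedia talk:",
    "Category:",
)


def is_taggable(title: str) -> bool:
    """Return True when the page title does not start with a special prefix."""
    # str.startswith accepts a tuple and checks every prefix in it.
    return not title.startswith(NON_TAG_PREFIXES)


def filter_taggable(titles: Iterable[str]) -> List[str]:
    """Keep only the titles that could be offered as tags."""
    return [t for t in titles if is_taggable(t)]
```

Because only the start of the title is checked, ordinary articles that merely contain a colon are left alone.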

Summary

Entering the correct tags is essential for using Faviki to its full potential. But finding the right tag is sometimes a bit tricky. We hope that the Google search button can make your tagging easier and more accurate.
