September 23, 2008
They have to be translated first, so Faviki asks the Google AJAX Language API for help 🙂 A great thing is that you don’t need to specify the original language: the API recognizes it automatically!
Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from the English Wikipedia. Those concepts are finally translated into the user’s language (using DBpedia data about language connections).
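To make the last step more concrete, here is a minimal sketch of a lookup table from English Wikipedia titles to their counterparts in other languages. The tab-separated row format and the `build_title_table` name are assumptions made for illustration only; they are not the actual DBpedia file format or Faviki's code.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_title_table(rows: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Build {language: {english_title: localized_title}}.

    Each row is assumed to look like "english_title<TAB>lang<TAB>localized_title".
    This is a simplified, hypothetical format, not the real DBpedia dump layout.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for row in rows:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        english, lang, localized = row.split("\t")
        table[lang][english] = localized
    return dict(table)


# A few hand-written sample rows, for illustration only.
sample_rows = [
    "Germany\tde\tDeutschland",
    "Germany\tfr\tAllemagne",
    "Spain\tfr\tEspagne",
]

table = build_title_table(sample_rows)
print(table["fr"]["Germany"])  # Allemagne
```

Keeping one inner dictionary per language means a single tag lookup is one `dict.get` call once the user's language is known.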
So, the whole process looks like this (simplified version):
- Faviki fetches a web page and extracts its core text (without HTML and non-relevant content).
- Then it tries to figure out whether the content is in English. If it isn’t, the content is sent to the Google AJAX Language API, which detects the original language automatically, translates the text into English and returns the translation.
- The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to English Wikipedia; the article titles become the semantic tags.
- If the user’s language is not English, the tags must be translated. Using the DBpedia “Links to Wikipedia Article” datasets, we can find the titles of the corresponding Wikipedia articles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
- Finally, the suggested tags are offered to the user.
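The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `suggest_tags` and the three callables are hypothetical stand-ins for the language check, the Google translation call and the Zemanta call, not the real client APIs. Falling back to the English title when no localized title exists is also an assumption.

```python
from typing import Callable, Dict, List


def suggest_tags(
    page_text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    translate_to_english: Callable[[str], str],
    find_wikipedia_titles: Callable[[str], List[str]],
    localized_titles: Dict[str, Dict[str, str]],
) -> List[str]:
    # Translate only when the extracted text is not already in English.
    text = page_text if is_english(page_text) else translate_to_english(page_text)

    # Ask the concept service for English Wikipedia titles (the semantic tags).
    english_tags = find_wikipedia_titles(text)
    if user_lang == "en":
        return english_tags

    # Map each English title to the user's language.
    # Assumption: keep the English title when no mapping is available.
    table = localized_titles.get(user_lang, {})
    return [table.get(tag, tag) for tag in english_tags]


# Stand-in services so the sketch runs without any network access.
def fake_is_english(text: str) -> bool:
    return "the" in text.lower().split()


def fake_translate(text: str) -> str:
    return "Paris is the capital of France"


def fake_find_titles(text: str) -> List[str]:
    return [title for title in ("Paris", "France") if title in text]


titles = {"de": {"France": "Frankreich"}}
tags = suggest_tags(
    "Paris est la capitale de la France", "de",
    fake_is_english, fake_translate, fake_find_titles, titles,
)
print(tags)  # ['Paris', 'Frankreich']
```

Passing the services in as plain functions keeps the flow easy to test, and any one of them can be swapped without touching the others.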
Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.