September 23, 2008
Image by Thomas Hawk via Flickr
They have to be translated first, so Faviki asks the Google AJAX Language API for help. A great thing is that you don’t need to specify the original language: it is recognized automatically!
Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from English Wikipedia, which are finally translated back into the user’s language (using DBpedia data about language connections).
So, the whole process looks like this (simplified version):
- Faviki fetches a web page and extracts its core text (without HTML and irrelevant content).
- Then it tries to figure out whether the content is in English. If it isn’t, the content is sent to the Google Language API, which detects the original language automatically, translates the text into English and returns the translation.
- The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to English Wikipedia; the article titles are used as semantic tags.
- If the user’s language is not English, the tags must be translated. Using the DBpedia “Links to Wikipedia Article” datasets, we can find the corresponding Wikipedia titles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
- Finally, the suggested tags are offered to the user.
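The steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not Faviki's actual code: the function `suggest_tags`, its parameters and the stand-in callables for language detection, the Google translation call and the Zemanta analysis are all hypothetical names, and dropping tags that have no counterpart in the user's language is an assumption.

```python
from typing import Callable, Dict, List


def suggest_tags(
    core_text: str,
    user_lang: str,
    is_english: Callable[[str], bool],             # hypothetical language check
    translate_to_english: Callable[[str], str],    # stand-in for the translation service
    find_wikipedia_titles: Callable[[str], List[str]],  # stand-in for the concept finder
    language_links: Dict[str, Dict[str, str]],     # English title -> {lang: local title}
) -> List[str]:
    """Return suggested tags for already extracted page text."""
    # Step 2: translate only when the content is not English.
    english_text = core_text if is_english(core_text) else translate_to_english(core_text)

    # Step 3: English Wikipedia titles act as semantic tags.
    english_titles = find_wikipedia_titles(english_text)

    # Step 4: map titles into the user's language when needed.
    if user_lang == "en":
        return english_titles

    # Assumption: titles without a known counterpart are skipped.
    return [
        language_links[title][user_lang]
        for title in english_titles
        if user_lang in language_links.get(title, {})
    ]


# Usage with stub services (all values are made up for illustration).
links = {"Semantic Web": {"de": "Semantisches Web"}}
tags = suggest_tags(
    "Das semantische Web ...",
    "de",
    is_english=lambda text: False,
    translate_to_english=lambda text: "The semantic web ...",
    find_wikipedia_titles=lambda text: ["Semantic Web", "Unlinked Article"],
    language_links=links,
)
print(tags)
```

Passing the three services in as callables keeps the sketch independent of any particular API client, so each stage can be swapped or tested on its own.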
Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.
September 19, 2008
Faviki is periodically synchronized with Wikipedia and now contains a little less than a million new tags: around 300,000 new English tags and 669,600 new tags in other languages! That means there are currently 5.6 million tags in Faviki: 2.7 million English tags and 2.9 million tags from the other 13 languages.
Since the September release and its multi-language tagging feature, you can tag in 14 different languages, and now there are 30% more non-English tags. After English, the largest languages are German (397.8K) and French (388.5K). The fastest-growing languages are Italian (51.5% growth) and Polish (44.1%).
Wikipedia/DBpedia growth (values in thousands)
|Language||DBpedia 3.0*||DBpedia 3.1**||growth||growth (%)|
|Total (without Eng)||2255.8||2925.4||669.6||29.68%|
|Total (with Eng)||4655.8||5625.4||969.6||20.83%|
* January 2008; the Japanese version was built in November 2007
** June and July 2008
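The growth columns can be rechecked from the two totals rows. The short sketch below does that arithmetic; the English-only figures are not listed in the table, so they are derived here by subtracting the non-English total from the overall total. The helper name `growth` is just for illustration.

```python
def growth(old: float, new: float) -> tuple:
    """Return (absolute growth, growth in percent) for two tag counts in thousands."""
    delta = new - old
    return round(delta, 1), round(100 * delta / old, 2)


# Totals from the table (values in thousands).
non_english = growth(2255.8, 2925.4)
all_languages = growth(4655.8, 5625.4)

# English is derived: total with English minus total without English.
english = growth(4655.8 - 2255.8, 5625.4 - 2925.4)

print("non-English:", non_english)      # (669.6, 29.68)
print("all languages:", all_languages)  # (969.6, 20.83)
print("English:", english)              # (300.0, 12.5)
```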
[Chart: Number of non-English tags (values in thousands)]
[Chart: Non-English tags growth]
Faviki takes its tag information from the DBpedia datasets. DBpedia extracts structured data from Wikipedia, which is constantly growing. The latest release, DBpedia 3.1, came out recently and marks an increase of 27% over the previous version. The downloads are provided in N-Triples and CSV formats on this page.
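To give an idea of how language links could be read from an N-Triples download, here is a minimal sketch. It assumes each line links an English DBpedia resource to a Wikipedia article URL in another language; the predicate URI and the sample triples are made up for illustration, and the real datasets may be laid out differently. The function names are hypothetical as well.

```python
import re
from urllib.parse import unquote, urlparse

# One N-Triples statement whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")


def title_from_uri(uri: str) -> str:
    """Turn the last path segment of a URI into a readable title."""
    return unquote(uri.rsplit("/", 1)[-1]).replace("_", " ")


def parse_language_links(lines):
    """Build {English title: {language code: local title}} from N-Triples lines."""
    links = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines and non-URI objects
        subject, _predicate, obj = match.groups()
        # Assumption: the language code is the first label of the host name,
        # e.g. "de" in "de.wikipedia.org".
        lang = urlparse(obj).netloc.split(".")[0]
        links.setdefault(title_from_uri(subject), {})[lang] = title_from_uri(obj)
    return links


# Illustrative sample data, not copied from a real dataset.
sample = [
    "<http://dbpedia.org/resource/Semantic_Web> <http://example.org/wikipage-de> <http://de.wikipedia.org/wiki/Semantisches_Web> .",
    "<http://dbpedia.org/resource/Munich> <http://example.org/wikipage-de> <http://de.wikipedia.org/wiki/M%C3%BCnchen> .",
    "# a comment line that is ignored",
]
language_links = parse_language_links(sample)
print(language_links)
```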