December 9, 2008
The Zemanta API analyses unstructured documents and texts and returns five types of content objects:
- machine readable static tags
- general categories and custom taxonomies
- named entities with links to objects from major online knowledge databases (Wikipedia, Amazon, IMDB, RottenTomatoes, CrunchBase, …) and to a selected pool of online media and blogs
- pictures from Flickr, CC sources and professional agencies
- articles from selected media sources and blogs
This is the first API that returns disambiguated entities linked to DBpedia, Freebase, MusicBrainz, and Semantic CrunchBase. The data can be returned in RDF, the standard format of the Semantic Web.
Extensive developer documentation is available, including an architecture overview, code samples for the most popular programming languages, a front-end integration SDK, a developer forum and an application gallery.
The API is free for up to 10,000 calls per month; above that, a subscription fee applies.
The Zemanta API adds great value to Faviki by analyzing the text of web pages that users save and suggesting related DBpedia concepts. This makes Faviki users' lives much easier, because they can now add semantic tags with just one click.
The Zemanta API is a powerful technology with a lot of potential. We can't recommend it highly enough. Keep up the good work, Zemanta :)
September 23, 2008
They have to be translated first, so Faviki asks the Google AJAX Language API for help :) A great thing is that you don't need to specify the original language; the API recognizes it automatically!
Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from English Wikipedia, which are finally translated back into the user's language (using DBpedia data about language connections).
So, the whole process looks like this (simplified version):
- Faviki fetches a web page and extracts its core text (without HTML and non-relevant content).
- Then it tries to figure out whether the content is in English. If it isn't, the content is sent to the Google language API, which detects the original language automatically, translates the text into English and returns the translation.
- The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to English Wikipedia; the article titles are used as semantic tags.
- If the user's language is not English, the titles must be translated. Using the DBpedia "Links to Wikipedia Article" datasets, we can find the corresponding Wikipedia titles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
- Finally, suggested tags are offered to a user.
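The steps above can be sketched roughly as follows. This is only an illustration of the flow, not Faviki's actual code: the function `suggest_tags`, its parameters and the stand-in callables are all hypothetical, and no real Google or Zemanta client calls are shown. Keeping the English title when no translation exists is also an assumption made for the sketch.

```python
from typing import Callable, Dict, List


def suggest_tags(
    text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    translate_to_english: Callable[[str], str],
    find_wikipedia_titles: Callable[[str], List[str]],
    title_translations: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return suggested tags for already-extracted core text.

    All callables are hypothetical stand-ins for the external services:
    language detection/translation and concept extraction.
    """
    # Translate to English first if the content is in another language.
    if not is_english(text):
        text = translate_to_english(text)

    # Ask the analysis service for English Wikipedia titles.
    english_titles = find_wikipedia_titles(text)

    # English-speaking users get the English titles directly.
    if user_lang == "en":
        return english_titles

    # Otherwise map each title through the language-link table.
    # Assumption: fall back to the English title if no link exists.
    localized = []
    for title in english_titles:
        translated = title_translations.get(title, {}).get(user_lang)
        localized.append(translated if translated else title)
    return localized


# Usage with stubbed services and a tiny, made-up language-link table.
tags = suggest_tags(
    "Berlin ist die Hauptstadt von Deutschland.",
    user_lang="de",
    is_english=lambda t: False,
    translate_to_english=lambda t: "Berlin is the capital of Germany.",
    find_wikipedia_titles=lambda t: ["Berlin", "Germany"],
    title_translations={"Germany": {"de": "Deutschland"}},
)
print(tags)  # ['Berlin', 'Deutschland']
```

Passing the services in as callables keeps the sketch runnable without network access and makes each step easy to swap out.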
Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.
September 19, 2008
Faviki is periodically synchronized with Wikipedia and now contains a little less than a million new tags: around 300,000 new English tags and 669,600 new tags in other languages! That means there are currently 5.6 million tags in Faviki: 2.7 million English tags and 2.9 million tags from 13 other languages.
Since the September release and the multi-language tagging feature, you can tag in 14 different languages, and now there are 30% more non-English tags. After English, the largest languages are German (397.8K) and French (388.5K). The fastest growing languages are Italian (51.5% growth) and Polish (44.1%).
Wikipedia/DBpedia growth (values in thousands)
| Language | DBpedia 3.0* | DBpedia 3.1** | Growth | Growth (%) |
| Total (without English) | 2255.8 | 2925.4 | 669.6 | 29.68% |
| Total (with English) | 4655.8 | 5625.4 | 969.6 | 20.83% |
* January 2008; the Japanese version was built in November 2007
** June and July 2008
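The totals in the table can be checked with a few lines of arithmetic. The sketch below only reuses the four totals given above (in thousands); the helper name `growth` is made up for the example.

```python
def growth(old: float, new: float) -> tuple:
    """Return (absolute growth, growth in percent), rounded like the table."""
    diff = new - old
    return round(diff, 1), round(100 * diff / old, 2)


# Totals in thousands of tags, taken from the table.
non_english = growth(2255.8, 2925.4)
with_english = growth(4655.8, 5625.4)

# English-only counts follow by subtraction.
english_old = round(4655.8 - 2255.8, 1)
english_new = round(5625.4 - 2925.4, 1)

print(non_english)               # (669.6, 29.68)
print(with_english)              # (969.6, 20.83)
print(english_old, english_new)  # 2400.0 2700.0
```

The English-only difference (2,700K minus 2,400K) matches the roughly 300,000 new English tags mentioned in the text.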
Number of non-English tags (values in thousands)
Non-English tags growth
Faviki takes its tag information from DBpedia datasets. DBpedia extracts structured data from Wikipedia, which is constantly growing. The latest release, DBpedia 3.1, came out recently and marks an increase of 27% over the previous version. The downloads are provided in N-Triples and CSV formats on the DBpedia website.
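For readers who have not seen N-Triples before: each line holds one subject, predicate and object, terminated by a dot. The sketch below parses only the simplest case, where all three terms are IRIs in angle brackets; literals and blank nodes are deliberately not handled. The sample line is invented for illustration (the `example.org` predicate in particular is a placeholder) and is not copied from a DBpedia dump.

```python
import re

# Matches a line of the form: <subject> <predicate> <object> .
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_iri_triple(line: str):
    """Return (subject, predicate, object), or None if the line does not
    consist of exactly three IRIs followed by a dot."""
    match = IRI_TRIPLE.match(line.strip())
    return match.groups() if match else None


# Invented sample line; the predicate IRI is a placeholder.
sample = (
    "<http://dbpedia.org/resource/Berlin> "
    "<http://example.org/wikipage-de> "
    "<http://de.wikipedia.org/wiki/Berlin> ."
)
print(parse_iri_triple(sample))
print(parse_iri_triple("# a comment line"))  # None
```

For anything beyond a quick look at the data, a proper RDF parsing library is a safer choice than a regular expression.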