December 9, 2008
The Zemanta API analyses unstructured documents and texts and returns five types of content objects:
- machine-readable static tags
- general categories and custom taxonomies
- named entities with links to objects from major online knowledge databases (Wikipedia, Amazon, IMDB, RottenTomatoes, CrunchBase and others) and to a selected pool of online media and blogs
- pictures from Flickr, CC sources and professional agencies
- articles from selected media sources and blogs
This is the first API that returns disambiguated entities linked to DBpedia, Freebase, MusicBrainz, and Semantic Crunchbase. The data can be returned in RDF, the standard format of the Semantic Web.
Extensive developer documentation is available, including an architecture overview, code samples for the most popular programming languages, a front-end integration SDK, a developer forum and an application gallery.
The API is free to use for up to 10,000 calls per month; above that, a subscription fee applies.
The Zemanta API adds great value to Faviki by analyzing the text of the web pages that users save and suggesting related DBpedia concepts. This makes Faviki users' lives much easier, because they can now add semantic tags with just one click.
The Zemanta API is a powerful technology with lots of potential. We can't recommend it highly enough. Keep up the good work, Zemanta :)
September 23, 2008
They have to be translated first, so Faviki asks the Google AJAX Language API for help :) A great thing is that you don't need to specify the original language, because the API recognizes it automatically!
Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from the English Wikipedia, which are finally translated back into the user's language (using DBpedia data about language connections).
So, the whole process looks like this (simplified version):
- Faviki fetches a web page and extracts the core text (without HTML and non-relevant content).
- Then it tries to figure out whether the content is in English. If it isn't, it is sent to the Google Language API, which detects the original language automatically, translates the text into English and returns the translation.
- The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to the English Wikipedia; the article titles are used as semantic tags.
- If the user's language is not English, the tags must be translated. Using the DBpedia dataset "Links to Wikipedia Article", we can find the corresponding Wikipedia titles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles from Wikipedia in other languages.
- Finally, the suggested tags are offered to the user.
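The steps above can be sketched roughly as follows. This is only an illustration, not Faviki's actual code: the function names (`suggest_tags`, `localize_tags`), the callables passed in for language detection, translation and tag extraction, and the tiny sample table of interlanguage links are all hypothetical stand-ins for the real services. Falling back to the English title when no translation exists is also an assumption.

```python
from typing import Callable, Dict, List

# Illustrative stand-in for the DBpedia interlanguage-link data:
# English Wikipedia title -> {language code: title in that language}.
SAMPLE_LINKS: Dict[str, Dict[str, str]] = {
    "Semantic Web": {"fr": "Web sémantique"},
}


def localize_tags(tags: List[str], user_lang: str,
                  links: Dict[str, Dict[str, str]]) -> List[str]:
    """Map English Wikipedia titles to titles in the user's language.

    Assumption: when no link exists, the English title is kept.
    """
    if user_lang == "en":
        return list(tags)
    return [links.get(tag, {}).get(user_lang, tag) for tag in tags]


def suggest_tags(core_text: str,
                 user_lang: str,
                 is_english: Callable[[str], bool],
                 translate_to_english: Callable[[str], str],
                 find_wikipedia_titles: Callable[[str], List[str]],
                 links: Dict[str, Dict[str, str]]) -> List[str]:
    """Run the simplified pipeline on already-extracted core text."""
    # Translate only when the page is not in English.
    if not is_english(core_text):
        core_text = translate_to_english(core_text)
    # English Wikipedia titles serve as the semantic tags.
    english_tags = find_wikipedia_titles(core_text)
    # Convert the tags into the user's language where possible.
    return localize_tags(english_tags, user_lang, links)
```

Passing the three services in as plain functions keeps the sketch runnable without network access; in a real setup each would wrap a call to the corresponding external API.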
Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.
August 15, 2008
Zemanta is a platform for accelerating online content production by recognizing contextual content and instantly serving relevant images, smart links, keywords and text to the user. Recently they launched an early release of the API and allowed web developers to use the Zemanta engine in their applications. The API gives developers a simple RESTful interface for getting suggested images, articles, tags and links in a structured format (XML, JSON and others) for a given piece of text.
The fact that it suggests Wikipedia links, among others, was particularly interesting to me, so I tested it right away and found that it works very well. Now Faviki users don't have to start from scratch, because tag suggestions are generated by analyzing the content of the web pages. If a suggestion is OK, the user just needs to click on the '+' button and the tag is added.
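To give a feel for what calling such a RESTful interface might look like, here is a minimal sketch that only builds the HTTP request and does not send it. The endpoint URL is a placeholder, and the parameter names (`api_key`, `text`, `format`) are assumptions made for illustration; check the Zemanta developer documentation for the real ones.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, NOT the real Zemanta URL.
ENDPOINT = "https://api.example.com/suggest"


def build_suggest_request(text: str, api_key: str,
                          response_format: str = "json") -> Request:
    """Build (but do not send) a form-encoded POST request.

    The parameter names below are hypothetical.
    """
    params = {
        "api_key": api_key,         # assumed name for the developer key
        "text": text,               # the piece of text to analyze
        "format": response_format,  # assumed name, e.g. "json" or "xml"
    }
    body = urlencode(params).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

The request object could then be passed to `urllib.request.urlopen` once the real endpoint and parameters are filled in.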
However, do not expect it to do the entire job for you – you are the one who makes the final decision!
We also started using Zemanta on our blog (it can be deployed on all major content publishing platforms). It’s easy to use and it saves time, so we highly recommend it. Great job, Zemanta!
Many thanks to Andraž Tori for the support.