Hypernotation.org

Hypernotation is a new method of publishing structured data on the Web that results in browsable, atomic data with machine-readable and hackable URLs.

To see what it looks like in practice, check out the DBpedia dataset published using Hypernotation. Before you start browsing the data, it’s a good idea to read through the examples.

more on Hypernotation.org

Today our friends and partners at Zemanta launched a public semantic API, as well as a front side SDK.

Zemanta API analyses unstructured documents/texts and returns five types of content objects:

  • machine readable static tags
  • general categories and custom taxonomies
  • named entities with links to objects from major online knowledge databases (Wikipedia, Amazon, IMDB, RottenTomatoes, CrunchBase,…) and to a selected pool of online media and blogs
  • pictures from Flickr, CC sources and professional agencies
  • articles from selected media sources and blogs

This is the first API that returns disambiguated entities linked to DBpedia, Freebase, MusicBrainz, and Semantic CrunchBase. The data can be returned in RDF, the standard format of the Semantic Web.

Extensive developer documentation is available, including an architecture overview, code samples for the most popular programming languages, a front-side integration SDK, a developer forum and an application gallery.

The API is free for up to 10,000 calls per month; above that, a subscription fee applies.

The Zemanta API adds great value to Faviki by analyzing the text of web pages that users save and suggesting related DBpedia concepts. This makes Faviki users’ lives much easier, because now they can add semantic tags with just one click.

Zemanta API is a powerful technology that has lots of potential. We can’t recommend it highly enough. Keep up the good work Zemanta :)

Mashable has announced its 2nd Annual Open Web Awards, an international online voting competition that covers major innovations in web technology.

Sites and companies are nominated by the community in 26 different categories. The category we’re competing in is social bookmarking.

You can vote for Faviki here by entering your e-mail address and confirming it in the e-mail you’ll receive after voting.

You can also vote for our partner Zemanta here (blog plugins category).

Note that you may nominate a site/company in as many categories as you see fit. However, there is only one nomination per category per e-mail address.

Big thanks for your support! :-)

Faviki is a featured project on Google Code for its creative use of the Google AJAX Language API!

This API lets you translate blocks of text and detect their language. Despite having the word “AJAX” in its name, the API can also be accessed from non-JavaScript environments.

What is it all about? As we wrote recently, Faviki uses the Zemanta API to auto-suggest tags. That’s OK for English pages, but what about other languages?

They have to be translated first, so Faviki asks the Google AJAX Language API for help :) A great thing is that you don’t need to specify the original language; the API recognizes it automatically!

Automatic translations made this way are not perfect, but they seem to be good enough for Zemanta to find appropriate concepts from English Wikipedia, which are finally translated back into the user’s language (using DBpedia data about language connections).

So, the whole process looks like this (simplified version):

  1. Faviki fetches a web page and extracts its core text (without HTML and irrelevant content).
  2. It then checks whether the content is in English. If it isn’t, the content is sent to the Google AJAX Language API, which detects the original language automatically, translates the text into English and returns the translation.
  3. The content is then sent to the Zemanta API, which analyzes it and finds relevant links. Faviki uses the links to English Wikipedia; their titles become semantic tags.
  4. If the user’s language is not English, the tags must be translated. Using the DBpedia “Links to Wikipedia Article” datasets, we can find the corresponding Wikipedia titles in one of 13 languages. These datasets contain the connections between English Wikipedia articles and articles in other language editions of Wikipedia.
  5. Finally, the suggested tags are offered to the user.
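
The steps above can be sketched in Python roughly as follows. This is only an illustration of the control flow, not Faviki's actual code. The function `suggest_tags` and all four service callables are hypothetical names, and the `fake_*` functions are stand-ins for the real language-detection, translation, analysis and title-lookup services.

```python
from typing import Callable, List


def suggest_tags(
    core_text: str,
    user_lang: str,
    is_english: Callable[[str], bool],
    to_english: Callable[[str], str],
    find_titles: Callable[[str], List[str]],
    translate_title: Callable[[str, str], str],
) -> List[str]:
    """Steps 2-5: turn extracted page text into tags in the user's language."""
    # Step 2: non-English content is translated into English first.
    text = core_text if is_english(core_text) else to_english(core_text)
    # Step 3: the analysis service returns English Wikipedia titles.
    titles = find_titles(text)
    # Step 4: titles are translated only for non-English users.
    if user_lang == "en":
        return titles
    return [translate_title(title, user_lang) for title in titles]


# Hypothetical stand-ins so the sketch runs without any network access.
def fake_is_english(text: str) -> bool:
    return " the " in f" {text.lower()} "


def fake_to_english(text: str) -> str:
    return "the nobel prize is awarded every year"


def fake_find_titles(text: str) -> List[str]:
    return ["Nobel Prize"] if "nobel prize" in text.lower() else []


def fake_translate_title(title: str, lang: str) -> str:
    return f"{title} [{lang}]"


services = (fake_is_english, fake_to_english, fake_find_titles, fake_translate_title)
german_tags = suggest_tags("Der Nobelpreis wird jedes Jahr verliehen", "de", *services)
english_tags = suggest_tags("The Nobel Prize is awarded every year", "en", *services)
```

Passing the services in as callables keeps the pipeline itself free of any assumptions about how the external APIs are called.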

Faviki combines three services to make multilingual semantic tags possible. We hope this will help our non-English-speaking users tag their bookmarks faster and more easily. These great services will keep improving over time, so expect the suggested tags to get better, too.

A million new tags in Faviki

September 19, 2008

Faviki is periodically synchronized with Wikipedia and now contains a little less than a million new tags: around 300,000 new English tags and 669,600 new tags in other languages! That means there are currently 5.6 million tags in Faviki: 2.7 million English tags and 2.9 million tags from 13 other languages.

Since the September release and the multi-language tagging feature, you can tag in 14 different languages, and now there are 30% more non-English tags. After English, the largest languages are German (397.8K) and French (388.5K). The fastest growing languages are Italian (51.5% growth) and Polish (44.1%).

Wikipedia/DBpedia growth (values in thousands)

Language              DBpedia 3.0*  DBpedia 3.1**   Growth  Growth (%)
English                     2400.0         2700.0    300.0      12.50%
German                       335.3          397.8     62.5      18.64%
French                       293.4          388.5     95.1      32.41%
Italian                      190.7          288.9     98.2      51.49%
Dutch                        223.0          288.3     65.3      29.28%
Polish                       179.7          259.0     79.3      44.13%
Portuguese                   178.7          248.3     69.6      38.95%
Spanish                      171.5          228.9     57.4      33.47%
Japanese                     164.6          202.3     37.7      22.90%
Russian                      117.1          153.6     36.5      31.17%
Swedish                      135.5          147.6     12.1       8.93%
Finnish                       96.1          115.0     18.9      19.67%
Norwegian                     86.9          104.5     17.6      20.25%
Chinese                       83.3          102.7     19.4      23.29%
Total (without Eng)         2255.8         2925.4    669.6      29.68%
Total (with Eng)            4655.8         5625.4    969.6      20.83%

* Jan 08, Japanese version was built in November 2007

** Jun & July 08
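
The two growth columns follow directly from the release figures: growth is the new value minus the old one, and the percentage is that difference divided by the old value. A small sketch of the arithmetic, using three rows from the table (the function name `growth` is ours, chosen for illustration):

```python
from typing import Tuple


def growth(old: float, new: float) -> Tuple[float, float]:
    """Return (absolute growth in thousands, growth in percent)."""
    absolute = round(new - old, 1)
    percent = round(absolute / old * 100, 2)
    return absolute, percent


# Values in thousands of tags: (DBpedia 3.0, DBpedia 3.1).
rows = {
    "German": (335.3, 397.8),
    "Italian": (190.7, 288.9),
    "Polish": (179.7, 259.0),
}

for language, (old, new) in rows.items():
    absolute, percent = growth(old, new)
    print(f"{language}: +{absolute}K ({percent:.2f}%)")
```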

[Chart: Number of non-English tags (values in thousands)]

[Chart: Non-English tags growth]

Faviki takes its tag information from the DBpedia datasets. DBpedia extracts structured data from Wikipedia, which is constantly growing. The latest release, DBpedia 3.1, came out recently, marking an increase of 27% over the previous version. The downloads are provided in N-Triples and CSV formats on this page.

Faviki is proud to announce the first major upgrade of our service. There are several new features/improvements:

Semantic tagging in 14 languages

Faviki is the first social bookmarking service to offer semantic tagging in various languages! Now users can tag in their own language the same way they have tagged in English.

This is possible thanks to DBpedia, the project that generates datasets describing the connections between concepts in English Wikipedia and the Wikipedias in 13 other languages.

Take, for instance, the Wikipedia page about ‘Nobel prize’. In the German Wikipedia, this page has the title ‘Nobelpreise’ and in the Russian Wikipedia, ‘Нобелевская премия’. These page titles act as translated words in a dictionary. If a tag has no translation in a particular language, the English term is used instead.

The system keeps connecting all web pages with English Wikipedia terms and is able to translate them thanks to the DBpedia datasets. For instance, a Japanese user who has been tagging in English can now translate his tags into Japanese with one click. And his friend from France will see the same tags in French.
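
The dictionary idea can be sketched as a simple lookup with an English fallback. This is only an illustration, assuming the DBpedia language links have already been loaded into memory. The names `LABELS` and `translate_tag` are hypothetical, and the two translations are the ones quoted above.

```python
from typing import Dict

# English Wikipedia title -> {language code: title in that language's Wikipedia}.
LABELS: Dict[str, Dict[str, str]] = {
    "Nobel Prize": {"de": "Nobelpreise", "ru": "Нобелевская премия"},
}


def translate_tag(english_title: str, lang: str) -> str:
    """Return the title in `lang`, or the English title if none is known."""
    return LABELS.get(english_title, {}).get(lang, english_title)


print(translate_tag("Nobel Prize", "de"))  # a known translation
print(translate_tag("Nobel Prize", "fi"))  # no Finnish entry here: English is kept
```

Because every bookmark stays linked to the English term, switching the display language is only a lookup; nothing stored has to change.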

Besides English, the included languages are German, Spanish, Portuguese, French, Italian, Japanese, Chinese, Dutch, Norwegian, Polish, Russian, Swedish and Finnish.

Private bookmarks

Private bookmarks in Faviki are bookmarks that only you can see.

Tags you’ve added to them are also invisible to others. However, they are used as suggestions when another user saves the same bookmark, and they are counted when showing the number of people who saved the link. In both cases, your anonymity is preserved.

To make a bookmark private, just check the ‘private’ checkbox in the ‘Edit more’ part of the bookmarklet window. Your private bookmarks will have a small lock icon.

Enhanced UI

The user interface has been simplified and (hopefully) improved. Tag clouds have been replaced with tag lists that show tag frequencies visually and can be sorted by name or by count. There is also additional information about related tags.

Thanks to all of our users who have given us the feedback regarding the new features on Faviki.

Nova Spivack, the founder of Twine, gave an interesting presentation about the future of the Web at the Next Web conference in Amsterdam. He thinks that the Internet is currently evolving in a direction in which tags are becoming increasingly significant. He predicts that over the next 10 to 15 years tags will play an increasingly important part while keywords will gradually disappear.

An interesting discussion on the subject took place on TechCrunch when Erick Schonfeld posted this thread asking the question “Is Keyword Search About To Hit Its Breaking Point?“. 97 comments have been posted so far, and one of them especially caught my attention:

John Clarke Mills

Tags are nothing new, that is for sure. But what if you could tag an object, or entity, with another object. So instead of tagging objects with strings, which falls back on a simple full-text search, you could tag something with an actual representation?

I think that John has really nailed the point.

The problem with both keywords and tags is that they are just words. But what would happen if, instead of words, we used objects? What if we used unique concepts that would always and everywhere have the same name and would refer to the specific object?

Wikipedia & DBpedia

How can we reach an agreement on the names of such a large number of concepts? Well, it’s already been done and can be found in the largest collection of concepts in the world – Wikipedia. Wikipedia, besides having a standardized way of displaying articles, also has a standardized way of naming titles, which have been created and are constantly perfected by social consensus.

Currently there are over 2.36 million English-language articles on Wikipedia. The titles of Wikipedia articles are unique and cover almost all the concepts we can imagine.

However, the “problem” with Wikipedia is that it is not made for machines, but for humans. Its search capabilities are limited to full-text search, which only allows very limited access to this valuable knowledge-base.

Fortunately, there is DBpedia, a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia.org project uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web.

For example, the web page about Semantic Web on Wikipedia looks like this, while on DBpedia it looks like this (there is also an alternative that is easier to read by humans).

In practice, this means that from the name of a tag we can learn more useful information about that tag, its properties and its connections to other tags. That is why I believe that DBpedia web pages are good representatives of the “objects” that tags will refer to.

Characteristics of new tags

Unique name

Unlike classic tags, which are just words, new tags are references to unique concepts that have their own URL. For example, the tag “Coca-Cola” refers to the URL http://dbpedia.org/resource/Coca-Cola (actually, the name of the tag is just the last part of the URL).

So, instead of having different tags for the same concept, which is the case with classic tags (cocacola, coca-cola, coca+cola, CocaCola), there will be just one unique “Coca-Cola” tag.
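
The relationship between a tag name and its URL can be sketched as a pair of small helper functions. This is a minimal sketch, not Faviki's implementation. The helper names are hypothetical, and the escaping (spaces become underscores, everything else is percent-encoded by `urllib.parse.quote`) is an assumption that may differ from what DBpedia actually uses for some characters.

```python
from urllib.parse import quote, unquote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def tag_to_uri(tag: str) -> str:
    """Build a DBpedia-style resource URL from a tag name."""
    # Assumption: spaces become underscores, as in Wikipedia page URLs.
    return DBPEDIA_RESOURCE + quote(tag.replace(" ", "_"))


def uri_to_tag(uri: str) -> str:
    """Recover the tag name from the last part of the URL."""
    return unquote(uri[len(DBPEDIA_RESOURCE):]).replace("_", " ")


print(tag_to_uri("Coca-Cola"))
print(uri_to_tag("http://dbpedia.org/resource/Coca-Cola"))
```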

Disambiguation

But what if we wanted to add a tag that has more than one meaning? Let us look at the example of “library”. What are we referring to – “a collection of books”, “collection of subroutines used to develop software” or “the Seinfeld episode called ‘The Library'”?

It is simple – we’ll just use different tags: Library, Library (computing) and The Library (Seinfeld episode).

Tag properties and connections to other tags

New tags are references to objects, and objects, as we know, have certain properties. In DBpedia some properties are common to all tags, such as an abstract, a picture (if one exists), labels in multiple languages, a type, and the subject to which the tag belongs.

For example, if we look at the DBpedia page for Keith Richards, we can learn some additional properties about him (year of birth, type of voice, genre of music he plays…) as well as his connections to other tags (born in Dartford, current member of The Rolling Stones, plays Fender Telecaster and Gibson Les Paul, occupation: Music producer, Musician and Songwriter…).
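
One way to picture such connections is as subject, property, object triples, which is the shape RDF gives to data. The sketch below holds a few of the Keith Richards facts mentioned above in a plain Python list. The property names (`bornIn`, `memberOf`, `plays`, `occupation`) are made up for illustration and are not DBpedia's actual property identifiers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject tag, property, object tag)

TRIPLES: List[Triple] = [
    ("Keith Richards", "bornIn", "Dartford"),
    ("Keith Richards", "memberOf", "The Rolling Stones"),
    ("Keith Richards", "plays", "Fender Telecaster"),
    ("Keith Richards", "plays", "Gibson Les Paul"),
    ("Keith Richards", "occupation", "Musician"),
]


def connections(tag: str, triples: List[Triple]) -> Dict[str, List[str]]:
    """Group everything a tag is connected to by property name."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for subject, prop, obj in triples:
        if subject == tag:
            grouped[prop].append(obj)
    return dict(grouped)


def tags_connected_to(obj: str, triples: List[Triple]) -> List[str]:
    """Find the tags that point at a given object, whatever the property."""
    return sorted({subject for subject, _, target in triples if target == obj})


print(connections("Keith Richards", TRIPLES)["plays"])
print(tags_connected_to("The Rolling Stones", TRIPLES))
```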

Classification of tags

As I mentioned earlier, tags belong to different groups and form a structure. A system that supports such tags has an advantage over other systems because it automatically classifies tags and so “knows” what Microformat, RDFa, Web Ontology Language and Thesauri have in common: they all belong to the subject Knowledge representation. That’s why with Faviki it is possible to follow content by subject and not only by a single tag (see the Knowledge representation page).
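
Following content by subject can be sketched as a two-step lookup: from bookmark to tags, then from tags to subjects. The data below is purely illustrative. The bookmark URLs are placeholders, the names `TAG_SUBJECTS`, `BOOKMARKS` and `bookmarks_for_subject` are hypothetical, and only the tag-to-subject pairs named above are filled in.

```python
from typing import Dict, List, Set

# Tag -> subjects it belongs to (only the pairs named in the text are filled in).
TAG_SUBJECTS: Dict[str, Set[str]] = {
    "Microformat": {"Knowledge representation"},
    "RDFa": {"Knowledge representation"},
    "Web Ontology Language": {"Knowledge representation"},
    "Keith Richards": set(),  # subjects left out of this sketch
}

# Placeholder bookmarks with the tags users gave them.
BOOKMARKS: Dict[str, Set[str]] = {
    "http://example.org/a": {"RDFa"},
    "http://example.org/b": {"Microformat", "Keith Richards"},
    "http://example.org/c": {"Keith Richards"},
}


def bookmarks_for_subject(subject: str) -> List[str]:
    """Return bookmarks that carry at least one tag belonging to `subject`."""
    return sorted(
        url
        for url, tags in BOOKMARKS.items()
        if any(subject in TAG_SUBJECTS.get(tag, set()) for tag in tags)
    )


print(bookmarks_for_subject("Knowledge representation"))
```

Neither matching bookmark shares a tag with the other, yet both turn up under the same subject, which plain word-based tags cannot do.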

Conclusion

I think that tags will truly dominate in the near future. But they will not be the tags we are used to; they will be their “smarter” offspring. I believe that the results of this evolution will form the foundation of the future Internet, which will handle objects and their properties instead of just web pages. The present situation is not ideal, but it makes a good foundation for the development of a universal language that could connect people and the Internet in new and exciting ways.
