Relatedness as the first quantitative measure

The first thing that you see in qlaara is the word relatedness graph. It was also where development started two years ago.

The graph shows which words are related to the query word. Relatedness is like similarity, only wider. Dog and puppy are similar, but dog is also related to things like animal, terrier, tail, bark, kennel, cat, bone, obedience, canine, chien and many others. These relations could be classified into various types like synonyms, equivalents or cohyponyms. There are (semi)automatic ways of doing that too, which is one of the things we plan to get to at some stage.

But more importantly, the relations can be quantified, which is what qlaara is about. Relatedness is not a binary matter of related or not. Dog and cat are probably related much more closely, both in existing texts and in the minds of people, than dog and tiger, even though biologically the two relations should not differ much, and cat and tiger are themselves strongly related. This relatedness measure is what you see as the line thickness on the graph and as a number in the table.

To obtain the relatedness numbers, we use two empirical data sources: text corpora and human intuitions.

In corpora, we use methods of vector semantics to determine the distributional similarity of words. This is based on the observation that you shall know a word by the company it keeps¹. I’ll write more on this in upcoming posts, but the basic idea is that relatedness between two words is highest if these words are used a lot in the same context, not many other words are used in that context, and these words are not used too much in other contexts. This mechanical method has its well-known limitations and depends significantly on the corpus used, the selection of parameters and fine details of the calculation method, all of which we continue to work on.
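To make the basic idea a bit more concrete, here is a minimal sketch of one common textbook way to compute distributional relatedness: count which words occur near each other, weight the counts with positive pointwise mutual information (PPMI), and compare the resulting vectors with cosine similarity. This is not necessarily how qlaara calculates its numbers; the function names, the window size and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ppmi_vectors(sentences, window=2):
    """Build sparse PPMI context vectors from tokenized sentences.

    Hypothetical helper for illustration; `window` is the number of
    neighbouring tokens on each side that count as context.
    """
    pair_counts = defaultdict(Counter)  # word -> context word -> count
    word_totals = Counter()             # how often each word is a target
    context_totals = Counter()          # how often each word is a context
    total = 0

    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                pair_counts[word][tokens[j]] += 1
                word_totals[word] += 1
                context_totals[tokens[j]] += 1
                total += 1

    vectors = {}
    for word, contexts in pair_counts.items():
        vec = {}
        for ctx, n in contexts.items():
            # PMI is high when the pair co-occurs more often than the
            # overall frequencies of the word and the context predict.
            pmi = math.log2(n * total / (word_totals[word] * context_totals[ctx]))
            if pmi > 0:  # keep only positive associations (PPMI)
                vec[ctx] = pmi
        vectors[word] = vec
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, far too small for real use.
corpus = [
    "dog chases ball".split(),
    "cat chases ball".split(),
    "stock falls sharply".split(),
]
vectors = ppmi_vectors(corpus)
print(cosine(vectors["dog"], vectors["cat"]))
print(cosine(vectors["dog"], vectors["stock"]))
```

In this toy corpus, dog and cat share their contexts while dog and stock share none, so the first score comes out higher than the second. With a real corpus the result depends heavily on the window size, the weighting scheme and the corpus itself, which is exactly the sensitivity mentioned above.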

The human touch is added by people like yourself. You can vote relations up or down in table view, or if you have sufficient access privileges in a dictionary, you can also directly enter the value you prefer. (Yes, you can create your own dictionary in qlaara, even with a free account; see your account menu after logging in to get started.) Another important source of human judgment is qlaara labs, where you can help both qlaara and other researchers while having fun with word games. Find the link to qlaara labs on the menu bar or in your account menu, and try out some of the games there.

Arvi



1 Firth, J. R. (1957). A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis, pp. 1–32. Blackwell, Oxford, p. 11.
