Example of an empirically grounded conceptual graph. It appears that the same colour is called blu in Italian and lilla in Estonian, both of them colour names that look transparent to English speakers.
A paper just came out about dictionary data structures in the rare situation when we do have access to concepts: "How blue is azzurro? Representing probabilistic equivalency of colour terms in a dictionary" by Mari Uusküla and myself. There were two ideas in that paper, only one of which has survived to be used in qlaara.
- First, we included explicit representations of concepts in the graph data structure. Words would then only have relations to concepts, not to other words, and any relatedness between words would be expressed by their being related to the same concept(s). Ontological relations between concepts were allowed, although not implemented in the toy examples of the paper. Explicit concepts were contrasted with graph structures proposed earlier (notably by Helge Dyvik), where concepts emerge as a result of some clustering operation. For the paper’s example we selected colours as a very convenient class of concepts that could be presented to people for naming. There are very few other things that you can actually stick to a piece of cardboard and show to people: numbers perhaps, and that’s about it. For all other semantic classes the concept entries would be placeholders with nothing more than the concept number in them, serving only as a connecting point for related words. (Unless, of course, the lexicographer manually wrote definitions for the concepts. That would be so last century, taking decades of manual work and being borderline prescriptionist.)
- Second, we quantified the relations. Traditional dictionaries are binary: a word either has or does not have a given meaning, equivalent, synonym etc. This has only been moderated by vague hedges like “frequently”, “sometimes” or “also”. It made sense when writing the paper, and still does, to quantify the relations based on some empirical data source, either psycholinguistic experiments as in this paper, or corpus research.
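The two ideas can be pictured together in a small Python sketch. It is only an illustration, not the implementation from the paper: the class and method names are hypothetical, the naming counts are invented, and the choice to weight a word-concept link by the share of respondents who used that word for the concept is an assumption.

```python
from collections import defaultdict


class ConceptGraph:
    """Words link only to concepts; every link carries an empirical weight."""

    def __init__(self):
        # (language, word) -> {concept_id: weight}
        self.word_to_concepts = defaultdict(dict)

    def add_naming_counts(self, concept_id, language, counts):
        """Convert raw naming counts for one concept into relative frequencies."""
        total = sum(counts.values())
        for word, n in counts.items():
            self.word_to_concepts[(language, word)][concept_id] = n / total

    def shared_concepts(self, word_a, word_b):
        """Concepts that both words name, with the two link weights for each."""
        a = self.word_to_concepts.get(word_a, {})
        b = self.word_to_concepts.get(word_b, {})
        return {c: (a[c], b[c]) for c in a.keys() & b.keys()}


graph = ConceptGraph()
# Invented counts for illustration only, not data from the paper.
graph.add_naming_counts("tile_1", "it", {"blu": 8, "viola": 2})
graph.add_naming_counts("tile_1", "et", {"lilla": 6, "sinine": 4})
graph.add_naming_counts("tile_2", "it", {"azzurro": 9, "blu": 1})

# Words are related only through the concept nodes they share.
print(graph.shared_concepts(("it", "blu"), ("et", "lilla")))
```

Note that there is no direct edge between blu and lilla anywhere in the structure; their equivalence is read off the shared concept node and the two weights attached to it.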
The main problem with such conceptual graphs is that in realistic situations, we do not have any kind of access to concepts. We can still only create the placeholder concept nodes by clustering, in which case making them explicit is just a matter of form. In situations like future updates to the dictionary (think adding a language), it may even complicate the compilation process unnecessarily.
That’s why the qlaara graph only has nodes for words – the only things that can be directly represented in dictionaries (or any other publications for that matter). Meanings are only in the reader’s mind, and qlaara leaves them there, trusting each reader to make their own inferences from the word graph. Quantification of relations, on the other hand, is central to the qlaara graph. Relation weights are empirical, coming from distributional semantics and human judgement, and are intended to make that inference process even easier.
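A word-only graph of this kind might look roughly like the sketch below. This is not qlaara's actual code: the names are hypothetical, the weights are made up, and treating relations as symmetric is a simplifying assumption.

```python
from collections import defaultdict


class WordGraph:
    """Only word nodes; relations between them carry empirical weights."""

    def __init__(self):
        # word -> {related word: weight}
        self.edges = defaultdict(dict)

    def relate(self, a, b, weight):
        # Assumption: relations are stored symmetrically.
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def neighbours(self, word, min_weight=0.0):
        """Related words, strongest first, optionally filtered by weight."""
        ranked = sorted(
            self.edges.get(word, {}).items(),
            key=lambda item: item[1],
            reverse=True,
        )
        return [(w, wt) for w, wt in ranked if wt >= min_weight]


g = WordGraph()
# Made-up weights for illustration only.
g.relate("it:blu", "et:lilla", 0.6)
g.relate("it:blu", "et:sinine", 0.3)
g.relate("it:azzurro", "et:sinine", 0.7)

print(g.neighbours("it:blu"))
```

The ranked neighbour list is all the reader gets; which meaning connects blu to lilla is left for the reader to infer.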