Evolutionary models of ontogeny of linguistic categories and rules

Abstract

Language development is the process by which a human baby constructs adequate competence to encode and decode meanings in the language of her parents. Computationally, it can be described as a trinity of mutually interconnected problems: clustering all tokens the baby has heard into 1) semantic and 2) grammatical categories; and 3) discovering grammatical rules that allow members of diverse equivalence classes to be combined into syntactically correct and meaningful phrases. The theoretical, « psycholinguistic » claim of our Thesis is that, similarly to theories which explain the emergence of cultural or creative thinking as the result of an evolutionary process occurring within an individual mind, the emergence of linguistic representations and faculties within a human individual can also be considered a case where the basic tenets of Universal Darwinism apply. The practical, « cybernetic » aim of the Thesis is to create computational models of concept learning, part-of-speech induction and grammar induction whose performance is comparable to that of existing models but which are based principally on evolutionary algorithms. It shall be argued that the « fitness function », which determines the « survival rate » of « candidate grammars » emerging and disappearing in the baby’s mind, should be based upon the idea that the fittest grammar G is the one which « minimizes the distance » between the utterances successfully parsed from the linguistic environment E by applying G and the utterances potentially generated by G.
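
The fitness idea can be illustrated with a minimal sketch. It is not the Thesis's actual model: the toy grammar format (a set of category templates), the lexicon, the function names `generate` and `fitness`, and the choice of Jaccard distance as the « distance » are all assumptions made purely for illustration.

```python
from itertools import product


def generate(grammar, lexicon):
    """Return every utterance the toy grammar can produce.

    grammar: set of templates, each a tuple of category names (hypothetical format)
    lexicon: dict mapping a category name to the set of words in that category
    """
    utterances = set()
    for template in grammar:
        # Fill each category slot with every word of that category.
        for words in product(*(sorted(lexicon[cat]) for cat in template)):
            utterances.add(" ".join(words))
    return utterances


def fitness(grammar, lexicon, environment):
    """Fitness = 1 - Jaccard distance between parsed and generated utterances.

    An utterance from the environment E counts as "parsed" when the grammar
    can also generate it, so the parsed set is a subset of the generated set
    and the Jaccard similarity reduces to |parsed| / |generated|.
    """
    generated = generate(grammar, lexicon)
    if not generated:
        return 0.0  # an empty grammar parses and generates nothing
    parsed = generated & set(environment)
    return len(parsed) / len(generated)


# Toy linguistic environment E and two candidate grammars (illustrative data).
lexicon = {"DET": {"the", "a"}, "N": {"dog", "ball"}, "V": {"runs"}}
environment = ["the dog", "a ball", "the dog runs"]

g1 = {("DET", "N")}                # generates 4 utterances, 2 of them heard
g2 = {("DET", "N"), ("N", "DET")}  # generates 8 utterances, 2 of them heard

print(fitness(g1, lexicon, environment))  # 0.5
print(fitness(g2, lexicon, environment))  # 0.25
```

Under these assumptions g1 is fitter than g2, because g2 over-generates word orders never heard in E. Note that this naive measure alone would also favour a grammar generating a single heard utterance, so a usable fitness function would presumably have to reward coverage of E as well.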

Keywords
evolutionary computing, language acquisition, genetic epistemology, part-of-speech induction, grammar induction, optimal clustering, machine learning, concept construction, grammar systems, motherese, toddlerese

WARNING: this is not the doctoral Thesis itself but "just" a gentle (and already outdated) introduction to it.