cognition language & learning lab
justine explains...
modeling language as prediction
department of psychology, stanford university

Language Learning Through Similarity-Based Generalization

Language Models and the Problem of Data-Sparsity
Two main categories of theory address the question of how people are able to produce a seemingly limitless number of utterances, many of them unique and original. One maintains that language rests on a complex and unlearnable system of recursive rules that governs basic syntax, and from which the multitude of surface forms can be constructed. The other argues that people acquire knowledge of a language by building probabilistic models from the linguistic samples they have observed, then use those models to process structures they have not previously encountered. The latter view, however, is challenged by the issue of data-sparsity. The objection is that the linguistic sample a learner encounters is too small to yield a probabilistic model that captures complex underlying patterns and exploits them to process novel constructions. In other words, training a model to detect sophisticated patterns requires more data than a learner can collect.
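To make the data-sparsity objection concrete, here is a minimal sketch in Python. It is an illustration only, not a model used by the lab: the toy corpus, the choice of a bigram (word-pair) model, and the helper names `bigram_coverage` and `mle_bigram_prob` are all assumptions introduced for this example. It counts how many of the possible word pairs a small sample actually contains, and shows that an unsmoothed estimate assigns zero probability to a word pair the sample never happened to include.

```python
from collections import Counter

def bigram_coverage(tokens):
    """Return (observed, possible) counts of distinct bigrams over the sample's vocabulary."""
    vocab = set(tokens)
    observed = set(zip(tokens, tokens[1:]))
    return len(observed), len(vocab) ** 2

def mle_bigram_prob(tokens, prev, word):
    """Unsmoothed maximum-likelihood estimate of P(word | prev) from raw counts."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])  # words that are followed by something
    if contexts[prev] == 0:
        return 0.0  # context never seen with a continuation: no evidence at all
    return bigrams[(prev, word)] / contexts[prev]

# Hypothetical toy sample standing in for a learner's linguistic input.
corpus = "the dog chased the cat and the cat chased the mouse".split()

observed, possible = bigram_coverage(corpus)
print(f"{observed} of {possible} possible bigrams observed")

# A pair that occurs in the sample gets a usable estimate...
print(mle_bigram_prob(corpus, "the", "cat"))
# ...but the novel, perfectly sensible "the mouse chased the dog"
# contains the unseen pair ("mouse", "chased"), which is estimated at zero.
print(mle_bigram_prob(corpus, "mouse", "chased"))
```

Even in this tiny sample most word pairs never occur, and the proportion of unseen combinations grows quickly as the vocabulary and the length of the patterns increase. This is the gap that a probabilistic account of language learning has to explain.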