Mon . 20 Jun 2020
TR | RU | UK | KK | BE |

Computational linguistics

computational linguistics, computational linguistics programs
Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective

Traditionally, computational linguistics was performed by computer scientists who had specialized in the application of computers to the processing of a natural language But little if any success was made Computational linguists often work as members of interdisciplinary teams, which can include regular linguists, experts in the target language, and computer scientists In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, Quantum Physicists, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others

Computational linguistics has theoretical and applied components Theoretical computational linguistics focuses on issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use

The Association for Computational Linguistics defines computational linguistics as:

the scientific study of language from a computational perspective Computational linguists are interested in providing computational models of various kinds of linguistic phenomena


  • 1 Origins
  • 2 Approaches
    • 21 Developmental approaches
    • 22 Structural approaches
    • 23 Production approaches
      • 231 Text-based interactive approach
      • 232 Speech-based interactive approach
    • 24 Comprehension approaches
  • 3 Applications
  • 4 Subfields
  • 5 Legacy
  • 6 See also
  • 7 References
  • 8 External links


Computational linguistics is often grouped within the field of artificial intelligence, but actually was present before the development of artificial intelligence Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English Since computers can make arithmetic calculations much faster and more accurately than humans, it was thought to be only a short matter of time before they could also begin to process language Computational and quantitative methods are also used historically in attempted reconstruction of earlier forms of modern languages and subgrouping modern languages into language families Earlier methods such as lexicostatistics and glottochronology have been proven to be premature and inaccurate However, recent interdisciplinary studies which borrow concepts from biological studies, especially gene mapping, have proved to produce more sophisticated analytical tools and more trustful results

When machine translation also known as mechanical translation failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages

In order to translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology the grammar of word forms and syntax the grammar of sentence structure In order to understand syntax, one had to also understand the semantics and the lexicon or 'vocabulary', and even something of the pragmatics of language use Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers

Nowadays research within the scope of computational linguistics is done at computational linguistics departments, computational linguistics laboratories, computer science departments, and linguistics departments Some research in the field of computational linguistics aims to create working speech or text processing systems while others aim to create a system allowing human-machine interaction Programs meant for human-machine communication are called conversational agents


Just as computational linguistics can be performed by experts in a variety of fields and through a wide assortment of departments, so too can the research fields broach a diverse range of topics The following sections discuss some of the literature available across the entire field broken into four main area of discourse: developmental linguistics, structural linguistics, linguistic production, and linguistic comprehension

Developmental approaches

Language is a cognitive skill which develops throughout the life of an individual This developmental process has been examined using a number of techniques, and a computational approach is one of them Human language development does provide some constraints which make it harder to apply a computational method to understanding it For instance, during language acquisition, human children are largely only exposed to positive evidence This means that during the linguistic development of an individual, only evidence for what is a correct form is provided, and not evidence for what is not correct This is insufficient information for a simple hypothesis testing procedure for information as complex as language, and so provides certain boundaries for a computational approach to modeling language development and acquisition in an individual

Attempts have been made to model the developmental process of language acquisition in children from a computational angle, leading to both statistical grammars and connectionist models Work in this realm has also been proposed as a method to explain the evolution of language through history Using models, it has been shown that languages can be learned with a combination of simple input presented incrementally as the child develops better memory and longer attention span This was simultaneously posed as a reason for the long developmental period of human children Both conclusions were drawn because of the strength of the neural network which the project created

The ability of infants to develop language has also been modeled using robots in order to test linguistic theories Enabled to learn as children might, a model was created based on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure, vastly simplifying the learning process and shedding light on information which furthers the current understanding of linguistic development It is important to note that this information could only have been empirically tested using a computational approach

As our understanding of the linguistic development of an individual within a lifetime is continually improved using neural networks and learning robotic systems, it is also important to keep in mind that languages themselves change and develop through time Computational approaches to understanding this phenomenon have unearthed very interesting information Using the Price Equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution, but also gives insight into the evolutionary history of modern-day languages This modeling effort achieved, through computational linguistics, what would otherwise have been impossible

It is clear that the understanding of linguistic development in humans as well as throughout evolutionary time has been fantastically improved because of advances in computational linguistics The ability to model and modify systems at will affords science an ethical method of testing hypotheses that would otherwise be intractable

Structural approaches

In order to create better computational models of language, an understanding of language’s structure is crucial To this end, the English language has been meticulously studied using computational approaches to better understand how the language works on a structural level One of the most important pieces of being able to study linguistic structure is the availability of large linguistic corpora This grants computational linguists the raw data necessary to run their models and gain a better understanding of the underlying structures present in the vast amount of data which is contained in any single language One of the most cited English linguistic corpora is the Penn Treebank Containing over 45 million words of American English, this corpus has been annotated for part-of-speech information This type of annotated corpus allows other researchers to apply hypotheses and measures that would otherwise be impossible to perform

Theoretical approaches to the structure of languages have also been developed These works allow computational linguistics to have a framework within which to work out hypotheses that will further the understanding of the language in a myriad of ways One of the original theoretical theses on internalization of grammar and structure of language proposed two types of models In these models, rules or patterns learned increase in strength with the frequency of their encounter The work also created a question for computational linguists to answer: how does an infant learn a specific and non-normal grammar Chomsky Normal Form without learning an overgeneralized version and getting stuck Theoretical efforts like these set the direction for research to go early in the lifetime of a field of study, and are crucial to the growth of the field

Structural information about languages allows for the discovery and implementation of similarity recognition between pairs of text utterances For instance, it has recently been proven that based on the structural information present in patterns of human discourse, conceptual recurrence plots can be used to model and visualize trends in data and create reliable measures of similarity between natural textual utterances This technique is a strong tool for further probing the structure of human discourse Without the computational approach to this question, the vastly complex information present in discourse data would have remained inaccessible to scientists

Information regarding the structural data of a language is available for English as well as other languages, such as Japanese Using computational methods, Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length Though the exact cause of this lognormality remains unknown, it is precisely this sort of intriguing information which computational linguistics is designed to uncover This information could lead to further important discoveries regarding the underlying structure of Japanese, and could have any number of effects on the understanding of Japanese as a language Computational linguistics allows for very exciting additions to the scientific knowledge base to happen quickly and with very little room for doubt

Without a computational approach to the structure of linguistic data, much of the information that is available now would still be hidden under the vastness of data within any single language Computational linguistics allows scientists to parse huge amounts of data reliably and efficiently, creating the possibility for discoveries unlike any seen in most other approaches

Production approaches

The production of language is equally as complex in the information it provides and the necessary skills which a fluent producer must have That is to say, comprehension is only half the problem of communication The other half is how a system produces language, and computational linguistics has made some very interesting discoveries in this area

Alan Turing: computer scientist and namesake developer of the Turing Test as a method of measuring the intelligence of a machine

In a now famous paper published in 1950 Alan Turing proposed the possibility that machines might one day have the ability to "think" As a thought experiment for what might define the concept of thought in machines, he proposed an "imitation test" in which a human subject has two text-only conversations, one with a fellow human and another with a machine attempting to respond like a human Turing proposes that if the subject cannot tell the difference between the human and the machine, it may be concluded that the machine is capable of thought Today this test is known as the Turing test and it remains an influential idea in the area of artificial intelligence

Joseph Weizenbaum: former MIT professor and computer scientist who developed ELIZA, a primitive computer program utilizing natural language processing

One of the earliest and best known examples of a computer program designed to converse naturally with humans is the ELIZA program developed by Joseph Weizenbaum at MIT in 1966 The program emulated a Rogerian psychotherapist when responding to written statements and questions posed by a user It appeared capable of understanding what was said to it and responding intelligently, but in truth it simply followed a pattern matching routine that relied on only understanding a few keywords in each sentence Its responses were generated by recombining the unknown parts of the sentence around properly translated versions of the known words For example, in the phrase "It seems that you hate me" ELIZA understands "you" and "me" which matches the general pattern "you me", allowing ELIZA to update the words "you" and "me" to "I" and "you" and replying "What makes you think I hate you" In this example ELIZA has no understanding of the word "hate", but it is not required for a logical response in the context of this type of psychotherapy

Some projects are still trying to solve the problem which first started computational linguistics off as its own field in the first place However, the methods have become more refined and clever, and consequently the results generated by computational linguists have become more enlightening In an effort to improve computer translation, several models have been compared, including hidden Markov models, smoothing techniques, and the specific refinements of those to apply them to verb translation The model which was found to produce the most natural translations of German and French words was a refined alignment model with a first-order dependence and a fertility model They also provide efficient training algorithms for the models presented, which can give other scientists the ability to improve further on their results This type of work is specific to computational linguistics, and has applications which could vastly improve understanding of how language is produced and comprehended by computers

Work has also been done in making computers produce language in a more naturalistic manner Using linguistic input from humans, algorithms have been constructed which are able to modify a system's style of production based on a factor such as linguistic input from a human, or more abstract factors like politeness or any of the five main dimensions of personality This work takes a computational approach via parameter estimation models to categorize the vast array of linguistic styles we see across individuals and simplify it for a computer to work in the same way, making human-computer interaction much more natural

Text-based interactive approach

Many of the earliest and simplest models of human-computer interaction, such as ELIZA for example, involve a text-based input from the user to generate a response from the computer By this method, words typed by a user trigger the computer to recognize specific patterns and reply accordingly, through a process known as keyword spotting

Speech-based interactive approach

Recent technologies have placed more of an emphasis on speech-based interactive systems These systems, such as Siri of the iOS operating system, operate on a similar pattern-recognizing technique as that of text-based systems, but with the former, the user input is conducted through speech recognition This branch of linguistics involves the processing of the user's speech as sound waves and the interpreting of the acoustics and language patterns in order for the computer to recognize the input

Comprehension approaches

Much of the focus of modern computational linguistics is on comprehension With the proliferation of the internet and the abundance of easily accessible written human language, the ability to create a program capable of understanding human language would have many broad and exciting possibilities, including improved search engines, automated customer service, and online education

Early work in comprehension included applying Bayesian statistics to the task of optical character recognition, as illustrated by Bledsoe and Browing in 1959 in which a large dictionary of possible letters were generated by "learning" from example letters and then the probability that any one of those learned examples matched the new input was combined to make a final decision Other attempts at applying Bayesian statistics to language analysis included the work of Mosteller and Wallace 1963 in which an analysis of the words used in The Federalist Papers was used to attempt to determine their authorship concluding that Madison most likely authored the majority of the papers

In 1971 Terry Winograd developed an early natural language processing engine capable of interpreting naturally written commands within a simple rule governed environment The primary language parsing program in this project was called SHRDLU, which was capable of carrying out a somewhat natural conversation with the user giving it commands, but only within the scope of the toy environment designed for the task This environment consisted of different shaped and colored blocks, and SHRDLU was capable of interpreting commands such as "Find a block which is taller than the one you are holding and put it into the box" and asking questions such as "I don't understand which pyramid you mean" in response to the user's input While impressive, this kind of natural language processing has proven much more difficult outside the limited scope of the toy environment Similarly a project developed by NASA called LUNAR was designed to provide answers to naturally written questions about the geological analysis of lunar rocks returned by the Apollo missions These kinds of problems are referred to as question answering

Initial attempts at understanding spoken language were based on work done in the 1960s and 70s in signal modeling where an unknown signal is analyzed to look for patterns and to make predictions based on its history An initial and somewhat successful approach to applying this kind of signal modeling to language was achieved with the use of hidden Markov models as detailed by Rabiner in 1989 This approach attempts to determine probabilities for the arbitrary number of models that could be being used in generating speech as well as modeling the probabilities for various words generated from each of these possible models Similar approaches were employed in early speech recognition attempts starting in the late 70s at IBM using word/part-of-speech pair probabilities

More recently these kinds of statistical approaches have been applied to more difficult tasks such as topic identification using Bayesian parameter estimation to infer topic probabilities in text documents


Modern computational linguistics is often a combination of studies in computer science and programming, math, particularly statistics, language structures, and natural language processing Combined, these fields most often lead to the development of systems that can recognize speech and perform some task based on that speech Examples include speech recognition software, such as Apple's Siri feature, spellcheck tools, speech synthesis programs, which are often used to demonstrate pronunciation or help the disabled, and machine translation programs and websites, such as Google Translate and Word Reference

Computational linguistics can be especially helpful in situations involving social media and the Internet For example, filters in chatrooms or on website searches require computational linguistics Chat operators often use filters to identify certain words or phrases and deem them inappropriate so that users cannot submit them Another example of using filters is on websites Schools use filters so that websites with certain keywords are blocked from children to view There are also many programs in which parents use Parental controls to put content filters in place Computational linguists can also develop programs that group and organize content through Social media mining An example of this is Twitter, in which programs can group tweets by subject or keywords


Computational linguistics can be divided into major areas depending upon the medium of the language being processed, whether spoken or textual; and upon the task being performed, whether analyzing language recognition or synthesizing language generation

Speech recognition and speech synthesis deal with how spoken language can be understood or created using computers Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages The possibility of automatic language translation, however, has yet to be realized and remains a notorious branch of computational linguistics

Some of the areas of research that are studied by computational linguistics include:

  • Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammar and linearly bounded Turing machines
  • Computational semantics comprises defining suitable logics for linguistic meaning representation, automatically constructing them and reasoning with them
  • Computer-aided corpus linguistics, which has been used since the 1970s as a way to make detailed advances in the field of discourse analysis
  • Design of parsers or chunkers for natural languages
  • Design of taggers like POS-taggers part-of-speech taggers
  • Machine translation as one of the earliest and most difficult applications of computational linguistics draws on many subfields
  • Simulation and study of language evolution in historical linguistics/glottochronology


The subject of Computational Linguistics has had a recurring impact on popular culture:

  • The 1983 film WarGames features a young computer hacker who interacts with an artificially intelligent supercomputer
  • A 1997 film, Conceiving Ada, focuses on Ada Lovelace, considered one of the first computer scientists, as well as themes of computational linguistics
  • Her, a 2013 film depicts a man's interactions with the "world's first artificially intelligent operating system"
  • 2014's The Imitation Game follows the life of computer scientist Alan Turing, developer of the Turing Test
  • The 2015 film Ex Machina centers around human interaction with artificial intelligence

See also

  • AI portal
  • Mind and Brain portal


  1. ^ Hans Uszkoreit What Is Computational Linguistics Department of Computational Linguistics and Phonetics of Saarland University
  2. ^ The Association for Computational Linguistics What is Computational Linguistics Published online, Feb, 2005
  3. ^ John Hutchins: Retrospect and prospect in computer-based translation Proceedings of MT Summit VII, 1999, pp 30–44
  4. ^ Arnold B Barach: Translating Machine 1975: And the Changes To Come
  5. ^ T Crowley, C Bowern An Introduction to Historical Linguistics Auckland, NZ: Oxford UP, 1992 Print
  6. ^ Natural Language Processing by Liz Liddy, Eduard Hovy, Jimmy Lin, John Prager, Dragomir Radev, Lucy Vanderwende, Ralph Weischedel
  7. ^ "Computational Linguistics and Phonetics"
  8. ^ "Yatsko's Computational Linguistics Laboratory"
  9. ^ "CLIP"
  10. ^ Computational Linguistics – Department of Linguistics – Georgetown College
  11. ^ "UPenn Linguistics: Computational Linguistics"
  12. ^ Jurafsky, D, & Martin, J H 2009 Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition Upper Saddle River, NJ: Pearson Prentice Hall
  13. ^ Bowerman, M 1988 The "no negative evidence" problem: How do children avoid constructing an overly general grammar Explaining language universals
  14. ^ a b c d Braine, MDS 1971 On two types of models of the internalization of grammars In DI Slobin Ed, The ontogenesis of grammar: A theoretical perspective New York: Academic Press
  15. ^ Powers, DMW & Turk, CCR 1989 Machine Learning of Natural Language Springer-Verlag ISBN 978-0-387-19557-5
  16. ^ a b Elman, J 1993 Learning and development in neural networks: The importance of starting small Cognition, 71-99
  17. ^ Salvi, G, Montesano, L, Bernardino, A, & Santos-Victor, J 2012 Language bootstrapping: learning word meanings from perception-action association IEEE transactions on systems, man, and cybernetics Part B, Cybernetics : a publication of the IEEE Systems, Man, and Cybernetics Society, 423, 660-71 doi:101109/TSMCB20112172420
  18. ^ Gong, T; Shuai, L; Tamariz, M & Jäger, G 2012 E Scalas, ed "Studying Language Change Using Price Equation and Pólya-urn Dynamics" PLoS ONE 7 3: e33171 doi:101371/journalpone0033171 
  19. ^ Marcus, M & Marcinkiewicz, M 1993 "Building a large annotated corpus of English: The Penn Treebank" PDF Computational linguistics 19 2: 313–330 
  20. ^ a b Angus, D; Smith, A & Wiles, J 2012 "Conceptual recurrence plots: revealing patterns in human discourse" IEEE transactions on visualization and computer graphics 18 6: 988–97 doi:101109/TVCG2011100 
  21. ^ a b Furuhashi, S & Hayakawa, Y 2012 "Lognormality of the Distribution of Japanese Sentence Lengths" Journal of the Physical Society of Japan 81 3: 034004 doi:101143/JPSJ81034004 
  22. ^ Turing, A M 1950 "Computing machinery and intelligence" Mind 59 236: 433–460 doi:101093/mind/lix236433 JSTOR 102307/2251299 
  23. ^ Weizenbaum, J 1966 "ELIZA—a computer program for the study of natural language communication between man and machine" Communications of the ACM 9 1: 36–45 doi:101145/365153365168 
  24. ^ Och, F J; Ney, H 2003 "A Systematic Comparison of Various Statistical Alignment Models" Computational Linguistics 29 1: 19–51 doi:101162/089120103321337421 
  25. ^ Mairesse, F 2011 "Controlling user perceptions of linguistic style: Trainable generation of personality traits" Computational Linguistics 37 3: 455–488 doi:101162/COLI_a_00063 
  26. ^ Language Files The Ohio State University Department of Linguistics 2011 pp 624–634 ISBN 9780814251799 
  27. ^ Bledsoe, W W & Browning, I 1959 Pattern recognition and reading by machine Papers presented at the December 1–3, 1959, eastern joint IRE-AIEE-ACM computer conference on - IRE-AIEE-ACM ’59 Eastern New York, New York, USA: ACM Press pp 225–232 doi:101145/14602991460326 
  28. ^ Mosteller, F 1963 "Inference in an authorship problem" Journal of the American Statistical Association 58 302: 275–309 doi:102307/2283270 JSTOR 102307/2283270 
  29. ^ Winograd, T 1971 "Procedures as a Representation for Data in a Computer Program for Understanding Natural Language" Report 
  30. ^ Woods, W; Kaplan, R & Nash-Webber, B 1972 "The lunar sciences natural language information system" Report 
  31. ^ Rabiner, L 1989 "A tutorial on hidden Markov models and selected applications in speech recognition" Proceedings of the IEEE 
  32. ^ Bahl, L; Baker, J; Cohen, P; Jelinek, F 1978 "Recognition of continuously read natural corpus" Acoustics, Speech, and Signal 3: 422–424 doi:101109/ICASSP19781170402 
  33. ^ Blei, D & Ng, A 2003 "Latent dirichlet allocation" The Journal of Machine Learning 3: 993–1022 
  34. ^ a b "Careers in Computational Linguistics" California State University Retrieved 19 September 2016 
  35. ^ Marujo, Lu i ´ s et al "Automatic Keyword Extraction on Twitter" Language Technologies Institute, Carnegie Melon University, nd Web 19 Sept 2016
  36. ^ Oettinger, A G 1965 Computational Linguistics The American Mathematical Monthly, Vol 72, No 2, Part 2: Computers and Computing, pp 147-150
  37. ^ McEnery, Thomas 1996 Corpus Linguistics: An Introduction Edinburgh: Edinburgh University Press p 114 ISBN 0748611657 
  38. ^ Badham, John 1983-06-03, WarGames, retrieved 2016-02-22 
  39. ^ Hershman-Leeson, Lynn 1999-02-19, Conceiving Ada, retrieved 2016-02-22 
  40. ^ Jonze, Spike 2014-01-10, Her, retrieved 2016-02-18 
  41. ^ Tyldum, Morten 2014-12-25, The Imitation Game, retrieved 2016-02-18 
  42. ^ Garland, Alex 2015-04-24, Ex Machina, retrieved 2016-02-18 

External links

  • Association for Computational Linguistics ACL
    • ACL Anthology of research papers
    • ACL Wiki for Computational Linguistics
  • CICLing annual conferences on Computational Linguistics
  • Computational Linguistics – Applications workshop
  • Free online introductory book on Computational Linguistics at the Wayback Machine archived January 25, 2008
  • Language Technology World
  • Resources for Text, Speech and Language Processing
  • The Research Group in Computational Linguistics

computational linguistics, computational linguistics graduate program, computational linguistics jobs, computational linguistics journal, computational linguistics masters, computational linguistics olympiad, computational linguistics pdf, computational linguistics phd, computational linguistics programs, computational linguistics salary

Computational linguistics Information about

Computational linguistics

  • user icon

    Computational linguistics beatiful post thanks!


Computational linguistics
Computational linguistics
Computational linguistics viewing the topic.
Computational linguistics what, Computational linguistics who, Computational linguistics explanation

There are excerpts from wikipedia on this article and video

Random Posts

Ralph Neville, 2nd Earl of Westmorland

Ralph Neville, 2nd Earl of Westmorland

Ralph Neville, 2nd Earl of Westmorland 4 April 1406 – 3 November 1484 was an English peer Content...
Mamprusi language

Mamprusi language

The Mamprusi language, Mampruli Mampelle, Ŋmampulli, is a Gur language spoken in northern Ghana by t...
Singapore Changi Airport

Singapore Changi Airport

Singapore Changi Airport IATA: SIN, ICAO: WSSS, or simply Changi Airport, is the primary civili...
Christian Siriano

Christian Siriano

Christian Siriano born November 18, 1985 is an American fashion designer and member of the Council o...