
Outline of natural language processing

Article Id: WHEBN0037764426

Title: Outline of natural language processing  
Author: World Heritage Encyclopedia
Language: English
Subject: History of natural language processing, List of text corpora, Outline of technology, Natural language processing
Collection: Natural Language Processing, Outlines
Publisher: World Heritage Encyclopedia


The following outline is provided as an overview of and topical guide to natural language processing:

Natural language processing – computer activity in which computers are used to analyze, understand, alter, or generate natural language. This includes the automation of any or all linguistic forms, activities, or methods of communication, such as conversation, correspondence, reading, written composition, dictation, publishing, translation, lip reading, and so on. Natural language processing is also the name of the branch of computer science, artificial intelligence, and linguistics concerned with enabling computers to engage in communication using natural language(s) in all forms, including but not limited to speech, print, writing, and signing.


  • Nature of natural language processing
  • Prerequisite technologies
  • Subfields of natural language processing
  • Related fields
  • Structures used in natural language processing
  • Processes of NLP
    • Applications
    • Component processes
      • Component processes of natural language understanding
      • Component processes of natural language generation
  • History of natural language processing
    • Timeline of NLP software
  • General natural language processing concepts
  • Natural language processing tools
    • Corpora
    • Natural language processing toolkits
    • Named entity recognizers
    • Translation software
    • Other software
    • Chatterbots
      • Classic chatterbots
      • General chatterbots
      • Instant messenger chatterbots
  • Natural language processing organizations
    • Natural language processing-related conferences
    • Companies involved in natural language processing
  • Natural language processing publications
    • Books
      • Book series
    • Journals
  • Persons influential in natural language processing
  • See also
  • References

Nature of natural language processing

Natural language processing can be described as all of the following:

  • A field of science
    • An applied science – field that applies human knowledge to build or design useful things.
      • A field of computer science – scientific and practical approach to computation and its applications.
        • A branch of artificial intelligence – intelligence of machines and robots and the branch of computer science that aims to create it.
        • A subfield of computational linguistics – interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
    • An application of engineering – science, skill, and profession of acquiring and applying scientific, economic, social, and practical knowledge, in order to design and build structures, machines, devices, systems, materials and processes.
      • An application of software engineering – application of a systematic, disciplined, quantifiable approach to the design, development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software.[2][3][4]
        • A subfield of computer programming – process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages (such as Java, C++, C#, Python, etc.). The purpose of programming is to create a set of instructions that computers use to perform specific operations or to exhibit desired behaviors.
  • A type of system – set of interacting or interdependent components forming an integrated whole, or a set of elements (often called 'components') and relationships which are different from relationships of the set or its elements to other elements or sets.
    • A system that includes software – software is a collection of computer programs and related data that provides the instructions for telling a computer what to do and how to do it. Software refers to one or more computer programs and data held in the storage of the computer. In other words, software is a set of programs, procedures, algorithms and their documentation concerned with the operation of a data processing system.
  • A type of computer technology – computers and their application. NLP makes use of computers, image scanners, microphones, and many types of software programs.

Prerequisite technologies

The following technologies make natural language processing possible:

Subfields of natural language processing

Related fields

Natural language processing contributes to, and makes use of the theories, tools, and methodologies from, the following fields:

  • Automated reasoning – area of computer science and mathematical logic dedicated to understanding various aspects of reasoning, and producing software which allows computers to reason completely, or nearly completely, automatically. A sub-field of artificial intelligence, automated reasoning is also grounded in theoretical computer science and philosophy of mind.
  • Linguistics – scientific study of human language. Natural language processing requires understanding of the structure and application of language, and therefore it draws heavily from linguistics.
    • Applied linguistics – interdisciplinary field of study that identifies, investigates, and offers solutions to language-related real-life problems. Some of the academic fields related to applied linguistics are education, linguistics, psychology, computer science, anthropology, and sociology. Some of the subfields of applied linguistics relevant to natural language processing are:
      • Bilingualism / Multilingualism
      • Computer-mediated communication (CMC) – any communicative transaction that occurs through the use of two or more networked computers.[6] Research on CMC focuses largely on the social effects of different computer-supported communication technologies. Many recent studies involve Internet-based social networking supported by social software.
      • Contrastive linguistics – practice-oriented linguistic approach that seeks to describe the differences and similarities between a pair of languages.
      • Conversation analysis (CA) – approach to the study of social interaction, embracing both verbal and non-verbal conduct, in situations of everyday life. Turn-taking is one aspect of language use that is studied by CA.
      • Discourse analysis – various approaches to analyzing written, vocal, or sign language use or any significant semiotic event.
      • Forensic linguistics – application of linguistic knowledge, methods and insights to the forensic context of law, language, crime investigation, trial, and judicial procedure.
      • Interlinguistics – study of improving communications between people of different first languages with the use of ethnic and auxiliary languages (lingua franca), for instance through intentional international auxiliary languages such as Esperanto or Interlingua, or through spontaneous interlanguages known as pidgin languages.
      • Language assessment – assessment of first, second or other language in the school, college, or university context; assessment of language use in the workplace; and assessment of language in the immigration, citizenship, and asylum contexts. The assessment may include analyses of listening, speaking, reading, writing or cultural understanding, with respect to understanding how the language works theoretically and the ability to use the language practically.
      • Language pedagogy – science and art of language education, including approaches and methods of language teaching and study. Natural language processing is used in programs designed to teach language, including first and second language training.
      • Language planning
      • Language policy
      • Lexicography
      • Literacies
      • Pragmatics
      • Second language acquisition
      • Stylistics
      • Translation
    • Computational linguistics – interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective. The models and tools of computational linguistics are used extensively in the field of natural language processing, and vice versa.
      • Computational semantics
      • Corpus linguistics – study of language as expressed in samples (corpora) of "real world" text. Corpora is the plural of corpus, and a corpus is a specifically selected collection of texts (or speech segments) composed of natural language. After it is constructed (gathered or composed), a corpus is analyzed with the methods of computational linguistics to infer the meaning and context of its components (words, phrases, and sentences), and the relationships between them. Optionally, a corpus can be annotated ("tagged") with data (manually or automatically) to make the corpus easier to understand (e.g., part-of-speech tagging). This data is then applied to make sense of user input, for example, to make better (automated) guesses of what people are talking about or saying, perhaps to achieve more narrowly focused web searches, or for speech recognition.
    • Metalinguistics
    • Sign linguistics – scientific study and analysis of natural sign languages, their features, their structure (phonology, morphology, syntax, and semantics), their acquisition (as a primary or secondary language), how they develop independently of other languages, their application in communication, their relationships to other languages (including spoken languages), and many other aspects.
  • Human–computer interaction – the intersection of computer science and behavioral sciences, this field involves the study, planning, and design of the interaction between people (users) and computers. Attention to human-machine interaction is important, because poorly designed human-machine interfaces can lead to many unexpected problems. A classic example of this is the Three Mile Island accident, where investigations concluded that the design of the human–machine interface was at least partially responsible for the disaster.
  • Information retrieval (IR) – field concerned with storing, searching and retrieving information. It is a separate field within computer science (closer to databases), but IR relies on some NLP methods (for example, stemming). Some current research and applications seek to bridge the gap between IR and NLP.
  • Knowledge representation (KR) – area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge. Knowledge Representation research involves analysis of how to reason accurately and effectively and how best to use a set of symbols to represent a set of facts within a knowledge domain.
  • Machine learning
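Corpus linguistics, as described above, studies how words are used across real-world text. A classic way to inspect those contexts is a keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with a few neighbouring words. The sketch below is a minimal illustration only: it assumes naive regex tokenization and a made-up three-sentence corpus, and the function name `kwic` is hypothetical rather than part of any toolkit.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of every hit."""
    # Naive tokenization (assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Toy corpus made up for this example.
corpus = ("The bank raised interest rates. We sat on the river bank. "
          "The bank closed early.")
for line in kwic(corpus, "bank"):
    print(line)
# the [bank] raised interest rates
# on the river [bank] the bank closed
# river bank the [bank] closed early
```

Even this crude listing separates the financial and riverside senses of "bank"; a real corpus study would use a proper tokenizer, respect sentence boundaries, and often work over part-of-speech-tagged text.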

Structures used in natural language processing

  • Corpus – body of data, optionally tagged (for example, through part-of-speech tagging), providing real world samples for analysis and comparison.
    • Text corpus – large and structured set of texts, nowadays usually electronically stored and processed. Text corpora are used for statistical analysis and hypothesis testing, checking occurrences or validating linguistic rules within a specific subject (or domain).
    • Speech corpus – database of speech audio files and text transcriptions. In speech technology, speech corpora are used, among other things, to create acoustic models (which can then be used with a speech recognition engine). In linguistics, spoken corpora are used for research into phonetics, conversation analysis, dialectology and other fields.
  • Ontology – formal representation of a set of concepts within a domain and the relationships between those concepts.
    • Taxonomy – practice and science of classification, including the principles underlying classification, and the methods of classifying things or concepts.
      • Hyponymy and hypernymy – the linguistics of hyponyms and hypernyms. A hyponym shares a type-of relationship with its hypernym. For example, pigeon, crow, eagle and seagull are all hyponyms of bird (their hypernym); which, in turn, is a hyponym of animal.
      • Taxonomy for search engines – typically called a "taxonomy of entities". It is a tree in which nodes are labelled with entities which are expected to occur in a web search query. These trees are used to match keywords from a search query with the keywords from relevant answers (or snippets).
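The pigeon, bird and animal example of hyponymy can be encoded as a small taxonomy in which each word points to its direct hypernym. The sketch below assumes a strict tree (one hypernym per word); real lexical resources such as WordNet allow several hypernyms per entry. The dictionary `HYPERNYM` and both function names are hypothetical, made up for illustration.

```python
# Toy taxonomy (assumption: one direct hypernym per word, i.e. a tree).
HYPERNYM = {
    "pigeon": "bird",
    "crow": "bird",
    "eagle": "bird",
    "seagull": "bird",
    "bird": "animal",
}

def hypernym_chain(word):
    """Follow type-of links upward until reaching a word with no recorded hypernym."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def is_hyponym_of(word, ancestor):
    """True if `ancestor` lies somewhere above `word` in the taxonomy."""
    return ancestor in hypernym_chain(word)[1:]

print(hypernym_chain("pigeon"))          # ['pigeon', 'bird', 'animal']
print(is_hyponym_of("crow", "animal"))   # True
print(is_hyponym_of("bird", "pigeon"))   # False
```

Because hyponymy is transitive, "crow" is a hyponym of "animal" even though the table only records the direct link from "crow" to "bird".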

Processes of NLP


  • Automated essay scoring (AES) – the use of specialized computer programs to assign grades to essays written in an educational setting. It is a method of educational assessment and an application of natural language processing. Its objective is to classify a large set of textual entities into a small number of discrete categories, corresponding to the possible grades—for example, the numbers 1 to 6. Therefore, it can be considered a problem of statistical classification.
  • Automatic summarization – process of reducing a text document with a computer program in order to create a summary that retains the most important points of the original document. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.
    • Types
    • Methods and techniques
      • Extraction-based summarization
      • Abstraction-based summarization
      • Maximum entropy-based summarization
      • Sentence extraction
      • Aided summarization
        • Human aided machine summarization (HAMS)
        • Machine aided human summarization (MAHS)
  • Coreference resolution – in order to derive the correct interpretation of text, or even to estimate the relative importance of various mentioned subjects, pronouns and other referring expressions need to be connected to the right individuals or objects. Given a sentence or larger chunk of text, coreference resolution determines which words ("mentions") refer to which objects ("entities") included in the text.
    • Anaphora resolution – concerned with matching up pronouns with the nouns or names that they refer to. The more general task of coreference resolution also includes identifying so-called "bridging relationships" involving referring expressions. For example, in a sentence such as "He entered John's house through the front door", "the front door" is a referring expression and the bridging relationship to be identified is the fact that the door being referred to is the front door of John's house (rather than of some other structure that might also be referred to).
  • Dialog system
  • Grammar checking – the act of verifying the grammatical correctness of written text, especially if this act is performed by a computer program.
  • Information retrieval
  • Machine translation (MT) – aims to automatically translate text from one human language to another. This is one of the most difficult problems, and is a member of a class of problems colloquially termed "AI-complete", i.e. requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) in order to solve properly.
  • Natural language programming – interpreting and compiling instructions communicated in natural language into computer instructions (machine code).
  • Natural language search
  • Optical character recognition (OCR) – given an image representing printed text, determine the corresponding text.
  • Question answering – given a human-language question, determine its answer. Typical questions have a specific right answer (such as "What is the capital of Canada?"), but sometimes open-ended questions are also considered (such as "What is the meaning of life?").
  • Sentiment analysis – extracts subjective information usually from a set of documents, often using online reviews to determine "polarity" about specific objects. It is especially useful for identifying trends of public opinion on social media, for the purpose of marketing.
  • Speech recognition – given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of text to speech and is one of the extremely difficult problems colloquially termed "AI-complete" (see above). In natural speech there are hardly any pauses between successive words, and thus speech segmentation is a necessary subtask of speech recognition (see below). Note also that in most spoken languages, the sounds representing successive letters blend into each other in a process termed coarticulation, so the conversion of the analog signal to discrete characters can be a very difficult process.
  • Speech synthesis (Text-to-speech)
  • Text-proofing
  • Text simplification – automated editing of a document to include fewer words, or use easier words, while retaining its underlying meaning and information.
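Extraction-based summarization, listed under automatic summarization above, builds a summary by selecting existing sentences rather than writing new ones. The sketch below is one minimal way to do that, assuming a simple word-frequency heuristic (only one of many possible scoring methods), naive regex sentence splitting, and a tiny hypothetical stop-word list; `summarize` and `STOP_WORDS` are illustrative names, not a library API.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
              "it", "on", "for", "was"}

def content_words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=1):
    """Keep the n highest-scoring sentences, returned in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average (not total) frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))

doc = ("Shares of the bank fell sharply on Monday. "
       "The bank reported lower profits, and shares of rival banks also fell. "
       "Analysts had lunch.")
print(summarize(doc, 1))  # Shares of the bank fell sharply on Monday.
```

The first sentence wins because its content words ("shares", "bank", "fell") recur across the document, while the off-topic last sentence scores lowest.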

Component processes

Component processes of natural language understanding

  • Automatic document classification (text categorization)
  • Compound term processing – category of techniques that identify compound terms and match them to their definitions. Compound terms are built by combining two (or more) simple terms, for example "triple" is a single word term but "triple heart bypass" is a compound term.
  • Automatic taxonomy induction
  • Corpus processing
  • Deep linguistic processing
  • Discourse analysis – includes a number of related tasks. One task is identifying the discourse structure of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g. yes-no questions, content questions, statements, assertions, orders, suggestions, etc.).
  • Information extraction
    • Text mining – process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning.
      • Biomedical text mining – (also known as BioNLP), this is text mining applied to texts and literature of the biomedical and molecular biology domain. It is a rather recent research field drawing elements from natural language processing, bioinformatics, medical informatics and computational linguistics. There is an increasing interest in text mining and information extraction strategies applied to the biomedical and molecular biology literature due to the increasing number of electronically available publications stored in databases such as PubMed.
      • Decision tree learning
      • Sentence extraction
    • Terminology extraction
  • Latent semantic indexing
  • Lemmatisation
  • Morphological segmentation – separates words into individual morphemes and identifies the class of the morphemes. The difficulty of this task depends greatly on the complexity of the morphology (i.e. the structure of words) of the language being considered. English has fairly simple morphology, especially inflectional morphology, and thus it is often possible to ignore this task entirely and simply model all possible forms of a word (e.g. "open, opens, opened, opening") as separate words. In languages such as Turkish, however, such an approach is not possible, as each dictionary entry has thousands of possible word forms.
  • Named entity recognition (NER) – given a stream of text, determines which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g. person, location, organization). Although capitalization can aid in recognizing named entities in languages such as English, this information cannot aid in determining the type of named entity, and in any case is often inaccurate or insufficient. For example, the first word of a sentence is also capitalized, and named entities often span several words, only some of which are capitalized. Furthermore, many other languages in non-Western scripts (e.g. Chinese or Arabic) do not have any capitalization at all, and even languages with capitalization may not consistently use it to distinguish names. For example, German capitalizes all nouns, regardless of whether they refer to names, and French and Spanish do not capitalize names that serve as adjectives.
  • Parsing – determines the parse tree (grammatical analysis) of a given sentence. The grammar for natural languages is ambiguous and typical sentences have multiple possible analyses. In fact, perhaps surprisingly, for a typical sentence there may be thousands of potential parses (most of which will seem completely nonsensical to a human).
  • Part-of-speech tagging – given a sentence, determines the part of speech for each word. Many words, especially common ones, can serve as multiple parts of speech. For example, "book" can be a noun ("the book on the table") or verb ("to book a flight"); "set" can be a noun, verb or adjective; and "out" can be any of at least five different parts of speech. Some languages have more such ambiguity than others. Languages with little inflectional morphology, such as English, are particularly prone to such ambiguity. Chinese is also prone to it because it is a tonal language, and the tonal distinctions made in speech are not readily conveyed by its writing system.
  • Query expansion
  • Relationship extraction – given a chunk of text, identifies the relationships among named entities (e.g. who is the wife of whom).
  • Sentence breaking (also known as sentence boundary disambiguation and sentence detection) – given a chunk of text, finds the sentence boundaries. Sentence boundaries are often marked by periods or other punctuation marks, but these same characters can serve other purposes (e.g. marking abbreviations).
  • Speech segmentation – given a sound clip of a person or people speaking, separates it into words. A subtask of speech recognition and typically grouped with it.
  • Stemming
  • Text chunking
  • Tokenization
  • Topic segmentation and recognition – given a chunk of text, separates it into segments each of which is devoted to a topic, and identifies the topic of the segment.
  • Truecasing
  • Word segmentation – separates a chunk of continuous text into separate words. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages like Chinese, Japanese and Thai do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the vocabulary and morphology of words in the language.
  • Word sense disambiguation – because many words have more than one meaning, word sense disambiguation is used to select the meaning which makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or from an online resource such as WordNet.
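Word segmentation for scripts that do not mark word boundaries, described above, depends on knowing the vocabulary. One simple dictionary-based approach, chosen here for illustration and not the only or best method, is greedy maximum matching: at each position, take the longest dictionary word that fits. The sketch applies it to unspaced English so the output is easy to read; the vocabulary and the function name `max_match` are hypothetical.

```python
def max_match(text, vocabulary):
    """Greedy left-to-right maximum matching word segmentation."""
    max_len = max(len(w) for w in vocabulary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocabulary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words

vocab = {"the", "table", "tab", "le", "down", "there"}
print(max_match("thetabledownthere", vocab))  # ['the', 'table', 'down', 'there']
```

Greedy matching can choose wrongly: with the vocabulary {"the", "them", "man", "an", "ran"}, `max_match("themanran", ...)` returns ['them', 'an', 'ran'] instead of ['the', 'man', 'ran'], which is why practical segmenters typically add statistics or search over alternative segmentations.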

Component processes of natural language generation

History of natural language processing


  • History of machine translation
  • History of automated essay scoring
  • History of natural language user interface
  • History of natural language understanding
  • History of optical character recognition
  • History of question answering
  • History of speech synthesis
  • Turing test – test of a machine's ability to exhibit intelligent behavior, equivalent to or indistinguishable from, that of an actual human. In the original illustrative example, a human judge engages in a natural language conversation with a human and a machine designed to generate performance indistinguishable from that of a human being. All participants are separated from one another. If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. The test was introduced by Alan Turing in his 1950 paper "Computing Machinery and Intelligence," which opens with the words: "I propose to consider the question, 'Can machines think?'"
  • Universal grammar – theory in linguistics, usually credited to Noam Chomsky, proposing that the ability to learn grammar is hard-wired into the brain.[8] The theory suggests that linguistic ability manifests itself without being taught (see poverty of the stimulus), and that there are properties that all natural human languages share. It is a matter of observation and experimentation to determine precisely what abilities are innate and what properties are shared by all languages.
  • ALPAC – committee of seven scientists led by John R. Pierce, established in 1964 by the U.S. Government in order to evaluate the progress in computational linguistics in general and machine translation in particular. Its report, issued in 1966, gained notoriety for being very skeptical of the machine translation research done so far and for emphasizing the need for basic research in computational linguistics; this eventually caused the U.S. Government to reduce its funding of the topic dramatically.
  • Conceptual dependency theory – a model of natural language understanding used in artificial intelligence systems. Roger Schank at Stanford University introduced the model in 1969, in the early days of artificial intelligence.[9] This model was extensively used by Schank's students at Yale University such as Robert Wilensky, Wendy Lehnert, and Janet Kolodner.
  • Augmented transition network – type of graph theoretic structure used in the operational definition of formal languages, used especially in parsing relatively complex natural languages, and having wide application in artificial intelligence. Introduced by William A. Woods in 1970.
  • Distributed Language Translation (project)

Timeline of NLP software

| Software | Year | Creator | Description | Reference |
|---|---|---|---|---|
| Georgetown experiment | 1954 | IBM | Involved fully automatic translation of more than sixty Russian sentences into English. | |
| STUDENT | 1964 | Daniel Bobrow | Could solve high school algebra word problems. | [10] |
| ELIZA | 1964 | Joseph Weizenbaum | A simulation of a Rogerian psychotherapist, rephrasing the user's statements with a few grammar rules. | [11] |
| SHRDLU | 1970 | Terry Winograd | A natural language system working in restricted "blocks worlds" with restricted vocabularies; worked extremely well. | |
| PARRY | 1972 | Kenneth Colby | A chatterbot. | |
| KL-ONE | 1974 | Sondheimer et al. | A knowledge representation system in the tradition of semantic networks and frames; it is a frame language. | |
| MARGIE | 1975 | Roger Schank | | |
| TaleSpin | 1976 | Meehan | | |
| QUALM | | Lehnert | | |
| LIFER/LADDER | 1978 | Hendrix | A natural language interface to a database of information about US Navy ships. | |
| SAM | 1978 | Cullingford | | |
| PAM | 1978 | Robert Wilensky | | |
| Politics | 1979 | Carbonell | | |
| Plot Units | 1981 | Lehnert | | |
| Jabberwacky | 1982 | Rollo Carpenter | Chatterbot with the stated aim to "simulate natural human chat in an interesting, entertaining and humorous manner". | |
| MUMBLE | 1982 | McDonald | | |
| Racter | 1983 | William Chamberlain and Thomas Etter | Chatterbot that generated English language prose at random. | |
| MOPTRANS | 1984 | Lytinen | | |
| KODIAK | 1986 | Wilensky | | |
| Absity | 1987 | Hirst | | |
| Watson | 2006 | IBM | A question answering system that won the Jeopardy! contest, defeating the best human players in February 2011. | |

General natural language processing concepts

Natural language processing tools

  • Google Ngram Viewer – graphs n-gram usage from a corpus of more than 5.2 million books
    • Google Ngram datasets – can be used for looking up the frequency of an n-gram, for comparing the frequency of n-grams, etc.
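Looking up and comparing n-gram frequencies, as the Ngram datasets allow, comes down to counting contiguous word sequences and normalizing the counts. The sketch below only illustrates that idea on a toy token list; it assumes whitespace tokenization, uses hypothetical function names, and does not reproduce the Ngram Viewer's own data or normalization.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequency(tokens, n):
    """Relative frequency of each n-gram among all n-grams of the same length."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

tokens = "to be or not to be".split()
bigrams = ngram_frequency(tokens, 2)
print(bigrams[("to", "be")])  # 0.4
```

The bigram "to be" occurs twice among the five bigrams of the phrase, giving a relative frequency of 0.4; comparing two n-grams is then a matter of comparing such values.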


Natural language processing toolkits

The following natural language processing toolkits are popular collections of natural language processing software. They are suites of libraries, frameworks, and applications for symbolic, statistical natural language and speech processing.

| Name | Language | License | Creators | Website |
|---|---|---|---|---|
| Apertium | C++, Java | GPL | (various) | [1] |
| DELPH-IN | LISP, C++ | LGPL, MIT, ... | Deep Linguistic Processing with HPSG Initiative | [2] |
| Distinguo | C++ | Commercial | Ultralingua Inc. | [3] |
| General Architecture for Text Engineering (GATE) | Java | LGPL | GATE open source community | [4] |
| Gensim | Python | LGPL | Radim Řehůřek | [5] |
| Learning Based Java | Java | BSD | Cognitive Computation Group at the University of Illinois | [6] |
| LingPipe | Java | Free for research | Alias-I, Inc., NY, USA | [7] |
| LinguaStream | Java | Free for research | University of Caen, France | [8] |
| Mallet | Java | Common Public License | University of Massachusetts Amherst | [9] |
| Modular Audio Recognition Framework | Java | BSD | The MARF Research and Development Group, Concordia University | [10] |
| MontyLingua | Python, Java | Free for research | MIT | [11] |
| Mwetoolkit | Python | Free | Carlos Ramisch | [12] |
| Natural Language Toolkit (NLTK) | Python | Apache 2.0 | | [13] |
| NooJ (based on INTEX) | .NET Framework-based | Free for research | University of Franche-Comté, France | [14] |
| Apache OpenNLP | Java | Apache License 2.0 | Online community | [15] |
| UIMA | Java / C++ | Apache 2.0 | Apache | [16] |

Named entity recognizers

  • ABNER (A Biomedical Named Entity Recognizer) – open source text mining program that uses linear-chain conditional random fields. It automatically tags genes, proteins and other entity names in text. Written by Burr Settles of the University of Wisconsin-Madison.

Translation software

Other software

  • cTAKES – open-source natural language processing system for information extraction from electronic medical record clinical free-text. It processes clinical notes, identifying types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, context (family history of, current, unrelated to patient), and negated/not negated. Also known as Apache cTAKES.
  • DMAP
  • ETAP-3 – proprietary linguistic processing system focusing on English and Russian.[12] It is a rule-based system which uses the Meaning-Text Theory as its theoretical foundation.
  • Iris – personal assistant application for Android. The application uses natural language processing to answer questions based on users' voice requests.
  • JAPE – the Java Annotation Patterns Engine, a component of the open-source General Architecture for Text Engineering (GATE) platform. JAPE is a finite state transducer that operates over annotations based on regular expressions.
  • LOLITA – "Large-scale, Object-based, Linguistic Interactor, Translator and Analyzer". LOLITA was developed by Roberto Garigliano and colleagues between 1986 and 2000. It was designed as a general-purpose tool for processing unrestricted text that could be the basis of a wide variety of applications. At its core was a semantic network containing some 90,000 interlinked concepts.
  • Maluuba – intelligent personal assistant for Android devices that uses a contextual approach to search, taking into account the user's geographic location, contacts, and language.
  • METAL MT – machine translation system developed in the 1980s at the University of Texas and at Siemens which ran on Lisp Machines.
  • Never-Ending Language Learning (NELL) – semantic machine learning system developed by a research team at Carnegie Mellon University, and supported by grants from DARPA, Google, and the NSF, with portions of the system running on a supercomputing cluster provided by Yahoo!.[13] NELL was programmed by its developers to be able to identify a basic set of fundamental semantic relationships between a few hundred predefined categories of data, such as cities, companies, emotions and sports teams. Since the beginning of 2010, the Carnegie Mellon research team has been running NELL around the clock, sifting through hundreds of millions of web pages looking for connections between the information it already knows and what it finds through its search process, to make new connections in a manner that is intended to mimic the way humans learn new information.[14]

Chatterbots

Chatterbot – text-based conversation agent that can interact with human users through some medium, such as an instant messaging service. Some chatterbots are designed for specific purposes, while others converse with human users on a wide range of topics.

Classic chatterbots

General chatterbots

Instant messenger chatterbots

Natural language processing organizations

Natural language processing-related conferences

Companies involved in natural language processing

  • Google, Inc. – the Google search engine is an example of automatic summarization, utilizing keyphrase extraction.
  • NetBase Solutions, Inc. – developer of natural language processing technology.
  • Calais (Reuters product) – provider of natural language processing services.
  • AlchemyAPI – service provider of a natural language processing API.

Natural language processing publications


  • Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition – by Daniel Jurafsky and James H. Martin.[18] The first book to cover language technology thoroughly.

Book series


Persons influential in natural language processing

  • Daniel Bobrow
  • Rollo Carpenter – creator of Jabberwacky and Cleverbot.
  • Noam Chomsky – author of the seminal work Syntactic Structures, which revolutionized linguistics with 'universal grammar', a rule-based system of syntactic structures.[19]
  • Kenneth Colby
  • David Ferrucci – principal investigator of the team that created Watson, IBM's AI computer that won the quiz show Jeopardy!
  • Daniel Jurafsky – Professor of Linguistics and Computer Science at Stanford University. With James Martin, he wrote the textbook Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
  • Roger Schank – introduced the conceptual dependency theory for natural language understanding.[20]
  • Alan Turing – originator of the Turing Test.
  • Joseph Weizenbaum – author of the ELIZA chatterbot.
  • Terry Winograd – professor of computer science at Stanford University and co-director of the Stanford Human-Computer Interaction Group. He is known in the fields of philosophy of mind and artificial intelligence for his work on natural language using the SHRDLU program.
  • William Aaron Woods
  • Maurice Gross – originator of the concept of local grammar,[21] taking finite automata as the competence model of language.[22] Local grammars consisting of finite automata, coupled with morpho-syntactic dictionaries, support automatic text analysis[21][23] in the Intex software (now NooJ), developed by Max Silberztein, and in Unitex/GramLab, developed by the Gaspard-Monge Computer Science Laboratory (LIGM).

See also

References

  1. ^ "... modern science is a discovery as well as an invention. It was a discovery that nature generally acts regularly enough to be described by laws and even by mathematics; and required invention to devise the techniques, abstractions, apparatus, and organization for exhibiting the regularities and securing their law-like descriptions." Heilbron, J. L., ed. (2003). The Oxford Companion to the History of Modern Science. New York: Oxford University Press. p. vii. ISBN 0-19-511229-6
    • "science". Merriam-Webster Online Dictionary.  
  2. ^  
  3. ^ ACM (2006). "Computing Degrees & Careers". ACM. Retrieved 2010-11-23. 
  4. ^ Laplante, Phillip (2007). What Every Engineer Should Know about Software Engineering. Boca Raton: CRC.  
  5. ^ "Input device". Computer Hope.
  6. ^ McQuail, Denis (2005). McQuail's Mass Communication Theory (5th ed.). London: SAGE Publications.
  7. ^ Duan, Yucong; Cruz, Christophe (2011). "Formalizing Semantic of Natural Language through Conceptualization from Existence". International Journal of Innovation, Management and Technology 2 (1), pp. 37–42.
  8. ^ McGill University, Tool Module: Chomsky’s Universal Grammar
  9. ^ Schank, Roger (1969). "A conceptual dependency parser for natural language". Proceedings of the 1969 Conference on Computational Linguistics, Sång-Säby, Sweden, pp. 1–3.
  10. ^ McCorduck 2004, p. 286; Crevier 1993, pp. 76–79; Russell & Norvig 2003, p. 19
  11. ^ McCorduck 2004, pp. 291–296; Crevier 1993, pp. 134–139
  13. ^ "Aiming to Learn as We Do, a Machine Teaches Itself".  
  14. ^ Project Overview, Carnegie Mellon University. Accessed October 5, 2010.
  15. ^ Gibes, Al (2002-03-25). "Circle of buddies grows ever wider". Las Vegas Review-Journal (Nevada). 
  16. ^ "ActiveBuddy Introduces Software to Create and Deploy Interactive Agents for Text Messaging; ActiveBuddy Developer Site Now Open:". Business Wire. 2002-07-15. Retrieved 2014-01-16. 
  17. ^ Lenzo, Kevin (Summer 1998). "Infobots and Purl". The Perl Journal 3 (2). Retrieved 2010-07-26. 
  18. ^ Jurafsky, Daniel; Martin, James H. (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd ed.). Upper Saddle River, N.J.: Prentice Hall. p. 2.
  19. ^ "SEM1A5 - Part 1 - A brief history of NLP". Retrieved 2010-06-25. 
  20. ^ Schank, Roger (1969). "A conceptual dependency parser for natural language". Proceedings of the 1969 Conference on Computational Linguistics, Sång-Säby, Sweden, pp. 1–3.
  21. ^ a b Ibrahim, Amr Helmy (2002). "Maurice Gross (1934–2001). À la mémoire de Maurice Gross". Hermès 34.
  22. ^ Dougherty, Ray (2001). Maurice Gross Memorial Letter.
  23. ^ Lamiroy, Béatrice (2003). « In memoriam Maurice Gross ». Travaux de linguistique 46:1, pp. 145–158.
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.