NLP Resources

NLP systems often require access to large amounts of multilingual training data, lexicons, ontologies, knowledge bases, etc. On this page we've made a start at listing useful resources, many developed here.

Associated Publications

LiLT

Linguistic Issues in Language Technology (LiLT) is a new open-access journal that focusses on relationships between linguistic insights, which can prove valuable to language technology, and language technology, which can enrich linguistic research. The Editorial Board of LiLT believes that, in conjunction with machine learning and statistical techniques, deeper and more sophisticated models of language and speech are needed to make significant progress in newly emerging areas of computational language analysis. LiLT provides a forum for such work. LiLT takes an eclectic view on methodology.

ACL Anthology

The ACL Anthology contains open-access links to all of the ACL-related journals and conferences for our field, including the Computational Linguistics journal, Transactions of the ACL, and all of the ACL- and LRE-related conferences. Several other relevant publications are hosted there as well.

NLP Corpora

Sketch Engine

Sketch Engine is corpus management and text analysis software that was developed by Lexical Computing Limited in 2003 and is regularly updated. Its purpose is to enable people studying language behavior to search large text collections using complex, linguistically motivated queries.

LDC-Corpora

Colorado has been a member of the Linguistic Data Consortium for decades, and we have accumulated a fair amount of data from that source. Here are links to frequently requested datasets that are available on a local server here at CU. You will need a 'verbs' account to access this data. To access the discs in the LDC library, contact Ghazaleh Kazeminejad.

Computational Lexical Resources

PropBank

The Proposition Bank (PropBank) is, first, a valency lexicon consisting of sense-specific argument structures for well over 6000 verb lemmas. Second, it comprises millions of words of annotated text data that associate those predicate-argument structures with the syntactic trees of sentences in context (Kingsbury & Palmer, 2002; Palmer et al., 2005; Palmer et al., 2010).

VerbNet

VerbNet (VN) is a large, hierarchical, domain-independent, broad-coverage verb lexicon intended for use in Natural Language Processing applications (Dang et al., 1998; Kipper et al., 2000; Kipper Schuler, 2006). It groups semantically similar verbs into classes and, for every class, provides syntactic realizations, thematic roles, and semantic representations expressed as pre-conditions and post-conditions in first-order logic.

Unified Verb Index

There are several computational lexical resources that are frequently incorporated into Natural Language Processing systems. Several of these are hosted here at CU. More coarse-grained groupings of WordNet verb senses were also developed at CU. The Unified Verb Index web site facilitates searching through all of these resources for individual lexical items.

CompSem Wiki

This wiki page has information about the Computational Semantics lab meetings, which are held every Wednesday at 10:30 am in Fleming 279.
