- English Gigaword corpora: Seven distinct international sources of English newswire.
- (1010 files, 26348 MB uncompressed, 9325 MB gzipped, 4032686 K-words, 9876086 documents)
- Movie-DiC: a Movie Dialogue Corpus for Research and Development
- 619 movie scripts parsed from IMSDb
- MPQA Lexicon, OpinionFinder by Wilson et al., 2005(a,b)
- Semantic Orientation Lexicon by Takamura et al., 2005
- JEmAS – Jena Emotion Analysis System
- JEmAS is an open-source command-line tool for measuring the emotional content of a textual document of arbitrary length. It employs a simple bag-of-words, lexicon-based approach. It follows the psychological Valence-Arousal-Dominance model of emotion, so an emotion is represented as a three-dimensional vector of numerical values. The elements of this emotion vector refer to Valence (the degree of pleasantness or unpleasantness of an emotion), Arousal (the degree of calmness or excitement), and Dominance (the degree of perceived control, ranging from submissive to dominant).
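The bag-of-words, lexicon-based idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon entries and their scores are invented, `document_emotion` is a hypothetical helper, and averaging the scores of matched words is one plausible reading of the approach. JEmAS's actual preprocessing, lexicon, and scoring may differ.

```python
import re
from statistics import mean

# Toy VAD lexicon. The words and scores are invented for illustration
# and are NOT taken from JEmAS or any published lexicon.
TOY_LEXICON = {
    "happy": (8.0, 6.0, 7.0),
    "calm": (7.0, 2.0, 6.0),
    "afraid": (2.0, 7.0, 3.0),
}


def document_emotion(text, lexicon=TOY_LEXICON):
    """Average the (valence, arousal, dominance) scores of lexicon words in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # no lexicon word occurs in the document
    # zip(*hits) regroups the scores per dimension: all V, all A, all D
    return tuple(mean(dimension) for dimension in zip(*hits))


print(document_emotion("She felt happy and calm, not afraid."))
```

Word order and negation are ignored here ("not afraid" still counts "afraid"), which is the usual trade-off of a pure bag-of-words method.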
- Bechdel Test (popular in cognitive psychology and sentiment & opinion analysis)
- The Bechdel test (/ˈbɛkdəl/ bek-dəl) asks whether a work of fiction features at least two women or girls who talk to each other about something other than a man or boy. The requirement that the two women or girls must be named is sometimes added. About half of all films meet these requirements, according to user-edited databases and the media industry press. The test is used as an indicator for the active presence of women in films and other fiction, and to call attention to gender inequality in fiction.
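Once a script has been annotated, the three criteria above reduce to a simple check. The sketch below assumes a hypothetical `Conversation` record in which speaker gender and the "about a man" flag are already provided. Producing those annotations automatically is the hard part and is not attempted here.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical annotation format: (name, gender) pairs for the distinct
    # speakers in one exchange; name is None for an unnamed character.
    speakers: list
    about_man: bool  # True if the exchange is about a man or boy


def passes_bechdel(conversations, require_named=False):
    """Return True if any conversation has two women talking about something other than a man."""
    for conv in conversations:
        women = [
            name
            for name, gender in conv.speakers
            if gender == "F" and (name is not None or not require_named)
        ]
        if len(women) >= 2 and not conv.about_man:
            return True
    return False


script = [
    Conversation([("Ann", "F"), ("Bob", "M")], about_man=False),
    Conversation([("Ann", "F"), ("Eve", "F")], about_man=True),
    Conversation([("Ann", "F"), (None, "F")], about_man=False),
]
print(passes_bechdel(script))
print(passes_bechdel(script, require_named=True))
```

The `require_named` flag corresponds to the optional stricter variant in which both women must be named.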
- Speech Act Verbs by Wierzbicka, 1987
- BookNLP Pipeline: BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including:
  - Part-of-speech tagging (Stanford)
  - Dependency parsing (MaltParser)
  - Named entity recognition (Stanford)
  - Character name clustering (e.g., “Tom”, “Tom Sawyer”, “Mr. Sawyer”, “Thomas Sawyer” -> TOM_SAWYER)
  - Quotation speaker identification
  - Pronominal coreference resolution
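To make the character name clustering step concrete, here is a deliberately naive sketch that groups mentions sharing any name token once titles are stripped. It is not BookNLP's algorithm. The `TITLES` set, the helper names, and the rule for choosing the cluster label are all assumptions made for illustration.

```python
TITLES = {"mr", "mrs", "ms", "miss", "dr"}  # small illustrative set


def name_tokens(mention):
    """Lowercase the mention, drop trailing periods, and remove titles."""
    tokens = [t.strip(".").lower() for t in mention.split()]
    return [t for t in tokens if t not in TITLES]


def cluster_names(mentions):
    """Group mentions that share at least one name token; return {label: mentions}."""
    clusters = []  # list of (token_set, mention_list) pairs
    for mention in mentions:
        tokens = set(name_tokens(mention))
        if not tokens:
            continue  # nothing left after removing titles
        matching = [c for c in clusters if c[0] & tokens]
        rest = [c for c in clusters if not (c[0] & tokens)]
        merged_tokens = tokens.union(*(c[0] for c in matching))
        merged_mentions = [m for c in matching for m in c[1]] + [mention]
        clusters = rest + [(merged_tokens, merged_mentions)]

    result = {}
    for _, members in clusters:
        # Label with the first mention that has the most name tokens.
        canonical = max(members, key=lambda m: len(name_tokens(m)))
        result["_".join(name_tokens(canonical)).upper()] = members
    return result


print(cluster_names(["Tom", "Tom Sawyer", "Mr. Sawyer", "Thomas Sawyer", "Huck Finn"]))
```

Token overlap alone over-merges: two different characters who share a surname would land in the same cluster, so a real system needs additional evidence such as gender, titles, or context.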
- SEMAFOR: SEMAFOR automatically processes English sentences according to the form of semantic analysis in Berkeley FrameNet.