Content Coding
Linguistic Inquiry and Word Count (LIWC) [link]
LIWC is the gold standard in psychological text analysis software. It codes language samples on ~80 psychological dimensions.
RIOTLite / RIOTScan [link]
Open-source content coding software. RIOTLite works much the same way
as LIWC, but it does not come with the LIWC dictionary. It offers a bit
more flexibility in how it handles phrases and wildcards.
coding system that allows you to add weights to your words. Can be used
to code words or specific characters for whatever word properties are
of interest to you. Can also be used to score texts using pre-trained
word vector models (e.g., GloVe, word2vec).
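As a rough illustration of weighted content coding, the sketch below scores a text against a small weighted dictionary. The dictionary, its weights, and the normalization by token count are all assumptions made for the example; they are not taken from the software described above.

```python
import re

# Hypothetical weighted dictionary: each word maps to an illustrative weight.
WEIGHTS = {"happy": 1.0, "glad": 0.5, "sad": -1.0}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def weighted_score(text, weights):
    """Sum the weights of matching tokens and divide by the total token count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(weights.get(tok, 0.0) for tok in tokens) / len(tokens)


score = weighted_score("I was happy, then sad, then happy again.", WEIGHTS)
print(score)
```

Dividing by the token count keeps scores comparable across texts of different lengths; a raw sum would be an equally reasonable choice for some analyses.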
Under development. More details coming soon...
Topic Modeling / Data-driven Text Analysis
Meaning Extraction Helper [link]
An entire system for conducting bottom-up, data-driven text analyses. MEH
takes your input texts and provides frequency lists for all of your
words/phrases, extracts n-grams, and builds a document-by-word matrix
dataset for topic modeling and other types of analyses.
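To make the idea of a document-by-word matrix concrete, here is a minimal sketch in plain Python. The function names and the simple regex tokenizer are assumptions for illustration only; MEH's own tokenization and output format may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-token phrases, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_term_matrix(docs, n=1):
    """Build a sorted vocabulary and one row of n-gram counts per document."""
    counts = [Counter(ngrams(tokenize(doc), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["The cat sat.", "The cat ran and the dog sat."]
vocab, matrix = document_term_matrix(docs)
bigram_vocab, bigram_matrix = document_term_matrix(docs, n=2)
```

Each row of `matrix` corresponds to one document and each column to one entry of `vocab`, which is the shape most topic modeling tools expect as input.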
VADER-Tots [link]
Sentiment analysis based on Hutto & Gilbert's (2014) VADER system. Best used for sentiment analysis of Twitter data.
Senti-Gent [link]
Sentiment analysis using Stanford's CoreNLP framework.
Part of Speech Tagging
POSTModern [link]
Part-of-speech tagger built around Stanford's CoreNLP framework. Comes with several pre-trained models, including English, Spanish, French, German, Swedish, Chinese, and Arabic. Also comes with the GATE pre-trained model for English Twitter data.
KoToken [link]
Preprocesses your Korean texts by tokenizing them.
ZhToken [link]
Preprocesses your Chinese texts by tokenizing them.
Text Manipulation / Extraction
words and their immediate context. For example, if you want to see how
people are using the word "pain", you can "contextualize" it by
extracting all words that appear in close proximity to the word "pain".
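The "pain" example can be sketched as a simple keyword-in-context extraction. The function name, the window width, and the tokenizer are assumptions for illustration, not the tool's actual behavior.

```python
import re


def context_windows(text, target, width=3):
    """Return (left, right) token lists surrounding each occurrence of target."""
    tokens = re.findall(r"[a-z']+", text.lower())
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            windows.append((left, right))
    return windows


text = "The pain in my back got worse. No pain today."
# Note: this simple version ignores sentence boundaries.
windows = context_windows(text, "pain", width=2)
for left, right in windows:
    print(" ".join(left), "[pain]", " ".join(right))
```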
ConverSplitter Plus! [link]
Separates the contents of transcripts into separate files, by speaker.
Repeatalizer [link]
Measures the repetition within a text in a "rolling word window" fashion.
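Splitting a transcript by speaker might look something like the sketch below. It assumes each turn starts with a `Speaker: utterance` line, which is a hypothetical format chosen for the example; real transcripts vary, and the tool's supported formats are not described here.

```python
import re
from collections import defaultdict

# Assumed turn format: "Speaker: utterance". A continuation line that
# happens to contain a colon would be misread as a new speaker.
SPEAKER_LINE = re.compile(r"^\s*([^:]+):\s*(.*)$")


def split_by_speaker(transcript):
    """Group utterances by speaker, attaching unlabeled lines to the last speaker."""
    turns = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            current = match.group(1).strip()
            turns[current].append(match.group(2))
        elif current is not None and line.strip():
            turns[current].append(line.strip())
    return dict(turns)


transcript = "Alice: Hi there.\nBob: Hello!\nAlice: How are you?\nstill talking"
by_speaker = split_by_speaker(transcript)
```

Writing each speaker's list to its own file is then a short loop over `by_speaker.items()`.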
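One plausible reading of a "rolling word window" repetition measure is sketched below: for each window of consecutive words, compute the share of tokens that repeat an earlier word in the same window. This formula is an assumption for illustration and is not necessarily the one Repeatalizer uses.

```python
def rolling_repetition(tokens, window=5):
    """Return one repetition score (0 = no repeats) per full window position."""
    if len(tokens) < window:
        return []
    scores = []
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        # Share of the window that is not a first occurrence of a word.
        scores.append(1 - len(set(chunk)) / window)
    return scores


tokens = "a b a b c d e f".split()
scores = rolling_repetition(tokens, window=4)
print(scores)
```

Averaging the per-window scores gives a single repetition value for the whole text.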
pre-trained word vectors, you can extract words with similar meanings.
Very useful for creating new text analysis / content coding
dictionaries. Can also be used in conjunction with the TAPA software
(mentioned above) to perform cosine similarity calculations between a
text and specific domains.
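The cosine similarity idea can be illustrated with a few toy vectors. The three-dimensional vectors below are made up for the example; real pre-trained models such as GloVe or word2vec use hundreds of dimensions and are loaded from disk.

```python
import math

# Toy word vectors, invented purely for illustration.
VECTORS = {
    "pain": [0.9, 0.1, 0.0],
    "ache": [0.8, 0.2, 0.1],
    "joy": [0.0, 0.9, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=2):
    """Rank every other word by cosine similarity to the given word."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


neighbors = most_similar("pain", VECTORS)
```

With these toy values, "ache" ranks above "joy" as a neighbor of "pain", which is the kind of result one would use to seed a new dictionary category.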
Data Preparation / Cleaning
Takes your text from a spreadsheet file (e.g., CSV) and aggregates it into separate .txt files.
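A minimal sketch of this kind of CSV-to-.txt aggregation is shown below. The column names `id` and `text`, the grouping by ID, and the one-file-per-ID layout are assumptions for the example rather than a description of the tool's options.

```python
import csv
import tempfile
from pathlib import Path


def csv_to_txt(csv_path, out_dir, id_col="id", text_col="text"):
    """Write one .txt file per unique ID, joining that ID's rows with newlines."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    grouped = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            grouped.setdefault(row[id_col], []).append(row[text_col])
    written = []
    for key, chunks in grouped.items():
        # Assumes IDs are safe to use as file names.
        path = out_dir / f"{key}.txt"
        path.write_text("\n".join(chunks), encoding="utf-8")
        written.append(path)
    return written


# Small demonstration using a temporary directory and made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "posts.csv"
    source.write_text("id,text\nu1,first post\nu2,hello\nu1,second post\n",
                      encoding="utf-8")
    files = csv_to_txt(source, Path(tmp) / "out")
    names = sorted(p.name for p in files)
    u1_text = (Path(tmp) / "out" / "u1.txt").read_text(encoding="utf-8")
```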
ExamineTXT [link]
Provides basic information about your text and text-based files, such as their size, encodings, and so on.
TranscodeTXT [link]
Converts text-based files from one encoding to another. Best used in conjunction with ExamineTXT.
TextEmend [link]
Regex-driven "find and replace" in text-based files. Useful for cleaning and replacing texts prior to processing with other software.
SlimCSV [link]
Strips columns out of a CSV file to result in smaller/more manageable datasets.
Transmogrifier [link]
Processes your texts through the Google Translate API.
Plug N Chug [link]
Recursive code generator. Useful for when you need to generate large batches of code with systematic variations.