Welcome to pyliwc
Overview
PyLIWC is a Python package that provides an interface for analyzing text with the LIWC (Linguistic Inquiry and Word Count) tool. It lets users drive the LIWC CLI from within Python, with features for processing various data formats, performing linguistic style matching, and analyzing narrative arcs in text data. It can handle folders, text files, or Pandas DataFrames.
Because the LIWC dictionary is proprietary, pyliwc requires that the latest version of the LIWC software be installed on your machine, with an activated licence (academic licence).
Manifest
The LIWC (Linguistic Inquiry and Word Count) software by James W. Pennebaker, Roger J. Booth, and Martha E. Francis has been instrumental for countless researchers in analyzing linguistic and psycholinguistic data. LIWC is the gold standard of dictionary-based approaches for analyzing word use. It can be used to study a single individual, groups of people over time, or all of social media. However, LIWC has traditionally been available as standalone software, which requires stepping outside the Python environment to use it.
Given the growing popularity of Python in the scientific and research community, researchers and data scientists need a way to integrate LIWC directly into their Python workflows. pyliwc therefore brings much of LIWC's functionality to a wider audience without the need to use the LIWC application's GUI. pyliwc is open-source, released under the MIT license, and is designed to enable researchers to perform sophisticated linguistic analysis directly in Python.
Features
The package offers a wide range of features, including:
- LIWC Text Analysis:
  - Analyze text data from various sources, including CSV files, directories, Pandas DataFrames, and individual strings.
  - Supports internal dictionaries (e.g., LIWC22, LIWC2015) as well as ad hoc dictionaries.
  - Output results directly in a convenient Pandas DataFrame for easy integration with other data processing tools.
- Linguistic Style Matching (LSM):
  - Perform person-level and group-level LSM analysis using a DataFrame to evaluate the alignment of linguistic styles in conversational data.
  - Supports pairwise LSM calculations for detailed analysis of interpersonal communication dynamics.
- Narrative Arc Analysis:
  - Analyze the narrative arc of text data to understand staging, plot progression, and cognitive tension.
  - Graphics capabilities are included, allowing users to visualize these three narrative structures over time.
  - Provides customizable scaling methods and segment options for precise control over the analysis process.
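To make the pairwise LSM idea above concrete, here is a minimal sketch of the commonly used LSM formula: for each function-word category, the score is 1 - |a - b| / (a + b + 0.0001), and the overall LSM is the mean across categories. This is not pyliwc's API. The function name `pairwise_lsm`, the three category columns, and the percentages are assumptions chosen for illustration only.

```python
import pandas as pd


def pairwise_lsm(scores_a: pd.Series, scores_b: pd.Series, eps: float = 1e-4) -> float:
    """Mean over categories of 1 - |a - b| / (a + b + eps).

    Hypothetical helper: inputs are LIWC percentages per function-word
    category for two speakers, indexed by the same category names.
    """
    per_category = 1 - (scores_a - scores_b).abs() / (scores_a + scores_b + eps)
    return float(per_category.mean())


# Made-up LIWC percentages for two speakers (illustrative values only).
scores = pd.DataFrame(
    {"ppron": [10.0, 8.0], "article": [6.0, 6.0], "prep": [12.0, 15.0]},
    index=["speaker_a", "speaker_b"],
)

lsm = pairwise_lsm(scores.loc["speaker_a"], scores.loc["speaker_b"])
print(f"LSM between speaker_a and speaker_b: {lsm:.3f}")
```

Values close to 1 indicate closely matched function-word use. A full analysis would typically use the complete set of function-word categories rather than the three shown here.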
Integration with LIWC CLI: Execute LIWC commands and capture their output for further processing, leveraging LIWC’s linguistic analysis capabilities. Multithreading is supported to speed up analysis of large datasets.
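The following sketch shows the general pattern of wrapping a command-line tool from Python with `subprocess`. It is not pyliwc's implementation. The executable name `LIWC-22-cli` and the `--mode`, `--input`, and `--output` flags are assumptions about the LIWC-22 CLI and should be checked against the help output of your installed version. The helper names `build_liwc_command` and `run_liwc` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_liwc_command(
    input_path: str,
    output_path: str,
    mode: str = "wc",
    executable: str = "LIWC-22-cli",  # assumed executable name
) -> List[str]:
    """Assemble the argument list for one CLI call (flags are assumptions)."""
    return [executable, "--mode", mode, "--input", input_path, "--output", output_path]


def run_liwc(input_path: str, output_path: str) -> Optional[subprocess.CompletedProcess]:
    """Run the CLI and capture stdout/stderr; return None if it is not on PATH."""
    cmd = build_liwc_command(input_path, output_path)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Build (but do not run) a command for a folder of text files.
command = build_liwc_command("texts/", "liwc_results.csv")
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.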
Output Options: Flexible output formats, including CSV, JSON, and direct integration with Pandas DataFrames, ensuring compatibility with a wide range of data analysis workflows.
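Once results are in a Pandas DataFrame, the CSV and JSON formats mentioned above are available through standard pandas methods. The sketch below uses only pandas itself, not pyliwc. The column names and values are made up to resemble LIWC output.

```python
import io
import json

import pandas as pd

# Made-up results table standing in for LIWC output (illustrative values only).
results = pd.DataFrame(
    {"Segment": [1, 2], "WC": [120, 98], "Analytic": [55.2, 61.7]}
)

# Serialize to CSV and JSON text; pass a file path instead to write to disk.
csv_text = results.to_csv(index=False)
json_text = results.to_json(orient="records")

# Read both back to confirm that the round trip preserves the table.
round_trip = pd.read_csv(io.StringIO(csv_text))
records = json.loads(json_text)
print(round_trip)
```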
Missing Features: In the current version, the following features have not yet been ported: Word Frequencies, Meaning Extraction, and Contextualizer.