# liwc = Liwc('LIWC-22-cli.exe', verbose=True)
# s = "As Leclerc entered the Invalides, with his cortege of exaltation in the sun of Africa and the battles of Alsace, enter here, Jean Moulin, with your terrible cortege."
# r = liwc.analyze_string_to_json(s)
API reference
The LIWC module provides a Python interface to the Linguistic Inquiry and Word Count (LIWC) tool, allowing users to perform text analysis.
Initialization
Liwc
Liwc (liwc_cli_path:str='LIWC-22-cli', threads:Optional[int]=None, verbose:bool=False)
Initialize the Liwc class.
| | Type | Default | Details |
|---|---|---|---|
| liwc_cli_path | str | LIWC-22-cli | Path to the LIWC CLI executable. |
| threads | Optional[int] | None | Number of threads to use. Defaults to the number of CPU cores minus one. |
| verbose | bool | False | Whether to print output and show a progress bar. Defaults to False. |
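The `threads` default is described as the number of CPU cores minus one. The sketch below shows one way such a default could be resolved. `resolve_threads` is a hypothetical helper, not part of the module, and the floor of one thread on single-core machines is an assumption rather than documented behavior.

```python
import os
from typing import Optional


def resolve_threads(threads: Optional[int] = None) -> int:
    """Return an explicit thread count, or CPU cores minus one when unset."""
    if threads is not None:
        return threads
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - 1)     # assumption: never drop below one thread


print(resolve_threads())   # machine-dependent
print(resolve_threads(4))  # 4
```

Passing `threads` explicitly is the safer choice when the machine is shared or the analysis runs inside a container with a CPU limit.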
LIWC Analysis
Liwc.analyze_string_to_json
Liwc.analyze_string_to_json (input_string:str, liwc_dict:str='LIWC22')
Analyze a single string and return the result as JSON.
| | Type | Default | Details |
|---|---|---|---|
| input_string | str | | The string to analyze. |
| liwc_dict | str | LIWC22 | Dictionary to use for analysis. Defaults to “LIWC22”. |
| Returns | dict | | Analysis results in JSON format. |
Liwc.analyze_string
Liwc.analyze_string (input_string:str, output_location:str, liwc_dict:str='LIWC22')
Analyze a single string using LIWC and save the result to a CSV file.
| | Type | Default | Details |
|---|---|---|---|
| input_string | str | | The string to analyze. |
| output_location | str | | Path to save the analysis output (.csv). |
| liwc_dict | str | LIWC22 | Dictionary to use for analysis. Defaults to “LIWC22”. |
| Returns | None | | |
Liwc.analyze_df
Liwc.analyze_df (text:pandas.core.series.Series, return_input:bool=False, liwc_dict:str='LIWC22')
Analyze text data from a pandas Series (for example, a DataFrame column) using LIWC.
| | Type | Default | Details |
|---|---|---|---|
| text | Series | | Pandas Series containing text data. |
| return_input | bool | False | Whether to return the input text with the output. Defaults to False. |
| liwc_dict | str | LIWC22 | Dictionary to use for analysis. Defaults to “LIWC22”. |
| Returns | DataFrame | | DataFrame containing the analysis results. |
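Since `analyze_df` takes a Series rather than a whole DataFrame, a list of strings has to be wrapped first. The sketch below does that in a hypothetical helper, `analyze_texts`. It assumes pandas is installed, and the Series name `"Text"` is an arbitrary choice.

```python
import pandas as pd


def analyze_texts(liwc, texts):
    """Wrap an iterable of strings in a Series and pass it to `analyze_df`."""
    series = pd.Series(list(texts), name="Text")
    # return_input=True asks for the input text alongside the results.
    return liwc.analyze_df(series, return_input=True)


# Usage with LIWC-22 installed (not executed here):
# liwc = Liwc('LIWC-22-cli')
# results = analyze_texts(liwc, ["First document.", "Second document."])
```

For an existing DataFrame, the same call can be made directly on one column, for example `liwc.analyze_df(df["Text"])`, where `df` and the column name are placeholders.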
Liwc.analyze_folder
Liwc.analyze_folder (input_folder:str, output_location:str, liwc_dict:str='LIWC22')
Analyze all text files in a folder using LIWC.
| | Type | Default | Details |
|---|---|---|---|
| input_folder | str | | Path to the folder containing text files. |
| output_location | str | | Path to save the analysis output. |
| liwc_dict | str | LIWC22 | Dictionary to use for analysis. Defaults to “LIWC22”. |
| Returns | None | | |
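When the texts live in memory rather than on disk, they can be staged in a temporary folder before calling `analyze_folder`. The sketch below uses only the standard library. `write_text_files` and `analyze_texts_in_folder` are hypothetical helpers, and the `.txt` extension and UTF-8 encoding are assumptions, since the accepted file types are not stated here.

```python
import tempfile
from pathlib import Path


def write_text_files(folder, texts):
    """Write each text to `<name>.txt` inside `folder` and return the paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, text in texts.items():
        path = folder / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths


def analyze_texts_in_folder(liwc, texts, output_location):
    """Stage `texts` in a temporary folder, then run `analyze_folder` on it."""
    with tempfile.TemporaryDirectory() as tmp:
        write_text_files(tmp, texts)
        # The temporary folder is deleted once the analysis call returns.
        liwc.analyze_folder(input_folder=tmp, output_location=output_location)
```

The output path must point outside the temporary folder, otherwise the results are deleted together with the staged files.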
Liwc.analyze_csv
Liwc.analyze_csv (input_file:str, output_location:str, row_id_indices:str, column_indices:str, liwc_dict:str='LIWC22')
Analyze text data from a CSV file using LIWC.
| | Type | Default | Details |
|---|---|---|---|
| input_file | str | | Path to the input CSV file. |
| output_location | str | | Path to save the analysis output. |
| row_id_indices | str | | Indices of row IDs in the CSV. |
| column_indices | str | | Indices of text columns in the CSV. |
| liwc_dict | str | LIWC22 | Dictionary to use for analysis. Defaults to “LIWC22”. |
| Returns | None | | |
# desired_keys = ['WC', 'Analytic', 'Clout', 'Authentic', 'Tone']
# filtered_dict = {key: r[key] for key in desired_keys if key in r}
# print(filtered_dict)
Language Style Matching
Liwc.analyze_lsm
Liwc.analyze_lsm (df:pandas.core.frame.DataFrame, calculate_lsm:str='person-and-group', group_column:str='GroupID', person_column:str='PersonID', text_column:str='Text', output_type:str='pairwise', expanded_output:bool=False, omit_speakers_number_of_turns:int=0, omit_speakers_word_count:int=10, segmentation:str='none', wsl_mode:bool=True)
Analyzes Language Style Matching (LSM) based on the provided DataFrame.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Input DataFrame containing the text data to be analyzed. |
| calculate_lsm | str | person-and-group | Type of LSM calculation. Options are “person” (person-level LSM only), “group” (group-level LSM only), and “person-and-group” (both). Default is “person-and-group”. |
| group_column | str | GroupID | The column name in df representing the Group ID. Default is ‘GroupID’. |
| person_column | str | PersonID | The column name in df representing the Person ID. Default is ‘PersonID’. |
| text_column | str | Text | The column name in df representing the text data. Default is ‘Text’. |
| output_type | str | pairwise | Type of output. Options are “one-to-many” (one-to-many comparison) and “pairwise” (pairwise comparison). Default is “pairwise”. |
| expanded_output | bool | False | Whether to return an expanded LSM output. Default is False. |
| omit_speakers_number_of_turns | int | 0 | |
| omit_speakers_word_count | int | 10 | Omit speakers if the word count is less than this value. Default is 10. |
| segmentation | str | none | How to split the text into segments. Options are “none” (no segmentation) or one of the forms “not=”, “nofst=”, “nofwc=”, “now=”, “boc=”, “regexp=”. Default is “none”. |
| wsl_mode | bool | True | Whether to convert paths for WSL. Defaults to True. |
| Returns | Union | | The resulting LSM analysis. The output format depends on the specified output_type. |
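The default column names imply an input DataFrame with one row per conversational turn and columns `GroupID`, `PersonID`, and `Text`. The sketch below builds such a frame and wraps the call in a hypothetical helper, `run_person_lsm`. It assumes pandas is installed, and the identifiers and texts are invented for illustration.

```python
import pandas as pd

# One row per turn; the column names match the documented defaults.
conversation = pd.DataFrame(
    {
        "GroupID": ["g1", "g1", "g1", "g1"],
        "PersonID": ["alice", "bob", "alice", "bob"],
        "Text": [
            "I think we should leave before the rain starts this evening.",
            "That is fine with me, but I would like to finish this first.",
            "We can wait a little, although it will get dark quite soon.",
            "Then I will hurry, and we can be out of here in ten minutes.",
        ],
    }
)


def run_person_lsm(liwc, df):
    """Request person-level, pairwise LSM with the default column names."""
    return liwc.analyze_lsm(
        df,
        calculate_lsm="person",
        output_type="pairwise",
        omit_speakers_word_count=10,  # speakers below 10 words are omitted
    )


# Usage with LIWC-22 installed (not executed here):
# liwc = Liwc('LIWC-22-cli')
# lsm = run_person_lsm(liwc, conversation)
```

If the data uses different column names, pass them through `group_column`, `person_column`, and `text_column` instead of renaming the DataFrame.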
Narrative arc
Liwc.plot_narrative_arc
Liwc.plot_narrative_arc (df:pandas.core.frame.DataFrame, legend_labels:list=None)
Plots the narrative arc for the given DataFrame, showing Staging, Plot Progression, and Cognitive Tension.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Input DataFrame containing the narrative arc data. Note: set ‘output_individual_data_points=True’ in narrative_arc to get all the data required to plot the narrative arc. |
| legend_labels | list | None | List of labels for the legend, corresponding to each row in the DataFrame. |
| Returns | Figure | | The resulting plot figure of the narrative arcs. |
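The note on `df` means the plot is normally the second step of a two-step pipeline: run `narrative_arc` with individual data points, then pass its output to `plot_narrative_arc`. The sketch below chains the two calls in a hypothetical helper, `plot_arcs`. It takes any object exposing the two documented methods.

```python
def plot_arcs(liwc, df, labels=None):
    """Compute narrative-arc data for `df` and return the plotted figure."""
    # Individual data points are needed for plotting, per the note above.
    arc = liwc.narrative_arc(df, output_individual_data_points=True)
    return liwc.plot_narrative_arc(arc, legend_labels=labels)


# Usage with LIWC-22 installed (not executed here):
# liwc = Liwc('LIWC-22-cli')
# fig = plot_arcs(liwc, stories_df, labels=["Story A", "Story B"])
```

Here `stories_df` and the label strings are placeholders. `labels` should contain one entry per row of the narrative-arc output.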
Liwc.narrative_arc
Liwc.narrative_arc (df:pandas.core.frame.DataFrame, column_names:Optional[list]=None, output_individual_data_points:bool=True, scaling_method:str='0-100', segments_number:int=5, skip_wc:int=10)
Analyzes the narrative arc of text data based on the provided DataFrame.
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | Input DataFrame containing the text data to be analyzed. |
| column_names | Optional[list] | None | List of column names in df that should be processed. If None, all columns are processed. Default is None. |
| output_individual_data_points | bool | True | If True, outputs individual data points for each segment. If False, aggregates the data. Default is True. |
| scaling_method | str | 0-100 | Method for scaling the data. Options are “0-100” (scale values between 0 and 100) and “Z-score” (scale values using Z-score normalization). Default is “0-100”. |
| segments_number | int | 5 | Number of segments into which the text is divided for analysis. Default is 5. |
| skip_wc | int | 10 | Skip any texts with a word count less than this value. Default is 10. |
| Returns | DataFrame | | The resulting DataFrame with the narrative arc analysis. |
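To make the two `scaling_method` options concrete, the sketch below applies the textbook definitions of min-max scaling to 0-100 and Z-score normalization to a list of per-segment scores. These are generic formulas, not a confirmed description of how LIWC computes its output. The function names, the sample scores, and the use of the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def scale_0_100(values):
    """Min-max scale so the smallest value maps to 0 and the largest to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat input: no spread to scale
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def scale_z(values):
    """Z-score: subtract the mean, divide by the population std deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


segment_scores = [2.0, 4.0, 6.0, 8.0, 10.0]  # made-up scores, 5 segments
print(scale_0_100(segment_scores))  # [0.0, 25.0, 50.0, 75.0, 100.0]
print(scale_z(segment_scores))      # centered on 0
```

The 0-100 scale makes arcs from different texts visually comparable on a fixed axis, while Z-scores keep the information about how far each segment lies from the text's own mean.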