pysmatch.visualization
- pysmatch.visualization.compare_categorical(matcher, return_table: bool = False, plot_result: bool = True)
Compares categorical variables between groups before and after matching.
For each categorical covariate identified in the matcher object:
1. Calculates the proportional difference (test % - control %) for each category level before and after matching.
2. Performs Chi-Square tests of independence between the variable and the group indicator (yvar) before and after matching (using matcher.prop_test).
3. If plot_result is True, generates bar plots showing the proportional differences for each category before and after matching, annotated with the Chi-Square p-values.
4. Collects the Chi-Square test results into a DataFrame.
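Steps 1 and 2 can be sketched with pandas and SciPy as below. This is not pysmatch's implementation: `prop_diff` and `chi2_pvalue` are hypothetical helpers, reporting the difference in percentage points is an assumption, and `scipy.stats.chi2_contingency` stands in for `matcher.prop_test`, whose exact test settings (for example Yates' correction on 2x2 tables) are not documented here. The group indicator is assumed to be coded 0 (control) and 1 (test).

```python
import pandas as pd
from scipy.stats import chi2_contingency


def prop_diff(df: pd.DataFrame, var: str, yvar: str) -> pd.Series:
    """Hypothetical helper: test % minus control % for each level of `var`."""
    # normalize="columns" turns counts into within-group shares (each column sums to 1).
    shares = pd.crosstab(df[var], df[yvar], normalize="columns")
    # Assumes yvar is coded 1 = test, 0 = control; result is in percentage points.
    return (shares[1] - shares[0]) * 100


def chi2_pvalue(df: pd.DataFrame, var: str, yvar: str) -> float:
    """Hypothetical helper: p-value of a Chi-Square test of independence."""
    table = pd.crosstab(df[var], df[yvar])  # levels x groups contingency table
    _, p_value, _, _ = chi2_contingency(table)
    return float(p_value)
```

Calling each helper once on the unmatched data and once on the matched data gives the before/after pair that the function compares.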
- Parameters:
matcher (Matcher) – An instance of the pysmatch.Matcher class, containing the original data, matched data, yvar, xvars, and exclude.
return_table (bool, optional) – If True, returns the Chi-Square test results (variable name, p-value before, p-value after) as a pandas DataFrame. Defaults to False.
plot_result (bool, optional) – If True, displays the bar plots comparing proportional differences. Defaults to True.
- Returns:
If return_table is True, returns a DataFrame summarizing the Chi-Square test results for each categorical covariate. Otherwise, returns None. Returns an empty DataFrame or None if no categorical variables are found or if an error occurs.
- Return type:
Optional[pd.DataFrame]
Summary: Runs Chi-Square tests on categorical variables before and after matching and plots the proportional differences.
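The before/after bar plots can be approximated with plain matplotlib. This is a rough sketch, not the library's plotting code: `plot_prop_diffs` is a hypothetical name, and it assumes the per-level differences (test % - control %) and the two Chi-Square p-values have already been computed for one variable.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_prop_diffs(diff_before: pd.Series, diff_after: pd.Series,
                    p_before: float, p_after: float, var: str):
    """Hypothetical sketch: bar plots of per-level differences, before vs after."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    panels = zip(axes, (diff_before, diff_after), (p_before, p_after), ("before", "after"))
    for ax, diffs, p, stage in panels:
        # One bar per category level; heights are test % - control %.
        ax.bar(list(diffs.index.astype(str)), diffs.to_numpy())
        ax.axhline(0, color="black", linewidth=0.8)  # zero line = perfect balance
        ax.set_title(f"{var}, {stage} matching (Chi-Square p = {p:.3f})")
        ax.set_xlabel(var)
    axes[0].set_ylabel("test % - control %")
    return fig
```

Returning the figure instead of calling `plt.show()` inside the function leaves the caller free to display or save it.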
- pysmatch.visualization.compare_continuous(matcher, return_table: bool = False, plot_result: bool = True)
Compares continuous variables between groups before and after matching.
For each continuous covariate identified in the matcher object:
1. Calculates standardized median and mean differences before and after matching.
2. Performs permutation tests based on Chi-Square distance before and after matching.
3. Performs bootstrap Kolmogorov-Smirnov (KS) tests before and after matching.
4. If plot_result is True, generates side-by-side Empirical Cumulative Distribution Function (ECDF) plots comparing the distributions before and after matching, annotated with the calculated statistics (KS p-value, permutation p-value, standardized differences).
5. Collects these statistics into a DataFrame.
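Steps 1 to 3 can be sketched with NumPy and SciPy as below. Every name here is hypothetical, and several details are assumptions rather than pysmatch behavior: the differences are scaled by the standard deviation of both groups combined, the Chi-Square distance is computed on normalized histograms with 10 shared bins, and the bootstrap KS test resamples both groups with replacement from the pooled data (the null hypothesis of a common distribution).

```python
import numpy as np
from scipy.stats import ks_2samp


def std_diffs(test, control):
    """Standardized (median, mean) differences; assumed scale: SD of pooled data."""
    sd = np.std(np.concatenate([test, control]))
    med = (np.median(test) - np.median(control)) / sd
    mean = (np.mean(test) - np.mean(control)) / sd
    return med, mean


def chi2_distance(test, control, bins=10):
    """Chi-Square distance between normalized histograms on shared bin edges."""
    edges = np.histogram_bin_edges(np.concatenate([test, control]), bins=bins)
    p = np.histogram(test, bins=edges)[0] / len(test)
    q = np.histogram(control, bins=edges)[0] / len(control)
    mask = (p + q) > 0  # skip bins empty in both groups to avoid 0/0
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / (p[mask] + q[mask]))


def permutation_pvalue(test, control, n_perm=1000, seed=0):
    """Share of random relabelings whose distance is >= the observed distance."""
    rng = np.random.default_rng(seed)
    observed = chi2_distance(test, control)
    pooled = np.concatenate([test, control])
    n_test = len(test)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle group labels
        if chi2_distance(perm[:n_test], perm[n_test:]) >= observed:
            hits += 1
    return hits / n_perm


def ks_boot_pvalue(test, control, n_boot=1000, seed=0):
    """Bootstrap KS p-value: resample both groups from the pooled data."""
    rng = np.random.default_rng(seed)
    observed = ks_2samp(test, control).statistic
    pooled = np.concatenate([test, control])
    hits = 0
    for _ in range(n_boot):
        t = rng.choice(pooled, size=len(test), replace=True)
        c = rng.choice(pooled, size=len(control), replace=True)
        if ks_2samp(t, c).statistic >= observed:
            hits += 1
    return hits / n_boot
```

Each function would be called once on the unmatched test/control values and once on the matched values; a well-balanced covariate should show standardized differences near 0 and large p-values after matching.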
- Parameters:
matcher (Matcher) – An instance of the pysmatch.Matcher class, which must contain the original data (matcher.data), matched data (matcher.matched_data), target variable name (matcher.yvar), covariate list (matcher.xvars), and excluded columns list (matcher.exclude).
return_table (bool, optional) – If True, returns the calculated statistics as a pandas DataFrame. Defaults to False.
plot_result (bool, optional) – If True, displays the ECDF comparison plots for each continuous variable. Defaults to True.
- Returns:
If return_table is True, returns a DataFrame summarizing the balance statistics for each continuous covariate. Otherwise, returns None. Returns an empty DataFrame or None if no continuous variables are found or if an error occurs.
- Return type:
Optional[pd.DataFrame]
Summary: Compares the distributions of continuous variables before and after matching.
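The side-by-side ECDF view can be sketched as follows. `ecdf` and `plot_ecdf_pair` are hypothetical names, the group coding (1 = test, 0 = control) is assumed, and the statistics annotations that the real plots carry are omitted for brevity.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def ecdf(x):
    """Sorted values and their empirical cumulative probabilities."""
    xs = np.sort(np.asarray(x))
    ys = np.arange(1, len(xs) + 1) / len(xs)
    return xs, ys


def plot_ecdf_pair(before: pd.DataFrame, after: pd.DataFrame, var: str, yvar: str):
    """Hypothetical sketch: test vs control ECDFs, before (left) and after (right)."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    for ax, df, stage in zip(axes, (before, after), ("before", "after")):
        for group, label in ((1, "test"), (0, "control")):
            xs, ys = ecdf(df.loc[df[yvar] == group, var])
            ax.step(xs, ys, where="post", label=label)  # ECDF is a right-continuous step
        ax.set_title(f"{var}, {stage} matching")
        ax.set_xlabel(var)
        ax.legend()
    axes[0].set_ylabel("cumulative proportion")
    return fig
```

The vertical gap between the two curves in each panel is what the KS statistic measures, so curves that nearly coincide after matching indicate good balance on that covariate.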
- pysmatch.visualization.plot_matched_scores(data: DataFrame, yvar: str, control_color: str = '#1F77B4', test_color: str = '#FF7F0E') -> None
Plots the distribution of propensity scores after matching.
Generates Kernel Density Estimate (KDE) plots to visualize the overlap of propensity scores between the test and control groups present in the matched dataset.
- Parameters:
data (pd.DataFrame) – The matched DataFrame; it must contain the yvar column and a ‘scores’ column.
yvar (str) – The name of the binary column indicating group membership (0 or 1).
control_color (str, optional) – Hex color code for the control group plot. Defaults to “#1F77B4”.
test_color (str, optional) – Hex color code for the test group plot. Defaults to “#FF7F0E”.
- Returns:
None. The plot is displayed with matplotlib.
- Return type:
None
- Raises:
ValueError – If the input data is empty or lacks the ‘scores’ column.
Summary: Plots the score distributions of the test and control groups after matching.
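A minimal sketch of a KDE overlap plot with the same inputs and the same ValueError checks. It uses `scipy.stats.gaussian_kde` evaluated on a grid over [0, 1] as a stand-in for whatever KDE routine pysmatch calls internally; `plot_score_kde` is a hypothetical name, and the 0 = control, 1 = test coding is assumed. The same function applies unchanged to unmatched data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def plot_score_kde(data: pd.DataFrame, yvar: str,
                   control_color: str = "#1F77B4", test_color: str = "#FF7F0E"):
    """Hypothetical sketch: KDE of propensity scores for control vs test."""
    if data.empty or "scores" not in data.columns:
        raise ValueError("data must be non-empty and contain a 'scores' column")
    grid = np.linspace(0, 1, 200)  # assumes scores are probabilities in [0, 1]
    fig, ax = plt.subplots()
    for group, label, color in ((0, "control", control_color), (1, "test", test_color)):
        scores = data.loc[data[yvar] == group, "scores"].to_numpy()
        ax.plot(grid, gaussian_kde(scores)(grid), color=color, label=label)
    ax.set_xlabel("propensity score")
    ax.set_ylabel("density")
    ax.legend()
    return fig
```

`gaussian_kde` needs at least two distinct scores per group, so very small matched groups would need a guard in real use.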
- pysmatch.visualization.plot_scores(data: DataFrame, yvar: str, control_color: str = '#1F77B4', test_color: str = '#FF7F0E') -> None
Plots the distribution of propensity scores before matching.
Generates Kernel Density Estimate (KDE) plots to visualize the overlap of propensity scores between the test and control groups in the original (unmatched) dataset.
- Parameters:
data (pd.DataFrame) – The original (unmatched) DataFrame with propensity scores; it must include the yvar column and a ‘scores’ column.
yvar (str) – The name of the binary column indicating group membership (0 or 1).
control_color (str, optional) – Hex color code for the control group plot. Defaults to “#1F77B4”.
test_color (str, optional) – Hex color code for the test group plot. Defaults to “#FF7F0E”.
- Returns:
None. The plot is displayed with matplotlib.
- Return type:
None
- Raises:
ValueError – If the ‘scores’ column is not found in the input data.
Summary: Plots the score distributions of the test and control groups before matching.