pysmatch.visualization

pysmatch.visualization.compare_categorical(matcher, return_table: bool = False, plot_result: bool = True)

Compares categorical variables between groups before and after matching.

For each categorical covariate, this function computes proportional differences and Chi-square p-values before/after matching, and can optionally plot the proportional differences.

Parameters:
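  • matcher (Matcher) – An instance of the pysmatch.Matcher class, which must contain the original data (matcher.data), matched data (matcher.matched_data), target variable name (matcher.yvar), covariate list (matcher.xvars), and excluded columns list (matcher.exclude).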

  • return_table (bool, optional) – If True, returns the Chi-Square test results (variable name, p-value before, p-value after) as a pandas DataFrame. Defaults to False.

  • plot_result (bool, optional) – If True, displays the bar plots comparing proportional differences. Defaults to True.

Returns:

If return_table is True, returns a DataFrame summarizing the Chi-square test results for each categorical covariate. Otherwise, returns None. Returns an empty DataFrame or None if no categorical variables are found or if an error occurs.

Return type:

Optional[pd.DataFrame]

Note: Runs Chi-square tests on the categorical variables before and after matching and plots the proportional differences.
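The two quantities named above can be illustrated without the library. The following is a minimal sketch using only the standard library and the textbook definitions; it is not pysmatch's implementation, and the helper names `proportional_differences` and `chi_square_statistic`, as well as the sample data, are hypothetical.

```python
from collections import Counter


def proportional_differences(test_values, control_values):
    """Share of each category in the test group minus its share in the control group."""
    test_counts, control_counts = Counter(test_values), Counter(control_values)
    categories = sorted(set(test_counts) | set(control_counts))
    return {
        category: test_counts[category] / len(test_values)
        - control_counts[category] / len(control_values)
        for category in categories
    }


def chi_square_statistic(test_values, control_values):
    """Pearson chi-square statistic for the 2 x k (group by category) table."""
    test_counts, control_counts = Counter(test_values), Counter(control_values)
    n_test, n_control = len(test_values), len(control_values)
    total = n_test + n_control
    statistic = 0.0
    for category in set(test_counts) | set(control_counts):
        column_total = test_counts[category] + control_counts[category]
        for observed, row_total in (
            (test_counts[category], n_test),
            (control_counts[category], n_control),
        ):
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Made-up data: one categorical covariate with levels "a" and "b".
test_before = ["a"] * 30 + ["b"] * 10
control_before = ["a"] * 20 + ["b"] * 20
control_after = ["a"] * 30 + ["b"] * 10  # hypothetical matched controls

diffs_before = proportional_differences(test_before, control_before)
stat_before = chi_square_statistic(test_before, control_before)
stat_after = chi_square_statistic(test_before, control_after)
```

A smaller statistic after matching (and so a larger p-value) indicates that the category shares of the two groups have moved closer together, which is what the before/after comparison is meant to reveal.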

pysmatch.visualization.compare_continuous(matcher, return_table: bool = False, plot_result: bool = True)

Compares continuous variables between groups before and after matching.

For each continuous covariate, this function computes summary balance statistics before/after matching and can optionally render ECDF plots.

Parameters:
  • matcher (Matcher) – An instance of the pysmatch.Matcher class, which must contain the original data (matcher.data), matched data (matcher.matched_data), target variable name (matcher.yvar), covariate list (matcher.xvars), and excluded columns list (matcher.exclude).

  • return_table (bool, optional) – If True, returns the calculated statistics as a pandas DataFrame. Defaults to False.

  • plot_result (bool, optional) – If True, displays the ECDF comparison plots for each continuous variable. Defaults to True.

Returns:

If return_table is True, returns a DataFrame summarizing the balance statistics for each continuous covariate. Otherwise, returns None. Returns an empty DataFrame or None if no continuous variables are found or if an error occurs.

Return type:

Optional[pd.DataFrame]

Note: Compares the distributions of the continuous variables before and after matching.
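This page does not list which balance statistics are computed, so the sketch below only illustrates two common choices under that assumption: the largest gap between the two groups' ECDFs and the standardized mean difference. It uses the standard library only; the helper names and the data are hypothetical and do not reflect pysmatch internals.

```python
from bisect import bisect_right
from statistics import mean, variance


def ecdf(sample):
    """Return F(x): the fraction of observations in `sample` that are <= x."""
    ordered = sorted(sample)

    def cumulative(x):
        return bisect_right(ordered, x) / len(ordered)

    return cumulative


def max_ecdf_gap(test, control):
    """Largest vertical distance between the two ECDFs (assumed balance measure)."""
    f_test, f_control = ecdf(test), ecdf(control)
    # The gap can only change at observed values, so checking those suffices.
    return max(abs(f_test(x) - f_control(x)) for x in set(test) | set(control))


def standardized_mean_difference(test, control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(test) + variance(control)) / 2) ** 0.5
    return (mean(test) - mean(control)) / pooled_sd


# Made-up values of one continuous covariate.
test_values = [1, 2, 3, 4]
control_values = [3, 4, 5, 6]

gap = max_ecdf_gap(test_values, control_values)
smd = standardized_mean_difference(test_values, control_values)
```

Both numbers shrink toward zero as the two distributions come into balance, which is the pattern to look for when comparing the before and after columns of the returned table.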

pysmatch.visualization.plot_matched_scores(data: DataFrame, yvar: str, control_color: str = '#1F77B4', test_color: str = '#FF7F0E') -> None

Plots the distribution of propensity scores after matching.

Generates Kernel Density Estimate (KDE) plots to visualize the overlap of propensity scores between the test and control groups present in the matched dataset.

Parameters:
  • data (pd.DataFrame) – The matched DataFrame. It must contain the yvar column and a ‘scores’ column.

  • yvar (str) – The name of the binary column indicating group membership (0 or 1).

  • control_color (str, optional) – Hex color code for the control group plot. Defaults to “#1F77B4”.

  • test_color (str, optional) – Hex color code for the test group plot. Defaults to “#FF7F0E”.

Returns:

Displays the matplotlib plot.

Return type:

None

Raises:

ValueError – If the input data is empty or lacks the ‘scores’ column.

Note: Plots the score distributions of the test and control groups after matching.

pysmatch.visualization.plot_scores(data: DataFrame, yvar: str, control_color: str = '#1F77B4', test_color: str = '#FF7F0E') -> None

Plots the distribution of propensity scores before matching.

Generates Kernel Density Estimate (KDE) plots to visualize the overlap of propensity scores between the test and control groups in the original (unmatched) dataset.

Parameters:
  • data (pd.DataFrame) – The original (unmatched) DataFrame. It must include the yvar column and a ‘scores’ column.

  • yvar (str) – The name of the binary column indicating group membership (0 or 1).

  • control_color (str, optional) – Hex color code for the control group plot. Defaults to “#1F77B4”.

  • test_color (str, optional) – Hex color code for the test group plot. Defaults to “#FF7F0E”.

Returns:

Displays the matplotlib plot.

Return type:

None

Raises:

ValueError – If the ‘scores’ column is not found in the input data.

Note: Plots the score distributions of the test and control groups before matching.
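The input contract described above (a binary yvar column, a ‘scores’ column, and a ValueError when the latter is missing) can be mirrored in a small pre-check. The sketch below uses a list of dicts as a stand-in for a DataFrame; `split_scores_by_group` is a hypothetical helper, not part of pysmatch, and the column name "treated" is made up.

```python
def split_scores_by_group(rows, yvar):
    """Return (control_scores, test_scores) from a list of row dicts.

    Hypothetical helper: `rows` stands in for a DataFrame, with one dict per row.
    """
    if not rows:
        raise ValueError("input data is empty")
    if any("scores" not in row for row in rows):
        raise ValueError("'scores' column not found in input data")
    control = [row["scores"] for row in rows if row[yvar] == 0]
    test = [row["scores"] for row in rows if row[yvar] == 1]
    return control, test


# Made-up rows: "treated" plays the role of yvar (0 = control, 1 = test).
rows = [
    {"treated": 0, "scores": 0.2},
    {"treated": 1, "scores": 0.7},
    {"treated": 0, "scores": 0.3},
]
control_scores, test_scores = split_scores_by_group(rows, "treated")

# Rows without a 'scores' entry are rejected, as in the documented Raises clause.
try:
    split_scores_by_group([{"treated": 1}], "treated")
    missing_scores_rejected = False
except ValueError:
    missing_scores_rejected = True
```

Checking the input before plotting keeps the failure close to its cause, for example when scores have not yet been computed for the data being passed in.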