FAQ

When should I use `linear` vs `tree` vs `knn`?

Start with `linear` for an interpretable baseline.
Use `tree` when relationships are nonlinear or feature types are mixed.
Use `knn` as a local-structure baseline, then compare how sensitive the matched results are to the model choice.
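One way to run this comparison is to fit all three model families on the same data and check how far their propensity scores diverge. The sketch below uses scikit-learn directly on synthetic data. It is not pysmatch's API, and mapping `linear`, `tree`, and `knn` to these particular scikit-learn classes and hyperparameters is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: treatment depends nonlinearly on the first covariate.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
logit = 0.8 * X[:, 0] ** 2 - 0.5 * X[:, 1] - 0.5
treated = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Assumed stand-ins for the three model types (hyperparameters are arbitrary).
models = {
    "linear": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=15),
}

# Estimated propensity score = predicted probability of treatment.
scores = {}
for name, model in models.items():
    model.fit(X, treated)
    scores[name] = model.predict_proba(X)[:, 1]

for name, s in scores.items():
    print(f"{name:>6}: min={s.min():.2f} mean={s.mean():.2f} max={s.max():.2f}")

# Pairwise correlation of the score vectors; low values mean the models
# rank units differently, so matched samples may differ as well.
corr = np.corrcoef([scores[name] for name in models])
print(np.round(corr, 2))
```

If the three score vectors correlate strongly, the choice of model is unlikely to change which units get matched. If they do not, it is worth checking balance under each model.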

Is high model accuracy always better for matching?

Not necessarily. A classifier that separates treated and control units very well may indicate weak overlap between the groups, which leaves fewer comparable units to match. Balance diagnostics matter more than raw classifier accuracy.
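To make the trade-off concrete, here is a minimal sketch of two common diagnostics: the share of units whose propensity score falls inside the range covered by both groups, and the standardized mean difference (SMD) of a covariate. The function names and the toy numbers are hypothetical and are not part of pysmatch.

```python
from statistics import mean, variance


def common_support_share(treated_scores, control_scores):
    """Fraction of all units whose score lies in the range shared by both groups."""
    lo = max(min(treated_scores), min(control_scores))
    hi = min(max(treated_scores), max(control_scores))
    all_scores = list(treated_scores) + list(control_scores)
    inside = [s for s in all_scores if lo <= s <= hi]
    return len(inside) / len(all_scores)


def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(treated_values) + variance(control_values)) / 2) ** 0.5
    return (mean(treated_values) - mean(control_values)) / pooled_sd


# Near-perfect separation: the classifier is accurate, but nothing overlaps.
separated = common_support_share([0.90, 0.95, 0.97, 0.99], [0.01, 0.03, 0.05, 0.10])

# Moderate separation: less accurate, but most units have comparable partners.
overlapping = common_support_share([0.4, 0.5, 0.6, 0.7], [0.3, 0.4, 0.5, 0.6])

print(separated, overlapping)

# SMD of a covariate (toy ages); values near 0 indicate good balance.
print(standardized_mean_difference([40, 50, 60], [30, 40, 50]))
```

In the first case the scores separate the groups perfectly, yet no unit has a counterpart to match against. The second case has a weaker classifier and far more usable overlap.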

Should I use over- or under-sampling?

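`over`: oversamples the minority class, so it usually retains more majority-class information; a good default.
`under`: undersamples the majority class, giving smaller training sets that fit faster; useful for sensitivity checks.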
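The difference between the two strategies can be sketched with the standard library alone. This is a generic illustration of over- and under-sampling, not pysmatch's implementation; `balance_classes` and its parameters are hypothetical names, and the sketch assumes the majority group is at least as large as the minority group.

```python
import random


def balance_classes(majority, minority, method="over", seed=0):
    """Return (majority_sample, minority_sample) with equal sizes."""
    rng = random.Random(seed)  # local generator, so the result is repeatable
    if method == "over":
        # Keep every majority record; draw extra minority records with replacement.
        extra = rng.choices(minority, k=len(majority) - len(minority))
        return list(majority), list(minority) + extra
    if method == "under":
        # Keep every minority record; draw a majority subset without replacement.
        return rng.sample(majority, k=len(minority)), list(minority)
    raise ValueError(f"unknown method: {method!r}")


majority = list(range(100))       # e.g. 100 control records
minority = list(range(100, 110))  # e.g. 10 treated records

over_major, over_minor = balance_classes(majority, minority, method="over")
under_major, under_minor = balance_classes(majority, minority, method="under")

print(len(over_major), len(over_minor))    # 100 100
print(len(under_major), len(under_minor))  # 10 10
```

Oversampling trains on 200 rows here and keeps every majority record. Undersampling trains on 20 rows and discards 90 majority records, which is why it is faster but noisier.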

How do I make runs reproducible?

Set a fixed seed with `np.random.seed(...)`.
Pin package versions.
Record model and matching parameters in experiment logs.
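The three steps above can be sketched as follows. The seed value, the parameter names under `params`, and the layout of the log record are all hypothetical; adapt them to whatever settings a run actually uses.

```python
import json
import platform

import numpy as np

SEED = 42  # hypothetical value; any fixed integer works

# Reseeding NumPy's global generator reproduces the same random draws.
np.random.seed(SEED)
first_draw = np.random.rand(3)
np.random.seed(SEED)
second_draw = np.random.rand(3)
print((first_draw == second_draw).all())  # True

# Record the seed, versions, and parameters alongside the results.
run_record = {
    "seed": SEED,
    "python": platform.python_version(),
    "numpy": np.__version__,
    # Hypothetical parameter names, shown only to illustrate the idea.
    "params": {"model_type": "linear", "balance_method": "over"},
}
record_json = json.dumps(run_record, indent=2, sort_keys=True)
print(record_json)
```

Writing `record_json` to a file next to each run's output makes it possible to tell later which settings produced which matched sample.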

Additional resources

Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software, 42(7), 1-52.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.

Contributing

Contributions are welcome. Please open an issue or pull request in this repository.

License

pysmatch is released under the MIT License.