StaggeredDifferenceInDifferences

class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences

A class to analyse data from staggered adoption Difference-in-Differences settings.

This estimator uses an imputation-based approach: it fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
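The three steps above (fit on untreated rows, predict everywhere, subtract) can be sketched with plain NumPy and pandas. This is a minimal illustration on made-up data using ordinary least squares with unit and time dummies. It is not the class's internal implementation, and the toy panel, the `cohort` mapping, and the true effect of 2.0 are assumptions made only for the example.

```python
import numpy as np
import pandas as pd

# Toy panel (hypothetical): 4 units, 6 periods.
# G is the first treated period; np.inf marks never-treated units.
rng = np.random.default_rng(0)
cohort = {0: 3, 1: 4, 2: np.inf, 3: np.inf}
rows = []
for unit, g in cohort.items():
    for t in range(6):
        treated = int(t >= g)
        # Additive unit and time effects plus a true effect of 2.0 once treated.
        y = 1.0 * unit + 0.5 * t + 2.0 * treated + rng.normal(scale=0.01)
        rows.append({"unit": unit, "time": t, "G": g, "treated": treated, "y": y})
df = pd.DataFrame(rows)

# Design matrix with an intercept plus unit and time dummies,
# the analogue of "y ~ 1 + C(unit) + C(time)".
X = pd.get_dummies(df[["unit", "time"]].astype(str), drop_first=True).astype(float)
X.insert(0, "intercept", 1.0)

# Step 1: fit on untreated observations only.
untreated = df["treated"] == 0
beta, *_ = np.linalg.lstsq(
    X[untreated].to_numpy(), df.loc[untreated, "y"].to_numpy(), rcond=None
)

# Step 2: predict the untreated counterfactual for every observation.
df["y_hat0"] = X.to_numpy() @ beta

# Step 3: effect = observed minus counterfactual, averaged over treated rows.
df["tau_hat"] = df["y"] - df["y_hat0"]
att = df.loc[df["treated"] == 1, "tau_hat"].mean()
print(round(att, 2))
```

Because the never-treated units are observed in every period, all time effects are identified from untreated rows alone, which is what makes the counterfactual prediction for treated rows possible.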

Parameters:

data (pd.DataFrame) – A pandas dataframe with panel data (unit x time observations).
formula (str) – A statistical model formula. Recommended: "y ~ 1 + C(unit) + C(time)" for unit and time fixed effects.
unit_variable_name (str) – Name of the column identifying units.
time_variable_name (str) – Name of the column identifying time periods.
treated_variable_name (str, optional) – Name of the column indicating treatment status (0/1). Defaults to "treated".
treatment_time_variable_name (str, optional) – Name of the column containing unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.
never_treated_value (Any, optional) – Value indicating never-treated units in treatment_time column. Defaults to np.inf.
model (PyMCModel or RegressorMixin, optional) – A model for the untreated outcome. Defaults to None.
event_window (tuple[int, int], optional) – Tuple (min_event_time, max_event_time) to restrict event-time aggregation. If None, uses all available event-times.
reference_event_time (int, optional) – Event-time to use as reference (normalized to zero effect) in plots. Defaults to -1.
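When treatment_time_variable_name is None, the treatment time has to be derived from the 0/1 treatment indicator. One plausible way to do this is sketched below: take each unit's first treated period and fill units that are never treated with the never-treated marker. The helper `infer_treatment_time` is hypothetical and written only for illustration; the class's actual inference logic may differ.

```python
import numpy as np
import pandas as pd


def infer_treatment_time(df, unit="unit", time="time", treated="treated",
                         never_treated_value=np.inf):
    """Hypothetical helper: first period in which each unit is treated."""
    first_treated = df.loc[df[treated] == 1].groupby(unit)[time].min()
    # Units that never appear as treated receive the never-treated marker.
    return df[unit].map(first_treated).fillna(never_treated_value)


panel = pd.DataFrame(
    {
        "unit": [0, 0, 0, 1, 1, 1],
        "time": [0, 1, 2, 0, 1, 2],
        "treated": [0, 1, 1, 0, 0, 0],
    }
)
panel["G"] = infer_treatment_time(panel)
print(panel)
```

Here unit 0 is first treated at time 1, so every row for that unit gets G = 1, while unit 1 is never treated and gets G = inf.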

data_

Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.

Type: pd.DataFrame

att_group_time_

Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t.

Type: pd.DataFrame

att_event_time_

Event-time ATT estimates: ATT(e) for each event-time e = t - G.

Type: pd.DataFrame
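To make the two aggregations concrete, the sketch below averages per-observation effects into ATT(g, t) and ATT(e) with plain pandas. The `obs` frame is invented data that only mimics the G, event_time, and tau_hat columns described above. The output column name `att`, the restriction of ATT(g, t) to post-treatment rows, and the inclusive handling of the event window are assumptions, not a description of the attributes' exact layout.

```python
import pandas as pd

# Hypothetical per-observation effects for two cohorts (G = 1 and G = 2).
obs = pd.DataFrame(
    {
        "unit": [0, 0, 0, 0, 1, 1, 1, 1],
        "time": [0, 1, 2, 3, 0, 1, 2, 3],
        "G": [1, 1, 1, 1, 2, 2, 2, 2],
        "tau_hat": [0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 1.0, 2.0],
    }
)

# Event time: periods elapsed since the unit's treatment time, e = t - G.
obs["event_time"] = obs["time"] - obs["G"]

# ATT(g, t): mean effect per cohort and calendar time (post-treatment rows here).
post = obs[obs["event_time"] >= 0]
att_gt = post.groupby(["G", "time"])["tau_hat"].mean().reset_index(name="att")

# ATT(e): mean effect per event time, restricted to an inclusive window
# analogous to event_window=(min_event_time, max_event_time).
event_window = (-1, 1)
in_window = obs["event_time"].between(*event_window)
att_e = obs[in_window].groupby("event_time")["tau_hat"].mean()

print(att_gt)
print(att_e)
```

Averaging by event time pools cohorts that were treated at different calendar dates, so ATT(e) answers "what is the effect e periods after adoption?" regardless of when adoption happened.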

Example

>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.

Methods

`StaggeredDifferenceInDifferences.__init__`(...)
`StaggeredDifferenceInDifferences.effect_summary`([...])	Generate a decision-ready summary of causal effects.
`StaggeredDifferenceInDifferences.fit`(*args, ...)
`StaggeredDifferenceInDifferences.get_plot_data`(...)	Recover the data of an experiment along with the prediction and causal impact information.
`StaggeredDifferenceInDifferences.get_plot_data_bayesian`([...])	Get plotting data for Bayesian model.
`StaggeredDifferenceInDifferences.get_plot_data_ols`()	Get plotting data for OLS model.
`StaggeredDifferenceInDifferences.input_validation`()	Validate the input data and parameters.
`StaggeredDifferenceInDifferences.plot`(*args, ...)	Plot the model.
`StaggeredDifferenceInDifferences.print_coefficients`([...])	Ask the model to print its coefficients.
`StaggeredDifferenceInDifferences.summary`([...])	Print summary of main results.

Attributes

`idata`	Return the InferenceData object of the model.
`supports_bayes`
`supports_ols`
`labels`

__init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)

Parameters:

data (DataFrame)
formula (str)
unit_variable_name (str)
time_variable_name (str)
treated_variable_name (str)
treatment_time_variable_name (str | None)
never_treated_value (Any)
model (PyMCModel | RegressorMixin | None)
event_window (tuple[int, int] | None)
reference_event_time (int)
kwargs (dict)

Return type:

None

classmethod __new__(*args, **kwargs)