MaxDiffMixedLogit

class pymc_marketing.customer_choice.maxdiff.MaxDiffMixedLogit(task_df, items, respondent_id='respondent_id', task_id='task_id', item_col='item_id', best_col='is_best', worst_col='is_worst', random_intercepts=True, reference_item=None, model_config=None, sampler_config=None, non_centered=True, item_attributes=None, utility_formula=None, random_attributes=None, full_covariance=False, lkj_eta=2.0)

Hierarchical MaxDiff (Best-Worst Scaling) model.

Estimates item-level utilities from best-worst choice data with optional per-respondent random intercepts. The likelihood is the Louviere sequential best-worst model:

\[\begin{split}P(\text{best}_t = b \mid \text{subset}_t) &= \operatorname{softmax}(U)_b \\ P(\text{worst}_t = w \mid \text{subset}_t, b) &= \operatorname{softmax}(-U_{\setminus b})_w\end{split}\]

implemented as two pm.Categorical observed distributions so that pm.sample_posterior_predictive yields best/worst draws directly.
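The two-stage likelihood above can be illustrated for a single task with plain NumPy. This is a minimal sketch rather than the model's PyMC implementation; the helper names softmax and best_worst_probs and the utility values are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def best_worst_probs(utilities, best_pos):
    """Return P(best) over the shown items and P(worst | best) over the rest.

    `utilities` holds U for the items shown in one task; `best_pos` is the
    position of the item picked as best. Both names are illustrative.
    """
    p_best = softmax(utilities)
    # Remove the best item, then apply softmax to the negated utilities.
    remaining = np.delete(utilities, best_pos)
    p_worst_given_best = softmax(-remaining)
    return p_best, p_worst_given_best


U = np.array([1.0, 0.0, -1.0, 0.5])  # made-up utilities for a 4-item task
p_best, p_worst = best_worst_probs(U, best_pos=0)
```

The item with the highest utility is the most likely best pick, and among the remaining items the one with the lowest utility is the most likely worst pick.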

Parameters:
task_df : pd.DataFrame

Long-format MaxDiff data; see prepare_maxdiff_data().

items : list[str]

Full item pool. Defines the items coord.

respondent_id : str, default "respondent_id"

Column in task_df identifying respondents.

task_id : str, default "task_id"

Column identifying tasks (unique within respondent).

item_col : str, default "item_id"

Column naming the shown item (must be in items).

best_col : str, default "is_best"

0/1 column flagging the best pick within each task.

worst_col : str, default "is_worst"

0/1 column flagging the worst pick.

random_intercepts : bool, default True

When True, each respondent draws item-level deviations from the population item utilities (HB-MaxDiff). When False, only population utilities are estimated.

reference_item : str, optional

Item pinned to utility 0 for identification. Defaults to items[-1].

model_config : dict, optional

Priors for beta_item_ (population utilities) and sigma_item (per-item heterogeneity scale).

sampler_config : dict, optional

Arguments passed to pm.sample.

non_centered : bool, default True

Non-centered parameterisation for the respondent-level deviations.

item_attributes : pd.DataFrame, optional

One row per item, with the item name as the index and one column per attribute. When provided together with utility_formula, switches the model into part-worths mode: utilities become \(U_i = X_i^\top \beta_{\mathrm{feat}}\) where \(X\) is the patsy-expanded design matrix. Extrapolates naturally to new items via their attributes. Must cover every item in items.

utility_formula : str, optional

Patsy formula describing the attribute contribution to utility, e.g. "~ 0 + C(brand) + price + quality". Required iff item_attributes is given. Use a leading 0 + (no intercept) so the model is identified without a reference item.

random_attributes : list[str], optional

Names of patsy-expanded feature columns that should vary across respondents (respondent part-worths). Remaining features are treated as population-level fixed effects. Only meaningful in part-worths mode; ignored otherwise. Defaults to an empty list (pure fixed part-worths).

Note

Other customer-choice models in this package use Wilkinson pipe notation "~ covariate | random_covariate" to declare random coefficients. MaxDiff deliberately diverges: there is no per-alternative equation structure here (the same attributes describe every item), so a pipe formula would be ambiguous. An explicit list of column names is cleaner and less error-prone.
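The part-worths parameters above (item_attributes, utility_formula, random_attributes) can be pictured with a small NumPy sketch. It assumes a hand-built design matrix in place of the patsy expansion and an independent normal deviation for each random feature. The model's internal structure may differ, and every name and number below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design matrix X: 3 items x 2 features (say, price and quality).
# In the model this would come from the patsy expansion of item_attributes.
X = np.array([[1.0, 0.2],
              [0.5, 0.9],
              [0.0, 0.4]])
beta_feat = np.array([-0.7, 1.5])  # made-up population part-worths

# Population utilities: U_i = X_i^T beta_feat
U_pop = X @ beta_feat

# Assume only the second feature varies across respondents.
n_respondents = 4
random_cols = [1]
sigma = 0.5  # made-up heterogeneity scale
beta_resp = np.tile(beta_feat, (n_respondents, 1))
beta_resp[:, random_cols] += sigma * rng.standard_normal(
    (n_respondents, len(random_cols))
)

# Respondent-level utilities, shape (n_respondents, n_items).
U_resp = beta_resp @ X.T
```

Features left out of random_cols keep the same coefficient for every respondent, which mirrors the description of population-level fixed effects.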

Notes

Input format example:

respondent_id  task_id  item_id  is_best  is_worst
r1             1        apple    0        0
r1             1        banana   1        0
r1             1        cherry   0        1
r1             1        date     0        0
r1             2        apple    0        1
...

Each (respondent_id, task_id) group must contain exactly one row with is_best == 1 and one with is_worst == 1, and the two must differ. Each task must show at least two items. Subset sizes may vary across tasks; they are padded to K_max internally.
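The per-task rules above can be checked before fitting. The following is a minimal pandas sketch, assuming the default column names; check_tasks is a hypothetical helper, not part of the package.

```python
import pandas as pd

task_df = pd.DataFrame(
    {
        "respondent_id": ["r1"] * 4,
        "task_id": [1] * 4,
        "item_id": ["apple", "banana", "cherry", "date"],
        "is_best": [0, 1, 0, 0],
        "is_worst": [0, 0, 1, 0],
    }
)


def check_tasks(df):
    """Return the (respondent_id, task_id) groups that break the rules."""
    bad = []
    for key, g in df.groupby(["respondent_id", "task_id"]):
        one_best = g["is_best"].sum() == 1
        one_worst = g["is_worst"].sum() == 1
        # The best and worst flags must sit on different rows.
        distinct = not ((g["is_best"] == 1) & (g["is_worst"] == 1)).any()
        enough_items = len(g) >= 2
        if not (one_best and one_worst and distinct and enough_items):
            bad.append(key)
    return bad


violations = check_tasks(task_df)
```

An empty list means every task has exactly one best row, exactly one worst row on a different item, and at least two items shown.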

In the default (item-intercept) mode only item-utility contrasts against the reference item are identified; absolute levels are not. In part-worths mode reference_item / random_intercepts are ignored — identification comes from the no-intercept formula (~ 0 + ...) and respondent heterogeneity is controlled by random_attributes.
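The identification point for the item-intercept mode can be verified numerically: adding a constant to every utility leaves both softmax stages unchanged, so only contrasts are recoverable. A minimal NumPy sketch with made-up utilities:

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


U = np.array([0.8, -0.3, 1.2, 0.0])  # last item plays the role of the reference
shifted = U + 5.0                    # same contrasts, different absolute level

# Best-pick and worst-pick probabilities are identical under the shift.
same_best = np.allclose(softmax(U), softmax(shifted))
same_worst = np.allclose(softmax(-U), softmax(-shifted))
```

Pinning one item to utility 0 removes this free constant.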

Posterior predictive limitations

The Louviere best-worst likelihood is sequential: worst is drawn from the remaining items after the best has been removed. In the PyMC graph this is implemented by masking the best position out of the worst-pick softmax using best_pos as a pm.Data node.

sample_posterior_predictive() therefore produces a partially conditioned joint:

  • best_pick is sampled correctly from softmax(U).

  • worst_pick is sampled from softmax(-U \ {observed_best}), i.e. it is still conditioned on the observed best position, not on the freshly sampled best_pick.

This makes the joint (best_pick, worst_pick) draws incoherent for generative use — the two picks may designate the same position. sample_posterior_predictive() remains valid for in-sample posterior predictive checks: verifying that the model’s worst-pick distribution is consistent with the data, given that the best pick was what was actually recorded.

For any counterfactual or out-of-sample simulation use predict_choices() (or apply_intervention()), which samples the joint (best, worst) generatively — best first, then worst conditioned on the sampled best — producing a coherent joint draw.
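The generative order described above (best first, then worst from the remaining items) can be sketched for a single task in NumPy. This illustrates the sampling scheme only and is not the code behind predict_choices(); draw_best_worst and the utilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def draw_best_worst(utilities, rng):
    """Draw best from softmax(U), then worst from the items that remain."""
    k = len(utilities)
    best = rng.choice(k, p=softmax(utilities))
    # Condition the worst pick on the sampled best, not an observed one.
    remaining = np.array([i for i in range(k) if i != best])
    worst = rng.choice(remaining, p=softmax(-utilities[remaining]))
    return int(best), int(worst)


U = np.array([1.0, 0.0, -1.0, 0.5])  # made-up utilities for a 4-item task
draws = [draw_best_worst(U, rng) for _ in range(1000)]
all_distinct = all(b != w for b, w in draws)
```

Because the worst pick is drawn only from positions other than the sampled best, the two picks can never coincide.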

Methods

MaxDiffMixedLogit.__init__(task_df, items[, ...])

Initialize model configuration and sampler configuration for the model.

MaxDiffMixedLogit.apply_intervention(new_task_df)

Simulate choices under a counterfactual task design.

MaxDiffMixedLogit.attrs_to_init_kwargs(attrs)

Rehydrate init kwargs from serialised idata attrs.

MaxDiffMixedLogit.build_from_idata(idata)

Rebuild the PyMC model from a loaded InferenceData.

MaxDiffMixedLogit.build_model(**kwargs)

Build the PyMC model using the cached task_df.

MaxDiffMixedLogit.create_idata_attrs()

Serialise init kwargs so the model can be reloaded from idata.

MaxDiffMixedLogit.fit([task_df, ...])

Fit the model via NUTS and attach the result to self.idata.

MaxDiffMixedLogit.graphviz(**kwargs)

Get the graphviz representation of the model.

MaxDiffMixedLogit.idata_to_init_kwargs(idata)

Convert the model configuration and sampler configuration stored in the InferenceData into keyword arguments.

MaxDiffMixedLogit.load(fname[, check])

Create a ModelBuilder instance from a file.

MaxDiffMixedLogit.load_from_idata(idata[, check])

Create a ModelBuilder instance from an InferenceData object.

MaxDiffMixedLogit.make_model(arrays[, observed])

Build the MaxDiff PyMC model.

MaxDiffMixedLogit.predict_choices(task_df[, ...])

Fully generative (best, worst) simulation under a new task design.

MaxDiffMixedLogit.preprocess_model_data(task_df)

Run prepare_maxdiff_data() and cache its outputs on the model.

MaxDiffMixedLogit.sample([...])

Run prior predictive, fit, and posterior predictive in sequence.

MaxDiffMixedLogit.sample_posterior_predictive([...])

Sample from the posterior predictive distribution.

MaxDiffMixedLogit.sample_prior_predictive([...])

Sample from the prior predictive distribution.

MaxDiffMixedLogit.save(fname, **kwargs)

Save the model's inference data to a file.

MaxDiffMixedLogit.score_new_items(...)

Compute posterior share-of-preference after introducing new items.

MaxDiffMixedLogit.set_idata_attrs([idata])

Set attributes on an InferenceData object.

MaxDiffMixedLogit.table(**model_table_kwargs)

Get the summary table of the model.

MaxDiffMixedLogit.transform_attributes(new_attrs)

Apply the fitted patsy formula to a new attribute frame.

Attributes

default_model_config

Default priors — returns only the priors used by the active mode.

default_sampler_config

Default sampler configuration.

fit_result

Get the posterior fit_result.

id

Generate a unique hash value for the model.

output_var

Primary observed variable name.

posterior

Access the 'posterior' attribute of the InferenceData object.

posterior_predictive

Access the 'posterior_predictive' attribute of the InferenceData object.

predictions

Access the 'predictions' attribute of the InferenceData object.

prior

Access the 'prior' attribute of the InferenceData object.

prior_predictive

Access the 'prior_predictive' attribute of the InferenceData object.

version

idata

sampler_config

model_config