Wednesday, August 12, 2009

Can anyone point to routines or software, preferably for R, that use baseline data to identify "best matched sets" (e.g. pairs) of units over which to randomize? I am trying to do this for the case where exact matches won't be possible, in which case some kind of balance statistic will be necessary to find optimal matches over which to randomize.

I can imagine a combinatorial search that tries out all possible pair (or trio, or whatever) combinations and then evaluates them somehow, or perhaps something that samples units, tries to best match them, does so over oodles of samples, and then ranks the pairings somehow.
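To make the "try all possible pair combinations" idea concrete, here is a minimal Python sketch (not R, and not taken from any of the packages mentioned in this thread). It assumes the balance statistic is simply the total Euclidean distance within pairs, and that the covariates are already on comparable scales. The function name `best_pairing` and the toy `baseline` data are made up for illustration.

```python
from math import dist, inf

def best_pairing(points):
    """Exhaustively search every way to split the units into pairs and
    return (total_distance, pairs) with the smallest total within-pair
    distance. Only feasible for small, even numbers of units."""
    if len(points) % 2:
        raise ValueError("need an even number of units to form pairs")

    def search(remaining):
        # Base case: nothing left to pair.
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best_cost, best_pairs = inf, None
        # Pair the first unpaired unit with each candidate partner in turn.
        for k, partner in enumerate(rest):
            others = rest[:k] + rest[k + 1:]
            cost, pairs = search(others)
            cost += dist(points[first], points[partner])
            if cost < best_cost:
                best_cost, best_pairs = cost, [(first, partner)] + pairs
        return best_cost, best_pairs

    return search(tuple(range(len(points))))

# Hypothetical baseline covariates for six units (two covariates each).
baseline = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2),
            (5.2, 4.9), (9.0, 1.0), (8.8, 1.3)]
total, pairs = best_pairing(baseline)
print(pairs, round(total, 3))
```

The number of distinct pairings of n units is (n-1)(n-3)...(3)(1), so this brute-force search is only practical for a handful of units. Anything larger needs a greedy or optimal matching algorithm instead.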

Wondering if someone has already invented this wheel.

Two possibilities are CEM (http://gking.harvard.edu/cem) and blockTools (http://rtm.wustl.edu/software.blockTools.htm).

http://gking.harvard.edu

The software available here
(http://rtm.wustl.edu/software.blockTools.htm) addresses this problem
with several algorithms, arbitrary group sizes (pairs, triples, etc.),
and some methods for dealing with outliers, conducting the
randomization, and outputting easy-to-read results. For an
application in designing and conducting a randomized field experiment,
see

King, Gary, Emmanuela Gakidou, Nirmala Ravishankar, Ryan T. Moore, Jason Lakin, Manett Vargas, Martha María Téllez-Rojo, Juan Eugenio Hernández Ávila, Mauricio Hernández Ávila and Héctor Hernández Llamas. 2007. "A 'Politically Robust' Experimental Design for Public Policy Evaluation, with Application to the Mexican Universal Health Insurance Program". Journal of Policy Analysis and Management, 26(3): 479-509.

Based on responses I've received, I think I need to clarify what I was asking.

I am not looking for routines that match data for observational studies.

I am looking for something that automates the creation of matched sets over which treatments **will** be randomized. That is, the experimental treatments have **not yet been assigned.**

We want to automate the search for matched sets of units over which we will actually be randomizing treatment assignment.

For example, if we have a dataset with 1000 units, and we have a budget to treat 50 units, we want a routine that searches for an optimal set, based on input criteria, of 50 pairs within which we will randomize treatment.
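A rough Python sketch of that example, under some loud assumptions: the "input criteria" are taken to be plain Euclidean distance on two simulated baseline covariates, and the search is a simple greedy one (take the closest pair, remove both units, repeat), which is not guaranteed to be globally optimal. The names `greedy_pairs` and `randomize_within_pairs` are hypothetical helpers, not functions from CEM or blockTools.

```python
import random
from itertools import combinations
from math import dist

def greedy_pairs(points, n_pairs):
    """Pick n_pairs disjoint pairs, always taking the closest remaining pair."""
    # Rank every candidate pair by within-pair distance (closest first).
    ranked = sorted(combinations(range(len(points)), 2),
                    key=lambda p: dist(points[p[0]], points[p[1]]))
    used, chosen = set(), []
    for i, j in ranked:
        if len(chosen) == n_pairs:
            break
        if i not in used and j not in used:
            chosen.append((i, j))
            used.update((i, j))
    return chosen

def randomize_within_pairs(pairs, rng):
    """Within each pair, assign one unit to treatment and one to control."""
    assignment = {}
    for i, j in pairs:
        treated, control = rng.sample([i, j], 2)  # random order of the two units
        assignment[treated] = "treatment"
        assignment[control] = "control"
    return assignment

# Simulated baseline data: 1000 units, two covariates each (an assumption).
rng = random.Random(2009)
units = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

pairs = greedy_pairs(units, n_pairs=50)
assignment = randomize_within_pairs(pairs, rng)
print(len(pairs), sum(v == "treatment" for v in assignment.values()))
```

Note that picking the 50 tightest pairs out of 1000 units will tend to favor dense regions of the covariate space, so the selected units may not be representative of the full set. Whether that matters depends on the goals of the experiment.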

So one possible approach could be to use our priors about what "matters" to stratify the 1000 units into 50 blocking cells (not necessarily evenly sized) and then examine all pair combinations within each cell to see which is a best matched pair. That's one example off the top of my head.
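The stratify-then-pair idea could be sketched in Python along these lines. Everything specific here is an assumption for illustration: the units are dictionaries with a made-up `region` field used as the blocking cell and a made-up `income` field used for the within-cell distance, and `best_pair_per_cell` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import combinations

def best_pair_per_cell(units, cell_of, distance):
    """Group units into blocking cells, then return the closest pair in
    each cell that has at least two units."""
    cells = defaultdict(list)
    for unit in units:
        cells[cell_of(unit)].append(unit)

    best = {}
    for cell, members in cells.items():
        if len(members) < 2:
            continue  # a singleton cell cannot supply a pair
        # Examine all pair combinations within the cell.
        best[cell] = min(combinations(members, 2),
                         key=lambda pair: distance(*pair))
    return best

# Hypothetical units: "region" defines the cell, "income" drives the match.
units = [
    {"id": "A", "region": "north", "income": 10.0},
    {"id": "B", "region": "north", "income": 14.0},
    {"id": "C", "region": "north", "income": 10.5},
    {"id": "D", "region": "south", "income": 30.0},
    {"id": "E", "region": "south", "income": 22.0},
    {"id": "F", "region": "south", "income": 29.0},
    {"id": "G", "region": "east", "income": 18.0},
]

best = best_pair_per_cell(
    units,
    cell_of=lambda u: u["region"],
    distance=lambda a, b: abs(a["income"] - b["income"]),
)
matched = {cell: (a["id"], b["id"]) for cell, (a, b) in best.items()}
print(matched)
```

Cells with a single unit are dropped in this sketch, so with real data the number of usable pairs could come out lower than the number of cells.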

Hopefully that's clearer. But if I'm missing something obvious, please let me know.
