Thursday, 21 May 2009

logistic regression on aggregate data

I have recently been engaged in an analysis of Australian electoral behaviour during the Great Depression. I have developed a database of the social composition of electoral districts (four mutually exclusive groups: Catholic manual workers, Protestant manual workers, employers and the self-employed, and white-collar employees) and then used linear regression to estimate levels of left party support among the different social groups. The analysis is reported in a 2005 paper at SSHA and in my recent book, When the Labor Party Dreams: Class, Politics and Policy in New South Wales 1930-32. However, as electoral behaviour is highly polarised by class, I suspect that a linear model overstates levels of left party support among workers and Catholics, the same problem that arises when linear regression is used to analyse African-American voting. I am therefore attempting to use logistic regression.
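A minimal Python sketch of the concern described above, assuming the linear approach is a Goodman-style ecological regression of district vote share on group shares. It is simplified to two groups (workers and everyone else), and every number in it is made up for illustration; none comes from the actual database. The helper `ols_line` and the variable names are hypothetical.

```python
# Illustrative sketch only: two-group ecological (Goodman-style) regression.
# All data below are invented; they are not taken from the author's dataset.

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical districts: share of manual workers and left-party vote share.
worker_share = [0.2, 0.4, 0.6, 0.8]
left_vote = [0.3, 0.5, 0.7, 0.9]

intercept, slope = ols_line(worker_share, left_vote)

# With two groups, vote = b_other * (1 - x) + b_worker * x, so the fitted
# line implies a support level for each group:
support_other = intercept            # implied left support among non-workers
support_worker = intercept + slope   # implied left support among workers

print(f"non-workers: {support_other:.2f}, workers: {support_worker:.2f}")
```

With these invented figures the implied support among workers comes out at about 1.1, that is, above 100 per cent, which is the kind of out-of-range estimate a linear model can produce when voting is strongly polarised.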

However, I am at a loss to work out how to do this. Every source I can find on logistic regression refers to its use in analysing individual-level data. Can anyone point me to a how-to manual (preferably using SPSS), or offer some tips, that explains the use of logistic regression to estimate individual voting behaviour from aggregate data?

Geoff Robinson

Dr Geoffrey Robinson
Lecturer in History & Politics
Faculty of Arts & Education
Deakin University
Pigdons Rd
Geelong
VIC 3217
Australia
Tel: +61 3 5227 1452
Fax: +61 3 5227 3380
Email: geoffrey.robinson@deakin.edu.au
Web: http://geoffrobinson.info
CRICOS Provider Code 00113B

I don't know your book or your dataset, but fitting a logistic regression is
very simple, and the output is easier to read than that of a linear regression
with multiple variables.
I don't know of any manual for doing this in SPSS; I have one for Stata and
one for R. But I think you can find more about it on the SPSS manuals site.
If you want, you can explain in more detail what your variables are.

Daniel
University of Brasilia

Dr. Robinson,

Political Analysis, Vol. 10, No. 1 has several articles plus
references to some others on analyzing multi-party aggregate data.
The seemingly unrelated regression version in several of the articles
is quite easy to set up in Stata. I will be glad to offer advice if
you wish.

Good luck with your project.

John Jackson

Dear Geoff,

I think you should use an ecological inference method; unfortunately
there are not any ecological inference methods in SPSS.

For more information about ecological inference you should visit
http://gking.harvard.edu/projects/ecinf.shtml

Ioannis Andreadis
Lecturer
Laboratory of Applied Political Research
Department of Political Sciences
Aristotle University Thessaloniki
46 Egnatias Str., Thessaloniki
54625 Hellas (Greece)
Tel: +302310991992, +302310991950
Fax: +302310991983

Geoff,

If I understand your question correctly and you have access to Stata, you should be able to estimate this model using the nl command, like so:

nl (vote = 1/(1 + exp(-({b1}*x1 + {b2}*x2 + {b3}*constant))))

The "vote" variable should be the proportion of left party support, varying between 0 and 1. You'll also need to create the "constant" variable manually (a variable equal to 1 for every district). You can use the "help nl" command to get some more details.
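For readers without Stata, here is a rough Python sketch of what that command fits: the logistic curve for the predicted vote share, and the sum of squared residuals that nonlinear least squares minimises over b1, b2 and b3. The function names and the district values are hypothetical, and the "observed" votes are generated from known coefficients purely so the example can be checked. The constant is written as a plain intercept term here instead of a separate variable.

```python
import math

def predicted_share(b1, b2, b3, x1, x2):
    """Logistic curve from the nl command: 1 / (1 + exp(-(b1*x1 + b2*x2 + b3)))."""
    return 1.0 / (1.0 + math.exp(-(b1 * x1 + b2 * x2 + b3)))

def sum_squared_residuals(params, rows):
    """Objective minimised by nonlinear least squares over (b1, b2, b3)."""
    b1, b2, b3 = params
    return sum((vote - predicted_share(b1, b2, b3, x1, x2)) ** 2
               for x1, x2, vote in rows)

# Hypothetical districts (x1, x2, vote); votes are simulated from known
# coefficients, not taken from any real electoral data.
true_params = (2.0, -1.0, 0.5)
districts = [(x1, x2, predicted_share(*true_params, x1, x2))
             for x1, x2 in [(0.1, 0.6), (0.3, 0.4), (0.5, 0.3), (0.7, 0.1)]]

# The objective is zero at the generating coefficients and positive elsewhere.
ssr_true = sum_squared_residuals(true_params, districts)
ssr_flat = sum_squared_residuals((0.0, 0.0, 0.0), districts)
print(ssr_true, ssr_flat)
```

A fitting routine searches for the coefficients that make this objective as small as possible; whatever values it settles on, the predicted shares always stay strictly between 0 and 1.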

I should note that you might have an ecological inference problem in this design, though I can't know for sure unless I see more of the details.


--

Justin Esarey
Assistant Professor of Political Science
Emory University
Office: (404) 727-6583
Fax: (404) 727-4586
E-mail: jesarey@emory.edu
Web Site: http://userwww.service.emory.edu/~jesarey

As I understand it, Geoff has the proportion of the vote won by the
Labor party as the response variable of interest; nothing I've read in
the setup suggests we've got a multi-party setup. The predictors of
interest are the demographic compositions of the districts. The
concern is that linear regression might not be appropriate here.

Logistic regression will at least ensure that the model generates
predictions that are actually proportions. The non-linear model above
will give you just that. Perhaps an easier way to implement this
might be to take the log-odds of the ALP vote proportions, and just
run a regression on those. For vote shares in, say, the 35-65% range,
the logistic transform (or a probit transform) is pretty linear, and
if most of your data is in that range, then garden-variety regressions
on the raw proportions will be good enough.
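A small Python sketch of the two points above: the log-odds transform of a vote share, and how close that transform is to a straight line in the middle of the range. The comparison line used here, 4*(p - 0.5), is the tangent to the logit at p = 0.5; it and the example shares are choices made for illustration, not part of the original suggestion.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))

def tangent_at_half(p):
    """Straight line touching the logit curve at p = 0.5 (slope 4)."""
    return 4.0 * (p - 0.5)

# Illustrative vote shares: a middle range and one lopsided district.
middle_shares = [0.35, 0.45, 0.50, 0.55, 0.65]
lopsided_share = 0.90

max_gap_middle = max(abs(logit(p) - tangent_at_half(p)) for p in middle_shares)
gap_lopsided = abs(logit(lopsided_share) - tangent_at_half(lopsided_share))

print(f"largest gap in 35-65% range: {max_gap_middle:.3f}")
print(f"gap at 90%: {gap_lopsided:.3f}")
```

Within the 35-65% range the logit departs from the straight line by roughly 0.02 on the log-odds scale, against about 0.6 at a 90% share. To use the transform in practice, one would regress logit(vote share) on the district predictors and pass fitted values through `inv_logit` to get back to proportions.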

Yes, there is an ecological inference issue here, particularly since
Geoff's email suggests that he really is interested in recovering
estimates of the Labor vote shares within social groups. It might be
worth thinking about how to make the problem conform to one of the EI
type approaches available in software out there.

Another interesting issue presented by these data concerns the
functional form linking the covariates to the proportions. There
may be interesting non-linearities and interactions at work here; as a
given social group reaches some threshold proportion relative to other
groups, its support might "jump" or display some kind of non-linearity
with respect to Labor's raw proportions of the vote (or even with respect
to the logits of the Labor vote shares); what I have in mind here are the
parties and the candidates engaging in some rational allocation of
resources (e.g., anti-Labor "runs dead" in working class strongholds),
leading to over/under-performance relative to the social composition
of the district. Compulsory voting in the Australian context might
smooth some of this out, at least on the voter mobilization/turnout
side, but it might be something to look at.

regards

-- simon jackman

Professor Simon Jackman,
Jan-August 2009
Visiting Professor,
United States Studies Centre
University of Sydney, NSW 2006
Australia
+61 2 9036 9208 (w)
+61 401 620 725 (m)

Depts of Political Science & (by courtesy) Statistics,
Stanford University, Stanford, CA 94305-6044, USA.
http://jackman.stanford.edu
