Thursday, May 21, 2009

Displaying regression coefficients and standardized partial regression coefficients

It may be that I have misinterpreted a few things, but here is my question
in two parts. I would greatly appreciate any thoughts, references, ideas,
or insights.



First, given that p-values in a regression only allow us to reject the null
hypothesis that the parameter is less than or equal to (greater than or
equal to) zero, why do we display the regression coefficients (point
estimates), especially since the actual parameter may be very different from
the point estimate arrived at by the regression? Yes, the point estimate is
our “best guess” as to the influence of the parameter, but doesn’t displaying
the coefficient suggest a degree of accuracy that we do not have? Wouldn’t
it be better to simply display a +(-) and the p-value? Or perhaps the 95%
confidence interval of the point estimate? Removing the point estimates of
the explanatory variables from our tables would highlight the degree of
uncertainty that exists about the “true” influence of the explanatory
variables being studied. I know that the reported regression coefficient is
nothing more than a point estimate, but do most readers really understand the
degree of uncertainty behind it?
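
As a rough illustration of the alternative raised above (reporting an interval rather than a bare point estimate), the following Python sketch fits a one-predictor least-squares line and returns the slope together with an approximate 95% confidence interval. The function name, the toy data, and the normal critical value 1.96 are all assumptions made for illustration; with small samples, a t critical value with n - 2 degrees of freedom would give a somewhat wider interval.

```python
import math
import statistics


def slope_with_interval(x, y, z=1.96):
    """Return (slope, standard error, (lower, upper)) for y = a + b*x.

    z=1.96 is the normal approximation to a 95% interval (an assumption).
    """
    n = len(x)
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, (slope - z * se, slope + z * se)


# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
slope, se, (lower, upper) = slope_with_interval(x, y)
print(f"slope = {slope:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

A table built this way would report the interval bounds in place of, or alongside, the single slope value.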



Second, if we are going to display regression coefficients, why isn’t it
standard practice to (also) display standardized regression coefficients
(standardized partial regression coefficients)? Standardized regression
coefficients are harder to interpret in terms of, say, “an X increase in GDP
decreases child mortality by Y,” or “an X increase in a country’s polity score
increases foreign direct investment by Y,” but they often allow a direct and
clear comparison of the influence of the different causal variables being
utilized in a model. If we display regression coefficients because we are
interested in the degree to which a variable influences the outcome being
studied, doesn’t it make more sense to let the reader know the interval of
influence we would normally expect to see by standardizing the coefficients?
Without knowing the standard deviation of the response variable, it is
difficult to know the degree of influence that the explanatory variables
have. When standardized coefficients are presented, the reader is much less
likely to confuse a large coefficient with a strong effect or a small
coefficient with a weak effect. Of course, standardized regression
coefficients should be used with care, especially when sampling error or
multicollinearity may inflate standard errors or when the variables being
compared have nonnormal distributions. I have read Fox’s two pages on the
subject, but was left unconvinced that using standardized partial regression
coefficients is not a generally better approach to conveying information.
Hence, any insights or references on this subject would be greatly
appreciated.
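
To make the scale issue above concrete, here is a minimal Python sketch, assuming a single predictor and invented data. It shows that the raw slope changes by a factor of 1000 when GDP is measured in thousands of dollars instead of dollars, while the standardized slope (raw slope times sd(x) / sd(y)) stays the same. The variable names and numbers are hypothetical. In a multiple regression the same rescaling would be applied to each partial coefficient.

```python
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def standardized_coefficient(b, x, y):
    """Rescale a raw coefficient b to standard-deviation units."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical data, invented for illustration only.
gdp_dollars = [1200.0, 2500.0, 4100.0, 8000.0, 15000.0]
gdp_thousands = [g / 1000.0 for g in gdp_dollars]
child_mortality = [95.0, 70.0, 52.0, 30.0, 12.0]

b_dollars = ols_slope(gdp_dollars, child_mortality)
b_thousands = ols_slope(gdp_thousands, child_mortality)
beta_dollars = standardized_coefficient(b_dollars, gdp_dollars, child_mortality)
beta_thousands = standardized_coefficient(b_thousands, gdp_thousands, child_mortality)

print(b_dollars, b_thousands)        # raw slopes depend on the unit of GDP
print(beta_dollars, beta_thousands)  # standardized slopes do not
```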



Finally, is there any particular difficulty in interpreting the standardized
partial regression coefficients of variables included in an interaction term?
Is it possible to simply generate 3D graphics just like you would for the
unstandardized coefficients?



Cheers and thanks in advance,

Anthony





-----------------------------------------------

Anthony A. Pezzola
apezzola@uc.cl
(02) 354-7823
Profesor de Ciencia Política
Instituto de Ciencia Política
Pontificia Universidad Católica de Chile
Santiago de Chile

Anthony,

I suspect that journals' editorial practices play an important role in
compelling authors to present results in the conventional form.

I had one experience where I submitted an article that presented
results from a logistic regression model in graphical form, showing
the 95% CIs associated with the estimated odds ratios for a set of
binary variables (membership in a variety of organizations). This was
about as simple a case as you could get, because the original
estimates were all directly comparable to one another, and I still
included a table showing the results in the industry-standard way in
an appendix. Nevertheless, the one significant change the editors
asked me to make before publication was to dump the chart and replace
it with the table. They didn't explicitly say why.

-Jay

Anthony,

On the problem with standardized betas, see

King, Gary, "How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science," American Journal of Political Science 30:3 (August, 1986) 666-687.

Best, Anne

Anne E. Sartori
Associate Professor of Political Science and (by courtesy) of Managerial Economics and Decision Sciences
Northwestern University
a-sartori@kellogg.northwestern.edu
(847) 491-4017

Anthony (regards from Seattle!)

All of this is good advice, but don't forget that journal editors can also learn new tricks. So don't stop trying to present your work in the most compelling visual manner, even when they might want yet another regression table...


Michael D. Ward, Professor of Political Science
University of Washington, Seattle, WA, 98195-3530, USA

direct: 206.616.3583 (email is better)
messages: 206.543.2780; fax: 206.685.2146
web site: faculty.washington.edu/mdw

Hi all,

The issue about journal editors in political science is a serious one:
why would we learn how to program detailed graphical displays if, in the
end, what they want us to show are difficult-to-interpret tables? After all,
these graphs need careful thinking and some programming experience, while
just saying that coefficient beta one is significant does not require much
effort....

How can we persuade people - including graduate students like me - to learn
these smart and "conscious" ways to present data and results if referees
don't allow us to use them?

All the best,

Antonio.

I don't disagree with Larry. Visuals are conveyed quickly; tables can be studied carefully for detail.
Where details are important (hopefully in much scholarly work) tables will be essential, even if they are
relegated to appendices. If you wanted to know life expectancies for example, a table is perfect,
whereas a visual portrayal won't have the detailed information.

I was simply arguing (perhaps too obliquely) that relying on contemporary standards in state of the art
journals may not be the best approach to presenting your own work. Someone had to be the first to
present a density plot, for example, in the APSR.

My own opinion is that the APSR standard for regression tables consumes far too many column inches.
Every year I give a quiz to my graduate students to tell me the value of a single coefficient they
remember reading during the previous quarter. So far my record is "perfect".

Michael D. Ward, Professor of Political Science
University of Washington, Seattle, WA, 98195-3530, USA

direct: 206.616.3583 (email is better)
messages: 206.543.2780; fax: 206.685.2146
web site: faculty.washington.edu/mdw

Hi all,

In an article published in *Perspectives on Politics* in 2007, Eduardo Leoni
and I argue that in most cases, graphs of regression results can more
effectively present all the information contained in a standard regression
table, and can do so using a similar amount of space. We illustrate this by
converting a few published tables into regression graphs.

The paper is available at
http://www.columbia.edu/~jpk2004/graphs.pdf;
there is also a web site accompanying the paper at
http://tables2graphs.com/doku.php that contains replication code.

Best,

John Kastellec

logistic regression on aggregate data

I have recently been engaged in an analysis of Australian electoral behaviour during the Great Depression. I have developed a database of the social composition of electoral districts (4 mutually exclusive groups: Catholic manual workers, Protestant manual workers, employers + self-employed, white-collar employees) and then used linear regression to develop estimates of levels of left party support among different social groups. The analysis is reported in a 2005 paper at SSHA and in my recent book, When the Labor Party Dreams: Class, Politics and Policy in New South Wales 1930-32. However, as electoral behaviour is highly polarised by class, I suspect that a linear model overstates levels of left party support among workers and Catholics, the same problem that arises in the use of linear regression to analyse African-American voting. Thus I am attempting to use logistic regression.

However, I am at a loss to work out how to do this. Every source I can find on logistic regression refers to its use to analyse individual data. Can anyone point me in the direction of a how-to manual (preferably using SPSS), or offer some tips, that explains the use of logistic regression to estimate individual voting behaviour from aggregate data?

Geoff Robinson

Dr Geoffrey Robinson
Lecturer in History & Politics
Faculty of Arts & Education
Deakin University
Pigdons Rd
Geelong
VIC 3217
Australia
Tel: +61 3 5227 1452
Fax: +61 3 5227 3380
Email: geoffrey.robinson@deakin.edu.au
Web: http://geoffrobinson.info
CRICOS Provider Code 00113B

I don't know your book or your dataset, but fitting a logistic regression is
very simple, and the output is easier to read than that of a linear
regression with multiple variables.
I don't know of any manual for doing this in SPSS; I have one for Stata and
R. But I think you can find more about it on the SPSS manuals site.
If you want, you can explain in more detail what your variables are.

Daniel
University of Brasilia

Dr. Robinson,

Political Analysis, Vol. 10, No. 1 has several articles plus
references to some others on analyzing multi-party aggregate data.
The seemingly unrelated regression version in several of the articles
is quite easy to set up in Stata. I will be glad to offer advice if
you wish.

Good luck with your project.

John Jackson

Dear Geoff,

I think you should use an ecological inference method; unfortunately
there are not any ecological inference methods in SPSS.

For more information about ecological inference you should visit
http://gking.harvard.edu/projects/ecinf.shtml

Ioannis Andreadis
Lecturer
Laboratory of Applied Political Research
Department of Political Sciences
Aristotle University Thessaloniki
46 Egnatias Str.,Thessaloniki
54625 Hellas (Greece)
Tel: +302310991992, +302310991950
Fax: +302310991983

Geoff,

If I understand your question correctly and you have access to Stata, you should be able to estimate this model using the nl command, like so:

nl (vote = 1/(1 + exp(-({b1}*x1 + {b2}*x2 + {b3}*constant))))

The "vote" variable should vary between 0 and 1, as the proportion of left party support. You'll need to create the "constant" variable manually as well (a variable equal to 1 for every observation). You can use the "help nl" command to get more details.

I should note that you might have an ecological inference problem in this design, though I can't know for sure unless I see more of the details.


--

Justin Esarey
Assistant Professor of Political Science
Emory University
Office: (404) 727-6583
Fax: (404) 727-4586
E-mail: jesarey@emory.edu
Web Site: http://userwww.service.emory.edu/~jesarey

As I understand it, Geoff has the proportion of the vote won by the
Labor party as the response variable of interest; nothing I've read in
the setup suggests we've got a multi-party setup. The predictors of
interest are the demographic compositions of the districts. The
concern is that linear regression might not be appropriate here.

Logistic regression will at least ensure that the model generates
predictions that are actually proportions. The non-linear model above
will give you just that. Perhaps an easier way to implement this
might be to take the log-odds of the ALP vote proportions, and just
run a regression on those. For vote shares in, say, the 35-65% range,
the logistic transform (or a probit transform) is pretty linear, and
if most of your data is in that range, then garden-variety regressions
on the raw proportions will be good enough.
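
The log-odds approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the district shares are invented, there is a single predictor, and the helper names are hypothetical. The sketch regresses the logit of the vote share on the predictor, back-transforms predictions so they stay strictly between 0 and 1, and compares the logit with its tangent line at 0.5 to show why raw proportions behave almost linearly in the middle range. It works at the district level and does not address the ecological inference problem.

```python
import math
import statistics


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map a log-odds value back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(x, y):
    """Least-squares (intercept, slope) of y on a single predictor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical district-level data, invented for illustration.
manual_share = [0.20, 0.35, 0.50, 0.65, 0.80]  # share of manual workers
labor_share = [0.30, 0.41, 0.52, 0.66, 0.81]   # Labor share of the vote

# Ordinary regression on the log-odds of the vote share.
intercept, slope = fit_line(manual_share, [logit(p) for p in labor_share])


def predict_share(x):
    """Predicted vote share, back-transformed from the log-odds scale."""
    return inv_logit(intercept + slope * x)


# Gap between the logit and its tangent line at p = 0.5, which is 4*(p - 0.5).
gap_mid = max(abs(logit(p) - 4.0 * (p - 0.5)) for p in (0.35, 0.45, 0.55, 0.65))
gap_tail = abs(logit(0.95) - 4.0 * (0.95 - 0.5))
print(predict_share(0.0), predict_share(1.0))  # both stay inside (0, 1)
print(gap_mid, gap_tail)  # small in the middle range, large in the tail
```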

Yes, there is an ecological inference issue here, particularly since
Geoff's email suggests that he really is interested in recovering
estimates of the Labor vote shares within social groups. It might be
worth thinking about how to make the problem conform to one of the EI
type approaches available in software out there.

Another interesting issue presented by these data could be the
functional form linking the covariates to the proportions. There
may be interesting non-linearities and interactions at work here; as a
given social group reaches some threshold proportion relative to other
groups, its support might "jump" or display some kind of non-linearity
with respect to Labor's raw proportions of the vote (or even with respect
to the logits of the Labor vote shares). What I have in mind here are the
parties and the candidates engaging in some rational allocation of
resources (e.g., anti-Labor "runs dead" in working class strongholds),
leading to over/under-performance relative to the social composition
of the district. Compulsory voting in the Australian context might
smooth some of this out, at least on the voter mobilization/turnout
side, but it might be something to look at.

regards

-- simon jackman

Professor Simon Jackman,
Jan-August 2009
Visiting Professor,
United States Studies Centre
University of Sydney, NSW 2006
Australia
+61 2 9036 9208 (w)
+61 401 620 725 (m)

Depts of Political Science & (by courtesy) Statistics,
Stanford University, Stanford, CA 94305-6044, USA.
http://jackman.stanford.edu