
Fishing In The Bay A blog by Chris Lloyd on "Statistical musings from an antipodean perspective"

21st October 2009, 02:35 PM
Frequentists and prior information

Here is another post about wrong-headed justifications of Bayesian over frequentist statistics. As suggested by David Dowe in his comment on my previous post, it is worth pointing out at the beginning rather than the end that nowhere below will you find an argument against Bayesian statistics per se (though I think there are some).

In the previous post I mentioned that there are two claims that (some) Bayesians make about their approach that get me annoyed. The first is that Bayesian thinking is natural and people will naturally apply probability to unknowns if not brain-washed by a frequentist education. The second is that only Bayesians, and not frequentists, can make use of prior information. Wrong.



I claim that frequentists can include prior information in a very similar manner to Bayesians. It might clarify things to consider a single parameter problem, so as not to get into the main difference between the two paradigms, which to my mind is how Bayesians can integrate out nuisance parameters.

Imagine I observe x=50 successes from n=100 trials. Frequentists need to specify a data generating mechanism (DGM) and a log-likelihood function (which follows from the DGM) to complete an inference. My log-likelihood can be written down as
x log(p) + (n-x) log(1-p)
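
As a quick numerical check, the log-likelihood above can be evaluated on a grid of p values. This is only a minimal sketch in Python; the function name `binom_loglik` and the grid spacing are illustrative choices, not part of the original argument.

```python
import math

def binom_loglik(p, x=50, n=100):
    """Binomial log-likelihood x*log(p) + (n-x)*log(1-p), constant term dropped."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

# Evaluate on a grid strictly inside (0, 1) to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=binom_loglik)
print(p_hat)  # grid maximiser; the analytic ML estimate is x/n
```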

The sampling distribution of x is binomial. How is prior information about p to be included?

In the best case scenario, I go to the authors of the published study about p and obtain their log-likelihood function, perhaps from the raw data and model if necessary. I then add their log-likelihood to my own. I know the distribution of my data. I know the distribution of their data. So I know the distribution of the multiplied likelihoods (in principle anyway). No problem at all. Prior incorporated.

Say what you like about frequentist inference being good or bad – but I can clearly include the previous knowledge. Indeed, whenever a sample can be divided into two parts, you can consider the full likelihood as being generated from the first chunk of data, updated by the second chunk given the first. So frequentists do include “prior” information every time they analyse a time series.

OK. So what if you don’t have access to the previous study, but instead just have an estimate of p and a standard error, for instance phat=0.4 with standard error 0.05. If the estimate can be assumed to have come from a binomial experiment then we could solve for x and n and conclude that x=38.4 and n=96. So we have a slight problem right away with a fractional x. Maybe our estimate of 0.4 was rounded. We might make it x=38 out of n=95 which slightly errs on the side of conservatism – since it gives a slightly higher standard error. From here on, our likelihood and frequentist inference becomes that which follows from x=50+38 successes from 195 trials.
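
The arithmetic in the paragraph above can be sketched as follows. This assumes the reported standard error has the usual binomial form sqrt(phat(1-phat)/n); the helper name `implied_binomial` is hypothetical and only there for illustration.

```python
import math

def implied_binomial(p_hat, se):
    """Back out (x, n) from an estimate and its standard error,
    assuming se = sqrt(p_hat * (1 - p_hat) / n) as for a binomial proportion."""
    n = p_hat * (1 - p_hat) / se ** 2
    return p_hat * n, n

x_prior, n_prior = implied_binomial(0.4, 0.05)   # about 38.4 and 96

# Round to whole numbers as in the text: 38 successes out of 95 trials.
x0, n0 = 38, 95
se0 = math.sqrt((x0 / n0) * (1 - x0 / n0) / n0)  # slightly above 0.05

# Pool with the current data, x = 50 out of n = 100.
x_total, n_total = 50 + x0, 100 + n0
p_pooled = x_total / n_total
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) / n_total)
print(round(x_prior, 1), round(n_prior, 1), round(se0, 4),
      round(p_pooled, 4), round(se_pooled, 4))
```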

So now to the more realistic case – that we just have the estimate and standard error, perhaps not even from a single study but from a Cochrane meta-analysis. So we actually have imperfect prior information, from a frequentist point of view. The estimate and standard error tell us about the location and curvature of the likelihood that led to the estimate. We might thus approximate this likelihood by a normal likelihood term
-200(p-0.4)^2

and then just add this to our own log-likelihood. You will have great difficulty in distinguishing a normal from a binomial prior log-likelihood. The ML estimate we obtain that incorporates prior information will no longer just be 88/195 but it differs from this by only a little – theoretically a second order term. The standard error from the joint likelihood also differs from the inverse-variance weighted standard error by a second order term.
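
To see how small these differences are, here is a rough sketch that maximises the joint log-likelihood (binomial term plus the normal term with coefficient 1/(2 x 0.05^2) = 200) by bisection on its derivative. The standard error is taken from the observed information at the maximum, which is one common convention and an assumption here; all variable names are illustrative.

```python
import math

X, N = 50, 100            # current data
P0, SE0 = 0.4, 0.05       # prior estimate and standard error

def score(p):
    """Derivative of x*log(p) + (n-x)*log(1-p) - (p-P0)^2 / (2*SE0^2)."""
    return X / p - (N - X) / (1 - p) - (p - P0) / SE0 ** 2

def observed_info(p):
    """Negative second derivative of the same joint log-likelihood."""
    return X / p ** 2 + (N - X) / (1 - p) ** 2 + 1 / SE0 ** 2

# Bisection: the score is strictly decreasing on (0, 1), so it has a single root.
lo, hi = 1e-6, 1 - 1e-6
for _ in range(100):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
p_joint = (lo + hi) / 2
se_joint = 1 / math.sqrt(observed_info(p_joint))

# Comparison points mentioned in the text.
p_pooled = 88 / 195
w_data, w_prior = N / (0.5 * 0.5), 1 / SE0 ** 2   # inverse variances
se_ivw = 1 / math.sqrt(w_data + w_prior)
print(round(p_joint, 4), round(p_pooled, 4), round(se_joint, 4), round(se_ivw, 4))
```

Both the estimate and the standard error should land within a few thousandths of their simpler counterparts.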

But what is the DGM, you ask? How can we claim any frequentist properties for this ML estimator? We know that most estimates are asymptotically normal so we might argue that the prior Cochrane estimate is generated by a DGM which is very close to normal with standard deviation close to the standard error. To first order, you actually don’t have to worry about exactly what it is. It is approximately normal. But if you formally assume that it is exactly normal (with mean p and standard deviation 0.05) then you can even do an exact frequentist likelihood inference. The full DGM is a combination of a continuous and discrete component. But this will only become important if we want to do second order or exact inference.
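
One way to make the assumed DGM concrete is to simulate from it: draw x from a binomial and the prior estimate y from a normal with mean p and standard deviation 0.05, then look at the sampling distribution of the joint ML estimator. The sketch below does this with the Python standard library. The true value p=0.45, the number of replications and the seed are arbitrary assumptions made purely for illustration.

```python
import math
import random

def joint_mle(x, n, y, se):
    """Maximise x*log(p) + (n-x)*log(1-p) - (y-p)^2/(2*se^2) by bisection on the score."""
    def score(p):
        return x / p - (n - x) / (1 - p) + (y - p) / se ** 2
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(60):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simulate(p_true=0.45, n=100, se=0.05, reps=2000, seed=1):
    """Draw (x, y) repeatedly from the assumed DGM and return the joint ML estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = sum(rng.random() < p_true for _ in range(n))   # discrete (binomial) component
        y = rng.gauss(p_true, se)                          # continuous (normal) component
        estimates.append(joint_mle(x, n, y, se))
    return estimates

est = simulate()
mean_est = sum(est) / len(est)
sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in est) / (len(est) - 1))
print(round(mean_est, 3), round(sd_est, 3))
```

Under these assumptions the simulated mean should sit close to the true p and the simulated standard deviation close to the first-order standard error of about 0.035.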

You might further refine the prior model term by allowing for the variance to differ with the true value like a binomial does. You can do this by replacing the standard deviation by the square root of p(1-p)/95, perhaps rescaled to equal the standard error of 0.05 when p=0.4. This gives an extra term depending on p in the total log-likelihood, and leads to a slightly different final inference because the variance weights are slightly different.
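
A sketch of this refinement, under one possible reading of the paragraph above: the prior term is a normal log-density for the estimate 0.4 whose standard deviation is 0.05 x sqrt(p(1-p)/(0.4 x 0.6)), so that it equals 0.05 at p=0.4. The function names and the grid search are illustrative assumptions.

```python
import math

X, N = 50, 100
P0, SE0 = 0.4, 0.05

def prior_sd(p):
    """Binomial-shaped standard deviation, rescaled to equal SE0 at p = P0."""
    return SE0 * math.sqrt(p * (1 - p) / (P0 * (1 - P0)))

def loglik_const(p):
    """Data log-likelihood plus a normal prior term with fixed standard deviation."""
    return X * math.log(p) + (N - X) * math.log(1 - p) - (P0 - p) ** 2 / (2 * SE0 ** 2)

def loglik_refined(p):
    """Same, but the prior term's standard deviation varies with p,
    which adds the extra -log(sd) term to the log-likelihood."""
    sd = prior_sd(p)
    return (X * math.log(p) + (N - X) * math.log(1 - p)
            - math.log(sd) - (P0 - p) ** 2 / (2 * sd ** 2))

grid = [i / 10000 for i in range(100, 9900)]   # p from 0.01 to 0.9899
p_const = max(grid, key=loglik_const)
p_refined = max(grid, key=loglik_refined)
print(p_const, p_refined)
```

The two maximisers differ only in the third decimal place, with the refined version moving towards the pooled binomial answer 88/195.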

So the only difficulty I see for frequentists including prior information is in small sample problems where one wants exact, i.e. non-asymptotic, inference, and where you do not know how the prior information was generated. Formally, frequentists have to approximate not only the prior likelihood term but also a DGM for it. Whereas in the Bayesian paradigm it is sufficient just to specify the prior log-likelihood term (without a DGM) and then proceed automatically to the “exact” inference, i.e. compute the posterior distribution with barely a moment’s pause.

The process of thinking through how I would include prior information is actually useful I think. In having to invent a log-likelihood and a DGM, I all of a sudden wonder if I should really be making all this shit up! Perhaps I should analyse the present data to see what it says. If someone wants to combine this with previous information then they can do a random effects meta-analysis later and separately.

Just to finish on a light note, it seems that the Bayesian conspiracy has even got to Bill Gates. The thing that really pisses me off is that Microsoft Word underlines the word frequentist as a spelling mistake but not Bayesian!



All times are GMT +11.
