Workgroup: Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS
Admin
https://www.goodreads.com/book/show/9003187-doing-bayesian-data-analysis
http://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0123814855

Free ebook.

It is a textbook for doing Bayesian statistics with R. Bayesian methods are probably superior to the old null hypothesis/frequentist-based approach, and if not, it is good to know both. It has very good reviews and the 1st edition is on Libgen. There is a 2nd edition just out, but not on Libgen yet.

I have started working my way thru this book. I will do most or all of the exercises. If anyone wants to go thru it with me, say so here. Then we can work out exercises and cross-verify.

The source code given in the book is here: http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/
Admin
Exercise 3.1. [Purpose: To give you some experience with random number generation in R.] Modify
the coin flipping program in Section 3.5.1 (RunningProportion.R) to simulate a biased coin
that has p(H) = .8. Change the height of the reference line in the plot to match p(H).
Comment your code. Hint: Read the help for sample.


# Goal: Toss a coin N times and compute the running proportion of heads.
N = 500 # Specify the total number of flips, denoted N.
# Generate a random sample of N flips for a fair coin (heads=1, tails=0);
# the function "sample" is part of R:
#set.seed(47405) # Uncomment to set the "seed" for the random number generator.
flipsequence = sample( x=c(0,1) , prob=c(.2,.8) , size=N , replace=TRUE )
# Compute the running proportion of heads:
r = cumsum( flipsequence ) # The function "cumsum" is built in to R.
n = 1:N # n is a vector.
runprop = r / n # component by component division.
# Graph the running proportion:
# To learn about the parameters of the plot function,
# type help('par') at the R command prompt.
# Note that "c" is a function in R.
plot( n , runprop , type="o" , log="x" ,
xlim=c(1,N) , ylim=c(0.0,1.0) , cex.axis=1.5 ,
xlab="Flip Number" , ylab="Proportion Heads" , cex.lab=1.5 ,
main="Running Proportion of Heads" , cex.main=1.5 )
# Plot a dotted horizontal line at y=.8, matching the biased p(H), as a reference line:
lines( c(1,N) , c(.8,.8) , lty=3 )
# Display the beginning of the flip sequence. These string and character
# manipulations may seem mysterious, but you can de-mystify by unpacking
# the commands starting with the innermost parentheses or brackets and
# moving to the outermost.
flipletters = paste( c("T","H")[ flipsequence[ 1:10 ] + 1 ] , collapse="" )
displaystring = paste( "Flip Sequence = " , flipletters , "..." , sep="" )
text( 5 , .6 , displaystring , adj=c(0,1) , cex=1.3 )
# Display the relative frequency at the end of the sequence.
text( N , .3 , paste("End Proportion =",runprop[N]) , adj=c(1,0) , cex=1.3 )


Exercise 3.2. [Purpose: To have you work through an example of the logic presented in Section 3.2.1.2.]
Determine the exact probability of drawing a 10 from a shuffled pinochle deck. (In a
pinochle deck, there are 48 cards. There are six values: 9, 10, Jack, Queen, King, Ace.
There are two copies of each value in each of the standard four suits: hearts, diamonds,
clubs, spades.)
(A) What is the probability of getting a 10?
(B) What is the probability of getting a 10 or Jack?


The section reference in the exercise appears to be misspecified.

p(ten) = 8/48 = 1/6
Drawing a 10 and drawing a Jack are mutually exclusive events, so: p(ten or jack) = p(ten) + p(jack)
p(jack) = 8/48 = 1/6
So, p(ten or jack) = 1/6 + 1/6 = 2/6 = 1/3.
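For anyone double-checking without cards handy, the deck is small enough to enumerate outright; a quick Python sketch (names are mine):

```python
# Sanity check of Exercise 3.2 by enumerating the 48-card pinochle deck.
values = ["9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
# Two copies of each value in each suit: 6 * 4 * 2 = 48 cards.
deck = [(v, s) for v in values for s in suits for _ in range(2)]

p_ten = sum(1 for v, s in deck if v == "10") / len(deck)
p_ten_or_jack = sum(1 for v, s in deck if v in ("10", "J")) / len(deck)

print(len(deck), p_ten, p_ten_or_jack)  # 48, then 1/6 and 1/3 as decimals
```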

Exercise 3.3. [Purpose: To give you hands-on experience with a simple probability density function, in
R and in calculus, and to re-emphasize that density functions can have values larger than 1.] Consider a spinner with a [0,1] scale on its circumference. Suppose that the spinner is slanted or
magnetized or bent in some way such that it is biased, and its probability density function
is p(x) = 6x(1 − x) over the interval x ∈ [0, 1].
(A) Adapt the code from Section 3.5.2 (IntegralOfDensity.R) to plot this density function and approximate its integral. Comment your code. Be careful to consider values of
x only in the interval [0, 1]. Hint: You can omit the first couple lines regarding meanval
and sdval, because those parameter values pertain only to the normal distribution. Then set
xlow=0 and xhigh=1.
(B) Derive the exact integral using calculus. Hint: See the example, Equation 3.7.
(C) Does this function satisfy Equation 3.3?
(D) From inspecting the graph, what is the maximal value of p(x)?


A

# Graph of normal probability density function, with comb of intervals.
#meanval = 0.0 # Specify mean of distribution.
#sdval = 0.2 # Specify standard deviation of distribution.
xlow = 0 # Specify low end of x-axis.
xhigh = 1 # Specify high end of x-axis.
dx = 0.02 # Specify interval width on x-axis
# Specify comb points along the x axis:
x = seq( from = xlow , to = xhigh , by = dx )
# Compute y values, i.e., probability density at each value of x:
#y = ( 1/(sdval*sqrt(2*pi)) ) * exp( -.5 * ((x-meanval)/sdval)^2 ) norm dist function
y = 6*x*(1-x) # the density function; remember the explicit * for multiplication
# Plot the function. "plot" draws the intervals. "lines" draws the bell curve.
plot( x , y , type="h" , lwd=1 , cex.axis=1.5
, xlab="x" , ylab="p(x)" , cex.lab=1.5
, main="Probability Density of 6x(1 − x)" , cex.main=1.5 ) #change title
lines( x , y )
# Approximate the integral as the sum of width * height for each interval.
area = sum( dx * y )
# Display info in the graph.
#text( -sdval , .9*max(y) , bquote( paste(mu ," = " ,.(meanval)) )
# , adj=c(1,.5) )
#text( -sdval , .8*max(y) , bquote( paste(sigma ," = " ,.(sdval)) )
# , adj=c(1,.5) )
text( 0 , .9*max(y) , bquote( paste(Delta , "x = " ,.(dx)) )
, adj=c(0,.5) )
text( .8 , .9*max(y) ,
bquote(
paste( sum(,x,) , " " , Delta , "x p(x) = " , .(signif(area,3)) )
) , adj=c(0,.5) )


B
Attached.
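In case the attachment goes missing, the integral in B is short:

```latex
\int_0^1 6x(1-x)\,dx
  = \int_0^1 \left(6x - 6x^2\right) dx
  = \Bigl[\,3x^2 - 2x^3\,\Bigr]_0^1
  = 3 - 2
  = 1
```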

C
Yes, the sum is 1. WolframAlpha agrees.

D
Just type max(y), and one gets 1.5.
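C and D are also easy to cross-check outside R with a midpoint sum (grid width is my own choice):

```python
# Midpoint Riemann sum of p(x) = 6x(1-x) on [0,1], plus the peak density.
dx = 0.001
xs = [(i + 0.5) * dx for i in range(1000)]  # interval midpoints
ys = [6 * x * (1 - x) for x in xs]

area = sum(y * dx for y in ys)  # should come out very close to 1
peak = max(ys)                  # close to 1.5, attained near x = 0.5
print(area, peak)
```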

Exercise 3.4. [Purpose: To have you use a normal curve to describe beliefs. It’s also handy to know
the area under the normal curve between µ − σ and µ + σ.]
(A) Adapt the code from Section 3.5.2 (IntegralOfDensity.R) to determine (approximately) the probability mass under the normal curve from x = µ − σ to x = µ + σ. Comment
your code. Hint: Just change xlow and xhigh appropriately, and change the text location so
that the area still appears within the plot.
(B) Now use the normal curve to describe the following belief. Suppose you believe that
women’s heights follow a bell-shaped distribution, centered at 162cm with about two-thirds
of all women having heights between 147cm and 177cm.


A
# Graph of normal probability density function, with comb of intervals.
meanval = 0 #mean is 0
sdval = 1 #sd = 1, standard normal distribution
xlow = meanval-sdval # Specify low end of x-axis.
xhigh = meanval+sdval # Specify high end of x-axis.
dx = 0.01 # Specify interval width on x-axis
# Specify comb points along the x axis:
x = seq( from = xlow , to = xhigh , by = dx )
# Compute y values, i.e., probability density at each value of x:
y = ( 1/(sdval*sqrt(2*pi)) ) * exp( -.5 * ((x-meanval)/sdval)^2 ) #norm dist function
# Plot the function. "plot" draws the intervals. "lines" draws the bell curve.
plot( x , y , type="h" , lwd=1 , cex.axis=1.5
, xlab="x" , ylab="p(x)" , cex.lab=1.5
, main="Normal Probability Density Function" , cex.main=1.5 ) #change title
lines( x , y )
# Approximate the integral as the sum of width * height for each interval.
area = sum( dx * y )
# Display info in the graph.
text( -sdval*.8 , .38 , bquote( paste(mu ," = " ,.(meanval)) )
, adj=c(1,.5) )
text( -sdval*.8 , .35 , bquote( paste(sigma ," = " ,.(sdval)) )
, adj=c(1,.5) )
text( sdval*.6 , .38 , bquote( paste(Delta , "x = " ,.(dx)) )
, adj=c(0,.5) )
text( sdval*.6 , .35 ,
bquote(
paste( sum(,x,) , " " , Delta , "x p(x) = " , .(signif(area,2)) )
) , adj=c(0,.5) )


B
Just change the mean to 162 and the sd to 15. This value for the SD is unrealistic; it is really something like 5-7 cm. This source gives 6.1 cm as the average of 10 European countries.

Exercise 3.5. [Purpose: Recognize and work with the fact that Equation 3.11 can be solved for the
conjoint probability, which will be crucial for developing Bayes’ theorem.] School children were
surveyed regarding their favorite foods. Of the total sample, 20% were 1st graders, 20%
were 6th graders, and 60% were 11th graders. For each grade, the following table shows
the proportion of respondents that chose each of three foods as their favorite:
Ice Cream Fruit French Fries
1st Graders .3 .6 .1
6th Graders .6 .3 .1
11th Graders .3 .1 .6
From that information, construct a table of conjoint probabilities of grade and favorite food.
Also, say whether grade and favorite food are independent or not, and how you ascertained the answer. Hint: You are given p(grade) and p(food|grade). You need to determine
p(grade,food).


See attached file.

They are not independent because the posterior probabilities are not identical to the prior probabilities, which is the definition of independence given in Section 3.4.3.
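In case the attached file goes missing, the conjoint table is easy to rebuild; a Python sketch (labels are mine):

```python
# Conjoint table p(grade, food) = p(grade) * p(food | grade),
# plus an independence check against p(grade) * p(food).
p_grade = {"1st": 0.2, "6th": 0.2, "11th": 0.6}
p_food_given_grade = {
    "1st":  {"ice cream": 0.3, "fruit": 0.6, "fries": 0.1},
    "6th":  {"ice cream": 0.6, "fruit": 0.3, "fries": 0.1},
    "11th": {"ice cream": 0.3, "fruit": 0.1, "fries": 0.6},
}

conjoint = {(g, f): p_grade[g] * p
            for g, row in p_food_given_grade.items() for f, p in row.items()}

# Marginal p(food), summing over grades:
p_food = {}
for (g, f), p in conjoint.items():
    p_food[f] = p_food.get(f, 0) + p

total = sum(conjoint.values())  # should be 1
independent = all(abs(conjoint[(g, f)] - p_grade[g] * p_food[f]) < 1e-9
                  for (g, f) in conjoint)
print(total, p_food, independent)  # independent comes out False
```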
I'd like to participate.

What's the plan? Complete exercises along a prearranged schedule?

Best,
-Ct
Admin
Work as fast as possible. :) I'll probably be doing a chapter + exercises every day to every other day. I am working on exercises for chapter 4 now.
Admin
4.1

This is the famous doctor problem. It has been studied when given to actual doctors. Doctors usually get it wrong when it is given in standard probability form. See: http://yudkowsky.net/rational/bayes

So given by the text: p(Test|Disease)=.99, i.e. given the person has the disease, the test is positive with .99 probability. The true positive rate.
p(Test|¬Disease)=.05, i.e. if the person does not have the disease, the prob. is .05 for a positive test. The false positive rate.
p(Disease)=.001, the base rate of the disease.

And the goal is to find p(Disease|Test). I used the spatial intuition given in Section 4.1.2. First you calculate the values according to the table. And then you normalize them by dividing by the row sum. I did this and got p(Disease|Test)=.01943, or about 2%.

The formula is p(Disease|Test)=p(Test|Disease)*p(Disease)/p(Test), where p(Test) = .99*.001 + .05*.999 = .05094.
Fill in: p(Disease|Test)=.99*.001/.05094=0.0194.
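The arithmetic is easy to cross-check outside the spreadsheet, e.g. in Python (variable names are mine):

```python
# Bayes' rule for the disease-test problem.
p_disease = 0.001       # base rate p(Disease)
p_pos_given_d = 0.99    # true positive rate p(Test | Disease)
p_pos_given_nd = 0.05   # false positive rate p(Test | not Disease)

p_pos = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)
posterior = p_pos_given_d * p_disease / p_pos
print(p_pos, posterior)  # ~0.05094 and ~0.01943
```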

See spreadsheet file.

4.2

See excel. Via formula, with the posterior from 4.1 (.01944) as the new prior: p(Disease|¬Test)=p(¬Test|Disease)*p(Disease)/p(¬Test).
Fill in: p(Disease|¬Test)=.01*.01944/.93173=.00021.
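Same kind of cross-check for the retest, chaining the 4.1 posterior in as the new prior (Python, names mine):

```python
# Retest: the posterior from 4.1 becomes the prior, then update on a negative test.
prior = 0.99 * 0.001 / (0.99 * 0.001 + 0.05 * 0.999)  # ~.01943, from 4.1
p_neg_given_d = 1 - 0.99    # p(negative | Disease) = .01
p_neg_given_nd = 1 - 0.05   # p(negative | no Disease) = .95

p_neg = p_neg_given_d * prior + p_neg_given_nd * (1 - prior)
posterior = p_neg_given_d * prior / p_neg
print(p_neg, posterior)  # ~0.93173 and ~0.00021
```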

4.3

A+B are in spreadsheet .

C is the same, just in other words. The probability is 0.93173 (cell D34).

D also in SS.

4.4

In SS.

4.5

This one I spent hours thinking about. I still cannot understand why my first method did not work, even after reading the solutions manual.

Comments are in SS.

4.6 + 4.7

Fairly easy. Calculate the likelihoods for each θ like in the text and calculate P(3H9T).
3.1

numberOfFlips = 500
set.seed(47404)
flipsequence = sample ( x=c(0,1), prob=c(.2,.8), size=numberOfFlips, replace=TRUE )
r = cumsum ( flipsequence )
n = 1:numberOfFlips
runprop = r / n

plot( n, runprop, type="o", log="x",
xlim=c(1,numberOfFlips), ylim=c(0.0,1.0), cex.axis=1.5,
xlab="Flip Number", ylab="Proportion Heads", cex.lab=1.5,
main="Running Proportion of Heads", cex.main=1.5)

lines( c(1,numberOfFlips), c(.8,.8), lty=3 )

flipletters = paste( c("T","H")[flipsequence[1:10] + 1], collapse="")
displaystring = paste( "Flip Sequence = ", flipletters, "...", sep="")
text( 5, .9, displaystring, adj=c(0,1), cex=1.3 )
text( numberOfFlips, .3, paste("End Proportion =", runprop[numberOfFlips]), adj=c(1,0), cex=1.3)

dev.copy2eps( file = "RunningProportion.eps")


Here is the graph: http://i.imgur.com/h7BzTne.png

Then, out of curiosity I tried another seed: http://i.imgur.com/rQfl3NX.png

I wonder if the author chose that seed because its end proportion is exactly 0.8.

3.2
http://i.imgur.com/vKFrdGP.png

3.3

I no longer have the code I used to answer part A.

B, C, and D are answered in this image: http://i.imgur.com/3IbBp0k.jpg

3.4

Here is the state of my code upon finishing part B. It is in agreement with what Emil said.
meanValue = 162
standardDeviation = 15
lowEndX = meanValue - standardDeviation
highEndX = meanValue + standardDeviation
#lowEndX = 0
#highEndX = 1
dx = 0.02

x = seq( from = lowEndX, to = highEndX, by = dx)

y = (1 / (standardDeviation * sqrt(2 * pi))) * exp(-.5 * ((x - meanValue) / standardDeviation)^2)

#y = 6 * x * (1 - x)

plot( x, y, type="h", lwd=1, cex.axis=1.5
, xlab="x", ylab="p(x)", cex.lab=1.5
, main="Normal Probability Density", cex.main=1.5)

lines( x, y ) # was "line", which fits a resistant line instead of drawing the curve

area = sum( dx * y )

text( -standardDeviation, .9*max(y), bquote( paste( mu, " = ", .(meanValue)) ), adj=c(1,.5) )
text( -standardDeviation, .8*max(y), bquote( paste( sigma, " = ", .(standardDeviation)) ), adj=c(1,.5) )
text( standardDeviation, .9*max(y), bquote( paste(Delta, "x = ", .(dx)) ), adj=c(0,.5) )
text( .5 * - standardDeviation, 1*max(y),
bquote(
paste( sum(,x,), " ", Delta, "x p(x) = ", .(signif(area, 3)) )
), adj=c(0,.5) )

dev.copy2eps( file = "IntegralOfDensity.eps")


And here are the graphs: http://i.imgur.com/y98GlgI.png, http://i.imgur.com/LuRxTA1.png


3.5
http://i.imgur.com/5qggt6n.png

No, they are not independent: p(grade) ≠ p(grade | food).
Admin
The seed only determines which particular sample you get. It does not have much effect on the end proportion when you do 500 flips.

Your 3.2 is not in agreement with mine. There are 8 tens (and likewise 8 jacks), not 6: 2 copies per suit, and 4 suits = 8.

Your text falls outside the plot in the 2nd last graph. You can move it by changing the coordinates in the text commands.

Otherwise, I did not see anything else wrong. Waiting for exercises 4, so we can begin on exercises in 5 together. :)
Just a heads up. I have finished reading chapter 4, but I can't promise I will be able to complete the exercises within 3 days.
Admin
I am reading the Baker book + various papers in the meanwhile.

https://www.goodreads.com/book/show/875481.Race
Here is my work for chapter 4: http://imgur.com/a/9Rr1j

I avoided a few questions.

I don't feel very confident compared to Emil's spreadsheet.
Hey,

Yeah, I'm not reading through this book any more. Nothing against the material (it's great), I just can't commit to it.

Best,
-Ct
Admin
[size=medium]5.1[/size]
A
x= BernBeta(c(4,4),1)

B
x= BernBeta(x,1)

C
x= BernBeta(x,0)

D
x= BernBeta(c(4,4),0)
x= BernBeta(x,1)
x= BernBeta(x,1)

[size=medium]5.2[/size]
A
x = BernBeta(c(1,1),c(rep(1,58),rep(0,42))) #uniform, 58 1's + 42 0's

HDI_95 = [.483-.673]

B
Yes. .50 is in the HDI

C
x = BernBeta(x,c(rep(1,57),rep(0,43))) #prior = posterior from A, 57 1's + 43 0's

HDI_95 = [.506-.642]

D
No, .50 not within the HDI.

[size=medium]5.3[/size]
A
x = BernBeta(c(1,1),c(rep(1,40),rep(0,10)))
HDI_95 = [.677-.893]

Yes. .50 not in HDI.

B
BernBeta(c(1,1),c(rep(1,15),rep(0,35)))
HDI_95 = [.187-.433]

Yes. .50 not in HDI.

[size=medium]5.4[/size]
x = BernBeta(c(.5,.5),c(rep(1,4),0)) #inverted prior, 4 heads of 5

The plot function is a bit broken for this. One has to widen the plot window to see the end value of the HDI. But it is .436-.998.

[size=medium]5.5[/size]
A
"strong" isn't quantified. But perhaps something like beta(50,50) is strong for this kind of belief. Probably, government coins are not exactly fair. Their sides are not the same, so they may have a small bias towards either side. The manufacture process is automatic, so presumably pretty reliable. I haven't seen any obviously unfair official coins. I've played with coin flips before and I didn't see any bias towards either side, so if it exists, it is too small to detect intuitively.

x = BernBeta(c(50,50),c(rep(1,9),0)) #strong prior, 9H out of 10

The predicted probability for the 11th flip is the mean of the posterior distribution. This isn't shown directly, but it's approximately halfway between the HDI ends (.444-.629), so about .536. Or compute it exactly: 59/110 = .5364.

B
This sounds like a cheating coin, but I don't have any experience with them, or with how cheaty they can be. But I don't think it can be 100% cheaty in either direction.

x = BernBeta(c(1.1,1.1),c(rep(1,9),0)) #maybe trick coin, 9H out of 10

[size=medium]5.6[/size]
So we need to do it both for a prior for a fair coin, and one for a trick coin.

x = BernBeta(c(50,50),c(rep(1,15),rep(0,5))) #strong prior, 15H out of 20
x = BernBeta(c(1.1,1.1),c(rep(1,15),rep(0,5))) #maybe trick coin, 15H out of 20

P(D|M_fair) = 1.33e-6
P(D|M_trick) = 3.16e-6

So Bayes' factor is about 2.38 for trick over fair model.
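The p(D) values BernBeta reports can be cross-checked against the closed form P(D|M) = B(z+a, N−z+b)/B(a,b) for z heads in N flips; a Python sketch via log-gamma (function names are mine):

```python
import math

def betaln(a, b):
    # log of the beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def p_data(a, b, z, n):
    # Bernoulli-beta marginal likelihood: B(z+a, n-z+b) / B(a, b)
    return math.exp(betaln(z + a, n - z + b) - betaln(a, b))

p_fair = p_data(50, 50, 15, 20)     # ~1.33e-6
p_trick = p_data(1.1, 1.1, 15, 20)  # ~3.16e-6
print(p_fair, p_trick, p_trick / p_fair)  # Bayes factor ~2.38
```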

[size=medium]5.7[/size]
x = BernBeta(c(100,1),1) #strong head prior, 1H out of 1
x = BernBeta(c(1,100),1) #strong tails prior, 1H out of 1

P(D|M_head)=.99
P(D|M_tail)=.0099

Head model over tail model is .99/.0099 = 100.
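This one checks out exactly: for a single flip, P(D|M) is just the prior predictive probability of heads, i.e. the prior mean a/(a+b). A one-liner check in Python:

```python
# Prior predictive probability of one head under a Beta(a, b) prior is a/(a+b).
p_head_model = 100 / 101  # Beta(100, 1) prior: ~.9901
p_tail_model = 1 / 101    # Beta(1, 100) prior: ~.0099
print(p_head_model, p_tail_model, p_head_model / p_tail_model)  # ratio is 100
```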

[size=medium]5.8[/size]
A
x = BernBeta(c(100,1),c(rep(1,8),rep(0,4))) #strong heads prior, 8H out of 12
x = BernBeta(c(1,100),c(rep(1,8),rep(0,4))) #strong tails prior, 8H out of 12

P(D|M_head)=1.49e-7
P(D|M_tail)=2.02e-12

Bayes factor = 73762
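The same closed-form cross-check works here, P(D|M) = B(z+a, N−z+b)/B(a,b) via log-gamma (Python, names mine):

```python
import math

def betaln(a, b):
    # log of the beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def p_data(a, b, z, n):
    # Bernoulli-beta marginal likelihood for z heads in n flips.
    return math.exp(betaln(z + a, n - z + b) - betaln(a, b))

p_head = p_data(100, 1, 8, 12)  # ~1.49e-7
p_tail = p_data(1, 100, 8, 12)  # ~2.02e-12
print(p_head, p_tail, p_head / p_tail)  # Bayes factor ~7.4e4
```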

B
10000

C
Different values. The model does not specify the theta, only a distribution of thetas. So we have to sample from it.

D
No. The proportion of samples generating 8 heads out of 12 flips is very, very small for this model.

table(simSampleZrecord)

There are only 22 8's from 10000 samplings. Very rare.
Admin
6.1
A
It is the total probability across all the parameter values. It is very close to 1 because the probabilities in any probability distribution sum to 1. The reason it is not exactly 1 is that we are approximating the distribution with a grid (.9991421). One can increase the number of intervals to get higher precision. E.g., using nIntervals=100 instead of 10 makes the summed probability 0.9999999.

B
There are now 11 values, not the 10 specified by nIntervals. The range also changes from .05-.95 to 0-1, which makes the grid include too much. Each grid value is treated as the midpoint of an interval, so if the last value is 1 and the width is .1, half of that interval (.05) lies outside [0,1]. The same applies at 0.

It matters very little as long as one does not use a very low nIntervals. E.g., with 10 the sum becomes 1.000992, slightly over 1 because the regions outside [0,1] are included at both ends. If one increases nIntervals, e.g. to 100, the error becomes smaller, because the interval width shrinks and with it the half-width that falls outside at each end.

6.2
A+B
Many ways to do it.

The way in the book is pretty ugly.

##Ugly prior
pTheta = c( 50:1 , rep(1,50) , 1:50 , 50:1, rep(1,50), 1:50) #trimodal distro
pTheta = pTheta/sum(pTheta) #standardize
nIntervals = length(pTheta) #and how many intervals
width = 1 / nIntervals #and width
Theta = seq( from = width/2 , to = 1-width/2 , by = width ) #interval midpoints

plot(Theta,pTheta) #plot
describe(Theta) #describe

data = c(rep(1,15),rep(0,5)) #specify some data
post = BernGrid(Theta, pTheta, data) #calculate

Alternative:

##Using three normal distributions
range = seq(-5,5,.01)
dist1 = dnorm(range, -5)
dist2 = dnorm(range, 0)
dist3 = dnorm(range, 5)
dist = dist1+dist2+dist3
pTheta = dist/sum(dist)
nIntervals = length(pTheta) #and how many intervals
width = 1 / nIntervals #and width
Theta = seq( from = width/2 , to = 1-width/2 , by = width ) #interval midpoints

plot(Theta,pTheta) #plot
describe(Theta) #describe

data = c(rep(1,15),rep(0,5)) #specify some data
post = BernGrid(Theta, pTheta, data) #calculate

[attachment=596]
[attachment=597]

6.3
A+B

##Two steps
data = c(rep(1,3),rep(0,1)) #first batch of data
post = BernGrid(Theta, pTheta, data) #calculate

data = c(rep(1,12),rep(0,4)) #second batch of data
post = BernGrid(Theta, post, data) #the posterior from the first batch serves as the prior

6.4
A
Same as in last chapter, but with discrete values. A uniform prior means that the prior has no effect.
HDI is .485-.675.

B
Yes, .50 is within HDI.

C
HDI = .505-.635.

D
No, .50 is not in the HDI.

6.5
With no prior data, it is tempting to just use a uniform prior. This is not rational in a real-world situation because the company would have gone bankrupt if the defect fraction were high (say >.5).

HDI is .045 to .075, which does not include .10, so the manager is probably right.

6.6
It seems unnecessary to use the commented-out code. I wrote my own:

pTheta = (1:100)^2 #define prior
pTheta = pTheta/sum(pTheta) #normalize
nIntervals = length(pTheta) #and how many intervals
width = 1 / nIntervals #and width
Theta = seq( from = width/2 , to = 1-width/2 , by = width ) #interval midpoints

plot(Theta,pTheta) #plot to make sure it is right

data = c(1,1,0,0) # two heads two tails
post = BernGrid(Theta, pTheta, data) #calculate

Mean theta given the data is .624, which is our best guess for the next flip.
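A continuous cross-check (my own, not from the book): the grid prior (1:100)^2 is close to the continuous prior p(theta) = 3*theta^2, under which 2 heads and 2 tails give a Beta(5, 3) posterior:

```python
# Posterior under prior ~ theta^2 with data HHTT:
# theta^2 * theta^2 * (1-theta)^2, i.e. Beta(5, 3).
# Its mean is the predicted p(head) for the next flip.
a, b = 5, 3
posterior_mean = a / (a + b)
print(posterior_mean)  # 0.625, close to the .624 from the grid
```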

6.7 Comparing models
On the head-biased model: P(D) = .00605
On the tail-biased model: P(D) = .00131

So the ratio is about 4.62 in favor of the head-biased model.

#The first model is 6.6 above, second model given here
pTheta2 = (100:1)^2 #define prior, reverse from before
pTheta2 = pTheta2/sum(pTheta2) #normalize
nIntervals2 = length(pTheta2) #and how many intervals
width2 = 1 / nIntervals2 #and width
Theta2 = seq( from = width2/2 , to = 1-width2/2 , by = width2 ) #interval midpoints

plot(Theta2,pTheta2) #plot to make sure it is right

data = c(rep(1,6),rep(0,2)) # 6H 2T
post = BernGrid(Theta, pTheta, data) #calculate first model
post2 = BernGrid(Theta2, pTheta2, data) #calculate second model
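The ratio can also be cross-checked with continuous analogs of the two grid priors, p(theta) = 3*theta^2 and p(theta) = 3*(1-theta)^2 (my approximation, hence the small gap to the grid's 4.62). With data 6H 2T the integrals reduce to beta functions:

```python
from math import factorial as fact

# P(D | head model) = integral of 3*theta^8*(1-theta)^2 = 3*B(9,3)
p_d_head = 3 * fact(8) * fact(2) / fact(11)
# P(D | tail model) = integral of 3*theta^6*(1-theta)^4 = 3*B(7,5)
p_d_tail = 3 * fact(6) * fact(4) / fact(11)
print(p_d_head, p_d_tail, p_d_head / p_d_tail)  # ratio is exactly 14/3 ~ 4.67
```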

6.8
A
No prior data about the drug, but we do have a prior for the birth distribution: about 50.5% of births are male. One could also take a prior from previous drugs that attempted to change the gender ratio. I'm fairly certain none of them worked, so one can just keep 50.5% as the prior. I take this as a strong prior.

B
With my prior, yes. HDI = .454-.613.

C
Proponent model: P(D) = 1.99e-15
Skeptic model: P(D) = 1.01e-15

So the ratio is ~2 in favor of the proponent's model. How strong the evidence is thus depends on one's prior belief in each model.