Open
155 commits
d1f3c31
Merge pull request #4 from seankross/master
jtleek Oct 21, 2016
1413961
Merge branch 'new'
rdpeng Oct 21, 2016
34c9b5f
data servers
rdpeng Oct 26, 2016
5cbe475
fixed typos
seankross Oct 26, 2016
9f91778
Merge pull request #5 from seankross/master
jtleek Oct 26, 2016
2df7831
NSSD 25
rdpeng Oct 28, 2016
0791470
adding plot
rafalab Nov 3, 2016
167ae12
fixing link to pic
rafalab Nov 3, 2016
c9a8f6e
moving file
rafalab Nov 3, 2016
1c8391a
edits
rafalab Nov 3, 2016
24c4bf0
chromebook picture
jtleek Nov 8, 2016
b3d30b2
chromebook2 post
jtleek Nov 8, 2016
f227d88
Delete 2016-11-08-chromebook-part2.md
jtleek Nov 8, 2016
32bf0f2
Create 2016-11-08-chromebook-part2.md
jtleek Nov 8, 2016
7365497
png instead of pn
rafalab Nov 9, 2016
2e4d4a9
adding a missing html
rafalab Nov 9, 2016
980b3bf
adding a missing html
rafalab Nov 9, 2016
95bb64b
adding a missing html
rafalab Nov 9, 2016
88b005c
adding a missing html
rafalab Nov 9, 2016
9057396
adding figs
rafalab Nov 9, 2016
db46706
addin election png
rafalab Nov 9, 2016
46b50e7
adding post
rafalab Nov 9, 2016
e4e8b56
adding post
rafalab Nov 9, 2016
9fb986c
adding post
rafalab Nov 9, 2016
462e597
adding post
rafalab Nov 9, 2016
3b0822f
adding post
rafalab Nov 9, 2016
fe71b73
adding post
rafalab Nov 9, 2016
bc6fd3e
adding post
rafalab Nov 9, 2016
0fc73ee
adding post
rafalab Nov 9, 2016
a325815
adding post
rafalab Nov 9, 2016
17b3565
adding post
rafalab Nov 9, 2016
2777827
adding post
rafalab Nov 9, 2016
4c5e023
adding post
rafalab Nov 9, 2016
eed6d5d
adding post
rafalab Nov 9, 2016
9c0a0df
adding post
rafalab Nov 9, 2016
b3b3e2f
adding post
rafalab Nov 9, 2016
22dab1d
adding post
rafalab Nov 9, 2016
caf5b36
adding post
rafalab Nov 9, 2016
b856006
adding post
rafalab Nov 9, 2016
c5a4773
adding post
rafalab Nov 9, 2016
6122b98
adding post
rafalab Nov 10, 2016
75bd18e
adding post
rafalab Nov 11, 2016
66c75a6
adding post
rafalab Nov 11, 2016
afcddd3
adding post
rafalab Nov 11, 2016
a4892b7
adding post
rafalab Nov 11, 2016
4e4ddf2
adding post
rafalab Nov 11, 2016
378f821
adding post
rafalab Nov 11, 2016
9e4c030
adding post
rafalab Nov 11, 2016
984a263
added leekgroup colors post.
jtleek Nov 17, 2016
48d431d
NSSD 27
rdpeng Nov 30, 2016
3669fa3
Get date right
rdpeng Nov 30, 2016
9178232
editing translation error
rafalab Dec 7, 2016
de5ca16
adding PISA plots
rafalab Dec 9, 2016
7b9235e
adding post on PISA math
rafalab Dec 9, 2016
d3ce091
adding post on PISA math
rafalab Dec 9, 2016
70c6048
adding post on PISA math
rafalab Dec 9, 2016
eee28f3
adding post on PISA math
rafalab Dec 9, 2016
55fbd0e
adding post on PISA math
rafalab Dec 9, 2016
21e0346
adding post on PISA math
rafalab Dec 9, 2016
ba44352
NSSD Ep 28
rdpeng Dec 15, 2016
21b0afc
updatded w/post
jtleek Dec 16, 2016
b13e038
updated post
jtleek Dec 16, 2016
ed94b39
fixed 4 eras
jtleek Dec 16, 2016
4ec8ebe
updated w/post
jtleek Dec 20, 2016
7519429
swapped out an e for an a in my last name 🙈
LucyMcGowan Dec 21, 2016
d8c6fbb
Merge pull request #6 from LucyMcGowan/patch-1
jtleek Dec 21, 2016
431efd1
Removed an extra apostrophe
jfiksel Dec 22, 2016
2305f31
Merge pull request #7 from jfiksel/patch-1
jtleek Dec 22, 2016
e5a7c08
stress blog post
jtleek Dec 29, 2016
e598764
Update 2016-12-29-some-stress-reducers.md
jtleek Dec 30, 2016
fb3e12b
NSSD Episode 30
rdpeng Jan 10, 2017
f62641b
Episode 30
rdpeng Jan 10, 2017
32043ea
New post
rdpeng Jan 17, 2017
0284399
Update
rdpeng Jan 17, 2017
51be42b
data prototyping
jtleek Jan 18, 2017
716dd6e
updated post
jtleek Jan 18, 2017
dd6f504
updated with dsl link
jtleek Jan 18, 2017
5b1f81a
added demystify post
jtleek Jan 19, 2017
812a5dd
updated post
jtleek Jan 19, 2017
07b1e94
updated with new AI post
jtleek Jan 20, 2017
cac9d8d
updated title
jtleek Jan 20, 2017
389051f
updated with picture
jtleek Jan 20, 2017
4be4122
fixed picture again doh
jtleek Jan 20, 2017
7a753a6
updated references
jtleek Jan 20, 2017
7640882
Spelling
rdpeng Jan 21, 2017
5c1dfd8
Add images
rdpeng Jan 23, 2017
f4a8db7
New post UX/Value
rdpeng Jan 23, 2017
8a5a55c
Colons!
rdpeng Jan 23, 2017
614a32c
Update
rdpeng Jan 24, 2017
2334fb7
updated class description
jtleek Jan 26, 2017
2bf8fc4
updated title
jtleek Jan 26, 2017
ea43fcc
new post
jtleek Jan 31, 2017
7de0eda
updated w/ad
jtleek Jan 31, 2017
6724edd
New post
rdpeng Feb 1, 2017
454719f
NSSD 32
rdpeng Feb 14, 2017
008f714
Bloomberg article
rdpeng Feb 15, 2017
92be2e6
New post
rdpeng Feb 20, 2017
354746f
update
rdpeng Feb 20, 2017
dbb1d25
updated w/ml and earthquakes
jtleek Feb 23, 2017
332e28d
Updated with repro post
jtleek Mar 2, 2017
a75e030
updated title
jtleek Mar 2, 2017
de3acbd
attempt to edit
rdpeng Mar 2, 2017
b4288aa
edit
rdpeng Mar 2, 2017
cde38c9
editing
rdpeng Mar 3, 2017
9870f0d
TED
rdpeng Mar 3, 2017
c1de174
New post
rdpeng Mar 7, 2017
69e9c65
New post
rdpeng Mar 8, 2017
7088e54
edits
rafalab Mar 14, 2017
2ca31d6
added post on data science classes
jtleek Mar 16, 2017
1d7024e
fixed w/link to stephanie/rafa
jtleek Mar 16, 2017
70b1a2a
edits
jtleek Mar 16, 2017
9b13bb9
adding post
rafalab Apr 3, 2017
de5e81e
adding post
rafalab Apr 3, 2017
89e85f5
fixing typo
rafalab Apr 3, 2017
0d910bb
fixing typo
rafalab Apr 3, 2017
685ac56
fixing typo
rafalab Apr 3, 2017
face284
fixing typo
rafalab Apr 3, 2017
d5be4db
fixing typo
rafalab Apr 4, 2017
846ef9c
fixing typo
rafalab Apr 4, 2017
fcf65c5
fixing typo
rafalab Apr 4, 2017
fbd3ad5
fixing typo
rafalab Apr 4, 2017
fafaccb
fixing typo
rafalab Apr 4, 2017
89836ab
fixing typo
rafalab Apr 4, 2017
95d1dc9
fixing typo
rafalab Apr 4, 2017
4c2cb7f
fixing typo
rafalab Apr 4, 2017
fdee12a
adding new figs
rafalab Apr 6, 2017
5435cb8
adding post
rafalab Apr 6, 2017
6e61698
adding post
rafalab Apr 6, 2017
290b135
adding post
rafalab Apr 6, 2017
533fc20
adding post
rafalab Apr 6, 2017
c8adc5e
adding post
rafalab Apr 6, 2017
6459c04
adding post
rafalab Apr 6, 2017
c809fd6
adding post
rafalab Apr 6, 2017
b04a730
adding post
rafalab Apr 6, 2017
489957f
adding post
rafalab Apr 6, 2017
c4315ed
adding post
rafalab Apr 6, 2017
a3dddf3
adding post
rafalab Apr 6, 2017
290ae7b
adding post
rafalab Apr 6, 2017
c7ce04d
adding post
rafalab Apr 7, 2017
e4141c4
adding post
rafalab Apr 24, 2017
08c091a
adding post
rafalab Apr 24, 2017
f478548
adding post
rafalab Apr 24, 2017
6a0bfa7
adding post
rafalab Apr 24, 2017
d602475
editing post
rafalab Apr 24, 2017
6ee6dd1
editing post
rafalab Apr 24, 2017
b89e3b5
editing post
rafalab Apr 24, 2017
1e1c029
editing post
rafalab Apr 24, 2017
ae28826
adding temp
rafalab Apr 24, 2017
1bf3488
adding temp
rafalab Apr 24, 2017
2135398
removing duplicate post
rafalab Apr 24, 2017
5285288
removing duplicate post
rafalab Apr 24, 2017
2b18db0
removing duplicate post
rafalab Apr 24, 2017
0c0e8cb
adding haircut post
rafalab May 4, 2017
7ac2370
Add files via upload
sandipan May 11, 2017
4f8d8f9
Delete 2017-05-11-order-stat-auction.md
sandipan May 11, 2017
40 changes: 40 additions & 0 deletions _Rmd_files/2016-12-13-leekgroup-plots.Rmd
@@ -0,0 +1,40 @@
---
title: "Leek group guide to making plots"
output: html_document
---

I have written a few guides for people in academics including:

* [How to write your first paper](https://github.com/jtleek/firstpaper)
* [How to review a paper](https://github.com/jtleek/reviews)
* [How to share data](https://github.com/jtleek/datasharing)
* [How to write an R package](https://github.com/jtleek/rpackages)
* [How to read academic papers](https://github.com/jtleek/readingpapers)

The purpose of these guides has partially been for other people to use them outside of my research group. But the main driver has been having a set of tutorials that can be a sort of "onboarding" for new members of my research group.

Recently we had to work collectively on a project where multiple members were each sending in plots and I realized that they looked very different in aesthetic, color scheme, and organization. The result is that it was pretty hard to put the figures together in a paper. It also means that when we use each other's slides in talks there is no coherent pattern to what a plot will look like.

Other organizations - like [fivethirtyeight](http://fivethirtyeight.com/) have a consistent look and feel to their graphics. They do this (I imagine) largely as a defense mechanism - they have to produce plots every day! But I think that it also adds to the professionalism of the data analysis products they produce.

I realized I would like my research group to have a similar type of professionalism to our plots since we regularly produce data products and have to illustrate scientific data.

This is a guide for how plots should be made in the Leek group. I hope it will evolve over time as members of the group weigh in on their opinions. There is a corresponding

*[Leek group plotting R package](link TBD)

that you can use to make plots like ours if you want to with both ggplot2 and base R plotting parameters set up.

## Expository versus exploratory graphs

If you are analyzing data you make plots all of the time. This is part of the interactive data analysis workflow. When exploring data you should not spend time on how the plots look. They should be ugly and fast so you can quickly explore a data set. This guide does not apply to exploratory plots.

Expository plots are plots that we intend to distribute as part of a paper, blog post, or other communication of our results. Expository plots differ from exploratory plots because they are intended to communicate information to someone who is not you. The key principles behind Leek group expository plots are:

(1) They communicate the answer to a specific scientific question
(2) Each plot answers a single scientific question
(3) Each plot will have a figure caption describing the key story in the plot
(4) The figure and legend are sufficient to communicate a scientific message without the surrounding paper text.
(5) They have a consistent color theme, point type, and font.

Point (4) is directly related to the Leek group [guide to writing the first paper](https://github.com/jtleek/firstpaper)
Binary file added _images/2017-04-06/IMG_7075.jpg
Binary file added _images/2017-04-06/IMG_7076.jpg
Binary file added _images/2017-04-06/cambio-en-matricula.png
Binary file added _images/2017-04-06/costo.png
Binary file added _images/2017-04-06/matricula.png
Binary file added _images/2017-05-04/haircuts.png
Binary file added _images/2x2-table-results.png
Binary file added _images/2x2-table.png
Binary file added _images/Flowchart-full.png
Binary file added _images/Flowchart-partial.png
Binary file added _images/Flowchart.png
Binary file added _images/ai-album.png
Binary file added _images/alexa-ai.png
Binary file added _images/cartoon-phone-photos.png
Binary file added _images/chromebook2.jpg
Binary file added _images/heights-with-outlier.png
Binary file added _images/images-to-numbers.png
Binary file added _images/importance-not-size.jpg
Binary file added _images/jeff-color-names.png
Binary file added _images/jeff-rgb.png
Binary file added _images/jeff-smile-dots.png
Binary file added _images/jeff-smile-lines.png
Binary file added _images/jeff-smile.png
Binary file added _images/jeff.jpg
Binary file added _images/labels-to-numbers.png
Binary file added _images/many-workflows.png
Binary file added _images/movie-ai.png
Binary file added _images/notajftweet.png
Binary file added _images/papr.png
Binary file added _images/pisa-2015-math-v-others.png
Binary file added _images/pisa-2015-scatter.png
Binary file added _images/silver3.png
Binary file added _images/timeline-ai.png
Binary file added _images/us-election-2016-538-prediction.png
Binary file added _images/us-election-2016-538-v-upshot.png
Binary file added _images/ux1.png
Binary file added _images/ux2.png
Binary file added _images/workflow.png
2 changes: 1 addition & 1 deletion _posts/2011-12-03-reverse-scooping.md
@@ -18,4 +18,4 @@ tags:
- advice
- Rant
---
-I would like to define a new term: r_everse scooping_ is when someone publishes your idea after you, and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most times I don’t think it is malicious though. In fact, I almost reverse scooped a colleague recently.  People arrive at the same idea a few months (or years) later and there is just too much literature to keep track-off. And remember the culprit authors were not the only ones that missed your paper, the referees and associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract as very few papers get read cover-to-cover.
+I would like to define a new term: _reverse scooping_ is when someone publishes your idea after you, and doesn’t cite you. It has happened to me a few times. What does one do? I usually send a polite message to the authors with a link to my related paper(s). These emails are usually ignored, but not always. Most times I don’t think it is malicious though. In fact, I almost reverse scooped a colleague recently.  People arrive at the same idea a few months (or years) later and there is just too much literature to keep track-off. And remember the culprit authors were not the only ones that missed your paper, the referees and associate editor missed it as well. One thing I have learned is that if you want to claim an idea, try to include it in the title or abstract as very few papers get read cover-to-cover.
@@ -25,6 +25,6 @@ While the pundits were claiming the race was a “dead heat”, the day

**Update****: **Congratulations also to Sam Wang (<a href="http://election.princeton.edu/" target="_blank">Princeton Election Consortium</a>) and Simon Jackman (<a href="http://www.huffingtonpost.com/simon-jackman/pollster-predictions_b_2081013.html" target="_blank">pollster</a>) that also called the election perfectly. And thanks to the pollsters that provided the unbiased (on average) data used by all these folks. Data analysts won &#8220;experts&#8221; lost.

-**Update 2**: New plot with data from <a href="http://www.foxnews.com/politics/elections/2012-election-results/" target="_blank">here</a>. Old graph <a href="http://rafalab.jhsph.edu/simplystats/silver.png" target="_blank">here</a>.
+~~**Update 2**: New plot with data from <a href="http://www.foxnews.com/politics/elections/2012-election-results/" target="_blank">here</a>. Old graph <a href="http://rafalab.jhsph.edu/simplystats/silver.png" target="_blank">here</a>.~~

-<a href="http://rafalab.jhsph.edu/simplystats/silver2.png" target="_blank"><img height="500" src="http://rafalab.jhsph.edu/simplystats/silver2.png" width="500" /></a>
+![Observed versus predicted](https://raw.githubusercontent.com/simplystats/simplystats.github.io/master/_images/silver3.png)
@@ -35,36 +35,34 @@ In a recent New York Times [article](http://www.nytimes.com/2014/09/30/science/t



Because the real story (or non-story) is way too boring to sell newspapers, the author resorted to a sensationalist narrative that went something like this:  "Evil and/or stupid frequentists were ready to let a fisherman die; the persecuted Bayesian heroes saved him." This piece adds to the growing number of writings blaming frequentist statistics for the so-called reproducibility crisis in science. If there is something Roger, [Jeff](http://simplystatistics.org/2013/11/26/statistical-zealots/) <a>and </a>[I](http://simplystatistics.org/2013/08/01/the-roc-curves-of-science/) <a>agree on is that this debate is </a>[not constructive](http://noahpinionblog.blogspot.com/2013/01/bayesian-vs-frequentist-is-there-any.html). As </a>[Rob Kass](http://arxiv.org/pdf/1106.2895v2.pdf) <a>suggests it's time to move on to pragmatism. Here I follow up Jeff's <a href="http://simplystatistics.org/2014/09/30/you-think-p-values-are-bad-i-say-show-me-the-data/">recent post</a> by sharing related thoughts brought about by two decades of practicing applied statistics and hope it helps put this unhelpful debate to rest.</p>
Because the real story (or non-story) is way too boring to sell newspapers, the author resorted to a sensationalist narrative that went something like this:  "Evil and/or stupid frequentists were ready to let a fisherman die; the persecuted Bayesian heroes saved him." This piece adds to the growing number of writings blaming frequentist statistics for the so-called reproducibility crisis in science. If there is something Roger, [Jeff](http://simplystatistics.org/2013/11/26/statistical-zealots/) and [I](http://simplystatistics.org/2013/08/01/the-roc-curves-of-science/) agree on is that this debate is [not constructive](http://noahpinionblog.blogspot.com/2013/01/bayesian-vs-frequentist-is-there-any.html). As [Rob Kass](http://arxiv.org/pdf/1106.2895v2.pdf) suggests it's time to move on to pragmatism. Here I follow up Jeff's [recent post](http://simplystatistics.org/2014/09/30/you-think-p-values-are-bad-i-say-show-me-the-data/) by sharing related thoughts brought about by two decades of practicing applied statistics and hope it helps put this unhelpful debate to rest.


<p>
Applied statisticians help answer questions with data. How should I design a roulette so my casino makes $? Does this fertilizer increase crop yield? Does streptomycin cure pulmonary tuberculosis? Does smoking cause cancer? What movie would would this user enjoy? Which baseball player should the Red Sox give a contract to? Should this patient receive chemotherapy? Our involvement typically means analyzing data and designing experiments. To do this we use a variety of techniques that have been successfully applied in the past and that we have mathematically shown to have desirable properties. Some of these tools are frequentist, some of them are Bayesian, some could be argued to be both, and some don't even use probability. The Casino will do just fine with frequentist statistics, while the baseball team might want to apply a Bayesian approach to avoid overpaying for players that have simply been lucky.
</p>

<p>
It is also important to remember that good applied statisticians also *think*. They don't apply techniques blindly or religiously. If applied statisticians, regardless of their philosophical bent, are asked if the sun just exploded, they would not design an experiment as the one depicted in this popular XKCD cartoon.
</p>

<p>

It is also important to remember that good applied statisticians also **think**. They don't apply techniques blindly or religiously. If applied statisticians, regardless of their philosophical bent, are asked if the sun just exploded, they would not design an experiment as the one depicted in this popular XKCD cartoon.



<a href="http://xkcd.com/1132/"><img class="aligncenter" src="http://imgs.xkcd.com/comics/frequentists_vs_bayesians.png" alt="" width="234" height="355" /></a>
</p>

<p>


Only someone that does not know how to think like a statistician would act like the frequentists in the cartoon. Unfortunately we do have such people analyzing data. But their choice of technique is not the problem, it's their lack of critical thinking. However, even the most frequentist-appearing applied statistician understands Bayes rule and will adapt the Bayesian approach when appropriate. In the above XCKD example, any respectful applied statistician would not even bother examining the data (the dice roll), because they would assign a probability of 0 to the sun exploding (the empirical prior based on the fact that they are alive). However, superficial propositions arguing for wider adoption of Bayesian methods fail to realize that using these techniques in an actual data analysis project is very different from simply thinking like a Bayesian. To do this we have to represent our intuition or prior knowledge (or whatever you want to call it) with mathematical formulae. When theoretical Bayesians pick these priors, they mainly have mathematical/computational considerations in mind. In practice we can't afford this luxury: a bad prior will render the analysis useless regardless of its convenient mathematically properties.
</p>

<p>
Despite these challenges, applied statisticians regularly use Bayesian techniques successfully. In one of the fields I work in, Genomics, empirical Bayes techniques are widely used. In <a href="http://www.ncbi.nlm.nih.gov/pubmed/16646809">this</a> popular application of empirical Bayes we use data from all genes to improve the precision of estimates obtained for specific genes. However, the most widely used output of the software implementation is not a posterior probability. Instead, an empirical Bayes technique is used to improve the estimate of the standard error used in a good ol' fashioned t-test. This idea has changed the way thousands of Biologists search for differential expressed genes and is, in my opinion, one of the most important contributions of Statistics to Genomics. Is this approach frequentist? Bayesian? To this applied statistician it doesn't really matter.
</p>

<p>
For those arguing that simply switching to a Bayesian philosophy will improve the current state of affairs, let's consider the smoking and cancer example. Today there is wide agreement that smoking causes lung cancer. Without a clear deductive biochemical/physiological argument and without<br /> the possibility of a randomized trial, this connection was established with a series of observational studies. Most, if not all, of the associated data analyses were based on frequentist techniques. None of the reported confidence intervals on their own established the consensus. Instead, as usually happens in science, a long series of studies supporting this conclusion were needed. How exactly would this have been different with a strictly Bayesian approach? Would a single paper been enough? Would using priors helped given the "expert knowledge" at the time (see below)?
</p>

<p>
Despite these challenges, applied statisticians regularly use Bayesian techniques successfully. In one of the fields I work in, Genomics, empirical Bayes techniques are widely used. In [this](http://www.ncbi.nlm.nih.gov/pubmed/16646809) popular application of empirical Bayes we use data from all genes to improve the precision of estimates obtained for specific genes. However, the most widely used output of the software implementation is not a posterior probability. Instead, an empirical Bayes technique is used to improve the estimate of the standard error used in a good ol' fashioned t-test. This idea has changed the way thousands of Biologists search for differential expressed genes and is, in my opinion, one of the most important contributions of Statistics to Genomics. Is this approach frequentist? Bayesian? To this applied statistician it doesn't really matter.



For those arguing that simply switching to a Bayesian philosophy will improve the current state of affairs, let's consider the smoking and cancer example. Today there is wide agreement that smoking causes lung cancer. Without a clear deductive biochemical/physiological argument and without the possibility of a randomized trial, this connection was established with a series of observational studies. Most, if not all, of the associated data analyses were based on frequentist techniques. None of the reported confidence intervals on their own established the consensus. Instead, as usually happens in science, a long series of studies supporting this conclusion were needed. How exactly would this have been different with a strictly Bayesian approach? Would a single paper been enough? Would using priors helped given the "expert knowledge" at the time (see below)?



<img src="http://cdn.saveourbones.com/wp-content/uploads/smoking_doctor.jpg" width="234" height="355" class="aligncenter" alt="" />
</p>

<p>

And how would the Bayesian analysis performed by tabacco companies shape the debate? Ultimately, I think applied statisticians would have made an equally convincing case against smoking with Bayesian posteriors as opposed to frequentist confidence intervals. Going forward I hope applied statisticians continue to be free to use whatever techniques they see fit and that critical thinking about data continues to be what distinguishes us. Imposing Bayesian or frequentists philosophy on us would be a disaster.
</p>
4 changes: 2 additions & 2 deletions _posts/2014-11-04-538-election-forecasts-made-simple.md
@@ -21,10 +21,10 @@ categories:
---
Nate Silver does a [great job](http://fivethirtyeight.com/features/how-the-fivethirtyeight-senate-forecast-model-works/) of explaining his forecast model to laypeople. However, as a statistician I've always wanted to know more details. After preparing a "<span class="s2"><a href="http://cs109.github.io/2014/pages/homework.html">predict the midterm elections</a>" </span>homework for my [<span class="s2">data science class</span>](http://cs109.github.io/2014) I have a better idea of what is going on.

-[Here](http://rafalab.jhsph.edu/simplystats/midterm2012.html) is my best attempt at explaining the ideas of 538 using formulas and data. And [here](http://rafalab.jhsph.edu/simplystats/midterm2012.Rmd) is the R markdown.
+[Here](http://simplystatistics.org/html/midterm2012.html) is my best attempt at explaining the ideas of 538 using formulas and data. ~~And [here](http://rafalab.jhsph.edu/simplystats/midterm2012.Rmd) is the R markdown.~~

&nbsp;

&nbsp;

&nbsp;
&nbsp;