notes/main.typ (22 additions & 0 deletions)
@@ -102,6 +102,28 @@ From the famous movie #link("https://en.wikipedia.org/wiki/Rebel_Without_a_Cause
102
102
The special issue Volume 8, Issue 2, 2022
103
103
of #emph("Observational Studies") titled #link("https://en.wikipedia.org/wiki/Rebel_with_a_Cause_(book)")[`Rebel With a Cause`]
104
104
105
+
== Fun example
106
+
107
+
=== On overparameterized models
108
+
109
+
From a comment by Phil on #link("https://statmodeling.stat.columbia.edu/2025/11/14/how-is-it-that-this-problem-with-its-21-data-points-is-so-much-easier-to-handle-with-1-predictor-than-with-16-predictors/")[`Impossible statistical problems`], on Andrew Gelman's blog, November 14, 2025.
110
+
111
+
#quote("I’m imagining a political science student coming in for statistical advice:
112
+
Student: I’m trying to predict the Democratic percentage of the two-party vote in U.S. Presidential elections, six months before Election Day. I want to use just the past ten elections because I think the political landscape was too different before that.
113
+
Statistician: Sounds interesting. What predictive variables do you have?
114
+
Student: I’ve got the Democratic share in the last election, and the change in unemployment rate over the past year and the past three years, and the inflation rate over the past year and the past three years, and the change in median income over the past year and past three years.
115
+
Statistician: That’s a lot of predictors for not many elections, we are going to have some issues, but maybe we can use lasso or a regularization scheme or something. Let’s get started.
116
+
Student: I also own an almanac.
117
+
Statistician: Oh. Sorry, I can’t help you, your problem is impossible.")
118
+
119
+
With 10 data points and 7 predictors there is still something one can do. With an almanac, and thus 1000+ predictors, the problem becomes impossible: the model is overparameterized and has no predictive power for the future.
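A minimal sketch of why the almanac case is hopeless, under assumptions that are not part of the quoted comment: both the response and all predictors are pure Gaussian noise, and the fit is ordinary least squares with an intercept (minimum-norm when there are more predictors than data points, as returned by `np.linalg.lstsq`). The function name `in_sample_r2` is hypothetical. With 1000 predictors and 10 data points the fit reproduces the training data exactly, even though there is nothing to learn.

```python
import numpy as np


def in_sample_r2(n_obs, n_predictors, seed=0):
    """In-sample R^2 of least squares when y and X are both pure noise.

    Assumption: Gaussian noise everywhere, so any fit is spurious.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_predictors))
    y = rng.normal(size=n_obs)
    X1 = np.column_stack([np.ones(n_obs), X])  # prepend an intercept column
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


r2_seven = in_sample_r2(n_obs=10, n_predictors=7)
r2_almanac = in_sample_r2(n_obs=10, n_predictors=1000)
print(f"7 predictors:    in-sample R^2 = {r2_seven:.3f}")
print(f"1000 predictors: in-sample R^2 = {r2_almanac:.3f}")
```

A perfect in-sample fit on noise says nothing about the next election, which is the sense in which the overparameterized problem is impossible without further structure.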
120
+
121
+
Thus, with a tiny sample, adding too many useless predictors may indeed pollute the data and make the problem impossible.
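The pollution effect can be illustrated with a small simulation. This is only a sketch under stated assumptions, none of which come from the quoted comment: one genuinely useful Gaussian predictor, additive Gaussian noise with standard deviation 0.5, independent Gaussian useless predictors, 10 training points, and minimum-norm least squares via `np.linalg.lstsq`. The function name `mean_test_mse` is hypothetical.

```python
import numpy as np


def mean_test_mse(n_useless, n_train=10, n_test=200, n_reps=100,
                  noise_sd=0.5, seed=0):
    """Average out-of-sample MSE of least squares with extra useless predictors.

    Assumed data-generating process: y = signal + noise, where only
    `signal` is informative and the `n_useless` extra columns are noise.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_reps):
        n_total = n_train + n_test
        signal = rng.normal(size=n_total)
        y = signal + noise_sd * rng.normal(size=n_total)
        junk = rng.normal(size=(n_total, n_useless))  # the "almanac" columns
        X = np.column_stack([np.ones(n_total), signal, junk])
        X_train, y_train = X[:n_train], y[:n_train]
        X_test, y_test = X[n_train:], y[n_train:]
        # Minimum-norm least squares when there are more columns than rows.
        beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        errors.append(np.mean((y_test - X_test @ beta) ** 2))
    return float(np.mean(errors))


mse_by_count = {k: mean_test_mse(k) for k in (0, 6, 1000)}
for k, mse in mse_by_count.items():
    print(f"{k:5d} useless predictors: mean test MSE = {mse:.3f}")
```

In this setup the useful predictor is present in every fit; only the number of useless columns changes, so any loss of accuracy is due to the added predictors alone.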
122
+
123
+
#question("Dense and lower high dimensional model")[
124
+
In a dense high-dimensional model where the number of samples $n$ is not large,
if the model is misspecified, then, just as in the example above, it may include too many useless predictors and give a useless prediction. Is there an example of such a dense high-dimensional model?]
105
127
106
128
= On the indistinguishability or identification of statistical models