Note: The content of this issue is as of #33 (or equivalent) being merged.
Code:
Since we want to be able to submit this to http://rescience.github.io/, I made a beeline for replicating two of the core papers: ProtoTree and ProtoPNet. I've tried to keep the code reasonably modular so that it can be reused and turned into a library in the future, but some things in the old code that weren't blockers for replicating the papers have not yet been changed or added, e.g.:
- Make the models work on multiple GPUs #39
- Switch from manual arg handling to Lightning CLI (or similar) #34
- Make new models saveable #36
- Make the models usable without the main run_model.py script #35
- Remove manual evaluation / test set code #38
- Improve testing #23
See the issues page for additional items.
In terms of interpretability, patch visualizations are available for both models, and for the ProtoTree model there are additional visualizations of the patches and decision processes used to classify specific images.
Experiments:
So far, I've managed to replicate the claimed accuracy of both papers using the hyperparameters and settings their authors used. One thing I've noticed is that both papers employ custom, complex optimization algorithms and manually tuned hyperparameters; however, in many cases we can train the models just as well (often with comparable or better accuracy) using plain Adam and standard hyperparameters. I've started running further experiments to understand to what extent these prototype-based models can be trained with much simpler procedures. We need to be mindful of #37, since it's unclear to what extent it has affected the hyperparameters and accuracy.
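As a rough illustration of what "plain Adam and standard hyperparameters" means in practice, here is a minimal sketch of such a training loop. It is not the repository's actual code: `model`, `train_loader`, and `num_epochs` are placeholders, and the real objectives may include model-specific terms on top of cross-entropy.

```python
import torch
import torch.nn.functional as F

def train_with_plain_adam(model, train_loader, num_epochs=100, device="cuda"):
    # Single optimizer over all parameters, with PyTorch's default Adam
    # hyperparameters, instead of the papers' staged, per-parameter-group schedules.
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters())
    for _ in range(num_epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            logits = model(images)  # assumes the model returns class logits
            loss = F.cross_entropy(logits, labels)
            loss.backward()
            optimizer.step()
    return model
```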
Completed experiments:
- Core replications
- ProtoPNet with just Adam (no projection or convex optimization on the last layer); this seems to result in the same or higher accuracy than the original ProtoPNet algorithm.
- Qualitative analysis of visualizations (although it would be nice to do a little bit more here).
Remaining experiments: there's a lot we could do here, but the main theme is to see how well these models do with standard optimization and hyperparameters:
- Determine how useful the milestone-based LR decay is (see the LR-schedule sketch after this list). Preliminary results on the CUB dataset with the ProtoTree hyperparameters indicate no impact on ProtoPNet and a moderate positive impact on ProtoTree. We need to understand whether these hyperparameters are dataset-specific and/or how easy they would be to find for datasets where we don't have access to the test set.
- Try using a standard backbone instead of one pretrained on iNaturalist.
- Train trees with gradient-based optimization of the leaves (preliminary results are positive).
- Determine the impact of the ProtoPNet cluster/separation costs (see the cost sketch after this list).
- How beneficial is projection? (Preliminary results: not very, and perhaps even harmful.)
- How important is backbone freezing?
- Do smaller batch sizes improve generalization? (Based on other neural networks, I'd assume the answer is yes, but it would be nice to have quantitative evidence.) And should the learning rate scale with the batch size? The LR-schedule sketch after this list shows the usual linear scaling rule.
- More runs on car, dog, and other datasets
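To make the milestone-decay and batch-size items above concrete, here is a minimal LR-schedule sketch: Adam with a milestone-based decay (PyTorch's `MultiStepLR`) and a learning rate scaled linearly with batch size. The specific values (`base_lr`, `base_batch_size`, `milestones`, `gamma`) are illustrative placeholders, not the papers' settings.

```python
import torch

def make_optimizer_and_scheduler(model, batch_size, base_lr=1e-3,
                                 base_batch_size=64, milestones=(60, 70, 80),
                                 gamma=0.1):
    # Linear scaling rule: grow the learning rate proportionally with batch size.
    lr = base_lr * batch_size / base_batch_size
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Multiply the learning rate by `gamma` at each milestone epoch.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=list(milestones), gamma=gamma
    )
    return optimizer, scheduler

# In the training loop, call scheduler.step() once per epoch after the optimizer steps.
```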
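For the cluster/separation item above, this cost sketch shows how those two terms are typically computed in a ProtoPNet-style model from the prototype-to-patch distance tensor. The tensor shapes and argument names are assumptions for illustration, not this repository's API.

```python
import torch

def cluster_and_separation_costs(distances, proto_class, labels):
    # distances:   (batch, num_prototypes, H, W) squared L2 distances between
    #              every prototype and every latent patch of every image.
    # proto_class: (num_prototypes, num_classes) one-hot prototype-to-class map.
    # labels:      (batch,) ground-truth class indices.
    # Closest patch per prototype for each image: (batch, num_prototypes).
    min_patch_dist = distances.flatten(start_dim=2).min(dim=2).values
    # Which prototypes belong to each image's own class: (batch, num_prototypes).
    own_class = proto_class[:, labels].t().bool()
    big = torch.finfo(min_patch_dist.dtype).max
    # Cluster cost: distance to the nearest prototype of the image's own class.
    cluster = min_patch_dist.masked_fill(~own_class, big).min(dim=1).values.mean()
    # Separation cost: distance to the nearest prototype of any other class.
    separation = min_patch_dist.masked_fill(own_class, big).min(dim=1).values.mean()
    # The overall loss typically adds the cluster cost and subtracts the separation cost.
    return cluster, separation
```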
Paper:
I've started writing this, but a lot of the remaining stuff will depend on the outcomes of the experiments.
One thing we'll need to decide is whether we present our work as pure replications of two papers, or as a more general investigation into how prototype-based models can be more easily trained. Personally, I think the latter is a more accurate representation of what has been done, as well as being more useful to others in the field.