Internal-external cross-validation

Ensuring that trained models generalize to external data is a key factor for clinical translation. We therefore recommend that you report your accuracy results using an internal-external cross-validation scheme, as described in our paper.
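
As a rough illustration, one common form of internal-external cross-validation holds out all samples from one data source (for example, one institution) in each fold and trains on the rest. The sketch below assumes that form; the function name `leave_one_group_out` and the site labels are hypothetical, and the exact scheme and splits to use are the ones described in our paper.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group.

    `groups` holds one label per sample, naming the source it came from.
    Every sample from the held-out source goes to the test set, so no
    source ever contributes to both training and testing in the same fold.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = sorted(
            i for group, idxs in by_group.items() if group != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical example: six samples drawn from three sources A, B and C.
sites = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_one_group_out(sites))

for held_out, train_idx, test_idx in folds:
    print(f"held out {held_out}: train={train_idx} test={test_idx}")
```

Reporting the accuracy of each fold separately, in addition to the average, shows how much performance varies between held-out sources.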


Class grouping

In our paper, we group annotations into classes and super-classes based on clinical reasoning. We found that the choice of grouping affects both interrater variability statistics and the accuracy of trained models. The dataset available for download here contains the raw classes, giving you the flexibility to explore different class grouping schemes that fit your research question.
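
A grouping scheme can be expressed as a simple lookup from raw class to group. The sketch below is a minimal example under that assumption; the class names, the mapping `RAW_TO_SUPER`, and the helper `group_labels` are all placeholders and do not reflect the actual class names in the dataset or the grouping used in our paper.

```python
from collections import Counter

# Placeholder mapping from raw class to super-class. Replace the keys with
# the raw class names found in the downloaded annotations.
RAW_TO_SUPER = {
    "raw_class_a": "super_1",
    "raw_class_b": "super_1",
    "raw_class_c": "super_2",
}


def group_labels(raw_labels, mapping, default="other"):
    """Map each raw label to its group; unmapped labels fall back to `default`."""
    return [mapping.get(label, default) for label in raw_labels]


raw = ["raw_class_a", "raw_class_c", "raw_class_b", "raw_class_x"]
grouped = group_labels(raw, RAW_TO_SUPER)
counts = Counter(grouped)
print(grouped)
print(counts)
```

Keeping the mapping in one place makes it easy to swap grouping schemes and rerun the same analysis under each.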


Baseline accuracies

The following are the accuracy results reported in our paper. The train-test splits are available at the same download link as the corrected single-rater dataset.