What is different about this data?
Besides its large scale and novel data-generation methodology, this dataset is unique in that it includes multi-rater annotations from participants with different levels of expertise in histopathology. It also contains annotations from the same participants made both with and without algorithmic suggestions. Because of this collection scheme, the resulting hybrid datasets contain a mixture of bounding-box and segmentation annotations.
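Because some annotations carry a full segmentation boundary and others only a bounding box, code that consumes the data has to handle both kinds. The sketch below assumes a simplified, hypothetical record (`NucleusAnnotation`, with optional `box` and `polygon` fields); it is not the dataset's actual file format. When a boundary is present, a bounding box can always be derived from it, so every annotation can serve as a detection target, while only the segmented subset can supervise masks.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class NucleusAnnotation:
    # Hypothetical record for illustration; not the dataset's real schema.
    label: str
    rater: str
    box: Optional[Box] = None
    polygon: Optional[List[Point]] = None

    def has_segmentation(self) -> bool:
        # A usable boundary needs at least three vertices.
        return self.polygon is not None and len(self.polygon) >= 3

    def bounding_box(self) -> Box:
        # Derive the box from the boundary when one exists.
        if self.has_segmentation():
            xs = [x for x, _ in self.polygon]
            ys = [y for _, y in self.polygon]
            return (min(xs), min(ys), max(xs), max(ys))
        if self.box is None:
            raise ValueError("annotation has neither a box nor a polygon")
        return self.box


def split_by_type(
    annotations: List[NucleusAnnotation],
) -> Tuple[List[NucleusAnnotation], List[NucleusAnnotation]]:
    # Separate segmented annotations (usable for mask supervision)
    # from box-only ones (usable for detection only).
    segmented = [a for a in annotations if a.has_segmentation()]
    box_only = [a for a in annotations if not a.has_segmentation()]
    return segmented, box_only


anns = [
    NucleusAnnotation("tumor", "R1", polygon=[(10, 10), (20, 12), (15, 25)]),
    NucleusAnnotation("lymphocyte", "R2", box=(30, 30, 40, 42)),
]
seg, boxes = split_by_type(anns)
print(len(seg), len(boxes))    # 1 1
print(anns[0].bounding_box())  # (10, 10, 20, 25)
```

Keeping the polygon optional, rather than splitting the data into two separate files, lets a single loader feed both a detection loss (all annotations) and a mask loss (segmented annotations only).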
How can I use this data?
The most obvious answer is to train your own models and analyze slides for your own research project. That said, this is an educational challenge, so feel free to explore ways to improve accuracy or to use the data creatively. In our paper, we explore:
- Adaptations to MaskRCNN to improve nucleus classification, localization, and segmentation.
- Adapting models to work on hybrid datasets containing both bounding boxes and segmentation boundaries.
- Improved interpretability of model decision through Decision Tree Approximation of Learned Embeddings (DTALE).
Each of the above points is an open area for innovation, and we'd love to see the creative ways in which you address these problems. Additional points include:
- Creative ways to use and analyze the multi-rater data.
- Creative ways to improve prediction on uncommon classes or to supplement this dataset with others.
- Discovery of novel pathomic and genomic biomarkers using predictions from models trained on this data.
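One simple starting point for the multi-rater data is a per-nucleus consensus label by majority vote. The sketch below assumes that annotations from different raters have already been matched to the same nucleus (a step not shown here), and `consensus_label` is a hypothetical helper, not part of any released tooling. Alternatives such as weighting raters by their level of expertise are left open.

```python
from collections import Counter
from typing import List, Optional


def consensus_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its vote share exceeds min_agreement.

    `votes` holds one class label per rater for a single (already matched)
    nucleus. Returns None when there are no votes or no label is frequent
    enough, so ambiguous nuclei can be excluded or flagged for review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Strictly greater than the threshold: with the default 0.5, a
    # 2-vs-2 tie yields None instead of an arbitrary winner.
    if count / len(votes) > min_agreement:
        return label
    return None


print(consensus_label(["tumor", "tumor", "stroma"]))  # tumor
print(consensus_label(["tumor", "stroma"]))           # None
```

Returning `None` for low-agreement nuclei, rather than forcing a label, keeps the disagreement visible, which may itself be useful signal when studying inter-rater variability.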