Covid-19 may produce a visual pattern in CT scans described as "ground-glass" opacity. See: https://pubs.rsna.org/doi/10.1148/radiol.2020200463. Radiologists can clearly discern this pattern. We trained a CNN to detect these features and achieved a high accuracy. Because we deliberately included corner cases (pathological lungs that are hard to classify as either Covid-19 or non-Covid, even for humans), we believe we are roughly on par with other leading publications. Using the fast.ai library allowed us to achieve this accuracy with fewer sample images and less code.
|Class name ||Count of images ||Description |
|Non-Covid||3721||groups together a range of diseases such as viral pneumonia, bacterial pneumonia, abscess, lipoid pneumonia, idiopathic pulmonary fibrosis, lung cancer and other diseases |
|COVID-19||3057||confirmed Covid-19 infections (via PCR, etc.) |
Importantly, we made sure that images of the same patient never appear in both the training and the validation set.
Our data sets will be linked as soon as our publication is finished.
The data was hand-selected and verified by a radiologist. We excluded very close cases that would be hard for a person to discern. For this iteration, we instead used clear Covid-19 images versus clearly pathological (but non-Covid) lungs.
We are actively looking to obtain more data and a more diverse data set.
If you have more data and you are willing to share anonymized data sets for this open source project, please get in contact with us.
The accuracy on our validation set is 94.5%, and on our more difficult test set, which contains more hard-to-distinguish cases, the accuracy is 88.9%.
In previous trial training runs, we achieved even higher accuracies in the range of 95%-98%. However, we accepted a lower accuracy in exchange for a more stable model in terms of recall and specificity.
|positive predictive value (= precision):|| 95.775%|
|negative predictive value:|| 93.425%|
|positive predictive value (= precision): ||92.683%|
|negative predictive value: ||85.714%|
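For readers less familiar with these metrics, the following minimal sketch shows how accuracy, positive and negative predictive value, sensitivity (recall) and specificity follow from a binary confusion matrix with COVID-19 as the positive class. The class name `ConfusionCounts`, the function `summarize` and the counts are hypothetical and chosen for illustration only; they are not the project's actual results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with COVID-19 as the positive class."""
    tp: int  # COVID-19 images predicted as COVID-19
    fp: int  # non-Covid images predicted as COVID-19
    tn: int  # non-Covid images predicted as non-Covid
    fn: int  # COVID-19 images predicted as non-Covid


def summarize(c: ConfusionCounts) -> dict:
    """Return the standard screening metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value (precision)
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "sensitivity": c.tp / (c.tp + c.fn),  # recall
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts for illustration only; NOT the project's results.
example = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)
for name, value in summarize(example).items():
    print(f"{name}: {value:.1%}")
```

Note that predictive values depend on the class balance of the evaluated set, so they are not directly comparable between sets with different proportions of COVID-19 images.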
Types of data sets
We used open source Covid-19 data sets and combined them. A special focus was put on hand-selecting the data. Quite a few of the open source datasets contained inconsistencies and also wrong labels.
Likewise, we made sure that mild Covid cases were also included. This allows us to draw a sharp line between the COVID-19 and non-COVID classes. All COVID-19 images were confirmed via other tests (such as PCR) and hand-checked by Dr. Javor.
How did we train?
We used a pre-trained ResNet50 provided by the fastai2 library. Images were rescaled to 448x448 pixels, as is common for CNNs. We trained for 17 epochs. The data was randomly split into a training and a validation set (20% were taken for validation, with seed=43). The split into training, validation and test sets was done on a per-patient level. In other words, we made sure that different images of the same patient would not appear in two or three sets at the same time: a patient's images appeared in either the training set, the validation set or the test set.
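A per-patient split as described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from our notebook: the function `split_by_patient`, the image-to-patient mapping and the file names are hypothetical. The sketch holds out 20% of the patients (rather than 20% of the images) and uses a fixed seed of 43 so the split is reproducible.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, valid_fraction=0.2, seed=43):
    """Split image names into train/valid so that no patient spans both sets.

    image_to_patient: dict mapping image name -> patient id (hypothetical layout).
    """
    # Group images by patient so each patient is assigned as a unit.
    by_patient = defaultdict(list)
    for image, patient in image_to_patient.items():
        by_patient[patient].append(image)

    # Shuffle patients (not images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_valid = max(1, round(len(patients) * valid_fraction))
    valid_patients = set(patients[:n_valid])

    train = [img for p in patients[n_valid:] for img in by_patient[p]]
    valid = [img for p in patients[:n_valid] for img in by_patient[p]]
    return train, valid, valid_patients


# Hypothetical example: 10 patients with 3 CT slices each.
images = {f"p{p:02d}_slice{s}.png": f"p{p:02d}" for p in range(10) for s in range(3)}
train, valid, valid_patients = split_by_patient(images)

# Leakage check: no patient id may occur on both sides of the split.
train_patients = {images[name] for name in train}
assert train_patients.isdisjoint(valid_patients)
print(len(train), "training images,", len(valid), "validation images")
```

Splitting on patients instead of images matters because neighbouring CT slices of one patient look very similar; if they landed in both sets, the validation accuracy would be overly optimistic.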
We used the default augmentations offered by fastai. In addition, we added an independent test set taken from another open source data set; it was contained in neither the training nor the validation set. We were able to confirm our findings with this independent test set. Note that the test set was intentionally chosen to contain slightly harder examples.
Our model and the Jupyter notebook showing our training steps in detail can be found in our GitHub repo.
How did we test?
First of all, we evaluated the training not only with the validation set, but also with a separate test set. The same test set was then shown to two human radiologists with 15 years of experience. The AI algorithm beat one radiologist and came very close to the second one in terms of accuracy.
Deep Learning Library
We use the fastai version 2 framework for training the model. Many thanks go to Jeremy Howard and the fastai team!
Model and Hyperparameters
We trained the model with a standard fastai2 ResNet50. Images were standardized to 448x448 pixels. The random seed was set to 43 and kept constant in order to reproduce our results. We used the default transformations of fastai2, which are at the time of our training:
We used the Adam optimizer; the loss function was the FlattenedLoss of CrossEntropyLoss().
We trained for 17 epochs via the fastai2 library on a single Nvidia Tesla GPU with 16 GB of VRAM. Batch size was set to 32. As opposed to (5), we did not do any prior lung segmentation or preprocessing. Our results seem to indicate that the preprocessing step may be omitted while still achieving high sensitivity and specificity. It seems that a plain ResNet50 is good enough for the task.
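The setup described in this section could look roughly like the sketch below. It is an approximation under stated assumptions, not a copy of our notebook: the folder layout (`train/` and `valid/` subfolders, already split per patient), the `HYPERPARAMS` dict, the function name `train` and the use of `fine_tune` as the training schedule are all assumptions made for illustration. The fastai import is placed inside the function so that the settings can be inspected without fastai installed.

```python
# Settings taken from the text above.
HYPERPARAMS = {
    "architecture": "resnet50",
    "image_size": 448,
    "epochs": 17,
    "batch_size": 32,
    "seed": 43,
}


def train(data_dir, hp=HYPERPARAMS):
    """Fine-tune a pre-trained ResNet50 on a folder of CT images.

    Assumes (hypothetically) that data_dir contains train/ and valid/
    subfolders, each with one subfolder per class, already split per patient.
    """
    # Imported lazily so the settings above can be read without fastai.
    from fastai.vision.all import (
        ImageDataLoaders, Resize, accuracy, aug_transforms,
        resnet50, set_seed, vision_learner,
    )

    set_seed(hp["seed"], reproducible=True)
    dls = ImageDataLoaders.from_folder(
        data_dir,
        train="train",
        valid="valid",
        item_tfms=Resize(hp["image_size"]),
        batch_tfms=aug_transforms(),  # fastai's default augmentations
        bs=hp["batch_size"],
    )
    # vision_learner defaults to Adam and infers a flattened cross-entropy
    # loss for single-label data. Older fastai2 releases call it cnn_learner.
    learn = vision_learner(dls, resnet50, metrics=accuracy)
    # Assumption: fine_tune is used here; the exact schedule is not stated above.
    learn.fine_tune(hp["epochs"])
    return learn
```

Calling `train("path/to/ct_images")` (a placeholder path) would return a fastai `Learner`. No lung segmentation or other preprocessing step appears in the sketch, in line with the description above.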
Previous research and COVID-19 databases