Evaluation


Metrics

Following the paper (He et al., 2021), segmentation performance is evaluated in two aspects: (1) an area-based metric, the Dice Similarity Coefficient (DSC), which measures the overlap between the predicted and ground-truth regions; and (2) distance-based metrics, the Average Hausdorff Distance (AVD), which measures how well the surfaces coincide and is stable and less sensitive to outliers, and the Hausdorff Distance (HD), which is sensitive to outliers and is therefore used to further compare segmentation quality at outliers.

For each target in one image, let A denote the ground truth segmentation point set and B the predicted segmentation point set. The metrics are then defined as follows (a minimal code sketch is given after the definitions):

  • DSC (Dice Similarity Coefficient, larger is better)

    DSC(A, B) = \frac{2|A \cap B|}{|A| + |B|}

  • HD (Hausdorff Distance, smaller is better)

    HD(A, B) = \max\{h(A, B), h(B, A)\}

    where

    h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|

    and \|a - b\| is the Euclidean distance.

  • AVD (Average Hausdorff Distance, smaller is better)

    AVD(A, B) = \max\{d(A, B), d(B, A)\}

    where

    d(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \|a - b\|

Ranking methods

  • Calculate the DSC, HD, and AVD for all targets (kidney, tumor, renal vein, and renal artery) in all cases, which gives 12 scores per case.
  • Average each of the 12 scores over all cases.
  • Rank the 12 averaged scores separately, yielding 12 rankings.
  • Average these 12 rankings to obtain the final ranking (a minimal sketch of this procedure is given after this list).
  • Teams are tied if their average rankings are equal.
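
The sketch below illustrates the ranking procedure, assuming a matrix of averaged scores with one row per team and 12 columns (4 targets × 3 metrics); the column layout and the tie-handling method ("average" ranks) are assumptions for illustration, not the official implementation.

    import numpy as np
    from scipy.stats import rankdata

    def final_ranking(mean_scores, higher_is_better):
        # mean_scores: (n_teams, 12) array of scores already averaged over all cases
        # higher_is_better: length-12 boolean array, True for DSC columns, False for HD/AVD columns
        ranks = np.empty_like(mean_scores, dtype=float)
        for j in range(mean_scores.shape[1]):
            col = mean_scores[:, j]
            # rank 1 = best; negate columns where larger values are better
            ranks[:, j] = rankdata(-col if higher_is_better[j] else col, method="average")
        return ranks.mean(axis=1)  # teams with equal averaged ranks are tied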

Precautions and troubleshooting

  • Failed submissions will not be counted against the submission limit. Nevertheless, we still suggest reading these precautions carefully before creating submissions to avoid unnecessary problems.
  • It is strongly recommended to run the evaluation code on your own machine before creating your submission on Grand Challenge. You can use the label files predicted by the baseline methods as "ground truth". This lets you catch possible technical problems in advance, such as filename mismatches, image size mismatches, incorrect label values, and image read failures (a sanity-check sketch is given after this list).
  • We read your prediction files using SimpleITK, so if you get 'ITK ERROR: ITK only supports orthonormal direction cosines. No orthonormal definition found!', this is probably because your prediction files were saved using nibabel. Setting the qform and sform correctly before saving the prediction files solves this problem (a nibabel-based sketch is given after this list). For this reason, we recommend saving your prediction files using the code we provided in the baseline:

    import SimpleITK as sitk

    # Convert the predicted label array to a SimpleITK image and write it to disk
    predict = sitk.GetImageFromArray(predict)
    sitk.WriteImage(predict, some_path)
  • If your submission shows "Succeeded" on Grand Challenge but the results do not appear on the "Leaderboards", or the results differ from your expectations, please contact us via email.
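
As an example of the local sanity checks suggested above, the sketch below compares one prediction file against a reference label file using SimpleITK. The file paths and the expected label set are placeholders you should adapt to your own setup and to the label definitions of the challenge.

    import SimpleITK as sitk
    import numpy as np

    pred_path = "predictions/case_00000.nii.gz"      # placeholder paths: adapt to your folder layout
    ref_path = "baseline_labels/case_00000.nii.gz"

    pred = sitk.ReadImage(pred_path)                 # fails early on unreadable files
    ref = sitk.ReadImage(ref_path)

    assert pred.GetSize() == ref.GetSize(), "image size mismatch"

    pred_arr = sitk.GetArrayFromImage(pred)
    expected_labels = {0, 1, 2, 3, 4}                # assumed label values; check the challenge description
    assert set(np.unique(pred_arr)).issubset(expected_labels), "incorrect label values"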
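
If you prefer to keep nibabel in your pipeline, the following is a minimal sketch, assuming the prediction should simply reuse the affine of the corresponding input image; the paths are placeholders, and the SimpleITK snippet above remains the recommended route.

    import nibabel as nib
    import numpy as np

    ref = nib.load("images/case_00000.nii.gz")       # placeholder: the image the prediction belongs to
    pred_arr = np.zeros(ref.shape, dtype=np.uint8)   # placeholder: replace with your predicted labels

    out = nib.Nifti1Image(pred_arr, affine=ref.affine)
    out.set_qform(ref.affine)                        # keep qform and sform consistent with the input image
    out.set_sform(ref.affine)
    nib.save(out, "predictions/case_00000.nii.gz")   # placeholder output path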