Overall, 87% of the radiation oncologists who participated in the study gave the MVision AI Contour+ a score of 4 or 5 (minor or no editing required).
How to properly test an AI-based auto-contouring solution?
Artificial-intelligence applications are becoming increasingly popular in medicine, including radiotherapy. Before AI-based tools can be widely implemented, they need to be carefully evaluated for safety and efficacy. Auto-contouring solutions are usually evaluated for time saving and quality. The performance of the AI models can be judged quantitatively, by their spatial similarity to a reference set of manually contoured structures, and qualitatively, by their clinical acceptance.
If the team chooses to test the new solution by comparing it to clinical contours from previously treated patients, the results might be misleading due to the heterogeneity of the reference data. The well-known interobserver variability of manual contours and possible deviations from international standards and guidelines make the use of retrospective clinical data problematic.
A better solution is to create a well-curated reference dataset generated by experts, although this requires substantially more time and effort. Even if subjectivity cannot be completely avoided, this approach offers more reliable results than using uncurated retrospective clinical data.
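The spatial-similarity comparison against a reference set can be illustrated with an overlap metric. The sketch below uses the Dice similarity coefficient, a commonly used overlap measure; this article does not state which metrics the study relied on, so the choice of Dice, the function name `dice_coefficient` and the toy masks are assumptions made for illustration only.

```python
def dice_coefficient(auto_mask, reference_mask):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(auto_mask) != len(reference_mask):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as structure in both masks.
    overlap = sum(1 for a, r in zip(auto_mask, reference_mask) if a and r)
    # Total number of structure voxels across the two masks.
    total = sum(1 for a in auto_mask if a) + sum(1 for r in reference_mask if r)
    # Two empty masks are treated as perfect agreement by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Hypothetical toy masks (8 voxels each), not data from the study.
auto = [0, 1, 1, 1, 1, 0, 0, 0]
reference = [0, 0, 1, 1, 1, 1, 0, 0]
dice_example = dice_coefficient(auto, reference)
print(f"Dice = {dice_example:.2f}")
```

A value of 1 means the automatic and reference contours coincide exactly, and 0 means they do not overlap at all. The quality of the reference matters just as much as the metric: a high score against a poorly curated reference says little about clinical usefulness.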
What method has been successfully used?
In 2023, a group of experienced medical physicists and radiation oncologists from IRCCS Azienda Ospedaliero-Universitaria di Bologna published the results of a complex evaluation of the quality of MVision AI models (1). The authors used multiple parameters to assess the clinical acceptability of the contours and their potential to improve daily clinical work.
Measuring distances, overlapping surfaces and percentages allows an objective judgment, but numbers alone are sometimes not enough to estimate practical significance. The time saved by using automated predictions and the radiation oncologists' degree of satisfaction are pragmatic measures of evaluation. Consistency, which should be as high as possible, is another important aspect in radiotherapy.
To assess the impact of using Contour+ on consistency, the team compared the interobserver variability among the manual contours with that among the manually adjusted MVision AI predictions.
The complexity of the Italian group's approach is also reflected in the wide variety of cases analyzed. One hundred and eleven cases, each with a tumor located in one of the most frequent anatomical sites, were selected. Head and neck, breast, thorax, abdomen, and male or female pelvis cases were manually delineated according to institutional protocols based on ESTRO and RTOG guidelines.
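One way to picture such an interobserver comparison is to average an overlap score over every pair of observers, once for the manual contours and once for the adjusted AI contours. The sketch below is only an illustration under stated assumptions: the article does not say which variability measure the study used, and the helper names (`dice`, `mean_pairwise_dice`) and the toy contours are hypothetical.

```python
from itertools import combinations
from statistics import mean


def dice(a, b):
    """Dice overlap between two contours given as sets of voxel indices."""
    if not a and not b:
        return 1.0  # two empty contours agree by convention
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(contours):
    """Average Dice over all observer pairs; higher means lower variability."""
    return mean(dice(a, b) for a, b in combinations(contours, 2))


# Hypothetical toy contours from three observers (sets of voxel indices).
manual = [{1, 2, 3, 4}, {3, 4, 5, 6}, {2, 3, 4, 5}]
adjusted = [{2, 3, 4, 5}, {2, 3, 4, 5}, {3, 4, 5, 6}]

manual_agreement = mean_pairwise_dice(manual)
adjusted_agreement = mean_pairwise_dice(adjusted)
print(f"manual: {manual_agreement:.2f}, adjusted: {adjusted_agreement:.2f}")
```

In this toy example the adjusted contours agree more closely with each other than the manual ones, which is the kind of pattern a reduction in interobserver variability would produce.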
MVision AI Contour+ provided the study contours for 59 different structures: organs at risk and CTV structures such as lymph node regions and seminal vesicles. It is important to mention that the patients were scanned on two different machines, with variable slice thickness, and that contrast medium was used for 67% of the abdominal patients, 80% of the head and neck patients and 90% of the patients with thoracic cancers.
What were the results?
Such an ample evaluation brings an equally substantial amount of data, which cannot be covered in a short summary. We chose to mention some of the results that have practical importance.
The quality of the automatic contours was graded as 1 if unusable; 2, 3 or 4 if adjustments were needed (2 – major; 3 – some; 4 – minor); and 5 if no editing was needed. Satisfaction was rated on a Likert scale ranging from 1 (poor) to 5 (excellent).
Overall, 44% of the radiation oncologists who participated in the study gave the MVision AI solution a score of 4 and 43% gave it the maximum score of 5. The scores given by senior radiation oncologists were significantly higher than those given by junior radiation oncologists. However, the time needed to manually adjust the MVision AI contours did not differ according to physicians' experience, which suggests that the solution is equally helpful for all users. The median time for manual contouring was 25 minutes, ranging from 8 to 115 minutes.
The median time for generating the MVision AI contours was 2.3 minutes, ranging from 1.2 to 8 minutes, and the median time needed for manual corrections was 10 minutes, ranging from 0.3 to 46.3 minutes. Details on the evaluation and the time saved for each evaluated site can be found in Figure 1 and Table 1.
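A rough feel for the time saving can be obtained from the median values quoted above. The sketch below is a back-of-the-envelope illustration, not a figure reported by the study: it assumes that generation and correction times simply add up, and the function name `estimated_time_saving` is hypothetical. Medians are not strictly additive, so the per-site results in Table 1 remain the authoritative source.

```python
def estimated_time_saving(manual_min, generation_min, correction_min):
    """Return (assisted_time, minutes_saved, fraction_saved) for one case."""
    if manual_min <= 0:
        raise ValueError("manual contouring time must be positive")
    # Assumption: AI-assisted time = automatic generation + manual correction.
    assisted = generation_min + correction_min
    saved = manual_min - assisted
    return assisted, saved, saved / manual_min


# Median values quoted in the article, in minutes.
assisted, saved, fraction = estimated_time_saving(25.0, 2.3, 10.0)
print(f"assisted: {assisted:.1f} min, saved: {saved:.1f} min ({fraction:.0%})")
```

With these medians the assisted workflow takes about 12.3 minutes, roughly half of the 25 minutes needed for fully manual contouring. Since contour generation is automated, the time a clinician actively spends may be closer to the correction time alone.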
Slice thickness and the use of contrast did not interfere with the performance of the MVision AI models. Manually adjusting MVision AI contours significantly decreased interobserver variability compared to the manual contours.
What can be concluded?
In this complex and extensive evaluation, the MVision AI solution was highly appreciated by clinicians, reduced contouring time and decreased interobserver variability.
Guideline-consistent ground truth contours and rigorous peer review are a hallmark of MVision AI's GBS (guideline-based segmentation) approach to training AI models for auto-contouring. MVision AI is continuously improving its models, so newer versions are expected to perform even better. These excellent results, and many similar ones, motivate the MVision AI team to continue developing in the same direction and to support clinicians in improving cancer care.
Table 1. Time saved and satisfaction grades for the anatomical site groups included in the analysis by Strolin et al., 2023
(1) Strolin S, Santoro M, et al. (2023) How smart is artificial intelligence in organ delineation? Testing a CE and FDA-approved Deep-Learning tool using multiple expert contours delineated on planning CT images. Front. Oncol. 13:1089807. doi: 10.3389/fonc.2023.1089807