This Users' Guide to the Medical Literature discusses the use of machine learning models as diagnostic tools and explains the steps needed to make these models, and the outcomes they produce, clinically effective.
In recent years, many new clinical diagnostic tools have been developed using complex machine learning methods. Irrespective of how a diagnostic tool is derived, it must be evaluated using a 3-step process: deriving the tool, validating it, and establishing its clinical effectiveness. Machine learning–based tools should also be assessed for the type of machine learning model used and its appropriateness for the input data type and data set size. Machine learning models also generally have additional prespecified settings, called hyperparameters, which must be tuned on a data set independent of the validation set. On the validation set, the outcome against which the model is evaluated is termed the reference standard. The rigor of the reference standard must be assessed, for example, by determining whether it is based on a universally accepted gold standard or on expert grading.
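The separation between tuning and validation can be illustrated with a short Python sketch. This is a minimal, hypothetical example, not the method of any system discussed in this guide: the data are simulated (one numeric feature per case, an assumed 30% prevalence, and labels standing in for the reference standard), and a k-nearest-neighbors classifier, whose number of neighbors k plays the role of the hyperparameter, is swapped in for a CNN. The function names, split sizes, and the use of accuracy as the tuning criterion are all illustrative assumptions.

```python
import random
from collections import Counter


def make_synthetic_data(n, seed):
    """Simulate n cases of (feature, reference_label).

    Hypothetical data: one numeric feature per case; label 1 means
    "referable retinopathy" according to a simulated reference standard.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else 0  # assumed 30% prevalence
        # Diseased cases tend to have higher values, but the classes overlap.
        feature = rng.gauss(2.0 if label else 0.0, 1.0)
        cases.append((feature, label))
    return cases


def knn_predict(train, x, k):
    """Majority vote among the k training cases nearest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda case: abs(case[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return 1 if votes[1] > votes[0] else 0


def accuracy(train, cases, k):
    """Fraction of cases whose prediction matches the reference label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in cases)
    return correct / len(cases)


def sensitivity_specificity(train, cases, k):
    """Compare predictions against the reference standard on held-out cases."""
    tp = fn = tn = fp = 0
    for x, y in cases:
        pred = knn_predict(train, x, k)
        if y == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


data = make_synthetic_data(900, seed=0)
# Three disjoint sets: fit on train, tune k on tune, report once on validation.
train, tune, validation = data[:500], data[500:700], data[700:]

candidate_ks = [1, 3, 5, 7, 9, 15, 25]
best_k = max(candidate_ks, key=lambda k: accuracy(train, tune, k))

# The validation set is used only after the hyperparameter has been fixed.
sens, spec = sensitivity_specificity(train, validation, best_k)
print(f"chosen k = {best_k}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The design point is the ordering: k is chosen using the tuning set alone, and sensitivity and specificity are then computed once against the reference labels of a validation set the model never influenced. In a real diabetic retinopathy system, the model would be a CNN trained on retinal images, and the reference standard might come from expert grading.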
You are the chief medical officer of a large multifacility health care system. One of the medical staff committees of the organization reviewed guidelines from the American Academy of Ophthalmology recommending annual diabetic retinopathy screening for all adult patients with diabetes.1 You determine that there is reasonably good evidence supporting this recommendation. Patients with diabetes are prone to developing retinopathy or macular edema, and these diseases may progress to advanced stages before any symptoms occur. Screening allows these diseases to be treated with anti–vascular endothelial growth factor (anti-VEGF) agents or laser photocoagulation at an early stage, before vision is compromised.
Despite the benefit of screening, your organization has very limited access to eye care. You have also found an article suggesting that an automated screening system based in primary care clinics, in a health system similar to yours, was effective for detecting diabetic retinopathy.2 In that study, nondilated digital retinal images were obtained in primary care clinics and automatically analyzed by artificial intelligence software. The system is proprietary, and you do not know how valid, reliable, and effective it might be. A web search reveals several available automated systems that screen for diabetic retinopathy. You also find that systems based on a machine learning method called convolutional neural networks (CNNs) are currently considered the most promising for detecting diabetic retinopathy in clinical practice, because they can process very large amounts of information and can achieve high sensitivity and specificity.
A search of PubMed finds several articles that describe the performance characteristics of automated systems for detecting eye disease. One JAMA article showed that machine learning using modern CNNs could detect diabetic retinopathy,3 and another described a CNN-based system that was developed and validated using independent samples.4 A third article ...