Where is Deep Learning in Retina Headed?

Nearly two years have passed since the Google Brain team presented data showing that a deep-learning algorithm is capable of detecting signs of diabetic retinopathy at least as accurately as a cohort of ophthalmologists.¹ While this was certainly not the first exploration of artificial intelligence or machine learning applications in medicine, it had a profound impact in terms of capturing the collective attention and imagination of researchers, clinicians, industry and mainstream media. Deep learning had announced its arrival in ophthalmology, although many were unsure of how, when, why and to what extent it would help reshape the way we deliver care.

Since then, numerous studies have validated deep-learning models in the detection and diagnosis of diseases afflicting the posterior segment of the eye, with extremely high accuracy.^1-7 In April 2018, the Food and Drug Administration, which had granted breakthrough device designation to the cloud-based software IDx-DR (IDx Technologies), permitted it to be marketed as the first artificial intelligence-based medical device to detect referable DR from color fundus photographs.² This is the first approved instrument to provide a screening decision without clinician input.

Moving forward, ongoing advances in machine learning, and especially deep learning, offer the potential to help expand patient access to care, increase efficiency, reduce errors and improve overall quality of care. Here, we elaborate on four areas in which this revolutionary technology is positioned to impact our day-to-day clinical practice as retina specialists.

1. Deployment of Large-Scale Teleretinal Screening Programs

Diabetes is a growing epidemic both domestically and internationally. Current estimates show that more than 30 million Americans have diabetes; this number exceeds 400 million worldwide.^8,9 Both of these figures continue to rise at staggering rates that surpass most predictive models. Despite well-established guidelines for screening and potential early detection of DR by an eye-care provider, 30 to 50 percent of people with diabetes do not adhere to these recommendations for a multitude of reasons.^10,11 Teleretinal screening programs for DR may help close this gap, and are already demonstrating success in select regional markets with nonmydriatic cameras being deployed in various settings (e.g., primary-care physicians’ and endocrinologists’ offices).

As this infrastructure continues to develop, a deep-learning overlay may augment remote imaging diagnostic capabilities by reducing the degree of human involvement needed. That is, the workflow would shift from requiring direct human interpretation of every image to primarily human oversight, with grading or affirmation needed only for referable abnormal results. Deep learning-based screening programs offer at least four potential benefits (box).

FDA approval of the IDx-DR device was based on a prospective study that assessed the software performance on retinal images from 900 diabetic patients at 10 primary-care offices.² In the study, Michael Abràmoff, MD, PhD, and colleagues showed that IDx-DR’s sensitivity and specificity for detecting greater-than-mild DR were 87 and 90 percent, respectively. Notably, existing staff at the primary-care physician sites received a one-time standardized four-hour training program on operating the system, after which they were able to successfully image patients and transfer information to the platform 96 percent of the time. (Dr. Abràmoff is founder and president of IDx.)
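To make these two metrics concrete, the following minimal sketch shows how sensitivity and specificity are computed from paired reference and algorithm labels, and how they translate into positive and negative predictive values once a disease prevalence is assumed. The function names and the 20 percent prevalence are illustrative assumptions and are not taken from the study; only the 87 and 90 percent figures come from the text above.

```python
def confusion_counts(truth, predicted):
    """Count (TP, TN, FP, FN) from paired boolean labels (True = disease present)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    return tp, tn, fp, fn


def sensitivity_specificity(tp, tn, fp, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Operating point reported in the text; the prevalence value is a hypothetical input.
ppv, npv = predictive_values(0.87, 0.90, prevalence=0.20)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

Because predictive values depend on prevalence, the same algorithm would be expected to yield a different proportion of true positives among its referrals in a primary-care population than in a retina clinic.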

Deep Learning in AMD

Beyond diabetes, deep-learning algorithms have shown promise in detecting various posterior segment diseases. For example, Philippe M. Burlina, PhD, and colleagues applied two different deep-learning algorithms to solve a two-class age-related macular degeneration classification problem, categorizing fundus images from the National Institutes of Health Age-related Eye Disease Study dataset (n>130,000 images) as either disease-free/early stage AMD (for which dietary supplements are not considered) or intermediate or advanced stage (for which supplements and monitoring are considered).⁶ The investigators found that both deep-learning methods yielded accuracy that ranged between 88.4 and 91.6 percent, while the area under the curve (AUC) was between 0.94 and 0.96. These findings were promising and indicated performance levels comparable to physicians.

Furthermore, a group of international researchers reported on a deep-learning system that, in addition to detecting referable DR and vision-threatening DR (defined as severe nonproliferative DR or proliferative DR), was also trained to identify AMD and referable glaucoma. The investigators commented that screening for other vision-threatening conditions should be mandatory for any clinical diabetic screening program.⁷

In the primary validation dataset (n=71,896 images), the AUC of the algorithm for referable DR was 0.936, with sensitivity of 90.5 percent and specificity of 91.6 percent. For vision-threatening DR, the AUC was 0.958, with sensitivity of 100 percent and specificity of 91.1 percent. For possible glaucoma, the AUC was 0.942, with sensitivity of 96.4 percent and specificity of 87.2 percent. Finally, for AMD, the AUC was 0.931, with sensitivity of 93.2 percent and specificity of 88.7 percent. Among the additional 10 datasets used for external validation (n=40,752 images), the AUC range for referable DR was between 0.889 and 0.983.
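For readers less familiar with the AUC, one way to interpret it is as the probability that a randomly chosen diseased image receives a higher algorithm score than a randomly chosen non-diseased image. The sketch below computes it that way from two lists of scores; the function name and the example scores are hypothetical and are not data from any of the studies cited.

```python
def auc_from_scores(scores_diseased, scores_healthy):
    """Estimate AUC as the fraction of (diseased, healthy) pairs ranked correctly.

    Ties count as half a correct ranking. This pairwise form is equivalent
    to the area under the empirical ROC curve.
    """
    wins = 0.0
    for d in scores_diseased:
        for h in scores_healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(scores_diseased) * len(scores_healthy))


# Hypothetical algorithm outputs (higher = more suspicious for disease).
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.3, 0.2]
print(f"AUC={auc_from_scores(diseased, healthy):.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, and the value does not depend on any single referral threshold, unlike sensitivity and specificity.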

Wide-Field Imaging Potential

Just as important as the disease screened for is the imaging modality used to do the screening. Numerous nonmydriatic fundus camera systems are currently available, but limited investigations have been conducted thus far using wide-field imaging, which may offer unique advantages for future teleretinal screening programs. The collaboration between Nikon’s Optos subsidiary and Google’s Verily (formerly Google Life Sciences) in late 2016 is evidence of interest in this approach.¹² Researchers in Japan reported on their deep-learning algorithm to detect rhegmatogenous retinal detachment using Optos ultra-wide-field fundus images, which demonstrated a high sensitivity of 97.6 percent with an AUC of 0.988.¹³

A separate study from the same group aimed to use Optos ultra-wide-field images for the detection of neovascular AMD. Similarly, they reported a high sensitivity of 100 percent with an AUC of 0.998.¹⁴ The single greatest limitation of both studies was the low number of images used for training in each study (n<500), as deep learning requires a large number of data sets for optimal training. Moving forward, larger sets of classified, labeled wide-field images will need to be procured for more optimal deep-learning algorithm development.

2. Systemic Disease Assessment

Retina specialists routinely assess for ocular involvement of various systemic disease states, ranging from vascular (diabetes, hypertension) to infectious (tuberculosis, syphilis) to inflammatory (sarcoidosis, Behçet’s). However, deep learning offers the potential to identify subclinical findings and patterns from retinal images that extend beyond the discernible threshold of a human interpreter. This may one day enable fundus photography to be used as a supplemental biomarker for overall systemic morbidity/mortality assessment, rather than for just identifying retinal pathology.

Retinal Imaging in CVD

This exciting possibility was recently explored by Google researchers working with a Stanford University cardiologist. They used a deep-learning algorithm trained on retinal fundus images (n=284,335 patients) to predict associated cardiovascular risk factors.¹⁵ Their algorithm accurately predicted cardiovascular risk factors not previously thought to be detectable in retinal images, such as patient age (within 3.26 years), gender (AUC=0.97), smoking status (AUC=0.71), systolic blood pressure (within 11.23 mmHg) and major adverse cardiac events (AUC=0.70). This performance approached the accuracy of other cardiovascular risk calculators, which typically require a blood draw to measure cholesterol levels.

While this is a new and evolving area of study, future directions may also investigate associations with subclinical retinal findings and neurodegenerative conditions such as Alzheimer’s, Parkinson’s and multiple sclerosis.

Figure. The terms “deep learning,” “machine learning” and “artificial intelligence” can be thought of as concentric circles: AI is the largest circle, machine learning a smaller circle within AI, and deep learning the smallest circle within machine learning.

3. Improving Clinical Efficiency and Daily Workflow

The widespread use and success of intravitreal agents for the management of retinal diseases has not only revolutionized patient care, but also dramatically increased the burden of treatment for retina specialists.

The ever-expanding indications for these medications (e.g., anti-VEGF for any stage of DR), as well as the promise of new targeted therapies for conditions without current treatment (e.g., dry AMD), are likely to challenge and further strain the day-to-day clinical practice of retina specialists as patient office visits and diagnostic testing only increase.

Given that optical coherence tomography imaging is the single most common diagnostic test performed on a daily basis in retina clinics, this task potentially lends itself to automation with deep-learning techniques. Several groups have successfully used deep learning in segmentation of OCT scans for the detection of morphological features such as intraretinal fluid (IRF) or subretinal fluid (SRF) from various retinovascular diseases.^16-19

Taking Deep Learning One Step Further

Researchers in Germany proposed a deep-learning model with the goal of predicting the need for intravitreal anti-VEGF retreatment rather than just detecting the presence of IRF or SRF.²⁰ In their study, 183,402 OCT images from patients receiving ongoing anti-VEGF therapy were cross-referenced with the electronic institutional intravitreal injection records. The trained algorithm reached a prediction accuracy of 95.5 percent on the images in the validation set. For single retinal B-scans in the validation dataset, a sensitivity of 90.1 percent and a specificity of 96.2 percent were achieved, with an AUC of 0.968.

Taken together, deep learning in this setting may one day be used for more rapid, automated evaluation and assessment of images and monitoring of disease activity, only necessitating human verification for abnormals. Similar protocols have been implemented in the field of radiology to improve efficiency. Furthermore, these methods may additionally offer the clinician support in decision-making on a given patient’s need for treatment.
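The workflow described above, in which the algorithm pre-screens every scan and a clinician verifies only the flagged ones, can be sketched as follows. This is a minimal illustration under stated assumptions: the `ScanResult` type, the `triage` function and the 0.5 threshold are hypothetical and do not represent the method of any cited study or product.

```python
from dataclasses import dataclass


@dataclass
class ScanResult:
    scan_id: str
    p_treatment: float  # hypothetical model-estimated probability that treatment is indicated


def triage(results, threshold=0.5):
    """Split scans into those flagged for clinician verification and those auto-cleared.

    Flagged scans are returned most-suspicious first so they can be reviewed in that order.
    """
    flagged = [r for r in results if r.p_treatment >= threshold]
    cleared = [r for r in results if r.p_treatment < threshold]
    flagged.sort(key=lambda r: r.p_treatment, reverse=True)
    return flagged, cleared


# Hypothetical model outputs for three OCT scans.
results = [
    ScanResult("scan-001", 0.92),
    ScanResult("scan-002", 0.08),
    ScanResult("scan-003", 0.61),
]
flagged, cleared = triage(results)
print("needs review:", [r.scan_id for r in flagged])
print("auto-cleared:", [r.scan_id for r in cleared])
```

In practice the threshold would be chosen to favor sensitivity, accepting more false alarms for human review in exchange for fewer missed cases.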

Deep Learning and EHR

Deep learning appears poised to impact clinical workflow efficiency beyond tasks pertaining to just image recognition and classification. For example, some of the earlier applications of deep learning were in the fields of voice/speech recognition and language processing. Accordingly, DeepScribe, Robin and other companies are developing deep learning-based digital medical scribing platforms to augment and improve the physician’s documentation process in the electronic health record.^21,22

Having a system, rather than a live/remote human scribe, that can reliably produce clinic notes up to the physician’s standards while constantly evolving and improving over time may help to increase direct face-time with patients, alleviate administrative and clerical burden, reduce administrative practice costs and, ultimately, improve day-to-day clinic efficiency.

4. Precision Medicine in Retina

The Precision Medicine Initiative has defined precision medicine as “an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person.”²³ An individualized approach to medicine may enable physicians to more accurately predict which treatment strategies may be most effective for certain patient groups based on inherent individual differences.

Inferring Disease Patterns

Looking further down the road, deep learning offers the potential to help solve a number of our overburdened health-care system’s growing problems. As of now, these algorithms have been mostly used for the detection and diagnosis of disease. However, as efforts grow toward acquiring sequential datasets from the same patients over an extended period of time, deep learning may unlock the potential to start inferring patterns of disease progression, and, from that, make treatment and prognostic predictions.

We may one day be able to tailor treatments and interventions to patients at highest risk of disease progression at an earlier stage. For example, DR could potentially be reclassified along a scale where a numeric grade denotes a patient’s risk of developing diabetic macular edema or progressing to proliferative disease. Once a certain numeric threshold is crossed, treatment would then be indicated, even if center-involving DME or neovascularization has not yet developed.

Conversely, deep learning may also elucidate for whom and when treatment can be selectively withheld; that is, where there may not be any added functional visual benefit, thus avoiding overcommitment of expensive and finite resources.

Predicting Treatment Outcomes

Research groups are currently investigating deep-learning methodologies to identify OCT structural biomarkers in hopes of predicting clinical treatment outcomes.^24,25 Austrian investigators have applied deep learning techniques to OCT images from 614 clinical patients in the HARBOR trial, aiming to predict functional response to intravitreal anti-VEGF therapy. HARBOR was a 24-month, Phase III dose-response trial of ranibizumab (Lucentis, Roche/Genentech) for treatment of wet AMD.

One of their studies applied a deep-learning algorithm to delineate retinal layers and the CNV-associated lesion components, IRF, SRF and pigment epithelial detachment.²⁴ These were extracted together with visual acuity measurements at baseline, months one, two and three, and then used to predict vision outcomes at month 12 by using “random forest” machine learning. The group found that the most relevant OCT biomarker for predicting the corresponding visual acuity was the horizontal extension of IRF within the foveal region, whereas SRF and pigment epithelial detachment ranked lower.

With respect to predicting final visual acuity outcomes after one year of treatment, the accuracy of the algorithm increased in a linear fashion with each successive month of data included from the initiation phase. The most accurate predictions were generated at month three (R²=0.70).
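The coefficient of determination reported here summarizes how much of the variation in observed visual acuity is accounted for by the predictions. The sketch below uses the standard definition, one minus the ratio of residual to total sum of squares; the study may have computed the statistic differently, and the acuity values shown are invented for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - ss_residual / ss_total


# Hypothetical month-12 visual acuity (letters) and model predictions.
observed_letters = [55, 62, 70, 48, 66]
predicted_letters = [58, 60, 67, 52, 65]
print(f"R^2={r_squared(observed_letters, predicted_letters):.2f}")
```

Under this definition, a value of 1.0 means the predictions match the observations exactly, while 0 means they do no better than always predicting the mean observed acuity.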

In a separate study, the same researchers applied their deep-learning techniques to assess whether low and high ranibizumab injection requirements from the pro re nata arm of the HARBOR trial could be predicted based on OCT scans at baseline and months one and two.²⁵ Of 317 eligible patients, 71 had low injection requirements (≤five), 176 had medium (five to 16) and 70 had high (≥16) injection requirements during the PRN phase of treatment extending from three to 23 months.

The authors found that classification within low or high treatment demonstrated AUCs of 0.7 and 0.77, respectively. Additionally, the most relevant OCT biomarker for prediction of injection burden was volume of SRF within the central 3 mm at month two.

Analyzing Massive Datasets

On a larger scale, deep-learning algorithms are being applied to analyze substantial quantities of electronic health records with the goal of making predictive assessments regarding certain high-risk populations. Predictive modeling with EHR data is anticipated to further advance personalized medicine and improve overall health-care quality. A Google-led study recently demonstrated that deep-learning methods using patients’ entire raw EHR records are capable of accurately predicting multiple medical events.²⁶

In the study, de-identified EHR data from two U.S. academic medical centers with 216,221 adults hospitalized for at least 24 hours was unrolled into a total of 46 billion data points. The deep-learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUC=0.93 to 0.94), 30-day unplanned readmission (AUC=0.75 to 0.76), prolonged length of stay (AUC=0.85 to 0.86), and all of a patient’s final discharge diagnoses (AUC=0.90).

This type of research may be of unique interest given that the patients retina specialists care for are typically elderly (e.g., those with AMD) or vasculopathic (e.g., those with diabetes). Both groups may therefore be at higher risk for systemic adverse events necessitating hospitalization. Extrapolating the results of this study, being able to identify in advance the patients at highest risk for experiencing secondary events could theoretically influence our future treatment paradigms.

For example, in managing PDR, a more definitive panretinal photocoagulation may be indicated over anti-VEGF injections for a patient whom an algorithm denotes as having a high risk of inpatient hospitalization, which may in turn mean a higher risk of missing clinic appointments.

References

1. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316:2402-2410.

2. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. npj Digital Medicine. 2018;1:article 39.

3. Abràmoff MD, Lou Y, Erginay A, et al. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci. 2016;57:5200-5206.

4. Abràmoff MD, Folk JC, Han DP, et al. Automated analysis of retinal images for detection of referable diabetic retinopathy. JAMA Ophthalmol. 2013;131:351–357.

5. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017;124:962-969.

6. Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017;135:1170-1176.

7. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211-2223.

8. New CDC report: More than 100 million Americans have diabetes or prediabetes [press release]. Atlanta, GA; Centers for Disease Control and Prevention. July 18, 2017. Available at: https://www.cdc.gov/media/releases/2017/p0718-diabetes-report.html. Accessed September 9, 2018.

9. Zimmet PZ. Diabetes and its drivers: The largest epidemic in human history? Clin Diabetes Endocrinol. 2017;3:1.eCollection.

10. Kuo S, Fleming BB, Gittings NS, et al. Trends in care practices and outcomes among Medicare beneficiaries with diabetes. Am J Prev Med. 2005;29:396–403.

11. Brechner RJ, Cowie CC, Howie LJ, et al. Ophthalmic examination among adults with diagnosed diabetes mellitus. JAMA. 1993;270:1714–1718.

12. Nikon and Verily establish strategic alliance to develop machine learning-enabled solutions for diabetes-related eye disease [press release]. Tokyo, Japan; Nikon. December 27, 2016. Available at: https://www.nikon.com/news/2016/1227_01.htm. Accessed September 9, 2018.

13. Ohsugi H, Tabuchi H, Enno H, Ishitobi N. Accuracy of deep learning, a machine-learning technology, using ultra-wide-field fundus ophthalmoscopy for detecting rhegmatogenous retinal detachment. Sci Rep. 2017;7:9425.

14. Matsuba S, Tabuchi H, Ohsugi H, et al. Accuracy of ultra-wide-field fundus ophthalmoscopy-assisted deep learning, a machine-learning technology, for detecting age-related macular degeneration. Int Ophthalmol. 2018 May 9. [Epub ahead of print]

15. Poplin R, Varadarajan AV, Blumer K, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158–164.

16. Lee CS, Tyring AJ, Deruyter NP, Wu Y, Rokem A, Lee AY. Deep-learning based, automated segmentation of macular edema in optical coherence tomography. Biomed Opt Express. 2017;8:3440-3448.

17. Schlegl T, Waldstein SM, Bogunovic H, et al. Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology. 2018;125:549-558.

18. Fang L, Cunefare D, Wang C, Guymer RH, Li S, Farsiu S. Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search. Biomed Opt Express. 2017;8:2732-2744.

19. El Tanboly A, Ismail M, Shalaby A, et al. A computer-aided diagnostic system for detecting diabetic retinopathy in optical coherence tomography images. Med Phys. 2017;44:914-923.

20. Prahs P, Radeck V, Mayer C, et al. OCT-based deep learning algorithm for the evaluation of treatment indication with anti-vascular endothelial growth factor medications. Graefes Arch Clin Exp Ophthalmol. 2018;256:91-98.

21. DeepScribe. https://deepscribe.tech/. Accessed September 9, 2018.

22. Robin. https://www.robinhealthcare.com/ Accessed September 9, 2018.

23. What is precision medicine. Genetics Home Reference. Bethesda, MD; U.S. National Library of Medicine website. Available at: https://ghr.nlm.nih.gov/primer/precisionmedicine/definition. Accessed September 9, 2018.

24. Schmidt-Erfurth U, Bogunovic H, Sadeghipour A, et al. Machine learning to analyze the prognostic value of current imaging biomarkers in neovascular age-related macular degeneration. Ophthalmol Retina. 2017. [Epub ahead of print]

25. Bogunovic H, Waldstein SM, Schlegl T, et al. Prediction of anti-VEGF treatment requirements in neovascular AMD using a machine learning approach. Invest Ophthalmol Vis Sci. 2017;58:3240-3248.

26. Rajkomar A, Oren E, Chen K, et al. Scalable and accurate deep learning with electronic health records. npj Digital Medicine. 2018;1:article 18.

27. Cabitza F, Rasoini R, Gensini GF. Unintended consequences of machine learning in medicine. JAMA. 2017;318:517-518.

28. Hoff T. Deskilling and adaptation among primary care physicians using two work innovations. Health Care Manage Rev. 2011;36:338-348.

29. Human Intelligence and Artificial Intelligence in Medicine Symposium. Palo Alto, CA; April 17, 2018. Available at: https://med.stanford.edu/presence/initiatives/hiai-symposium.html. Accessed September 9, 2018.

30. Keane P, Topol EJ. With an eye to AI and autonomous diagnosis. npj Digital Medicine. 2018;1:article 40.

