07/16/2024
By Zinan Xiong

The Richard A. Miner School of Computer & Information Sciences invites you to attend a doctoral dissertation defense by Zinan Xiong on "Enhancing Disease Recognition and Consistency in Medical Imaging and Videos: A Deep Learning Perspective."

Ph.D. Candidate: Zinan Xiong
Date: Thursday, July 18, 2024
Time: 9:30 a.m. ET
Location: This will be a virtual defense via Zoom.

Committee Members:

  • Yu Cao (advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH)
  • Benyuan Liu (advisor), Professor, Director, Miner School of Computer & Information Sciences, UMass Center for Digital Health (CDH), Computer Networking Lab, CHORDS
  • Hengyong Yu (member), FIEEE, FAAPM, Professor, Department of Electrical & Computer Engineering
  • Yan Luo (member), Professor, Department of Electrical & Computer Engineering

Abstract:

Over the past decade, the rapid advancement of deep learning algorithms has unleashed a transformative wave across an array of industries, reshaping landscapes from autonomous vehicle technology to immersive gaming experiences and, critically, revolutionizing the frontiers of healthcare. These algorithms, characterized by their remarkable computational capabilities and adaptive learning, have emerged as indispensable tools, effectively alleviating the burden of laborious, resource-intensive tasks that once demanded substantial human involvement.

In the healthcare sector, the integration of deep learning algorithms has significantly transformed medical practices, especially in the field of medical imaging. This integration has ushered in a new era, empowering healthcare professionals with unprecedented precision in diagnostics. Harnessing deep learning, practitioners have unlocked the potential to unravel intricate patterns within medical imagery, elevating disease recognition to new levels of accuracy and efficacy. Such innovations have not only expedited diagnostic procedures but have also substantially redefined the paradigms of patient care.

In the detection and treatment of upper digestive tract diseases, deep learning has emerged as a crucial asset. For instance, its application in detecting esophageal and gastric cancers has significantly raised the detection rates of these lesions, providing a safety net in scenarios where doctors might overlook abnormalities due to factors like fatigue. By integrating deep learning models, systems are equipped to promptly and accurately alert healthcare professionals, reducing the probability of misdiagnosis or missed diagnosis. This technological integration serves as a supportive layer, aiding medical practitioners in identifying subtle yet critical anomalies within the upper digestive tract, thereby bolstering overall diagnostic accuracy and, subsequently, treatment efficacy.

The implementation of deep learning in the realm of upper digestive tract diseases is pivotal not only for early detection but also for precision in treatment strategies. Its role extends beyond mere detection, offering insights into tailoring treatment plans based on comprehensive analyses of medical imaging data. By harnessing the capabilities of deep learning algorithms, healthcare providers gain access to more nuanced information, allowing for personalized and targeted interventions. This integration not only aids in the timely identification of pathologies but also lays the foundation for a more individualized approach to patient care, optimizing treatment outcomes and overall prognosis in upper digestive tract ailments.

Generally, doctors aim to minimize the duration of the endoscope's passage from entering the oral cavity to reaching the throat, not only to enhance patient comfort but also as an indicator of a physician's proficiency, particularly for novice practitioners. Therefore, this dissertation proposes a framework for an automated algorithm, based on deep learning and image classification, to measure the oral-pharyngeal transit time. The framework identifies distinct points of endoscope passage by automating the recognition of the various segments along the path. By calculating the time differentials between these points, it precisely determines the duration from entry through the mouth to the eventual passage through the pharynx. Moreover, the algorithm is designed to automatically eliminate interference caused by procedural anomalies, ensuring an accurate computation of transit time.
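The timing logic described above can be sketched as follows. This is a minimal illustration, not the dissertation's actual pipeline: the frame-level labels are assumed to come from an image classifier, and the `min_run` stability check stands in for the anomaly-filtering step; all names and parameters are hypothetical.

```python
# Hypothetical sketch of oral-pharyngeal transit-time measurement.
# `frame_labels` are per-frame classifier outputs (illustrative labels);
# requiring a label to persist for `min_run` consecutive frames filters
# brief misclassifications or procedural anomalies.

def transit_time(frame_labels, fps, min_run=5):
    """Seconds from the first stable 'mouth' frame to the first
    stable 'pharynx' frame, or None if either is never reached."""
    def first_stable(label):
        run = 0
        for i, current in enumerate(frame_labels):
            run = run + 1 if current == label else 0
            if run == min_run:
                return i - min_run + 1  # start of the stable run
        return None

    start = first_stable("mouth")
    end = first_stable("pharynx")
    if start is None or end is None or end <= start:
        return None
    return (end - start) / fps
```

A single stray "pharynx" frame mid-sequence is ignored because it never forms a stable run, which mirrors the abstract's point about suppressing interference before computing the time differential.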

In the upper digestive tract, atrophic gastritis is a common pathology linked to the risk of gastric cancer. However, relying solely on image classification methods often yields discontinuous outcomes when applied to actual video, complicating diagnosis for healthcare professionals. To address this inconsistent classification between video frames, this dissertation introduces the Adapify algorithm. By leveraging a main model and an auxiliary model to analyze the video content separately, the algorithm performs a weighted summation of their outputs and then adjusts the final classification results. This approach rectifies errors occurring between consecutive frames, ensuring stable and reliable classification outcomes for practitioners.
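The weighted-summation idea can be illustrated with a short sketch. The fusion weight `alpha`, the temporal averaging window, and the array shapes are all assumptions for illustration; the abstract specifies only that two models' outputs are combined by weighted summation and the final results adjusted for frame-to-frame consistency.

```python
import numpy as np

def adapify(main_probs, aux_probs, alpha=0.7, window=3):
    """Hypothetical sketch: fuse per-frame class probabilities of
    shape (T, C) from a main and an auxiliary model by weighted
    summation, then average over a temporal window so that a single
    flickered frame cannot flip the predicted class."""
    fused = alpha * main_probs + (1 - alpha) * aux_probs  # (T, C)
    smoothed = np.empty_like(fused)
    T = fused.shape[0]
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        smoothed[t] = fused[lo:hi].mean(axis=0)  # local average
    return smoothed.argmax(axis=1)  # per-frame class labels
```

In this toy setting, one misclassified frame in an otherwise uniform sequence is outvoted by its neighbors, which is the kind of inter-frame consistency the abstract describes.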

In addition, with recent large language models such as ChatGPT and LLaMA becoming increasingly popular, the field of computer vision is intensifying its efforts to develop a vision foundation model that holds a position in the visual domain as prominent as theirs in the language domain. To this end, Meta recently proposed a new vision foundation model called the Segment Anything Model (SAM), which can segment objects and scenes in real-world images. Because this foundation model is trained on an extremely large dataset, it possesses strong zero-shot generalization capabilities. I therefore aim to transfer the power of this foundation model to ultrasound medical images, enabling it to achieve segmentation capabilities similar to those it shows on natural images while significantly reducing its computational complexity, making it easy to deploy in settings with limited hardware resources, including edge devices.
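One common route to the stated goal of transferring a large model's capability while cutting its computational cost is knowledge distillation: a small student encoder is trained to reproduce the embeddings of a large frozen teacher (such as SAM's image encoder). Whether the dissertation uses distillation specifically is an assumption; the linear student and all names below are purely illustrative of the mechanism.

```python
import numpy as np

def distill_step(W, x, teacher_embed, lr=0.1):
    """Hypothetical sketch of one distillation step: a linear student
    W (d_out, d_in) is nudged toward the teacher's embedding for
    input x (d_in,) by gradient descent on the MSE between them."""
    pred = W @ x                         # student embedding
    err = pred - teacher_embed           # deviation from teacher
    loss = float(np.mean(err ** 2))      # distillation (MSE) loss
    grad = np.outer(err, x) * (2.0 / err.size)  # dL/dW
    return W - lr * grad, loss
```

Repeating such steps drives the student's embeddings toward the teacher's, after which only the small student needs to run on the edge device.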