Motivation: Genomic region sets summarize functional genomics data and define locations of interest in the genome such as regulatory regions or transcription factor binding sites. The number of publicly available region sets has increased dramatically, leading to challenges in data analysis. Results: We propose a new method to represent genomic region sets as vectors, or embeddings, using an adapted word2vec approach. We compared our approach to two simpler methods based on interval unions or term frequency-inverse document frequency and evaluated the methods in three ways: first, by classifying the cell line, antibody or tissue type of the region set; second, by assessing whether similarity among embeddings can reflect simulated random perturbations of genomic regions; and third, by testing robustness of the proposed representations to different signal thresholds for calling peaks. Our word2vec-based region set embeddings reduce dimensionality from more than a hundred thousand to 100 wi...
Existing simulations designed for cultural and interpersonal skill training rely on pre-defined responses with a menu option selection interface. Using a multiple-choice interface and restricting trainees' responses may limit the trainees' ability to apply the lessons in real-life situations. These systems also use a simplistic evaluation model, where trainees' selected options are marked as either correct or incorrect. This model may not capture sufficient information to drive an adaptive feedback mechanism that improves trainees' cultural awareness. This paper describes the design of a dialogue-based simulation for cultural awareness training. The simulation is built around a disaster management scenario involving a joint coalition between the US and Chinese armies. Trainees are able to engage in realistic dialogue with the Chinese agent. Their responses are evaluated at different points by different multi-label classification models. Based on training on...
Objective: Develop a deep learning-based methodology using the foundations of systems pathology to generate highly accurate predictive tools for complex gastrointestinal diseases, using celiac disease (CD) as a prototype. Design: To predict the severity of CD, defined by Marsh-Oberhuber classification, we used deep learning to develop a model based on histopathologic features. Results: The study was based on a pediatric cohort of 124 patients identified with different classes of CD severity. The model predicted CD with an overall accuracy of 88.7%, with the highest accuracy for Marsh IIIc (91.0%; 95% sensitivity; 91% specificity). The model identified EECs as a defining feature of children with Marsh IIIc CD and endocrinopathies, which was confirmed using immunohistochemistry. Conclusion: This deep learning image analysis platform has broad applications in disease treatment, management, and prognostication and paves the way for precision medicine.
AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science, 2019
Environmental Enteropathy (EE) and celiac disease (CD) are gastrointestinal conditions that adversely impact the growth of children. EE is prevalent in low- and middle-income countries, whereas CD is prevalent worldwide. The histologic appearance of duodenal EE biopsies significantly overlaps with celiac enteropathy. We propose a convolutional neural network (ConvNet) to classify EE cases from Pakistani infants along with celiac and healthy controls from the United States. We also identified areas of biopsies that generate high activation values in the ConvNet model. The identified features helped in distinguishing EE and celiac from healthy intestinal tissues. This work advances the understanding of both diseases and provides a potential screening and diagnostic tool for practitioners.
Correlating elements of large databases that are related but not exact matches is important in a variety of applications. In health care, epidemiologists have an interest in searching records for patterns of disease. In law enforcement, this correlation task enables crime analysts to associate incidents possibly resulting from the same individual or group of individuals. In practice, most analysts perform this task manually by searching through records looking for similarities. Manual search does not imply paper records, but rather the construction of search criteria that narrow the search without overly restricting it. The paper describes automated approaches to record or report correlation. Each of the automated approaches employs a weighted sum of attributes as the total similarity measure (TSM) between any elements in the database. All the approaches build the TSM using prior information provided by experienced analysts. We compare the methods using real data from a law enfor...
Papers by Donald Brown