Special Academic Lectures
Peking University Bioinformatics Seminar, November 18: Yu Sheng
Published: 2016-11-15

 CBI seminar


Title: Automated feature extraction and selection for high-throughput phenotyping
Speaker: Dr. Yu Sheng, Assistant Professor of Statistics, Center for Statistical Science, Tsinghua University
Time: 14:00-15:00, Friday, November 18, 2016
Location: Room 311, Wang Ke-Zhen Building, Peking University
Abstract:
With the rapid adoption of electronic medical records (EMR), medicine and healthcare have become one of the most important fields for big data applications. One important application in medical research is EMR-based phenotyping, which uses machine learning algorithms to identify patients with certain phenotypes. The conventional procedure for designing a phenotyping algorithm requires medical experts to work with statisticians and medical informaticians to decide which variables to use and which medical terms to search for, and a single algorithm typically takes months to finalize. We propose a data-driven method that automates the algorithm design process and can achieve even higher accuracy than expert-designed algorithms. We use publicly available knowledge sources, such as Wikipedia, to collect an initial set of candidate features. Billing codes and the natural language variable of the target phenotype are used to create surrogates of the gold-standard labels, and penalized logistic regression models are trained repeatedly with bootstrap resampling to predict the surrogates in order to evaluate the informativeness of the candidate features. Only a succinct set of highly informative features passes the data-driven screening and enters the final model, which predicts the true gold-standard labels. This method has been implemented in the development of large-scale biobanks at top-ranked hospitals in the U.S.
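The screening step described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the synthetic data, the L1 penalty fitted by proximal gradient descent, the function names (`fit_l1_logistic`, `screen_features`), and the rule of keeping features selected in at least half of the bootstrap fits are all assumptions made for illustration. Only the overall idea (repeatedly fit a penalized logistic regression to surrogate labels on bootstrap resamples and keep the features that are consistently informative) comes from the abstract.

```python
import math
import random


def soft_threshold(value, amount):
    """Shrink value toward zero by amount (proximal step for an L1 penalty)."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.5, iters=100):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w


def screen_features(X, surrogate, n_boot=10, keep_frac=0.5, seed=0):
    """Keep features with a nonzero coefficient in >= keep_frac of bootstrap fits.

    The selection-frequency rule is an assumed criterion, not the one from the talk.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        w = fit_l1_logistic([X[i] for i in idx], [surrogate[i] for i in idx])
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    freq = [c / n_boot for c in counts]
    return [j for j in range(p) if freq[j] >= keep_frac], freq


# Synthetic stand-in for EMR data: 6 candidate features, of which only
# features 0 and 1 actually drive the (noisy) surrogate label.
rng = random.Random(1)
n, p = 300, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
surrogate = [
    1 if rng.random() < 1 / (1 + math.exp(-(2 * x[0] - 2 * x[1]))) else 0
    for x in X
]

selected, freq = screen_features(X, surrogate)
print("selected features:", selected)
print("selection frequency:", freq)
```

In a real pipeline the surviving features would then enter a final model trained on a small set of gold-standard labels; that step is omitted here.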
Speaker Bio:
Dr. Yu Sheng is an Assistant Professor of Statistics in the Center for Statistical Science of Tsinghua University. Dr. Yu received his BS and MA degrees in statistics from Nankai University and the University of Michigan, respectively, and his PhD degree in systems engineering (operations research) from the George Washington University. He began his research in medical informatics during his research work at Harvard University. His current research interests include deep understanding of medical language with machine learning methods, internet and data-driven knowledge extraction, and supervised and unsupervised EMR analysis.
All are welcome!



