Classification of Common Relationships Based on Short Tandem Repeat Profiles Using Data Mining

Korean Journal of Legal Medicine, 2019, Vol. 43, No. 3, pp. 97-105
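정수진 (Jeong Su-Jin) - Department of Statistics, Korea University
이효정 (Lee Hyo-Jung) - Development Division, Dong-A ST
이숭덕 (Lee Soong-Deok) - Department of Forensic Medicine, Seoul National University College of Medicine
이승환 (Lee Seung-Hwan) - Forensic Science Division II, Supreme Prosecutors' Office
박수정 (Park Su-Jeong) - Forensic Science Division II, Supreme Prosecutors' Office
김종식 (Kim Jong-Sik) - Forensic Science Division II, Supreme Prosecutors' Office
이재원 (Lee Jae-Won) - Department of Statistics, Korea University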


We reviewed past studies on the identification of familial relationships using 22 short tandem repeat (STR) markers. The review shows that high discrimination power and a relatively accurate cut-off value can be obtained for parent-child and full-sibling relationships. For uncle-nephew and cousin pairs, however, the likelihood ratio (LR) method has low discrimination power. We therefore compare the LR ranking method with data mining techniques that can be applied to identify familial relationships (logistic regression, linear discriminant analysis, diagonal linear discriminant analysis, diagonal quadratic discriminant analysis, K-nearest neighbor, classification and regression trees, support vector machines, random forest [RF], and penalized multivariate analysis), and we provide a guideline for choosing the most appropriate model in a given situation. RF was found to be more accurate than the other methods, with an accuracy of 99.99% for parent-child, 99.44% for full siblings, 90.34% for uncle-nephew, and 79.69% for first cousins.
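
The abstract does not spell out how the LR is computed, so the following is only a minimal sketch of the textbook form: per locus, the probability of the second genotype under the claimed relationship (weighted by the identity-by-descent coefficients k0, k1, k2) divided by its probability under unrelatedness, multiplied across loci. The function names, the allele labels, and the allele frequencies are hypothetical. The sketch also assumes independent markers and no mutation, which may differ from the authors' procedure.

```python
from math import prod

# Standard IBD coefficients (k0, k1, k2): probability that a pair shares
# 0, 1, or 2 alleles identical by descent at a locus.
IBD = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "uncle-nephew": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
}

def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered genotype (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]

def prob_given_one_ibd(g1, g2, freq):
    """P(g2 | g1, exactly one allele shared identical by descent)."""
    c, d = g2
    total = 0.0
    for s in g1:  # each allele of g1 is the shared one with probability 1/2
        if c == d:
            p = freq[c] if s == c else 0.0
        else:
            p = (freq[d] if s == c else 0.0) + (freq[c] if s == d else 0.0)
        total += 0.5 * p
    return total

def locus_lr(g1, g2, freq, relationship):
    """LR at one locus: claimed relationship versus unrelated."""
    k0, k1, k2 = IBD[relationship]
    p_unrelated = genotype_prob(g2, freq)
    identical = 1.0 if sorted(g1) == sorted(g2) else 0.0
    p_related = (k0 * p_unrelated
                 + k1 * prob_given_one_ibd(g1, g2, freq)
                 + k2 * identical)
    return p_related / p_unrelated

def profile_lr(profile1, profile2, freqs, relationship):
    """Product of per-locus LRs (assumes independent loci)."""
    return prod(
        locus_lr(profile1[m], profile2[m], freqs[m], relationship)
        for m in profile1
    )

if __name__ == "__main__":
    # Hypothetical two-locus example; real profiles would use 22 STR markers.
    freqs = {
        "locus1": {"12": 0.1, "13": 0.2, "14": 0.7},
        "locus2": {"8": 0.3, "9": 0.7},
    }
    person1 = {"locus1": ("12", "13"), "locus2": ("8", "9")}
    person2 = {"locus1": ("12", "14"), "locus2": ("9", "9")}
    for rel in IBD:
        print(rel, profile_lr(person1, person2, freqs, rel))
```

Because k0 is large for uncle-nephew and first-cousin pairs, their per-locus LRs stay close to 1, which is consistent with the low discrimination power reported above for those relationships.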


Short tandem repeats; Kinship testing; Relationships; Likelihood ratio; Data mining