GEN2VCF: a converter for human genome imputation output format to VCF format

Genes & Genomics, 2020, Vol. 42, No. 10, pp. 1163-1168
Shin Dong-Mun, Hwang Mi-Yeong, Kim Bong-Jo, Ryu Keun-Ho, Kim Young-Jin
Affiliations
Shin Dong-Mun - Chungbuk National University, College of Electrical and Computer Engineering, Department of Computer Science
Hwang Mi-Yeong - Osong Health Technology Administration Complex, National Institute of Health, Center for Genome Science
Kim Bong-Jo - Osong Health Technology Administration Complex, National Institute of Health, Center for Genome Science
Ryu Keun-Ho - Chungbuk National University, College of Electrical and Computer Engineering, Department of Computer Science
Kim Young-Jin - Osong Health Technology Administration Complex, National Institute of Health, Center for Genome Science

Abstract


Background: In human genome-wide association studies, genotype imputation is an essential tool for improving association mapping power. When IMPUTE software is used for imputation, its output (GEN format) must be converted to variant call format (VCF) with imputed genotype dosage before association analysis. However, this conversion requires a pipeline of multiple software packages and a large amount of processing time.

Objective: We developed GEN2VCF, a fast and convenient GEN format to VCF conversion tool with dosage support.

Methods: The performance of GEN2VCF was compared with that of BCFtools, QCTOOL, and Oncofunco. The test data set was a 1 Mb GEN-formatted file of 5000 samples. To assess performance across sample sizes, tests were run with 1000 to 5000 samples in steps of 1000. Runtime and memory usage were used as performance measures.

Results: GEN2VCF showed substantially better performance in both runtime and memory usage. Its runtime and memory usage were at least 1.4- and 7.4-fold lower, respectively, than those of the other methods.

Conclusions: GEN2VCF provides users with efficient conversion from GEN format to VCF with the best-guessed genotype, genotype posterior probabilities, and genotype dosage, as well as great flexibility in implementation with other software packages in a pipeline.
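As an illustration of what such a conversion involves, the following is a minimal Python sketch, not GEN2VCF's actual implementation. It assumes the common Oxford GEN layout (SNP ID, rsID, position, allele A, allele B, then three genotype probabilities per sample for AA, AB, and BB), treats allele A as REF and allele B as ALT, and computes dosage as the expected count of allele B. The function name `gen_line_to_vcf`, the `chrom` argument, and the `call_threshold` value of 0.9 are hypothetical choices made for this example; the `GT:GP:DS` layout follows a commonly used convention.

```python
def gen_line_to_vcf(line: str, chrom: str, call_threshold: float = 0.9) -> str:
    """Convert one GEN record into one VCF data line (GT:GP:DS per sample).

    Assumptions (not taken from the paper): five leading columns, allele A is
    REF, allele B is ALT, and a hypothetical threshold for calling genotypes.
    """
    fields = line.split()
    _snp_id, rsid, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three genotype probabilities per sample")

    genotypes = ("0/0", "0/1", "1/1")
    samples = []
    for i in range(0, len(probs), 3):
        trio = probs[i:i + 3]              # P(AA), P(AB), P(BB)
        dosage = trio[1] + 2.0 * trio[2]   # expected number of B alleles
        best = max(range(3), key=trio.__getitem__)
        # Best-guess genotype; missing if no probability reaches the threshold.
        gt = genotypes[best] if trio[best] >= call_threshold else "./."
        gp = ",".join(f"{p:.3f}" for p in trio)
        samples.append(f"{gt}:{gp}:{dosage:.3f}")

    fixed = [chrom, pos, rsid, allele_a, allele_b, ".", ".", ".", "GT:GP:DS"]
    return "\t".join(fixed + samples)


if __name__ == "__main__":
    # Two samples: one confidently AA, one uncertain.
    print(gen_line_to_vcf("--- rs1 1000 A G 0.95 0.05 0 0.1 0.2 0.7", chrom="1"))
```

The sketch emits only the data line; a complete VCF file would also need the header lines that declare the `GT`, `GP`, and `DS` fields and list the sample names.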

Keywords

Human genome; Imputation; SNP; Converter; Parsing
