Comparison of data differences between hg19 (GRCh37) and hg38 (GRCh38)



hg19 (GRCh37) vs. hg38 (GRCh38)
Human Genome Reference Comparison
Zuotian Tatum
Department of Human Genetics
Leiden University Medical Center

Timeline
GRCh37:
- First release: Feb 27, 2009
- Latest patch: Jun 28, 2013 (p13)
GRCh38:
- First release: Dec 24, 2013
- Latest patch: Oct 14, 2014 (p1)

GRCh37:
- Total bases: 3.23 Billion, 2.99 Billion (without N)
- N50: 46 Million
- Number of alternative loci: 9
- Non-nuclear genome: No
GRCh38.p2:
- Total bases: 3.21 Billion, 3.05 Billion (without N)
- N50: 67 Million
- Number of alternative loci: 261
- Non-nuclear genome: Yes
(Nov 2014)

New in GRCh38 release
Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
The analysis set files are created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz
All the top-level objects in the full assembly:
- Chromosomes
- unlocalized scaffolds
- unplaced scaffolds
- alternate locus scaffolds
- mitochondrial genome
The sequence identifiers are International Nucleotide Sequence Database Collaboration (INSDC) accession.versions and the definition lines are GenBank style. No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- Chromosomes from the GRCh38 Primary Assembly unit.
  Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence but it is not identical to the GenBank sequence. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 Primary Assembly unit.
- Unplaced scaffolds from the GRCh38 Primary Assembly unit.
- Epstein-Barr virus (EBV) sequence.
  Note: The EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz + alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNASeq quantification
[Figure: Ideogram of GRCh38.p2]

RNASeq quantification
- Fragments (reads) per million per kilobase (FPKM/RPKM) values to quantify gene expression
- Unique mapping only
  Analysis tools do not distinguish allelic duplication from paralogous duplication
- Non-overlapping gene regions

To understand the effect of alt-loci on RNASeq quantification
Compare alignment of the chromosome 6 MHC region between
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci
Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNASeq

                       hg19          hg38
mapped                 14,655,299    14,704,427
mappedDiffChr               4,959         4,017
mappedPairProper       14,639,261    14,690,090
mappedPairProperPct         92.62         92.94
total                  15,805,561    15,805,561
totalSplice             5,060,829     5,078,133
unmapped                1,150,262     1,101,134

hg19: with alt loci
hg38: without alt loci

Effect of alt loci in RNASeq alignments
[Figure: gene RPKM (hg19) plotted against gene RPKM (hg38); both axes run from 0.01 to 10000]

Major Histocompatibility complex region on chromosome 6
[Figure: HLA-A, sample D1, on hg19 full set - chr6, chr6_mann_hap4, chr6_qbl_hap6 and chr6_dbb_hap3]
[Figure: HLA-A, samples D1, D2, D3, on hg19 full set - chr6 and on hg38 analysis set]
[Figure: HLA-C, samples D1, D2, D3, on hg19 full set and on hg38 analysis set]
[Figure: HLA-DRA, samples D1, D2, D3, on hg19 full set and on hg38 analysis set]

Major Histocompatibility complex region on chromosome 6
Class III
MHC Class III:
- 700 kb stretch, 60 genes.
- The most gene-dense region of the human genome
- 14% coding
- ~72% transcribed
- Highly conserved
- Only a few have clearly defined and proven function
[Figure: TNF, samples D1.control and D1.treated, on hg19 full set - chr6 and on hg38 analysis set - chr6]

Highly variant immune regions retiled
LILRA3 moved to alt-loci in hg38
[Figure: hg19 shows LILRB2, LILRA3, LILRA5; hg38 shows LILRB2, LILRA5]

Phantom LILRA3
[Figure: LILRA3 in hg19; intergenic; LILRB3, LILRA4, LILRB5]

Need more comprehensive approach to genome variation.
- Assembly model is neither haploid nor diploid
- Analysis tools
  - penalize reads mapping to more than 1 location
  - do not distinguish allelic duplication from paralogous duplication
- A graph structure is a natural way to represent a population-based genome assembly

Conclusions
- RPKM values are highly correlated between hg19 and hg38.
- Analysis set is preferred for expression analysis.
- Additional analysis may be performed to use the alt-loci separately.
- Annotations for hg38 are still lacking and need contribution from the community.
- Improve modeling of genome variability in population.

Questions?
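
The RPKM measure named in the RNASeq quantification slide (reads per million per kilobase) can be illustrated with a minimal Python sketch. The slides do not say which read total or gene length definition was used, so the block assumes the common convention of dividing by mapped reads and by gene length in base pairs. The function names and the example gene (500 reads, 2,000 bp) are hypothetical. The second helper only recomputes the mappedPairProperPct row from the mapping table as a sanity check.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads.

    Assumed convention: RPKM = reads * 1e9 / (gene length in bp * mapped reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and mapped read total must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def proper_pair_pct(properly_paired, total_reads):
    """Percentage of all reads that are mapped in a proper pair."""
    return 100.0 * properly_paired / total_reads


# Values taken from the mapping/alignment table in the slides.
MAPPING = {
    "hg19": {"mapped": 14_655_299, "mappedPairProper": 14_639_261, "total": 15_805_561},
    "hg38": {"mapped": 14_704_427, "mappedPairProper": 14_690_090, "total": 15_805_561},
}

if __name__ == "__main__":
    # Hypothetical gene: 500 reads over a 2,000 bp gene, same count on both builds.
    for build, stats in MAPPING.items():
        value = rpkm(500, 2_000, stats["mapped"])
        pct = proper_pair_pct(stats["mappedPairProper"], stats["total"])
        print(f"{build}: RPKM={value:.3f}, proper pairs={pct:.2f}%")
```

Because the mapped-read total sits in the denominator, the same read count for a gene yields a slightly different RPKM on each build. This is one reason to compare the two references on a log-scaled gene-by-gene plot instead of expecting identical values.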
