Artificial Intelligence and Neural Networks, Notes 7

AI-NN Lecture Notes, Chapter 8: Feed-forward Networks

§8.1 Introduction to Classification

The Classification Model

X = [x1 x2 ... xn]^t is the input pattern of the classifier, and i0(X) is the decision function. The response of the classifier is one of the class numbers 1, 2, ..., R.

[Figure: block diagram of a classifier mapping the input pattern X = (x1, x2, ..., xn) to a class number i0(X) in {1, 2, ..., R}]

Geometric Explanation of Classification

A pattern is an n-dimensional vector. All n-dimensional patterns constitute an n-dimensional Euclidean space E^n, called the pattern space. If all patterns can be divided into R classes, then the region of the space containing only patterns of the r-th class is called the r-th region, r = 1, ..., R. Regions are separated from each other by decision surfaces. A pattern classifier maps sets of patterns in E^n into one of the regions, denoted by the numbers i0 = 1, 2, ..., R.

Classifiers That Use Discriminant Functions

Membership in a class is determined by comparing R discriminant functions g_i(X), i = 1, ..., R, computed for the input pattern under consideration. The g_i(X) are scalar values, and the pattern X belongs to the i-th class iff g_i(X) > g_j(X) for all j ≠ i, i, j = 1, ..., R. The decision surface equation is g_i(X) - g_j(X) = 0. Assuming that the discriminant functions are known, the block diagram of a basic pattern classifier is shown below:

[Figure: R discriminators computing g_1(X), ..., g_R(X) in parallel, followed by a maximum selector that outputs the class number i0]

For a given pattern, the i-th discriminator computes the value of the function g_i(X), called briefly the discriminant.
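The notes do not fix the form of the discriminants, so as a minimal sketch assume linear discriminants g_i(X) = W_i·X + b_i; the weights and the three-class setup below are hypothetical:

```python
import numpy as np

def classify(X, W, b):
    """Discriminant-based classifier: the i-th discriminator computes
    g_i(X) = W[i]·X + b[i]; the maximum selector picks the largest one."""
    g = W @ X + b                   # vector of discriminants g_1(X)..g_R(X)
    return int(np.argmax(g)) + 1    # class numbers run 1..R

# Hypothetical R = 3 classifier over a 2-dimensional pattern space
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.5])
print(classify(np.array([2.0, -1.0]), W, b))   # prints 1
```

The maximum selector itself is simply an argmax over the R discriminant values.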

When R = 2, the classifier, called a dichotomizer, simplifies as below:

[Figure: a single discriminator computing g(X), followed by a threshold logic unit (TLU) with outputs +1 and -1]

Its discriminant function is g(X) = g1(X) - g2(X). If g(X) > 0, then X belongs to Class 1; if g(X) < 0, then X belongs to Class 2.

The following figure is an example where six patterns belong to one of the two classes and the decision surface is a straight line:

[Figure: the patterns (0,0), (-1/2,-1), (-1,-2) lie on the side g(X) > 0 and the patterns (2,0), (3/2,-1), (1,-2) on the side g(X) < 0 of the decision surface g(X) = 0, where g(X) = -2x1 + x2 + 2]

An infinite number of discriminant functions may exist for the same classification.
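The example can be checked directly in code; the six patterns and the discriminant g(X) = -2x1 + x2 + 2 are taken from the figure above:

```python
def dichotomize(x1, x2):
    """Dichotomizer from the example: g(X) = -2*x1 + x2 + 2.
    Class 1 if g(X) > 0, Class 2 if g(X) < 0."""
    g = -2.0 * x1 + x2 + 2.0
    return 1 if g > 0 else 2

patterns = [(0, 0), (-0.5, -1), (-1, -2),   # expected Class 1
            (2, 0), (1.5, -1), (1, -2)]     # expected Class 2
for p in patterns:
    print(p, "-> Class", dichotomize(*p))
```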

Training and Classification

Consider neural network classifiers that derive their weights during the learning cycle. The sample patterns, called the training sequence, are presented to the machine along with the correct response provided by the teacher. After each incorrect response, the classifier modifies its parameters by means of iterative, supervised learning, based on comparing the targeted correct response with the actual response.

[Figure: an adaptive TLU classifier; the teacher's desired response d = +1 or -1 is compared with the actual output o, and the error d - o drives the weight adjustment]

§8.2 Single-Layer Perceptron

1. Linear Threshold Unit and Separability

X = (x1, ..., xN)^t with x_n ∈ R (or {1, -1}), W = (w1, ..., wN)^t, threshold T ∈ R:

y = sgn(W^t X - T) ∈ {1, -1}

Let X' = (X; -1) and W' = (W; T), i.e. append x_{N+1} = -1 and w_{N+1} = T. Then we have

y = sgn(W'^t X') = sgn(Σ_{n=1}^{N+1} w_n x_n)

Linearly Separable Patterns

Assume that a pattern set is divided into subsets X_1, X_2, ..., X_N. If a linear machine can classify the patterns from X_i as belonging to class i, for i = 1, ..., N, then the pattern sets are linearly separable.

When R = 2: if for a given function f: X → {-1, 1} there exist W and T such that f(X) = sgn(W^t X - T) for every pattern X, then the function f is said to be linearly separable.
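A small sketch of the augmented form above, showing that appending x_{N+1} = -1 and w_{N+1} = T leaves the output unchanged (the sample values are arbitrary):

```python
import numpy as np

def tlu(X, W, T):
    """Linear threshold unit: y = sgn(W·X - T), with outputs in {-1, +1}."""
    return 1 if W @ X - T > 0 else -1

def tlu_augmented(X, W, T):
    """Equivalent augmented form: x_{N+1} = -1 and w_{N+1} = T turn the
    threshold into an ordinary weight, since W'·X' = W·X - T."""
    Xp = np.append(X, -1.0)
    Wp = np.append(W, T)
    return 1 if Wp @ Xp > 0 else -1

X = np.array([1.0, -1.0]); W = np.array([-1.0, -1.0]); T = -1.5
assert tlu(X, W, T) == tlu_augmented(X, W, T)
```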

Example 1: NAND is a linearly separable function. Given X and y such that

x1   x2    y
-1   -1    1
-1    1    1
 1   -1    1
 1    1   -1

it can be found that W = (-1, -1)^t and T = -3/2 is a solution, i.e. y = sgn(W^t X - T) = sgn(-x1 - x2 + 3/2):

x1   x2    T      w1·x1 + w2·x2 - T        y
-1   -1   -3/2    1 + 1 - (-3/2) = 7/2     1
-1    1   -3/2    1 - 1 - (-3/2) = 3/2     1
 1   -1   -3/2   -1 + 1 - (-3/2) = 3/2     1
 1    1   -3/2   -1 - 1 - (-3/2) = -1/2   -1

[Figure: the decision line x1 + x2 = 3/2 in the (x1, x2)-plane; y = sgn(-x1 - x2 + 3/2) is +1 on the side containing the origin and -1 on the other side]

Example 2: XOR is not a linearly separable function. Given that

x1   x2    y
-1   -1   -1
-1    1    1
 1   -1    1
 1    1   -1

it is impossible to find W and T satisfying y = sgn(W^t X - T). The four rows would require

(-1)w1 + (-1)w2 < T
(-1)w1 + (+1)w2 > T
(+1)w1 + (-1)w2 > T
(+1)w1 + (+1)w2 < T

Adding the first and fourth inequalities gives 0 < 2T, while adding the second and third gives 0 > 2T, a contradiction; hence no such W and T exist.

It is seen that linearly separable binary patterns can be classified by a suitably designed neuron, but linearly non-separable binary patterns cannot.
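The verification table of Example 1 can be reproduced directly:

```python
import numpy as np

# Verify the NAND solution from Example 1: y = sgn(-x1 - x2 + 3/2)
W = np.array([-1.0, -1.0])
T = -1.5

for x1, x2, y in [(-1, -1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]:
    net = W @ np.array([x1, x2]) - T        # w1*x1 + w2*x2 - T
    out = 1 if net > 0 else -1
    print(f"x=({x1:2d},{x2:2d})  net={net:4.1f}  y={out:2d}")
    assert out == y                          # matches the truth table
```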

2. Perceptron Learning

Given the training set {X(t), d(t)}, t = 0, 1, 2, ..., where X(t) = (x1(t), ..., xN(t), -1)^t, let w_{N+1} = T and x_{N+1} = -1, so that

y = sgn(Σ_{n=1}^{N+1} w_n x_n) = +1 if Σ_{n=1}^{N+1} w_n x_n > 0, and -1 otherwise.

(1) Set w_n(0) to small random values, n = 1, ..., N+1.
(2) Input a sample X(t) = (x1(t), ..., xN(t), -1) and its desired output d(t).
(3) Compute the actual output y(t).
(4) Revise the weights: w_n(t+1) = w_n(t) + α[d(t) - y(t)]x_n(t).
(5) Return to (2) until w_n(t+1) = w_n(t) for all n.

Here 0 < α ≤ 1 is a learning-rate coefficient.
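A minimal sketch of steps (1) to (5), trained here on the NAND patterns of Example 1; the choice α = 0.5, the random seed, and the initialization range are illustrative, since the notes prescribe only the steps themselves:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = [(np.array([x1, x2, -1.0]), d)    # augmented input (x1, x2, -1)
           for x1, x2, d in [(-1, -1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]]

w = rng.uniform(-0.1, 0.1, size=3)          # (1) small random weights, w[2] = T
alpha = 0.5                                  # 0 < alpha <= 1

changed = True
while changed:                               # (5) stop when no weight changed
    changed = False
    for X, d in samples:                     # (2) present a sample and d(t)
        y = 1 if w @ X > 0 else -1           # (3) actual output y(t)
        if y != d:
            w = w + alpha * (d - y) * X      # (4) w(t+1) = w(t) + a[d - y]x(t)
            changed = True

print("weights (w1, w2, T):", w)
```

Because NAND is linearly separable, the loop terminates, as the theorem below guarantees.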
Theorem: The perceptron learning algorithm will converge if the function is linearly separable.

Gradient Descent Algorithm

The perceptron learning algorithm is restricted to the linearly separable cases (hard-limiting activation function). The gradient descent algorithm can be applied in more general cases; the only requirement is that the activation function f be differentiable.

Given the training set (x_n, y_n), n = 1, ..., N, try to find W* such that ŷ_n = f(W*·x_n) ≈ y_n for all n. For a weight vector W, let

E = Σ_{n=1}^{N} E_n = (1/2) Σ_{n=1}^{N} (y_n - ŷ_n)² = (1/2) Σ_{n=1}^{N} (y_n - f(W·x_n))²

be the error measure of learning. To minimize E, take the gradient with respect to each weight W_m:

∂E/∂W_m = (1/2) Σ_{n=1}^{N} ∂(y_n - f(W·x_n))²/∂W_m
        = -Σ_{n=1}^{N} (y_n - f(W·x_n)) · ∂f(W·x_n)/∂W_m
        = -Σ_{n=1}^{N} (y_n - f(W·x_n)) · f'(W·x_n) · x_nm

The learning (adjusting) rule is thus as follows:

W_m ← W_m - η·∂E/∂W_m = W_m + η Σ_{n=1}^{N} (y_n - ŷ_n) f'(W·x_n) x_nm,   η > 0
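A sketch of this rule in batch form, assuming f = tanh (so f'(v) = 1 - f(v)²) as the differentiable activation; the training data (NAND again, in augmented form), η = 0.1, and the epoch count are illustrative choices:

```python
import numpy as np

def train_gd(X, y, eta=0.1, epochs=200):
    """Batch gradient descent for a single unit y_hat = f(W·x), f = tanh.
    Update: W_m <- W_m + eta * sum_n (y_n - y_hat_n) * f'(W·x_n) * x_nm."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        net = X @ W                             # W·x_n for every sample
        y_hat = np.tanh(net)                    # f(W·x_n)
        fprime = 1.0 - y_hat**2                 # f'(W·x_n) for tanh
        W += eta * ((y - y_hat) * fprime) @ X   # batch update over all n
    return W

# NAND patterns in augmented form (x1, x2, -1); targets in {-1, +1}
X = np.array([[-1, -1, -1], [-1, 1, -1], [1, -1, -1], [1, 1, -1]], float)
y = np.array([1, 1, 1, -1], float)
W = train_gd(X, y)
print(np.sign(np.tanh(X @ W)))                  # -> [ 1.  1.  1. -1.]
```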

§8.3 Multi-Layer Perceptron

1. Why Multi-Layer?

XOR, which as was seen cannot be implemented by a 1-layer network, can be implemented by a 2-layer network:

[Figure: a 2-layer network; a hidden unit h with weights (1, 1) and threshold 1.5 on the inputs x1, x2, and an output unit with weights (1, 1, -2) on (x1, x2, h) and threshold 0.5]

h = sgn(1·x1 + 1·x2 - 1.5)
y = sgn(1·x1 + 1·x2 - 2·h - 0.5)

x1   x2    h    y
 1    1    1   -1
 1   -1   -1    1
-1    1   -1    1
-1   -1   -1   -1
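The network and its truth table can be verified directly:

```python
def sgn(v):
    # sign function with outputs in {-1, +1}
    return 1 if v > 0 else -1

def xor_net(x1, x2):
    """Two-layer XOR network from the notes:
    hidden unit h = sgn(x1 + x2 - 1.5), output y = sgn(x1 + x2 - 2h - 0.5)."""
    h = sgn(1*x1 + 1*x2 - 1.5)
    return sgn(1*x1 + 1*x2 - 2*h - 0.5)

for x1, x2, y in [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)]:
    assert xor_net(x1, x2) == y
print("two-layer network reproduces XOR")
```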
Single-layer networks have no hidden units and thus no ability to form internal representations: they can only map similar input patterns to similar output patterns. If there is a layer of hidden units, there is always an internal representation of the input patterns that may support any mapping from the input units to the output units. This figure shows the internal representation abilities.

[Figure: internal representation abilities]

Network Structure

Types of the Classified ...
