Decomposition Methodology for Classification Tasks - A Meta-Decomposer Framework
Lior Rokach
Department of Information Systems Engineering, Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva 84105, Israel. liorrk@bgu.ac.il
Abstract The idea of decomposition methodology for classification tasks is to break down a complex classification task into several simpler and more manageable sub-tasks that are solvable by using existing induction methods, then joining their solutions together in order to solve the original problem. In this paper we provide an overview of very popular but diverse decomposition methods and introduce a related taxonomy to categorize them. Subsequently we suggest using this taxonomy to create a novel meta-decomposer framework to automatically select the appropriate decomposition method for a given problem. The experimental study validates the effectiveness of the proposed meta-decomposer on a set of benchmark datasets.

1 Introduction
One of the explicit challenges in classification tasks is to develop methods that will be feasible for complicated real-world problems. In many disciplines, when a problem becomes more complex, there is a natural tendency to try to break it down into smaller, distinct but connected pieces. The concept of breaking down a system into smaller pieces is generally referred to as decomposition. The purpose of decomposition methodology is to break down a complex problem into smaller, less complex and more manageable sub-problems that are solvable by using existing tools, then joining them together to solve the initial problem. Decomposition methodology can be considered as an effective strategy for changing the representation of a classification problem. Indeed, Kusiak [38] considers decomposition as the “most useful form of transformation of data sets”.
The decomposition approach is frequently used in statistics, operations research and engineering. For instance, decomposition of time series is considered to be a practical way to improve forecasting. The usual decomposition into trend, cycle, seasonal and irregular components was motivated mainly by business analysts, who wanted to get a clearer picture of the state of the economy [18]. Although the operations research community has extensively studied decomposition methods to improve computational efficiency and robustness, identification of the partitioned problem model has largely remained an ad hoc task [26].
In engineering design, problem decomposition has received considerable attention as a means of reducing multidisciplinary design cycle time and of streamlining the design process by adequate arrangement of the tasks [37]. Decomposition methods are also used in decision-making theory; a typical example is the AHP (Analytic Hierarchy Process) method [62]. In artificial intelligence, finding a good decomposition is a major tactic, both for ensuring the transparency of the end-product and for avoiding a combinatorial explosion [45].
Research has shown that no single learning approach is clearly superior for all cases. In fact, the task of discovering regularities can be made easier and less time consuming by decomposition of the task. However, decomposition methodology has not attracted as much attention in the pattern recognition and machine learning community [11].
Although decomposition is a promising technique and presents an obviously natural direction to follow, there are hardly any works in the pattern recognition literature that consider the subject directly. Instead, there are abundant practical attempts to apply decomposition methodology to specific, real-life applications [11]. There are also many discussions on closely related problems, largely in the context of distributed and parallel learning [71] or ensemble classifiers.
Various decomposition methods have been presented [38]. There was also a suggestion to decompose the exploratory data analysis process into three parts: model search, pattern search, and attribute search [7]. However, in this case the notion of “decomposition” refers to the entire process, while this paper focuses on decomposition of the model search.
In the neural network community, several researchers have examined the decomposition methodology [25]. The “mixture-of-experts” (ME) method decomposes the input space, such that each expert examines a different part of the space [46]. However, the sub-spaces have soft “boundaries”, namely sub-spaces are allowed to overlap. Each expert outputs the conditional probability of the target attribute given the input instance. A gating network is responsible for combining the various experts by assigning a weight to each network. These weights are not constant but are functions of the input instance x.
Hierarchical mixtures of experts (HME) is a well-known extension to the basic mixture of experts [32]. This extension decomposes the space into sub-spaces, and then recursively decomposes each sub-space into sub-spaces. Variations of the basic mixture of experts method have been developed to accommodate specific domain problems. A specialized modular network called the Meta-pi network has been used to solve the vowel-speaker problem [24,48]. There have been other extensions to the ME, such as nonlinear gated experts for time-series [70]; a revised modular network for predicting the survival of AIDS patients [47]; and a new approach for combining multiple experts for improving handwritten numerals recognition [53].
Several taxonomies for decomposition methods have been suggested in the literature [38,66]. However, there is no work that considers the coexistence of these different decomposition methods in order to answer practical questions such as: When should we prefer one decomposition method over the other? Is it possible to solve a given problem using a hybridization of several decomposition methods?
In our previous work [57,43], we presented a preliminary taxonomy for decomposition of classification tasks. In this paper, we first extend this taxonomy and subsequently suggest a systematic way such a taxonomy can be used in practice. More specifically, the taxonomy is used as the basis for a new decision-tree-based meta-decomposer which aims to automatically identify the best decomposition method for a given database.

2 Decomposition Advantages
Decomposition methods can improve the predictive accuracy of regular methods. In fact, in some cases improving performance is the main motivation for decomposition [66]. Although this might look surprising at first, it can be explained by the bias-variance tradeoff. Since decomposition methodology constructs several simpler sub-models instead of a single complicated model, we might gain better performance by choosing the appropriate sub-models’ complexities (i.e. finding the best bias-variance tradeoff). For instance, a single decision tree that attempts to model the entire instance space usually has high variance and small bias. On the other hand, Naïve Bayes can be seen as a composite of single-attribute decision trees (each one of these trees contains only one unique input attribute). The bias of Naïve Bayes is large (as it cannot represent a complicated classifier); on the other hand, its variance is small. Decomposition can potentially obtain a set of decision trees, such that each one of the trees is more complicated than a single-attribute tree (thus it can represent a more complicated classifier and it has lower bias than Naïve Bayes) but not complicated enough to have high variance.
There are other justifications for the performance improvement of decomposition methods, such as the ability to exploit the specialized capabilities of each component, and consequently achieve results which would not be possible in a single model. For instance, the identification accuracy of a clinical diagnosis can be improved when the problem is decomposed and two neural networks are trained [3].
One of the explicit challenges of the research community is to develop methods that facilitate the use of pattern recognition algorithms for real-world applications. In the information age, data is automatically collected and therefore the database available for pattern discovery can be quite large, as a result of an increase in the number of records in the database and the number of fields/attributes in each record (high dimensionality). There are many approaches for dealing with huge databases, including: sampling methods; massively parallel processing; efficient storage methods; and dimension reduction. Decomposition methodology suggests an alternative way to deal with the aforementioned problems by reducing the volume of data to be processed at a time. Decomposition methods break the original problem into several sub-problems, each one with relatively small dimensionality. In this way, decomposition reduces training time and makes it possible to apply standard pattern recognition algorithms to large databases [66].

Decomposition methods suggest a conceptual simplification of the original complex problem. Instead of getting a single and complicated model, decomposition methods create several sub-models, which are more comprehensible. This motivation has often been noted in the literature [49,28,66]. Smaller models are also more appropriate for user-driven data mining that is based on visualization techniques. Furthermore, if the decomposition structure is induced by automatic means, it can provide new insights about the explored domain.
Modularity eases the maintenance of the classification model. Since new data is being collected all the time, it is essential once in a while to execute a rebuild process for the entire model. However, if the model is built from several sub-models, and the new data collected affects only part of the sub-models, a simpler re-building process may be sufficient. This justification has often been noted [38].
If there are no dependencies between the various sub-components, then parallel techniques can be applied. By using parallel computation, the time needed to solve a mining problem can be shortened.
Decomposition methodology suggests the ability to use different inducers for individual sub-problems, or even to use the same inducer but with a different setup. For instance, it is possible to use neural networks having different topologies (different numbers of hidden nodes). The researcher can exploit this freedom of choice to boost classifier performance.
The first three advantages are of particular importance in commercial and industrial data mining. However, as will be demonstrated later, not all decomposition methods display the same advantages.
3 The Elementary Decomposition Taxonomy
Finding an optimal or quasi-optimal decomposition for a certain supervised learning problem might be hard or impossible. For that reason the elementary decomposition methodology has been proposed [43]. The basic idea is to develop a meta-algorithm that recursively decomposes a classification problem using elementary decomposition methods. We use the term “elementary decomposition” to describe a type of simple decomposition that can be used to build up a more complicated decomposition. Given a certain problem, we first select the most appropriate elementary decomposition for that problem. A suitable decomposer then decomposes the problem, and finally a similar procedure is performed on each sub-problem. This approach agrees with the “no free lunch theorem”, namely if one decomposition is better than another in some domains, then there are necessarily other domains in which this relationship is reversed.
For implementing this decomposition methodology, one might consider the following issues:
– What types of elementary decomposition methods exist for classification inducers?
– Which elementary decomposition type performs best for which problem? What factors should one take into account when choosing the appropriate decomposition type?
– Given an elementary type, how should we infer the best decomposition structure automatically?
– How should the sub-problems be re-composed to represent the original concept learning?
– How can we utilize prior knowledge for improving decomposition methodology?

Figure 1 suggests an answer to the first issue. This figure illustrates a novel approach for arranging the different elementary types of decomposition in supervised learning.
Fig. 1 Elementary Decomposition Methods in Classification. The taxonomy splits supervised learning decomposition into original concept decomposition (tuple decomposition, subdivided into sample and space decomposition, and attribute decomposition) and intermediate concept decomposition (concept aggregation and function decomposition).
3.1 Intermediate Concept Decomposition
In intermediate concept decomposition, instead of inducing a single complicated classifier, several sub-problems with different and simpler concepts are defined. The intermediate concepts can be based on an aggregation of the original concept’s values (concept aggregation) or not (function decomposition).
Classical concept aggregation replaces the original target attribute with a function, such that the domain of the new target attribute is smaller than the original one.
Concept aggregation has been used to classify free text documents into predefined topics [11]. This paper suggests breaking the topics up into groups (co-topics). Instead of predicting the document’s topic directly, the document is first classified into one of the co-topics. Another model is then used to predict the actual topic within that co-topic.
Error-Correcting Output Coding (ECOC) is a general concept aggregation algorithm which decomposes multi-class problems into multiple two-class problems [15]. A classifier is built for each possible binary partition of the classes. Experiments show that ECOC improves the accuracy of neural networks and decision trees on several multi-class problems from the UCI repository. The idea of decomposing a K-class classification problem into K two-class classification problems was proposed in [2]; each problem considers the discrimination of one class from the other classes. The last method can be extended by manipulating the data based on the class relations among training data [41]. Using this method, a K-class classification problem is divided into a series of K(K−1)/2 two-class problems, where each problem considers the discrimination of one class from one of the other classes. They have examined this idea using neural networks.
Round-robin classification (pairwise classification) is a technique for handling multi-class problems, in which one classifier is constructed for each pair of classes [20]. Empirical studies have shown that this method can potentially improve classification accuracy.
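The two class-decomposition schemes just described can be sketched in a few lines. The following Python fragment is an illustrative sketch (ours, not from the cited works): it enumerates the K one-vs-rest binary sub-tasks and the K(K−1)/2 pairwise sub-tasks, where each pairwise task keeps only the tuples of its two classes.

```python
from itertools import combinations

def one_vs_rest_tasks(data, classes):
    """One binary sub-task per class: class k versus all other classes."""
    tasks = {}
    for k in classes:
        tasks[k] = [(x, 1 if y == k else 0) for x, y in data]
    return tasks

def pairwise_tasks(data, classes):
    """One binary sub-task per unordered class pair (K(K-1)/2 tasks),
    each trained only on the tuples belonging to that pair of classes."""
    tasks = {}
    for a, b in combinations(classes, 2):
        tasks[(a, b)] = [(x, y) for x, y in data if y in (a, b)]
    return tasks

data = [([0], 'A'), ([1], 'B'), ([2], 'C'), ([3], 'A')]
classes = ['A', 'B', 'C']
print(len(one_vs_rest_tasks(data, classes)))   # K = 3 tasks
print(len(pairwise_tasks(data, classes)))      # K(K-1)/2 = 3 tasks
```

Note that for K = 3 both schemes happen to produce three sub-tasks; for larger K the pairwise scheme produces quadratically more, but each task sees far less data.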
Function decomposition was originally developed in the fifties and sixties for designing switching circuits. It was even used as an evaluation mechanism for checker-playing programs [63]. This approach was later improved [8]. Recently, the machine learning community has adopted this approach. A manual decomposition of the problem and an expert-assisted selection of examples to construct rules for the concepts in the hierarchy was studied in [45]. In comparison with standard decision tree induction techniques, structured induction exhibits about the same degree of classification accuracy with increased transparency and lower complexity of the developed models.
A general-purpose function decomposition approach for machine learning has also been developed [73]. According to this approach, attributes are transformed into new concepts in an iterative manner, creating a hierarchy of concepts. It is also possible to use a different function decomposition known as bi-decomposition [40].
3.2 Original Concept Decomposition
Original concept decomposition means dividing the original problem into several sub-problems by partitioning the training set into smaller training sets. A classifier is trained on each sub-sample, seeking to solve the original problem. Note that this resembles ensemble methodology, but with the following distinction: each inducer uses only a portion of the original training set and ignores the rest. After a classifier is constructed for each portion separately, the models are combined in some fashion, either at learning or classification time.
There are two obvious ways to break up the original dataset: tuple-oriented or attribute (feature) oriented. Tuple decomposition by itself can be divided into two different types: sample and space. In sample decomposition (also known as partitioning), the goal is to partition the training set into several sample sets, such that each sub-learning task considers the entire space.
In space decomposition, on the other hand, the original instance space is divided into several sub-spaces. Each sub-space is considered independently and the total model is a (possibly soft) union of such simpler models.
Space decomposition also includes the divide-and-conquer approaches such as mixtures of experts, local linear regression, CART/MARS, adaptive subspace models, etc. [31,32,54,27].
Feature set decomposition (also known as attribute set decomposition) generalizes the task of feature selection, which is extensively used in data mining. Feature selection aims to provide a representative set of features from which a classifier is constructed. In feature set decomposition, on the other hand, the original feature set is decomposed into several subsets. An inducer is trained upon the training data for each subset independently, and generates a classifier for each one. Subsequently, an unlabelled instance is classified by combining the classifications of all classifiers. This method potentially facilitates the creation of a classifier for high dimensionality datasets, because each sub-classifier copes with only a projection of the original space.
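The train-per-subset-then-combine scheme described above can be sketched as follows. This is a minimal illustration (ours), with a 1-nearest-neighbour classifier standing in for an arbitrary inducer and plain majority voting as the combiner:

```python
def project(x, subset):
    """Keep only the feature values whose indices are in the subset."""
    return tuple(x[i] for i in subset)

class SubsetNN:
    """A 1-nearest-neighbour classifier trained on one feature subset only
    (a stand-in for any inducer)."""
    def __init__(self, subset, data):
        self.subset = subset
        self.data = [(project(x, subset), y) for x, y in data]

    def predict(self, x):
        p = project(x, self.subset)
        return min(self.data,
                   key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], p)))[1]

def classify_by_decomposition(data, subsets, x):
    """Train one sub-classifier per feature subset; combine by majority vote."""
    votes = [SubsetNN(s, data).predict(x) for s in subsets]
    return max(set(votes), key=votes.count)

data = [((0, 0, 5, 1), 'a'), ((1, 1, 9, 8), 'b'), ((0, 1, 4, 2), 'a')]
print(classify_by_decomposition(data, [(0, 1), (2, 3)], (0, 0, 4, 1)))  # prints "a"
```

Each sub-classifier sees only a projection of the instance space, which is exactly what makes the approach attractive for high-dimensional data.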
In the literature there are several works that fit the feature set decomposition framework. However, in most of the papers the decomposition structure was obtained ad hoc using prior knowledge. Moreover, it was argued that: “There exists no algorithm or method susceptible to perform a vertical self-decomposition without a-priori knowledge of the task!” [61].
The feature set decomposition algorithm known as MFS (Multiple Feature Subsets) combines multiple nearest neighbor classifiers, each using only a subset of random features [4]. Experiments show MFS can improve the standard nearest neighbor classifier. This procedure resembles the well-known bagging algorithm [10]. However, instead of sampling instances with replacement, it samples features without replacement.
An additional alternative is to group the features according to the attribute type: nominal value features, numeric value features and text value features [38]. A similar approach was used for developing the linear-Bayes classifier [21]. The basic idea consists of aggregating the features into two subsets: the first subset containing only the nominal features and the second subset only the continuous features.
An approach for constructing an ensemble of classifiers using rough set theory was presented by Hu [29]. Although Hu’s work refers to ensemble methodology and not decomposition methodology, it is still relevant in this case, especially as the declared goal was to construct an ensemble such that different classifiers use different attributes as much as possible. According to Hu, diversified classifiers lead to uncorrelated errors, which in turn improve classification accuracy. The method searches for a set of reducts, which include all the indispensable attributes. A reduct represents the minimal set of attributes which has the same classification power as the entire attribute set.
The feature set can be decomposed according to the target class [68]. For each class, the features with low correlation relating to that class have been removed. This method has been applied on a feature set of 25 sonar signals where the target was to identify the meaning of the sound (whale, cracking ice, etc.).
Feature set decomposition has been used for radar volcano recognition [14]. In this case, a feature set of 119 features was manually decomposed into 8 subsets. Features that are based on different image processing operations were grouped together. As a consequence, for each subset, four neural networks with different sizes were built.
Feature set decomposition has proved beneficial in many other applications, such as text-independent speaker identification [13], the truck backer-upper problem [30] and quality problems in manufacturing plants [42].
The co-training paradigm for learning with labelled and unlabelled data can be considered as a feature set decomposition for classifying Web pages [9]. Co-training is useful when there is a large data sample, of which only a small part is labelled. In many applications, unlabelled examples are significantly easier to collect than labelled ones. This is especially true when the labelling process is time-consuming or expensive, such as in medical applications. According to the co-training paradigm, the input space is divided into two different views (i.e. two independent and redundant sets of features). For each view, a different classifier is built to classify unlabelled data. The newly labelled data of each classifier is then used to retrain the other classifier. It has been shown, both empirically and theoretically, that unlabelled data can be used to augment labelled data.
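The co-training loop just described can be sketched as follows. This is a toy illustration (ours, not from [9]): each view is a single feature, a trivial frequency learner stands in for a real classifier, and the confidence-based selection of newly labelled examples used by the original paradigm is omitted.

```python
def train(view_data):
    """Toy per-view learner: map each seen feature value to its majority label
    (a stand-in for a real classifier such as Naive Bayes)."""
    table = {}
    for v, y in view_data:
        table.setdefault(v, []).append(y)
    return {v: max(set(ys), key=ys.count) for v, ys in table.items()}

def co_train(labelled, unlabelled, rounds=2):
    """Each labelled instance is ((view1, view2), label)."""
    for _ in range(rounds):
        m1 = train([(v1, y) for (v1, _), y in labelled])
        m2 = train([(v2, y) for (_, v2), y in labelled])
        still_unlabelled = []
        for v1, v2 in unlabelled:
            if v1 in m1:              # classifier 1 labels the instance ...
                labelled.append(((v1, v2), m1[v1]))
            elif v2 in m2:            # ... or classifier 2 does
                labelled.append(((v1, v2), m2[v2]))
            else:
                still_unlabelled.append((v1, v2))
        unlabelled = still_unlabelled
    return labelled
```

An instance whose first view is recognized gets labelled by the first classifier, and that new label then helps the second classifier label instances it could not handle alone, which is the essence of the paradigm.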
Another alternative for decomposing the feature set is as follows [39]: all input features are initially grouped by using a hierarchical clustering algorithm based on pairwise mutual information, with statistically similar features assigned to the same group. As a consequence, several feature subsets are constructed by selecting one feature from each group. A neural network is subsequently constructed for each subset. All networks are then combined.
In the statistics literature, the most well-known decomposition algorithm is the MARS algorithm [19]. In this algorithm, a multiple regression function is approximated using linear splines and their tensor products. It has been shown that the algorithm performs an ANOVA decomposition, namely the regression function is represented as a grand total of several sums. The first sum is of all basis functions that involve only a single attribute. The second sum is of all basis functions that involve exactly two attributes, representing (if present) two-variable interactions. Similarly, the third sum represents (if present) the contributions from three-variable interactions, and so on.
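The ANOVA decomposition just described can be written compactly. The following formula uses the standard MARS notation (ours, not reproduced from the paper):

```latex
\hat{f}(x) \;=\; a_0
\;+\; \sum_{i} f_i(x_i)
\;+\; \sum_{i<j} f_{ij}(x_i, x_j)
\;+\; \sum_{i<j<k} f_{ijk}(x_i, x_j, x_k)
\;+\; \cdots
```

Here $a_0$ is the constant term, the first sum collects all basis functions involving a single attribute $x_i$, the second collects those involving exactly two attributes (two-variable interactions), and so on.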
Other works on feature set decomposition have been developed by extending the Naïve Bayes classifier. The Naïve Bayes classifier [16] uses Bayes’ rule to compute the conditional probability of each possible class, assuming the input features are conditionally independent given the target feature. Due to the conditional independence assumption, this method is called “Naïve”. Nevertheless, a variety of empirical studies show, surprisingly, that the Naïve Bayes classifier can perform quite well compared to other methods, even in domains where clear feature dependencies exist [16]. Furthermore, Naïve Bayes classifiers are also very simple and easy to understand [36].
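Under the conditional independence assumption described above, the Naïve Bayes classification rule takes the standard form (notation ours):

```latex
\hat{y} \;=\; \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

where $x_1, \dots, x_n$ are the input features and $y$ ranges over the possible classes; the product over individual features is exactly what makes the classifier a composite of single-attribute models.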
Recently, a new general framework that searches for helpful feature set decomposition structures has been proposed [58]. This framework nests many algorithms, two of which are tested empirically over a set of benchmark datasets. The first algorithm performs a serial search while using a new Vapnik-Chervonenkis dimension bound for multiple oblivious trees as an evaluating schema. The second algorithm performs a multi-search while using a wrapper evaluating schema. This work indicates that feature set decomposition can increase the accuracy of decision trees.
4 The Decomposer’s Characteristics
The following sub-sections present the main properties that characterize decomposers. These properties can be useful for differentiating between various decomposition frameworks.
4.1 The Structure Acquiring Method
This important property indicates how the decomposition structure is obtained:
– Manually (explicitly) based on an expert’s knowledge in a specific domain [9,45]. If the origin of the dataset is a relational database, then the schema’s structure may imply the decomposition structure.
– Predefined due to some restrictions (as in the case of distributed data mining).
– Arbitrarily [17,12] – the decomposition is performed without any profound thought. Usually, after setting the size of the subsets, members are randomly assigned to the different subsets.
– Induced without human interaction by a suitable algorithm [73].

Some may justifiably claim that searching for the best decomposition might be time-consuming, namely prolonging the induction process. In order to avoid this disadvantage, the complexity of the decomposition algorithms should be kept as small as possible. However, even if this cannot be accomplished, there are still important advantages, such as better comprehensibility and better performance, that make decomposition worth the additional computational complexity.
Furthermore, it should be noted that in an ongoing induction effort (as in a churning application), searching for the best decomposition structure might be performed in wider time buckets (for instance, once a year) than training the classifiers (for instance, once a week). Moreover, for acquiring the decomposition structure, only a relatively small sample of the training set may be required. Consequently, the execution time of the decomposer will be relatively small compared to the time needed to train the classifiers. Usually in real-life applications the decomposition is performed manually by incorporating business information into the modelling process. The following quotation provides a practical example [6]:

It may be known that platinum cardholders behave differently from gold cardholders. Instead of having a data mining technique figure this out, give it the hint by building separate models for the platinum and gold cardholders.
Decomposition can also be useful for handling missing data. In this case we do not refer to sporadic missing data but to the case where several attribute values are available for some tuples but not for all of them. For instance: “Historical data, such as billing information, is available only for customers who have been around for a sufficiently long time” or “Outside data, such as demographics, is available only for the subset of the customer base that matches”. In this case, one classifier can be trained for customers having all the information and a second classifier for the remaining customers [6].
4.2 The Mutually Exclusive Property
This property indicates whether the decomposition is mutually exclusive (disjointed decomposition) or partially overlapping (i.e. a certain value of a certain attribute in a certain tuple is utilized more than once). For instance, in the case of sample decomposition, “mutually exclusive” means that a certain tuple cannot belong to more than one subset [17,12]. Others have used non-exclusive feature decomposition [4]. Similarly, CART and MARS perform mutually exclusive decomposition of the input space, while HME allows sub-spaces to overlap.
Partially overlapping decompositions are potentially more accurate than mutually exclusive decompositions, because the latter form a restriction on the problem space which might skip over accurate models. Still, mutually exclusive decomposition has some important and helpful properties:

– A greater tendency to reduce execution time than non-exclusive approaches. Since most learning algorithms have computational complexity that is greater than linear in the number of attributes or tuples, partitioning the problem dimensionality in a mutually exclusive manner means a decrease in computational complexity [51].
– Since mutual exclusiveness entails using smaller datasets, the models obtained for each sub-problem are smaller in size. Without the mutually exclusive restriction, each model can be as complicated as the model obtained for the original problem. Smaller models contribute to comprehensibility and ease of maintaining the solution.
– Mutually exclusive decomposition may help avoid some error correlation problems that characterize non-mutually exclusive decompositions [4]. However, mutually exclusive training sets do not necessarily result in low error correlation [66]. This point is true when each sub-problem is representative (i.e. represents the entire problem, as in sample decomposition).
– Reduced tendency to contradiction between sub-models. When a mutually exclusive restriction is unenforced, different models might generate contradictory classifications using the same input. Reducing inter-model contradictions helps us to grasp the results and to combine the sub-models into one model. The resulting predictions of ensemble methods are usually inscrutable to end-users, mainly due to the complexity of the generated models, as well as the obstacles in transforming these models into a single model [56]. Moreover, since these methods do not attempt to use all relevant features, the researcher will not obtain a complete picture of which attributes actually affect the target attribute, especially when, in some cases, there are many relevant attributes.
– Since the mutually exclusive approach encourages smaller datasets, it is more feasible. Some inducers can process only a limited dataset size (for instance, when the program requires that the entire dataset be stored in main memory). The mutually exclusive approach can ensure that inducers are fairly scalable to large datasets [12,51].
– We claim that end-users can grasp mutually exclusive decomposition much more easily than many other methods currently in use. For instance, boosting, which is a well-known ensemble method, distorts the original distribution of the instance space, a fact that non-professional users find hard to grasp or understand.
4.3 The Inducer Usage
This property indicates the relation between the decomposer and the inducer used. Some decomposition implementations are “inducer-free”, namely they do not use intrinsic inducers at all. Usually the decomposition procedure needs to choose the best decomposition structure among several structures that it considers. In order to measure the performance of a certain decomposition structure, there is a need to realize the structure by building a classifier for each component. However, since “inducer-free” decomposition does not use any induction algorithm, it uses a frequency table of the Cartesian product of the feature values instead. Consider the following example. The training set consists of four binary input attributes (a1, a2, a3, a4) and one target attribute (y). Assume that an “inducer-free” decomposition procedure examines the following feature set decomposition: (a1, a3) and (a2, a4). In order to measure the classification performance of this structure, it is required to build two classifiers, one for each subset. In the absence of an induction algorithm, two frequency tables are built; each table has 2^2 = 4 entries representing the Cartesian product of the attributes in each subset. For each entry in the table, we measure the frequency of the target attribute. Each one of the tables can be separately used to classify a new instance x: we search for the entry that corresponds to the instance x and select the target value with the highest frequency in that entry. This “inducer-free” strategy has been used in several places. For instance, the extension of Naïve Bayes suggested in [16] can be considered as a feature set decomposition with no intrinsic inducer. The function decomposition algorithm developed by using sparse frequency tables also fits this strategy [73].
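The worked example above can be reproduced in a few lines. In the sketch below (illustrative; attribute indices 0-3 stand for a1-a4, and the toy training set is ours) we build the two frequency tables for the subsets (a1, a3) and (a2, a4) and use one of them to classify a new instance:

```python
from collections import Counter, defaultdict

def build_table(data, subset):
    """Frequency table over the Cartesian product of the subset's values:
    maps each entry (a tuple of attribute values) to a Counter of target values."""
    table = defaultdict(Counter)
    for x, y in data:
        table[tuple(x[i] for i in subset)][y] += 1
    return table

def classify(table, subset, x):
    """Look up the entry corresponding to x; pick the most frequent target value."""
    entry = table[tuple(x[i] for i in subset)]
    return entry.most_common(1)[0][0]

# Toy training set: four binary inputs (a1..a4) and one target attribute y.
data = [((0, 0, 0, 0), 0), ((0, 1, 0, 1), 0), ((1, 0, 1, 0), 1), ((1, 1, 1, 1), 1)]
t13 = build_table(data, (0, 2))   # subset (a1, a3): up to 2^2 = 4 entries
t24 = build_table(data, (1, 3))   # subset (a2, a4)
print(classify(t13, (0, 2), (1, 0, 1, 0)))  # -> 1
```

Each table can be evaluated on a hold-out sample exactly as if it were a trained classifier, which is what lets the structure search proceed without any induction algorithm.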
Other implementations are considered an “inducer-dependent” type, namely these decomposition methods use intrinsic inducers and have been developed specifically for a certain inducer. They do not guarantee effectiveness with any other induction method. For instance, some works have been developed specifically for neural networks [41] or decision trees [58].

The third type of decomposition method is the “inducer-independent” type. These implementations can be performed with any given inducer; however, the same inducer is used in all subsets. As opposed to the “inducer-free” implementation, which does not use any inducer for its execution, “inducer-independent” requires the use of an inducer. Nevertheless, it is not limited to a specific inducer like the “inducer-dependent” type.
The last type is the “inducer-chooser” type, which, given a set of inducers, uses the most appropriate inducer for each sub-problem.

4.4 Exhaustiveness
This property indicates whether all data elements should be used in the decomposition. For instance, an exhaustive feature set decomposition refers to the situation in which each feature participates in at least one subset.
4.5 Combiner Usage
This property specifies the relation between the decomposer and the combiner. Some decomposers are combiner-dependent; that is to say, they have been developed specifically for a certain combination method, like voting or Naïve Bayes. Other decomposers are combiner-independent; the combination method is provided as input to the framework. Potentially there could be decomposers that, given a set of combiners, would be capable of choosing the best combiner for the current case.

4.6 Sequentially or Concurrently
This property indicates whether the various sub-classifiers are built sequentially or concurrently. In a sequential framework the outcome of a certain classifier may affect the creation of the next classifier. On the other hand, in a concurrent framework each classifier is built independently and their results are combined in some fashion. Some refer to this property as “the relationship between modules” [65] and distinguish between three different types: successive, cooperative and supervisory. Roughly speaking, “successive” refers to “sequential” while “cooperative” refers to “concurrent”. The last type applies to the case in which one model controls the other model; for instance, one neural network is used to tune another neural network.
The original problem in intermediate concept decomposition is usually converted into a sequential list of problems, where the last problem aims to solve the original one. On the other hand, in original concept decomposition the problem is usually divided into several sub-problems which exist on their own. Nevertheless, there are some exceptions. For instance, the “windowing” concept [52] is considered to be sequential.
Naturally there might be other important properties which can be used to differentiate a decomposition scheme. Table 1 summarizes the most relevant research performed on each decomposition type.
Table 1 Summary of Decomposition Methods in the Literature.

Paper  Decomposition Type  Mutually Exclusive  Structure Acquiring Method
[2]    Concept             No                  Arbitrarily
[11]   Concept             Yes                 Manually
[45]   Function            Yes                 Manually
[73]   Function            Yes                 Induced
[1]    Sample              No                  Arbitrarily
[17]   Sample              Yes                 Arbitrarily
[59]   Sample              Yes                 Induced
[54]   Space               No                  Induced
[35]   Space               Yes                 Induced
[4]    Attribute           No                  Arbitrarily
[38]   Attribute           Yes                 Manually
5 Meta-Decomposer Framework
As stated above, our ultimate goal is to develop a mechanism that combines all decomposition methods such that, given an unseen dataset, the most appropriate decomposition (if any) can be selected. There are two alternatives for achieving this automatic and systematic decomposition procedure:
– The wrapper approach – Given a certain dataset, use each elementary decomposition and select the one that appears to give the highest success rate. The main advantage of this approach is its ability to predict quite well the performance of each examined method. The main disadvantage of this method is its prolonged processing time. For some inducers the induction times may be very long, particularly on large real-life datasets. Several researchers have implemented this method for selecting induction algorithms or dimension reduction algorithms and showed that it produces superior results [,34].
– The meta-learning approach – Based on dataset characteristics, the meta-decomposer decides whether to decompose the problem and, if so, which elementary decomposition to use. The idea of the meta-decomposer approach can be summarized as follows: if a certain decomposition method outperforms other decomposition methods on a particular dataset, then one should expect that this method will be preferable when other problems with similar characteristics are presented. For this purpose one can employ meta-learning. Meta-learning is concerned with accumulating experience on the performance of multiple applications of a learning system. One possible output of the meta-learning process is a meta-classifier that is capable of indicating which learning method is most appropriate to a given problem. In this paper the meta-learning focuses on explaining what causes a decomposition method to be successful or not in a particular problem. Thus, in this case the meta-classifier is used as a meta-decomposer which attempts to select the most appropriate decomposition method. This goal can be accomplished by performing the following phases: In the first phase one should examine the performance of all investigated decomposition methods on various datasets. Upon examination of each dataset, the characteristics of the dataset are extracted. The dataset's characteristics, together with the indication of the most preferable decomposition method on this dataset, are stored in a meta-dataset. This meta-dataset reflects the experience accumulated across different datasets. In the second phase, an inducer can be applied to this meta-dataset to induce a meta-decomposer that can map a dataset to the most appropriate decomposition method (based on the characteristics of the dataset). In the last phase, the meta-decomposer is actually used to match a new, unseen dataset to the most appropriate decomposition method.
This paper adopts the second alternative and examines it on real-world problems. Previous works have already considered this approach for selecting the most appropriate induction algorithm given dataset characteristics (see for instance [22,69,5]). However, applying this methodology for selecting the most appropriate decomposition given a certain dataset has not yet been considered. The main disadvantage of the meta-learning process concerns the assumption that datasets with similar properties behave the same. Furthermore, in meta-learning the amount of data available (dataset descriptions and different performances) is usually quite small, thus the meta-decomposer is based on small meta-datasets. Nevertheless, the main advantage of this approach is that after the meta-learning phase is completed, it can be used to select the best decomposition in negligible processing time.
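The wrapper approach described above can be sketched in a few lines. The candidate names and the scoring callback below are illustrative stand-ins for real decomposition implementations and a cross-validated accuracy estimate; they are not part of the paper's framework.

```python
# Wrapper-style selection: estimate each candidate decomposition's success
# rate on the given dataset and keep the best performer. The candidate
# names and the evaluate callback are illustrative stand-ins.

def select_by_wrapper(candidates, evaluate, dataset):
    """candidates: dict mapping a name to a decomposition method;
    evaluate: callable(method, dataset) -> estimated accuracy."""
    scores = {name: evaluate(method, dataset) for name, method in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy usage with fabricated accuracy estimates standing in for the
# (potentially very slow) cross-validation runs:
fake = {"attribute": 0.86, "sample": 0.81, "space": 0.79, "none": 0.84}
best, scores = select_by_wrapper(fake, lambda m, _d: m, dataset=None)
print(best)  # -> attribute
```

In a real wrapper loop, `evaluate` would run the full induction and cross-validation pipeline per candidate, which is exactly the prolonged processing time the text warns about.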
Figures 2 and 3 present the schematic framework of the meta-decomposer. Figure 2 presents the meta-data generation phase. Figure 3 presents (A) the meta-learning phase and (B) the usage of the meta-decomposer. As can be seen in Figure 2, the Dataset Generator component is responsible for extending the original datasets repository into a much bigger repository by manipulating the original datasets.
Fig. 2 Meta-Data Generation Phase (components: Original Datasets Repository, Dataset Generator, Manipulated Datasets Repository, Dataset Characterizer, Decomposers Evaluator, Meta-Data).
5.1 Dataset Characterizer
It appears that datasets can be described as a vector of numeric values based on certain features. It is possible to categorize the characteristic measures into the following types [22]:
– Simple measures (e.g. number of attributes, number of classes, proportion of binary, categorical or ordinal attributes).
Fig. 3 (A) Meta-Induction Phase and (B) Meta-Decomposer Usage (components: Meta-Data, Inducer, New Datasets, Dataset Characterizer, Meta-Decomposer).
– Statistical measures (e.g. standard deviation ratio).
– Information based measures (e.g. mean entropy).

The meta-attributes used here are:
1. Number of instances in the training dataset.
2. Number of attributes.
3. Ratio of number of instances to the number of attributes - potential for overfitting. If this value is small, inducers may find a classifier that adapts too well to the specificities of the training data, which may be caused by noisy or irrelevant attributes, and thus result in poor generalization performance.
4. Number of classes - the domain size of the target attribute.
5. Proportion of the binary attributes.
6. Proportion of the categorical attributes.
7. Proportion of the ordinal attributes.
8. Default accuracy - accuracy obtained when using the most frequent class in the training set, without building any classifier.
9. Mean of means - the mean of all numerical attributes' means.
10. Mean of standard deviations - the mean of all numerical attributes' standard deviations.
11. Mean kurtosis - the mean of all numerical attributes' kurtosis measures. Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution.
12. Mean skewness - the mean of all numerical attributes' skewness measures. Skewness is a measure of symmetry, or more precisely, the lack of symmetry.
13. Mean of entropies - the mean of all attributes' simple entropy. Entropy measures the amount of uncertainty of a random variable. The Shannon entropy measure resembles the classic variance measurement in many ways: both are measures for quantifying uncertainty changes. However, entropy differs from variance in its metric-free nature: it depends only on the probability distribution of a random variable and not on its values. On the other hand, the entropy measure is not as expressive as the cumulative effect of all statistical measures. For this reason we also use two generalized entropy measures proposed in the literature: Rényi [55] and Tsallis [67].
14. Average absolute correlation between numeric attributes - indicates robustness to irrelevant attributes.
15. Proportion of numeric attributes with outliers - indicates robustness to outlying values. A certain attribute is considered to have outliers if the ratio of the variances of the mean value and the α-trimmed mean (where α = 0.05) is smaller than 0.7.
16. Average gain ratio - average information gain ratio of the target attribute obtained by splitting the data according to each attribute. Useful as an indicator of the amount of relevant information embodied in the input attributes.
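A few of the meta-attributes above, including the Shannon entropy and its Rényi and Tsallis generalizations, can be sketched as follows. The function names and the exact handling of attributes are illustrative assumptions; the paper does not prescribe an implementation.

```python
import math
from collections import Counter

# Sketch of a subset of the meta-attributes listed above. The entropies
# are computed from an attribute's empirical value distribution; q is the
# order parameter of the generalized (Renyi/Tsallis) measures.

def shannon_entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def renyi_entropy(values, q=2.0):
    n = len(values)
    return math.log2(sum((c / n) ** q for c in Counter(values).values())) / (1 - q)

def tsallis_entropy(values, q=2.0):
    n = len(values)
    return (1 - sum((c / n) ** q for c in Counter(values).values())) / (q - 1)

def simple_meta_features(instances, target):
    """instances: list of attribute-value lists; target: list of class labels."""
    n, m = len(instances), len(instances[0])
    counts = Counter(target)
    return {
        "n_instances": n,
        "n_attributes": m,
        "instance_attribute_ratio": n / m,
        "n_classes": len(counts),
        "default_accuracy": max(counts.values()) / n,
        "mean_entropy": sum(shannon_entropy([row[j] for row in instances])
                            for j in range(m)) / m,
    }

# Toy dataset: 4 instances, 2 attributes, binary target.
X = [[0, "a"], [0, "b"], [1, "a"], [1, "b"]]
y = ["+", "+", "+", "-"]
feats = simple_meta_features(X, y)
print(feats["instance_attribute_ratio"], feats["default_accuracy"])  # -> 2.0 0.75
```

Note the metric-free property discussed in item 13: all three entropy functions depend only on value counts, never on the values themselves.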
5.2 Dataset Manipulator
As stated above, one of the main drawbacks of the meta-learning methodology is the necessity to induce from a very limited meta-dataset. The purpose of the Dataset Manipulator component is to extend the original repository of datasets into a much bigger repository, thereby overcoming the limited meta-dataset problem.
Obviously the manipulation operators should effectively affect the dataset characteristics in order to explore new options that are not represented in the original repository. The following simple operators are suitable for this task:
– Projection – Randomly choose a subset of the original input attribute set and project the data according to it. This operation can have different levels of manipulation by setting a parameter that represents the subset size as a portion of the original attribute set. Note that this operation is disabled if the parameter is set to 100%.
– Selection – Randomly select a sub-sample of the original dataset. This operation can have different levels of manipulation by setting a parameter that represents the sub-sample size as a portion of the original dataset. Note that this operation is disabled if the parameter is set to 100%.
– Distortion – Change the distribution of the target attribute by reassigning some of the instances to a different class. This operation can have different levels of manipulation by setting a parameter that represents the portion of instances that remain untouched. Note that this operation is disabled if the parameter is set to 100%.

Each manipulated dataset is obtained by performing these three operations. Note that the order of the manipulative operations is immaterial.

5.3 Decomposer Evaluator
The aim of this component is to evaluate the performance of each decomposition method on each dataset in the manipulated repository. In this paper we use the C4.5 algorithm [52] as the internal inducer. Furthermore, we check the performance of C4.5 without performing any decomposition at all. Each manipulated dataset is represented in the meta-dataset as a single tuple, whose target value is the name of the method achieving the highest accuracy on this dataset. Performance is evaluated by averaging the results obtained by a 10-fold cross-validation procedure repeated five times.

6 Experimental Study
In this section we examine the suggested meta-decomposition approach. For this purpose we use 25 datasets from the UCI repository [44].
For each manipulation operation we have examined four different levels of manipulation: 100%, 90%, 75% and 50%. This results in 4³ = 64 different combinations (manipulated datasets) for every dataset in the original repository. Because the original dataset repository contains 25 datasets, the manipulated repository consists of 1600 datasets.
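The manipulation scheme of Section 5.2 at these four levels can be sketched as follows. The concrete sampling details (which rows, columns and labels are picked) are illustrative assumptions, not the paper's exact implementation; a level of 1.0 disables the corresponding operator.

```python
import itertools
import random

# Sketch of the projection, selection and distortion operators, each
# applied at one of four levels; the sampling details are illustrative.

LEVELS = [1.0, 0.9, 0.75, 0.5]

def project(X, level, rng):
    """Keep a random subset of attribute columns of relative size `level`."""
    m = len(X[0])
    keep = sorted(rng.sample(range(m), max(1, round(level * m))))
    return [[row[j] for j in keep] for row in X]

def select(X, y, level, rng):
    """Keep a random sub-sample of relative size `level`."""
    idx = sorted(rng.sample(range(len(X)), max(1, round(level * len(X)))))
    return [X[i] for i in idx], [y[i] for i in idx]

def distort(y, classes, level, rng):
    """Reassign a portion (1 - level) of the labels to a random other class."""
    y = list(y)
    for i in rng.sample(range(len(y)), round((1 - level) * len(y))):
        y[i] = rng.choice([c for c in classes if c != y[i]])
    return y

def manipulate(X, y, classes, levels, rng):
    X = project(X, levels[0], rng)
    X, y = select(X, y, levels[1], rng)
    return X, distort(y, classes, levels[2], rng)

combos = list(itertools.product(LEVELS, repeat=3))
print(len(combos), 25 * len(combos))  # -> 64 1600

rng = random.Random(0)
X = [[i, i % 2, i % 3, i % 5] for i in range(20)]
y = [i % 2 for i in range(20)]
Xm, ym = manipulate(X, y, {0, 1}, (0.5, 0.75, 0.9), rng)
```

The `combos` count reproduces the arithmetic above: 4 levels per operator, three operators, hence 64 variants per original dataset and 1600 in total.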
In this paper we compare only three decomposition methods: feature set decomposition using the DOG algorithm [58], space decomposition using the K-Classifier algorithm [60] and sample decomposition using the Cluster-Based Concurrent algorithm [59]. All algorithms were executed using their default parameters. Obviously there are several limitations in the methodology presented above. First, the results could be altered if the algorithms' parameters were tuned differently. Second, there is no guarantee that these algorithms precisely represent the corresponding decomposition method; namely, if different decomposition algorithms had been employed here, the results could be different.
Figure 4 presents the decision tree obtained by employing the C4.5 algorithm on the meta-data (the tree has been slightly modified for legibility). It can be seen that space decomposition is useful when there are numeric attributes. This makes sense, as the employed algorithm uses the K-Means algorithm, which is more suitable to a numeric instance space. The Instance-Attribute Ratio meta-attribute is used to differentiate between ATTRIBUTE (attribute decomposition) and NONE (not performing decomposition at all) in the second and third leaves. In this case, if the Instance-Attribute Ratio is below 78.77 (namely there are many attributes relative to the dataset size), then attribute decomposition should be applied. Another interesting observation is that both MeanEntropy2 and MeanTsallisEntropy2 (Tsallis' entropy measure) are found to be relevant. This indicates that the newly proposed measure is not redundant.
numprop <= 0
  MeanEntropy2 <= 0.03 → ATTRIBUTE (362.0/57.0)
  MeanEntropy2 > 0.03
    Catprop <= 0.67
      MeanTsallisEntropy2 <= 0.57
        Instance-Attribute Ratio <= 78.77 → ATTRIBUTE (336.0/135.0)
        Instance-Attribute Ratio > 78.77 → NONE (213.0/56.0)
      MeanTsallisEntropy2 > 0.57 → SAMPLE (425.0/235.0)
    Catprop > 0.67 → NONE (328.0)
numprop > 0 → SPACE (280.0/151)

Fig. 4 Meta-Decomposer Decision Tree.
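The induced tree of Figure 4 can be read as a simple decision procedure. The sketch below transcribes it directly, with attribute names and thresholds as they appear in the figure; the final split, labeled `numcount` in the extracted text, is treated here as the same numeric-attribute test as `numprop`, which is an assumption.

```python
# Direct transcription of the meta-decomposer tree of Figure 4.
# Input: a dict of meta-attribute values for a dataset.
# Output: the recommended decomposition method.

def meta_decomposer(f):
    if f["numprop"] > 0:                      # numeric attributes present
        return "SPACE"
    if f["MeanEntropy2"] <= 0.03:
        return "ATTRIBUTE"
    if f["Catprop"] > 0.67:
        return "NONE"
    if f["MeanTsallisEntropy2"] > 0.57:
        return "SAMPLE"
    if f["Instance-AttributeRatio"] <= 78.77:  # many attributes per instance
        return "ATTRIBUTE"
    return "NONE"

example = {"numprop": 0, "MeanEntropy2": 0.5, "Catprop": 0.3,
           "MeanTsallisEntropy2": 0.4, "Instance-AttributeRatio": 120}
print(meta_decomposer(example))  # -> NONE
```

Reading the tree this way also makes the text's interpretation concrete: a dataset with no numeric attributes, moderate entropy and a small Instance-Attribute Ratio falls into the ATTRIBUTE leaf.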
6.1 Evaluating the Meta-Decomposer
To evaluate whether the meta-decomposer brings some benefit, we have carried out the leave-one-out procedure. According to this procedure, the meta-decomposer has been generated by learning from a partial meta-dataset obtained by removing one original dataset (and all its derived datasets). Then the obtained meta-decomposer has been evaluated on the dataset that was left out. This procedure has been repeated 25 times, each time leaving out a different dataset. Table 2 presents the obtained results. The first column shows the dataset name. The second and third columns present, respectively, the actual best method and the method anticipated by the meta-decomposer to be the most suitable one. The fourth, fifth and sixth columns present the accuracy obtained using C4.5, the actual best method and the anticipated method, respectively. As can be seen, in almost two thirds of the datasets the meta-decomposer was able to predict which decomposition method (if any) outperforms the other methods. In the remaining cases the performance of the anticipated method was only slightly less accurate than that of the actual best method, with no statistical significance. Moreover, employing the meta-decomposer together with its corresponding decomposition method improves the accuracy of C4.5 by 6.86% ± 3.65% on average (while the best improvement that might be obtained by always selecting the most suitable decomposition method is 7.07% ± 3.%).
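The leave-one-dataset-out protocol described above can be sketched as follows. The grouping key, the `train`/`predict` callbacks and the toy meta-dataset are illustrative stand-ins; the paper's meta-inducer is C4.5, not the majority rule used here.

```python
from collections import Counter

# Leave-one-dataset-out evaluation of a meta-decomposer. Each meta-example
# records which original dataset it was derived from, so every variant of
# the held-out dataset is excluded from the training fold.

def leave_one_dataset_out(meta_examples, train, predict):
    origins = sorted({ex["origin"] for ex in meta_examples})
    hits = total = 0
    for held_out in origins:
        train_set = [ex for ex in meta_examples if ex["origin"] != held_out]
        test_set = [ex for ex in meta_examples if ex["origin"] == held_out]
        model = train(train_set)
        for ex in test_set:
            hits += predict(model, ex["features"]) == ex["best_method"]
            total += 1
    return hits / total

# Toy meta-dataset and a majority-class meta-inducer:
examples = [{"origin": o, "features": {}, "best_method": m}
            for o, m in [("iris", "NONE"), ("vote", "ATTRIBUTE"),
                         ("zoo", "ATTRIBUTE"), ("wine", "ATTRIBUTE")]]
train = lambda rows: Counter(r["best_method"] for r in rows).most_common(1)[0][0]
acc = leave_one_dataset_out(examples, train, lambda model, feats: model)
print(acc)  # -> 0.75
```

Excluding all derived variants of the held-out dataset is the essential detail: otherwise manipulated copies of the test dataset would leak into the meta-training fold.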
Table 2 Performance Results of Meta-Decomposer Procedure (columns: dataset name, actual best method, anticipated best method, and the accuracies of C4.5, the actual best method and the anticipated method; datasets: AUST, AUDIOLOGY, BCAN, HEPATITIS, IRIS, KR-VS-KP, LABOR, LED17, LETTER, LCAN, MONKS1, MONKS2, MONKS3, MUSH, NURSE, OPTIC, SONAR, SOYBEAN, SPI, TTT, VOTE, WINE, ZOO, ARCENE, DEXTER, MADELON).
7 The Relation to Other Methodologies
The main distinction between existing approaches, such as ensemble methods and distributed data mining, and decomposition methodology focuses on the following fact: the assumption that each model has access to data of comparable quality is not valid in the decomposition approach [68]. In decomposition methodology, classifiers may have significant variations in their overall performance. Furthermore, when individual classifiers have substantially different performances over different parts of the input space, combining is still desirable [68]. Nevertheless, neither simple combiners nor more sophisticated combiners are particularly well-suited for the type of problems that arise.
The ensemble methodology is closely related to the decomposition methodology. In both cases the final model is a composite of multiple models combined in some fashion. However, it is possible to distinguish between these methodologies in the following way [65]: the main idea of ensemble methodology is to combine a set of models, each of which solves the same original task. The purpose of ensemble methodology is to obtain a more accurate and reliable performance than when using a single model. On the other hand, the purpose of decomposition methodology is to break down a complex problem into several manageable problems, enabling each inducer to solve a different task. Therefore, in ensemble methodology, any model can provide a sufficient solution to the original task, whereas in decomposition methodology a combination of all models is mandatory for obtaining a reliable solution.
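This distinction can be made concrete with a toy sketch. In the ensemble half, each model solves the full task (here XOR) and voting merely improves reliability; in the decomposition half, each sub-model sees only one attribute of an AND task, so no sub-model alone solves the original problem and combining is mandatory. All "models" below are trivial hand-written stand-ins.

```python
# Ensemble: three full-task models for y = x1 XOR x2. Any one of them is
# already a usable solution; voting only adds reliability.
ensemble = [lambda x: x[0] ^ x[1],
            lambda x: (x[0] + x[1]) % 2,
            lambda x: int(x[0] != x[1])]

def vote(models, x):
    preds = [m(x) for m in models]
    return max(set(preds), key=preds.count)  # majority vote

# Decomposition: each sub-model handles only one attribute of the task
# y = x1 AND x2, so the combination step is mandatory.
submodels = [lambda x: x[0], lambda x: x[1]]

def combine(models, x):
    return min(m(x) for m in models)  # AND of the partial answers

x = (1, 0)
print(vote(ensemble, x), combine(submodels, x))  # -> 1 0
```

Dropping one ensemble member still leaves a correct XOR classifier, but dropping either sub-model makes the AND task unsolvable, which is exactly the contrast the paragraph draws.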
Distributed data mining (DDM) deals with mining data that might be inherently distributed among different, loosely coupled sites with slow connectivity, such as geographically distributed sites connected over the Internet [33]. Usually DDM is categorized according to data distribution:

– Homogeneous – In this case, the datasets in all the sites are built from the same common set of attributes. This state is equivalent to the sample decomposition discussed above, where the decomposition structure is set by the environment.
– Heterogeneous – In this case, the quality and quantity of data available to each site may vary substantially. Since each specific site may contain data for different attributes, leading to large discrepancies in their performance, integrating classification models derived from distinct and distributed databases is complex. DDM can also be useful in the case of "mergers and acquisitions" of corporations. In such cases, since each company involved may have its own IT legacy systems, different sets of data are available.
In DDM the different sources are given, namely the instances are pre-decomposed. As a result, DDM is mainly focused on combining the various methods. Several researchers discuss ways of leveraging distributed techniques in knowledge discovery, such as data cleaning and preprocessing, transformation, and learning.
The JAM system is a meta-learning approach [50]. The meta-learning approach is about combining several models (describing several sets of data from several sources of data) into one high-level model.
Another meta-learning concept is knowledge probing [23]. In knowledge probing, supervised learning is organized into two stages. In the first stage, a set of base classifiers is constructed using the distributed datasets. In the second stage, the relationship between an attribute vector and the class predictions from all of the base classifiers is determined.
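The two stages of knowledge probing can be sketched as follows. The base "classifiers" are trivial one-dimensional threshold rules and the stage-two mapping is a plain majority vote; both are illustrative stand-ins, since [23] learns the second-stage relationship rather than fixing it.

```python
# Two-stage knowledge probing, sketched with trivial components: stage one
# builds a base classifier per distributed partition; stage two derives a
# single final predictor from the base classifiers' outputs (a majority
# vote stands in here for the learned mapping).

def stage_one(partitions, fit):
    return [fit(part) for part in partitions]

def stage_two(bases):
    def final(x):
        votes = [b(x) for b in bases]
        return max(set(votes), key=votes.count)
    return final

# Toy 1-D partitions of a problem where the label is 1 iff x is large.
# Each partition yields a threshold rule: the smallest x seen with y == 1.
fit = lambda part: (lambda x, t=min(x for x, y in part if y == 1): int(x >= t))
parts = [[(1, 0), (6, 1)], [(2, 0), (5, 1)], [(3, 0), (9, 1)]]
final = stage_two(stage_one(parts, fit))
print(final(7), final(2))  # -> 1 0
```

Each partition produces a different threshold (6, 5 and 9 here), mirroring the heterogeneous-site setting where base classifiers disagree and the second stage arbitrates.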
A closely related field is parallel data mining (PDM). PDM deals with mining data by using several tightly coupled systems with fast interconnection, as in the case of a cluster of shared-memory workstations [71].
The main goal of PDM techniques is to scale up the speed of data mining on large datasets. It addresses this issue by using high-performance, multi-processor computers. The increasing availability of such computers calls for extensive development of data analysis algorithms that can scale up as we attempt to analyze datasets measured in terabytes on parallel machines with thousands of processors. This technology is particularly suitable for applications that typically deal with large amounts of data, e.g. company transaction data, scientific simulation and observation data. Another important example of PDM is the SPIDER project, which uses shared-memory multiprocessor systems (SMPs) to accomplish PDM on distributed datasets [72].
8 Conclusion and Future Work
In this paper we have reviewed the necessity of decomposition methodology in pattern recognition, machine learning and data mining. We have suggested an approach to categorize elementary decomposition methods. We also discussed the main characteristics of decomposition methods and demonstrated these characteristics on various methods presented in the literature.
Finally, we proposed a meta-decomposition approach and validated its prediction capabilities. We have shown empirically that the proposed meta-decomposer usually selects the most appropriate decomposition method. In fact, using the meta-decomposer achieved an accuracy improvement of 6.8%, while using the a posteriori best method provided only slightly better results (7.1%).
Additional research issues in the meta-decomposer approach include: extending the meta-learning scheme to investigate other decomposition methods presented in Section 3, more precisely function decomposition and concept aggregation; checking whether the meta-learning results remain valid when different decomposition algorithm implementations are used, namely examining the robustness of the meta-decomposer; examining the effectiveness of recursively applying the meta-decomposer; and, finally, how prior knowledge can be utilized for decomposition methodology.
9 Originality and Contribution
This paper introduces a novel taxonomy for decomposition methods as applied to classification tasks. The proposed taxonomy refers to elementary decomposition methods that can be used as building blocks to construct a more complicated decomposition. The taxonomy is illustrated using an extensive review of existing decomposition methods.
The taxonomy is subsequently used as the basis for a new meta-decomposition methodology which is designed to automatically select the best decomposition method for a given database. For this purpose we examine a meta-decomposer framework that contains two phases. In the first phase the meta-dataset is generated by evaluating the performance of several decomposition methods on a given set of datasets. Every dataset is characterized by a set of well-known measures, such as entropy and skewness. In order to significantly extend the dataset variety, we suggest using a randomized dataset manipulator. In the second phase a decision-tree-based meta-decomposer is trained using the generated meta-dataset. Then, when a new dataset is provided, we employ the meta-decomposer to select the most promising decomposition method.

10 Acknowledgment
The author would like to thank Prof. Oded Maimon for his helpful and inspiring comments.

References
1. Ali K. M., Pazzani M. J., Error Reduction through Learning Multiple Descriptions, Machine Learning, 24(3):173-202, 1996.
2. Anand R., Methrotra K., Mohan C. K., Ranka S., Efficient classification for multiclass problems using modular neural networks, IEEE Trans. Neural Networks, 6(1):117-125, 1995.
3. Baxt W. G., Use of an artificial neural network for data analysis in clinical decision making: The diagnosis of acute coronary occlusion, Neural Computation, 2(4):480-4, 1990.
4. Bay S., Nearest neighbor classification from multiple feature subsets, Intelligent Data Analysis, 3(3):191-209, 1999.
5. Bensusan H. and Kalousis A., Estimating the Predictive Accuracy of a Classifier, in Proceedings of the 12th European Conference on Machine Learning, pp. 25-36, 2001.
6. Berry M. and Linoff G., Mastering Data Mining, John Wiley & Sons, 2000.
7. Bhargava H. K., Data Mining by Decomposition: Adaptive Search for Hypothesis Generation, INFORMS Journal on Computing, 11(3):239-47, 1999.
8. Biermann A. W., Faireld J., and Beres T., Signature table systems and learning, IEEE Trans. Syst. Man Cybern., 12(5):635-8, 1982.
9. Blum A. and Mitchell T., Combining Labeled and Unlabeled Data with Co-Training, in Proc. of the 11th Annual Conference on Computational Learning Theory, pp. 92-100, 1998.
10. Breiman L., Bagging predictors, Machine Learning, 24(2):123-140, 1996.
11. Buntine W., "Graphical Models for Discovering Knowledge", in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pp. 59-82, AAAI/MIT Press, 1996.
12. Chan P. K. and Stolfo S. J., On the Accuracy of Meta-learning for Scalable Data Mining, J. Intelligent Information Systems, 8:5-28, 1997.
13. Chen K., Wang L. and Chi H., Methods of Combining Multiple Classifiers with Different Features and Their Applications to Text-Independent Speaker Identification, International Journal of Pattern Recognition and Artificial Intelligence, 11(3):417-445, 1997.
14. Cherkauer K. J., Human Expert-Level Performance on a Scientific Image Analysis Task by a System Using Combined Artificial Neural Networks, in Working Notes, Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms Workshop, Thirteenth National Conference on Artificial Intelligence, Portland, OR, AAAI Press, 1996.
15. Dietterich T. G. and Bakiri G., Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, 2:263-286, 1995.
16. Domingos P. and Pazzani M., On the Optimality of the Naive Bayes Classifier under Zero-One Loss, Machine Learning, 29(2):103-130, 1997.
17. Domingos P., Using Partitioning to Speed Up Specific-to-General Rule Induction, in Proceedings of the AAAI-96 Workshop on Integrating Multiple Learned Models, pp. 29-34, AAAI Press, 1996.
18. Fischer B., "Decomposition of Time Series - Comparing Different Methods in Theory and Practice", Eurostat Working Paper, 1995.
19. Friedman J. H., "Multivariate Adaptive Regression Splines", The Annals of Statistics, 19:1-141, 1991.
20. Fürnkranz J., More efficient windowing, in Proceedings of the 14th National Conference on Artificial Intelligence (AAAI-97), pp. 509-514, Providence, RI, AAAI Press, 1997.
21. Gama J., A Linear-Bayes Classifier, in C. Monard, editor, Advances on Artificial Intelligence - SBIA 2000, LNAI 1952, pp. 269-279, Springer Verlag, 2000.
22. Giraud-Carrier C., Vilalta R., Brazdil P., Introduction to the Special Issue on Meta-Learning, Machine Learning, 54(3):187-193, 2004.
23. Guo Y. and Sutiwaraphun J., Knowledge probing in distributed data mining, in Proc. 4th Int. Conf. Knowledge Discovery Data Mining, pp. 61-69, 1998.
24. Hampshire J. B. and Waibel A., The meta-Pi network: building distributed knowledge representations for robust multisource pattern recognition, Pattern Analyses and Machine Intelligence, 14(7):751-769, 1992.
25. Hansen J., Combining Predictors: Meta Machine Learning Methods and Bias, Variance & Ambiguity Decompositions, PhD dissertation, Aarhus University, 2000.
26. He D. W., Strege B., Tolle H., and Kusiak A., Decomposition in Automatic Generation of Petri Nets for Manufacturing System Control and Scheduling, International Journal of Production Research, 38(6):1437-1457, 2000.
27. Holmstrom L., Koistinen P., Laaksonen J., and Oja E., Neural and statistical classifiers - taxonomy and a case study, IEEE Trans. on Neural Networks, 8:5-17, 1997.
28. Hrycej T., Modular Learning in Neural Networks, New York: Wiley, 1992.
29. Hu X., Using Rough Sets Theory and Database Operations to Construct a Good Ensemble of Classifiers for Data Mining Applications, ICDM01, pp. 233-240, 2001.
30. Jenkins R. and Yuhas B. P., A simplified neural network solution through problem decomposition: The case of the truck backer-upper, IEEE Transactions on Neural Networks, 4(4):718-722, 1993.
31. Johansen T. A. and Foss B. A., An ARMAX model representation for adaptive control based on local models, Modeling, Identification and Control, 13(1):25-39, 1992.
32. Jordan M. I. and Jacobs R. A., Hierarchical mixtures of experts and the EM algorithm, Neural Computation, 6:181-214, 1994.
33. Kargupta H. and Chan P., eds., Advances in Distributed and Parallel Knowledge Discovery, pp. 185-210, AAAI/MIT Press, 2000.
34. Kohavi R. and John G., The Wrapper Approach, in Feature Extraction, Construction and Selection: A Data Mining Perspective, H. Liu and H. Motoda (eds.), Kluwer Academic Publishers, 1998.
35. Kohavi R., Becker B., and Sommerfield D., Improving simple Bayes, in Proceedings of the European Conference on Machine Learning, 1997.
36. Kononenko I., Comparison of inductive and Naive Bayes learning approaches to automatic knowledge acquisition, in B. Wielinga (Ed.), Current Trends in Knowledge Acquisition, Amsterdam, The Netherlands, IOS Press, 1990.
37. Kusiak A., Szczerbicki E., and Park K., A Novel Approach to Decomposition of Design Specifications and Search for Solutions, International Journal of Production Research, 29(7):1391-1406, 1991.
38. Kusiak A., "Decomposition in Data Mining: An Industrial Case Study", IEEE Transactions on Electronics Packaging Manufacturing, 23(4):345-353, 2000.
39. Liao Y. and Moody J., Constructing Heterogeneous Committees via Input Feature Grouping, in Advances in Neural Information Processing Systems, Vol. 12, S. A. Solla, T. K. Leen and K.-R. Muller (eds.), MIT Press, 2000.
40. Long C., Bi-Decomposition of Function Sets Using Multi-Valued Logic, Eng. Doc. Dissertation, Technischen Universitat Bergakademie Freiberg, 2003.
41. Lu B. L. and Ito M., Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification, IEEE Trans. on Neural Networks, 10(5):1244-1256, 1999.
42. Maimon O. and Rokach L., "Data Mining by Attribute Decomposition with semiconductors manufacturing case study", in Data Mining for Design and Manufacturing: Methods and Applications, Kluwer Academic Publishers, pp. 311-336, 2001.
43. Maimon O. and Rokach L., Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications, World Scientific, 2005.
44. Merz C. J. and Murphy P. M., UCI Repository of machine learning databases, Irvine, CA: University of California, Department of Information and Computer Science, 1998.
45. Michie D., Problem decomposition and the learning of skills, in Proceedings of the European Conference on Machine Learning, pp. 17-31, Springer-Verlag, 1995.
46. Nowlan S. J. and Hinton G. E., Evaluation of adaptive mixtures of competing experts, in Advances in Neural Information Processing Systems, R. P. Lippmann, J. E. Moody, and D. S. Touretzky, Eds., vol. 3, pp. 774-780, Morgan Kaufmann Publishers Inc., 1991.
47. Ohno-Machado L. and Musen M. A., Modular neural networks for medical prognosis: Quantifying the benefits of combining neural networks for survival prediction, Connection Science, 9(1):71-86, 1997.
48. Peng F., Jacobs R. A., and Tanner M. A., Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Application to Speech Recognition, Journal of the American Statistical Association, 1995.
49. Pratt L. Y., Mostow J., and Kamm C. A., Direct Transfer of Learned Information Among Neural Networks, in Proceedings of the Ninth National Conference on Artificial Intelligence, Anaheim, CA, pp. 584-5, 1991.
50. Prodromidis A. L., Stolfo S. J. and Chan P. K., Effective and efficient pruning of meta classifiers in a distributed data mining system, Technical report CUCS-017-99, Columbia Univ., 1999.
51. Provost F. J. and Kolluri V., A Survey of Methods for Scaling Up Inductive Learning Algorithms, Proc. 3rd International Conference on Knowledge Discovery and Data Mining, 1997.
52. Quinlan J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, 1993.
53. Rahman A. F. R. and Fairhurst M. C., A new hybrid approach in combining multiple experts to recognize handwritten numerals, Pattern Recognition Letters, 18:781-790, 1997.
54. Ramamurti V. and Ghosh J., Structurally Adaptive Modular Networks for Non-Stationary Environments, IEEE Transactions on Neural Networks, 10(1):152-160, 1999.
55. Rényi A., Probability Theory, North-Holland, Amsterdam, 1970.
56. Ridgeway G., Madigan D., Richardson T. and O'Kane J., Interpretable Boosted Naive Bayes Classification, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 101-104, 1998.
57. Rokach L. and Maimon O., "Decomposition Methodology for Classification Tasks", Proceedings of the IEEE International Conference on Granular Computing, Beijing, July 2005, IEEE Computer Society Press, ISBN 0-7803-9017-2, pp. 636-1.
58. Rokach L. and Maimon O., "Feature set decomposition for decision trees", Intelligent Data Analysis, 6(2):1-28, 2005.
59. Rokach L., Maimon O., Arad O., "Improving Supervised Learning by Sample Decomposition", International Journal of Computational Intelligence and Applications, 5(1):37-54, 2005.
60. Rokach L., Maimon O. and Lavi I., "Space Decomposition in Data Mining: A Clustering Approach", Proceedings of the 14th International Symposium On Methodologies For Intelligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag, pp. 24-31, 2003.
61. Ronco E., Gollee H., and Gawthrop P. J., Modular neural network and self-decomposition, CSC Research Report CSC-96012, Centre for Systems and Control, University of Glasgow, 1996.
62. Saaty T., The analytic hierarchy process: A 1993 overview, Central European Journal for Operations Research and Economics, 2(2):119-137, 1993.
63. Samuel A., Some studies in machine learning using the game of checkers II: Recent progress, IBM J. Res. Develop., 11:601-617, 1967.
64. Schaffer C., Selecting a classification method by cross-validation, Machine Learning, 13(1):135-143, 1993.
65. Sharkey A., On combining artificial neural nets, Connection Science, 8:299-313, 1996.
66. Sharkey A., Multi-Net Systems, in Sharkey A. (Ed.), Combining Artificial Neural Networks: Ensemble and Modular Multi-Net Systems, pp. 1-30, Springer-Verlag, 1999.
67. Tsallis C., Possible Generalization of Boltzmann-Gibbs Statistics, J. Stat. Phys., 52:479-487, 1988.
68. Tumer K. and Ghosh J., Linear and Order Statistics Combiners for Pattern Classification, in Combining Artificial Neural Nets, A. Sharkey (Ed.), pp. 127-162, Springer-Verlag, 1999.
69. Vilalta R., Giraud-Carrier C., Brazdil P., "Meta-Learning", in O. Maimon and L. Rokach (Eds.), Handbook of Data Mining and Knowledge Discovery in Databases, pp. 731-748, Springer, 2005.
70. Weigend A. S., Mangeas M., and Srivastava A. N., Nonlinear gated experts for time series: discovering regimes and avoiding overfitting, International Journal of Neural Systems, 6(5):373-399, 1995.
71. Zaki M. J. and Ho C. T., Eds., Large-Scale Parallel Data Mining, New York: Springer-Verlag, 2000.
72. Zaki M. J., Ho C. T., and Agrawal R., Scalable parallel classification for data mining on shared-memory multiprocessors, in Proc. IEEE Int. Conf. Data Eng., Sydney, Australia, WKDD99, pp. 198-205, 1999.
73. Zupan B., Bohanec M., Demsar J., and Bratko I., Feature transformation by function decomposition, IEEE Intelligent Systems & Their Applications, 13:38-43, 1998.