
Beijing LEEO Consulting Service Company
Introduction
Established in early 2005, Beijing LEEO is a certified business registration agency and registered consulting service Co., Ltd in China. Familiar with the relevant Chinese laws, regulations, and procedures, we can deliver effective and speedy service. We are dedicated to providing comprehensive, professional services and constructive suggestions for foreigners who wish to do business, work, invest, or reside in China.
Our one-stop services include Chinese company registration, Chinese visa services, China driver license application, Tibet Travel Permit (TTP) application, investment consulting, home-stay services, and personal assistant services.
Beijing LEEO Consulting Service Co., Ltd. is well regarded by its numerous clients for its professional, speedy service and for the useful information and suggestions it provides.
Address: Room 1503, Aviation Mansion, No. 2 Jia, Xidawang Road, Chaoyang District, 100026 Beijing, P.R.China (500 meters north of Dawanglu Subway Station)
A China visa is a permit issued by Chinese visa authorities (Chinese embassies, Chinese consulates, Chinese visa offices, or the Public Security Bureau) to non-Chinese citizens for entry into and transit through mainland China (Hong Kong SAR and Macau SAR have separate visa systems). Citizens of most countries must obtain a Chinese visa before entering mainland China. There are eight categories of ordinary China visas, which are respectively marked with the letters C, D, F, G, J-1, J-2, L, X, and Z.
Before you enter mainland China, you can apply for a China visa at the Chinese embassy in your home country or in the country where you are staying. If you are already in mainland China and want to stay longer, you do not need to leave mainland China to apply for a new visa. Instead, you can have your visa extended or converted through a visa agent, and Beijing LEEO can help you do that.
The Chinese visas issued by Chinese embassies (consulates or visa offices) limit the duration of each stay (e.g. 30, 60, 90, 120, or 180 days). Make sure you enter before the date stated on the visa, and do not overstay: the duration of each stay is counted from the entry date (the red oval stamp placed in your passport when you clear customs on entering China).
The Chinese visas that Beijing LEEO helps you obtain are issued by the Public Security Bureau, and they carry no limitation on the duration of each stay. Thus, if it is a 2-entry China visa, you can exit and re-enter twice during the visa's validity period. You are not forced to exit and re-enter; if you wish, you can use neither of the entries. Although Hong Kong and Macau are part of China, entering mainland China from Hong Kong or Macau also uses up one entry.
A Hadoop Performance Prediction Model Based on Random Forest

Zhendong Bei, Zhibin Yu, Huiling Zhang, Chengzhong Xu, Shengzhong Feng, Zhenjiang Dong, and Hengsheng Zhang

DOI: 10.3969/j.issn.1673-5188.2013.02.…, published online July 1, 2013

(1. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; 2. Wayne State University, Detroit, Michigan 48202, USA; 3. Cloud Computing and IT Institute of ZTE Corporation, Nanjing 210012, China)

Abstract

MapReduce is a programming model for processing large data sets, and Hadoop is the most popular open-source implementation of MapReduce. To achieve high performance, up to 190 Hadoop configuration parameters must be manually tuned. This is not only time-consuming but also error-prone. In this paper, we propose a new performance model based on random forest, a recently developed machine-learning algorithm. The model, called RFMS, is used to predict the performance of a Hadoop system according to the system's configuration parameters. RFMS is created from 2000 distinct fine-grained performance observations with different Hadoop configurations. We test RFMS against the measured performance of representative workloads from the Hadoop Micro-benchmark suite. The results show that the prediction accuracy of RFMS achieves 95% on average and up to 99%. This new, highly accurate prediction model can be used to automatically optimize the performance of Hadoop systems.

Keywords: MapReduce; Hadoop; micro-benchmark

1 Introduction

The MapReduce programming model is widely used in big-data applications because it is simple to program and can handle large data sets. A popular open-source implementation of MapReduce is Apache Hadoop, which has been used for web indexing [1], machine learning [2], log file analysis [3], financial analysis [4], and bioinformatics research [5]. With Hadoop, a programmer needs to manually tune up to 190 parameters to ensure high system performance. However, without in-depth knowledge of the Hadoop system, the programmer may find such a task tedious and may even seriously degrade system performance. This issue has been confirmed by many researchers [6]-[9]. It is therefore desirable to automatically tune the configuration parameters. To this end, a performance prediction model based on historical observation is required. The key to improving performance is to use a highly accurate model with low run-time overhead.

Many researchers have tried to construct such a model. In [10], a set of cost-based mathematical functions is used to estimate the fine-grained run time of phases within the map and reduce tasks when a job is executed. In [6], a coarse-grained SVM regression model is used to estimate the completion time of jobs that belong to a cluster. This estimation depends on the allocation of resources, parameter settings, and size of input data. However, these models are not accurate enough because of their assumptions about cluster node homogeneity and over-simplifications. For example, the local and remote CPU and I/O costs during each phase of executing a MapReduce job differ according to the Hadoop parameter settings, even for homogeneous nodes in a cluster. If we define operational cost as microseconds per megabyte, variations in parameter settings give rise to variations in operational cost (Fig. 1). Also, a MapReduce job involves complicated multiphase processing. As well as containing map and reduce phases, a MapReduce workflow also contains fine-grained phases such as read, collect, spill, merge, shuffle, sort, and write. Each phase performs fine-grained operations with different costs. For example, a spill phase involves operations such as compression and writing. All the above factors may be intertwined to contest computing resources, such as CPU and I/O. In such a context, if a model is overly simplistic, it is definitely inaccurate.

We propose a model to predict the performance of a Hadoop system. This model is based on fine-grained operations and uses a random forest algorithm. Random forest is a recently developed non-parametric regression model. Unlike traditional regression models, random forest combines tree predictions so that each tree depends on the values of a random vector sampled independently. This random vector has the same distribution for all trees in the forest [11]. We use random forest for two reasons: 1) it does not make linear assumptions between parameters (in our case, the configuration parameters), and 2) it can deal with a large number of configuration parameters, even when there are complex interactions between them.

[Figure 1. Cost of representative operations with 4325 random parameter settings.]

In our proposal, a lightweight performance profiler captures the time taken to execute tasks and the size of the data processed in each phase. We construct a fine-grained model for predicting the performance of a Hadoop system. This model is used to accurately estimate phase-level performance under various workloads and without assuming cluster nodes are homogeneous. We evaluate RFMS under different workloads generated by the Hadoop Micro-benchmark suite. The results show that RFMS provides prediction accuracy of 95% on average and up to 99%.

In section 2, we describe related work. In section 3, we introduce our approach based on random forest. In section 4, we describe our experimental methodology. In section 5, we provide experimental results and describe some applications of our approach. Section 6 concludes the paper.

2 Related Work

2.1 Analyzing MapReduce Performance

Analyzing the performance of a MapReduce distributed system is challenging because such a system often comprises several thousand programs running on thousands of machines. Low-level performance details are hidden from users by a high-level dataflow model. Several approaches have been taken to understand user-defined workload behavior in a MapReduce system [12]-[14]. With these approaches, information collected from previous job execution logs is used to identify performance bottlenecks and for hotspot detection. For example, Hadoop job history files are often used to analyze performance bottlenecks in a Hadoop system. Log files are often not exposed to developers, and this means that system performance is often not optimal. In [9], logs are leveraged through automatic log analysis to improve performance. Such an approach can be used to show the dataflow breakdown of the map/reduce phases for various jobs, but human involvement is still needed.

Performance analysis based on dataflow appears to be suitable for Hadoop; however, the focus is on monitoring the dataflow process, and the user needs to identify possible bottlenecks in a Hadoop cluster. Furthermore, a dataflow approach cannot be used to evaluate performance when configuration parameters are randomly set. Our model is based on workflow analysis and uses a random forest to predict the running time of each phase. This is a precise model that can be used to guide the setting of configuration parameters.

2.2 Automatic Configuration

Various approaches have been taken to automatically configure distributed data processing systems [15]-[18]. The pay-per-use utility model of cloud computing creates new opportunities to deploy the MapReduce framework.
However, choosing which configuration parameter settings will result in high performance is complex [19]. Automatic approaches to setting configuration parameters in Hadoop systems have therefore been the focus of attention recently.

The first model for predicting the performance of a Hadoop system with automatically set parameters is described in [10]. The model describes the fine-grained dataflow and cost for phases within map and reduce tasks [10]. However, it assumes that the costs for CPU and I/O in each phase of executing a MapReduce job are the same for all nodes in a cluster. In practice, this assumption is not true (Fig. 1).

AROMA [6] uses a performance model based on a support vector machine (SVM) to integrate aspects of resource provisioning and auto-configuration for Hadoop jobs. Based on allocated resources, configuration parameters, and the size of input data, AROMA can estimate the completion time of jobs belonging to a cluster. However, unlike fine-grained cost estimation models, it cannot quantitatively analyze the data processed in each phase of a distributed system dataflow.

3 Performance Model Based on Random Forest

In this section, we describe how to capture the execution features of a job. We then use these features to construct a Hadoop performance model based on a random forest.

3.1 Job Characterization

RFMS uses dynamic tools to collect run-time monitoring information without modifying MapReduce workloads on Hadoop. One such tool is BTrace, a safe dynamic tracing tool that runs on the Java platform and captures the execution features of a MapReduce job [20].

The execution of a MapReduce job can be broken into the map and reduce stages. The map stage can be further divided into reading, map processing, buffer data collecting, spilling, and merging phases.
Similarly, the reduce stage can be divided into shuffling, sorting, reduce processing, and writing phases. Each phase is part of the overall execution of the job in Hadoop.

When Hadoop runs a MapReduce job, BTrace traces specified Java classes to generate a task feature file. A feature file is a detailed representation of the task execution that captures information at the phase level. The feature file generally logs execution time, input data size, and output data size. However, the shuffle phase of the reduce stage requires special attention because it contains multiple operations, such as network transferring and merging. To simplify the operations in the phase and better analyze the result, BTrace only records the timing of network transferring, and the timing of merging is added to the next phase, called sort. This phase only has merging operations. When a job is finished, RFMS collects the feature files of all tasks and produces a statistical result of the three characteristics of each phase of a job.

3.2 Building a Performance Model

As described in [10], the performance model for a MapReduce job can be given as

    F_J = F(p, d, r, c)    (1)

where F is the performance estimation model for a MapReduce job that runs program p on input data d and uses cluster resources r and configuration parameter settings c. Because a MapReduce job comprises nine phases, each of which is denoted Phase_i, the performance model for a whole MapReduce job can be given by

    F_J = sum over i = 1..9 of F_Phase_i(p, d_i, r, c_i)    (2)

where F_Phase_i is the performance model for each phase, and d_i is the size of the data processed in Phase_i. The size of the initial input data in the map and reduce stages determines the size of d_i. The parameter settings related to Phase_i are given by c_i.

The performance model for each phase can be estimated by

    F_Phase_i = F_PerTaskPhase_i x numTotalWaves    (3)

where

    numTotalWaves = ceil(numTasks / totalTaskSlots)    (4)

and

    totalTaskSlots = numNodes x numTaskSlotPerNode    (5)

In (3), F_PerTaskPhase_i is the performance of Phase_i when executing a single map or reduce task, and numTotalWaves is the total number of task execution waves. In (4), totalTaskSlots is the sum of the task slots numTaskSlotPerNode allocated for map and reduce tasks in each node of a cluster. The total number of map or reduce tasks is numTasks. In the reduce stage, numTasks is a configurable parameter. In the map stage, the configurability of numTasks depends on the size of the input data d. The number of map tasks is given by

    mapNumTasks = totalDataSize / splitDataSize    (6)

where totalDataSize is the size of input data d, and splitDataSize is the size of the input split of each map task. (The default is 64 MB if compressed.) In (5), numNodes is the total number of nodes in a cluster.

Function (3) allows us to estimate F for each task phase; thus, it is necessary to learn n phase performance models for a workload. These models are used to analyze the workflow so that we can use (2) to evaluate overall performance. More importantly, by building a performance model for F_PerTaskPhase_i, we can estimate F_J when the size of the input data is small. In turn, we can predict the workload performance when the size of the input data is larger by calculating numTotalWaves in the map and reduce stages.

In [21], F_PerTaskPhase_i is calculated using a set of functions based on a constant-cost assumption (a cost-based model). It is assumed that the local and remote CPU and I/O costs per phase in executing a MapReduce job are equal across all the nodes in a cluster with the same hardware resources. However, varying the configuration parameters may vary these costs and prove this assumption false.

RFMS addresses this problem by using a machine-learning technique to construct F_PerTaskPhase_i models for different phases. RFMS uses the random forest (RF) regression model to estimate F_PerTaskPhase_i of a workload with varying configuration parameters. RF methodology is precise when there are regression problems and performs consistently well [11]. Applications that use RFs demonstrate that RF is one of the best available methodologies for modeling the performance of complex Hadoop workloads in the cloud environment [22].
RFMS can produce importance measures for each variable [23]. These measures indicate which variables have the strongest effect on the dependent variables being investigated. RFMS can capture the time taken to execute the phases of Hadoop workloads with varying configuration parameters. The ten configuration parameters that are important to overall performance in a Hadoop system are listed in Table 1. These parameters are used as feature candidates to train RFMS. For given input data, the size of the split data for each map task does not change in the map stage. Thus, the size of the split data is not used to train RFMS. However, the input data (shuffle bytes) for each reduce task changes according to the mapred.reduce.tasks parameter. Furthermore, the size of the shuffle bytes significantly affects the execution time of a reduce task. Therefore, the shuffle bytes for each reduce task, ShuffleBytesEachTask, should be used as a feature candidate for the reduce stage. ShuffleBytesEachTask is given by

    ShuffleBytesEachTask = (totalDataSize x Selectivity_MapStage) / numReduceTasks    (7)

where

    Selectivity_MapStage = Selectivity_Map x Selectivity_Combine x Selectivity_Compress    (8)

and

    Selectivity_Operation = (1/n) x sum over i = 1..n of OperationSelectivity(i)_Task    (9)

Selectivity can be defined as the statistical ratio of output size to input size for a stage or operation in a workload. In (7), Selectivity_MapStage is the map-stage selectivity. Because the map, combine, and compress operations reduce the input data size, Selectivity_MapStage can be calculated using (8). In (8), Selectivity_Map, Selectivity_Combine, and Selectivity_Compress are operation selectivities. These three selectivities are determined by the related operations, and they can be calculated using (9). In (9), OperationSelectivity(i)_Task is the ratio of output size to input size of an operation in a task, and n is the number of map tasks in a given workload. OperationSelectivity(i)_Task can be calculated using the data captured by our profiler in a feature file. Combine and compress operations are optional, and the values of Selectivity_Combine and Selectivity_Compress are set at 1 if the user does not specify these operations.

[Table 1. Configuration parameters selected for testing. The table is garbled in this copy; legible entries include io.sort.factor, io.sort.record.percent, io.sort.mb, mapred.compress.map.output, mapred.job.shuffle.input.buffer.percent, and mapred.child.java.opts, with test ranges such as 10-100, 0.2-0.9, 0-0.8, 0.01-0.5, 0.25-0.65, and true or false.]

An RF can be defined as a collection of tree-structured predictors [11]:

    RF = { t_k(X, Theta_k), k = 1, ..., N_tree }    (10)

where t_k is the k-th individual tree, t_k(.) is the tree's prediction, and N_tree is the number of trees. The samples of the total training set are given by X = {(fb_k, pt_k)}, where k = 1, ..., N_sample. The predictor features are given by fb_k, and the phase time is given by pt_k. The total training set is divided into two independent subsets: one to train the predictor t_k and the other to test the predictor's accuracy. In (10), Theta_k are random variables. The nature and dimensionality of Theta_k depend on randomness in the construction of the N_tree trees. This randomness may be caused by the random selection of N_sample training records drawn from X with replacement. It may also be caused by M_try, the random number of different features tried at each split of a tree.

In RF regression, it is difficult to define the predictor features fb that allow the base predictors to be trained to accurately predict the target on the out-of-bag data. When selecting important features, importance is assessed by replacing each feature with random noise and observing the increase in the mean squared error (MSE) for the out-of-bag validation. The features are then sorted by relative importance, and an important subset of features can be abstracted. The RF is iteratively retrained, and each time, the least important features are removed. In practice, N_tree and M_try are used to tune the RF model and minimize the MSE. A lower MSE makes the model fit the training data better. N_tree and M_try are tuned using scripts that iteratively change each parameter one by one and regenerate the regression model. The optimal value of M_try ranges from 6 to 10, and the optimal value of N_tree ranges from 500 to 1000.

We use a stepwise regression model for the data sets collected from our test cluster comprising Hadoop nodes.
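The kind of per-phase regressor described above can be sketched with an off-the-shelf random-forest implementation (here scikit-learn, which the paper does not name; the training data below are synthetic stand-ins for the profiled configuration/phase-time samples). The sketch mirrors the tuning just described: N_tree in the 500-1000 range, M_try in the 6-10 range, and out-of-bag error plus feature importances used for validation and feature selection.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training set: 2000 configuration samples (the paper's observation
# count) over 10 hypothetical configuration parameters, scaled to [0, 1].
X = rng.uniform(0.0, 1.0, size=(2000, 10))
# Hypothetical phase time: nonlinear in the parameters plus measurement noise,
# standing in for the profiled per-task phase timings.
y = 50 * X[:, 0] * X[:, 1] + 20 * np.sqrt(X[:, 2]) + rng.normal(0.0, 1.0, 2000)

# N_tree = 500 and M_try = 8, inside the tuned ranges reported above.
model = RandomForestRegressor(n_estimators=500, max_features=8,
                              oob_score=True, random_state=0)
model.fit(X, y)

print(f"out-of-bag R^2: {model.oob_score_:.3f}")
# Importances indicate which parameters most affect the phase time; the
# least important ones would be dropped before retraining.
ranked = np.argsort(model.feature_importances_)[::-1]
print("most important parameters:", ranked[:3])
```

Because out-of-bag samples are never seen by the trees that score them, `oob_score_` gives a validation estimate without a separate hold-out split, which matches the iterative retrain-and-prune loop described in the text.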
In section 5, we discuss the accuracy of the prediction against the training data and provide out-of-bag accuracy statistics.

4 Experiment Methodology

The experiments are performed on a test cluster comprising ten Sugon servers and a gigabit network. Each server has a quad-core Intel(R) Xeon(R) CPU E5-2407 at 2.20 GHz and 32 GB PC3 memory. The cluster is virtualized by Xen 3.0. We create a pool of virtual machines (VMs) from the virtualized service cluster. Each VM has eight virtual CPUs and 8 GB memory. We then run the VMs as Hadoop nodes. Each VM uses SUSE Linux Enterprise Server 11 and Hadoop 1.0.4.

We designate one server to host the master VM node and use the remaining servers to host the nine slave VM nodes. The master node runs the JobTracker and the NameNode. Each slave node runs both the TaskTracker and the DataNode. Each VM is initially configured with four map slots, four reduce slots, and 300 MB memory per task. The data block size is set at 64 MB. The RFMS profiler component needs to run on each slave VM node so that BTrace can capture the task execution times. The other components of RFMS can run on a separate VM or standalone machine because RFMS processes the gathered features offline.

We run representative Hadoop workloads, such as TeraSort, to test the precision of RFMS with the 10 different configuration parameters (Table 1). We use the Hadoop benchmark to produce data of various sizes.

5 Results and Analysis

5.1 The Constant-Cost Assumption

We test the constant-cost assumption by comparing six cost features of TeraSort. Six different configurations were used (Table 2). First, we randomly varied the configuration parameter values up to 4325 times when running TeraSort. The parameters generated within the test ranges are listed in Table 1. To obtain credible results, we use the profiler in [10] to capture every cost feature. We use the plots of the TeraSort benchmark to show the results. Then, we select six cost features from a total of 12 cost features. These six cost features comprise CPU features (REDUCE_CPU, PARTITION_CPU, and MERGE_CPU) and I/O features (READ_LOCAL_IO, WRITE_LOCAL_IO, and NETWORK). From the 4325 configurations, we select six that have typical cost distributions.

[Table 2. The six types of configuration parameter settings. The table is garbled in this copy; legible entries include mapred.reduce.tasks, io.sort.spill.percent, mapred.job.shuffle.input.buffer.percent, io.sort.record.percent, io.sort.mb, and mapred.compress.map.output.]

Fig. 2 shows six different values for each cost feature when configuration settings are varied. (Note the log scale on the y-axis.) For example, REDUCE_CPU ranges from 6610 to 9318 ms/MB, and MERGE_CPU ranges from 0.81 to 427 ms/MB. These values change significantly, and this indicates that the cost features are affected by variations in the configuration settings.

[Figure 2. Costs for six types of parameter settings using the TeraSort benchmark.]

Although we do not show the cost features of other workloads from the Hadoop benchmark, from our observations, these features have similar properties to each other. Therefore, we believe the constant-cost assumption is false for the 12 operations in a MapReduce workflow.

Furthermore, we determine whether the inconstant cost value affects the accuracy of performance prediction. Fig. 3 shows the NetworkTransferTime distribution against the NETWORK cost when transferring 2040 MB of data. This distribution is derived from actual measurement. NetworkTransferTime is the time taken to transfer data across networks in the shuffle phase and is given by

    NetworkTransferTime = dShuffleSize x cNetworkCost    (11)

where dShuffleSize is the total shuffle size [21] and cNetworkCost is the cost of network transfer. In Fig. 3, NetworkTransferTime is significantly affected by the inconstant network cost. We set dShuffleSize at a constant 2040 MB; this total shuffle size is generated by a map phase with 10 GB input data.

Using (11), we can predict NetworkTransferTime when the configuration parameters are varied. We set cNetworkCost at a constant 8 ms/MB, observed with the default configuration parameters. In Fig. 4, predicted NetworkTransferTime is plotted against the real measurement of NetworkTransferTime. The predictions shown in Fig. 4 are poor, which suggests that the constant-cost assumption is false.

[Figure 3. Network transfer time for 2040 MB of data with inconstant network cost.]

[Figure 4. Predicted network transfer time vs. measured network transfer time.]

5.2 Accuracy of RFMS

Fig. 5 shows phase timing predicted using RFMS against the measured phase timing for a TeraSort workload with 10 GB input data. The phase timing is measured in 100 groups of experiments. In Fig. 5, the phase timing predicted using RFMS is similar to the measured phase timing for a MapReduce workflow. Our prediction model is reasonably accurate. Although the predictions for read timing are not particularly accurate, the affected performance range is only between 2000 and 2500, which is low.

[Figure 5. Predicted phase timing vs. measured phase timing (caption partially garbled in this copy).]

To quantitatively evaluate the accuracy of our RFMS model, we use the relative error E, given by

    E = ( sum over i = 1..N of |Pre_i - Real_i| / Real_i ) / N x 100%    (12)

where Pre_i is the i-th value predicted by RFMS, Real_i is the i-th value actually measured, and N is the total number of tests. Precisely, we use the average relative error of 100 predictions to represent the prediction error rate between each real and corresponding predicted value, selected from different ranges of the phase-timing distribution. The error rates for the nine phases that we experimented with were calculated using E and are shown in Table 3. The error rate varies from 0.566% to 7.169%. All are less than 8%, and the average is about 5%. This indicates that our RFMS predictions are close to the measured phase timing.

Table 3. Relative error of RFMS

    Phase name    Relative error (%)
    Read          6.053
    Map           4.865
    Collect       0.566
    Spill         1.893
    Merge         6.583
    Shuffle       7.169
    Sort          4.593
    Reduce        4.353
    Write         3.458
    Avg.          4.393

6 Conclusion

In this paper, we have confirmed that varying the configuration parameter values significantly affects the precision of a cost-based model. Therefore, we have designed RFMS, a phase-based workflow performance analysis model for tuning Hadoop parameters. We focused on non-transparent performance tuning for various Hadoop workflow processes. RFMS prediction is significantly better than that provided by a cost-based model. The average accuracy reaches 95%, which is reasonably good. Besides being accurate, our approach has several other advantages in practice. It has an improved, lightweight, in-depth workload profiler that collects key task execution information from an unmodified MapReduce/Hadoop workload. It also has a novel dataflow analysis mechanism for Hadoop. However, there are several aspects of RFMS that need to be improved. We would like to further investigate the potential for self-configuration. We also need to reduce the RFMS overhead for online Hadoop self-tuning.

Acknowledgment

Manuscript received May 20… (date garbled in this copy). This work is partially supported by the cooperation project Research on Green Cloud IDC Resource Scheduling with ZTE Corporation.

References

[1] E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello, "Sindice.com: a document-oriented lookup index for open linked data," Int. J. Metadata, Semantics and Ontologies, vol. 3, no. 1, pp. 37-52, Nov. 2008.
[2] Apache Mahout machine-learning library [Online]. Available: http://lucene.apache.org/mahout
[3] K. S. Beyer, V. Ercegovac, R. Gemulla, A. Balmin, M. Y. Eltabakh, C.-C. Kanne, F. Özcan, and E. J. Shekita, "Jaql: a scripting language for large scale semistructured data analysis," PVLDB, vol. 4, no. 12, 2011.
[4] Mao-Ping Wen, Hsio-Yi Lin, An-Pin Chen, and Chyan Yang, "An integrated home financial investment learning environment applying cloud computing in social network analysis," in Proc. ASONAM, Kaohsiung, 2011, pp. 751-754.
[5] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg, "Searching for SNPs with cloud computing," Genome Biology, R134, Oct. 2009.
[6] P. Lama and Xiaobo Zhou, "AROMA: automated resource allocation and configuration of MapReduce environment in the cloud," in Proc. Int. Conf. Autonomic Computing (ICAC), San Jose, CA, 2012, pp. 63-72.
[7] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu, "Starfish: a self-tuning system for big data analytics," in Proc. Conf. Innovative Data Systems Research (CIDR), Asilomar, CA, Jan. 9-12, 2011, pp. 261-272.
[8] K. Kambatla, A. Pathak, and H. Pucha, "Towards optimizing hadoop provisioning in the cloud," in Proc. HotCloud '09, San Diego, CA.
[9] Jinquan Dai, Jie Huang, Shengsheng Huang, Bo Huang, and Yan Liu, "HiTune: dataflow-based performance analysis for big data cloud," in Proc. USENIX ATC '11, Shanghai, 2011, pp. 7-7.
[10] H. Herodotou and S. Babu, "Profiling, what-if analysis, and cost-based optimization of MapReduce programs," in Proc. VLDB, Seattle, WA, Aug. 29-Sep. 3, 2011.
[11] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[12] V. Bhat, S. Gogate, and M. Bhandarkar, "Hadoop Vaidya," Hadoop World, 2009.
[13] Jiaqi Tan and Xinghao Pan, "Kahuna: problem diagnosis for MapReduce-based cloud computing environments," in Proc. IEEE/IFIP NOMS 2010, Osaka, pp. 112-119.
[14] Intel VTune Performance Analyzer [Online]. Available: http://software.intel.com/en-us/intel-vtune
[15] Guofei Jiang and Haifeng Chen, "Autotuning configurations in distributed systems for performance improvements using evolutionary strategies," in Proc. ICDCS '08, Beijing, Jun. 2008, pp. 769-776.
[16] Jia Rao, Xiangping Bu, Cheng-Zhong Xu, and Kun Wang, "A distributed self-learning approach for elastic provisioning of virtualized cloud resources," in Proc. 2011 IEEE Int. Symp. MASCOTS, Singapore, pp. 45-54.
[17] Cheng-Zhong Xu, Jia Rao, and Xiangping Bu, "URL: a unified reinforcement learning approach for autonomic cloud management," Journal of Parallel and Distributed Computing, vol. 72, no. 2, pp. 95-105, Feb. 2012.
[18] Yanfei Guo, P. Lama, and Xiaobo Zhou, "Automated and agile server parameter tuning with learning and control," in Proc. 2012 IEEE IPDPS, Shanghai, 2012, pp. 656-667.
[19] S. Babu, "Towards automatic optimization of MapReduce programs," in Proc. 1st ACM Symp. Cloud Computing, Indianapolis, IN, 2010, pp. 137-142.
[20] BTrace: a dynamic instrumentation tool for Java [Online]. Available: http://kenai.com/projects/btrace
[21] H. Herodotou, "Hadoop Performance Models," Computer Science Department, Duke Univ., Tech. Rep., May 2011.
[22] T. Luo, K. Kramer, D. B. Goldgof, L. O. Hall, S. Samson, A. Remsen, et al., "Recognizing plankton images from the shadow image particle profiling evaluation recorder," IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 34, no. 4, Aug. 2004.
[23] H. Ishwaran, E. H. Blackstone, C. E. Pothier, and M. S. Lauer, "Relative risk forests for exercise heart rate recovery as a predictor of mortality," Journal of the American Statistical Association, vol. 99, no. 467, pp. 591-600, Sep. 2004.

Zhendong Bei is a PhD student in computer applications at Shenzhen Institutes of Advanced Technology, China. He received his BS degree from the National University of Defense Technology, China, in 2006, and his MS degree from Central South University, China, in 2009. His research interests include cloud computing, data mining, machine learning, and image processing.

Zhibin Yu (zb.yu@siat.…) received his PhD degree in computer science from Huazhong University of Science and Technology in 2008. He spent one year as a visiting scholar in the Laboratory for Computer Architecture, University of Texas at Austin. He is currently an associate professor at the Shenzhen Institutes of Advanced Technology. His research interests include microarchitecture, computer architecture, workload characterization and generation, performance evaluation, multicore architecture, and virtualization technologies. In 2005, he won first prize in the HUST Young Lecturers Teaching Contest, and in 2003, he won second prize in the teaching quality assessment of HUST. He is a member of IEEE and ACM.

Huiling Zhang received her MS degree in signal and information processing from Southwest University, China, in 2011. She joined the Center for High-Performance Computing at Shenzhen Institutes of Advanced Technology and now works as a research assistant there. Her current research interests include high-performance computing, and machine learning and its applications in bioinformatics.

Chengzhong Xu (czxu@wayne.edu) received his BS and MSc degrees in computer science and engineering from Nanjing University in 1986 and 1989. He received his PhD degree from the University of Hong Kong in 1993. His research interests include computer architecture, distributed systems, virtualization, and cloud computing. Dr. Xu is a professor of electrical and computer engineering at Wayne State University. He is also the director of the Cloud and Internet Computing Laboratory at Wayne State University. He is an IEEE senior member and ACM member.

Shengzhong Feng is a professor and deputy director of the Institute of High-Performance Computing and Digital Engineering, Shenzhen Institutes of Advanced Technology. His research interests are parallel algorithms, grid computing, and bioinformatics. In particular, he is focused on developing novel, effective methods of modeling digital cities and applications. Before coming to SIAT, he worked in the Institute of Computing Technology, Chinese Academy of Sciences, and participated in research on the Dawning supercomputer. He graduated from the University of Science and Technology of China in 1991 and received his PhD from Beijing Institute of Technology in 1997.

Zhenjiang Dong (dong.zhenjiang@…) is the vice president of the Communication Services R&D Institute for Cloud Computing and IT Operation, ZTE Corporation. He received his MS degree from Harbin Institute of Technology in 1996. His research interests include cloud computing, networking, and mobile networking.

Hengsheng Zhang (zhang.hengsheng@zte.…) received his bachelor's degree from Anhui University, China. He joined ZTE in 2005 and is a pre-research engineer and senior architect. His research interests include value-added services and cloud computing.