Setting up a Hadoop pseudo-distributed environment on Linux

The environment used in this article is VMware 10 + CentOS, with Hadoop 2.5.0.
When learning big data, the first thing to do is set up an environment. Based on my own experiments, this post briefly writes up my experience of building a pseudo-distributed big-data environment.
If anything looks wrong, questions and comments are welcome.
Part 0: Downloading the software
1. Download URL
  https://archive.apache.org/dist/hadoop/common/
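For example, the 2.5.0 release used later in this post can be fetched straight from that archive. The exact file name below follows the usual layout of that mirror; double-check it on the page first:
  wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/hadoop-2.5.0.tar.gz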
Part 1: Preparation for the pseudo-distributed setup
1. Plan the directory layout
2. Change the owner and group of the directories
3. Remove the pre-installed JDK
4. Upload the JDK package that is needed
5. Add execute permission to the JDK package
6. Set JAVA_HOME and PATH in /etc/profile
7. Make the profile take effect
  The root user is not needed for this.
8. Verify that the JDK works (a combined shell sketch of these steps follows)
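A minimal shell sketch of steps 1-8. The directory layout (/opt/softwares, /opt/modules) and the JDK archive name (jdk-7u67-linux-x64.tar.gz) are assumptions made only for illustration; beifeng is the working user that appears later in this post.
  # 1-2. plan the directories and hand them to the working user
  sudo mkdir -p /opt/softwares /opt/modules
  sudo chown -R beifeng:beifeng /opt/softwares /opt/modules
  # 3. remove the JDK that ships with CentOS (list the packages first; the name will differ)
  rpm -qa | grep -i java
  sudo rpm -e --nodeps <openjdk-package-from-the-list>
  # 4-5. upload the JDK archive, add execute/read permission, unpack it
  chmod u+x /opt/softwares/jdk-7u67-linux-x64.tar.gz
  tar -zxf /opt/softwares/jdk-7u67-linux-x64.tar.gz -C /opt/modules/
  # 6. append to /etc/profile (as root):
  #   export JAVA_HOME=/opt/modules/jdk1.7.0_67
  #   export PATH=$PATH:$JAVA_HOME/bin
  # 7. reload the profile, 8. verify
  source /etc/profile
  java -version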
Part 2: Building the pseudo-distributed setup (mainly the NameNode and DataNode)
1. Unpack Hadoop
2. Enter the Hadoop home directory
3. Find the JAVA_HOME path
4. The *-env.sh files
5. Set JAVA_HOME in hadoop-env.sh
6. Set JAVA_HOME in mapred-env.sh
  The official docs do not mention this file, but it needs the same change.
7. Set JAVA_HOME in yarn-env.sh
  The official docs do not mention this file either, but it needs the same change (see the sketch below).
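The edit is the same in all three files under etc/hadoop/: set JAVA_HOME explicitly to the absolute JDK path near the top of each file. The path below is the one assumed in the JDK sketch above:
  # etc/hadoop/hadoop-env.sh, etc/hadoop/mapred-env.sh, etc/hadoop/yarn-env.sh
  export JAVA_HOME=/opt/modules/jdk1.7.0_67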
8. The *-site.xml files
9. Configure core-site.xml
  8020 is the port clients use to talk to the NameNode (fs.defaultFS); once the NameNode is running, the HDFS file system can also be browsed through the web UI described later.
  Create a new temporary directory:
    Note: sudo chown -R beifeng:beifeng data
  Configuration (a sketch follows):
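A minimal core-site.xml sketch. The host name hadoop.example.com and the data/tmp path are placeholders; 8020 and the hadoop-2.5.0 directory are the values used elsewhere in this post:
  <!-- etc/hadoop/core-site.xml -->
  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop.example.com:8020</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
    </property>
  </configuration>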
10. Edit the slaves file
11. Edit hdfs-site.xml (see the sketch below)
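On a single node the slaves file simply lists the machine's own hostname, and the usual hdfs-site.xml change is to drop the replication factor to 1. A minimal sketch:
  <!-- etc/hadoop/hdfs-site.xml -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>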
12. Check HDFS
13. Format HDFS
  This formats the HDFS file system (see the command below).
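Formatting is done once, from the Hadoop home directory, before the daemons are started for the first time:
  bin/hdfs namenode -format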
14. Start the NameNode and DataNode processes
  Note:
    sudo chmod -R a+w hadoop-2.5.0/  -- grant write permission, because a logs folder will be created.
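The two daemons are started individually with the hadoop-daemon.sh script; jps is a quick way to confirm they are up:
  sbin/hadoop-daemon.sh start namenode
  sbin/hadoop-daemon.sh start datanode
  jps   # should now list NameNode and DataNode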
15. Open the web UI, which makes managing HDFS easier
  http://linux-:50070/
16. Create a directory on HDFS
17. Upload a file to HDFS
18. Read a file on HDFS
19. Download a file from HDFS to the local machine
20. Delete a file on HDFS
  bin/hdfs dfs -rm -f core-site.xml
  If you forget a command, run bin/hdfs dfs on its own and the usage help is printed (a sketch of steps 16-19 follows).
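A sketch of steps 16-19; the /user/beifeng/tmp directory is just an example target, and core-site.xml is used as the sample file:
  bin/hdfs dfs -mkdir -p /user/beifeng/tmp                               # 16. create a directory
  bin/hdfs dfs -put etc/hadoop/core-site.xml /user/beifeng/tmp           # 17. upload a file
  bin/hdfs dfs -text /user/beifeng/tmp/core-site.xml                     # 18. read it back
  bin/hdfs dfs -get /user/beifeng/tmp/core-site.xml /tmp/core-site.xml   # 19. download to the local machine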
Part 3: Continuing the pseudo-distributed setup (the YARN part)
1. Configure yarn-site.xml
2. Configure mapred-site.xml
  This tells MapReduce to run on YARN (see the sketches below).
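Minimal sketches of the two files; the ResourceManager host name is a placeholder, while mapreduce_shuffle and mapreduce.framework.name=yarn are the standard entries for running MapReduce on YARN:
  <!-- etc/hadoop/yarn-site.xml -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop.example.com</value>
  </property>

  <!-- etc/hadoop/mapred-site.xml -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>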
3. Start the ResourceManager and NodeManager
  sbin/yarn-daemon.sh start resourcemanager
  sbin/yarn-daemon.sh start nodemanager
4. Check in the browser
  The port is 8088:
  http://linux-:8088
5. Create the file that will be used for the test
6. Create a directory for it on HDFS
7. Upload the local wc.input file into the directory just created
8. Run the job on YARN
  bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount mapreduce/wordcount/input mapreduce/wordcount/output1
9. Check the result
  bin/hdfs dfs -text mapreduce/wordcount/output1/pa*
  At this point the History link shown in the web UI does not work, because the history server has not been configured yet.
Part 4: Configuring the history server
1. Configure the history server by editing mapred-site.xml
  The history server lets you look back at the records of completed MapReduce jobs.
  By default it is not started.
  Its settings therefore go into mapred-site.xml (see the sketch below).
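A sketch of the two properties usually added; the host name is a placeholder, 10020 is the default RPC port and 19888 is the web port mentioned below:
  <!-- etc/hadoop/mapred-site.xml -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop.example.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop.example.com:19888</value>
  </property>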
2. Start the server
3. It is best to start the history server right after YARN has been started
  sbin/mr-jobhistory-daemon.sh start historyserver
4. Check in the browser
  The web port is 19888.
  Click the History link again; there is no need to rerun the MapReduce job.
Part 5: Log aggregation
1. Why it is needed
  Log aggregation here refers to the logging feature inside YARN.
  It uploads the logs of MapReduce jobs to a directory on HDFS; by default a tmp directory is created under /, which can be seen in the 50070 web UI but cannot be opened by the user (no permission).
  Many MapReduce jobs each produce their own logs; aggregating them on HDFS makes them easy to look up.
  (Screenshot: the logs link on the 19888 UI)
  (Screenshot: the tmp directory on the 50070 UI)
2. Enable log aggregation by editing yarn-site.xml (see the sketch below)
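A sketch of the two properties involved; 604800 seconds (7 days) is only an example retention period:
  <!-- etc/hadoop/yarn-site.xml -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>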
3. Restart the ResourceManager, NodeManager and JobHistory server
4. Run the job on YARN again
5. Now the logs link on the YARN management page shows the log files
6. The result of clicking logs
But the problem is still not completely solved; the following issue remains.
7. HDFS user permissions: the error that appears when clicking tmp
8. Edit hdfs-site.xml so that HDFS no longer checks user permissions
  HDFS performs user permission checks by default (see the sketch below).
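A sketch of the property that switches the check off (dfs.permissions.enabled is the Hadoop 2.x name; turning it off is only acceptable in a learning environment):
  <!-- etc/hadoop/hdfs-site.xml -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>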
9. Restart HDFS
  Strictly speaking YARN should be restarted too, but just for verifying the tmp directory YARN can stay down.
10. Click tmp again; it now opens.
Part 6: Changing the static user name
1. The state before the static user name is changed
2. Edit core-site.xml (see the sketch below)
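The property involved is hadoop.http.staticuser.user; a sketch setting it to the working user used throughout this post:
  <!-- etc/hadoop/core-site.xml -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>beifeng</value>
  </property>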
3. Restart HDFS and YARN
4. Rerun the job
5. The static user now shows up as the configured user
Problems encountered while installing Hive on a pseudo-distributed CentOS 7, and how to solve them
1. Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException
Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 8 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 28 more
Solution:
Start Hadoop first.
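Hive is trying to reach HDFS on localhost:9000 here, so HDFS (and normally YARN as well) must be running before the Hive CLI is launched. A minimal sketch, assuming HADOOP_HOME points at the Hadoop install:
  $HADOOP_HOME/sbin/start-dfs.sh
  $HADOOP_HOME/sbin/start-yarn.sh
  jps   # NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager should all show up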
2. Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:563)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 8 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 11 more
Solution:
1. In /hive/conf/hive-site.xml, find the configuration entries whose values contain "system:java.io.tmpdir" (there are four of them).
2. Create an iotmp folder under the /hive/ directory.
3. Change the value of each of those entries to the iotmp path; after the change they look like this:
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/sky/hive/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/sky/hive/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/home/sky/hive/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/home/sky/hive/iotmp</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
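The iotmp directory from step 2 has to exist and be writable before Hive is restarted; a minimal sketch, assuming the /home/sky/hive install path used above:
  mkdir -p /home/sky/hive/iotmp
  chmod -R 775 /home/sky/hive/iotmp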
3. [ERROR] Terminal initialization failed; falling back to unsupported / java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.console.ConsoleReader.<init>(ConsoleReader.java:230)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Cause: the jline-*.jar that ships with Hadoop's YARN libraries is too old. Solution:
1. Copy jline-2.12.jar from the /hive/lib/ directory into /hadoop/share/hadoop/yarn/lib/;
2. Delete the old jline-*.jar from the /hadoop/share/hadoop/yarn/lib/ directory;
3. Set the permissions of jline-2.12.jar to 775: chmod 775 jline-2.12.jar (a sketch of these steps follows).
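A sketch of the three steps, assuming the /home/sky/hive and /home/sky/hadoop paths used elsewhere in this post; check the exact name of the old jline jar before deleting it:
  cp /home/sky/hive/lib/jline-2.12.jar /home/sky/hadoop/share/hadoop/yarn/lib/
  rm /home/sky/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar   # the old jar (name may differ)
  chmod 775 /home/sky/hadoop/share/hadoop/yarn/lib/jline-2.12.jar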
4. The configuration file changes
1. First run:
  cp hive-env.sh.template hive-env.sh
  cp hive-default.xml.template hive-site.xml
  cp hive-log4j.properties.template hive-log4j.properties
2. vim hive-env.sh
  export HADOOP_HOME=/home/sky/hadoop
3. vim hive-site.xml
&span style=&font-size:18&&&?xml version=&1.0& encoding=&UTF-8& standalone=&no&?&
&?xml-stylesheet type=&text/xsl& href=&configuration.xsl&?&&!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.
See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the &License&); you may not use this file except in compliance with
the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an &AS IS& BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--&&configuration&
&!-- WARNING!!! This file is auto generated for documentation purposes ONLY! --&
&!-- WARNING!!! Any changes you make to this file will be ignored by Hive.
&!-- WARNING!!! You must make your changes in hive-site.xml instead.
&!-- Hive Execution Parameters --&
&property&
&name&hive.exec.script.wrapper&/name&
&description/&
&/property&
&property&
&name&hive.exec.plan&/name&
&description/&
&/property&
&property&
&name&hive.plan.serialization.format&/name&
&value&kryo&/value&
&description&
Query plan format serialization between client and task nodes.
Two supported values are : kryo and javaXML. Kryo is default.
&/description&
&/property&
&property&
&name&hive.exec.stagingdir&/name&
&value&.hive-staging&/value&
&description&Directory name that will be created inside table locations in order to support HDFS encryption. This is replaces ${hive.exec.scratchdir} for query results with the exception of read-only tables. In all cases ${hive.exec.scratchdir} is still used for other temporary files, such as job plans.&/description&
&/property&
&property&
&name&hive.exec.scratchdir&/name&
&value&/tmp/hive&/value&
&description&HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&username& is created, with ${hive.scratch.dir.permission}.&/description&
&/property&
&property&
&name&hive.exec.local.scratchdir&/name&
&value&/home/sky/hive/iotmp&/value&
&description&Local scratch space for Hive jobs&/description&
&/property&
&property&
&name&hive.downloaded.resources.dir&/name&
&value&/home/sky/hive/iotmp&/value&
&description&Temporary local directory for added resources in the remote file system.&/description&
&/property&
&property&
&name&hive.scratch.dir.permission&/name&
&value&700&/value&
&description&The permission for the user specific scratch directories that get created.&/description&
&/property&
&property&
&name&hive.exec.submitviachild&/name&
&value&false&/value&
&description/&
&/property&
&property&
&name&hive.exec.submit.local.task.via.child&/name&
&value&true&/value&
&description&
Determines whether local tasks (typically mapjoin hashtable generation phase) runs in
separate JVM (true recommended) or not.
Avoids the overhead of spawning new JVM, but can lead to out-of-memory issues.
&/description&
&/property&
&property&
&name&hive.exec.script.maxerrsize&/name&
&value&100000&/value&
&description&
Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task).
This prevents runaway scripts from filling logs partitions to capacity
&/description&
&/property&
&property&
&name&hive.exec.script.allow.partial.consumption&/name&
&value&false&/value&
&description&
When enabled, this option allows a user script to exit successfully without consuming
all the data from the standard input.
&/description&
&/property&
&property&
&name&stream.stderr.reporter.prefix&/name&
&value&reporter:&/value&
&description&Streaming jobs that log to standard error with this prefix can log counter or status information.&/description&
&/property&
&property&
&name&stream.stderr.reporter.enabled&/name&
&value&true&/value&
&description&Enable consumption of status and counter messages for streaming jobs.&/description&
&/property&
&property&
&name&press.output&/name&
&value&false&/value&
&description&
This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) is compressed.
The compression codec and other options are determined from Hadoop config variables press*
&/description&
&/property&
&property&
&name&press.intermediate&/name&
&value&false&/value&
&description&
This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed.
The compression codec and other options are determined from Hadoop config variables press*
&/description&
&/property&
&property&
&name&pression.codec&/name&
&description/&
&/property&
&property&
&name&pression.type&/name&
&description/&
&/property&
&property&
&name&hive.exec.reducers.bytes.per.reducer&/name&
&value&&/value&
&description&size per reducer.The default is 256Mb, i.e if the input size is 1G, it will use 4 reducers.&/description&
&/property&
&property&
&name&hive.exec.reducers.max&/name&
&value&1009&/value&
&description&
max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is
negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.
&/description&
&/property&
&property&
&name&hive.exec.pre.hooks&/name&
&description&
Comma-separated list of pre-execution hooks to be invoked for each statement.
A pre-execution hook is specified as the name of a Java class which implements the
org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
&/description&
&/property&
&property&
&name&hive.exec.post.hooks&/name&
&description&
Comma-separated list of post-execution hooks to be invoked for each statement.
A post-execution hook is specified as the name of a Java class which implements the
org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
&/description&
&/property&
&property&
&name&hive.exec.failure.hooks&/name&
&description&
Comma-separated list of on-failure hooks to be invoked for each statement.
An on-failure hook is specified as the name of Java class which implements the
org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext interface.
&/description&
&/property&
&property&
&name&hive.exec.query.redactor.hooks&/name&
&description&
Comma-separated list of hooks to be invoked for each query which can
tranform the query before it's placed in the job.xml file. Must be a Java class which
extends from the org.apache.hadoop.hive.ql.hooks.Redactor abstract class.
&/description&
&/property&
&property&
&name&hive.client.stats.publishers&/name&
&description&
Comma-separated list of statistics publishers to be invoked on counters on each job.
A client stats publisher is specified as the name of a Java class which implements the
org.apache.hadoop.hive.ql.stats.ClientStatsPublisher interface.
&/description&
&/property&
&property&
&name&hive.exec.parallel&/name&
&value&false&/value&
&description&Whether to execute jobs in parallel&/description&
&/property&
&property&
&name&hive.exec.parallel.thread.number&/name&
&value&8&/value&
&description&How many jobs at most can be executed in parallel&/description&
&/property&
&property&
&name&hive.mapred.reduce.tasks.speculative.execution&/name&
&value&true&/value&
&description&Whether speculative execution for reducers should be turned on. &/description&
&/property&
&property&
&name&hive.exec.counters.pull.interval&/name&
&value&1000&/value&
&description&
The interval with which to poll the JobTracker for the counters the running job.
The smaller it is the more load there will be on the jobtracker, the higher it is the less granular the caught will be.
&/description&
&/property&
&property&
&name&hive.exec.dynamic.partition&/name&
&value&true&/value&
&description&Whether or not to allow dynamic partitions in DML/DDL.&/description&
&/property&
&property&
&name&hive.exec.dynamic.partition.mode&/name&
&value&strict&/value&
&description&
In strict mode, the user must specify at least one static partition
in case the user accidentally overwrites all partitions.
In nonstrict mode all partitions are allowed to be dynamic.
&/description&
&/property&
&property&
&name&hive.exec.max.dynamic.partitions&/name&
&value&1000&/value&
&description&Maximum number of dynamic partitions allowed to be created in total.&/description&
&/property&
&property&
&name&hive.exec.max.dynamic.partitions.pernode&/name&
&value&100&/value&
&description&Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.&/description&
&/property&
&property&
&name&hive.exec.max.created.files&/name&
&value&100000&/value&
&description&Maximum number of HDFS files created by all mappers/reducers in a MapReduce job.&/description&
&/property&
&property&
&name&hive.exec.default.partition.name&/name&
&value&__HIVE_DEFAULT_PARTITION__&/value&
&description&
The default partition name in case the dynamic partition column value is null/empty string or any other values that cannot be escaped.
This value must not contain any special character used in HDFS URI (e.g., ':', '%', '/' etc).
The user has to be aware that the dynamic partition value should not contain this value to avoid confusions.
&/description&
&/property&
&property&
&name&hive.lockmgr.zookeeper.default.partition.name&/name&
&value&__HIVE_DEFAULT_ZOOKEEPER_PARTITION__&/value&
&description/&
&/property&
&property&
&name&hive.exec.show.job.&/name&
&value&true&/value&
&description&
If a job fails, whether to provide a link in the CLI to the task with the
most failures, along with debugging hints if applicable.
&/description&
&/property&
&property&
&name&hive.exec.job.debug.capture.stacktraces&/name&
&value&true&/value&
&description&
Whether or not stack traces parsed from the task logs of a sampled failed task
for each failed job should be stored in the SessionState
&/description&
&/property&
&property&
&name&hive.exec.job.debug.timeout&/name&
&value&30000&/value&
&description/&
&/property&
&property&
&name&hive.exec.tasklog.debug.timeout&/name&
&value&20000&/value&
&description/&
&/property&
&property&
&name&hive.output.file.extension&/name&
&description&
String used as a file extension for output files.
If not set, defaults to the codec extension for text files (e.g. &.gz&), or no extension otherwise.
&/description&
&/property&
&property&
&name&hive.exec.mode.local.auto&/name&
&value&false&/value&
&description&Let Hive determine whether to run in local mode automatically&/description&
&/property&
&property&
&name&hive.exec.mode.local.auto.inputbytes.max&/name&
&value&&/value&
&description&When hive.exec.mode.local.auto is true, input bytes should less than this for local mode.&/description&
&/property&
&property&
&name&hive.exec.mode.local.auto.input.files.max&/name&
&value&4&/value&
&description&When hive.exec.mode.local.auto is true, the number of tasks should less than this for local mode.&/description&
&/property&
&property&
&name&hive.exec.drop.ignorenonexistent&/name&
&value&true&/value&
&description&Do not report an error if DROP TABLE/VIEW/Index/Function specifies a non-existent table/view/index/function&/description&
&/property&
&property&
&name&hive.ignore.mapjoin.hint&/name&
&value&true&/value&
&description&Ignore the mapjoin hint&/description&
&/property&
&property&
&name&hive.file.max.footer&/name&
&value&100&/value&
&description&maximum number of lines for footer user can define for a table file&/description&
&/property&
&property&
&name&hive.resultset.use.unique.column.names&/name&
&value&true&/value&
&description&
Make column names unique in the result set by qualifying column names with table alias if needed.
Table alias will be added to column names for queries of type &select *& or
if query explicitly uses table alias &select r1.x..&.
&/description&
&/property&
&property&
&name&fs.har.impl&/name&
&value&org.apache.hadoop.hive.shims.HiveHarFileSystem&/value&
&description&The implementation for accessing Hadoop Archives. Note that this won't be applicable to Hadoop versions less than 0.20&/description&
&/property&
&property&
&name&hive.metastore.warehouse.dir&/name&
&value&/user/hive/warehouse&/value&
&description&location of default database for the warehouse&/description&
&/property&
&property&
&name&hive.metastore.uris&/name&
&description&Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.&/description&
&/property&
&property&
&name&hive.metastore.connect.retries&/name&
&value&3&/value&
&description&Number of retries while opening a connection to metastore&/description&
&/property&
&property&
&name&hive.metastore.failure.retries&/name&
&value&1&/value&
&description&Number of retries upon failure of Thrift metastore calls&/description&
&/property&
&property&
&name&hive.metastore.client.connect.retry.delay&/name&
&value&1s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
Number of seconds for the client to wait between consecutive connection attempts
&/description&
&/property&
&property&
&name&hive.metastore.client.socket.timeout&/name&
&value&600s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
MetaStore Client socket timeout in seconds
&/description&
&/property&
&property&
&name&hive.metastore.client.socket.lifetime&/name&
&value&0s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
MetaStore Client socket lifetime in seconds. After this time is exceeded, client
reconnects on the next MetaStore operation. A value of 0s means the connection
has an infinite lifetime.
&/description&
&/property&
&property&
&name&javax.jdo.option.ConnectionPassword&/name&
&value&hive&/value&
&description&password to use against metastore database&/description&
&/property&
&property&
&name&hive.metastore.local&/name&
&value&true&/value&
&description&&/description&
&/property&
&property&
&name&hive.metastore.ds.connection.url.hook&/name&
&description&Name of the hook to use for retrieving the JDO connection URL. If empty, the value in javax.jdo.option.ConnectionURL is used&/description&
&/property&
&property&
&name&javax.jdo.option.Multithreaded&/name&
&value&true&/value&
&description&Set this to true if multiple threads access metastore through JDO concurrently.&/description&
&/property&
&property&
&name&javax.jdo.option.ConnectionURL&/name&
&value&jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&/value&
&description&JDBC connect string for a JDBC metastore&/description&
&/property&
&property&
&name&hive.hmshandler.retry.attempts&/name&
&value&10&/value&
&description&The number of times to retry a HMSHandler call if there were a connection error.&/description&
&/property&
&property&
&name&hive.hmshandler.retry.interval&/name&
&value&2000ms&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified.
The time between HMSHandler retry attempts on failure.
&/description&
&/property&
&property&
&name&hive.hmshandler.force.reload.conf&/name&
&value&false&/value&
&description&
Whether to force reloading of the HMSHandler configuration (including
the connection URL, before the next metastore query that accesses the
datastore. Once reloaded, this value is reset to false. Used for
testing only.
&/description&
&/property&
&property&
&name&hive.metastore.server.max.message.size&/name&
&value&&/value&
&description&Maximum message size in bytes a HMS will accept.&/description&
&/property&
&property&
&name&hive.metastore.server.min.threads&/name&
&value&200&/value&
&description&Minimum number of worker threads in the Thrift server's pool.&/description&
&/property&
&property&
&name&hive.metastore.server.max.threads&/name&
&value&1000&/value&
&description&Maximum number of worker threads in the Thrift server's pool.&/description&
&/property&
&property&
&name&hive.metastore.server.tcp.keepalive&/name&
&value&true&/value&
&description&Whether to enable TCP keepalive for the metastore server. Keepalive will prevent accumulation of half-open connections.&/description&
&/property&
&property&
&name&hive.metastore.archive.intermediate.original&/name&
&value&_INTERMEDIATE_ORIGINAL&/value&
&description&
Intermediate dir suffixes used for archiving. Not important what they
are, as long as collisions are avoided
&/description&
&/property&
&property&
&name&hive.metastore.archive.intermediate.archived&/name&
&value&_INTERMEDIATE_ARCHIVED&/value&
&description/&
&/property&
&property&
&name&hive.metastore.archive.intermediate.extracted&/name&
&value&_INTERMEDIATE_EXTRACTED&/value&
&description/&
&/property&
&property&
&name&hive.metastore.kerberos.keytab.file&/name&
&description&The path to the Kerberos Keytab file containing the metastore Thrift server's service principal.&/description&
&/property&
&property&
&name&hive.metastore.kerberos.principal&/name&
&value&hive-metastore/_&/value&
&description&
The service principal for the metastore Thrift server.
The special string _HOST will be replaced automatically with the correct host name.
&/description&
&/property&
&property&
&name&hive.metastore.sasl.enabled&/name&
&value&false&/value&
&description&If true, the metastore Thrift interface will be secured with SASL. Clients must authenticate with Kerberos.&/description&
&/property&
&property&
&name&hive.metastore.thrift.framed.transport.enabled&/name&
&value&false&/value&
&description&If true, the metastore Thrift interface will use TFramedTransport. When false (default) a standard TTransport is used.&/description&
&/property&
&property&
&name&hive.pact.protocol.enabled&/name&
&value&false&/value&
&description&
If true, the metastore Thrift interface will use TCompactProtocol. When false (default) TBinaryProtocol will be used.
Setting it to true will break compatibility with older clients running TBinaryProtocol.
&/description&
&/property&
&property&
&name&hive.cluster.delegation.token.store.class&/name&
&value&org.apache.hadoop.hive.thrift.MemoryTokenStore&/value&
&description&The delegation token store implementation. Set to org.apache.hadoop.hive.thrift.ZooKeeperTokenStore for load-balanced cluster.&/description&
&/property&
&property&
&name&hive.cluster.delegation.token.store.zookeeper.connectString&/name&
&description&
The ZooKeeper token store connect string. You can re-use the configuration value
set in hive.zookeeper.quorum, by leaving this parameter unset.
&/description&
&/property&
&property&
&name&hive.cluster.delegation.token.store.zookeeper.znode&/name&
&value&/hivedelegation&/value&
&description&
The root path for token store data. Note that this is used by both HiveServer2 and
MetaStore to store delegation Token. One directory gets created for each of them.
The final directory names would have the servername appended to it (HIVESERVER2,
METASTORE).
&/description&
&/property&
&property&
&name&hive.cluster.delegation.token.store.zookeeper.acl&/name&
&description&
ACL for token store entries. Comma separated list of ACL entries. For example:
sasl:hive/host1@MY.DOMAIN:cdrwa,sasl:hive/host2@MY.DOMAIN:cdrwa
Defaults to all permissions for the hiveserver2/metastore process user.
&/description&
&/property&
&property&
&name&hive.metastore.cache.pinobjtypes&/name&
&value&Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order&/value&
&description&List of comma separated metastore object types that should be pinned in the cache&/description&
&/property&
&property&
&name&datanucleus.connectionPoolingType&/name&
&value&BONECP&/value&
&description&Specify connection pool library for datanucleus&/description&
&/property&
&property&
&name&datanucleus.validateTables&/name&
&value&false&/value&
&description&validates existing schema against code. turn this on if you want to verify existing schema&/description&
&/property&
&property&
&name&datanucleus.validateColumns&/name&
&value&false&/value&
&description&validates existing schema against code. turn this on if you want to verify existing schema&/description&
&/property&
&property&
&name&datanucleus.validateConstraints&/name&
&value&false&/value&
&description&validates existing schema against code. turn this on if you want to verify existing schema&/description&
&/property&
&property&
&name&datanucleus.storeManagerType&/name&
&value&rdbms&/value&
&description&metadata store type&/description&
&/property&
&property&
&name&datanucleus.autoCreateSchema&/name&
&value&true&/value&
&description&creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once&/description&
&/property&
&property&
&name&datanucleus.fixedDatastore&/name&
&value&false&/value&
&description/&
&/property&
&property&
&name&hive.metastore.schema.verification&/name&
&value&false&/value&
&description&
Enforce metastore schema version consistency.
True: Verify that version information stored in metastore matches with one from Hive jars.
Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
&/description&
&/property&
&property&
&name&hive.metastore.schema.verification.record.version&/name&
&value&true&/value&
&description&
When true the current MS version is recorded in the VERSION table. If this is disabled and verification is
enabled the MS will be unusable.
&/description&
&/property&
&property&
&name&datanucleus.autoStartMechanismMode&/name&
&value&checked&/value&
&description&throw exception if metadata tables are incorrect&/description&
&/property&
&property&
&name&datanucleus.transactionIsolation&/name&
&value&read-committed&/value&
&description&Default transaction isolation level for identity generation.&/description&
&/property&
&property&
&name&datanucleus.cache.level2&/name&
&value&false&/value&
&description&Use a level 2 cache. Turn this off if metadata is changed independently of Hive metastore server&/description&
&/property&
&property&
&name&datanucleus.cache.level2.type&/name&
&value&none&/value&
&description/&
&/property&
&property&
&name&datanucleus.identifierFactory&/name&
&value&datanucleus1&/value&
&description&
Name of the identifier factory to use when generating table/column names etc.
'datanucleus1' is used for backward compatibility with DataNucleus v1
&/description&
&/property&
&property&
&name&datanucleus.rdbms.useLegacyNativeValueStrategy&/name&
&value&true&/value&
&description/&
&/property&
&property&
&name&datanucleus.plugin.pluginRegistryBundleCheck&/name&
&value&LOG&/value&
&description&Defines what happens when plugin bundles are found and are duplicated [EXCEPTION|LOG|NONE]&/description&
&/property&
&property&
&name&hive.metastore.batch.retrieve.max&/name&
&value&300&/value&
&description&
Maximum number of objects (tables/partitions) can be retrieved from metastore in one batch.
The higher the number, the less the number of round trips is needed to the Hive metastore server,
but it may also cause higher memory requirement at the client side.
&/description&
&/property&
&property&
&name&hive.metastore.batch.retrieve.table.partition.max&/name&
&value&1000&/value&
&description&Maximum number of table partitions that metastore internally retrieves in one batch.&/description&
&/property&
&property&
&name&hive.metastore.init.hooks&/name&
&description&
A comma separated list of hooks to be invoked at the beginning of HMSHandler initialization.
An init hook is specified as the name of Java class which extends org.apache.hadoop.hive.metastore.MetaStoreInitListener.
&/description&
&/property&
&property&
&name&hive.metastore.pre.event.listeners&/name&
&description&List of comma separated listeners for metastore events.&/description&
&/property&
&property&
&name&hive.metastore.event.listeners&/name&
&description/&
&/property&
&property&
&name&hive.metastore.event.db.listener.timetolive&/name&
&value&86400s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
time after which events will be removed from the database listener queue
&/description&
&/property&
&property&
&name&hive.metastore.authorization.storage.checks&/name&
&value&false&/value&
&description&
Should the metastore do authorization checks against the underlying storage (usually hdfs)
for operations like drop-partition (disallow the drop-partition if the user in
question doesn't have permissions to delete the corresponding directory
on the storage).
&/description&
&/property&
&property&
&name&hive.metastore.event.clean.freq&/name&
&value&0s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
Frequency at which timer task runs to purge expired events in metastore.
&/description&
&/property&
&property&
&name&hive.metastore.event.expiry.duration&/name&
&value&0s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
Duration after which events expire from events table
&/description&
&/property&
&property&
&name&hive.metastore.execute.setugi&/name&
&value&true&/value&
&description&
In unsecure mode, setting this property to true will cause the metastore to execute DFS operations using
the client's reported user and group permissions. Note that this property must be set on
both the client and server sides. Further note that its best effort.
If client sets its to true and server sets it to false, client setting will be ignored.
&/description&
&/property&
&property&
&name&hive.metastore.partition.name.whitelist.pattern&/name&
&description&Partition names will be checked against this regex pattern and rejected if not matched.&/description&
&/property&
&property&
&name&hive.metastore.integral.jdo.pushdown&/name&
&value&false&/value&
&description&
Allow JDO query pushdown for integral partition columns in metastore. Off by default. This
improves metastore perf for integral columns, especially if there's a large number of partitions.
However, it doesn't work correctly with integral values that are not normalized (e.g. have
leading zeroes, like 0012). If metastore direct SQL is enabled and works, this optimization
is also irrelevant.
&/description&
&/property&
&property&
&name&hive.metastore.try.direct.sql&/name&
&value&true&/value&
&description&
Whether the Hive metastore should try to use direct SQL queries instead of the
DataNucleus for certain read paths. This can improve metastore performance when
fetching many partitions or column statistics by however, it
is not guaranteed to work on all RDBMS-es and all versions. In case of SQL failures,
the metastore will fall back to the DataNucleus, so it's safe even if SQL doesn't
work for all queries on your datastore. If all SQL queries fail (for example, your
metastore is backed by MongoDB), you might want to disable this to save the
try-and-fall-back cost.
&/description&
&/property&
&property&
&name&hive.metastore.direct.sql.batch.size&/name&
&value&0&/value&
&description&
Batch size for partition and other object retrieval from the underlying DB in direct
SQL. For some DBs like Oracle and MSSQL, there are hardcoded or perf-based limitations
that necessitate this. For DBs that can handle the queries, this isn't necessary and
may impede performance. -1 means no batching, 0 means automatic batching.
&/description&
&/property&
&property&
&name&hive.metastore.try.direct.sql.ddl&/name&
&value&true&/value&
&description&
Same as hive.metastore.try.direct.sql, for read statements within a transaction that
modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL
select query has incorrect syntax or something similar inside a transaction, the
entire transaction will fail and fall-back to DataNucleus will not be possible. You
should disable the usage of direct SQL inside transactions if that happens in your case.
&/description&
&/property&
&property&
&name&hive.metastore.orm.retrieveMapNullsAsEmptyStrings&/name&
&value&false&/value&
&description&Thrift does not support nulls in maps, so any nulls present in maps retrieved from ORM must either be pruned or converted to empty strings. Some backing dbs such as Oracle persist empty strings as nulls, so we should set this parameter if we wish to reverse that behaviour. For others, pruning is the correct behaviour&/description&
&/property&
&property&
&name&hive.metastore.disallow.incompatible.col.type.changes&/name&
&value&false&/value&
&description&
If true (default is false), ALTER TABLE operations which change the type of a
column (say STRING) to an incompatible type (say MAP) are disallowed.
RCFile default SerDe (ColumnarSerDe) serializes the values in such a way that the
datatypes can be converted from string to any type. The map is also serialized as
a string, which can be read as a string as well. However, with any binary
serialization, this is not true. Blocking the ALTER TABLE prevents ClassCastExceptions
when subsequently trying to access old partitions.
Primitive types like INT, STRING, BIGINT, etc., are compatible with each other and are
not blocked.
See HIVE-4409 for more details.
&/description&
&/property&
&property&
&name&hive.table.parameters.default&/name&
&description&Default property values for newly created tables&/description&
&/property&
&property&
&name&hive.ddl.createtablelike.properties.whitelist&/name&
&description&Table Properties to copy over when executing a Create Table Like.&/description&
&/property&
&property&
&name&hive.metastore.rawstore.impl&/name&
&value&org.apache.hadoop.hive.metastore.ObjectStore&/value&
&description&
Name of the class that implements org.apache.hadoop.hive.metastore.rawstore interface.
This class is used to store and retrieval of raw metadata objects such as table, database
&/description&
&/property&
&property&
&name&javax.jdo.option.ConnectionDriverName&/name&
&value&com.mysql.jdbc.Driver&/value&
&description&Driver class name for a JDBC metastore&/description&
&/property&
&property&
&name&javax.jdo.PersistenceManagerFactoryClass&/name&
&value&org.datanucleus.api.jdo.JDOPersistenceManagerFactory&/value&
&description&class implementing the jdo persistence&/description&
&/property&
&property&
&name&hive.metastore.expression.proxy&/name&
&value&org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore&/value&
&description/&
&/property&
&property&
&name&javax.jdo.option.DetachAllOnCommit&/name&
&value&true&/value&
&description&Detaches all objects from session so that they can be used after transaction is committed&/description&
&/property&
&property&
&name&javax.jdo.option.NonTransactionalRead&/name&
&value&true&/value&
&description&Reads outside of transactions&/description&
&/property&
&property&
&name&javax.jdo.option.ConnectionUserName&/name&
&value&hive&/value&
&description&Username to use against metastore database&/description&
&/property&
&property&
&name&hive.metastore.end.function.listeners&/name&
&description&List of comma separated listeners for the end of metastore functions.&/description&
&/property&
&property&
&name&hive.metastore.partition.inherit.table.properties&/name&
&description&
List of comma separated keys occurring in table properties which will get inherited to newly created partitions.
* implies all the keys will get inherited.
&/description&
&/property&
&property&
&name&hive.metastore.filter.hook&/name&
&value&org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl&/value&
&description&Metastore hook class for filtering the metadata read results. If hive.security.authorization.manageris set to instance of HiveAuthorizerFactory, then this value is ignored.&/description&
&/property&
&property&
&name&hive.metastore.dml.events&/name&
&value&false&/value&
&description&If true, the metastore will be asked to fire events for DML operations&/description&
&/property&
&property&
&name&hive.metastore.client.drop.partitions.using.expressions&/name&
&value&true&/value&
&description&Choose whether dropping partitions with HCatClient pushes the partition-predicate to the metastore, or drops partitions iteratively&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.enabled&/name&
&value&true&/value&
&description&Whether aggregate stats caching is enabled or not.&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.size&/name&
&value&10000&/value&
&description&Maximum number of aggregate stats nodes that we will place in the metastore aggregate stats cache.&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.max.partitions&/name&
&value&10000&/value&
&description&Maximum number of partitions that are aggregated per cache node.&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.fpp&/name&
&value&0.01&/value&
&description&Maximum false positive probability for the Bloom Filter used in each aggregate stats cache node (default 1%).&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.max.variance&/name&
&value&0.01&/value&
&description&Maximum tolerable variance in number of partitions between a cached node and our request (default 1%).&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.ttl&/name&
&value&600s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
Number of seconds for a cached node to be active in the cache before they become stale.
&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.max.writer.wait&/name&
&value&5000ms&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified.
Number of milliseconds a writer will wait to acquire the writelock before giving up.
&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.max.reader.wait&/name&
&value&1000ms&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified.
Number of milliseconds a reader will wait to acquire the readlock before giving up.
&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.max.full&/name&
&value&0.9&/value&
&description&Maximum cache full % after which the cache cleaner thread kicks in.&/description&
&/property&
&property&
&name&hive.metastore.aggregate.stats.cache.clean.until&/name&
&value&0.8&/value&
&description&The cleaner thread cleans until cache reaches this % full size.&/description&
&/property&
&property&
&name&hive.metadata.export.location&/name&
&description&
When used in conjunction with the org.apache.hadoop.hive.ql.parse.MetaDataExportListener pre event listener,
it is the location to which the metadata will be exported. The default is an empty string, which results in the
metadata being exported to the current user's home directory on HDFS.
&/description&
&/property&
&property&
&name&hive.metadata.move.exported.metadata.to.trash&/name&
&value&true&/value&
&description&
When used in conjunction with the org.apache.hadoop.hive.ql.parse.MetaDataExportListener pre event listener,
this setting determines if the metadata that is exported will subsequently be moved to the user's trash directory
alongside the dropped table data. This ensures that the metadata will be cleaned up along with the dropped table data.
&/description&
&/property&
&property&
&name&hive.cli.errors.ignore&/name&
&value&false&/value&
&description/&
&/property&
&property&
&name&hive.cli.print.current.db&/name&
&value&false&/value&
&description&Whether to include the current database in the Hive prompt.&/description&
&/property&
&property&
&name&hive.cli.prompt&/name&
&value&hive&/value&
&description&
Command line prompt configuration value. Other hiveconf can be used in this configuration value.
Variable substitution will only be invoked at the Hive CLI startup.
&/description&
&/property&
&property&
&name&hive.cli.pretty.output.num.cols&/name&
&value&-1&/value&
&description&
The number of columns to use when formatting output generated by the DESCRIBE PRETTY table_name command.
If the value of this property is -1, then Hive will use the auto-detected terminal width.
&/description&
&/property&
&property&
&name&hive.metastore.fs.handler.class&/name&
&value&org.apache.hadoop.hive.metastore.HiveMetaStoreFsImpl&/value&
&description/&
&/property&
&property&
&name&hive.session.id&/name&
&description/&
&/property&
&property&
&name&hive.session.silent&/name&
&value&false&/value&
&description/&
&/property&
&property&
&name&hive.session.history.enabled&/name&
&value&false&/value&
&description&Whether to log Hive query, query plan, runtime statistics etc.&/description&
&/property&
&property&
&name&hive.query.string&/name&
&description&Query being executed (might be multiple per a session)&/description&
&/property&
&property&
&name&hive.query.id&/name&
&description&ID for query being executed (might be multiple per a session)&/description&
&/property&
&property&
&name&hive.jobname.length&/name&
&value&50&/value&
&description&max jobname length&/description&
&/property&
&property&
&name&hive.jar.path&/name&
&description&The location of hive_cli.jar that is used when submitting jobs in a separate jvm.&/description&
&/property&
&property&
&name&hive.aux.jars.path&/name&
&description&The location of the plugin jars that contain implementations of user defined functions and serdes.&/description&
&/property&
&property&
&name&hive.reloadable.aux.jars.path&/name&
&description&Jars can be renewed by executing reload command. And these jars can be used as the auxiliary classes like creating a UDF or SerDe.&/description&
&/property&
&property&
&name&hive.added.files.path&/name&
&description&This an internal parameter.&/description&
&/property&
&property&
&name&hive.added.jars.path&/name&
&description&This an internal parameter.&/description&
&/property&
&property&
&name&hive.added.archives.path&/name&
&description&This an internal parameter.&/description&
&/property&
&property&
&name&hive.auto.progress.timeout&/name&
&value&0s&/value&
&description&
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified.
How long to run autoprogressor for the script/UDTF operators.
Set to 0 for forever.
&/description&
&/property&
&property&
&name&hive.script.auto.progress&/name&
&value&false&/value&
&description&
Whether Hive Transform/Map/Reduce Clause should automatically send progress information to TaskTracker
to avoid the task getting killed because of inactivity.
Hive sends progress information when the script is
outputting to stderr.
This option removes the need of periodically producing stderr messages,
but users should be cautious because this may prevent infinite loops in the scripts to be killed by TaskTracker.
&/description&
&/property&
&property&
&name&hive.script.operator.id.env.var&/name&
&value&HIVE_SCRIPT_OPERATOR_ID&/value&
&description&
Name of the environment variable that holds the unique script operator ID in the user's
transform function (the custom mapper/reducer that the user has specified in the query)
&/description&
&/property&
&property&
&name&hive.script.operator.truncate.env&/name&
&value&false&/value&
&description&Truncate each environment variable for external script in scripts operator to 20KB (to fit system limits)&/description&
&/property&
&property&
&name&hive.script.operator.env.blacklist&/name&
&value&hive.txn.valid.txns,hive.script.operator.env.blacklist&/value&
&description&Comma separated list of keys from the configuration file not to convert to environment variables when envoking the script operator&/description&
&/property&
&property&
&name&hive.mapred.mode&/name&
&value&nonstrict&/value&
&description&
The mode in which the Hive operations are being performed.
In strict mode, some risky queries are not allowed to run. They include:
Cartesian Product.
No partition being picked up for a query.
Comparing bigints and strings.
Comparing bigints and doubles.
Orderby without limit.
&/description&
&/property&
&property&
&name&hive.alias&/name&
&description/&
&/property&
&property&
&name&hive.map.aggr&/name&
&value&true&/value&
&description&Whether to use map-side aggregation in Hive Group By queries&/description&
&/property&
&property&
&name&hive.groupby.skewindata&/name&
&value&false&/value&
&description&Whether there is skew in data to optimize group by queries&/description&
&/property&
&property&
&name&hive.join.emit.interval&/name&
&value&1000&/value&
&description&How many rows in the right-most join operand Hive should buffer before emitting the join result.&/description&
&/property&
&property&
&name&hive.join.cache.size&/name&
&value&25000&/value&
&description&How many rows in the joining tables (except the streaming table) should be cached in memory.&/description&
&/property&
&property&
&name&hive.cbo.enable&/name&
&value&true&/value&
&description&Flag to control enabling Cost Based Optimizations using Calcite framework.&/description&
&/property&
&property&
&name&hive.cbo.returnpath.hiveop&/name&
&value&false&/value&
&description&Flag to control calcite plan to hive operator conversion&/description&
&/property&
&property&
&name&hive.cbo.costmodel.extended&/name&
&value&false&/value&
&description&Flag to control enabling the extended cost model based onCPU, IO and cardinality. Otherwise, the cost model is based on cardinality.&/description&
&/property&
&property&
&name&hive.cbo.costmodel.cpu&/name&
&value&0.000001&/value&
&description&Default cost of a comparison&/description&
&/property&
&property&
&name&hive.cbo.costmodel.network&/name&
&value&150.0&/value&
&description&Default cost of a transfering expressed as multiple of CPU cost&/description&
&/property&
&property&
&name&hive.cbo.costmodel.local.fs.write&/name&
&value&4.0&/value&
&description&Default cost of writing a byte to local FS; expressed as multiple of NETWORK cost&/description&
&/property&
&property&
&name&hive.cbo.costmodel.local.fs.read&/name&
&value&4.0&/value&
&description&Default cost of reading a byte from local FS; expressed as multiple of NETWORK cost&/description&
&/property&
&property&
&name&hive.cbo.costmodel.hdfs.write&/name&
&value&10.0&/value&
&description&Default cost of writing a byte to HDFS; expressed as multiple of Local FS write cost&/description&
&/property&
&property&
&name&hive.cbo.costmodel.hdfs.read&/name&
&value&1.5&/value&
&description&Default cost of reading a byte from HDFS; expressed as multiple of Local FS read cost&/description&
&/property&
<property>
<name>hive.mapjoin.bucket.cache.size</name>
<value>100</value>
<description/>
</property>
<property>
<name>hive.mapjoin.optimized.hashtable</name>
<value>true</value>
<description>
Whether Hive should use memory-optimized hash table for MapJoin. Only works on Tez,
because memory-optimized hashtable cannot be serialized.
</description>
</property>
<property>
<name>hive.mapjoin.hybridgrace.hashtable</name>
<value>true</value>
<description>Whether to use hybridgrace hash join as the join method for mapjoin. Tez only.</description>
</property>
<property>
<name>hive.mapjoin.hybridgrace.memcheckfrequency</name>
<value>1024</value>
<description>For hybrid grace hash join, how often (how many rows apart) we check if memory is full. This number should be power of 2.</description>
</property>
<property>
<name>hive.mapjoin.hybridgrace.minwbsize</name>
<value>524288</value>
<description>For hybrid grace hash join, the minimum write buffer size used by optimized hashtable. Default is 512 KB.</description>
</property>
<property>
<name>hive.mapjoin.hybridgrace.minnumpartitions</name>
<value>16</value>
<description>For hybrid grace hash join, the minimum number of partitions to create.</description>
</property>
<property>
<name>hive.mapjoin.optimized.hashtable.wbsize</name>
<value></value>
<description>
Optimized hashtable (see hive.mapjoin.optimized.hashtable) uses a chain of buffers to
store data. This is one buffer size. HT may be slightly faster if this is larger, but for small
joins unnecessary memory will be allocated and then trimmed.
</description>
</property>
<property>
<name>hive.smbjoin.cache.rows</name>
<value>10000</value>
<description>How many rows with the same key value should be cached in memory per smb joined table.</description>
</property>
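Any of the join defaults above can be overridden either globally in hive-site.xml or just for the current session in the Hive CLI. A minimal sketch of a session-level override (the values are only illustrative, not recommendations):
  set hive.mapjoin.hybridgrace.hashtable=false;
  set hive.smbjoin.cache.rows=20000;
  -- echo the effective value back to confirm the change
  set hive.smbjoin.cache.rows;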
<property>
<name>hive.groupby.mapaggr.checkinterval</name>
<value>100000</value>
<description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>
<property>
<name>hive.map.aggr.hash.percentmemory</name>
<value>0.5</value>
<description>Portion of total memory to be used by map-side group aggregation hash table</description>
</property>
<property>
<name>hive.mapjoin.followby.map.aggr.hash.percentmemory</name>
<value>0.3</value>
<description>Portion of total memory to be used by map-side group aggregation hash table, when this group by is followed by map join</description>
</property>
<property>
<name>hive.map.aggr.hash.force.flush.memory.threshold</name>
<value>0.9</value>
<description>
The max memory to be used by map-side group aggregation hash table.
If the memory usage is higher than this number, force to flush data
</description>
</property>
<property>
<name>hive.map.aggr.hash.min.reduction</name>
<value>0.5</value>
<description>
Hash aggregation will be turned off if the ratio between hash
table size and input rows is bigger than this number.
Set to 1 to make sure hash aggregation is never turned off.
</description>
</property>
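Per the last description, Hive can quietly fall back to reduce-side aggregation when the hash table is not shrinking the input enough. A small sketch of keeping map-side aggregation on for one session, assuming that is actually what you want:
  -- 1 means hash aggregation is never turned off mid-task
  set hive.map.aggr.hash.min.reduction=1;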
<property>
<name>hive.multigroupby.singlereducer</name>
<value>true</value>
<description>
Whether to optimize multi group by query to generate single M/R
job plan. If the multi group by query has
common group by keys, it will be optimized to generate single M/R job.
</description>
</property>
<property>
<name>hive.map.groupby.sorted</name>
<value>false</value>
<description>
If the bucketing/sorting properties of the table exactly match the grouping key, whether to perform
the group by in the mapper by using BucketizedHiveInputFormat. The only downside to this
is that it limits the number of mappers to the number of files.
</description>
</property>
<property>
<name>hive.map.groupby.sorted.testmode</name>
<value>false</value>
<description>
If the bucketing/sorting properties of the table exactly match the grouping key, whether to perform
the group by in the mapper by using BucketizedHiveInputFormat. If the test mode is set, the plan
is not converted, but a query property is set to denote the same.
</description>
</property>
<property>
<name>hive.groupby.orderby.position.alias</name>
<value>false</value>
<description>Whether to enable using Column Position Alias in Group By or Order By</description>
</property>
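To show what the position-alias flag enables, here is a hedged sketch with made-up table and column names; with the flag on, the numbers refer to positions in the select list:
  set hive.groupby.orderby.position.alias=true;
  -- group by the 1st select column, order by the 2nd
  select deptno, count(*) from emp group by 1 order by 2 desc;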
<property>
<name>hive.new.job.grouping.set.cardinality</name>
<value>30</value>
<description>
Whether a new map-reduce job should be launched for grouping sets/rollups/cubes.
For a query like: select a, b, c, count(1) from T group by a, b, c with rollup;
4 rows are created per row: (a, b, c), (a, b, null), (a, null, null), (null, null, null).
This can lead to explosion across map-reduce boundary if the cardinality of T is very high,
and map-side aggregation does not do a very good job.
This parameter decides if Hive should add an additional map-reduce job. If the grouping set
cardinality (4 in the example above), is more than this value, a new MR job is added under the
assumption that the original group by will reduce the data size.
</description>
</property>
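For reference, a rollup over n columns produces n+1 grouping sets (the 4 in the example above), while a cube produces 2^n, so even a query like the sketch below yields only 8 grouping sets and stays under the default threshold of 30, meaning no extra MR job is added:
  -- cube over 3 columns => 2^3 = 8 grouping sets, still below 30
  select a, b, c, count(1) from T group by a, b, c with cube;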
<property>
<name>hive.exec.copyfile.maxsize</name>
<value></value>
<description>Maximum file size (in Mb) that Hive uses to do single HDFS copies between directories. Distributed copies (distcp) will be used instead for bigger files so that copies can be done faster.</description>
</property>
<property>
<name>hive.udtf.auto.progress</name>
<value>false</value>
<description>
Whether Hive should automatically send progress information to TaskTracker
when using UDTF's to prevent the task getting killed because of inactivity.
Users should be cautious
because this may prevent TaskTracker from killing tasks with infinite loops.
</description>
</property>
<property>
<name>hive.default.fileformat</name>
<value>TextFile</value>
<description>
Expects one of [textfile, sequencefile, rcfile, orc].
Default file format for CREATE TABLE statement. Users can explicitly override it by CREATE TABLE ... STORED AS [FORMAT]
</description>
</property>
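As the description says, the TextFile default can be overridden per table with STORED AS. A minimal sketch (the table is hypothetical):
  -- store this table as ORC instead of the TextFile default
  create table logs_orc (line string) stored as orc;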
<property>
<name>hive.default.fileformat.managed</name>
<value>none</value>
<description>
Expects one of [none, textfile, sequencefile, rcfile, orc].
Default file format for CREATE TABLE statement applied to managed tables only. External tables will be
created with format specified by hive.default.fileformat. Leaving this null will result in using hive.default.fileformat
for all tables.
</description>
</property>
<property>
<name>hive.query.result.fileformat</name>
<value>TextFile</value>
<description>
Expects one of [textfile, sequencefile, rcfile].
Default file format for storing result of the query.
</description>
</property>
<property>
<name>hive.fileformat.check</name>
<value>true</value>
<description>Whether to check file format or not when loading data files</description>
</property>
<property>
<name>hive.default.rcfile.serde</name>
<value>org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe</value>
<description>The default SerDe Hive will use for the RCFile format</description>
</property>
<property>
<name>hive.default.serde</name>
<value>org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe</value>
<description>The default SerDe Hive will use for storage formats that do not specify a SerDe.</description>
</property>
<property>
<name>hive.serdes.using.metastore.for.schema</name>
<value>org.apache.hadoop.hive.ql.io.orc.OrcSerde,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe,org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe,org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe</value>
<description>SerDes retrieving schema from metastore. This is an internal parameter. Check with the Hive dev team.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/sky/hive/iotmp</value>
<description>Location of Hive run time structured log file</description>
</property>
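Note that /home/sky/hive/iotmp is a locally chosen path rather than Hive's stock default; the directory should already exist and be writable by the user running Hive. The effective value can be checked from the Hive CLI:
  -- prints hive.querylog.location=/home/sky/hive/iotmp for this session
  set hive.querylog.location;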
<property>
<name>hive.querylog.enable.plan.progress</name>
<value>true</value>
<description>
Whether to log the plan's progress every time a job's progress is checked.
These logs are written to the location specified by hive.querylog.location
</description>
</property>
<property>
<name>hive.querylog.plan.progress.interval</name>
<value>60000ms</value>
<description>
Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified.
The interval to wait between logging the plan's progress.
If there is a whole number percentage change in the progress of the mappers or the reducers,
the progress is logged regardless of this value.
The actual interval will be the ceiling of (this value divided by the value of
hive.exec.counters.pull.interval) multiplied by the value of hive.exec.counters.pull.interval
I.e. if it does not divide evenly by the value of hive.exec.counters.pull.interval it will be
logged less frequently than specified.
</description>
</property>
