
HBase 官方文档中文版(转)
Chapter 11. 性能调优
Table of Contents
11.1. 操作系统
11.1.1. 内存
RAM, RAM, RAM. 不要饿着 HBase。
11.1.2. 64-bit
使用 64-bit 平台(和 64-bit JVM)。
11.1.3. 交换区
小心交换,将交换区(swappiness)设为0。
11.2.&网络
要避免网络问题拖累 Hadoop 和 HBase 的性能,最重要的因素也许就是所使用的交换机硬件。当集群规模在项目周期内翻倍、翻三倍甚至更多时,早期的设备决策可能会带来严重问题。
要考虑的重要事项:
设备交换机容量
系统连接数量
11.2.1. 单交换机
单交换机配置最重要的因素,是硬件的交换容量,即处理所有接入该交换机的系统所产生流量的能力。一些低价的商品硬件,其交换容量低于全交换(full switching)所能达到的水平。
11.2.2.&多交换机
多交换机在系统结构中是潜在陷阱。低价硬件的最常用配置是1Gbps上行连接到另一个交换机。 该常被忽略的窄点很容易成为集群通讯的瓶颈。特别是MapReduce任务通过该上行连接同时读写大量数据时,会导致饱和。
缓解该问题很简单,可以通过多种途径完成:
针对要创建的集群容量,采用合适硬件。
采用更大单交换机配置,如单48 端口相较2x 24 端口为优。
配置上行端口聚合(port trunking),利用多个网络接口增加交换机带宽。(译者注:port trunking 指将交换机上的多个端口在物理上连接起来、在逻辑上捆绑在一起,形成一个拥有较大带宽的端口,组成一个干路,以达到均衡负载、提供备份线路、扩充带宽的目的。)
11.2.3.&多机架
多机架配置带来多交换机同样的潜在问题。导致性能降低的原因主要来自两个方面:
较低的交换机容量性能
到其他机架的上行链路不足
如果机架上的交换机有合适的交换容量,可以处理所有主机的全速通信,那么下一个问题就是如何为跨机架的集群提供足够的上行链路。最简单的避免跨机架问题的办法,是采用端口聚合来创建到其他机架的捆绑上行链路。然而该方法的缺点,是占用了本可以接主机的端口。举例:从机架A到机架B创建 8Gbps 的端口通道,要用掉24个端口中的8个来做机架间互联,ROI(投资回报率)很低;而采用太少的端口,又意味着无法充分发挥集群的能力。
机架间采用 10GbE 链接将极大提升性能。确保交换机支持 10GbE 上行端口或者支持扩展卡;相对占用普通上行端口,后者可以为你省下接主机的端口。
11.2.4. 网络接口
所有网络接口功能都正常吗?你确定?参考故障诊断用例研究。
性能调优可以先从相关配置与模式设计章节看起,那里讲了一些影响性能的主要方面:RAM、压缩、JVM 设置等等。然后,可以再看看下面的补充内容。
打开RPC-level日志
在区域服务器上打开RPC-level的日志,对深度优化很有帮助。一旦打开,日志量会非常大,所以不建议长时间开启,只看一小段时间即可。要启用RPC-level的日志,可以在区域服务器的 UI 上点击 Log Level,将 org.apache.hadoop.ipc 的日志级别设为 DEBUG,然后 tail 区域服务器的日志,进行分析。
要想关闭,只要把日志级别设回 INFO 就可以了。
11.3.&Java
11.3.1. 垃圾收集和 Apache HBase
11.3.1.1. 长时间GC停顿
在这个PPT中,Todd Lipcon 描述了 HBase 中常见的两种"世界停止"(stop-the-world)式的GC操作,尤其是在加载数据的时候:一种是 CMS 失败模式(译者注:CMS是一种GC算法),另一种是老年代堆碎片导致的。要缓解第一种,可以让 CMS 更早启动:加入 -XX:CMSInitiatingOccupancyFraction 参数并把值调低,可以先从 60% 或 70% 开始(这个值调得越低,触发的GC次数就越多,消耗的CPU时间也越长)。要缓解第二种,Todd 加入了一个实验性的功能,在 HBase 0.90.x 中需要显式开启(在 0.92.x 中默认开启):将 Configuration 中的 hbase.hregion.memstore.mslab.enabled 设置为 true。详细信息可以看这个PPT。[] Be aware that when enabled, each MemStore instance will occupy at least an MSLAB instance of memory. If you have thousands of regions or lots of regions each with many column families, this allocation of MSLAB may be responsible for a good portion of your heap allocation and in an extreme case cause you to OOME. Disable MSLAB in this case, or lower the amount of memory it uses or float less regions per server.
GC日志的更多信息,参考后面"JVM 垃圾收集日志"一节。
11.4.&配置
11.4.1. Regions的数目
HBase 中 region 的数目可以参考区域大小相关章节进行调整。
11.4.2. 管理紧缩
对于大型的系统,你需要考虑手动管理紧缩(compaction)与分裂。
11.4.3.&hbase.regionserver.handler.count
这个参数的本质是设置一个 RegionServer 可以同时处理多少请求。如果设得太高,吞吐量反而会降低;如果设得太低,请求会被阻塞,得不到响应。你可以查看日志,来决定对于你的集群什么值是合适的(请求队列也是会消耗内存的)。
11.4.4.&hfile.block.cache.size
参见 . 对于区域服务器进程的内存设置。
11.4.5.&hbase.regionserver.global.memstore.upperLimit
参见 . 这个内存设置是根据区域服务器的需要来设定。
11.4.6.&hbase.regionserver.global.memstore.lowerLimit
参见 . 这个内存设置是根据区域服务器的需要来设定。
11.4.7.&hbase.hstore.blockingStoreFiles
如果区域服务器的日志中出现因 StoreFile 过多而阻塞(block)更新的信息,提高这个值是有帮助的。
11.4.8.&hbase.hregion.memstore.block.multiplier
参见 . 如果有足够的RAM,提高这个值。
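下面给出一个最小的示意片段(假设以上参数已经写入客户端 classpath 中的 hbase-site.xml;这里只演示如何用客户端的 Configuration 读取这些键的生效值,getInt/getFloat 的第二个参数只是示例性的回退缺省值):
// 需要 org.apache.hadoop.conf.Configuration 与 org.apache.hadoop.hbase.HBaseConfiguration
Configuration conf = HBaseConfiguration.create();   // 依次加载 hbase-default.xml 和 hbase-site.xml
int handlers = conf.getInt("hbase.regionserver.handler.count", 10);
float blockCache = conf.getFloat("hfile.block.cache.size", 0.25f);
float memstoreUpper = conf.getFloat("hbase.regionserver.global.memstore.upperLimit", 0.4f);
int blockingFiles = conf.getInt("hbase.hstore.blockingStoreFiles", 7);
System.out.println("handler.count=" + handlers + ", block.cache.size=" + blockCache
    + ", memstore.upperLimit=" + memstoreUpper + ", blockingStoreFiles=" + blockingFiles);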
11.5.&ZooKeeper
配置 ZooKeeper 的信息,请参考 ZooKeeper 相关章节,特别是关于为 ZooKeeper 使用专用磁盘的部分。
11.6.&模式设计
11.6.1.& 列族的数目
11.6.2.&键和属性长度
参考模式设计章节关于键和属性长度的建议;另参考压缩一节,了解压缩方面的注意事项(compression caveats)。
11.6.3.&表的区域大小
区域大小可以按表设置:当某些表需要与缺省区域大小不同的配置时,可以通过 HTableDescriptor 的 setFileSize 来设置。
参考区域大小相关章节获取更多信息。
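下面是一个示意性片段(假设使用 0.9x 时代的 Java 客户端 API,其中按表设置区域大小的方法是 HTableDescriptor.setMaxFileSize;表名、列族名与数值仅为举例):
// 需要 org.apache.hadoop.hbase 下的 HTableDescriptor、HColumnDescriptor、HBaseConfiguration
// 以及 org.apache.hadoop.hbase.client.HBaseAdmin;调用处需处理 IOException 等异常
HTableDescriptor desc = new HTableDescriptor("myBigTable");
desc.addFamily(new HColumnDescriptor("cf"));
desc.setMaxFileSize(4L * 1024 * 1024 * 1024);   // 该表的区域长到约 4GB 才分裂(示例值)
new HBaseAdmin(HBaseConfiguration.create()).createTable(desc);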
11.6.4.&布隆过滤(Bloom Filters)
布隆过滤可以按列族单独启用。使用 HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL) 对单个列族启用布隆。缺省为 NONE,即不使用布隆过滤。设为 ROW 时,行键的哈希会在每次插入行时添加到布隆过滤器;设为 ROWCOL 时,行键 + 列族 + 列限定符(qualifier)的哈希会在每次插入时添加到布隆过滤器。
参考相关章节获取更多信息。
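一个最小示意(假设使用 0.92/0.94 时代的客户端 API,布隆类型枚举位于 org.apache.hadoop.hbase.regionserver.StoreFile.BloomType;列族名仅为举例):
HColumnDescriptor cf = new HColumnDescriptor("cf");
cf.setBloomFilterType(StoreFile.BloomType.ROW);   // 只对行键建布隆;改为 ROWCOL 则包含行键+列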
11.6.5.&列族块大小
The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved).
参考&&and&获取更多信息。
11.6.6.&内存中的列族
ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the block cache, but it is not a guarantee that the entire table will be in memory.
参考&&获取更多信息。
11.6.7.&压缩
生产系统应该在列族定义中启用压缩。参考压缩相关章节获取更多信息。
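结合 11.6.5 ~ 11.6.7 的内容,下面给出一个示意性片段(假设使用旧版 HColumnDescriptor API,且集群已安装所选压缩算法的本地库;列族名与数值仅为举例):
// Compression.Algorithm 在 0.92/0.94 中位于 org.apache.hadoop.hbase.io.hfile 包
HColumnDescriptor cf = new HColumnDescriptor("cf");
cf.setBlocksize(128 * 1024);                           // 较大的 cell 适合较大的块;缺省 64k
cf.setInMemory(true);                                  // 在块缓存中享有最高优先级,但不保证整表常驻内存
cf.setCompressionType(Compression.Algorithm.SNAPPY);   // 只压缩磁盘上的数据,内存与网络传输中仍是解压状态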
11.6.7.1.&然而...
Compression deflates data on disk. When it's in-memory (e.g., in the MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated. So while using ColumnFamily compression is a best practice, it's not going to completely eliminate the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
See the schema design tips above, and the section on how HBase stores data internally, for more information.
11.7.&HBase 通用模式
11.7.1.&常量
人们刚开始使用HBase时,趋向于写如下的代码:
Get get = new Get(rowkey);
Result r = htable.get(get);
byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
// returns current version of value
然而,特别是在循环内(和 MapReduce 工作内), 将列族和列名转为字节数组代价昂贵。最好使用字节数组常量,如下:
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();
Get get = new Get(rowkey);
Result r = htable.get(get);
byte[] b = r.getValue(CF, ATTR);
// returns current version of value
11.8.&写到 HBase
11.8.1.&批量装载
如果可以的话,尽量使用批量导入工具,参见 .否则就要详细看看下面的内容。
11.8.2.&表创建: 预创建区域(Region)
默认情况下HBase创建表会新建一个区域。执行批量导入,意味着所有的client会写入这个区域,直到这个区域足够大,以至于分裂。一个有效的提高批量导入的性能的方式,是预创建空的区域。最好稍保守一点,因为过多的区域会实实在在的降低性能。下面是一个预创建区域的例子。 (注意:这个例子里需要根据应用的key进行调整。):
public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
throws IOException {
  try {
    admin.createTable( table, splits );
    return true;
  } catch (TableExistsException e) {
    logger.info("table " + table.getNameAsString() + " already exists");
    // the table already exists...
    return false;
  }
}

public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
  byte[][] splits = new byte[numRegions-1][];
  BigInteger lowestKey = new BigInteger(startKey, 16);
  BigInteger highestKey = new BigInteger(endKey, 16);
  BigInteger range = highestKey.subtract(lowestKey);
  BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
  lowestKey = lowestKey.add(regionIncrement);
  for (int i = 0; i < numRegions-1; i++) {
    BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
    byte[] b = String.format("%016x", key).getBytes();
    splits[i] = b;
  }
  return splits;
}
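这两个辅助方法的调用方式大致如下(示意,假设行键是定长的16进制字符串;表名、列族与区域数仅为举例):
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
HTableDescriptor desc = new HTableDescriptor("myTable");
desc.addFamily(new HColumnDescriptor("cf"));
byte[][] splits = getHexSplits("00000000", "ffffffff", 10);   // 预创建 10 个区域
createTable(admin, desc, splits);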
11.8.3. 表创建: 延迟log刷写
Puts 的缺省行为是使用 Write Ahead Log (WAL),HLog 编辑会立即写盘。如果采用延迟刷写,WAL 编辑会先保留在内存中,直到刷写周期来临。好处是可以集中、异步地写 HLog,潜在问题是如果 RegionServer 退出,尚未刷写的日志编辑会丢失。但这仍然比 Puts 时完全不使用 WAL 安全。
延迟log刷写可以通过 HTableDescriptor 在表上设置;hbase.regionserver.optionallogflushinterval 的缺省值是 1000ms。
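一个最小示意(假设使用 0.90~0.94 客户端 API 中的 HTableDescriptor.setDeferredLogFlush;更新的版本中该机制已被 Durability 设置取代):
HTableDescriptor desc = new HTableDescriptor("myTable");   // 示例表名
desc.addFamily(new HColumnDescriptor("cf"));
desc.setDeferredLogFlush(true);   // WAL 编辑按 hbase.regionserver.optionallogflushinterval 周期批量落盘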
11.8.4. HBase 客户端: 自动刷写
当你进行大量 Put 的时候,要确认你的 HTable 的 setAutoFlush 是关闭的。否则的话,每执行一个 Put 就会向区域服务器发一个请求。通过 htable.add(Put) 和 htable.add(List<Put>) 将 Put 添加到写缓冲中。如果 autoFlush = false,要等到写缓冲填满的时候才会发起请求。要想显式地发起请求,可以调用 flushCommits。在 HTable 实例上调用 close 也会触发 flushCommits。
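一个示意性片段(假设使用旧版 HTable API,写入缓冲实际通过 HTable.put(...) 完成;表名、列族等仅为举例):
// 需要 org.apache.hadoop.hbase.client 下的 HTable、Put,以及 java.util 的 ArrayList、List
HTable htable = new HTable(HBaseConfiguration.create(), "myTable");
htable.setAutoFlush(false);                       // 关闭自动刷写,Put 先进入客户端写缓冲
List<Put> puts = new ArrayList<Put>();
for (int i = 0; i < 10000; i++) {
  Put p = new Put(Bytes.toBytes("row-" + i));
  p.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes(i));
  puts.add(p);
}
htable.put(puts);                                 // 批量加入写缓冲
htable.flushCommits();                            // 显式刷写;htable.close() 也会触发
htable.close();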
11.8.5. HBase 客户端: 在Puts上关闭WAL
一个经常被讨论的提高 Put 吞吐量的选项是调用 writeToWAL(false)。关闭它意味着 RegionServer 不再将 Put 写入 Write Ahead Log,而只写到内存。然而后果是如果 RegionServer 失败,将导致数据丢失。如果调用 writeToWAL(false),需保持高度警惕;而且如果你的负载已经很好地分布到整个集群,你可能会发现它实际上带来的差别很小。
通常而言,最好对 Puts 使用 WAL;如果更关心导入吞吐量,应改用批量装载(bulk loading)等替代技术,而不是关闭 WAL。
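一个示意性片段(假设使用 0.90~0.94 客户端 API 中的 Put.setWriteToWAL;再次提醒,关闭 WAL 意味着 RegionServer 宕机时这部分数据会丢失):
Put p = new Put(Bytes.toBytes("rowkey"));
p.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("value"));
p.setWriteToWAL(false);   // 跳过 WAL,只写 MemStore;仅适用于可以容忍丢数据的场景
htable.put(p);            // htable 为已创建的 HTable 实例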
11.8.6.&HBase 客户端: RegionServer 成组写入
In addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush. There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own version for those still on 0.90.x or earlier.
11.8.7.&MapReduce: 跳过 Reducer
When writing a lot of data to an HBase table from a MR job (e.g., with ), and specifically where Puts are being emitted from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
For summary jobs where HBase is used as a source and a sink, writes will come from the Reducer step (e.g., summarize values then write out the result). This is a different processing problem than the above case.
11.8.8.&Anti-Pattern: One Hot Region
If all your data is being written to one region at a time, then re-read the section above on processing timeseries data.
Also, if you are pre-splitting regions and all your data is still winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a variety of reasons that regions may appear "well split" but won't work with your data. As the HBase client communicates directly with the RegionServers, the region for a given key can be obtained from the client.
参考相关的行键设计与预分裂章节。
11.9. 从HBase读
11.9.1.&Scan 缓存
如果 HBase 的输入源是一个 MapReduce Job,要确保输入 Scan 的 setCaching 值比默认值(1)大。使用默认值就意味着 map-task 每处理一行都会请求一次 region-server。可以把这个值设为 500,这样就可以一次传输 500 行。当然这也需要权衡:过大的值会同时消耗客户端和服务端很多内存,并不是越大越好。
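一个示意性片段(数值仅为举例;若该 Scan 用作 MapReduce 输入,还应按 11.9.5 的建议关闭块缓存):
Scan scan = new Scan();
scan.setCaching(500);          // 每次 RPC 取回 500 行,减少 next() 的网络往返
scan.setCacheBlocks(false);    // 全表扫描不要污染 RegionServer 的块缓存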
11.9.1.1.&Scan Caching in MapReduce Jobs
Scan settings in MapReduce jobs deserve special attention. Timeouts can result (e.g., UnknownScannerException) in Map tasks if it takes longer to process a batch of records before the client goes back to the RegionServer for the next set of data. This problem can occur because there is non-trivial processing occurring per row. If you process rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes), then set caching lower.
Timeouts can also happen in a non-MapReduce use case (i.e., single threaded HBase client doing a Scan), but the processing that is often performed in MapReduce jobs tends to exacerbate this issue.
11.9.2.&Scan 属性选择
当 Scan 用来处理大量的行的时候(尤其是作为 MapReduce 的输入),要注意选择了哪些属性。如果调用了 scan.addFamily,这个列族的所有属性都会返回给客户端。如果只需要其中一小部分属性,就应该只指定那几列(addColumn),否则会造成很大浪费,影响性能。
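示意如下(列族与列名仅为举例):
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));   // 只取需要的那一列
// 相比之下,scan.addFamily(Bytes.toBytes("cf")) 会把该列族的所有列都传回客户端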
11.9.3.&MapReduce - 输入分割
For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to have the same Input Split (i.e., the RegionServer serving the data), see the Troubleshooting Case Study in the case studies chapter.
11.9.4.&关闭 ResultScanners
这与其说是提高性能,倒不如说是避免发生性能问题。如果你忘记了关闭,会导致RegionServer出现问题。所以一定要把ResultScanner包含在try/catch 块中...
Scan scan = new Scan();
// set attrs...
ResultScanner rs = htable.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result...
  }
} finally {
  rs.close();  // always close the ResultScanner!
}
htable.close();
11.9.5. 块缓存
Scan 实例可以通过 setCacheBlocks 方法控制是否使用 RegionServer 中的块缓存。如果 Scan 是 MapReduce 的输入源,应将这个值设置为 false。对于经常被访问的行,则建议使用块缓存。
11.9.6. 行键的负载优化
当扫描一个表,而且仅仅需要行键(不需要列族、列限定符、值和时间戳)时,可以用 Scanner 的 setFilter 方法加入一个使用 MUST_PASS_ALL 操作符(译者注:相当于 And 操作符)的 FilterList,其中同时包含一个 FirstKeyOnlyFilter 和一个 KeyOnlyFilter。通过这样的 filter 组合,即使在最坏的情况下,RegionServer 也只会从磁盘读取一个值,同时最小化返回给客户端的网络带宽占用。
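一个最小示意(FilterList、FirstKeyOnlyFilter 与 KeyOnlyFilter 均位于 org.apache.hadoop.hbase.filter 包):
Scan scan = new Scan();
FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
filters.addFilter(new FirstKeyOnlyFilter());   // 每行只读第一个 KeyValue
filters.addFilter(new KeyOnlyFilter());        // 只返回键,不返回值
scan.setFilter(filters);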
11.9.7.&并发: 监测数据扩散
当优化大量读取时,要监测数据在目标表中的分布情况。如果目标表的区域太少,读取时就会像只有很少几个节点在提供服务一样。
11.9.8.&布隆过滤(Bloom Filters)
启用布隆过滤可以避免一些不必要的磁盘读取,有助于改进读取延迟。
布隆过滤在相应的 JIRA 中开发。[][]
11.9.8.1.&Bloom StoreFile footprint
Bloom filters add an entry to the StoreFile general FileInfo data structure and then two extra entries to the StoreFile metadata section.
11.9.8.1.1.&BloomFilter in the&StoreFile&FileInfo&data structure
FileInfo has a BLOOM_FILTER_TYPE entry which is set to NONE, ROW or ROWCOL.
11.9.8.1.2.&BloomFilter entries in&StoreFile&metadata
BLOOM_FILTER_META holds Bloom Size, Hash Function used, etc. It's small in size and is cached on StoreFile.Reader load.
BLOOM_FILTER_DATA is the actual bloomfilter data. Obtained on-demand. Stored in the LRU cache, if it is enabled (it's enabled by default).
11.9.8.2.&布隆过滤(Bloom Filter) 配置
11.9.8.2.1. io.hfile.bloom.enabled 全局开关
配置文件中的 io.hfile.bloom.enabled 是一个在出错时用于全局关闭布隆过滤的开关(kill switch)。缺省值 = true。
11.9.8.2.2.&io.hfile.bloom.error.rate
io.hfile.bloom.error.rate = 平均误报率(average false positive rate),缺省 = 1%。误报率每降低一半(如降到 0.5%),每个布隆条目约增加 1 位(bit)。
11.9.8.2.3.&io.hfile.bloom.max.fold
io.hfile.bloom.max.fold = 保证的最小折叠率(guaranteed minimum fold rate),大多数时候不用管。缺省 = 7,即可以折叠压缩到原始大小的至少 1/128。想了解本选项的更多意义,参看本文档的"开发进程"(Development Process)一节。
11.10.&从HBase删除
11.10.1.&将 HBase 表当 Queues用
HBase tables are sometimes used as queues. In this case, special care must be taken to regularly perform major compactions on tables used in this manner. As documented elsewhere, marking rows as deleted creates additional StoreFiles which then need to be processed on reads. Tombstones only get cleaned up with major compactions.
参考 & 和&.
11.10.2.&删除的 RPC 行为
Be aware that htable.delete(Delete) doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation. For a large number of deletes, consider htable.delete(List).
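一个示意性片段(htable 为已创建的 HTable 实例,rowsToRemove 为假设的待删除行键集合):
List<Delete> deletes = new ArrayList<Delete>();
for (byte[] row : rowsToRemove) {
  deletes.add(new Delete(row));
}
htable.delete(deletes);   // 按批提交,避免每个 Delete 都单独发起一次 RegionServer RPC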
11.11.&HDFS
由于 HBase 运行在 HDFS 之上,理解 HDFS 如何工作以及它如何影响 HBase 非常重要。
11.11.1.&Current Issues With Low-Latency Reads
The original use-case for HDFS was batch processing. As such, low-latency reads were historically not a priority. With the increased adoption of HBase this is changing, and several improvements are already in development. 参考相关的 HDFS 改进 JIRA。
11.11.2.&Leveraging local data
Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0), it is possible for the DFSClient to take a "short circuit" and read directly from disk instead of going through the DataNode when the data is local. What this means for HBase is that the RegionServers can read directly off their machine's disks instead of having to open a socket to talk to the DataNode, the former being generally much faster[]. Also see the related mailing-list thread for more discussion around short circuit reads.
To enable "short circuit" reads, you must set two configurations. First, the hdfs-site.xml needs to be amended. Set the property dfs.block.local-path-access.user to be the only user that can use the shortcut. This has to be the user that started HBase. Then in hbase-site.xml, set dfs.client.read.shortcircuit to be true.
For optimal performance when short-circuit reads are enabled, it is recommended that HDFS checksums are disabled. To maintain data integrity with HDFS checksums disabled, HBase can be configured to write its own checksums into its datablocks and verify against these. See&.
The DataNodes need to be restarted in order to pick up the new configuration. Be aware that if a process started under another username than the one configured here also has the shortcircuit enabled, it will get an Exception regarding an unauthorized access but the data will still be read.
11.11.3.&Performance Comparisons of HBase vs. HDFS
A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this processing context. Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS will always be faster in this use-case.
11.12.&Amazon EC2
Performance questions are common on Amazon EC2 environments because it is a shared environment. You will not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same reason (i.e., it's a shared environment and you don't know what else is happening on the server).
If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front, because EC2 issues are practically a separate class of performance issues.
11.13.&Case Studies
For Performance and Troubleshooting Case Studies, see&.
[] The latest JVMs do better with regard to fragmentation, so make sure you are running a recent release. Read down in the referenced message for details.
[] For a description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the Development Process section of the document attached to the bloom filter JIRA.
[] The bloom filters described here are actually version two of blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the One-Lab project. The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well. Version 2 is a rewrite from scratch, though again it starts with the one-lab work.
[] See JD's talk referenced above.
Chapter 12. HBase的故障排除和Debug
Table of Contents
12.1.&一般准则
总是先从主服务器的日志开始(TODO: 哪些行?)。通常情况下,它总是一行一行地重复同样的信息;如果不是这样,说明出了问题,可以用 Google 或邮件列表检索来搜索遇到的异常。
错误很少单独出现在 HBase 中,通常是某一个地方出了问题,进而在各处引起大量异常和调用栈跟踪信息。遇到这样的错误,最好的办法是往前翻日志,找到最初的异常。例如区域服务器会在退出的时候打印一些度量信息,grep 这个转储应该可以找到最初的异常信息。
区域服务器的"自杀"是很"正常"的。当一些事情出错时,它们就会自杀。如果 ulimit 和 xcievers(最重要的两个设定,详见相关配置章节)没有修改,HDFS 将无法正常运转,在 HBase 看来 HDFS 就像死掉了一样。假想一下,如果你的 MySQL 突然无法访问它的文件系统,它会怎么做;同样的事情也会发生在 HBase 和 HDFS 上。另一个造成区域服务器"切腹自杀"的常见原因是,它们执行了一个长时间的 GC 操作,超过了 ZooKeeper 的会话超时时长。关于长时间 GC 停顿的详细信息,参见 Todd Lipcon 的演讲以及上面关于长时间GC停顿的章节。
12.2.&Logs
重要日志的位置(<user> 是启动服务的用户,<hostname> 是机器的名字)
NameNode: $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log
DataNode: $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log
JobTracker: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
TaskTracker: $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log
HMaster: $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log
RegionServer: $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log
ZooKeeper: TODO
12.2.1.&Log 位置
对于单节点模式,Log都会在一台机器上,但是对于生产环境,都会运行在一个集群上。
12.2.1.1.&NameNode
NameNode的日志在NameNode server上。HBase Master 通常也运行在NameNode server上,ZooKeeper通常也是这样。
对于小一点的机器,JobTracker也通常运行在NameNode server上面。
12.2.1.2.&DataNode
每一台 DataNode server 上有一份 HDFS 的 DataNode 日志,还有一份 RegionServer 的 HBase 日志。
每个DataNode server还有一份TaskTracker的日志,来记录MapReduce的Task信息。
12.2.2.&日志级别
12.2.2.1.&启用 RPC级别日志
Enabling the RPC-level logging on a RegionServer can often give insight on timings at the server. Once enabled, the amount of log spewed is voluminous. It is not recommended that you leave this logging on for more than short bursts of time. To enable RPC-level logging, browse to the RegionServer UI and click on Log Level. Set the log level to DEBUG for the package org.apache.hadoop.ipc (that's right, for hadoop.ipc, NOT hbase.ipc). Then tail the RegionServer's log. Analyze.
To disable, set the logging level back to INFO level.
12.2.3.&JVM 垃圾收集日志
HBase is memory intensive, and using the default GC you can see long pauses in all threads, including the "Juliet Pause" aka "GC of Death". To help debug this, or to confirm it is happening, GC logging can be turned on in the Java virtual machine.
To enable, in hbase-env.sh add:
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/hbase/logs/gc-hbase.log"
Adjust the log directory to wherever you log. Note: The GC log does NOT roll automatically, so you'll have to keep an eye on it so it doesn't fill up the disk.
At this point you should see logs like so:
: [GC [1 CMS-initial-mark: 55704K)] 61272K), 0.0007360 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
: [CMS-concurrent-mark-start]
: [GC : [ParNew: 5567K->576K(5568K), 0.0101110 secs] 2817105K->61272K), 0.0102200 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
In this section, the first line indicates a 0.0007360 second pause for the CMS to initially mark. This pauses the entire VM, all threads for that period of time.
The third line indicates a &minor GC&, which pauses the VM for 0.0101110 seconds - aka 10 milliseconds. It has reduced the &ParNew& from about 5.5m to 576k. Later on in this cycle we see:
: [CMS-concurrent-mark: 1.542/2.492 secs] [Times: user=10.49 sys=0.33, real=2.49 secs]
: [CMS-concurrent-preclean-start]
: [GC : [ParNew: 5505K->573K(5568K), 0.0062440 secs] 2868746K->61272K), 0.0063360 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
: [GC : [ParNew: 5563K->575K(5568K), 0.0072510 secs] 2869283K->61272K), 0.0073320 secs] [Times: user=0.05 sys=0.01, real=0.01 secs]
: [GC : [ParNew: 5517K->573K(5568K), 0.0120390 secs] 2869780K->61272K), 0.0121150 secs] [Times: user=0.09 sys=0.00, real=0.01 secs]
: [GC : [ParNew: 5507K->569K(5568K), 0.0086240 secs] 2870200K->61272K), 0.0087180 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
: [GC : [ParNew: 5516K->575K(5568K), 0.0107130 secs] 2870689K->61272K), 0.0107820 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
: [CMS-concurrent-preclean: 0.070/0.133 secs] [Times: user=0.48 sys=0.01, real=0.14 secs]
: [CMS-concurrent-abortable-preclean-start]
: [GC : [ParNew: 5504K->571K(5568K), 0.0087270 secs] 2871220K->61272K), 0.0088220 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
: [GC : [ParNew: 5512K->569K(5568K), 0.0063370 secs] 2871771K->61272K), 0.0064230 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
: [CMS-concurrent-abortable-preclean: 0.007/0.037 secs] [Times: user=0.13 sys=0.00, real=0.03 secs]
: [GC[YG occupancy: 645 K (5568 K)]: [Rescan (parallel) , 0.0020210 secs]: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 55704K)] 61272K), 0.0049380 secs] [Times: user=0.00 sys=0.01, real=0.01 secs]
: [CMS-concurrent-sweep-start]
The first line indicates that the CMS concurrent mark (finding garbage) has taken 2.4 seconds. But this is a _concurrent_ 2.4 seconds, Java has not been paused at any point in time.
There are a few more minor GCs, then there is a pause at the 2nd last line:
: [GC[YG occupancy: 645 K (5568 K)]: [Rescan (parallel) , 0.0020210 secs]: [weak refs processing, 0.0027950 secs] [1 CMS-remark: 55704K)] 61272K), 0.0049380 secs] [Times: user=0.00 sys=0.01, real=0.01 secs]
The pause here is 0.0049380 seconds (aka 4.9 milliseconds) to 'remark' the heap.
At this point the sweep starts, and you can watch the heap size go down:
: [GC : [ParNew: 5501K->569K(5568K), 0.0097350 secs] 2871958K->61272K), 0.0098370 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
lines removed ...
: [GC : [ParNew: 5532K->568K(5568K), 0.0070720 secs] 1365024K->61272K), 0.0071930 secs] [Times: user=0.05 sys=0.00, real=0.01 secs]
: [CMS-concurrent-sweep: 2.030/3.332 secs] [Times: user=9.57 sys=0.26, real=3.33 secs]
At this point, the CMS sweep took 3.332 seconds, and heap went from about ~ 2.8 GB to 1.3 GB (approximate).
The key point here is to keep all these pauses low. CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high as 400ms.
This can be due to the size of the ParNew, which should be relatively small. If your ParNew is very large after running HBase for a while (in one example a ParNew was about 150MB), then you might have to constrain the size of ParNew (the larger it is, the longer the collections take, but if it's too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
Add this to HBASE_OPTS:
export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m <cms options from above> <gc logging options from above>"
For more information on GC pauses, see the 3-part blog post by Todd Lipcon and the section on long GC pauses above.
12.3.&资源
12.3.1. search-hadoop.com
search-hadoop.com 索引了全部邮件列表,很适合做历史检索。有问题时先在这里查询,因为别人可能已经遇到过你的问题。
12.3.2.&邮件列表
Ask a question on the mailing lists. The 'dev' mailing list is aimed at the community of developers actually building HBase and for features currently under development, and 'user' is generally used for questions on released versions of HBase. Before going to the mailing list, make sure your question has not already been answered by searching the mailing list archives first (use search-hadoop.com). Take some time crafting your question[]; a quality question that includes all context and exhibits evidence the author has tried to find answers in the manual and out on lists is more likely to get a prompt response.
12.3.3.&IRC
#hbase on irc.freenode.net
12.3.4.&JIRA
JIRA 在处理 Hadoop/HBase 相关问题时也很有帮助。
12.4.&工具
12.4.1. 内置工具
12.4.1.1. 主服务器Web接口
主服务器启动了一个缺省端口是 60010的web接口。
The Master web UI lists created tables and their definition (e.g., ColumnFamilies, blocksize, etc.). Additionally, the available RegionServers in the cluster are listed along with selected high-level metrics (requests, number of regions, usedHeap, maxHeap). The Master web UI allows navigation to each RegionServer's web UI.
12.4.1.2.&区域服务器Web接口
区域服务器启动了一个缺省端口是 60030的web接口。
The RegionServer web UI lists online regions and their start/end keys, as well as point-in-time RegionServer metrics (requests, regions, storeFileIndexSize, compactionQueueSize, etc.).
参考&&获取更多度量信息。
12.4.1.3.&zkcli
zkcli是一个研究 ZooKeeper相关问题的有用工具。调用:
./hbase zkcli -server host:port <cmd> <args>
命令 (和参数) :
connect host:port
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
printwatches on|off
create [-s] [-e] path data acl
stat path [watch]
ls2 path [watch]
listquota path
setAcl path acl
getAcl path
redo cmdno
addauth scheme auth
delete path [version]
setquota -n|-b val path
12.4.2.&外部工具
12.4.2.1&tail
tail 是一个命令行工具,可以用来看日志的末尾。加上 "-f" 参数后,它会在有新数据写入时自动刷新,用它来盯日志很方便。例如,当一台机器需要花很多时间来启动或关闭时,你可以 tail 它的 master log(也可以是 RegionServer 的 log)。
12.4.2.2&top
top是一个很重要的工具来看你的机器各个进程的资源占用情况。下面是一个生产环境的例子:
top - 14:46:59 up 39 days, 11:55, … load average: 3.75, 3.57, 3.84
Tasks: 309 total, 1 running, 308 sleeping, 0 stopped, …
Cpu(s): … 0.0%ni, 91.7%id, …
Mem:  … total, … used, 117476k free, 7196k buffers
Swap: … total, 14348k used, … free, … cached

  PID USER   PR NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
15558 hadoop 18 -2   …   …g  3556 S   …    …  6523:52 java
13268 hadoop 18 -2   …   …g  4104 S   …    …  5170:30 java
 8895 hadoop 18 -2   …   …m  3420 S   …    …  4002:32 java
这里你可以看到系统的 load average 在最近5分钟是 3.75,意思是说这5分钟里平均有 3.75 个线程在等待 CPU 时间。通常来说,最完美的情况是这个值与 CPU 核数相等:比这个值低意味着资源闲置,比这个值高就是过载了。这是一个重要的概念,要想了解更多,可以参考相关文章。
接着看内存,可以看到系统几乎用掉了全部 RAM,其中大部分都用于 OS cache(这是一件好事)。Swap 只使用了一点点 KB,这正是我们期望的;如果数值很高,就意味着正在发生交换,这对 Java 程序的性能是致命的。另一种检测交换的方法是看 load average 是否过高(load average 过高还可能是磁盘损坏或者其它原因导致的)。
默认情况下进程列表不是很有用,我们只能看到 3 个 Java 进程使用了 111% 的 CPU。要想知道每个进程具体是什么,可以输入 "c",每一行就会展开完整的命令行信息。输入 "1" 可以显示 CPU 每个核的具体状况。
12.4.2.3 &jps
jps是JDK集成的一个工具,可以用来看当前用户的Java进程id。(如果是root,可以看到所有用户的id),例如:
hadoop@sv4borg12:~$ jps
1322 TaskTracker
17789 HRegionServer
27862 Child
1158 DataNode
25115 HQuorumPeer
19750 ThriftServer
Hadoop TaskTracker,管理本地的Task
HBase RegionServer,提供region的服务
Child, 一个 MapReduce task,无法看出详细类型
Hadoop DataNode, 管理blocks
HQuorumPeer, ZooKeeper集群的成员
Jps, 就是这个进程
ThriftServer, 当 thrift 启动后,就会有这个进程
jmx, 这个是本地监控平台的进程。你可以不用这个。
接着,你可以用 ps 看到该进程启动时的完整命令行信息:
hadoop@sv4borg12:~$ ps aux | grep HRegionServer
1.2 4364 ?
Mar04 9855:48 /usr/java/jdk1.6.0_14/bin/java -Xmx8000m -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/export1/hadoop/logs/gc-hbase.log -Dcom.sun.management.jmxremote.port=10102 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=/home/hadoop/hbase/conf/jmxremote.password -Dcom.sun.management.jmxremote -Dhbase.log.dir=/export1/hadoop/logs -Dhbase.log.file=hbase-hadoop-regionserver-sv4borg12.log -Dhbase.home.dir=/home/hadoop/hbase -Dhbase.id.str=hadoop -Dhbase.root.logger=INFO,DRFA -Djava.library.path=/home/hadoop/hbase/lib/native/Linux-amd64-64 -classpath /home/hadoop/hbase/bin/../conf:[many jars]:/home/hadoop/hadoop/conf org.apache.hadoop.hbase.regionserver.HRegionServer start
12.4.2.4 &jstack
jstack 是(除了看日志之外)最重要的 Java 工具,可以看到某个 Java 进程具体在做什么。使用时先用 jps 找到进程 Id,再对其执行 jstack。它会按线程的创建顺序显示线程列表,以及每个线程正在做什么。下面是一些例子:
第一个是 RegionServer 的主线程,正在等待 master 返回信息:
"regionserver60020" prio=10 tid=0xab4000 nid=0x45cf waiting on condition [0xa907f16b6a96a70]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for
&0xc2f30& (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:395)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:647)
at java.lang.Thread.run(Thread.java:619)
The MemStore flusher thread that is currently flushing to a file:
"regionserver60020.cacheFlusher" daemon prio=10 tid=0xf4e000 nid=0x45eb in Object.wait() [0xb807f16b5b87af0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:803)
- locked &0xb3a8& (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
at $Proxy1.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3390)
- locked &0xb470& (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3304)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:650)
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:853)
at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:467)
- locked &0xe6f08& (a java.lang.Object)
at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:427)
at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:80)
at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1359)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:907)
at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:834)
at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:786)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:250)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:224)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
一个处理线程是在等一些东西(例如put, delete, scan...):
"IPC Server handler 16 on 60020" daemon prio=10 tid=0xd800 nid=0x4a5e waiting on condition [0x00007f16afefd000..0x00007f16afefd9f0]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for
&0xf8dd8& (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1013)
有一个线程正在忙,在递增一个counter(这个阶段是正在创建一个scanner来读最新的值):
"IPC Server handler 66 on 60020" daemon prio=10 tid=0xe800 nid=0x4a90 runnable [0x00007f16acb707f16acb77cf0]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.&init&(KeyValueHeap.java:56)
at org.apache.hadoop.hbase.regionserver.StoreScanner.&init&(StoreScanner.java:79)
at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1202)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.&init&(HRegion.java:2209)
at org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1063)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1055)
at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1039)
at org.apache.hadoop.hbase.regionserver.HRegion.getLastIncrement(HRegion.java:2875)
at org.apache.hadoop.hbase.regionserver.HRegion.incrementColumnValue(HRegion.java:2978)
at org.apache.hadoop.hbase.regionserver.HRegionServer.incrementColumnValue(HRegionServer.java:2433)
at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:560)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1027)
还有一个线程在从HDFS获取数据。
"IPC Client (47) connection to sv4borg9/10.4.24.40:9000 from hadoop" daemon prio=10 tid=0xd0000 nid=0x4fa3 runnable [0xd000..0xdbf0]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked &0xb68c00& (a sun.nio.ch.Util$1)
- locked &0xb68be8& (a java.util.Collections$UnmodifiableSet)
- locked &0x9b50& (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:304)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked &0x9178& (a java.io.BufferedInputStream)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:569)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:477)
这里是一个RegionServer死了,master正在试着恢复。
"LeaseChecker" daemon prio=10 tid=0xef800 nid=0x76cd waiting on condition [0xeae07f6d0eae2a70]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.hadoop.ipc.Client.call(Client.java:726)
- locked &0xcd28f80& (a org.apache.hadoop.ipc.Client$Call)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.recoverBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2636)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.&init&(DFSClient.java:2832)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:529)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:186)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:530)
at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:619)
at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1322)
at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:648)
at org.apache.hadoop.hbase.master.HMaster.joinCluster(HMaster.java:572)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:503)
12.4.2.5 &OpenTSDB
OpenTSDB 是 Ganglia 的一个很好的替代品,因为它使用 HBase 来存储所有时序数据,而且不做采样。使用 OpenTSDB 来监控你的 HBase 集群是一个很好的实践。
这里有一个例子,集群正在同时进行上百个紧缩,严重影响了IO性能。(TODO: 在这里插入compactionQueueSize的图片)
给集群构建一个图表监控是一个很好的实践。包括集群和每台机器。这样就可以快速定位到问题。例如,在StumbleUpon,每个机器有一个图表监控,包括OS和HBase,涵盖所有的重要的信息。你也可以登录到机器上,获取更多的信息。
12.4.2.6 &clusterssh+top
clusterssh+top 感觉像是一个穷人版的监控系统,但当你只有几台机器时,它确实很有效,也很容易设置。启动 clusterssh 后,每台机器会有一个终端,另外还有一个汇总终端,你在汇总终端上的操作会同步到其他每一个终端。这意味着,你敲一次 "top",集群中的所有机器就会同时给你全部的 top 信息;也可以用同样的方式 tail 所有机器的日志,等等。
12.5.&客户端
HBase 客户端的更多信息, 参考 .
12.5.1. ScannerTimeoutException 或 UnknownScannerException
当从客户端到 RegionServer 的 RPC 请求超时,就会抛出这类异常。例如,如果 Scan.setCaching 的值设置为 500,那么每 500 次 .next() 调用才会发起一次获取 500 行数据的 RPC 请求;因为数据是以大块的形式传到客户端的,处理这一批数据的时间过长就可能造成超时。把 setCaching 的值调小是一个解决办法,但是这个值设得太小又会影响性能。
12.5.2.&普通操作时,Shell 或客户端应用抛出很多不太重要的异常
Since 0.20.0 the default log level for org.apache.hadoop.hbase.* is DEBUG.
On your clients, edit $HBASE_HOME/conf/log4j.properties and change this: log4j.logger.org.apache.hadoop.hbase=DEBUG to this: log4j.logger.org.apache.hadoop.hbase=INFO, or even log4j.logger.org.apache.hadoop.hbase=WARN.
12.5.3.&压缩时客户端长时暂停
这是 HBase 讨论列表(dist-list)上经常被问到的问题。场景一般是客户端正在向一个相对未经调优的 HBase 集群插入大量数据,紧缩会加重暂停,尽管它并不是问题的源头。
参考前面关于预创建区域的模式,并确认表没有从单个区域开始。
参考集群配置相关章节,特别是 hbase.hstore.blockingStoreFiles、hbase.hregion.memstore.block.multiplier、MAX_FILESIZE(区域大小)和 MEMSTORE_FLUSHSIZE。
A slightly longer explanation of why pauses can happen is as follows: Puts are sometimes blocked on the MemStores, which are blocked by the flusher thread, which is blocked because there are too many files to compact, because the compactor is given too many small files to compact and has to compact the same data repeatedly. This situation can occur even with minor compactions. Compounding this situation, HBase doesn't compress data in memory. Thus, the 64MB that lives in the MemStore could become a 6MB file after compression - which results in a smaller StoreFile. The upside is that more data is packed into the same region, but performance is achieved by being able to write larger files - which is why HBase waits until the flushsize before writing a new StoreFile. And smaller StoreFiles become targets for compaction. Without compression the files are much bigger and don't need as much compaction, however this is at the expense of I/O.
For additional information, see this thread on .
12.5.4.&ZooKeeper 客户端连接错误
错误类似于...
11/07/05 11:26:41 WARN zookeeper.ClientCnxn: Session 0x0 for server null,
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/07/05 11:26:43 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
11/07/05 11:26:44 WARN zookeeper.ClientCnxn: Session 0x0 for server null,
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/07/05 11:26:45 INFO zookeeper.ClientCnxn: Opening socket connection to
server localhost/127.0.0.1:2181
... 要么是 ZooKeeper 不在了,或网络不可达问题。
可以帮助调查 ZooKeeper 问题。
12.5.5.&客户端内存耗尽,但堆大小看起来不太变化( off-heap/direct heap 在增长)
You are likely running into the issue that is described and worked through in the mail thread HBase, mail # user - Suspected memory leak and continued over in HBase, mail # dev - FeedbackRe: Suspected memory leak. A workaround is passing your client-side JVM a reasonable value for -XX:MaxDirectMemorySize. By default, the MaxDirectMemorySize is equal to your -Xmx max heapsize setting (if -Xmx is set). Try setting it to something smaller (for example, one user had success setting it to 1g when they had a client-side heap of 12g). If you set it too small, it will bring on FullGCs so keep it a bit hefty. You want to make this setting client-side only, especially if you are running the new experimental server-side off-heap cache since this feature depends on being able to use big direct buffers (you may have to keep separate client-side and server-side config dirs).
12.5.6.&客户端变慢,在调用管理方法(flush, compact, 等)时发生
该客户端问题已在 0.90.6 版本中修复。起因是客户端存在 ZooKeeper 泄漏:每一次对管理 API 的额外调用都会产生新的 ZooKeeper 事件,不断冲击客户端。
12.5.7.&安全客户端不能连接 ([由 GSSException 引起: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)])
There can be several causes that produce this symptom.
First, check that you have a valid Kerberos ticket. One is required in order to set up communication with a secure Apache HBase cluster. Examine the ticket currently in the credential cache, if any, by running the klist command line utility. If no ticket is listed, you must obtain a ticket by running the kinit command with either a keytab specified, or by interactively entering a password for the desired principal.
Then, consult the relevant Java security troubleshooting documentation. The most common problem addressed there is resolved by setting the javax.security.auth.useSubjectCredsOnly system property value to false.
Because of a change in the format in which MIT Kerberos writes its credentials cache, there is a bug in the Oracle JDK 6 Update 26 and earlier that causes Java to be unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher. If you have this problematic combination of components in your environment, to work around this problem, first log in with kinit and then immediately refresh the credential cache with kinit -R. The refresh will rewrite the credential cache without the problematic formatting.
Finally, depending on your Kerberos configuration, you may need to install the Java Cryptography Extension, or JCE. Ensure the JCE jars are on the classpath on both server and client systems.
You may also need to download the&. Uncompress and extract the downloaded file, and install the policy jars into &java-home&/lib/security.
12.6.&MapReduce
12.6.1.&你认为自己在用集群, 实际上你在用本地(Local)
如下的调用栈在使用 ImportTsv时发生,但同样的事可以在错误配置的任何任务中发生。
WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:111)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:560)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:383)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:776)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
at org.apache.hadoop.hbase.mapreduce.hadoopbackport.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:296)
.. 看到调用栈的关键部分了吗?就是...
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
LocalJobRunner 意思就是任务跑在本地,不在集群。
参考相关章节,获取更多关于 HBase MapReduce 任务和 classpath 的信息。
12.7.&NameNode
NameNode 更多信息, 参考 .
12.7.1.&表和区域的HDFS 工具
要确定HBase 用了HDFS多大空间,可在NameNode使用 hadoop shell命令,例如...
hadoop fs -dus /hbase/
...返回全部HBase对象磁盘占用的情况。
hadoop fs -dus /hbase/myTable
...返回HBase表'myTable'磁盘占用的情况。
hadoop fs -du /hbase/myTable
...返回HBase的'myTable'表的各区域列表的磁盘占用情况。
更多关于 HDFS shell 命令的信息,参考 .
12.7.2.&浏览 HDFS ,查看 HBase 对象
有时需要浏览HDFS上的 HBase对象 。对象包括WALs (Write Ahead Logs), 表,区域,存储文件等。最简易的方法是在NameNode web应用中查看,端口 50070。NameNode web 应用提供到集群中所有 DataNode 的链接,可以无缝浏览。
存储在HDFS集群中的HBase表的目录结构是...
/hbase
    /<Table>                 (Tables in the cluster)
        /<Region>            (Regions for the table)
            /<ColumnFamily>  (ColumnFamilies for the Region for the table)
                /<StoreFile> (StoreFiles for the ColumnFamily for the Regions for the table)
HDFS 中的HBase WAL目录结构是..
/hbase
    /.logs
        /<RegionServer>      (RegionServers)
            /<HLog>          (WAL HLog files for the RegionServer)
参考 获取其他非Shell诊断工具如fsck.
12.7.2.1.&用例
查询 HDFS 中 HBase 对象的一个常见用例,是研究表未紧缩的程度。如果每个列族有大量 StoreFile,说明可能需要进行主紧缩。另外,如果主紧缩之后得到的 StoreFile 仍然很小,可能意味着该表的列族数量需要减少。
12.8.&网络
12.8.1.&网络峰值(Network Spikes)
如果看到周期性网络峰值,你可能需要检查compactionQueues,是不是主紧缩正在进行。
参考紧缩相关章节,获取更多管理紧缩的信息。
12.8.2.&回环IP(Loopback IP)
HBase 希望回环 IP 地址是 127.0.0.1。参考开始章节的相关说明。
12.8.3.&网络接口
所有网络接口是否正常?你确定吗?参考故障诊断用例研究。
12.9.&区域服务器
RegionServer 的更多信息,参考&.
12.9.1.&启动错误
12.9.1.1.&主服务器启动了,但区域服务器没有启动
主服务器以为区域服务器的 IP 地址是 127.0.0.1,也就是 localhost,结果解析到了主服务器自己的 localhost 上。
原因是区域服务器错误地通知主服务器,它们的 IP 地址是 127.0.0.1。
修改区域服务器的 /etc/hosts 从...
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 fully.qualified.regionservername regionservername localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
... 改为(将区域服务器主机名从 localhost 一行中移掉)...
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
12.9.1.2.&Compression Link Errors
因为 LZO 压缩库需要在集群中的每台机器上都安装,所以这是一个常见的启动失败原因。如果你看到了如下信息:
11/02/20 01:32:15 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1734)
at java.lang.Runtime.loadLibrary0(Runtime.java:823)
at java.lang.System.loadLibrary(System.java:1028)
就意味着你的压缩库出现了问题。参见配置章节的 .
12.9.2. 运行时错误
12.9.2.1. RegionServer Hanging
Are you running an old JVM (< 1.6.0_u21?)? When you look at a thread dump, does it look like threads are BLOCKED but no one holds the lock all are blocked on? 参考相关 JIRA。Adding -XX:+UseMembar to the HBase HBASE_OPTS in conf/hbase-env.sh may fix it.
Also, are you using ? These are discouraged because they can lock up the RegionServers if not managed properly.
12.9.2.2.&java.io.IOException...打开太多文件(Too many open files)
如果看到如下消息...
01:24:17,336 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
Disk-related IOException in BlockReceiver constructor. Cause is java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:883)
... 参见快速入门的章节 .
12.9.2.3.&xceiverCount 258 exceeds the limit of concurrent xcievers 256
这个时常会出现在DataNode的日志中。
参见快速入门章节的 .
12.9.2.4. 系统不稳定,DataNode 或者其他系统进程出现 "java.lang.OutOfMemoryError: unable to create new native thread in exceptions" 的错误
参见快速入门章节关于 ulimit 和 nproc 的配置。较新的 Linux 发行版缺省值是 1024,这对 HBase 实在太小了。
12.9.2.5. DFS 不稳定或 RegionServer 租约超时
如果你收到了如下的消息:
10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept xxx ms, ten times longer than scheduled: 10000
10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept xxx ms, ten times longer than scheduled: 15000
10:01:36,472 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for xxx milliseconds - retrying
... 或者看到了 Full GC 操作,那么你可能正在经历长时间的 Full GC。
12.9.2.6.&&No live nodes contain current block& and/or YouAreDeadException
这个错误有可能是 OS 的文件句柄用尽,也可能是网络故障导致节点无法访问。
参见快速入门章节 ,检查你的网络。
12.9.2.7.&ZooKeeper 会话超时事件
Master or RegionServers shutting down with messages like those in the logs:
WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x278bd16a96000f to sun.nio.ch.SelectionKeyImpl@355811ec
java.io.IOException: TIMED OUT
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer than scheduled: 5000
INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server hostname/IP:PORT
INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/IP:PORT remote=hostname/IP:PORT]
INFO org.apache.zookeeper.ClientCnxn: Server connection successful
WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000d to sun.nio.ch.SelectionKeyImpl@3544d65e
java.io.IOException: Session Expired
at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired
The JVM is doing a long running garbage collection which is pausing all threads (aka "stop the world"). Since the RegionServer's local ZooKeeper client cannot send heartbeats, the session times out. By design, we shut down any node that isn't able to contact the ZooKeeper ensemble after getting a timeout so that it stops serving data that may already be assigned elsewhere.
Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB won't be able to sustain long running imports.
Make sure you don't swap, the JVM never behaves well under swapping.
Make sure you are not CPU starving the RegionServer thread. For example, if you are running a MapReduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the RegionServer enough to create longer garbage collection pauses.
Increase the ZooKeeper session timeout
If you wish to increase the session timeout, add the following to your hbase-site.xml to increase the timeout from the default of 60 seconds to 120 seconds.
<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
Be aware that setting a higher timeout means that the regions served by a failed RegionServer will take at least that amount of time to be transferred to another RegionServer. For a production system serving live requests, we would instead recommend setting it lower than 1 minute and over-provision your cluster in order to lower the memory load on each machine (hence having less garbage to collect per machine).
If this is happening during an upload which only happens once (like initially loading all your data into HBase), consider bulk loading.
参考 ZooKeeper 章节,获取其他关于 ZooKeeper 故障排除的一般信息。
12.9.2.8.&无服务区域异常(NotServingRegionException)
This exception is &normal& when found in the RegionServer logs at DEBUG level. This exception is returned back to the client and then the client goes back to .META. to find the new location of the moved region.
However, if the NotServingRegionException is logged at ERROR, then the client ran out of retries and something is probably wrong.
12.9.2.9.&区域列示先是域名,然后IP
修复 DNS。在 HBase 0.92.x 以前的版本中,反向 DNS 需要和正向查询给出相同的答案。参考相关 JIRA 获取详细细节。
12.9.2.10. 日志中充斥着 "Got brand-new compressor" 之类的消息
这是因为没有使用压缩库的本地(native)版本。参考相关 JIRA。从 Hadoop 下把本地库复制到 HBase 的 lib 目录,或建立符号链接到正确位置,该消息就会消失。
12.9.2.11.&Server handler X on 60020 caught: java.nio.channels.ClosedChannelException
If you see this type of message it means that the region server was trying to read/send data from/to a client but it already went away. Typical causes for this are if the client was killed (you see a storm of messages like this when a MapReduce job is killed or fails) or if the client receives a SocketTimeoutException. It's harmless, but you should consider digging in a bit more if you aren't doing something to trigger them.
12.9.3.&终止错误
12.10.&Master
Master 更多信息, 参考&.
12.10.1.&启动错误
12.10.1.1.&Master says that you need to run the hbase migrations script
Upon running that, the hbase migrations script says no files in root directory.
HBase expects the root directory to either not exist, or to have already been initialized by hbase running a previous time. If you create a new directory for HBase using Hadoop DFS, this error will occur. Make sure the HBase root directory does not currently exist or has been initialized by a previous run of HBase. Sure fire solution is to just use Hadoop dfs to delete the HBase root and let HBase create and initialize the directory itself.
12.10.2.&终止错误
12.11.&ZooKeeper
12.11.1.&启动错误
12.11.1.1.&找不到地址: xyz in list of ZooKeeper quorum servers
A ZooKeeper server wasn't able to start, throws that error. xyz is the name of your server.
This is a name lookup problem. HBase tries to start a ZooKeeper server on some machine, but that machine isn't able to find itself in the hbase.zookeeper.quorum configuration.
Use the hostname presented in the error message instead of the value you used. If you have a DNS server, you can set hbase.zookeeper.dns.interface and hbase.zookeeper.dns.nameserver in hbase-site.xml to make sure it resolves to the correct FQDN.
12.11.2.&ZooKeeper, The Cluster Canary
ZooKeeper is the cluster's &canary in the mineshaft&. It'll be the first to notice issues if any so making sure its happy is the short-cut to a humming cluster.
参考 ZooKeeper 运维排障相关页面。It has suggestions and tools for checking disk and networking performance; i.e. the operating environment your ZooKeeper and HBase are running in.
Additionally, the utility zkcli may help investigate ZooKeeper issues.
12.12.&Amazon EC2
12.12.1.&ZooKeeper 在 Amazon EC2上看起来不工作?
HBase does not start when deployed as Amazon EC2 instances. Exceptions like the below appear in the Master and/or RegionServer logs:
11:52:27,030 INFO org.apache.zookeeper.ClientCnxn: Attempting
connection to server ec2-174-129-pute-/10.244.9.171:2181
11:52:27,032 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x0 to sun.nio.ch.SelectionKeyImpl@656dc861
java.net.ConnectException: Connection refused
Security group policy is blocking the ZooKeeper port on a public address. Use the internal EC2 host names when configuring the ZooKeeper quorum peer list.
12.12.2.& Amazon EC2 上不稳定?
关于 HBase 和 Amazon EC2 的问题,经常在 HBase 讨论列表上被问起。在提问之前,建议先搜索旧的讨论串。
12.12.3.&远程Java连接到EC2集群不工作
参考 Andrew回复,更新在用户列表:.
12.13.&HBase 和 Hadoop 版本问题
12.13.1. 当想在 hadoop-0.20.205.x(或 hadoop-1.0.x)上运行 HBase 0.90.x 时报 NoClassDefFoundError
HBase 0.90.x does not ship with hadoop-0.20.205.x, etc. To make it run, you need to replace the hadoop jars that HBase shipped with in its lib directory with those of the Hadoop you want to run HBase on. If even after replacing Hadoop jars you get the below exception:
sv4r6s38: Exception in thread &main& java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:209)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:229)
at org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:83)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:202)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:177)
you need to copy under hbase/lib, the commons-configuration-X.jar you find in your Hadoop's lib directory. That should fix the above complaint.
12.14.&用例研究
性能和故障诊断用例,参考 .
[] 参考相关的提问指南获取答案。
Chapter 13. 用例研究
Table of Contents
13.1.&概述
This chapter will describe a variety of performance and troubleshooting case studies that can provide a useful blueprint on diagnosing cluster issues.
For more information on Performance and Troubleshooting, see
13.2.&模式设计
13.2.1.&数据列表
The following is an exchange from the user dist-list regarding a fairly common question: how to handle per-user list data in HBase.
*** QUESTION ***
We're looking at how to store a large amount of (per-user) list data in HBase, and we were trying to figure out what kind of access pattern made the most sense. One option is store the majority of the data in a key, so we could have something like:
<FixedWidthUserName><FixedWidthValueId1> : "" (no value)
<FixedWidthUserName><FixedWidthValueId2> : "" (no value)
<FixedWidthUserName><FixedWidthValueId3> : "" (no value)
The other option we had was to do this entirely using:
<FixedWidthUserName><FixedWidthPageNum0> : <FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
<FixedWidthUserName><FixedWidthPageNum1> : <FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
where each row would contain multiple values. So in one case reading the first thirty values would be:
scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
And in the second case it would be
get 'FixedWidthUserName\x00\x00\x00\x00'
The general usage pattern would be to read only the first 30 values of these lists, with infrequent access reading deeper into the lists. Some users would have <= 30 total values in these lists, and some users would have millions (i.e. power-law distribution)
The single-value format seems like it would take up more space on HBase, but would offer some improved retrieval / pagination flexibility. Would there be any significant performance advantages to be able to paginate via gets vs paginating with scans?
My initial understanding was that doing a scan should be faster if our paging size is unknown (and caching is set appropriately), but that gets should be faster if we'll always need the same page size. I've ended up hearing different people tell me opposite things about performance. I assume the page sizes would be relatively consistent, so for most use cases we could guarantee that we only wanted one page of data in the fixed-page-length case. I would also assume that we would have infrequent updates, but may have inserts into the middle of these lists (meaning we'd need to update all subsequent rows).
Thanks for help / suggestions / follow-up questions.
*** ANSWER ***
If I understand you correctly, you're ultimately trying to store triples in the form <user, valueid, value>, right? E.g., something like:
<user123, firstname, Paul>,
<user234, lastname, Smith>
(But the usernames are fixed width, and the valueids are fixed width).
And, your access pattern is along the lines of: "for user X, list the next 30 values, starting with valueid Y". Is that right? And these values should be returned sorted by valueid?
The tl;dr version is that you should probably go with one row per user+value, and not build a complicated intra-row pagination scheme on your own unless you're really sure it is needed.
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done. What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that. Doing it this way is generally recommended (see here).
Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row. I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse. The client has methods that allow you to get specific slices of columns.
Note that neither case fundamentally uses more disk space than the other; you're just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key and column family name. (If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design.)
A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc. That seems significantly more complex. It might have some slight speed advantages…
