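In an earlier post I used OpenCV's built-in detectMultiScale() function to detect pedestrians. That function is based on the HOG algorithm, so how is HOG actually implemented? This section walks briefly through the HOG source code that ships with OpenCV.
Quite a few people have already analysed OpenCV's HOG source online; those write-ups are good and well worth reading.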
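2. Some brief notes on the source code
This post does not explain HOG theory, so some familiarity with the algorithm is assumed. For the theory, see the PhD thesis of the author who proposed HOG, which covers it in detail.
In the normal workflow, HOG pedestrian detection consists of a training stage and a detection stage, and the main purpose of training is to obtain the SVM coefficients. The OpenCV source uses pre-trained SVM coefficients directly, so it contains very little training code.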
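3. A brief look at the HOG training workflow
Although the HOG source contains very little training code, knowing how training works gives a better overall picture of the detection process.
During training, all positive samples have the same size, 128*64 (height*width), which is the size of the detection window; such a sample image may contain one or more pedestrians. The HOG feature extracted from one such image is exactly 3780-dimensional (7*15 = 105 block positions, 4 cells per block, 9 bins per cell), and each feature vector is paired with a positive label for training. In practice we do not go to Google to collect, or photograph, images that happen to be exactly 128*64 and contain a pedestrian. Instead we collect arbitrary images containing pedestrians (preferably larger than 128*64) and annotate them by hand: draw a rectangle around each pedestrian, which simply means storing the coordinates of two corners, and save that rectangle. It is best to write a small program that, for each input image, crops out the rectangle and resizes it to the common 128*64 size; the HOG feature extracted from the processed image can then be used as a positive sample.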
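The 3780 figure follows directly from the window, block, stride and cell sizes. The sketch below recomputes it without OpenCV; the `Dim` struct and the function name `hogDescriptorSize` are made up for this illustration and only mirror the arithmetic, they are not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cv::Size so the sketch needs no OpenCV.
struct Dim { int width; int height; };

// Number of floats in the HOG descriptor of one detection window.
std::size_t hogDescriptorSize(Dim win, Dim block, Dim stride, Dim cell, int nbins)
{
    // A block must split into whole cells, and a window into whole block strides.
    assert(nbins > 0);
    assert(block.width % cell.width == 0 && block.height % cell.height == 0);
    assert((win.width - block.width) % stride.width == 0 &&
           (win.height - block.height) % stride.height == 0);

    std::size_t cellsPerBlock = static_cast<std::size_t>(
        (block.width / cell.width) * (block.height / cell.height));
    std::size_t blocksPerWindow = static_cast<std::size_t>(
        ((win.width - block.width) / stride.width + 1) *
        ((win.height - block.height) / stride.height + 1));
    return static_cast<std::size_t>(nbins) * cellsPerBlock * blocksPerWindow;
}
```

With the default parameters (64x128 window, 16x16 block, 8x8 stride, 8x8 cell, 9 bins), `hogDescriptorSize({64, 128}, {16, 16}, {8, 8}, {8, 8}, 9)` gives 9 * 4 * 105.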
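Negative samples do not need a common size; they only need to be larger than 128*64 and must not contain any pedestrian. Because a negative sample contains no target, no manual annotation is needed. The program can crop 128*64 patches from such an image at random and use the HOG features extracted from them as negative samples.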
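4. The HOG pedestrian detection process
Detection uses the sliding-window method. In this code, the sliding-window procedure is as follows:
As the figure above shows, during detection the input image is rescaled (usually shrunk), and on each layer a sliding window of fixed size (128*64) is moved across the image. A HOG feature is extracted from every window position and passed to the SVM classifier to decide whether the window contains a target. If it does, the target region is stored; if not, the window simply moves on.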
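To get a feel for how much work the pyramid plus sliding window involves, here is a simplified sketch that lists the scales of the pyramid and counts the window positions on one layer. The function names `pyramidScales` and `windowsPerLevel` are invented for this example, and the stopping rule is a simplification of what detectMultiScale() does, not a copy of it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scales 1, s, s^2, ... for which the shrunken image still holds one window.
std::vector<double> pyramidScales(int imgW, int imgH, int winW, int winH,
                                  double scaleStep, int maxLevels)
{
    assert(maxLevels >= 1 && winW > 0 && winH > 0);
    std::vector<double> scales;
    double scale = 1.0;
    for (int level = 0; level < maxLevels; ++level) {
        if (std::lround(imgW / scale) < winW || std::lround(imgH / scale) < winH)
            break;                      // image at this scale is smaller than the window
        scales.push_back(scale);
        if (scaleStep <= 1.0)
            break;                      // a step of 1 or less would never shrink the image
        scale *= scaleStep;
    }
    return scales;
}

// Number of window positions on one layer for the given stride.
long windowsPerLevel(int imgW, int imgH, int winW, int winH, int strideX, int strideY)
{
    assert(strideX > 0 && strideY > 0);
    if (imgW < winW || imgH < winH)
        return 0;
    return static_cast<long>((imgW - winW) / strideX + 1) *
           static_cast<long>((imgH - winH) / strideY + 1);
}
```

For a 128x256 image with a 64x128 window and a stride of 8, one layer already contains 9 * 17 window positions, each of which needs a full descriptor and one SVM evaluation.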
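The function used for detection is detectMultiScale(); its parameters are laid out as in the figure below:
5. Computing the gradient of the image in the detection window
If gamma correction is enabled, it is applied before the gradient is computed. Here gamma correction means taking the square root of each channel value, which maps the range 0~255 to 0~15.97 (the square root of 255). According to the author of HOG, gradients computed on the corrected image give better results, and no Gaussian smoothing is needed before computing the gradient.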
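The gradient is computed as a horizontal gradient image and a vertical gradient image, from which magnitude and phase are then derived. The horizontal gradient kernel is [-1, 0, 1], and the vertical kernel is its transpose.
When reading this part of the source, pay particular attention to how the gradient magnitude and angle are stored. Since they are computed for the image inside one sliding window, you would expect each of them to be a vector of 128*64=8192 values. In practice both of them hold 128*64*2=16384 values. Why?
Because both magnitude and angle are stored in a form that is ready for linear interpolation between bins. The gradient angle at a point can be any value between 0 and 180 degrees, and the program quantises it into 9 bins of 20 degrees each. When the angle of a pixel is quantised, it generally falls between the centres of 2 adjacent bins (if it lies exactly on the centre of a bin, that bin simply gets a weight of 1). The source shows that the gradient magnitude is the voting weight used when the orientation histogram is built, so the magnitude of each pixel is split between the 2 bins adjacent to its angle, and the closer bin receives the larger share. The magnitude image therefore has 2 channels, each holding one share of the pixel's magnitude. For the same reason the angle image also has 2 channels, which hold the indices of the 2 adjacent bins. The first channel holds the bin at or just before the gradient direction, and the second holds the next bin (wrapping around to bin 0 after the last one).
The linear interpolation is illustrated below:
In the figure, the 3 radii are the centres of the quantised bins, and the red dashed line is the gradient direction of pixel O (located at the centre of the circle), whose gradient magnitude is A. The nearest adjacent bin is bin0, and a is the angle between the gradient direction and the centre of bin0, measured as a fraction of one bin width (20 degrees), so that a lies between 0 and 1. For pixel O, the first channel of the stored magnitude is A*(1-a) and the second is A*a; the first channel of the stored angle is 0 (bin index 0) and the second is 1 (bin index 1).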
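The split of one gradient vector between two neighbouring bins can be sketched as follows. This is a standalone illustration that follows the same arithmetic as computeGradient() (scale the angle to bin units, subtract 0.5, take the floor), but the struct `BinVote` and the function `splitIntoBins` are names invented here, and `std::atan2` stands in for OpenCV's cartToPolar().

```cpp
#include <cassert>
#include <cmath>

struct BinVote {
    int bin0, bin1;   // indices of the two adjacent orientation bins
    float w0, w1;     // share of the gradient magnitude given to bin0 and bin1
};

BinVote splitIntoBins(float dx, float dy, int nbins)
{
    assert(nbins > 0);
    const float pi = 3.14159265358979f;
    float mag = std::sqrt(dx * dx + dy * dy);
    float angle = std::atan2(dy, dx);          // (-pi, pi]
    if (angle < 0.f)
        angle += 2.f * pi;                     // [0, 2*pi)

    // Position in units of one bin width; the -0.5 puts bin centres on integers.
    float pos = angle * nbins / pi - 0.5f;
    int bin0 = static_cast<int>(std::floor(pos));
    float alpha = pos - bin0;                  // fraction of the way towards the next bin

    // Orientation is unsigned, so bin indices wrap modulo nbins.
    bin0 = ((bin0 % nbins) + nbins) % nbins;
    int bin1 = (bin0 + 1) % nbins;
    return BinVote{bin0, bin1, mag * (1.f - alpha), mag * alpha};
}
```

A purely horizontal gradient (dx = 1, dy = 0) lies on the boundary between the last bin and bin 0, so with 9 bins it is shared equally between bins 8 and 0.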
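Also, when the gradient and phase images are computed for a 3-channel image, the gradient is computed separately for each of the 3 channels, and the channel with the largest gradient magnitude supplies the gradient at that point.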
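A minimal sketch of that per-pixel rule, including the square-root gamma step described earlier, might look like this. The types `Pixel` and `Gradient` and the function `dominantGradient` are assumptions made for the example; the real code works on whole rows through lookup tables, and may break ties between channels differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Pixel = std::array<unsigned char, 3>;   // one 3-channel pixel

struct Gradient { float dx, dy; };

// Central differences with the [-1, 0, 1] kernel, taken per channel;
// the channel with the largest squared magnitude wins.
Gradient dominantGradient(const Pixel& left, const Pixel& right,
                          const Pixel& up, const Pixel& down, bool gammaCorrection)
{
    auto lut = [gammaCorrection](unsigned char v) {
        return gammaCorrection ? std::sqrt(static_cast<float>(v))
                               : static_cast<float>(v);
    };
    Gradient best{0.f, 0.f};
    float bestMag2 = -1.f;
    for (std::size_t c = 0; c < 3; ++c) {
        float dx = lut(right[c]) - lut(left[c]);
        float dy = lut(down[c]) - lut(up[c]);
        float mag2 = dx * dx + dy * dy;
        if (mag2 > bestMag2) {
            bestMag2 = mag2;
            best = Gradient{dx, dy};
        }
    }
    assert(bestMag2 >= 0.f);   // at least one channel was examined
    return best;
}
```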
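6. The HOG cache structure
HOG caching is a memory optimisation that the author of this code uses to speed the algorithm up. Each input image has to be scanned at 4 nested levels: the layers of the image pyramid, the sliding windows in each layer, the blocks sliding inside each window, and the cells in each block (plus, in fact, the pixels in each cell). With this many levels, each of them two-dimensional, a naive implementation is very slow. The idea of the HOG cache is to store the data computed for each sliding window (ultimately the HOG descriptor vector of each block) in an in-memory lookup table. Because many blocks overlap as the window slides, a block that has already been computed can be read straight from the table, which saves a great deal of time.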
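A rough count shows how much the cache can save. The sketch below assumes that the window stride equals the block stride, so that blocks of neighbouring windows coincide exactly; `Dim`, `countPositions`, `blocksWithoutCache` and `distinctBlocks` are names made up for this illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for cv::Size.
struct Dim { int width; int height; };

// How many positions an item of the given size can take inside an area.
long countPositions(Dim area, Dim item, Dim stride)
{
    assert(stride.width > 0 && stride.height > 0);
    assert(area.width >= item.width && area.height >= item.height);
    return static_cast<long>((area.width - item.width) / stride.width + 1) *
           static_cast<long>((area.height - item.height) / stride.height + 1);
}

// Block histograms computed if every window recomputes all of its blocks.
long blocksWithoutCache(Dim img, Dim win, Dim winStride, Dim block, Dim blockStride)
{
    return countPositions(img, win, winStride) * countPositions(win, block, blockStride);
}

// Distinct block positions in the image, i.e. the work left once results are cached.
long distinctBlocks(Dim img, Dim block, Dim stride)
{
    return countPositions(img, block, stride);
}
```

For a 128x256 layer with the default 64x128 window, 16x16 block and a stride of 8, the uncached count is 153 windows times 105 blocks, while only 15 * 31 distinct block positions exist.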
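This cache structure computes the HOG descriptor inside a sliding window, which involves the relationship between the sliding window, the blocks and the cells. That relationship is shown in the diagram below:
The outermost rectangle is the image being searched. A sliding window is moved over it to decide whether each window contains a target; each window contains many overlapping, sliding blocks; and each block contains non-overlapping cells. The author of the code also takes into account that the pixels of a block contribute differently to its cells, and therefore divides each cell into 4 regions, the smallest boxes outlined by blue dashed lines in the figure.
So how do the different pixels of a block affect its cells (with the default parameters, 1 block has 4 cells)? See the diagram below.
As shown in the figure, the black box is 1 block, the solid red lines split it into 4 cells, and the dashed green lines split each cell into what we call 4 regions, so the block contains 16 regions in total, labelled A, B, C, ..., O, P.
The program divides these 16 regions into 4 groups:
Group 1: A, D, M, P. Pixels in this group contribute only to the histogram of the cell they lie in.
Group 2: B, C, N, O. Pixels in this group contribute to the cells on their left and right.
Group 3: E, I, H, L. Pixels in this group contribute to the cells above and below them.
Group 4: F, G, J, K. Pixels in this group contribute to all of the cells above, below, left and right of them.
How exactly do they contribute? As an example, pixels in region E contribute to cell0 and cell2. One block contributes a 36-dimensional vector to the sliding window, 9 dimensions per cell, in the order cell0, cell1, cell2, cell3. Because pixels in region E contribute to both cell0 and cell2, the gradient vote of such a pixel goes not only to its own cell0 but also to cell2 below it. Each of the two votes carries a weight that depends on the distance from the pixel to the centres of cell0 and cell2. The exact relationship can be found in the source.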
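The weighting can be sketched as bilinear interpolation between cell centres, with cells that fall outside the block simply dropped. The version below handles a block of 2x2 cells only; `CellVotes` and `cellVotes` are names invented for this illustration, and the layout of the result differs from the PixData tables in the source.

```cpp
#include <cassert>
#include <cmath>

struct CellVotes {
    int ncontrib;         // number of cells this pixel votes into: 1, 2 or 4
    float weight[2][2];   // weight[row][col] of each cell in a 2x2-cell block
};

// (x, y) is the pixel position inside the block, cellSize the side of a square cell.
CellVotes cellVotes(int x, int y, int cellSize)
{
    assert(cellSize > 0 && x >= 0 && y >= 0 && x < 2 * cellSize && y < 2 * cellSize);
    CellVotes v = {0, {{0.f, 0.f}, {0.f, 0.f}}};

    // Pixel position measured in cells, relative to the centre of the first cell.
    float cx = (x + 0.5f) / cellSize - 0.5f;
    float cy = (y + 0.5f) / cellSize - 0.5f;
    int x0 = static_cast<int>(std::floor(cx));
    int y0 = static_cast<int>(std::floor(cy));
    float fx = cx - x0, fy = cy - y0;

    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx) {
            int col = x0 + dx, row = y0 + dy;
            if (col < 0 || col > 1 || row < 0 || row > 1)
                continue;                       // that neighbour lies outside the block
            float wx = dx ? fx : 1.f - fx;
            float wy = dy ? fy : 1.f - fy;
            v.weight[row][col] = wx * wy;
            ++v.ncontrib;
        }
    return v;
}
```

Counting `ncontrib` over all 256 pixels of a 16x16 block with 8x8 cells gives 64 pixels that vote into 1 cell, 128 that vote into 2 and 64 that vote into 4, which matches the four groups of regions described above.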
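The memory layout of this structure's members is shown below, which makes the code easier to follow:
When reading this part of the source, pay particular attention to the following points:
    1) The BlockData struct has 2 members, histOfs and imgOffset, and 1 BlockData holds the data of one block. histOfs is the starting position, within the HOG descriptor of the whole sliding window, of the part contributed by this block; imgOffset is the coordinate of this block within the sliding window (the coordinate of its top-left corner).
    2) The PixData struct has 5 members, and 1 PixData holds the data of 1 pixel in a block. gradOfs is the position of the pixel's gradient magnitude within the gradient magnitude image of the sliding window; qangleOfs is the position of the pixel's gradient angle within the gradient angle image of the sliding window; histOfs[] holds the starting positions of the HOG descriptor sub-vectors of the 1, 2 or 4 cells that the pixel contributes to (this is rather abstract and only becomes clear from the source); histWeights[] holds the weights of the pixel's contribution to those 1, 2 or 4 cells. gradWeight expresses that a pixel contributes differently to the gradient histogram depending on where it lies in the block; its value follows a two-dimensional Gaussian centred on the centre of the block.
    3) count1, count2 and count4 in the code are the numbers of pixels in a block that contribute to 1 cell, 2 cells and 4 cells respectively. After the pixData table has been defragmented, count2 and count4 are turned into running totals (count2 += count1; count4 += count2) so that they can be used as loop bounds.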
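The Gaussian behind gradWeight is easy to reproduce on its own. The following sketch builds the table of per-pixel weights for one block, using the same default sigma of (block width + block height) / 8 that getWinSigma() returns; the function name `blockGaussianWeights` and the row-major `std::vector` layout are choices made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major table of Gaussian weights for a blockW x blockH block.
// A negative winSigma selects the default, (blockW + blockH) / 8.
std::vector<float> blockGaussianWeights(int blockW, int blockH, double winSigma)
{
    assert(blockW > 0 && blockH > 0);
    float sigma = static_cast<float>(winSigma >= 0 ? winSigma : (blockW + blockH) / 8.0);
    assert(sigma > 0.f);
    float scale = 1.f / (2.f * sigma * sigma);

    std::vector<float> weights(static_cast<std::size_t>(blockW) * blockH);
    for (int i = 0; i < blockH; ++i)
        for (int j = 0; j < blockW; ++j) {
            // Distance of the pixel from the block centre.
            float di = i - blockH * 0.5f;
            float dj = j - blockW * 0.5f;
            weights[static_cast<std::size_t>(i) * blockW + j] =
                std::exp(-(di * di + dj * dj) * scale);
        }
    return weights;
}
```

For the default 16x16 block, sigma is 4, the weight at the centre pixel (8, 8) is 1, and the weight at the corner pixel (0, 0) is exp(-4), so corner pixels count for very little.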
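8. Initialising HOG
A HOG descriptor can be initialised by assigning its parameters directly, or by reading them from a file node (OpenCV's FileStorage format, that is, an XML or YAML file). The initial values can be read from a file, or they can be set in the program and written out to a file; the read, write, load and save functions in the source do this work.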
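9. Annotated HOG source
The source uses Intel's IPP library to speed the algorithm up. When you reach code following #ifdef HAVE_IPP, you can skip it and read the code after #else instead; this does not affect your understanding of the original HOG algorithm.
First, here is a diagram of the header files used by the HOG source: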
// By downloading, copying, installing or using the software you agree to this license.
// If you do not agree to this license, do not download, install,
// copy or use the software.
//
//                           License Agreement
//                For Open Source Computer Vision Library
//
// Copyright (C) , Intel Corporation, all rights reserved.
// Copyright (C) 2009, Willow Garage Inc., all rights reserved.
// Third party copyrights are property of their respective owners.
//
// Redistribution and use in source and binary forms, with or without modification,
// are permitted provided that the following conditions are met:
//
//   * Redistribution's of source code must retain the above copyright notice,
//     this list of conditions and the following disclaimer.
//
//   * Redistribution's in binary form must reproduce the above copyright notice,
//     this list of conditions and the following disclaimer in the documentation
//     and/or other materials provided with the distribution.
//
//   * The name of the copyright holders may not be used to endorse or promote products
//     derived from this software without specific prior written permission.
//
// This software is provided by the copyright holders and contributors "as is" and
// any express or implied warranties, including, but not limited to, the implied
// warranties of merchantability and fitness for a particular purpose are disclaimed.
// In no event shall the Intel Corporation or contributors be liable for any direct,
// indirect, incidental, special, exemplary, or consequential damages
// (including, but not limited to, procurement of substitute goods or services;
// loss of use, data, or profits; or business interruption) however caused
// and on any theory of liability, whether in contract, strict liability,
// or tort (including negligence or otherwise) arising in any way out of
// the use of this software, even if advised of the possibility of such damage.
#include "precomp.hpp"
#include <iterator>
#ifdef HAVE_IPP
#include "ipp.h"
#endif

/*
  The code below is implementation of HOG (Histogram-of-Oriented Gradients)
  descriptor and object detection, introduced by Navneet Dalal and Bill Triggs.

  The computed feature vectors are compatible with the
  INRIA Object Detection and Localization Toolkit
*/

namespace cv
{
// Length of the HOG descriptor of one detection window.
size_t HOGDescriptor::getDescriptorSize() const
{
    CV_Assert(blockSize.width % cellSize.width == 0 &&
        blockSize.height % cellSize.height == 0);
    CV_Assert((winSize.width - blockSize.width) % blockStride.width == 0 &&
        (winSize.height - blockSize.height) % blockStride.height == 0 );
    // bins per cell * cells per block * blocks per window
    return (size_t)nbins*
        (blockSize.width/cellSize.width)*
        (blockSize.height/cellSize.height)*
        ((winSize.width - blockSize.width)/blockStride.width + 1)*
        ((winSize.height - blockSize.height)/blockStride.height + 1);
}

// Sigma of the Gaussian window applied to each block; a negative winSigma selects the default.
double HOGDescriptor::getWinSigma() const
{
    return winSigma >= 0 ? winSigma : (blockSize.width + blockSize.height)/8.;
}

// The SVM vector must be empty, or as long as the descriptor, optionally plus one bias term.
bool HOGDescriptor::checkDetectorSize() const
{
    size_t detectorSize = svmDetector.size(), descriptorSize = getDescriptorSize();
    return detectorSize == 0 ||
        detectorSize == descriptorSize ||
        detectorSize == descriptorSize + 1;
}

void HOGDescriptor::setSVMDetector(InputArray _svmDetector)
{
    _svmDetector.getMat().convertTo(svmDetector, CV_32F);
    CV_Assert( checkDetectorSize() );
}

#define CV_TYPE_NAME_HOG_DESCRIPTOR "opencv-object-detector-hog"

// Read all parameters (and the SVM coefficients, if present) from a file node.
bool HOGDescriptor::read(FileNode& obj)
{
    if( !obj.isMap() )
        return false;
    FileNodeIterator it = obj["winSize"].begin();
    it >> winSize.width >> winSize.height;
    it = obj["blockSize"].begin();
    it >> blockSize.width >> blockSize.height;
    it = obj["blockStride"].begin();
    it >> blockStride.width >> blockStride.height;
    it = obj["cellSize"].begin();
    it >> cellSize.width >> cellSize.height;
    obj["nbins"] >> nbins;
    obj["derivAperture"] >> derivAperture;
    obj["winSigma"] >> winSigma;
    obj["histogramNormType"] >> histogramNormType;
    obj["L2HysThreshold"] >> L2HysThreshold;
    obj["gammaCorrection"] >> gammaCorrection;
    obj["nlevels"] >> nlevels;

    FileNode vecNode = obj["SVMDetector"];
    if( vecNode.isSeq() )
    {
        vecNode >> svmDetector;
        CV_Assert(checkDetectorSize());
    }
    return true;
}

// Write all parameters (and the SVM coefficients, if any) to a file storage.
void HOGDescriptor::write(FileStorage& fs, const String& objName) const
{
    if( !objName.empty() )
        fs << objName;

    fs << "{" CV_TYPE_NAME_HOG_DESCRIPTOR
    << "winSize" << winSize
    << "blockSize" << blockSize
    << "blockStride" << blockStride
    << "cellSize" << cellSize
    << "nbins" << nbins
    << "derivAperture" << derivAperture
    << "winSigma" << getWinSigma()
    << "histogramNormType" << histogramNormType
    << "L2HysThreshold" << L2HysThreshold
    << "gammaCorrection" << gammaCorrection
    << "nlevels" << nlevels;
    if( !svmDetector.empty() )
        fs << "SVMDetector" << "[:" << svmDetector << "]";
    fs << "}";
}

bool HOGDescriptor::load(const String& filename, const String& objname)
{
    FileStorage fs(filename, FileStorage::READ);
    FileNode obj = !objname.empty() ? fs[objname] : fs.getFirstTopLevelNode();
    return read(obj);
}

void HOGDescriptor::save(const String& filename, const String& objName) const
{
    FileStorage fs(filename, FileStorage::WRITE);
    write(fs, !objName.empty() ? objName : FileStorage::getDefaultObjectName(filename));
}

void HOGDescriptor::copyTo(HOGDescriptor& c) const
{
    c.winSize = winSize;
    c.blockSize = blockSize;
    c.blockStride = blockStride;
    c.cellSize = cellSize;
    c.nbins = nbins;
    c.derivAperture = derivAperture;
    c.winSigma = winSigma;
    c.histogramNormType = histogramNormType;
    c.L2HysThreshold = L2HysThreshold;
    c.gammaCorrection = gammaCorrection;
    c.svmDetector = svmDetector;
    c.nlevels = nlevels;
}

// Compute the 2-channel gradient magnitude image (grad) and the 2-channel
// quantised orientation image (qangle) for the padded input image.
void HOGDescriptor::computeGradient(const Mat& img, Mat& grad, Mat& qangle,
                                    Size paddingTL, Size paddingBR) const
{
    CV_Assert( img.type() == CV_8U || img.type() == CV_8UC3 );

    Size gradsize(img.cols + paddingTL.width + paddingBR.width,
                  img.rows + paddingTL.height + paddingBR.height);
    grad.create(gradsize, CV_32FC2);  // <magnitude*(1-alpha), magnitude*alpha>
    qangle.create(gradsize, CV_8UC2); // [0..nbins-1] - quantized gradient orientation
    Size wholeSize;
    Point roiofs;
    // If img is not a sub-region of a larger Mat, wholeSize is just the size of img
    // and roiofs should be read as Point(0, 0).
    img.locateROI(wholeSize, roiofs);

    int i, x, y;
    int cn = img.channels();

    // Lookup table for gamma correction: square root if enabled, identity otherwise.
    Mat_<float> _lut(1, 256);
    const float* lut = &_lut(0,0);

    if( gammaCorrection )
        for( i = 0; i < 256; i++ )
            _lut(0,i) = std::sqrt((float)i);
    else
        for( i = 0; i < 256; i++ )
            _lut(0,i) = (float)i;

    // xmap/ymap translate padded coordinates (from -1 to size) into source coordinates.
    AutoBuffer<int> mapbuf(gradsize.width + gradsize.height + 4);
    int* xmap = (int*)mapbuf + 1;
    int* ymap = xmap + gradsize.width + 2;

    // BORDER_REFLECT_101 (IPL_BORDER_REFLECT_101) mirrors without repeating the edge
    // pixel: gfedcb|abcdefgh|gfedcba, where image boundaries are denoted with '|'.
    const int borderType = (int)BORDER_REFLECT_101;

    // int borderInterpolate(int p, int len, int borderType) maps the coordinate p
    // into the range [0, len) according to the border type.
    for( x = -1; x < gradsize.width + 1; x++ )
        xmap[x] = borderInterpolate(x - paddingTL.width + roiofs.x,
                        wholeSize.width, borderType) - roiofs.x;
    for( y = -1; y < gradsize.height + 1; y++ )
        ymap[y] = borderInterpolate(y - paddingTL.height + roiofs.y,
                        wholeSize.height, borderType) - roiofs.y;

    // x- & y- derivatives for the whole row
    int width = gradsize.width;
    AutoBuffer<float> _dbuf(width*4);
    float* dbuf = _dbuf;
    Mat Dx(1, width, CV_32F, dbuf);
    Mat Dy(1, width, CV_32F, dbuf + width);
    Mat Mag(1, width, CV_32F, dbuf + width*2);
    Mat Angle(1, width, CV_32F, dbuf + width*3);

    int _nbins = nbins;
    float angleScale = (float)(_nbins/CV_PI);
#ifdef HAVE_IPP
    Mat lutimg(img.rows,img.cols,CV_MAKETYPE(CV_32F,cn));
    Mat hidxs(1, width, CV_32F);
    Ipp32f* pHidxs  = (Ipp32f*)hidxs.data;
    Ipp32f* pAngles = (Ipp32f*)Angle.data;

    IppiSize roiSize;
    roiSize.width = img.cols;
    roiSize.height = img.rows;

    for( y = 0; y < roiSize.height; y++ )
    {
        const uchar* imgPtr = img.data + y*img.step;
        float* imglutPtr = (float*)(lutimg.data + y*lutimg.step);

        for( x = 0; x < roiSize.width*cn; x++ )
        {
            imglutPtr[x] = lut[imgPtr[x]];
        }
    }
#endif
    for( y = 0; y < gradsize.height; y++ )
    {
#ifdef HAVE_IPP
        const float* imgPtr  = (float*)(lutimg.data + lutimg.step*ymap[y]);
        const float* prevPtr = (float*)(lutimg.data + lutimg.step*ymap[y-1]);
        const float* nextPtr = (float*)(lutimg.data + lutimg.step*ymap[y+1]);
#else
        const uchar* imgPtr  = img.data + img.step*ymap[y];
        const uchar* prevPtr = img.data + img.step*ymap[y-1];
        const uchar* nextPtr = img.data + img.step*ymap[y+1];
#endif
        float* gradPtr = (float*)grad.ptr(y);
        uchar* qanglePtr = (uchar*)qangle.ptr(y);

        if( cn == 1 )
        {
            // Single channel: central differences with the [-1, 0, 1] kernel.
            for( x = 0; x < width; x++ )
            {
                int x1 = xmap[x];
#ifdef HAVE_IPP
                dbuf[x] = (float)(imgPtr[xmap[x+1]] - imgPtr[xmap[x-1]]);
                dbuf[width + x] = (float)(nextPtr[x1] - prevPtr[x1]);
#else
                dbuf[x] = (float)(lut[imgPtr[xmap[x+1]]] - lut[imgPtr[xmap[x-1]]]);
                dbuf[width + x] = (float)(lut[nextPtr[x1]] - lut[prevPtr[x1]]);
#endif
            }
        }
        else
        {
            // Three channels: keep the gradient of the channel with the largest magnitude.
            for( x = 0; x < width; x++ )
            {
                int x1 = xmap[x]*3;
                float dx0, dy0, dx, dy, mag0, mag;
#ifdef HAVE_IPP
                const float* p2 = imgPtr + xmap[x+1]*3;
                const float* p0 = imgPtr + xmap[x-1]*3;

                dx0 = p2[2] - p0[2];
                dy0 = nextPtr[x1+2] - prevPtr[x1+2];
                mag0 = dx0*dx0 + dy0*dy0;

                dx = p2[1] - p0[1];
                dy = nextPtr[x1+1] - prevPtr[x1+1];
                mag = dx*dx + dy*dy;

                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                dx = p2[0] - p0[0];
                dy = nextPtr[x1] - prevPtr[x1];
                mag = dx*dx + dy*dy;
#else
                const uchar* p2 = imgPtr + xmap[x+1]*3;
                const uchar* p0 = imgPtr + xmap[x-1]*3;

                dx0 = lut[p2[2]] - lut[p0[2]];
                dy0 = lut[nextPtr[x1+2]] - lut[prevPtr[x1+2]];
                mag0 = dx0*dx0 + dy0*dy0;

                dx = lut[p2[1]] - lut[p0[1]];
                dy = lut[nextPtr[x1+1]] - lut[prevPtr[x1+1]];
                mag = dx*dx + dy*dy;

                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                dx = lut[p2[0]] - lut[p0[0]];
                dy = lut[nextPtr[x1]] - lut[prevPtr[x1]];
                mag = dx*dx + dy*dy;
#endif
                if( mag0 < mag )
                {
                    dx0 = dx;
                    dy0 = dy;
                    mag0 = mag;
                }

                dbuf[x] = dx0;
                dbuf[x+width] = dy0;
            }
        }
#ifdef HAVE_IPP
        ippsCartToPolar_32f((const Ipp32f*)Dx.data, (const Ipp32f*)Dy.data, (Ipp32f*)Mag.data, pAngles, width);
        for( x = 0; x < width; x++ )
        {
            if(pAngles[x] < 0.f)
                pAngles[x] += (Ipp32f)(CV_PI*2.);
        }

        ippsNormalize_32f(pAngles, pAngles, width, 0.5f/angleScale, 1.f/angleScale);
        ippsFloor_32f(pAngles,(Ipp32f*)hidxs.data,width);
        ippsSub_32f_I((Ipp32f*)hidxs.data,pAngles,width);
        ippsMul_32f_I((Ipp32f*)Mag.data,pAngles,width);

        ippsSub_32f_I(pAngles,(Ipp32f*)Mag.data,width);
        ippsRealToCplx_32f((Ipp32f*)Mag.data,pAngles,(Ipp32fc*)gradPtr,width);
#else
        cartToPolar( Dx, Dy, Mag, Angle, false );
#endif
        for( x = 0; x < width; x++ )
        {
#ifdef HAVE_IPP
            int hidx = (int)pHidxs[x];
#else
            // angle is measured in bin widths; hidx is the lower of the two bins
            // and the fractional part is the share that goes to the upper bin.
            float mag = dbuf[x+width*2], angle = dbuf[x+width*3]*angleScale - 0.5f;
            int hidx = cvFloor(angle);
            angle -= hidx;
            gradPtr[x*2] = mag*(1.f - angle);
            gradPtr[x*2+1] = mag*angle;
#endif
            if( hidx < 0 )
                hidx += _nbins;
            else if( hidx >= _nbins )
                hidx -= _nbins;
            assert( (unsigned)hidx < (unsigned)_nbins );

            qanglePtr[x*2] = (uchar)hidx;
            hidx++;
            hidx &= hidx < _nbins ? -1 : 0;   // wrap the second bin back to 0
            qanglePtr[x*2+1] = (uchar)hidx;
        }
    }
}

struct HOGCache
{
    // Data of one block within a window.
    struct BlockData
    {
        BlockData() : histOfs(0), imgOffset() {}
        int histOfs;       // start of this block's part in the window descriptor
        Point imgOffset;   // top-left corner of the block within the window
    };

    // Data of one pixel within a block.
    struct PixData
    {
        size_t gradOfs, qangleOfs;
        int histOfs[4];
        float histWeights[4];
        float gradWeight;
    };

    HOGCache();
    HOGCache(const HOGDescriptor* descriptor,
        const Mat& img, Size paddingTL, Size paddingBR,
        bool useCache, Size cacheStride);
    virtual ~HOGCache() {};
    virtual void init(const HOGDescriptor* descriptor,
        const Mat& img, Size paddingTL, Size paddingBR,
        bool useCache, Size cacheStride);

    Size windowsInImage(Size imageSize, Size winStride) const;
    Rect getWindow(Size imageSize, Size winStride, int idx) const;

    const float* getBlock(Point pt, float* buf);
    virtual void normalizeBlockHistogram(float* histogram) const;

    vector<PixData> pixData;
    vector<BlockData> blockData;

    bool useCache;
    vector<int> ymaxCached;
    Size winSize, cacheStride;
    Size nblocks, ncells;
    int blockHistogramSize;
    int count1, count2, count4;
    Point imgoffset;
    Mat_<float> blockCache;
    Mat_<uchar> blockCacheFlags;

    Mat grad, qangle;
    const HOGDescriptor* descriptor;
};

HOGCache::HOGCache()
{
    useCache = false;
    blockHistogramSize = count1 = count2 = count4 = 0;
    descriptor = 0;
}

HOGCache::HOGCache(const HOGDescriptor* _descriptor,
        const Mat& _img, Size _paddingTL, Size _paddingBR,
        bool _useCache, Size _cacheStride)
{
    init(_descriptor, _img, _paddingTL, _paddingBR, _useCache, _cacheStride);
}

void HOGCache::init(const HOGDescriptor* _descriptor,
        const Mat& _img, Size _paddingTL, Size _paddingBR,
        bool _useCache, Size _cacheStride)
{
    descriptor = _descriptor;
    cacheStride = _cacheStride;
    useCache = _useCache;

    descriptor->computeGradient(_img, grad, qangle, _paddingTL, _paddingBR);
    imgoffset = _paddingTL;

    winSize = descriptor->winSize;
    Size blockSize = descriptor->blockSize;
    Size blockStride = descriptor->blockStride;
    Size cellSize = descriptor->cellSize;
    int i, j, nbins = descriptor->nbins;
    int rawBlockSize = blockSize.width*blockSize.height;

    nblocks = Size((winSize.width - blockSize.width)/blockStride.width + 1,
                   (winSize.height - blockSize.height)/blockStride.height + 1);
    ncells = Size(blockSize.width/cellSize.width, blockSize.height/cellSize.height);
    blockHistogramSize = ncells.width*ncells.height*nbins;

    if( useCache )
    {
        // cacheStride = _cacheStride: it is passed in as a parameter (the callers pass
        // gcd(winStride, blockStride)) and is the step at which block positions repeat
        // as the window moves.
        Size cacheSize((grad.cols - blockSize.width)/cacheStride.width+1,
                       (winSize.height/cacheStride.height)+1);
        blockCache.create(cacheSize.height, cacheSize.width*blockHistogramSize);
        blockCacheFlags.create(cacheSize);
        size_t cacheRows = blockCache.rows;
        ymaxCached.resize(cacheRows);
        for(size_t ii = 0; ii < cacheRows; ii++ )
            ymaxCached[ii] = -1;
    }

    // Gaussian weights centred on the middle of the block.
    Mat_<float> weights(blockSize);
    float sigma = (float)descriptor->getWinSigma();
    float scale = 1.f/(sigma*sigma*2);

    for(i = 0; i < blockSize.height; i++)
        for(j = 0; j < blockSize.width; j++)
        {
            float di = i - blockSize.height*0.5f;
            float dj = j - blockSize.width*0.5f;
            weights(i,j) = std::exp(-(di*di + dj*dj)*scale);
        }

    // vector<BlockData> blockData; BlockData is a struct nested in HOGCache.
    // vector<PixData> pixData; likewise, PixData is a struct nested in HOGCache.
    blockData.resize(nblocks.width*nblocks.height);
    pixData.resize(rawBlockSize*3);

    // Initialize 2 lookup tables, pixData & blockData.
    // Here is why:
    //
    // The detection algorithm runs in 4 nested loops (at each pyramid layer):
    //  loop over the windows within the input image
    //    loop over the blocks within each window
    //      loop over the cells within each block
    //        loop over the pixels in each cell
    //
    // As each of the loops runs over a 2-dimensional array,
    // we could get 8(!) nested loops in total, which is very-very slow.
    //
    // To speed the things up, we do the following:
    //   1. loop over windows is unrolled in the HOGDescriptor::{compute|detect} methods;
    //         inside we compute the current search window using getWindow() method.
    //         Yes, it involves some overhead (function call + couple of divisions),
    //         but it's tiny in fact.
    //   2. loop over the blocks is also unrolled. Inside we use pre-computed blockData[j]
    //         to set up gradient and histogram pointers.
    //   3. loops over cells and pixels in each cell are merged
    //       (since there is no overlap between cells, each pixel in the block is processed once)
    //      and also unrolled. Inside we use PixData[k] to access the gradient values and
    //      update the histogram
    //
    count1 = count2 = count4 = 0;
    for( j = 0; j < blockSize.width; j++ )
        for( i = 0; i < blockSize.height; i++ )
        {
            PixData* data = 0;
            float cellX = (j+0.5f)/cellSize.width - 0.5f;
            float cellY = (i+0.5f)/cellSize.height - 0.5f;
            int icellX0 = cvFloor(cellX);
            int icellY0 = cvFloor(cellY);
            int icellX1 = icellX0 + 1, icellY1 = icellY0 + 1;
            cellX -= icellX0;
            cellY -= icellY0;

            if( (unsigned)icellX0 < (unsigned)ncells.width &&
                (unsigned)icellX1 < (unsigned)ncells.width )
            {
                if( (unsigned)icellY0 < (unsigned)ncells.height &&
                    (unsigned)icellY1 < (unsigned)ncells.height )
                {
                    // The pixel contributes to 4 cells.
                    data = &pixData[rawBlockSize*2 + (count4++)];
                    data->histOfs[0] = (icellX0*ncells.height + icellY0)*nbins;
                    data->histWeights[0] = (1.f - cellX)*(1.f - cellY);
                    data->histOfs[1] = (icellX1*ncells.height + icellY0)*nbins;
                    data->histWeights[1] = cellX*(1.f - cellY);
                    data->histOfs[2] = (icellX0*ncells.height + icellY1)*nbins;
                    data->histWeights[2] = (1.f - cellX)*cellY;
                    data->histOfs[3] = (icellX1*ncells.height + icellY1)*nbins;
                    data->histWeights[3] = cellX*cellY;
                }
                else
                {
                    // Reaching this else means icellY0 is -1 or 1, i.e. the y coordinate
                    // within the block lies in (0, 3.5) or (11.5, 15).
                    data = &pixData[rawBlockSize + (count2++)];
                    if( (unsigned)icellY0 < (unsigned)ncells.height )
                    {
                        icellY1 = icellY0;
                        cellY = 1.f - cellY;
                    }
                    data->histOfs[0] = (icellX0*ncells.height + icellY1)*nbins;
                    data->histWeights[0] = (1.f - cellX)*cellY;
                    data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins;
                    data->histWeights[1] = cellX*cellY;
                    data->histOfs[2] = data->histOfs[3] = 0;
                    data->histWeights[2] = data->histWeights[3] = 0;
                }
            }
            else
            {
                // Reached when the x coordinate within the block lies in (0, 3.5)
                // or (11.5, 15), i.e. icellX0 is -1 or 1.
                if( (unsigned)icellX0 < (unsigned)ncells.width )
                {
                    icellX1 = icellX0;
                    cellX = 1.f - cellX;
                }

                if( (unsigned)icellY0 < (unsigned)ncells.height &&
                    (unsigned)icellY1 < (unsigned)ncells.height )
                {
                    // The pixel contributes to 2 cells.
                    data = &pixData[rawBlockSize + (count2++)];
                    data->histOfs[0] = (icellX1*ncells.height + icellY0)*nbins;
                    data->histWeights[0] = cellX*(1.f - cellY);
                    data->histOfs[1] = (icellX1*ncells.height + icellY1)*nbins;
                    data->histWeights[1] = cellX*cellY;
                    data->histOfs[2] = data->histOfs[3] = 0;
                    data->histWeights[2] = data->histWeights[3] = 0;
                }
                else
                {
                    // The pixel contributes to 1 cell only.
                    data = &pixData[count1++];
                    if( (unsigned)icellY0 < (unsigned)ncells.height )
                    {
                        icellY1 = icellY0;
                        cellY = 1.f - cellY;
                    }
                    data->histOfs[0] = (icellX1*ncells.height + icellY1)*nbins;
                    data->histWeights[0] = cellX*cellY;
                    data->histOfs[1] = data->histOfs[2] = data->histOfs[3] = 0;
                    data->histWeights[1] = data->histWeights[2] = data->histWeights[3] = 0;
                }
            }
            data->gradOfs = (grad.cols*i + j)*2;
            data->qangleOfs = (qangle.cols*i + j)*2;
            data->gradWeight = weights(i,j);
        }

    assert( count1 + count2 + count4 == rawBlockSize );
    // defragment pixData
    for( j = 0; j < count2; j++ )
        pixData[j + count1] = pixData[j + rawBlockSize];
    for( j = 0; j < count4; j++ )
        pixData[j + count1 + count2] = pixData[j + rawBlockSize*2];
    count2 += count1;
    count4 += count2;

    // initialize blockData
    for( j = 0; j < nblocks.width; j++ )
        for( i = 0; i < nblocks.height; i++ )
        {
            BlockData& data = blockData[j*nblocks.height + i];
            data.histOfs = (j*nblocks.height + i)*blockHistogramSize;
            data.imgOffset = Point(j*blockStride.width,i*blockStride.height);
        }
}

// Return the normalised histogram of the block whose top-left corner is pt,
// taking it from the cache when it has already been computed.
const float* HOGCache::getBlock(Point pt, float* buf)
{
    float* blockHist = buf;
    assert(descriptor != 0);

    Size blockSize = descriptor->blockSize;
    pt += imgoffset;

    CV_Assert( (unsigned)pt.x <= (unsigned)(grad.cols - blockSize.width) &&
               (unsigned)pt.y <= (unsigned)(grad.rows - blockSize.height) );

    if( useCache )
    {
        CV_Assert( pt.x % cacheStride.width == 0 &&
                   pt.y % cacheStride.height == 0 );
        Point cacheIdx(pt.x/cacheStride.width,
                      (pt.y/cacheStride.height) % blockCache.rows);
        if( pt.y != ymaxCached[cacheIdx.y] )
        {
            Mat_<uchar> cacheRow = blockCacheFlags.row(cacheIdx.y);
            cacheRow = (uchar)0;
            ymaxCached[cacheIdx.y] = pt.y;
        }

        blockHist = &blockCache[cacheIdx.y][cacheIdx.x*blockHistogramSize];
        uchar& computedFlag = blockCacheFlags(cacheIdx.y, cacheIdx.x);
        if( computedFlag != 0 )
            return blockHist;
        computedFlag = (uchar)1; // set it at once, before actual computing
    }

    int k, C1 = count1, C2 = count2, C4 = count4;
    const float* gradPtr = (const float*)(grad.data + grad.step*pt.y) + pt.x*2;
    const uchar* qanglePtr = qangle.data + qangle.step*pt.y + pt.x*2;

    CV_Assert( blockHist != 0 );
#ifdef HAVE_IPP
    ippsZero_32f(blockHist,blockHistogramSize);
#else
    for( k = 0; k < blockHistogramSize; k++ )
        blockHist[k] = 0.f;
#endif

    const PixData* _pixData = &pixData[0];

    // Pixels that vote into 1 cell.
    for( k = 0; k < C1; k++ )
    {
        const PixData& pk = _pixData[k];
        const float* a = gradPtr + pk.gradOfs;
        float w = pk.gradWeight*pk.histWeights[0];
        const uchar* h = qanglePtr + pk.qangleOfs;
        int h0 = h[0], h1 = h[1];
        float* hist = blockHist + pk.histOfs[0];
        float t0 = hist[h0] + a[0]*w;
        float t1 = hist[h1] + a[1]*w;
        hist[h0] = t0; hist[h1] = t1;
    }

    // Pixels that vote into 2 cells.
    for( ; k < C2; k++ )
    {
        const PixData& pk = _pixData[k];
        const float* a = gradPtr + pk.gradOfs;
        float w, t0, t1, a0 = a[0], a1 = a[1];
        const uchar* h = qanglePtr + pk.qangleOfs;
        int h0 = h[0], h1 = h[1];

        float* hist = blockHist + pk.histOfs[0];
        w = pk.gradWeight*pk.histWeights[0];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[1];
        w = pk.gradWeight*pk.histWeights[1];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;
    }

    // Pixels that vote into 4 cells.
    for( ; k < C4; k++ )
    {
        const PixData& pk = _pixData[k];
        const float* a = gradPtr + pk.gradOfs;
        float w, t0, t1, a0 = a[0], a1 = a[1];
        const uchar* h = qanglePtr + pk.qangleOfs;
        int h0 = h[0], h1 = h[1];

        float* hist = blockHist + pk.histOfs[0];
        w = pk.gradWeight*pk.histWeights[0];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[1];
        w = pk.gradWeight*pk.histWeights[1];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[2];
        w = pk.gradWeight*pk.histWeights[2];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;

        hist = blockHist + pk.histOfs[3];
        w = pk.gradWeight*pk.histWeights[3];
        t0 = hist[h0] + a0*w;
        t1 = hist[h1] + a1*w;
        hist[h0] = t0; hist[h1] = t1;
    }

    normalizeBlockHistogram(blockHist);

    return blockHist;
}
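
// L2-Hys normalisation: L2-normalise, clip at L2HysThreshold, then L2-normalise again.
void HOGCache::normalizeBlockHistogram(float* _hist) const
{
    float* hist = &_hist[0];
#ifdef HAVE_IPP
    size_t sz = blockHistogramSize;
#else
    size_t i, sz = blockHistogramSize;
#endif

    float sum = 0;
#ifdef HAVE_IPP
    ippsDotProd_32f(hist,hist,sz,&sum);
#else
    for( i = 0; i < sz; i++ )
        sum += hist[i]*hist[i];
#endif

    float scale = 1.f/(std::sqrt(sum)+sz*0.1f), thresh = (float)descriptor->L2HysThreshold;
#ifdef HAVE_IPP
    ippsMulC_32f_I(scale,hist,sz);
    ippsThreshold_32f_I( hist, sz, thresh, ippCmpGreater );
    ippsDotProd_32f(hist,hist,sz,&sum);
#else
    for( i = 0, sum = 0; i < sz; i++ )
    {
        hist[i] = std::min(hist[i]*scale, thresh);
        sum += hist[i]*hist[i];
    }
#endif

    scale = 1.f/(std::sqrt(sum)+1e-3f);
#ifdef HAVE_IPP
    ippsMulC_32f_I(scale,hist,sz);
#else
    for( i = 0; i < sz; i++ )
        hist[i] *= scale;
#endif
}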
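
The normalisation step can be tried on its own, away from the cache. The following standalone sketch applies the same three passes to a `std::vector<float>`; the function name `l2HysNormalize` is invented for this example and the IPP branch is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// L2-Hys: normalise, clip every entry at `clip`, then normalise again.
void l2HysNormalize(std::vector<float>& hist, float clip)
{
    assert(clip > 0.f);
    const float n = static_cast<float>(hist.size());

    float sum = 0.f;
    for (float v : hist)
        sum += v * v;

    // First pass: L2 normalisation with a size-dependent regulariser, then clipping.
    float scale = 1.f / (std::sqrt(sum) + n * 0.1f);
    sum = 0.f;
    for (float& v : hist) {
        v = std::min(v * scale, clip);
        sum += v * v;
    }

    // Second pass: renormalise the clipped histogram.
    scale = 1.f / (std::sqrt(sum) + 1e-3f);
    for (float& v : hist)
        v *= scale;
}
```

With the default clip value of 0.2, a 36-bin histogram whose whole energy sits in one bin still ends up with that bin close to 1 after the second pass, but any other bins that were nonzero would have gained relative weight through the clipping.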

// Number of window positions along x and y for the given stride.
Size HOGCache::windowsInImage(Size imageSize, Size winStride) const
{
    return Size((imageSize.width - winSize.width)/winStride.width + 1,
                (imageSize.height - winSize.height)/winStride.height + 1);
}

// Rectangle of the idx-th window, counting row by row.
Rect HOGCache::getWindow(Size imageSize, Size winStride, int idx) const
{
    int nwindowsX = (imageSize.width - winSize.width)/winStride.width + 1;
    int y = idx / nwindowsX;       // quotient: window row
    int x = idx - nwindowsX*y;     // remainder: window column
    return Rect( x*winStride.width, y*winStride.height, winSize.width, winSize.height );
}

// Compute the HOG descriptors of all windows (or of the windows at the given locations).
void HOGDescriptor::compute(const Mat& img, vector<float>& descriptors,
                            Size winStride, Size padding,
                            const vector<Point>& locations) const
{
    if( winStride == Size() )
        winStride = cellSize;
    Size cacheStride(gcd(winStride.width, blockStride.width),
                     gcd(winStride.height, blockStride.height));
    size_t nwindows = locations.size();
    // alignSize(m, n) returns the smallest multiple of n that is >= m
    padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width);
    padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height);
    Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2);

    HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride);

    if( !nwindows )
        nwindows = cache.windowsInImage(paddedImgSize, winStride).area();

    const HOGCache::BlockData* blockData = &cache.blockData[0];

    int nblocks = cache.nblocks.area();
    int blockHistogramSize = cache.blockHistogramSize;
    size_t dsize = getDescriptorSize();   // length of one window's HOG descriptor
    descriptors.resize(dsize*nwindows);

    for( size_t i = 0; i < nwindows; i++ )
    {
        float* descriptor = &descriptors[i*dsize];

        Point pt0;
        if( !locations.empty() )
        {
            pt0 = locations[i];
            if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width ||
                pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height )
                continue;
        }
        else
        {
            pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding);
            CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0);
        }

        for( int j = 0; j < nblocks; j++ )
        {
            const HOGCache::BlockData& bj = blockData[j];
            Point pt = pt0 + bj.imgOffset;

            float* dst = descriptor + bj.histOfs;
            const float* src = cache.getBlock(pt, dst);
            if( src != dst )
#ifdef HAVE_IPP
                ippsCopy_32f(src,dst,blockHistogramSize);
#else
                for( int k = 0; k < blockHistogramSize; k++ )
                    dst[k] = src[k];
#endif
        }
    }
}

// Single-scale detection: evaluate the linear SVM on every window.
void HOGDescriptor::detect(const Mat& img,
    vector<Point>& hits, vector<double>& weights, double hitThreshold,
    Size winStride, Size padding, const vector<Point>& locations) const
{
    hits.clear();
    if( svmDetector.empty() )
        return;

    if( winStride == Size() )
        winStride = cellSize;
    Size cacheStride(gcd(winStride.width, blockStride.width),
                     gcd(winStride.height, blockStride.height));
    size_t nwindows = locations.size();
    padding.width = (int)alignSize(std::max(padding.width, 0), cacheStride.width);
    padding.height = (int)alignSize(std::max(padding.height, 0), cacheStride.height);
    Size paddedImgSize(img.cols + padding.width*2, img.rows + padding.height*2);

    HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride);

    if( !nwindows )
        nwindows = cache.windowsInImage(paddedImgSize, winStride).area();

    const HOGCache::BlockData* blockData = &cache.blockData[0];

    int nblocks = cache.nblocks.area();
    int blockHistogramSize = cache.blockHistogramSize;
    size_t dsize = getDescriptorSize();

    // The optional extra coefficient at the end of svmDetector is the bias term.
    double rho = svmDetector.size() > dsize ? svmDetector[dsize] : 0;
    vector<float> blockHist(blockHistogramSize);

    for( size_t i = 0; i < nwindows; i++ )
    {
        Point pt0;
        if( !locations.empty() )
        {
            pt0 = locations[i];
            if( pt0.x < -padding.width || pt0.x > img.cols + padding.width - winSize.width ||
                pt0.y < -padding.height || pt0.y > img.rows + padding.height - winSize.height )
                continue;
        }
        else
        {
            pt0 = cache.getWindow(paddedImgSize, winStride, (int)i).tl() - Point(padding);
            CV_Assert(pt0.x % cacheStride.width == 0 && pt0.y % cacheStride.height == 0);
        }
        double s = rho;
        const float* svmVec = &svmDetector[0];
#ifdef HAVE_IPP
        int j;
#else
        int j, k;
#endif
        for( j = 0; j < nblocks; j++, svmVec += blockHistogramSize )
        {
            const HOGCache::BlockData& bj = blockData[j];
            Point pt = pt0 + bj.imgOffset;

            const float* vec = cache.getBlock(pt, &blockHist[0]);
#ifdef HAVE_IPP
            Ipp32f partSum;
            ippsDotProd_32f(vec,svmVec,blockHistogramSize,&partSum);
            s += (double)partSum;
#else
            // Dot product of the block histogram with the matching slice of the SVM
            // vector, unrolled by 4. svmVec started as: const float* svmVec = &svmDetector[0];
            for( k = 0; k <= blockHistogramSize - 4; k += 4 )
                s += vec[k]*svmVec[k] + vec[k+1]*svmVec[k+1] +
                    vec[k+2]*svmVec[k+2] + vec[k+3]*svmVec[k+3];
            for( ; k < blockHistogramSize; k++ )
                s += vec[k]*svmVec[k];
#endif
        }
        if( s >= hitThreshold )
        {
            hits.push_back(pt0);
            weights.push_back(s);
        }
    }
}

void HOGDescriptor::detect(const Mat& img, vector<Point>& hits, double hitThreshold,
                           Size winStride, Size padding, const vector<Point>& locations) const
{
    vector<double> weightsV;
    detect(img, hits, weightsV, hitThreshold, winStride, padding, locations);
}

// Functor run by parallel_for: each call handles a range of pyramid levels.
struct HOGInvoker
{
    HOGInvoker( const HOGDescriptor* _hog, const Mat& _img,
                double _hitThreshold, Size _winStride, Size _padding,
                const double* _levelScale, ConcurrentRectVector* _vec,
                ConcurrentDoubleVector* _weights=0, ConcurrentDoubleVector* _scales=0 )
    {
        hog = _hog;
        img = _img;
        hitThreshold = _hitThreshold;
        winStride = _winStride;
        padding = _padding;
        levelScale = _levelScale;
        vec = _vec;
        weights = _weights;
        scales = _scales;
    }

    void operator()( const BlockedRange& range ) const
    {
        int i, i1 = range.begin(), i2 = range.end();
        double minScale = i1 > 0 ? levelScale[i1] : i2 > 1 ? levelScale[i1+1] : std::max(img.cols, img.rows);
        Size maxSz(cvCeil(img.cols/minScale), cvCeil(img.rows/minScale));
        Mat smallerImgBuf(maxSz, img.type());
        vector<Point> locations;
        vector<double> hitsWeights;

        for( i = i1; i < i2; i++ )
        {
            double scale = levelScale[i];
            Size sz(cvRound(img.cols/scale), cvRound(img.rows/scale));
            Mat smallerImg(sz, img.type(), smallerImgBuf.data);
            if( sz == img.size() )
                smallerImg = Mat(sz, img.type(), img.data, img.step);
            else
                resize(img, smallerImg, sz);
            hog->detect(smallerImg, locations, hitsWeights, hitThreshold, winStride, padding);
            // Map the hits found on the shrunken image back to the original image.
            Size scaledWinSize = Size(cvRound(hog->winSize.width*scale), cvRound(hog->winSize.height*scale));
            for( size_t j = 0; j < locations.size(); j++ )
            {
                vec->push_back(Rect(cvRound(locations[j].x*scale),
                                    cvRound(locations[j].y*scale),
                                    scaledWinSize.width, scaledWinSize.height));
                if (scales) {
                    scales->push_back(scale);
                }
            }

            if (weights && (!hitsWeights.empty()))
            {
                for (size_t j = 0; j < locations.size(); j++)
                {
                    weights->push_back(hitsWeights[j]);
                }
            }
        }
    }

    const HOGDescriptor* hog;
    Mat img;
    double hitThreshold;
    Size winStride;
    Size padding;
    const double* levelScale;
    //typedef tbb::concurrent_vector<Rect> ConcurrentRectVector;
    //typedef tbb::concurrent_vector<double> ConcurrentDoubleVector;
    ConcurrentRectVector* vec;
    ConcurrentDoubleVector* weights;
    ConcurrentDoubleVector* scales;
};

void HOGDescriptor::detectMultiScale(
    const Mat& img, vector<Rect>& foundLocations, vector<double>& foundWeights,
    double hitThreshold, Size winStride, Size padding,
    double scale0, double finalThreshold, bool useMeanshiftGrouping) const
{
    double scale = 1.;
    int levels = 0;

    // Build the list of pyramid scales: stop once the shrunken image is smaller
    // than one window, or after nlevels levels.
    vector<double> levelScale;
    for( levels = 0; levels < nlevels; levels++ )
    {
        levelScale.push_back(scale);
        if( cvRound(img.cols/scale) < winSize.width ||
            cvRound(img.rows/scale) < winSize.height ||
            scale0 <= 1 )
            break;
        scale *= scale0;
    }
    levels = std::max(levels, 1);
    levelScale.resize(levels);

    ConcurrentRectVector allCandidates;
    ConcurrentDoubleVector tempScales;
    ConcurrentDoubleVector tempWeights;
    vector<double> foundScales;

    parallel_for(BlockedRange(0, (int)levelScale.size()),
                 HOGInvoker(this, img, hitThreshold, winStride, padding, &levelScale[0], &allCandidates, &tempWeights, &tempScales));

    std::copy(tempScales.begin(), tempScales.end(), back_inserter(foundScales));
    foundLocations.clear();
    std::copy(allCandidates.begin(), allCandidates.end(), back_inserter(foundLocations));
    foundWeights.clear();
    std::copy(tempWeights.begin(), tempWeights.end(), back_inserter(foundWeights));

    // Merge overlapping detections.
    if ( useMeanshiftGrouping )
    {
        groupRectangles_meanshift(foundLocations, foundWeights, foundScales, finalThreshold, winSize);
    }
    else
    {
        groupRectangles(foundLocations, (int)finalThreshold, 0.2);
    }
}

void HOGDescriptor::detectMultiScale(const Mat& img, vector<Rect>& foundLocations,
                                     double hitThreshold, Size winStride, Size padding,
                                     double scale0, double finalThreshold, bool useMeanshiftGrouping) const
{
    vector<double> foundWeights;
    detectMultiScale(img, foundLocations, foundWeights, hitThreshold, winStride,
                     padding, scale0, finalThreshold, useMeanshiftGrouping);
}

typedef RTTIImpl<HOGDescriptor> HOGRTTI;

CvType hog_type( CV_TYPE_NAME_HOG_DESCRIPTOR, HOGRTTI::isInstance,
    HOGRTTI::release, HOGRTTI::read, HOGRTTI::write, HOGRTTI::clone);

vector<float> HOGDescriptor::getDefaultPeopleDetector()
{
    static const float detector[] = {
        0.f, -0.f, -0.f, 0.f,
        0.f, -0.f, 0.f, ........
    };
    return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0]));
}

//This function returns 1981 SVM coeffs obtained from daimler's base.
//To use these coeffs the detection window size should be (48,96)
vector<float> HOGDescriptor::getDaimlerPeopleDetector()
{
    static const float detector[] = {
        0.294350f, -0.098796f, -0.129522f, 0.078753f,
        0.387527f, 0.261529f, 0.145939f, 0.061520f,
        // ... (the remaining coefficients are not reproduced in this listing)
    };
    return vector<float>(detector, detector + sizeof(detector)/sizeof(detector[0]));
}

} // namespace cv

//////////////// HOG (Histogram-of-Oriented-Gradients) Descriptor and Object Detector //////////////

struct CV_EXPORTS_W HOGDescriptor
{
public:
    enum { L2Hys=0 };
    enum { DEFAULT_NLEVELS=64 };

    CV_WRAP HOGDescriptor() : winSize(64,128), blockSize(16,16), blockStride(8,8),
        cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1),
        histogramNormType(HOGDescriptor::L2Hys), L2HysThreshold(0.2), gammaCorrection(true),
        nlevels(HOGDescriptor::DEFAULT_NLEVELS)
    {}

    CV_WRAP HOGDescriptor(Size _winSize, Size _blockSize, Size _blockStride,
                  Size _cellSize, int _nbins, int _derivAperture=1, double _winSigma=-1,
                  int _histogramNormType=HOGDescriptor::L2Hys,
                  double _L2HysThreshold=0.2, bool _gammaCorrection=false,
                  int _nlevels=HOGDescriptor::DEFAULT_NLEVELS)
    : winSize(_winSize), blockSize(_blockSize), blockStride(_blockStride), cellSize(_cellSize),
    nbins(_nbins), derivAperture(_derivAperture), winSigma(_winSigma),
    histogramNormType(_histogramNormType), L2HysThreshold(_L2HysThreshold),
    gammaCorrection(_gammaCorrection), nlevels(_nlevels)
    {}

    CV_WRAP HOGDescriptor(const String& filename)
    {
        load(filename);
    }

    HOGDescriptor(const HOGDescriptor& d)
    {
        d.copyTo(*this);
    }

    virtual ~HOGDescriptor() {}

    // size_t is an unsigned integer type (unsigned long on many 64-bit platforms)
    CV_WRAP size_t getDescriptorSize() const;
    CV_WRAP bool checkDetectorSize() const;
    CV_WRAP double getWinSigma() const;

    CV_WRAP virtual void setSVMDetector(InputArray _svmdetector);

    virtual bool read(FileNode& fn);
    virtual void write(FileStorage& fs, const String& objname) const;

    CV_WRAP virtual bool load(const String& filename, const String& objname=String());
    CV_WRAP virtual void save(const String& filename, const String& objname=String()) const;
    virtual void copyTo(HOGDescriptor& c) const;

    CV_WRAP virtual void compute(const Mat& img,
                         CV_OUT vector<float>& descriptors,
                         Size winStride=Size(), Size padding=Size(),
                         const vector<Point>& locations=vector<Point>()) const;
    //with found weights output
    CV_WRAP virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations,
                        CV_OUT vector<double>& weights,
                        double hitThreshold=0, Size winStride=Size(),
                        Size padding=Size(),
                        const vector<Point>& searchLocations=vector<Point>()) const;
    //without found weights output
    virtual void detect(const Mat& img, CV_OUT vector<Point>& foundLocations,
                        double hitThreshold=0, Size winStride=Size(),
                        Size padding=Size(),
                        const vector<Point>& searchLocations=vector<Point>()) const;
    //with result weights output
    CV_WRAP virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations,
                                  CV_OUT vector<double>& foundWeights, double hitThreshold=0,
                                  Size winStride=Size(), Size padding=Size(), double scale=1.05,
                                  double finalThreshold=2.0,bool useMeanshiftGrouping = false) const;
    //without found weights output
    virtual void detectMultiScale(const Mat& img, CV_OUT vector<Rect>& foundLocations,
                                  double hitThreshold=0, Size winStride=Size(),
                                  Size padding=Size(), double scale=1.05,
                                  double finalThreshold=2.0, bool useMeanshiftGrouping = false) const;

    CV_WRAP virtual void computeGradient(const Mat& img, CV_OUT Mat& grad, CV_OUT Mat& angleOfs,
                                 Size paddingTL=Size(), Size paddingBR=Size()) const;

    CV_WRAP static vector<float> getDefaultPeopleDetector();
    CV_WRAP static vector<float> getDaimlerPeopleDetector();

    CV_PROP Size winSize;
    CV_PROP Size blockSize;
    CV_PROP Size blockStride;
    CV_PROP Size cellSize;
    CV_PROP int nbins;
    CV_PROP int derivAperture;
    CV_PROP double winSigma;
    CV_PROP int histogramNormType;
    CV_PROP double L2HysThreshold;
    CV_PROP bool gammaCorrection;
    CV_PROP vector<float> svmDetector;
    CV_PROP int nlevels;
};
