Pairwise distance between pairs of objects
D = pdist(X)
D = pdist(X,distance)
D = pdist(X)
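Computes the distance between each pair of row vectors in X (X is an m-by-n matrix). Pay particular attention to the shape of D: it is a row vector of length m(m-1)/2. One way to picture how D is built: first form the m-by-m matrix of distances between the rows of X, then take its lower-triangular elements in column order, that is (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m-1). The command squareform(D) converts this vector back into the square distance matrix.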
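As a small illustration of this ordering, the sketch below uses three made-up points (hypothetical data, chosen so the distances are whole numbers) and assumes pdist and squareform from the Statistics Toolbox are available.

```matlab
% Three made-up points on a line through the origin (hypothetical data).
X = [0 0; 3 4; 6 8];

% pdist returns the m(m-1)/2 = 3 distances in the order (2,1), (3,1), (3,2).
D = pdist(X);        % D = [5 10 5]

% squareform expands the vector into the symmetric 3-by-3 distance matrix.
Z = squareform(D);   % Z = [0 5 10; 5 0 5; 10 5 0]

% Applying squareform to the square matrix gives the vector form back.
D2 = squareform(Z);  % D2 = [5 10 5]
```

Note that D(2) is the distance between rows 3 and 1, not between rows 2 and 3, which is the point of the column-order rule.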
D = pdist(X,distance)
Given an m-by-n data matrix X, which is treated as m (1-by-n) row
vectors x1, x2, ..., xm, the various distances between the vector
xs and xt are defined as follows:
Euclidean distance ('euclidean')
$$d_{st}^2 = (x_s - x_t)(x_s - x_t)'$$
Notice that the Euclidean distance is a special case of the
Minkowski metric, where p = 2.
Standardized Euclidean distance ('seuclidean')
$$d_{st}^2 = (x_s - x_t)\,V^{-1}\,(x_s - x_t)'$$
where V is the n-by-n diagonal matrix whose jth diagonal element is
S(j)^2, where S is the vector of standard deviations.
Mahalanobis distance ('mahalanobis')
$$d_{st}^2 = (x_s - x_t)\,C^{-1}\,(x_s - x_t)'$$
where C is the covariance matrix.
The Mahalanobis distance was proposed by the Indian statistician P. C. Mahalanobis.
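The standardized Euclidean and Mahalanobis formulas differ only in the matrix placed between the two difference vectors. The following sketch evaluates both directly from the definitions in base MATLAB; the data matrix is hypothetical, and S and C are taken here to be the column standard deviations and the sample covariance of X.

```matlab
% Hypothetical 3-by-2 data matrix; rows are observations.
X  = [1 2; 3 6; 5 4];
xs = X(1,:);
xt = X(2,:);
df = xs - xt;                   % difference vector, here [-2 -4]

% Standardized Euclidean: V is diagonal with the column variances S(j)^2.
S = std(X);                     % column standard deviations, here [2 2]
V = diag(S.^2);
d_seuc = sqrt(df / V * df');    % sqrt(5)

% Mahalanobis: C is the sample covariance matrix of X.
C = cov(X);                     % here [4 2; 2 4]
d_mahal = sqrt(df / C * df');   % 2
```

The expression df / V computes df * inv(V) without forming the inverse explicitly. With the Statistics Toolbox installed, these values should agree with the first elements of pdist(X,'seuclidean') and pdist(X,'mahalanobis').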
Manhattan distance (city block metric) ('cityblock')
$$d_{st} = \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|$$
Notice that the city block distance is a special case of the
Minkowski metric, where p=1.
Minkowski metric ('minkowski')
$$d_{st} = \sqrt[p]{\sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|^{p}}$$
Notice that for the special case of p = 1, the Minkowski metric
gives the city block metric, for the special case of p = 2, the
Minkowski metric gives the Euclidean distance, and for the special
case of p = ∞, the Minkowski metric gives the Chebychev distance.
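The three special cases can be checked numerically. This is a minimal sketch in base MATLAB with two made-up vectors; the helper name mink is hypothetical and simply writes out the Minkowski formula.

```matlab
xs = [1 2 3];
xt = [4 6 3];
df = xs - xt;                          % [-3 -4 0]

% Minkowski distance written out from the formula, for a chosen p.
mink = @(v, p) sum(abs(v).^p)^(1/p);

d1   = mink(df, 1);                    % city block: 3 + 4 + 0 = 7
d2   = mink(df, 2);                    % Euclidean: sqrt(9 + 16) = 5
dinf = max(abs(df));                   % Chebychev (limit p -> Inf): 4

% The built-in vector norm gives the same three values.
n1 = norm(df, 1); n2 = norm(df, 2); ninf = norm(df, Inf);
```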
Chebychev distance ('chebychev')
$$d_{st} = \max_{j} \left| x_{sj} - x_{tj} \right|$$
Notice that the Chebychev distance is a special case of the
Minkowski metric, where p = ∞.
Cosine distance ('cosine')
$$d_{st} = 1 - \frac{x_s x_t'}{\lVert x_s \rVert_2 \, \lVert x_t \rVert_2}$$
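Correlation distance ('correlation')
$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}}$$
where $\bar{x}_s = \frac{1}{n}\sum_j x_{sj}$ and $\bar{x}_t = \frac{1}{n}\sum_j x_{tj}$.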
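The cosine and correlation distances are the same expression, except that the correlation distance centers each vector on its own mean first. A short base-MATLAB sketch with two made-up vectors:

```matlab
xs = [1 2 3];
xt = [3 2 1];

% Cosine distance: one minus the cosine of the angle between xs and xt.
d_cos = 1 - (xs * xt') / (norm(xs) * norm(xt));   % 1 - 10/14 = 2/7

% Correlation distance: the same expression after centering each vector.
cs = xs - mean(xs);
ct = xt - mean(xt);
d_corr = 1 - (cs * ct') / (norm(cs) * norm(ct));  % 1 - (-1) = 2
```

Here the two vectors are perfectly anti-correlated, so the correlation distance reaches its maximum of 2, while the cosine distance stays small because both vectors lie in the positive orthant.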
Hamming distance ('hamming')
$$d_{st} = \frac{\#(x_{sj} \neq x_{tj})}{n}$$
Jaccard distance ('jaccard')
$$d_{st} = \frac{\#\left[(x_{sj} \neq x_{tj}) \cap \left((x_{sj} \neq 0) \cup (x_{tj} \neq 0)\right)\right]}{\#\left[(x_{sj} \neq 0) \cup (x_{tj} \neq 0)\right]}$$
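Both counting formulas translate directly into logical masks. The sketch below uses two made-up vectors in base MATLAB; the variable names are hypothetical.

```matlab
xs = [1 0 2 0];
xt = [1 3 0 0];

differ  = (xs ~= xt);               % coordinates that differ: [0 1 1 0]
nonzero = (xs ~= 0) | (xt ~= 0);    % nonzero in at least one vector: [1 1 1 0]

% Hamming: fraction of all coordinates that differ.
d_ham = sum(differ) / numel(xs);                % 2/4 = 0.5

% Jaccard: fraction of the "nonzero in either" coordinates that differ.
d_jac = sum(differ & nonzero) / sum(nonzero);   % 2/3
```

The fourth coordinate is zero in both vectors, so it counts toward the Hamming denominator but is ignored by the Jaccard distance.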
Spearman distance ('spearman')
$$d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$$
where
r_sj is the rank of x_sj taken over x_1j, x_2j, ..., x_mj;
r_s and r_t are the coordinate-wise rank vectors of x_s and x_t, i.e.,
r_s = (r_s1, r_s2, ..., r_sn);
$$\bar{r}_s = \frac{1}{n}\sum_j r_{sj} = \frac{n+1}{2}$$
$$\bar{r}_t = \frac{1}{n}\sum_j r_{tj} = \frac{n+1}{2}$$
Pairwise distance between two sets of observations
D = pdist2(X,Y)
D = pdist2(X,Y,distance)
D = pdist2(X,Y,'minkowski',P)
D = pdist2(X,Y,'mahalanobis',C)
D = pdist2(X,Y,distance,'Smallest',K)
D = pdist2(X,Y,distance,'Largest',K)
[D,I] = pdist2(X,Y,distance,'Smallest',K)
[D,I] = pdist2(X,Y,distance,'Largest',K)
Here X is an mx-by-n matrix and Y is a my-by-n matrix; the result D is the mx-by-my distance matrix.
[D,I] = pdist2(X,Y,distance,'Smallest',K)
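Returns a K-by-my matrix D and a matrix I of the same size. Each column of D holds the K smallest elements of the corresponding column of the full distance matrix, sorted in ascending order, and the corresponding column of I holds their row indices. Note that the K smallest values are taken independently for each column.
For example, if A is the full mx-by-my distance matrix, then the K-by-my matrix D satisfies D(:,j) = A(I(:,j),j).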
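The column-wise selection can be checked against an explicit sort of the full distance matrix. This is a sketch with made-up points, assuming pdist2 from the Statistics Toolbox is available; D_check and I_check are hypothetical names.

```matlab
X = [0 0; 1 0; 5 0];    % mx = 3 points
Y = [0 0; 4 0];         % my = 2 points
K = 2;

A = pdist2(X, Y);       % full 3-by-2 distance matrix: [0 4; 1 3; 5 1]

[D, I] = pdist2(X, Y, 'euclidean', 'Smallest', K);
% D = [0 1; 1 3]   (the two smallest entries of each column of A, ascending)
% I = [1 3; 2 2]   (their row indices in A, i.e. which rows of X)

% The same result obtained by sorting each column of A.
[As, Is] = sort(A, 1);
D_check = As(1:K, :);
I_check = Is(1:K, :);
```

Column j of I therefore lists the K rows of X nearest to the jth row of Y, which is the usual way to run a K-nearest-neighbour lookup with pdist2.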
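The return value is a vector holding the pairwise Euclidean distances between the rows (observations) of the M-by-N matrix X: D = pdist(X).
Other usage: D = pdist(X, DISTANCE) computes D using DISTANCE.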
Choices are:
  'euclidean'   - Euclidean distance (default)
  'seuclidean'  - Standardized Euclidean distance. Each coordinate
                  difference between rows in X is scaled by dividing
                  by the corresponding element of the standard
                  deviation S=NANSTD(X). To specify another value for
                  S, use pdist(X,'seuclidean',S).
  'cityblock'   - City Block distance
  'minkowski'   - Minkowski distance. The default exponent is 2. To
                  specify a different exponent, use
                  pdist(X,'minkowski',P), where the exponent P is
                  a scalar positive value.
  'chebychev'   - Chebychev distance (maximum coordinate difference)
  'mahalanobis' - Mahalanobis distance, using the sample covariance
                  of X as computed by NANCOV. To compute the distance
                  with a different covariance, use
                  pdist(X,'mahalanobis',C), where the matrix C is
                  symmetric and positive definite.
  'cosine'      - One minus the cosine of the included angle
                  between observations (treated as vectors)
  'correlation' - One minus the sample linear correlation between
                  observations (treated as sequences of values).
  'spearman'    - One minus the sample Spearman's rank correlation
                  between observations (treated as sequences of values).
  'hamming'     - Hamming distance, percentage of coordinates
                  that differ
  'jaccard'     - One minus the Jaccard coefficient, the percentage
                  of nonzero coordinates that differ
  function      - A distance function specified using @, for
                  example @DISTFUN

A distance function must be of the form

      function D2 = DISTFUN(XI, XJ),

taking as arguments a 1-by-N vector XI containing a single row of X, an
M2-by-N matrix XJ containing multiple rows of X, and returning an
M2-by-1 vector of distances D2, whose Jth element is the distance
between the observations XI and XJ(J,:).
The output D is arranged in the order of ((2,1),(3,1),..., (M,1),
(3,2),...(M,2),.....(M,M-1)), i.e. the lower left triangle of the
M-by-M distance matrix in column order. To get the distance between
the Ith and Jth observations (I < J), either use the formula
D((I-1)*(M-I/2)+J-I), or use the helper function Z = SQUAREFORM(D),
which returns an M-by-M square symmetric matrix, with the (I,J) entry
equal to distance between observation I and observation J.
Example:
   % Compute the ordinary Euclidean distance
   X = randn(100, 5);                 % some random points
   D = pdist(X, 'euclidean');         % euclidean distance

   % Compute the Euclidean distance with each coordinate difference
   % scaled by the standard deviation
   Dstd = pdist(X,'seuclidean');

   % Use a function handle to compute a distance that weights each
   % coordinate contribution differently
   Wgts = [.1 .3 .3 .2 .1];           % coordinate weights
   weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
   Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));
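The index formula D((I-1)*(M-I/2)+J-I) quoted in the help text can be checked against SQUAREFORM for every pair I < J. The sketch below uses a small made-up data matrix and assumes pdist and squareform from the Statistics Toolbox; the variable names are hypothetical.

```matlab
X = [0 0; 3 4; 6 8; 0 1];   % hypothetical data, M = 4 observations
M = size(X, 1);
D = pdist(X);               % 6 distances: (2,1),(3,1),(4,1),(3,2),(4,2),(4,3)
Z = squareform(D);          % 4-by-4 symmetric distance matrix

% Compare the closed-form index with the squareform entry for every I < J.
ok = true;
for I = 1:M-1
    for J = I+1:M
        k = (I-1)*(M-I/2) + J - I;   % position of pair (I,J) in D
        ok = ok && (D(k) == Z(I,J));
    end
end
% ok stays true if the formula addresses the right element for all pairs.
```

The formula avoids building the full M-by-M matrix, which matters when M is large and only a few pairwise distances are needed.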
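% pdist function
yangben = load('ca.txt');   % load the sample data (yangben = "samples")
t = size(yangben);          % [number of rows, number of columns]
t1 = t(1);                  % number of rows
t2 = t(2);                  % number of columns
ca = yangben(:,2:t2-1);     % keep columns 2 through t2-1, dropping the first and last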
