Why do URLs need JS URL encoding?

Posted: 2017-03-20 09:18


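Reposted: Why URIs need to be encoded

Why is URL encoding needed? In general, if something has to be encoded, it is not suitable for transmission as-is. The reasons vary: the data may be too large, it may contain private information, and so on. For URLs, encoding is needed because some characters would otherwise make the URL ambiguous.

For example, a URL query string passes parameters as key=value pairs separated by &, as in /s?q=abc&ie=utf-8. If a value contains = or &, the server receiving the URL is bound to parse it incorrectly, so the ambiguous & and = characters must be escaped, that is, encoded.

Likewise, URLs are restricted to ASCII characters rather than Unicode, so a URL cannot contain non-ASCII characters such as Chinese directly. If the client browser and the server support different character sets, such characters can cause problems.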
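The following Python 3 sketch makes both problems concrete. It uses only the standard `urllib.parse` module, and the value "abc&ie=gbk" is made up for illustration. Plain concatenation lets a value that contains & and = leak into other parameters. Encoding each value first keeps it intact, and percent-encoding turns Chinese text into pure ASCII.

```python
from urllib.parse import parse_qs, urlencode, quote

# A made-up value that happens to contain the delimiters '&' and '='
value = "abc&ie=gbk"

# Naive concatenation: the server cannot tell where the value ends
naive = "q=" + value + "&ie=utf-8"
print(parse_qs(naive))      # {'q': ['abc'], 'ie': ['gbk', 'utf-8']}

# Encoding each value first turns '&' into %26 and '=' into %3D
encoded = urlencode({"q": value, "ie": "utf-8"})
print(encoded)              # q=abc%26ie%3Dgbk&ie=utf-8
print(parse_qs(encoded))    # {'q': ['abc&ie=gbk'], 'ie': ['utf-8']}

# Non-ASCII text becomes pure ASCII once its UTF-8 bytes are percent-encoded
print(quote("中文"))        # %E4%B8%AD%E6%96%87
```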
URL encoding background
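When URIs were first designed, the intent was that they could be transcribed by hand, for example written on a napkin and handed to someone else, so URIs must be made of printable ASCII characters. Some of these printable characters are interpreted differently by different operating systems. They are classed as "unsafe characters" and need special care.
(Figure: unsafe characters)
For the characters that make up a URI, the safest approach is to use only the union of the "reserved characters" and the "unreserved characters", and to use them correctly.
Reserved characters: characters that play a functional role in a URL, such as & and ?, and are therefore "reserved" by the URL rules.
Unreserved characters: letters, digits, and - . _ ~, which have no special meaning in a URL.
(Figure: reserved and unreserved characters)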
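The figures are missing from this copy. As a stand-in, here is a small Python sketch of the character classes as RFC 3986 (section 2) defines them. The names `UNRESERVED`, `RESERVED` and `classify` are illustrative only.

```python
import string

# Character classes from RFC 3986, section 2
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Return 'unreserved', 'reserved', or 'must-encode' for one character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Everything else (space, '%', '"', non-ASCII such as '中') must be percent-encoded
    return "must-encode"


for ch in ["a", "~", "&", "?", " ", "%", "中"]:
    print(repr(ch), classify(ch))
```

A literal % falls into the last group. On its own it only introduces a percent-encoded triplet, so a real % in data has to be written as %25.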
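Key point: percent-encoding
What is percent-encoding? It is encoding that uses the percent sign: each byte is written as % followed by two hexadecimal digits. URL encoding is, in essence, the correct use of percent-encoding.
pct-encoded = "%" HEXDIG HEXDIG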
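When should percent-encoding be applied, to which content, with which filtering rules, and how are the codes generated?
In the early days of the WWW, the character stream was converted to a byte stream using the one-to-one correspondence between ASCII characters and bytes. The integer value of each ASCII character was then written as the two hexadecimal digits after %. Later, several different ways of generating percent-encodings appeared, which made URIs hard to interpret.
Today, characters are converted to a byte stream with UTF-8 (either explicitly specified or as the system default), and each byte becomes one percent-encoded triplet. For example, the Chinese word “网易” is URL-encoded as %e7%bd%91%e6%98%93, and its UTF-8 byte stream is e7 bd 91 e6 98 93, so the one-to-one correspondence is plain to see.
In other words, percent-encoding first converts disallowed characters into a byte stream with some encoding (UTF-8 by convention) and then writes each byte as % plus two hexadecimal digits.
Different schemes and protocols have different requirements for URI format. The RFC therefore does not strictly prescribe which content to encode or which filtering rules to apply, and leaves that decision to the developer at implementation time.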
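The two steps above are characters to UTF-8 bytes, then % plus two hex digits per byte. Here they are as a minimal Python sketch. `percent_encode` is a hypothetical helper that keeps only the RFC 3986 unreserved characters, which is the same choice as `urllib.parse.quote(..., safe="")`. It emits uppercase hex digits, which RFC 3986 recommends; %e7 and %E7 mean the same byte.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: these bytes are never escaped
UNRESERVED = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                       b"abcdefghijklmnopqrstuvwxyz"
                       b"0123456789-._~")


def percent_encode(text: str) -> str:
    """Characters -> UTF-8 bytes -> '%XX' for every byte that is not unreserved."""
    parts = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            parts.append(chr(byte))
        else:
            parts.append("%{:02X}".format(byte))
    return "".join(parts)


print("网易".encode("utf-8").hex(" "))  # e7 bd 91 e6 98 93
print(percent_encode("网易"))            # %E7%BD%91%E6%98%93
print(percent_encode("a b&") == quote("a b&", safe=""))  # True
```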
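Key point: rules to follow for URL encoding:
Do not percent-encode unreserved characters.
Every character that is neither reserved nor unreserved must be percent-encoded.
When a reserved character is not used as a URI delimiter but appears elsewhere, for example inside a query value, it must be percent-encoded.
Do not percent-encode a reserved character when it is used in its reserved role. For example, the ? that separates the path from the query is acting as a reserved character and must not be percent-encoded. Sometimes two URIs are identical except that one uses certain reserved characters literally while the other percent-encodes them. In most cases, they should be treated as two different URIs.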
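Problems caused by incorrect URL encoding
Never encode the reserved characters in a URI (such as &) when they act as delimiters.
For example, if a URL is encoded into something like http%3a%2f%%2findex.htm, it can no longer be accessed.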
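A short Python sketch of the difference: encoding the whole URL versus encoding only the parameter values. The URL `http://www.example.com/index.htm` and the parameters are hypothetical.

```python
from urllib.parse import quote, urlencode

base = "http://www.example.com/index.htm"   # hypothetical URL

# Wrong: encoding the whole URL also escapes the delimiters ':' and '/'
broken = quote(base, safe="")
print(broken)  # http%3A%2F%2Fwww.example.com%2Findex.htm

# Right: keep the URL structure and encode only the parameter values
url = base + "?" + urlencode({"name": "a&b", "city": "天津"})
print(url)     # http://www.example.com/index.htm?name=a%26b&city=%E5%A4%A9%E6%B4%A5
```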
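If a URL parameter value contains reserved characters, encode it.
Suppose the parameters passed in are {"name": "namepart1&namepart2", "id": "kk"}.
If the query string is built by simple concatenation, how can the receiver recover the actual value "namepart1&namepart2" of the "name" field and the value "kk" of the "id" field?
Potential semantic attack risk
Suppose the parameter passed in is {"name": "Mitty&isLogin=true"}. If the query string is built by simple concatenation and isLogin really is a meaningful query key, the server ends up receiving an extra parameter. There are of course many other URL-based attacks, such as semantic attacks, which are not discussed here.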
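Both cases can be reproduced in Python. `naive_query` is a hypothetical helper that builds the query string by plain concatenation, which is the mistake described above. `parse_qs` stands in for the server-side parser.

```python
from urllib.parse import parse_qs, urlencode


def naive_query(params):
    # Hypothetical helper: joins key=value pairs without encoding anything
    return "&".join(k + "=" + v for k, v in params.items())


normal = {"name": "namepart1&namepart2", "id": "kk"}
print(naive_query(normal))            # name=namepart1&namepart2&id=kk
print(parse_qs(naive_query(normal)))  # {'name': ['namepart1'], 'id': ['kk']}
print(parse_qs(urlencode(normal)))    # {'name': ['namepart1&namepart2'], 'id': ['kk']}

attack = {"name": "Mitty&isLogin=true"}
print(parse_qs(naive_query(attack)))  # {'name': ['Mitty'], 'isLogin': ['true']}
print(parse_qs(urlencode(attack)))    # {'name': ['Mitty&isLogin=true']}
```

With concatenation, part of the name value is silently lost ("namepart2" has no = and is dropped), or an attacker-chosen isLogin key appears. With per-value encoding, the server gets back exactly what was sent.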
Using Python to encode and decode URLs

A common problem when writing CGI programs is encoding and decoding URLs, and Python provides convenient functions for this. The query part of a URL must be encoded when it contains special characters (characters other than the URL reserved characters). When a URL contains Chinese characters, special handling is needed to encode it correctly. The examples below target this case, although they also work for URLs made up only of English characters. The code is Python 2; decode('gbk') assumes the script file itself is saved as GBK.

(1) URL encoding:
import urllib
url = '/s?wd=哈哈'
url = url.decode('gbk', 'replace')
print urllib.quote(url.encode('utf-8', 'replace'))
Result: /s%3Fwd%3D%E5%93%88%E5%93%88
(2) URL decoding:
import urllib
encoded_url = '/s%3Fwd%3D%E5%93%88%E5%93%88'
print urllib.unquote(encoded_url).decode('utf-8', 'replace').encode('gbk', 'replace')
These calls take and return UTF-8 encoded strings. When encoding a URL, first convert the parameter string from its original encoding to UTF-8. When decoding, convert the result from UTF-8 back to the original encoding.
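For comparison, here is a sketch of the same round trip in Python 3. There, `urllib.quote`/`urllib.unquote` moved to `urllib.parse`, and strings are already Unicode, so the manual gbk/utf-8 conversions are no longer needed. The `encoding` argument selects the byte encoding when a different one such as GBK is really required.

```python
from urllib.parse import quote, unquote

url = "/s?wd=哈哈"

encoded = quote(url)        # UTF-8 by default; '/' is left unescaped by default
print(encoded)              # /s%3Fwd%3D%E5%93%88%E5%93%88

decoded = unquote(encoded)  # decodes the %XX bytes as UTF-8 by default
print(decoded == url)       # True

# If the other side expects GBK bytes, say so explicitly instead of converting by hand
print(quote("哈哈", encoding="gbk"))
```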
Problems with URL encoding - 椰子 - ITeye Blog
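1. URL encode and decode
Encoding and decoding characters is often a headache when Chinese text and special symbols are involved. URL encode/decode addresses one part of this problem. It encodes special characters with a simple algorithm, roughly as follows: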
The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
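In short, a character outside those sets (for example, a Chinese character) is first encoded into bytes with some encoding such as UTF-8, which gives three bytes per Chinese character. Each 8-bit byte is then written as two hexadecimal digits prefixed with %. In Java, the logic looks roughly like this: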
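import java.nio.charset.Charset;

// Sketch of the algorithm described above (not the JDK source)
static String urlEncode(String s, Charset charset) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.length(); ) {
        int c = s.codePointAt(i);
        int len = Character.charCount(c);
        if (c == ' ') {
            sb.append('+');                                   // rule 3: space -> '+'
        } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || ".-*_".indexOf(c) >= 0) {
            sb.appendCodePoint(c);                            // rules 1 and 2: unchanged
        } else {
            // rule 4: every other character, including '%' (which becomes %25)
            byte[] ba = s.substring(i, i + len).getBytes(charset);
            for (byte b : ba) {
                sb.append('%').append(String.format("%02X", b & 0xFF));
            }
        }
        i += len;
    }
    return sb.toString();
}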
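With this scheme, "a中国" becomes "a%E4%B8%AD%E5%9B%BD". The sender encodes, and the receiver decodes with the inverse algorithm. However, some of the special characters, % in particular, often cause subtle problems.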
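The four rules quoted above can be checked in Python. `java_style_encode` is a hypothetical re-implementation of those rules, not the JDK code. It also shows that Python's own `urllib.parse.quote_plus` is not identical to them: Python escapes "*" and leaves "~" alone, while the listed rules do the opposite.

```python
from urllib.parse import quote_plus


def java_style_encode(s: str, charset: str = "utf-8") -> str:
    """Sketch of the listed URLEncoder rules (not the JDK implementation)."""
    out = []
    for ch in s:
        if ch.isascii() and (ch.isalnum() or ch in ".-*_"):
            out.append(ch)                      # rules 1 and 2: kept as-is
        elif ch == " ":
            out.append("+")                     # rule 3: space becomes '+'
        else:
            for b in ch.encode(charset):        # rule 4: bytes, then %XY each
                out.append("%{:02X}".format(b))
    return "".join(out)


print(java_style_encode("a中国"))     # a%E4%B8%AD%E5%9B%BD
print(java_style_encode("100% a*~"))  # 100%25+a*%7E
print(quote_plus("100% a*~"))         # 100%25+a%2A~
```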
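2. Problem 1: Apache rewrite
While migrating to a unified domain name, we hit a case where a parameter that used to arrive correctly was now too long. To support both domains, we had added a rewrite rule for certain URLs. Apache's rewrite module escapes characters such as % into %25 by default before sending the rewrite response to the browser. The parameter therefore went from %252BeNh to %25252BeNh and became too long.
The fix is to add the NE (noescape) flag to the Apache rewrite rule, as follows: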
RewriteRule ^/martini/(.*)$ /eve/$1 [L,R,NE]
This solved the problem. For more on Apache rewrite configuration, see:
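The growth from %252BeNh to %25252BeNh is simply one extra escaping pass: every % becomes %25 again. A small Python sketch of that effect, using the value from the text:

```python
from urllib.parse import quote, unquote

value = "%2BeNh"               # already-encoded fragment ('+' written as %2B)
once = quote(value, safe="")   # first pass escapes the '%'
twice = quote(once, safe="")   # a second pass (like the default rewrite) escapes it again
print(once)    # %252BeNh
print(twice)   # %25252BeNh

# Every escaping pass needs a matching decode; otherwise the value keeps growing
print(unquote(unquote(twice)) == value)  # True
```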
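3. Problem 2: automatic decoding by the container
In eve's tracelog module, the target URL is encoded with URLEncoder.encode and wrapped as a parameter into an eve URL, which is then emailed to customers. When a customer clicks the link, eve performs the redirect and records the visit. For example, after wrapping, a target URL becomes /dispatch?targetUrl=http%3A%2F%&logid=12345.
However, we once ran into a problem when the target URL itself had parameters, namely Chinese parameters that were already encoded. The eve caller first encoded the target URL with the GBK charset, like this:
Then eve, as usual, encoded this URL again with UTF-8 and assembled the following URL:
As a result, clicking the redirect link produced an error.
The redirect-handling servlet in eve does roughly the following: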
String url = req.getParameter(TracelogDto.TARGET_URL);
url = URLDecoder.decode(url, "utf-8");
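Ideally the resulting URL would be ?user=%EE%E2, but it was actually ?user=?.
At first we suspected Apache's forwarding. Investigation showed, however, that JBoss's req.getParameter() had already decoded the parameter once, and that caused the problem. Checking the code, Tomcat and Jetty also do this in getParameter. The servlet specification and the related Java interfaces say nothing explicit about getParameter decoding the value first, though some articles online mention it. In practice, all mainstream containers do it.
Request methods that return already-decoded values: parsePostData(int, ServletInputStream), parseQueryString(String), getContextPath(), getPathInfo(). Methods that do not decode: getRequestURI().
So the final fix was simply to remove the decode from the redirect code. This should have been a very simple problem, but not knowing that the container had already decoded once caused some confusion.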
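The double decode can be simulated in Python. `parse_qs` plays the container's role (one decode inside getParameter), and `unquote(..., encoding="utf-8")` plays the extra `URLDecoder.decode(url, "utf-8")`. The target URL on example.com is hypothetical; %EE%E2 are the GBK bytes from the text.

```python
from urllib.parse import quote, unquote, parse_qs, urlsplit

# Hypothetical target whose query value is already GBK percent-encoded
target = "http://example.com/page?user=%EE%E2"

# eve wraps the target as a parameter, encoding it once more
wrapped = "/dispatch?targetUrl=" + quote(target, safe="") + "&logid=12345"

# What getParameter() returns: the container has already decoded once
from_container = parse_qs(urlsplit(wrapped).query)["targetUrl"][0]
print(from_container == target)   # True: %EE%E2 is still intact

# The extra UTF-8 decode: GBK bytes are not valid UTF-8 and become U+FFFD
double_decoded = unquote(from_container, encoding="utf-8")
print(double_decoded)
```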
Jetty's call chain for the decode inside getParameter:
--UrlEncoded.decodeUtf8To(..);
--org.eclipse.jetty.http.HttpURI.decodeQueryTo(MultiMap parameters)
--org.eclipse.jetty.server.Request. extractParameters()
--org.eclipse.jetty.server.Request.getParameter(String name)
Tomcat's call chain for the decode inside getParameter:
--org.apache.tomcat.util.http.Parameters.urlDecode
--org.apache.tomcat.util.http.Parameters.processParameters
--org.apache.tomcat.util.http.Parameters.handleQueryParameters
--org.apache.catalina.connector.Request.handleQueryParameters
--org.apache.catalina.connector.Request.parseParameters
--org.apache.catalina.connector.Request.getParameter(String name)
--org.apache.catalina.connector.RequestFacade.getParameter(String name)
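In addition, both encode and decode need a character set, such as UTF-8, GBK, or ISO-8859-1. If none is specified, Tomcat decodes both the queryString and the body as ISO-8859-1. This applies to the Tomcat versions of the time; since Tomcat 8, the default URIEncoding is UTF-8.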
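A Python sketch of what an ISO-8859-1 default does to UTF-8 input. The repair shown is the Python analogue of the widely used Java workaround `new String(s.getBytes("ISO-8859-1"), "UTF-8")`. It works because ISO-8859-1 maps every byte to exactly one character, so no information is lost.

```python
from urllib.parse import unquote

encoded = "%E4%B8%AD"   # '中' sent as UTF-8

# Decoding with ISO-8859-1 turns each byte into its own (wrong) character
garbled = unquote(encoded, encoding="iso-8859-1")
print(garbled == "\xe4\xb8\xad")   # True: three Latin-1 characters

# Re-encode with ISO-8859-1 to get the original bytes back, then decode as UTF-8
fixed = garbled.encode("iso-8859-1").decode("utf-8")
print(fixed)   # 中
```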
Container encoding issues
For Tomcat, the character-encoding handling in getParameter looks like this:
String enc = getCharacterEncoding();
boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
if (enc != null) {
    parameters.setEncoding(enc);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding(enc);
    }
} else {
    parameters.setEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding(org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    }
}
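This code shows how the request's character encoding is used. It can be set either via request.setCharacterEncoding(charset) in a filter, or via an HTTP header such as Content-Type: application/x-www-form-urlencoded; charset=UTF-8. If it is set, the parameters in the request body are parsed with that encoding.
Suppose the HTML previously sent to the browser contains something like <meta http-equiv="Content-Type" content="text/html; charset=GBK" />. The form on that page then submits data in the encoding the HTML specifies, and the body is decoded with the corresponding charset. Note, however, that IE uses this encoding but does not add the charset to the Content-Type header (as in Content-Type: application/x-www-form-urlencoded; charset=UTF-8), so request.getCharacterEncoding() in the container returns null. Our LocaleFilter exists precisely to solve this problem.
Whether the same encoding is also used for the queryString depends on the useBodyEncodingForURI setting. It is set on the connector, i.e., it is part of the container configuration and cannot change per request. If it is not set, the queryString is decoded (including URL decoding) as ISO-8859-1.
As for browsers: on Windows, IE transmits the queryString in GBK, while Firefox URL-encodes it with GBK before sending it.
In addition, containers usually have a URIEncoding setting that controls the encoding of the whole URI.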
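The reason the browser and container must agree becomes obvious once the same character is encoded both ways. A small Python sketch:

```python
from urllib.parse import quote, unquote

text = "中"
print(quote(text, encoding="utf-8"))  # %E4%B8%AD
print(quote(text, encoding="gbk"))    # %D6%D0

# A GBK-encoding browser and a UTF-8-decoding container: the text is lost
mismatch = unquote("%D6%D0", encoding="utf-8")
print(mismatch)                              # replacement characters, not '中'
print(unquote("%D6%D0", encoding="gbk"))     # 中
```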
3. Jetty's handling
1. URL handling
For both the URL and the query string, Jetty runs the following code when each HttpConnection is initialized:
_uri = StringUtil.__UTF8.equals(URIUtil.__CHARSET)?new HttpURI():new EncodedHttpURI(URIUtil.__CHARSET);
In URIUtil:
final String __CHARSET=System.getProperty("org.eclipse.jetty.util.URI.charset",StringUtil.__UTF8);
So for URLs, UTF-8 is used by default. If the org.eclipse.jetty.util.URI.charset system property is set, that charset is used instead.
2. Query string handling
if (_uri != null && _uri.hasQuery()) {
    if (_queryEncoding == null)
        _uri.decodeQueryTo(_baseParameters);
    else
        _uri.decodeQueryTo(_baseParameters, _queryEncoding);
}
If queryEncoding is set, the query is parsed with that encoding. Request has a method public void setQueryEncoding(String queryEncoding), and the encoding can also be set through request.setAttribute:
public void setAttribute(String name, Object value) {
    if ("org.eclipse.jetty.server.Request.queryEncoding".equals(name))
        setQueryEncoding(value == null ? null : value.toString());
    // ... remainder of the method omitted
}
What happens if queryEncoding is not set? EncodedHttpURI contains the following code:
public void decodeQueryTo(MultiMap parameters) {
    if (_query == _fragment)
        return;  // no query string to decode
    UrlEncoded.decodeTo(StringUtil.toString(_raw, _query + 1, _fragment - _query - 1, _encoding), parameters, _encoding);
}
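Here the _encoding field is used. It is the argument passed when EncodedHttpURI was constructed, i.e., the charset set by org.eclipse.jetty.util.URI.charset.
Therefore, for the query string, Jetty uses the first of these that is set: _queryEncoding on the request, then the charset from the system property org.eclipse.jetty.util.URI.charset, then the default, UTF-8.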
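The precedence just described can be modelled in a few lines of Python. `decode_query` and its parameters are hypothetical names that stand for the per-request queryEncoding and the org.eclipse.jetty.util.URI.charset property. This is a model of the described behaviour, not Jetty's code.

```python
from typing import Optional
from urllib.parse import parse_qs


def decode_query(raw_query: str,
                 query_encoding: Optional[str] = None,
                 uri_charset: Optional[str] = None) -> dict:
    """Model of the described precedence: queryEncoding, then URI.charset, then UTF-8."""
    encoding = query_encoding or uri_charset or "utf-8"
    return parse_qs(raw_query, encoding=encoding)


print(decode_query("user=%E4%B8%AD"))                   # {'user': ['中']}
print(decode_query("user=%D6%D0", uri_charset="gbk"))   # {'user': ['中']}
print(decode_query("user=%D6%D0", query_encoding="gbk",
                   uri_charset="iso-8859-1"))            # {'user': ['中']}
```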
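3. Body
UTF-8 is used by default. You can call request.setCharacterEncoding before reading parameters to choose a different encoding.
Note: some sources online say POST parameters are decoded by default with the charset from Content-Type. The source code shows that Tomcat does this: in getCharacterEncoding, if the value is null, it takes the charset from the Content-Type. Jetty, however, does not appear to do so.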
coconut_zhang
