Do lists in pandas have something like jQuery's find method?

Posted: 2017-01-30 03:20


Installing pandaseq on Linux
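Following the instructions:
1) On the Linux server, first run yum install zlib-devel bzip2-devel libtool-ltdl-devel libtool. The package libtool-ltdl-devel was reported as not existing; I ignored this for the time being.
2) Run ./autogen.sh. It reported an error about AM_PROG_AR; following the suggested workaround, I commented out the line referring to AM_PROG_AR.
3) Run ./configure. It failed with the error shown in the original screenshot. I used find to locate ltdl.h and then set environment variables: export CFLAGS=-I/path/to/ltdl.h/ LDFLAGS=-L/path/to/ltdl.h/ LD_LIBRARY_PATH=/path/to/ltdl.h/:$LD_LIBRARY_PATH. After rerunning ./configure the error message was unchanged, and it persisted through several other attempts. Suspecting that ltdl had not been installed correctly, I went into the directory containing ltdl.h to build and install ltdl. That build also failed; I used find to locate config-ml.in, copied it into the corresponding directory, and ran ./configure && make && make install again, after which ltdl installed successfully. Rerunning ./configure for pandaseq then completed without problems.
4) make: no problems.
5) make install: no problems.
6) Finally, ./pandaseq -h ran, confirming that the installation succeeded.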
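To reprint this article, please contact the original author for permission and state that it comes from Xiao Bin's ScienceNet blog. Link: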
Pandas Notes (8)
Index
Many of these methods or variants thereof are available on the objects that contain an index (Series/DataFrame), and those should most likely be used before calling these methods directly.
To find the index position of an element (row) in a Series (if the index is a contiguous integer range starting at 0, this is simply the row number):
nodes_id_index = pd.Index(nodes_series)
print(nodes_id_index.get_loc('u_'))

Retrieval/Selection
Selecting DataFrame columns
As with a Series, a column of a DataFrame can be retrieved with dict-like notation or as an attribute, returning a Series:
In [43]: frame2['state']
three    Ohio
Name: state

In [44]: frame2.year
three    2002
Name: year

Note: the returned Series has the same index as the DataFrame, and its name attribute is set accordingly.

Selecting multiple DataFrame columns (by column label)
lines = lines[[0, 1, 4]]
or
lines = lines[['user', 'check-in_time', 'location_id']]
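A minimal runnable sketch of the column selection and get_loc behaviour described above, assuming a recent pandas (1.x/2.x). The small frame2 and the node ids are made-up stand-ins, since the original data is not shown in this post.

```python
import pandas as pd

# Hypothetical stand-in for frame2 (the original data is not shown).
frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002], "state": ["Ohio", "Ohio", "Ohio"], "pop": [1.5, 1.7, 3.6]},
    index=["one", "two", "three"],
)

# Dict-style and attribute-style access both return a Series.
by_key = frame2["state"]
by_attr = frame2.year

# The returned Series shares the DataFrame's index and carries the column name.
print(by_key.name, by_attr.name)  # state year

# A list of column labels returns a DataFrame with just those columns.
subset = frame2[["year", "pop"]]
print(list(subset.columns))  # ['year', 'pop']

# Index.get_loc gives the integer position of a label (an int for a unique Index).
nodes_id_index = pd.Index(["u_a", "u_b", "u_c"])  # hypothetical node ids
print(nodes_id_index.get_loc("u_b"))  # 1
```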
Selecting DataFrame rows
>>> dates = pd.date_range('', periods=6)
>>> df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
>>> dates
DatetimeIndex(['', '', '', '', '', ''], dtype='datetime64[ns]', freq='D')
>>> df
Rows can be selected directly with [], but only with an integer range or a string (label) range:
>>> df['':'']
>>> df[3:5]

Selection by Position: ix and iloc
Rows can also be retrieved by position (num) or by name (label) with several methods, for example the ix indexing field (for more ix examples, see the "Indexing, selection and filtering" section below).
Note: to extract a particular column you can use the iloc or ix attribute; at the time of writing ix was considered the more stable of the two, but .ix was deprecated in pandas 0.20 and removed in pandas 1.0.

ix (select rows; select rows and columns)
In [45]: frame2.ix['three']
Name: three
>>> df.ix[3]
A   -0.976627
B    0.766333
C   -1.043501
D    0.554586
Name: … 00:00:00, dtype: float64
Suppose we need the first 5 rows of the first column:
>>> df.ix[:, 0].head()
>>> df.ix[1:3, 0:3]  # equivalent to df.ix[1:3, ['A', 'B', 'C']]

iloc (select rows; select rows and columns)
Select via the position of the passed integers.
Unlike ix, [] and at, iloc[3] always selects the row at position 3 (the 4th row), whereas on an integer index ix[3] selects the row whose index label is 3!
In [32]: df.iloc[3]
Name: … 00:00:00, dtype: float64
By integer slices, acting similar to NumPy/Python:
In [33]: df.iloc[3:5, 0:2]
By lists of integer position locations, similar to the NumPy/Python style:
In [34]: df.iloc[[1, 2, 4], [0, 2]]
For getting fast access to a scalar (equivalent to the prior method):
In [38]: df.iat[1, 1]
Out[38]: -0.30858
For the differences between .ix, .iloc and .loc and related caveats, see the explicit-copy section below.

Selection by Label
loc: select rows by label only
For getting a cross section using a label:
In [26]: df.loc[dates[0]]
Name: … 00:00:00, dtype: float64
Selecting on a multi-axis by label:
In [27]: df.loc[:, ['A', 'B']]
at: the fastest way to select a single scalar value
For getting fast access to a scalar (equivalent to the prior method):
In [31]: df.at[dates[0], 'A']
Out[31]: 0.18628

Boolean indexing
Using a single column's values to select data:
In [39]: df[df.A > 0]
A where operation for getting:
In [40]: df[df > 0]
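The following sketch exercises iloc, loc, iat/at and boolean indexing side by side. It assumes a recent pandas and uses a deterministic integer frame instead of np.random.randn; the start date '2013-01-01' is an assumption, because the date strings were lost from this post.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for df; the start date is an assumption.
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4) - 10, index=dates, columns=list("ABCD"))

# Position-based: the row at position 3, and a block of rows x columns.
row3 = df.iloc[3]
block = df.iloc[3:5, 0:2]

# Label-based: a cross section by index label, and columns by name.
first = df.loc[dates[0]]
ab = df.loc[:, ["A", "B"]]

# Fast scalar access: at (labels) and iat (positions) hit the same cell here.
same_cell = df.at[dates[1], "B"] == df.iat[1, 1]

# loc label slices include the end label; iloc position slices exclude it.
n_label = len(df.loc[dates[1]:dates[3]])  # 3 rows
n_pos = len(df.iloc[1:3])                 # 2 rows

# Boolean indexing on one column, and a where-style mask on the whole frame.
positive_a = df[df.A > 0]
masked = df[df > 0]  # cells failing the test become NaN
```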
Filtering
Using the isin() method for filtering:
In [41]: df2 = df.copy()
In [42]: df2['E'] = ['one', 'one','two','three','four','three']
In [43]: df2
In [44]: df2[df2['E'].isin(['two','four'])]
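A small sketch of isin filtering together with the unique / value_counts / apply(value_counts) idioms discussed in this post, using only public pandas API (Series.value_counts().to_dict() stands in for the internal hashtable helper). The df2 here is hypothetical, with only an E column and a v column.

```python
import pandas as pd

# Hypothetical df2 with the same E column as above.
df2 = pd.DataFrame({"E": ["one", "one", "two", "three", "four", "three"], "v": range(6)})
filtered = df2[df2["E"].isin(["two", "four"])]  # rows whose E is 'two' or 'four'

obj = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])
mask = obj.isin(["b", "c"])  # vectorized membership test
picked = obj[mask]

uniques = obj.unique()                     # order of first appearance
counts = obj.value_counts()                # sorted by frequency, descending
counts_dict = obj.value_counts().to_dict()  # plain dict via the public API

# Per-column frequency table: value_counts on every column, gaps filled with 0.
data = pd.DataFrame({"Qu1": [1, 3, 4, 3, 4], "Qu2": [2, 3, 1, 2, 3], "Qu3": [1, 5, 2, 4, 4]})
hist = data.apply(pd.Series.value_counts).fillna(0)
```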
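Indexing, selection and filtering
Most of the specific indexing rules are covered in the "Retrieval/Selection" section above.

Series indexing and integer indexing
Series indexing (obj[...]) works analogously to NumPy array indexing, except that you can use the Series's index values as well as integers.
In [102]: obj = Series(np.arange(4.), index=['a', 'b', 'c', 'd'])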
In [103]: obj['b']
Out[103]: 1.0
In [104]: obj[1]
Out[104]: 1.0
In [105]: obj[2:4]
In [106]: obj[['b', 'a', 'd']]
In [107]: obj[[1, 3]]
In [108]: obj[obj < 2]
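Integer indexing
Working with pandas objects indexed by integers differs somewhat from the indexing semantics of built-in Python data structures such as lists and tuples. For example, you might not expect the following code to raise an error:
ser = pd.Series(np.arange(3.))
ser
Out[11]:
0    0.0
1    1.0
2    2.0
dtype: float64
ser[-1]
Here the index contains 0, 1 and 2, and it is hard to infer whether the user wants label-based or position-based indexing. With a non-integer index, by contrast, there is no such ambiguity:
>>> ser2 = pd.Series(np.arange(3.), index=['a', 'b', 'c'])
>>> ser2[-1]
2.0
To keep things consistent, if the axis index contains integers, data selection with integers is always label-oriented. This also applies to slicing with ix:
ser.ix[:1]
Out[15]:
0    0.0
1    1.0
dtype: float64

Series.iget_value and DataFrame.irow / DataFrame.icol
If you need reliable position-based indexing regardless of the index type, you can use the Series method iget_value and the DataFrame methods irow and icol (these were later deprecated and removed; .iloc and .iat do the same job):
>>> ser3 = pd.Series(range(3), index=[-5, 1, 3])
>>> ser3.iget_value(2)
2
>>> frame = pd.DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])
>>> frame
   0  1
2  0  1
0  2  3
1  4  5
>>> frame.irow(0)
0    0
1    1
Name: 2, dtype: int32

Label slicing
Slicing with labels behaves differently from normal Python slicing: the endpoint is inclusive.
In [109]: obj['b':'c']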
Assigning through an index
Values can be set through these same indexing methods:
In [110]: obj['b':'c'] = 5
In [111]: obj
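A sketch of the Series indexing rules above, written with explicit .loc (label) and .iloc (position) so that nothing depends on how plain [] interprets an integer; in recent pandas, using obj[1] for position on a label index triggers a deprecation warning. The same iloc calls cover what iget_value and irow used to do.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4.0), index=["a", "b", "c", "d"])

# Label vs position, made explicit.
same = obj.loc["b"] == obj.iloc[1] == 1.0

# Label slices include the end point; positional slices do not.
label_slice = obj.loc["b":"c"]  # labels b, c
pos_slice = obj.iloc[2:4]       # positions 2, 3 -> labels c, d

# Integer index: loc means label, iloc means position, so there is no ambiguity.
ser = pd.Series(np.arange(3.0))
last_by_position = ser.iloc[-1]  # 2.0 (ser.loc[-1] would raise KeyError)
first_two_labels = ser.loc[:1]   # labels 0 and 1, end inclusive

# Position-based access regardless of index type (replaces iget_value / irow).
ser3 = pd.Series(range(3), index=[-5, 1, 3])
frame = pd.DataFrame(np.arange(6).reshape(3, 2), index=[2, 0, 1])
row0 = frame.iloc[0]  # first row, whose label is 2

# Assigning through a label slice modifies the Series in place.
obj.loc["b":"c"] = 5
```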
Selecting rows by slicing or with a boolean array is a special case, intended to make DataFrame syntax more like a NumPy ndarray in this situation:
In [116]: data[:2]
In [117]: data[data['three'] > 5]
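Label-based row indexing on a DataFrame: ix
ix lets you select a subset of the rows and columns of a DataFrame with NumPy-like notation plus axis labels. It is also a less verbose way to do reindexing. There are therefore many ways to select and rearrange the data contained in a pandas object. A short summary of the DataFrame options follows; hierarchical indexing adds some further options.
obj.ix[val]                    Select a single row from the DataFrame's rows
obj.ix[:, val]                 Select a single column from the columns
obj.ix[val1, val2]             Select both rows and columns
reindex method                 Conform one or more axes to new indexes
xs method                      Select a single row or column as a Series by label
icol, irow methods             Select a single column or row, respectively, as a Series by integer location
get_value, set_value methods   Select a single value by row and column label
Note (Wes McKinney, the author of pandas): when designing pandas, I felt that having to type frame[:, col] to select a column was too verbose (and error-prone), since column selection is one of the most common operations. So I made the design trade-off of pushing all of the rich label indexing into ix.

Unique values, value counts and membership
Method         Description
isin           Compute a boolean array indicating whether each Series value is contained in the passed sequence of values
unique         Compute an array of the unique values in a Series, returned in order of appearance
value_counts   Return a Series whose index is the unique values and whose values are their frequencies, sorted by count in descending order
These methods extract information from the values of a one-dimensional Series. isin performs vectorized set-membership tests and can be used to select a subset of the data in a Series or in a DataFrame column:
>>> obj = Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])
>>> obj
0    c
1    a
2    d
3    a
4    a
5    b
6    b
7    c
8    c
dtype: object
>>> mask = obj.isin(['b', 'c'])
>>> mask
0     True
...
8     True
dtype: bool
>>> obj[mask]
0    c
5    b
6    b
7    c
8    c
The unique function gives you an array of the unique values in a Series:
>>> uniques = obj.unique()
>>> uniques
array(['c', 'a', 'd', 'b'], dtype=object)
The unique values are not returned sorted; you can sort the result afterwards if needed (uniques.sort()).
value_counts computes the frequency of each value in a Series:
>>> obj.value_counts()
c    3
a    3
b    2
d    1
dtype: int64
For convenience, the resulting Series is sorted by frequency in descending order. Looking at the source code, the counting is done with a hash table:
keys, counts = htable.value_count_scalar64(values, dropna)

Counting the occurrences of every element of an array or sequence: pd.value_counts
value_counts is also available as a top-level pandas function that can be used with any array or sequence:
>>> pd.value_counts(obj.values, sort=False)
a    3
c    3
b    2
d    1
dtype: int64
It returns a pandas Series, but you can mostly treat it like a dict. You can also skip some of the checks and directly call the hash-table counting routine that pandas.value_counts() uses (which I found in the source code):
import pandas.hashtable as htable  # private module of old pandas versions, not a public API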
import numpy as np
values = np.array([1, 2, 3, 5, 1, 3, 3, 2, 3, 5])
values_cnts = dict(zip(*htable.value_count_scalar64(values, dropna=True)))
print(values_cnts)

Applying to a DataFrame
Sometimes you may want a histogram of several related columns of a DataFrame. For example:
>>> data = pd.DataFrame({'Qu1': [1, 3, 4, 3, 4], 'Qu2': [2, 3, 1, 2, 3], 'Qu3': [1, 5, 2, 4, 4]})
>>> data
   Qu1  Qu2  Qu3
0    1    2    1
1    3    3    5
2    4    1    2
3    3    2    4
4    4    3    4
Pass pandas.value_counts to this DataFrame's apply function:
In [25]: data.apply(pd.value_counts).fillna(0)
   Qu1  Qu2  Qu3
1  1.0  1.0  1.0
2  0.0  2.0  1.0
3  2.0  2.0  0.0
4  2.0  0.0  2.0
5  0.0  0.0  1.0

Index objects: obj.index
pandas's Index objects hold the axis labels and other metadata (such as the axis name or names). Any array or other sequence of labels used when constructing a Series or DataFrame is converted internally to an Index:
In [68]: obj = Series(range(3), index=['a', 'b', 'c'])
In [69]: index = obj.index
In [70]: index
Out[70]: Index([a, b, c], dtype=object)
In [71]: index[1:]
Out[71]: Index([b, c], dtype=object)
Immutability
Index objects are immutable and thus can't be modified by the user:
In [72]: index[1] = 'd'
Exception Traceback (most recent call last)...
Exception: <class 'pandas.core.index.Index'> object is immutable
Immutability is important so that Index objects can be safely shared among data structures:
In [73]: index = pd.Index(np.arange(3))
In [74]: obj2 = Series([1.5, -2.5, 0], index=index)
In [75]: obj2.index is index
Out[75]: True
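The table below lists the main Index classes built into the library. With some development effort, Index can be subclassed to implement specialized axis-indexing functionality. Most users will not need to know much about Index objects, but they are nonetheless an important part of pandas's data model.
Main Index objects in pandas
Int64Index      Specialized Index for integer values
MultiIndex      "Hierarchical" index object representing multiple levels of indexing on a single axis; can be thought of as similar to an array of tuples
DatetimeIndex   Stores nanosecond timestamps (represented using NumPy's datetime64 dtype)
PeriodIndex     Specialized Index for Period data (timespans)

Fixed-size set behavior
In addition to being array-like, an Index also functions as a fixed-size set:
In [76]: frame3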
state Nevada Ohio
In [77]: 'Ohio' in frame3.columns
Out[77]: True
In [78]: 2003 in frame3.index
Out[78]: False
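A minimal sketch of Index immutability and of the set-logic methods listed in the next table, assuming a recent pandas where the set-difference method is called difference (older versions had diff) and is_monotonic_increasing replaces is_monotonic. The labels are made up.

```python
import numpy as np
import pandas as pd

index = pd.Index(np.arange(3))

# Index objects are immutable: item assignment raises TypeError.
try:
    index[1] = 99
    mutated = True
except TypeError:
    mutated = False

# Set-like membership test, as with 'Ohio' in frame3.columns.
cols = pd.Index(["Nevada", "Ohio"])
has_ohio = "Ohio" in cols

# Set-logic methods on two made-up label sets.
left = pd.Index(["a", "b", "c", "d"])
right = pd.Index(["c", "d", "e"])
diff = left.difference(right)      # a, b
common = left.intersection(right)  # c, d
both = left.union(right)           # a .. e
member = left.isin(["b", "e"])     # boolean array
```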
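Index methods and properties
Each Index has a number of methods and properties for set logic and for answering other common questions about the data it contains.
Index methods and properties
diff           Compute the set difference as an Index (named difference in later pandas versions)
intersection   Compute the set intersection
union          Compute the set union
isin           Compute a boolean array indicating whether each value is contained in the passed collection
delete         Compute a new Index with the element at position i deleted
drop           Compute a new Index by deleting the passed values
insert         Compute a new Index by inserting an element at position i
is_monotonic   Returns True if each element is greater than or equal to the previous element
is_unique      Returns True if the Index has no duplicate values
unique         Compute the array of unique values in the Index

Reindexing: reindex
A critical method on pandas objects is reindex, which creates a new object with the data conformed to a new index. reindex does not so much modify an object's index as rearrange it: any label in the new index that does not already exist gets a missing value (NaN) by default. It also leaves the original object unchanged; assign the result if you want to keep it.
reindex function arguments
method       Interpolation (fill) method; see the options listed below
fill_value   Substitute value to use when introducing missing data by reindexing
limit        When forward- or backfilling, the maximum size gap to fill
level        Match a simple Index on a level of a MultiIndex; otherwise select a subset
copy         If False, the underlying data is not copied when the new index is equivalent to the old one; True by default (i.e. always copy)
In [79]: obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])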
In [80]: obj
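Reordering data with reindex (row index)
Calling reindex on a Series rearranges the data to conform to the new index, introducing missing values for any index values that were not already present:
In [81]: obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])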
In [82]: obj2
In [83]: obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
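Interpolation or filling when reindexing: method
For ordered data such as time series, you may want to interpolate or fill values when reindexing. The method option lets you do this, for example using ffill to forward-fill the values:
In [84]: obj3 = Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])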
In [85]: obj3.reindex(range(6), method='ffill')
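reindex method (interpolation) options
ffill or pad        Fill (or carry) values forward
bfill or backfill   Fill (or carry) values backward

With a DataFrame, reindex can alter the (row) index, the columns, or both. When passed only a sequence, the rows are reindexed in the result:
In [86]: frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])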
In [87]: frame
Ohio Texas California
Reindexing columns: the columns keyword
The columns can be reindexed using the columns keyword:
In [90]: states = ['Texas', 'Utah', 'California']
In [91]: frame.reindex(columns=states)
Texas Utah California
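Renaming DataFrame columns (method 2):
df.rename(columns={'age': 'x', 'fat_percent': 'y'})

Reindexing rows and columns at once (two ways)
Both can be reindexed in one call, though interpolation only works row-wise (axis 0):
In [92]: frame.reindex(index=['a', 'b', 'c', 'd'], method='ffill', columns=states)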
Texas Utah California
As you'll see, reindexing can be done more succinctly with label indexing via ix:
In [93]: frame.ix[['a', 'b', 'c', 'd'], states]
Texas Utah California
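A sketch of reindex on a Series and on a DataFrame, assuming a recent pandas. Since pandas 1.0, .loc raises KeyError when a list contains missing labels, so the ix-style shortcut shown above has to be written with reindex; the last lines check exactly that.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# Missing labels become NaN (or fill_value); obj itself is left untouched.
obj2 = obj.reindex(["a", "b", "c", "d", "e"])
obj_filled = obj.reindex(["a", "b", "c", "d", "e"], fill_value=0)

# Forward-fill while reindexing ordered data (the index must be monotonic).
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])
ffilled = obj3.reindex(range(6), method="ffill")

frame = pd.DataFrame(np.arange(9).reshape((3, 3)), index=["a", "c", "d"],
                     columns=["Ohio", "Texas", "California"])
states = ["Texas", "Utah", "California"]

# Rows and columns in one call: new row 'b' and new column 'Utah' are all NaN.
both = frame.reindex(index=["a", "b", "c", "d"], columns=states)

# Renaming is a separate operation from reindexing.
renamed = frame.rename(columns={"Ohio": "OH"})

# loc does not invent missing labels; it raises KeyError instead.
try:
    frame.loc[["a", "b"], states]
    raised = False
except KeyError:
    raised = True
```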
Converting between DataFrame columns and the index: set_index and reset_index
You will often want to use one or more columns of a DataFrame as the row index, or to move the row index into the DataFrame's columns. Take the following DataFrame as an example:
frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1), 'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd': [0, 1, 2, 0, 1, 2, 3]})
frame
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3

Turning columns into the row index: set_index
DataFrame's set_index function turns one or more of its columns into the row index and creates a new DataFrame:
frame2 = frame.set_index(['c', 'd'])
In [6]: frame2
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
By default those columns are removed from the DataFrame, but you can keep them:
frame.set_index(['c', 'd'], drop=False)
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
[For grouping without a reduction, see the groupby section.]

Moving index levels into columns: reset_index
reset_index does the opposite of set_index: the levels of the hierarchical index are moved into the columns:
frame2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1

Explicit copies
The column returned when indexing a DataFrame is a view on the underlying data, not a copy. Thus any in-place modification of that Series will be reflected in the DataFrame. The column can be copied explicitly with the Series's copy method.
Note: While standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods .at, .iat, .loc, .iloc and .ix.

SettingWithCopyWarning
The warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
df[len(df.columns) - 1][df[len(df.columns) - 1] > 0.0] = 1.0
This is chained indexing: df[len(df.columns) - 1] first returns a Series, and the second [...] assignment is applied to that intermediate object, which pandas cannot guarantee is a view rather than a copy. Oddly, df was in fact modified, while the warning seems to say the change went to a copy of df. So this is only a warning about memory layout: the assignment may or may not reach df. Also, print(df[len(df.columns) - 1][df[len(df.columns) - 1] > 0.0].is_copy) prints None; why None rather than True or False?
Solution: when modifying the original data in df, use loc, paying attention to where the row and column indexers go:
Try using .loc[row_indexer,col_indexer] = value instead
df.loc[df[len(df.columns) - 1] > 0.0, len(df.columns) - 1] = 1.0
Turning the warning off is not recommended:
pd.options.mode.chained_assignment = None
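(pd.options.mode.chained_assignment defaults to 'warn'.) See also the "Why .ix is a bad idea" section that follows [official explanation of why this warning appears: Returning a view versus a copy].

Why .ix is a bad idea
Selecting data through .ix (here with a list of columns) gives you a copy, so modifying that selection does not modify the original; assigning directly through the indexer, which is how .loc is meant to be used, modifies the original.
The .ix object tries to do more than one thing, and for anyone who has read anything about clean code, this is a strong smell.
Given this dataframe:
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 1, 2, 2]})
Two behaviors:
dfcopy = df.ix[:, ["a"]]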
dfcopy.a.ix[0] = 2
Behavior one: dfcopy is now a stand-alone dataframe. Changing it will not change df.
df.ix[0, "a"] = 3
Behavior two: this changes the original dataframe.

Use .loc instead
The pandas developers recognized that the .ix object was quite smelly [speculatively] and thus created two new objects, .loc and .iloc, which help in the accession and assignment of data.
.loc is faster, because it does not try to create a copy of the data.
.loc is meant to modify your existing dataframe in place, which is more memory efficient.
.loc is predictable: it has one behavior.

Axis indexes with duplicate values
A Series with duplicate index values:
>>> obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
>>> obj
a    0
a    1
b    2
b    3
c    4
The index's is_unique property tells you whether its values are unique:
>>> obj.index.is_unique
False
Selecting data with a duplicated index: if a label corresponds to multiple values, a Series is returned; if it corresponds to a single value, a scalar is returned.
>>> obj['a']
a    0
a    1
>>> obj['c']
4
The same holds when indexing the rows of a DataFrame:
>>> df = DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
>>> df
>>> df.ix['b']

Hierarchical indexing
Hierarchical indexing is an important feature of pandas that lets you have multiple (two or more) index levels on one axis. Somewhat abstractly, it provides a way to work with higher-dimensional data in a lower-dimensional form.

Series
Create a Series with a list of lists (or arrays) as the index:
data = pd.Series(np.random.randn(10), index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
In [6]: data
a  1    0.382928
   2   -0.360273
   3   -0.533257
b  1    0.341118
   2    0.439390
   3    0.645848
c  1    0.006016
   2    0.700268
d  2    0.405497
   3    0.188755
dtype: float64
This is the prettified output of a Series with a MultiIndex as its index. The "gaps" in the index display mean "use the label directly above".
>>> data.index
MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]], labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])

Selecting subsets of data from a hierarchically indexed object
In [8]: data['b':'c']
b  1    0.341118
   2    0.439390
   3    0.645848
c  1    0.006016
   2    0.700268
dtype: float64
In [10]: data.ix[['b', 'd']]
b  1    0.341118
   2    0.439390
   3    0.645848
d  2    0.405497
   3    0.188755
dtype: float64
Selecting from an "inner" level:
In [11]: data[:, 2]
a   -0.360273
b    0.439390
c    0.700268
d    0.405497
dtype: float64
Hierarchical indexing plays a critical role in reshaping data and in group-based operations such as stacking and unstacking (e.g. forming a pivot table). For example, this data can be rearranged into a DataFrame with its unstack method:
In [12]: data.unstack()
          1         2         3
a  0.382928 -0.360273 -0.533257
b  0.341118  0.439390  0.645848
c  0.006016  0.700268       NaN
d       NaN  0.405497  0.188755
# The inverse of unstack is stack:
data.unstack().stack()

DataFrame
With a DataFrame, either axis can have a hierarchical index:
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
In [16]: frame
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
Each level can have a name (a string or any other Python object). If names are set, they appear in the console output (don't confuse the index names with the axis labels!):
In [18]: frame.index.names = ['key1', 'key2']
In [19]: frame.columns.names = ['state', 'color']
In [20]: frame
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11

Partial column indexing: selecting groups of columns
In [21]: frame['Ohio']
color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
Creating a MultiIndex on its own so it can be reused:
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])

Reordering and sorting levels: swaplevel and sortlevel
Sometimes you need to rearrange the order of the levels on an axis, or sort the data by the values in one specific level.

Swapping levels: swaplevel
swaplevel takes two level numbers or names and returns a new object with the levels interchanged (the data is otherwise unaltered):
In [24]: frame
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
In [25]: frame.swaplevel('key1', 'key2')
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
Note: this is equivalent to frame.swaplevel(0, 1), since key1 and key2 are levels 0 and 1.

Sorting by the values in one level: sortlevel
sortlevel sorts the data (stably) using only the values in a single level. When swapping levels, it is common to also use sortlevel so that the result is sorted:
In [26]: frame.sortlevel(1)
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
In [27]: frame.swaplevel(0, 1).sortlevel(0)
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
Note: data selection performance is much better on hierarchically indexed objects if the index is lexicographically sorted starting with the outermost level (i.e. the result of calling sortlevel(0) or sort_index()).

Summary statistics by level
Many descriptive and summary statistics on DataFrame and Series have a level option for specifying the level to aggregate by on a particular axis. Here we sum by level on either the rows or the columns:
In [29]: frame
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
In [30]: frame.sum(level='key2')
state  Ohio     Colorado
color Green Red    Green
key2
1         6   8       10
2        12  14       16
In [33]: frame.sum(level='color', axis=1)
color      Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
In [35]: frame.sum(level='color')
...
AssertionError: Level color not in index
(color is a column level, so summing by it requires axis=1.)

References: [Indexing and Selecting Data]
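A sketch of the hierarchical-indexing operations above in current pandas: sortlevel has been replaced by sort_index(level=...), and sum(level=...) was removed in pandas 2.0 in favour of groupby(level=...).sum(). The column-level sum goes through a transpose to avoid the deprecated groupby(axis=1), and the Series uses np.arange instead of random numbers so the results are predictable.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

ohio = frame["Ohio"]                       # partial column indexing
swapped = frame.swaplevel("key1", "key2")  # same as swaplevel(0, 1) here
ordered = swapped.sort_index(level=0)      # modern replacement for sortlevel(0)

# Aggregate by a row level, then by a column level (via transpose).
by_key2 = frame.groupby(level="key2").sum()
by_color = frame.T.groupby(level="color").sum().T

# Series with a MultiIndex: select an inner level, then unstack to a table.
data = pd.Series(np.arange(10.0),
                 index=[list("aaabbbccdd"), [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
inner = data.xs(2, level=1)  # like data[:, 2]
table = data.unstack()       # missing (outer, inner) pairs become NaN
```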