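Basic statistics (including sorting); distribution and cumulative statistics; data features such as correlation and periodicity; data mining (forming knowledge)
A set of data expresses one or more meanings
Summarization - the process of reducing data to lossy features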
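As a small illustration of what "lossy features" means, the sketch below reduces a made-up Series to a few summary values. The sample numbers and the variable names `scores` and `summary` are assumptions chosen only for this example; many different data sets would produce the same summary, which is why the process is lossy.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
scores = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])

# A handful of features that describe the data without reproducing it.
summary = {
    "count": int(scores.count()),
    "min": int(scores.min()),
    "max": int(scores.max()),
    "mean": float(scores.mean()),
    "median": float(scores.median()),
}
print(summary)
```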
Sorting data with pandas
The .sort_index() method sorts by index labels along the specified axis, in ascending order by default
.sort_index(axis=0, ascending=True)
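The same method is also available on a Series. A minimal sketch, assuming a made-up Series `s` with unordered labels:

```python
import pandas as pd

# Hypothetical Series whose index labels are out of order.
s = pd.Series([10, 20, 30, 40], index=["c", "a", "d", "b"])

# Sort by label, ascending (the default); returns a new Series.
by_label = s.sort_index()

# Sort by label, descending.
by_label_desc = s.sort_index(ascending=False)

print(by_label)
print(by_label_desc)
```

The values travel with their labels, so only the order of the rows changes.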
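The .sort_values() method sorts by values along the specified axis, in ascending order by default
Series.sort_values(axis=0, ascending=True)
DataFrame.sort_values(by, axis=0, ascending=True)
by: a label or list of labels that serve as sort keys. With axis=0 (the default) these are column labels and the rows are reordered; with axis=1 they are row labels and the columns are reordered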
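When `by` is a list, the keys are applied in order, and `ascending` may be a list of the same length to set the direction per key. A minimal sketch, assuming a made-up frame with hypothetical columns `group` and `score`:

```python
import pandas as pd

# Hypothetical data: two groups, with distinct scores inside each group.
df = pd.DataFrame(
    {"group": ["b", "a", "b", "a"], "score": [1, 3, 2, 4]},
    index=["r1", "r2", "r3", "r4"],
)

# Sort by group ascending, then by score descending within each group.
ordered = df.sort_values(by=["group", "score"], ascending=[True, False])
print(ordered)
```

`sort_values` returns a new object here; `df` itself keeps its original row order.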
NaN values are placed at the end of the result by default, whether the sort is ascending or descending
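The NaN placement can be checked on a small Series. The sketch below also uses the `na_position` parameter of `sort_values`, which is not covered in the notes above, to move NaN to the front; the Series `s` and its labels are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing one missing value.
s = pd.Series([3.0, np.nan, 1.0, 2.0], index=["w", "x", "y", "z"])

# Default behaviour: NaN goes last in both directions.
asc = s.sort_values()
desc = s.sort_values(ascending=False)

# na_position="first" moves NaN to the front instead.
nan_first = s.sort_values(na_position="first")

print(asc)
print(desc)
print(nan_first)
```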
Code example
# -*- coding: utf-8 -*-
# @File : pandas_sort.py
# @Date : 2018-05-20
# Sorting data with pandas

import pandas as pd
import numpy as np

# Prepare the data
df = pd.DataFrame(np.arange(20).reshape(4, 5), index=["c", "a", "d", "b"])
print(df)
"""
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
"""

# Sort by index, ascending; the default axis=0 is the row index
print(df.sort_index())
"""
    0   1   2   3   4
a   5   6   7   8   9
b  15  16  17  18  19
c   0   1   2   3   4
d  10  11  12  13  14
"""

# Sort by index, descending
print(df.sort_index(ascending=False))
"""
    0   1   2   3   4
d  10  11  12  13  14
c   0   1   2   3   4
b  15  16  17  18  19
a   5   6   7   8   9
"""

# Sort along axis=1, the column index
print(df.sort_index(axis=1, ascending=False))
"""
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15
"""

# Sort by values: reorder the rows by the values in column 2
print(df.sort_values(2, ascending=False))
"""
    0   1   2   3   4
b  15  16  17  18  19
d  10  11  12  13  14
a   5   6   7   8   9
c   0   1   2   3   4
"""

# Reorder the columns, using row "a" as the sort key
print(df.sort_values("a", axis=1, ascending=False))
"""
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15
"""

# NaN values are always placed at the end
a = pd.DataFrame(np.arange(12).reshape(3, 4), index=["a", "b", "c"])
b = pd.DataFrame(np.arange(20).reshape(4, 5), index=["a", "b", "c", "d"])
c = a + b
print(c)
"""
      0     1     2     3   4
a   0.0   2.0   4.0   6.0 NaN
b   9.0  11.0  13.0  15.0 NaN
c  18.0  20.0  22.0  24.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""

print(c.sort_values(2, ascending=False))
"""
      0     1     2     3   4
c  18.0  20.0  22.0  24.0 NaN
b   9.0  11.0  13.0  15.0 NaN
a   0.0   2.0   4.0   6.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""

print(c.sort_values(2, ascending=True))
"""
      0     1     2     3   4
a   0.0   2.0   4.0   6.0 NaN
b   9.0  11.0  13.0  15.0 NaN
c  18.0  20.0  22.0  24.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""