
Python for Data Analysis

imelee > "Practice Script Code"

2017.03.19


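1. Period frequency conversion

Period and PeriodIndex objects can both be converted to another frequency with their asfreq method. Suppose we have an annual period and want to convert it to a monthly period at either the start or the end of that year:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
p=pd.Period('2007',freq='A-DEC')
print(p.asfreq('M',how='start'))
print(p.asfreq('M',how='end'))

Output: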

2007-01
2007-12
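
A minimal sketch of the same conversion for a current Python 3 and pandas installation. It assumes the alias 'Y' for a December-ending annual period, chosen because the spelling of annual aliases such as A-DEC has changed between pandas releases; the variable names are illustrative only.

```python
import pandas as pd

# 'Y' is assumed here as the annual (December year-end) period alias
p = pd.Period('2007', freq='Y')

first_month = p.asfreq('M', how='start')  # first monthly subperiod
last_month = p.asfreq('M', how='end')     # last monthly subperiod

print(first_month, last_month)
```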

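You can think of Period('2007',freq='A-DEC') as a cursor pointing at a span of time that is subdivided into monthly periods. For a fiscal year that ends in a month other than December, the monthly subperiods that belong to it are different:

p=pd.Period('2007',freq='A-JUN')
print(p.asfreq('M','start'))
print(p.asfreq('M','end'))

Output: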

2006-07
2007-06

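When converting from a high frequency to a low frequency, the superperiod is determined by where the subperiod falls. For example, in A-JUN frequency the month 2007-08 is actually part of the 2008 period:

p=pd.Period('2007-08','M')
print(p.asfreq('A-JUN'))

Output: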

2008
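
The fiscal-year behavior can also be sketched without relying on a particular alias spelling. The example below assumes that an offset object, pd.offsets.YearEnd(month=6), is an acceptable stand-in for the 'A-JUN' alias; the variable names are hypothetical.

```python
import pandas as pd

# Annual frequency with fiscal years ending in June, built from an offset
# object (assumed equivalent to the 'A-JUN' alias used in the text)
fiscal_june = pd.offsets.YearEnd(month=6)

fy2007 = pd.Period('2007', freq=fiscal_june)
first_month = fy2007.asfreq('M', how='start')
last_month = fy2007.asfreq('M', how='end')

# Going the other way: which fiscal year contains August 2007?
fy_of_aug = pd.Period('2007-08', freq='M').asfreq(fiscal_june)

print(first_month, last_month, fy_of_aug)
```

Fiscal year 2007 spans July 2006 through June 2007, so August 2007 already falls in fiscal year 2008.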

A whole PeriodIndex or a time series can be converted in the same way:

rng=pd.period_range('2006','2009',freq='A-DEC')
ts=Series(np.random.randn(len(rng)),index=rng)
print(ts)

Output:

2006    0.802142
2007   -0.048446
2008   -1.459365
2009   -0.710186
Freq: A-DEC, dtype: float64

print(ts.asfreq('M',how='start'))
print(ts.asfreq('B',how='end'))

Output (from a separate run, so the random values differ from those printed above):

2006-01    1.385962
2007-01   -0.293633
2008-01   -0.742163
2009-01    0.147614
Freq: M, dtype: float64
2006-12-29    1.385962
2007-12-31   -0.293633
2008-12-31   -0.742163
2009-12-31    0.147614
Freq: B, dtype: float64

2. Quarterly period frequencies

pandas supports all 12 possible quarterly frequencies, Q-JAN through Q-DEC:

p=pd.Period('2012Q4',freq='Q-JAN')
print(p)

Output:

2012Q4

In a fiscal year ending in January, 2012Q4 runs from November through January, which you can check by converting to daily frequency:

print(p.asfreq('D','start'))
print(p.asfreq('D','end'))

Output:

2011-11-01
2012-01-31
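
As a small Python 3 sketch of the same check, the quarter boundaries can be read either by converting to daily periods or from the start_time attribute. The variable names are illustrative.

```python
import pandas as pd

p = pd.Period('2012Q4', freq='Q-JAN')  # fiscal year ends in January

first_day = p.asfreq('D', how='start')
last_day = p.asfreq('D', how='end')
print(first_day, last_day)

# The start of the quarter is also available directly as a timestamp
print(p.start_time)
```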

This makes Period arithmetic straightforward. For example, to get the timestamp at 4 PM on the second-to-last business day of the quarter:

p4pm=(p.asfreq('B','e')-1).asfreq('T','S')+16*60
print(p4pm)

print(p4pm.to_timestamp())

Output:

2012-01-30 16:00

2012-01-30 16:00:00
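
Business-day periods and the 'T' minute alias may not be accepted by every pandas release, so here is a hedged alternative that reaches the same timestamp with timestamp offsets instead of period arithmetic. This is a sketch of a different technique from the one above, not a drop-in replacement, and the intermediate names are hypothetical.

```python
import pandas as pd

p = pd.Period('2012Q4', freq='Q-JAN')

# Last calendar day of the quarter as a midnight timestamp
quarter_end = p.asfreq('D', how='end').to_timestamp()

# Roll back to a business day if needed, then step one business day earlier
last_bday = pd.offsets.BDay().rollback(quarter_end)
second_to_last_bday = last_bday - pd.offsets.BDay(1)

# 4 PM on that day
p4pm = second_to_last_bday + pd.Timedelta(hours=16)
print(p4pm)
```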

period_range can also generate quarterly ranges:

rng=pd.period_range('2011Q3','2012Q4',freq='Q-JAN')
ts=Series(np.arange(len(rng)),index=rng)
print(ts)

Output:

2011Q3    0
2011Q4    1
2012Q1    2
2012Q2    3
2012Q3    4
2012Q4    5
Freq: Q-JAN, dtype: int32

new_rng=(rng.asfreq('B','e')-1).asfreq('T','S')+16*60
ts.index=new_rng.to_timestamp()
print(ts)

Output:

2010-10-28 16:00:00    0
2011-01-28 16:00:00    1
2011-04-28 16:00:00    2
2011-07-28 16:00:00    3
2011-10-28 16:00:00    4
2012-01-30 16:00:00    5
dtype: int32

3. Converting Timestamps to Periods (and back)

Series and DataFrame objects indexed by timestamps can be converted to period-indexed objects with the to_period method:

rng=pd.date_range('1/1/2000',periods=3,freq='M')
ts=Series(np.random.randn(3),index=rng)
pts=ts.to_period()
print(ts)
print(pts)

Output:

2000-01-31    0.301329
2000-02-29   -0.927125
2000-03-31    0.369884
Freq: M, dtype: float64
2000-01    0.301329
2000-02   -0.927125
2000-03    0.369884
Freq: M, dtype: float64

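Since periods refer to non-overlapping time spans, a timestamp can belong to only one period for a given frequency. The frequency of the new PeriodIndex is inferred from the timestamps by default, but you can specify any other frequency. Duplicate periods in the result are allowed:

rng=pd.date_range('1/29/2000',periods=6,freq='D')
ts2=Series(np.random.randn(6),index=rng)
print(ts2.to_period('M'))

Output: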

2000-01    0.591006
2000-01    0.326477
2000-01   -2.997369
2000-02    0.140095
2000-02    0.001204
2000-02   -0.276570
Freq: M, dtype: float64
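
A short sketch of what the duplicate periods look like in practice, using integer data instead of random numbers so the result is reproducible. The final groupby step is one possible way to collapse the duplicates and is an addition for illustration, not something the original script does.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-29', periods=6, freq='D')
ts2 = pd.Series(np.arange(6), index=rng)

monthly = ts2.to_period('M')
print(monthly)
print(monthly.index.is_unique)  # False: duplicate periods are allowed

# One possible way to collapse the duplicates: aggregate per period
per_month = monthly.groupby(level=0).sum()
print(per_month)
```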

To convert back to timestamps, use to_timestamp:

pts=ts.to_period()
print(pts)
print(pts.to_timestamp(how='end'))

Output (from a separate run, so the random values differ from those printed above):

2000-01    0.188389
2000-02   -0.967632
2000-03   -0.740213
Freq: M, dtype: float64
2000-01-31    0.188389
2000-02-29   -0.967632
2000-03-31   -0.740213
Freq: M, dtype: float64
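
The output above shows midnight month-end dates for how='end'. Newer pandas releases may instead report the last instant of each period, so the sketch below normalizes the index to get plain month-end dates either way. The data values are made up for the example.

```python
import pandas as pd

pts = pd.Series([1.0, 2.0, 3.0],
                index=pd.period_range('2000-01', periods=3, freq='M'))

at_start = pts.to_timestamp(how='start')  # first day of each month
at_end = pts.to_timestamp(how='end')      # end of each month

# Dropping the time of day leaves the month-end date regardless of whether
# the installed version reports midnight or the last instant of the day
month_end_dates = at_end.index.normalize()
print(at_start.index[0], month_end_dates[0])
```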

4. Creating a PeriodIndex from arrays

Fixed-frequency data sets often store time span information spread across several columns. In the macroeconomic data set below, for example, the year and the quarter are in separate columns:

data=pd.read_csv('macrodata.csv')
print(data.year)
print(data.quarter)

Passing these two arrays to PeriodIndex along with a frequency combines them into an index for the DataFrame:

index=pd.PeriodIndex(year=data.year,quarter=data.quarter,freq='Q-DEC')
print(index)
data.index=index
print(data.infl)

Output:

PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
             '1960Q3', '1960Q4', '1961Q1', '1961Q2',
             ...
             '2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
             '2008Q4', '2009Q1', '2009Q2', '2009Q3'],
            dtype='int64', length=203, freq='Q-DEC')
1959Q1    0.00
1959Q2    2.34
1959Q3    2.74
1959Q4    0.27
1960Q1    2.31
1960Q2    0.14
1960Q3    2.70
1960Q4    1.21
1961Q1   -0.40
1961Q2    1.47
1961Q3    0.80
1961Q4    0.80
1962Q1    2.26
1962Q2    0.13
1962Q3    2.11
1962Q4    0.79
1963Q1    0.53
1963Q2    2.75
1963Q3    0.78
1963Q4    2.46
1964Q1    0.13
1964Q2    0.90
1964Q3    1.29
1964Q4    2.05
1965Q1    1.28
1965Q2    2.54
1965Q3    0.89
1965Q4    2.90
1966Q1    4.99
1966Q2    2.10

2002Q2    1.56
2002Q3    2.66
2002Q4    3.08
2003Q1    1.31
2003Q2    1.09
2003Q3    2.60
2003Q4    3.02
2004Q1    2.35
2004Q2    3.61
2004Q3    3.58
2004Q4    2.09
2005Q1    4.15
2005Q2    1.85
2005Q3    9.14
2005Q4    0.40
2006Q1    2.60
2006Q2    3.97
2006Q3   -1.58
2006Q4    3.30
2007Q1    4.58
2007Q2    2.75
2007Q3    3.45
2007Q4    6.38
2008Q1    2.82
2008Q2    8.53
2008Q3   -3.16
2008Q4   -8.79
2009Q1    0.94
2009Q2    3.37
2009Q3    3.56
Freq: Q-DEC, Name: infl, dtype: float64
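
Since macrodata.csv may not be at hand, and the year=/quarter= keywords of the PeriodIndex constructor may not be accepted by every pandas release, here is a hedged sketch that builds the same kind of index from quarter labels. The years and quarters lists are hypothetical stand-ins for data.year and data.quarter.

```python
import pandas as pd

# Hypothetical stand-ins for the data.year and data.quarter columns
years = [1959, 1959, 1959, 1959, 1960]
quarters = [1, 2, 3, 4, 1]

# Build labels such as '1959Q1' and let PeriodIndex parse them
labels = ['%dQ%d' % (y, q) for y, q in zip(years, quarters)]
index = pd.PeriodIndex(labels, freq='Q-DEC')
print(index)
```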

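5. Resampling and frequency conversion

Resampling is the process of converting a time series from one frequency to another. Aggregating higher-frequency data to a lower frequency is called downsampling, and converting lower-frequency data to a higher frequency is called upsampling. Not all resampling falls into these two categories; converting W-WED (weekly on Wednesday) to W-FRI, for instance, is neither.

pandas objects have a resample method, which is the workhorse for all frequency conversion:

rng=pd.date_range('1/1/2000',periods=100,freq='D')
ts=Series(np.random.randn(len(rng)),index=rng)
print(ts.resample('M',how='mean'))
print(ts.resample('M',how='mean',kind='period'))

Output: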

2000-01-31   -0.154671
2000-02-29    0.224220
2000-03-31   -0.242436
2000-04-30    0.291921
Freq: M, dtype: float64
2000-01   -0.154671
2000-02    0.224220
2000-03   -0.242436
2000-04    0.291921
Freq: M, dtype: float64
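
The how= keyword used above belongs to older pandas. In later releases resample returns an intermediate object and the aggregation is a method call on it. The sketch below assumes such a release, passes the month-end rule as an offset object to sidestep alias spelling differences, and uses integer-valued data so the means are easy to verify by hand.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.arange(100.0), index=rng)

# Month-end bins given as an offset object, followed by an aggregation method
monthly = ts.resample(pd.offsets.MonthEnd()).mean()
print(monthly)

# Period-indexed version of the same result
monthly_periods = monthly.to_period('M')
print(monthly_periods)
```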

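6. Downsampling

When downsampling with resample, there are two things to decide:

a. which side of each interval is closed

b. how to label each aggregated bin, with the start of the interval or with the end

First, let's look at some one-minute data:

rng=pd.date_range('1/1/2000',periods=12,freq='T')
ts=Series(np.arange(12),index=rng)
print(ts)

Output: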

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

Suppose you want to aggregate this data into five-minute chunks by taking the sum of each group:

print(ts.resample('5min',how='sum'))

Output:

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32

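The frequency you pass defines the bin edges in five-minute increments. For a minute frequency such as this one, the left bin edge is inclusive by default, so the 00:00 value belongs to the 00:00 to 00:05 interval (which is why the first sum above is 0+1+2+3+4=10). Passing closed='left' states this explicitly and gives the same result; closed='right' would make the right edge inclusive instead:

print(ts.resample('5min',how='sum',closed='left'))

Output: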

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32

The bins are likewise labeled with their left edge here. Passing label='left' states that explicitly:

print(ts.resample('5min',how='sum',closed='left',label='left'))

Output:

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5T, dtype: int32
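
Because the three outputs above are identical, the effect of closed and label is easier to see by switching them to 'right'. The following is a minimal sketch in the method-chaining style of later pandas releases; it assumes the 'min' alias for one-minute frequency.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

left = ts.resample('5min').sum()                   # default for '5min'
right = ts.resample('5min', closed='right').sum()  # right edge inclusive
right_labeled = ts.resample('5min', closed='right', label='right').sum()

print(left)
print(right)
print(right_labeled)
```

With closed='right' the 00:00 observation falls into the interval ending at 00:00, so an extra bin appears at the front and every later bin shifts by one observation.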

To shift the result index by some amount, for example to subtract one second from each label, pass a string or date offset to loffset:

print(ts.resample('5min',how='sum',loffset='-1s'))

Output:

1999-12-31 23:59:59    10
2000-01-01 00:04:59    35
2000-01-01 00:09:59    21
Freq: 5T, dtype: int32

Alternatively, you can call the shift method on the result, in which case there is no need to set loffset.
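
The loffset keyword is not available in every pandas release. A hedged sketch of the same one-second shift, done by adjusting the result index directly after resampling (the name shifted is illustrative):

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

result = ts.resample('5min').sum()

# Move every label back by one second; the values are untouched
shifted = result.copy()
shifted.index = shifted.index - pd.Timedelta(seconds=1)
print(shifted)
```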

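7. OHLC resampling

A ubiquitous way of aggregating time series in finance is to compute four values for each bin: the first (open), last (close), maximum (high) and minimum (low) values. Passing how='ohlc' returns a DataFrame whose columns hold these four aggregates:

print(ts.resample('5min',how='ohlc'))

Output: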

                     open  high  low  close
2000-01-01 00:00:00     0     4    0      4
2000-01-01 00:05:00     5     9    5      9
2000-01-01 00:10:00    10    11   10     11
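
In the method-chaining style of later pandas releases, the same aggregation is a call to ohlc() on the resampler. A minimal sketch, assuming such a release and the same twelve-minute series:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# One row per five-minute bin with open, high, low and close columns
bars = ts.resample('5min').ohlc()
print(bars)
```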

8. Resampling with groupby

Another way to downsample is to use pandas's groupby functionality. To group by month or by day of the week, for example, pass a function that accesses those fields on the time series index:

rng=pd.date_range('1/1/2000',periods=100,freq='D')
ts=Series(np.arange(100),index=rng)
print(ts)
print(ts.groupby(lambda x:x.month).mean())
print(ts.groupby(lambda x:x.weekday).mean())

Output:

2000-01-01     0
2000-01-02     1
2000-01-03     2
2000-01-04     3
2000-01-05     4
2000-01-06     5
2000-01-07     6
2000-01-08     7
2000-01-09     8
2000-01-10     9
2000-01-11    10
2000-01-12    11
2000-01-13    12
2000-01-14    13
2000-01-15    14
2000-01-16    15
2000-01-17    16
2000-01-18    17
2000-01-19    18
2000-01-20    19
2000-01-21    20
2000-01-22    21
2000-01-23    22
2000-01-24    23
2000-01-25    24
2000-01-26    25
2000-01-27    26
2000-01-28    27
2000-01-29    28
2000-01-30    29
              ..
2000-03-11    70
2000-03-12    71
2000-03-13    72
2000-03-14    73
2000-03-15    74
2000-03-16    75
2000-03-17    76
2000-03-18    77
2000-03-19    78
2000-03-20    79
2000-03-21    80
2000-03-22    81
2000-03-23    82
2000-03-24    83
2000-03-25    84
2000-03-26    85
2000-03-27    86
2000-03-28    87
2000-03-29    88
2000-03-30    89
2000-03-31    90
2000-04-01    91
2000-04-02    92
2000-04-03    93
2000-04-04    94
2000-04-05    95
2000-04-06    96
2000-04-07    97
2000-04-08    98
2000-04-09    99
Freq: D, dtype: int32
1    15
2    45
3    75
4    95
dtype: int32
0    47.5
1    48.5
2    49.5
3    50.5
4    51.5
5    49.0
6    50.0
dtype: float64
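
The lambda can be replaced by the vectorized index attributes, which some readers find easier to follow. A small sketch of that variant, using the same 100-day series:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.arange(100), index=rng)

# Group on arrays derived from the index instead of a per-label function
by_month = ts.groupby(ts.index.month).mean()
by_weekday = ts.groupby(ts.index.weekday).mean()  # Monday=0 ... Sunday=6

print(by_month)
print(by_weekday)
```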

9. Upsampling and interpolation

Converting from a low frequency to a higher frequency requires no aggregation. Consider a DataFrame with some weekly data:

frame=DataFrame(np.random.randn(2,4),
                index=pd.date_range('1/1/2000',periods=2,freq='W-WED'),
                columns=['Colorado','Texas','New York','Ohio'])
print(frame)

Output:

            Colorado     Texas  New York      Ohio
2000-01-05  2.168729  0.408150 -1.268413  1.882155
2000-01-12  0.695996  0.981071  0.678594 -0.526727

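Resampling this to daily frequency introduces missing values by default:

df_daily=frame.resample('D')
print(df_daily)

Output (each output in this section comes from a separate run, so the random values differ from one listing to the next):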

            Colorado     Texas  New York      Ohio
2000-01-05  0.135424  0.551593 -0.777373 -0.233382
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.563044 -0.312367  0.601868  1.742242

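Suppose you want to fill forward each weekly value on the non-Wednesdays. The same filling and interpolation methods that fillna and reindex offer are available for resampling:

print(frame.resample('D',fill_method='ffill'))

Output: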

            Colorado     Texas  New York      Ohio
2000-01-05  0.457819  2.660074  1.196789 -0.582404
2000-01-06  0.457819  2.660074  1.196789 -0.582404
2000-01-07  0.457819  2.660074  1.196789 -0.582404
2000-01-08  0.457819  2.660074  1.196789 -0.582404
2000-01-09  0.457819  2.660074  1.196789 -0.582404
2000-01-10  0.457819  2.660074  1.196789 -0.582404
2000-01-11  0.457819  2.660074  1.196789 -0.582404
2000-01-12  1.878784 -1.368375  0.484410 -0.698291

You can likewise fill only a certain number of periods forward, which limits how far an observed value keeps being used:

print(frame.resample('D',fill_method='ffill',limit=2))

Output:

            Colorado     Texas  New York      Ohio
2000-01-05 -1.139258 -0.181188  0.669716  1.250018
2000-01-06 -1.139258 -0.181188  0.669716  1.250018
2000-01-07 -1.139258 -0.181188  0.669716  1.250018
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -1.318581  0.503518  1.852005  0.378236

The new date index need not overlap with the old one at all:

print(frame.resample('W-THU',fill_method='ffill'))

Output:

            Colorado     Texas  New York      Ohio
2000-01-06  2.113687  0.315481  2.523027  0.609636
2000-01-13 -1.376797 -1.087075 -0.647194 -0.111042
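
The fill_method= keyword belongs to older pandas. In later releases the fill is a method on the resampler, and asfreq() gives the unfilled frame. The sketch below assumes such a release and uses fixed numbers rather than random data so the fills are easy to follow.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

daily = frame.resample('D').asfreq()          # gaps become NaN
filled = frame.resample('D').ffill()          # carry each weekly row forward
limited = frame.resample('D').ffill(limit=2)  # carry forward at most 2 days

print(daily)
print(filled)
print(limited)
```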

10. Resampling with periods

frame=DataFrame(np.random.randn(24,4),
                index=pd.period_range('1-2000','12-2001',freq='M'),
                columns=['Colorado','Texas','New York','Ohio'])
print(frame[:5])

Output:

         Colorado     Texas  New York      Ohio
2000-01  0.119422  0.495024  1.729524  0.633504
2000-02 -1.222105  1.802419  1.167525  1.868474
2000-03 -1.211917 -0.367331  0.834356 -0.984967
2000-04 -0.708925  0.561091 -0.707988  0.809059
2000-05  0.437332  0.315616  0.175065  0.364923

annual_frame=frame.resample('A-DEC',how='mean')
print(annual_frame)

Output:

      Colorado     Texas  New York      Ohio
2000 -0.326572  0.079417  0.101661 -0.194386
2001  0.228762 -0.290425 -0.674494  0.318449

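Upsampling is more nuanced, because you must decide which end of each span in the new frequency receives the original value, just as with the asfreq method. The convention argument can be 'start' or 'end'. The two outputs below are identical, which shows that the pandas version used here defaults to 'start':

# Q-DEC: quarterly, with the year ending in December
print(annual_frame.resample('Q-DEC',fill_method='ffill'))
print(annual_frame.resample('Q-DEC',fill_method='ffill',convention='start'))

Output (from a separate run, so the values differ from the annual means printed above):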

        Colorado     Texas  New York      Ohio
2000Q1  0.055982  0.000586 -0.229527 -0.321558
2000Q2  0.055982  0.000586 -0.229527 -0.321558
2000Q3  0.055982  0.000586 -0.229527 -0.321558
2000Q4  0.055982  0.000586 -0.229527 -0.321558
2001Q1 -0.095915 -0.363507 -0.035557  0.186972
2001Q2 -0.095915 -0.363507 -0.035557  0.186972
2001Q3 -0.095915 -0.363507 -0.035557  0.186972
2001Q4 -0.095915 -0.363507 -0.035557  0.186972
        Colorado     Texas  New York      Ohio
2000Q1  0.055982  0.000586 -0.229527 -0.321558
2000Q2  0.055982  0.000586 -0.229527 -0.321558
2000Q3  0.055982  0.000586 -0.229527 -0.321558
2000Q4  0.055982  0.000586 -0.229527 -0.321558
2001Q1 -0.095915 -0.363507 -0.035557  0.186972
2001Q2 -0.095915 -0.363507 -0.035557  0.186972
2001Q3 -0.095915 -0.363507 -0.035557  0.186972
2001Q4 -0.095915 -0.363507 -0.035557  0.186972
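
What the convention decides can be seen in isolation with asfreq on a PeriodIndex, which places each annual period at either its first or its last quarter. This is a sketch of the placement step only, not of resample itself, and it assumes the alias 'Y' for a December-ending annual period.

```python
import pandas as pd

# 'Y' is assumed here as the annual (December year-end) period alias
years = pd.period_range('2000', '2001', freq='Y')

at_start = years.asfreq('Q-DEC', how='start')  # value lands on Q1
at_end = years.asfreq('Q-DEC', how='end')      # value lands on Q4

print(at_start)
print(at_end)
```

With the 'end' placement, a forward fill would leave nothing to carry into 2000Q1 through 2000Q3, which is why the two conventions can produce results of different length.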

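Because periods refer to time spans, the rules for upsampling and downsampling are stricter:

a. When downsampling, each source period must fall entirely inside one target period (the source frequency is a subperiod of the target frequency)

b. When upsampling, each target period must fall entirely inside one source period (the source frequency is a superperiod of the target frequency)

For example, the time spans defined by Q-MAR line up only with A-MAR, A-JUN, A-SEP and A-DEC:

print(annual_frame.resample('Q-MAR',fill_method='ffill'))

Output: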

        Colorado     Texas  New York      Ohio
2000Q4 -0.288411 -0.340895  0.235046  0.073330
2001Q1 -0.288411 -0.340895  0.235046  0.073330
2001Q2 -0.288411 -0.340895  0.235046  0.073330
2001Q3 -0.288411 -0.340895  0.235046  0.073330
2001Q4 -0.396545  0.510345  0.167587  0.256141
2002Q1 -0.396545  0.510345  0.167587  0.256141
2002Q2 -0.396545  0.510345  0.167587  0.256141
2002Q3 -0.396545  0.510345  0.167587  0.256141

11. Time series plotting

The examples below use price data for a few US stocks downloaded from Yahoo! Finance:

close_px_all=pd.read_csv('stock_px.csv',parse_dates=True,index_col=0)
close_px=close_px_all[['AAPL','MSFT','XOM']]
close_px=close_px.resample('B',fill_method='ffill')
print(close_px)

Output:

              AAPL   MSFT    XOM
2003-01-02    7.40 21.11 29.22
2003-01-03    7.45 21.14 29.24
2003-01-06    7.45 21.52 29.96
2003-01-07    7.43 21.93 28.95
2003-01-08    7.28 21.31 28.83
2003-01-09    7.34 21.93 29.44
2003-01-10    7.36 21.97 29.03
2003-01-13    7.32 22.16 28.91
2003-01-14    7.30 22.39 29.17
2003-01-15    7.22 22.11 28.77
2003-01-16    7.31 21.75 28.90
2003-01-17    7.05 20.22 28.60
2003-01-20    7.05 20.22 28.60
2003-01-21    7.01 20.17 27.94
2003-01-22    6.94 20.04 27.58
2003-01-23    7.09 20.54 27.52
2003-01-24    6.90 19.59 26.93
2003-01-27    7.07 19.32 26.21
2003-01-28    7.29 19.18 26.90
2003-01-29    7.47 19.61 27.88
2003-01-30    7.16 18.95 27.37
2003-01-31    7.18 18.65 28.13
2003-02-03    7.33 19.08 28.52
2003-02-04    7.30 18.59 28.52
2003-02-05    7.22 18.45 28.11
2003-02-06    7.22 18.63 27.87
2003-02-07    7.07 18.30 27.66
2003-02-10    7.18 18.62 27.87
2003-02-11    7.18 18.25 27.67
2003-02-12    7.20 18.25 27.12
           ...    ...    ...
2011-09-05 374.05 25.80 72.14
2011-09-06 379.74 25.51 71.15
2011-09-07 383.93 26.00 73.65
2011-09-08 384.14 26.22 72.82
2011-09-09 377.48 25.74 71.01
2011-09-12 379.94 25.89 71.84
2011-09-13 384.62 26.04 71.65
2011-09-14 389.30 26.50 72.64
2011-09-15 392.96 26.99 74.01
2011-09-16 400.50 27.12 74.55
2011-09-19 411.63 27.21 73.70
2011-09-20 413.45 26.98 74.01
2011-09-21 412.14 25.99 71.97
2011-09-22 401.82 25.06 69.24
2011-09-23 404.30 25.06 69.31
2011-09-26 403.17 25.44 71.72
2011-09-27 399.26 25.67 72.91
2011-09-28 397.01 25.58 72.07
2011-09-29 390.57 25.45 73.88
2011-09-30 381.32 24.89 72.63
2011-10-03 374.60 24.53 71.15
2011-10-04 372.50 25.34 72.83
2011-10-05 378.25 25.89 73.95
2011-10-06 377.37 26.34 73.89
2011-10-07 369.80 26.25 73.56
2011-10-10 388.81 26.94 76.28
2011-10-11 400.29 27.00 76.27
2011-10-12 402.19 26.96 77.16
2011-10-13 408.43 27.18 76.37
2011-10-14 422.00 27.27 78.11

[2292 rows x 3 columns]

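Calling plot on any one of the columns produces a simple chart:

close_px['AAPL'].plot()

When plot is called on a DataFrame, all of the time series are drawn on a single subplot with a legend identifying each one. Here only the 2009 data is plotted:

close_px.ix['2009'].plot()

The next plot shows Apple's daily stock price from January through March 2011:

close_px['AAPL'].ix['01-2011':'03-2011'].plot()

Quarterly-frequency data is formatted with quarterly markers on the axis:

appl_q=close_px['AAPL'].resample('Q-DEC',fill_method='ffill')
appl_q.ix['2009':].plot()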
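
The .ix indexer used above is not available in later pandas releases; .loc accepts the same partial date strings. The sketch below shows only the selection step on made-up prices (stock_px.csv is not assumed to be present, and the column values are hypothetical); .plot() can be chained onto either result when matplotlib is installed.

```python
import numpy as np
import pandas as pd

# Hypothetical prices standing in for the stock_px.csv columns
idx = pd.date_range('2008-01-01', '2011-12-31', freq='D')
close_px = pd.DataFrame({'AAPL': np.linspace(100.0, 400.0, len(idx)),
                         'MSFT': np.linspace(20.0, 30.0, len(idx))},
                        index=idx)

year_2009 = close_px.loc['2009']                     # all rows in 2009
aapl_q1 = close_px['AAPL'].loc['2011-01':'2011-03']  # January to March 2011

print(year_2009.index[0], year_2009.index[-1])
print(aapl_q1.index[0], aapl_q1.index[-1])
```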

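12. Moving window functions

A common class of array transformations for time series is statistics and other functions evaluated over a sliding window, or with exponentially decaying weights. They are called moving window functions here, even though the group includes functions without a fixed-length window, such as the exponentially weighted moving average. Moving window functions automatically exclude missing data.

rolling_mean is one of the simplest. It takes a TimeSeries or DataFrame along with a window, expressed as a number of periods:

close_px.AAPL.plot()
pd.rolling_mean(close_px.AAPL,250).plot()

Passing min_periods=10 lets a value be produced as soon as ten observations are available, which is why the entries before the tenth observation are NaN below:

appl_std250=pd.rolling_std(close_px.AAPL,250,min_periods=10)
print(appl_std250[5:12])

Output: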

2003-01-09         NaN
2003-01-10         NaN
2003-01-13         NaN
2003-01-14         NaN
2003-01-15    0.077496
2003-01-16    0.074760
2003-01-17    0.112368
Freq: B, Name: AAPL, dtype: float64

appl_std250.plot()

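To compute an expanding window mean, think of an expanding window as a special case of a window whose length equals that of the time series but which needs only one or more periods to produce a value:

# define an expanding mean in terms of rolling_mean
expanding_mean=lambda x:pd.rolling_mean(x,len(x),min_periods=1)

Calling rolling_mean and similar functions on a DataFrame applies the transformation to every column:

pd.rolling_mean(close_px,60).plot(logy=True)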
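
The module-level pd.rolling_mean and pd.rolling_std functions belong to older pandas. Later releases expose the same calculations through the rolling and expanding methods. A minimal sketch on a short made-up series, which also checks that the expanding mean matches the rolling-window definition given above:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(20.0))

roll_mean = s.rolling(5).mean()                # NaN until 5 observations
roll_std = s.rolling(5, min_periods=3).std()   # values from the 3rd one on

expanding_mean = s.expanding().mean()
same_as_expanding = s.rolling(len(s), min_periods=1).mean()

print(roll_mean.head(6))
print(roll_std.head(4))
print(np.allclose(expanding_mean, same_as_expanding))
```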

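13. Exponentially weighted functions

The following example compares a 60-day moving average of Apple's stock price with an exponentially weighted moving average using span=60:

import matplotlib.pyplot as plt
fig,axes=plt.subplots(nrows=2,ncols=1,sharex=True,sharey=True,figsize=(12,7))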
aapl_px=close_px.AAPL['2005':'2009']
ma60=pd.rolling_mean(aapl_px,60,min_periods=50)
ewma60=pd.ewma(aapl_px,span=60)
aapl_px.plot(style='k-',ax=axes[0])
ma60.plot(style='k--',ax=axes[0])
aapl_px.plot(style='k-',ax=axes[1])
ewma60.plot(style='k--',ax=axes[1])
axes[0].set_title('Simple MA')
axes[1].set_title('Exponentially-weighted MA')
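
pd.ewma also belongs to older pandas; later releases provide the ewm method. The sketch below uses a tiny rising series and span=3 so the weights can be checked by hand: span=3 gives a decay factor alpha = 2/(3+1) = 0.5, and the default adjusted weighting makes the second value (1 + 0.5*0)/(1 + 0.5) = 2/3.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10.0))

# Exponentially weighted moving average with span=3 (alpha = 0.5)
ewma = s.ewm(span=3).mean()
print(ewma)
```

On a rising series the weighted average trails the latest observation, which is the lag visible in the lower panel of the comparison plot.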

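14. Binary moving window functions

Some statistical operators, such as correlation and covariance, operate on two time series. If you are interested in a stock's correlation with a benchmark index such as the S&P 500, you can get it by computing percent changes and using rolling_corr: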

spx_px=close_px_all['SPX']
spx_rets=spx_px/spx_px.shift(1)-1
returns=close_px.pct_change()
corr=pd.rolling_corr(returns.AAPL,spx_rets,125,min_periods=100)
corr.plot()

To compute the correlation of the S&P 500 with many stocks at once, pass a TimeSeries and a DataFrame; rolling_corr then computes the correlation of the TimeSeries (spx_rets in this case) with each column of the DataFrame

corr=pd.rolling_corr(returns,spx_rets,125,min_periods=100)
corr.plot()
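
In later pandas releases the equivalent of rolling_corr is the corr method on a rolling object. The sketch below uses made-up return series whose correlations are known in advance (one is a multiple of the benchmark, the other its negative); none of these names or numbers come from the stock data.

```python
import numpy as np
import pandas as pd

state = np.random.RandomState(0)
benchmark = pd.Series(state.randn(60))        # hypothetical index returns
returns = pd.DataFrame({'up': 2 * benchmark,  # moves with the benchmark
                        'down': -benchmark})  # moves against it

# One series against the benchmark
one = returns['up'].rolling(20, min_periods=10).corr(benchmark)

# Every column of the DataFrame against the benchmark
each = returns.rolling(20, min_periods=10).corr(benchmark)

print(one.tail(3))
print(each.tail(3))
```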

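15. User-defined moving window functions

rolling_apply lets you apply an array function of your own over a moving window; the only requirement is that the function reduce each piece of the array to a single value. The example below computes the percentile rank of a 2% AAPL return over a one-year window:

from scipy.stats import percentileofscore
score_at_2percent=lambda x:percentileofscore(x,0.02)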
result=pd.rolling_apply(returns.AAPL,250,score_at_2percent)
result.plot()
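
In later pandas releases the same idea is written as rolling(...).apply(...). The sketch below avoids the SciPy dependency with a NumPy stand-in, pct_at_or_below, which is a hypothetical helper that returns the share of window values at or below the score; it agrees with a percentile rank only as long as no value equals the score exactly. The four returns are made up.

```python
import numpy as np
import pandas as pd

def pct_at_or_below(window, score=0.02):
    """Hypothetical helper: percentage of values in the window <= score."""
    return 100.0 * np.mean(window <= score)

rets = pd.Series([0.01, 0.01, 0.03, 0.05])

# raw=True hands each window to the function as a NumPy array
result = rets.rolling(2).apply(pct_at_or_below, raw=True)
print(result)
```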

16. Performance and memory usage notes

rng=pd.date_range('1/1/2000',periods=10000000,freq='10ms')
ts=Series(np.random.randn(len(rng)),index=rng)
print(ts)
print(ts.resample('15min',how='ohlc'))
