http://blog.csdn.net/pipisorry/article/details/52209377
Other packages for time series processing
[P4J 0.6: Periodic light curve analysis tools based on Information Theory]
[p4j github]
Reading time series data files with pandas
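import pandas as pd
from datetime import datetime
# Parse 'YYYY-MM' strings; date_parser is deprecated from pandas 2.0, where date_format='%Y-%m' replaces it
dateparse = lambda dates: datetime.strptime(dates, '%Y-%m')
data = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month', date_parser=dateparse)
print(data.head())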
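Time series parameters of read_csv
parse_dates: specifies the column that contains the date-time information. As noted above, the column is named "Month".
index_col: the key idea behind using pandas for time series data is that the index becomes the variable that describes the time information. This argument therefore tells pandas to use the "Month" column as the index.
date_parser: specifies a function that converts the input strings into datetime values. By default pandas reads data in the format 'YYYY-MM-DD HH:MM:SS'. If the data is not in that format, the format has to be defined manually; a function like the dateparse defined above serves this purpose. The default uses dateutil.parser.parser to do the conversion.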
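As a rough sketch of the same idea without a custom parser function, the date column can also be converted after loading. The CSV text, the `Passengers` column name, and the three rows below are an illustrative stand-in for AirPassengers.csv, not the real file.

```python
import io
import pandas as pd

# Toy stand-in for AirPassengers.csv (column names and rows are illustrative).
csv_text = "Month,Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"

# Read the raw text first, then convert the 'Month' strings explicitly.
data = pd.read_csv(io.StringIO(csv_text))
data["Month"] = pd.to_datetime(data["Month"], format="%Y-%m")

# Use the parsed column as the index, which is what index_col='Month' achieves.
data = data.set_index("Month")
print(data.head())
```

Converting with an explicit format string avoids relying on format guessing, at the cost of one extra line.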
[pandas.read_csv]
[Python modules: time handling modules]
Time series analysis and processing (Time Series)
pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.
Generating and representing time series data
c = pandas.Timestamp('2012-01-01 00:00:08')
In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [105]: ts.resample('5Min').sum()
Out[105]:
2012-01-01    25083
Freq: 5T, dtype: int32
Time zone representation
In [106]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)
In [108]: ts
Out[108]:
2012-03-06    0.464000
2012-03-07    0.227371
2012-03-08   -0.496922
2012-03-09    0.306389
2012-03-10   -2.290613
Freq: D, dtype: float64
In [109]: ts_utc = ts.tz_localize('UTC')
In [110]: ts_utc
Out[110]:
2012-03-06 00:00:00+00:00    0.464000
2012-03-07 00:00:00+00:00    0.227371
2012-03-08 00:00:00+00:00   -0.496922
2012-03-09 00:00:00+00:00    0.306389
2012-03-10 00:00:00+00:00   -2.290613
Freq: D, dtype: float64
Time series conversion
Convert to another time zone
In [111]: ts_utc.tz_convert('US/Eastern')
Out[111]:
2012-03-05 19:00:00-05:00    0.464000
2012-03-06 19:00:00-05:00    0.227371
2012-03-07 19:00:00-05:00   -0.496922
2012-03-08 19:00:00-05:00    0.306389
2012-03-09 19:00:00-05:00   -2.290613
Freq: D, dtype: float64
Converting between time span representations
In [112]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [114]: ts
Out[114]:
2012-01-31   -1.134623
2012-02-29   -1.561819
2012-03-31   -0.260838
2012-04-30    0.281957
2012-05-31    1.523962
Freq: M, dtype: float64
In [115]: ps = ts.to_period()
In [116]: ps
Out[116]:
2012-01   -1.134623
2012-02   -1.561819
2012-03   -0.260838
2012-04    0.281957
2012-05    1.523962
Freq: M, dtype: float64
In [117]: ps.to_timestamp()
Out[117]:
2012-01-01   -1.134623
2012-02-01   -1.561819
2012-03-01   -0.260838
2012-04-01    0.281957
2012-05-01    1.523962
Freq: MS, dtype: float64
Converting between period and timestamp enables some convenient arithmetic functions to be used. In the following example, we convert a quarterly frequency with year ending in November to 9am of the end of the month following the quarter end:
In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)
In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
In [121]: ts.head()
Out[121]:
1990-03-01 09:00   -0.902937
1990-06-01 09:00    0.068159
1990-09-01 09:00   -0.057873
1990-12-01 09:00   -0.368204
1991-03-01 09:00   -1.144073
Freq: H, dtype: float64
[pandas-docs/stable/timeseries]
[pandas cookbook Timeseries]
皮皮blog
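pandas time series types
pandas Timestamp
The most basic date-time object in pandas is Timestamp, a subclass of Python's datetime.datetime. It is highly compatible with datetime objects, and values can be converted with the pd.to_datetime() function (usually from datetime to Timestamp).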
>>> from datetime import datetime
>>> now = datetime.now()
>>> pd.to_datetime(now)
Timestamp('2014-06-17 15:56:19.313193', tz=None)
>>> pd.to_datetime(np.nan)
NaT
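A small self-contained sketch of the conversion described here. The fixed `now` value is chosen only so that the result is reproducible; the variable names are illustrative.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# A fixed datetime instead of datetime.now(), so the result is reproducible.
now = datetime(2014, 6, 17, 15, 56, 19)

# Converting a datetime yields a pandas Timestamp.
stamp = pd.to_datetime(now)

# A Timestamp can be used wherever a datetime is expected.
is_datetime = isinstance(stamp, datetime)

# Missing values convert to NaT ("not a time").
missing = pd.to_datetime(np.nan)
print(stamp, is_datetime, missing)
```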
pandas time series
The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp).
>>> import numpy as np
>>> from pandas import Series
>>> dates = [datetime(2011,1,1),datetime(2011,1,2),datetime(2011,1,3)]
>>> ts = Series(np.random.randn(3),index=dates)
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
dtype: float64
>>> type(ts)
>>> ts.index
[2011-01-01, ..., 2011-01-03]
Length: 3, Freq: None, Timezone: None
>>> ts.index[0]
Timestamp('2011-01-01 00:00:00', tz=None)
Arithmetic between time series is automatically aligned on the timestamps.
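To illustrate the alignment, here is a minimal sketch with two made-up series whose date ranges only partly overlap. Dates present in just one operand come out as NaN.

```python
import pandas as pd

# Two series covering overlapping but different date ranges (values are made up).
a = pd.Series([1.0, 2.0, 3.0], index=pd.date_range("2011-01-01", periods=3, freq="D"))
b = pd.Series([10.0, 20.0, 30.0], index=pd.date_range("2011-01-02", periods=3, freq="D"))

# Addition matches values by timestamp, not by position.
total = a + b
print(total)
```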
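Indexing, selection, and subsetting
A time series is just a Series with a special index, so the usual indexing operations still work. What is special is that indexing is optimized for time series, for example by accepting date strings in various formats: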
>>> ts['20110101']
0.36228897878097266
>>> ts['2011-01-01']
0.36228897878097266
>>> ts['01/01/2011']
0.36228897878097266
For longer series, you can also pass just a year or a year and month to select a slice:
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64
>>> ts['2012']
2012-12-25    0.111869
dtype: float64
>>> ts['2011-1-2':'2012-12']
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64
Besides string slicing, there is also an instance method: ts.truncate(after='2011-01-03').
Note that the string timestamps used for slicing do not have to exist in the index; ts.truncate(before='3055') is also valid.
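The selection styles above can be sketched together as follows. The values are made up, and `.loc` is used for the string lookups as an assumption about current pandas style; the cutoff date passed to `truncate` is likewise just an example of a date that is not in the index.

```python
import pandas as pd

# Four observations, three in 2011 and one in 2012 (values are made up).
index = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2012-12-25"])
ts = pd.Series([0.1, 0.2, 0.3, 0.4], index=index)

# A bare year selects every observation in that year.
year_2012 = ts.loc["2012"]

# Slice endpoints may be partial dates and need not exist in the index.
window = ts.loc["2011-01-02":"2012-12"]

# truncate() drops everything after (or before) the given date.
head = ts.truncate(after="2011-01-03")
empty = ts.truncate(before="2013-01-01")
print(len(year_2012), len(window), len(head), len(empty))
```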
Time/Date Components
There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DatetimeIndex.
Property | Description |
---|---|
year | The year of the datetime |
month | The month of the datetime |
day | The days of the datetime |
hour | The hour of the datetime |
minute | The minutes of the datetime |
second | The seconds of the datetime |
microsecond | The microseconds of the datetime |
nanosecond | The nanoseconds of the datetime |
date | Returns datetime.date |
time | Returns datetime.time |
dayofyear | The ordinal day of year |
weekofyear | The week ordinal of the year |
week | The week ordinal of the year |
dayofweek | The number of the day of the week with Monday=0, Sunday=6 |
weekday | The number of the day of the week with Monday=0, Sunday=6 |
weekday_name | The name of the day in a week (ex: Friday) |
quarter | Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc. |
days_in_month | The number of days in the month of the datetime |
is_month_start | Logical indicating if first day of month (defined by frequency) |
is_month_end | Logical indicating if last day of month (defined by frequency) |
is_quarter_start | Logical indicating if first day of quarter (defined by frequency) |
is_quarter_end | Logical indicating if last day of quarter (defined by frequency) |
is_year_start | Logical indicating if first day of year (defined by frequency) |
is_year_end | Logical indicating if last day of year (defined by frequency) |
Furthermore, if you have a Series with datetimelike values, then you can access these properties via the .dt accessor, see the docs.
[Time/Date Components]
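A short sketch of reading these properties from a single Timestamp, from a DatetimeIndex, and from a Series through the `.dt` accessor. The dates are arbitrary examples.

```python
import pandas as pd

# Components of a single Timestamp (2012-03-06 was a Tuesday, so dayofweek is 1).
stamp = pd.Timestamp("2012-03-06 09:30:00")
parts = (stamp.year, stamp.month, stamp.day, stamp.hour, stamp.dayofweek, stamp.quarter)

# The same properties are vectorized on a DatetimeIndex.
idx = pd.date_range("2012-01-30", periods=4, freq="D")
month_starts = idx.is_month_start

# For a Series holding datetime values, go through the .dt accessor.
s = pd.Series(idx)
months = s.dt.month
print(parts, list(month_starts), months.tolist())
```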
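Date ranges, frequencies, and shifting
Time series in pandas are treated as irregular by default, that is, as having no fixed frequency. For analysis, however, a series can be converted to a fixed-frequency format by interpolation. A quick way to do this is the .resample(rule) method: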
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.resample('D').asfreq()
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-04         NaN
2011-01-05         NaN
2011-01-06    0.222958
Freq: D, dtype: float64
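A minimal sketch of regularizing an irregular series, assuming a recent pandas where `resample` returns an intermediate object that needs a follow-up method. `asfreq()` only inserts the missing dates as NaN; the optional `interpolate()` step then fills them linearly. The values are made up so the filled numbers are easy to check.

```python
import pandas as pd

# Irregular series: 2011-01-04 and 2011-01-05 are missing (values are made up).
index = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-06"])
ts = pd.Series([1.0, 2.0, 3.0, 6.0], index=index)

# Put the series on a daily grid; the new dates hold NaN.
regular = ts.resample("D").asfreq()

# Optionally fill the gaps by linear interpolation.
filled = regular.interpolate()
print(filled)
```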
Generating date ranges
pd.date_range() generates a DatetimeIndex of a given length. The arguments can be a start and an end date, or a single date plus a number of periods. The endpoints are inclusive.
>>> pd.date_range('20100101','20100110')
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(start='20100101',periods=10)
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(end='20100110',periods=10)
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
By default, date_range produces daily time points. This can be changed with the freq parameter; for example, "BM" stands for business end of month.
>>> pd.date_range('20100101','20100601',freq='BM')
[2010-01-29, ..., 2010-05-31]
Length: 5, Freq: BM, Timezone: None
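The calls above can be checked with a small sketch. For the business month end frequency it passes the offset object `pd.offsets.BMonthEnd()` rather than a string alias; that is a choice made for this example, since alias spellings have changed between pandas releases.

```python
import pandas as pd

# Three equivalent ways to describe the same ten daily dates.
by_bounds = pd.date_range("2010-01-01", "2010-01-10")
from_start = pd.date_range(start="2010-01-01", periods=10)
to_end = pd.date_range(end="2010-01-10", periods=10)

# Last business day of each month, given as an offset object.
business_month_ends = pd.date_range(
    "2010-01-01", "2010-06-01", freq=pd.offsets.BMonthEnd()
)
print(business_month_ends)
```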
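Frequencies and date offsets
A frequency in pandas consists of a base frequency and a multiplier. The base frequency is usually written as a string alias, such as "BM" in the example above. Each base frequency has a corresponding object called a date offset. A frequency can be created by instantiating a date offset: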
>>> from pandas.tseries.offsets import Hour, Minute
>>> Hour()
>>> Hour(2)
<2 * Hours>
>>> Hour(1) + Minute(30)
<90 * Minutes>
In general there is no need to go to this trouble; the string aliases mentioned earlier are enough to create a frequency:
>>> pd.date_range('00:00','12:00',freq='1h20min')
[2014-06-17 00:00:00, ..., 2014-06-17 12:00:00]
Length: 10, Freq: 80T, Timezone: None
The available aliases can be looked up with help() or in the documentation, so they are not listed here.
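A brief sketch tying the two styles together: offset objects combined by addition, and the equivalent string alias passed to `date_range`. The explicit date in the range is an assumption added so the example does not depend on the day it is run.

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Offsets of fixed length can be added; the result is a 90-minute offset.
ninety_minutes = Hour(1) + Minute(30)

# The same kind of frequency written as a string alias with multipliers.
rng = pd.date_range("2014-06-17 00:00", "2014-06-17 12:00", freq="1h20min")
step = rng[1] - rng[0]
print(ninety_minutes, len(rng), step)
```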
Shifting (leading and lagging) data
Shifting means moving data forward or backward along the time axis. Both Series and DataFrame have a .shift() method that performs a plain shift and leaves the index unchanged:
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.shift(2)
2011-01-01         NaN
2011-01-02         NaN
2011-01-03    0.362289
2011-01-06    0.586695
dtype: float64
>>> ts.shift(-2)
2011-01-01   -0.154522
2011-01-02    0.222958
2011-01-03         NaN
2011-01-06         NaN
dtype: float64
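The shift in the example above produces NA values. The other way to shift is to move the index and leave the data unchanged. This requires an additional freq argument that specifies the frequency to shift by: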
>>> ts.shift(2,freq='D')
2011-01-03    0.362289
2011-01-04    0.586695
2011-01-05   -0.154522
2011-01-08    0.222958
dtype: float64
>>> ts.shift(2,freq='3D')
2011-01-07    0.362289
2011-01-08    0.586695
2011-01-09   -0.154522
2011-01-12    0.222958
dtype: float64
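The difference between the two kinds of shift can be sketched with a small made-up series: without `freq` the values slide along a fixed index, with `freq` the index itself moves and no values are lost.

```python
import pandas as pd

# Made-up values on an irregular daily index.
index = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-06"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Shift the data by two positions; the index stays, NaN fills the gap.
lagged = ts.shift(2)

# Shift the index by two days; the data stays intact.
moved = ts.shift(2, freq="D")
print(lagged)
print(moved)
```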
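Periods and period arithmetic
The period concept used in this section differs from the timestamp used earlier: it refers to a span of time. In practice there is little difference. The pd.Period constructor still takes a timestamp plus a freq argument; freq gives the length of the period, and the timestamp gives the position of the period on the calendar timeline.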
>>> p = pd.Period(2010,freq='M')
>>> p
Period('2010-01', 'M')
>>> p + 2
Period('2010-03', 'M')
In the example above I passed the Period constructor a timestamp at year resolution and a "Month" freq, and pandas automatically interpreted 2010 as 2010-01.
The period_range function creates a regular range of periods:
>>> pd.period_range('2010-01','2010-05',freq='M')
freq: M
[2010-01, ..., 2010-05]
length: 5
The PeriodIndex class stores a group of periods, and it can be used as an axis index in any pandas data structure:
>>> Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
2010-01    0.755961
2010-02   -1.074492
2010-03   -0.379719
2010-04    0.153662
2010-05   -0.291157
Freq: M, dtype: float64
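A compact sketch of the period operations in this section, using a string rather than an integer to build the Period and made-up integer values for the series.

```python
import pandas as pd

# A monthly period; adding an integer moves it by that many months.
p = pd.Period("2010-01", freq="M")
later = p + 2

# A regular range of monthly periods used as a Series index.
months = pd.period_range("2010-01", "2010-05", freq="M")
s = pd.Series(range(5), index=months)

# Values are looked up by period.
march = s.loc[pd.Period("2010-03", freq="M")]
print(later, march, p.start_time)
```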
Frequency conversion of periods
Period and PeriodIndex objects can both be converted to another frequency with their .asfreq(freq, method=None, how=None) method.
>>> p = pd.Period('2007',freq='A-DEC')
>>> p.asfreq('M',how='start')
Period('2007-01', 'M')
>>> p.asfreq('M',how='end')
Period('2007-12', 'M')
>>> ts = Series(np.random.randn(1),index=[p])
>>> ts
2007   -0.112347
Freq: A-DEC, dtype: float64
>>> ts.asfreq('M',how='start')
2007-01   -0.112347
Freq: M, dtype: float64
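The same conversion can be sketched with quarterly periods. Quarterly is swapped in here for the annual frequency purely as an illustration; `how` picks which end of the longer period the shorter one is taken from.

```python
import pandas as pd

# First quarter of 2007, with the fiscal year ending in December.
q = pd.Period("2007Q1", freq="Q-DEC")

# Convert to monthly: take the first or the last month of the quarter.
first_month = q.asfreq("M", how="start")
last_month = q.asfreq("M", how="end")

# A whole PeriodIndex converts the same way.
quarters = pd.period_range("2007Q1", "2007Q4", freq="Q-DEC")
monthly_index = quarters.asfreq("M", how="start")
print(first_month, last_month, list(monthly_index))
```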
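Converting between timestamps and periods
Series and DataFrame objects indexed by timestamps or by periods have a pair of methods, .to_period() and .to_timestamp(how='start'), for converting the index type in either direction. Because converting from a period to a timestamp means choosing one end of the period, an extra how argument is needed; it defaults to 'start':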
>>> ts = Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp()
2010-01-01   -0.312160
2010-02-01    0.962652
2010-03-01   -0.959478
2010-04-01    1.240236
2010-05-01   -0.916218
Freq: MS, dtype: float64
>>> ts.to_timestamp(how='end')
2010-01-31   -0.312160
2010-02-28    0.962652
2010-03-31   -0.959478
2010-04-30    1.240236
2010-05-31   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp().to_period()
2010-01-01 00:00:00.000   -0.312160
2010-02-01 00:00:00.000    0.962652
2010-03-01 00:00:00.000   -0.959478
2010-04-01 00:00:00.000    1.240236
2010-05-01 00:00:00.000   -0.916218
Freq: L, dtype: float64
>>> ts.to_timestamp().to_period('M')
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
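A small round-trip sketch with made-up values. Depending on the pandas version, `how='end'` may return the last instant of each period rather than midnight of its last day, so only the calendar date is relied on here.

```python
import pandas as pd

# Monthly periods with made-up integer values.
periods = pd.period_range("2010-01", "2010-05", freq="M")
ts = pd.Series(range(5), index=periods)

# Period -> timestamp at the start of each period (the default).
starts = ts.to_timestamp()

# Period -> timestamp at the end of each period.
ends = ts.to_timestamp(how="end")
end_dates = ends.index.normalize()

# Timestamp -> period; naming the target frequency recovers the original index.
round_trip = starts.to_period("M")
print(starts.index[0], end_dates[1], round_trip.index.equals(ts.index))
```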
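Resampling and frequency conversion
Resampling means converting a time series from one frequency to another. pandas objects all have a .resample(rule, ...) method for this. In older pandas versions the aggregation and fill behavior were passed as the how= and fill_method= arguments; current versions instead chain a method on the returned Resampler object, for example .resample('D').ffill() or .resample('5Min').sum().
Earlier in this post resample was used to regularize a time series. That was an interpolation operation, because the frequency of the original index was the same as the given freq argument. resample is used more often when the frequency changes, in which case the operation is either upsampling or downsampling. The differences are all expressed in the arguments.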
>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.resample('D').ffill()  # upsampling
2010-01-01   -0.31216
2010-01-02   -0.31216
2010-01-03   -0.31216
2010-01-04   -0.31216
2010-01-05   -0.31216
2010-01-06   -0.31216
2010-01-07   -0.31216
2010-01-08   -0.31216
2010-01-09   -0.31216
2010-01-10   -0.31216
2010-01-11   -0.31216
2010-01-12   -0.31216
2010-01-13   -0.31216
2010-01-14   -0.31216
2010-01-15   -0.31216
...
2010-05-17   -0.916218
2010-05-18   -0.916218
2010-05-19   -0.916218
2010-05-20   -0.916218
2010-05-21   -0.916218
2010-05-22   -0.916218
2010-05-23   -0.916218
2010-05-24   -0.916218
2010-05-25   -0.916218
2010-05-26   -0.916218
2010-05-27   -0.916218
2010-05-28   -0.916218
2010-05-29   -0.916218
2010-05-30   -0.916218
2010-05-31   -0.916218
Freq: D, Length: 151
>>> ts.resample('A-JAN').sum()  # downsampling
2010   -0.312160
2011    0.327191
Freq: A-JAN, dtype: float64
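A sketch of both directions on timestamp-indexed data, assuming the method-chaining form of `resample` found in current pandas. The series are made up, and month-start (`'MS'`) bins are used here instead of the period index from the session above.

```python
import pandas as pd

# Downsampling: one observation per day, aggregated into month-start bins.
daily = pd.Series(1.0, index=pd.date_range("2010-01-01", "2010-03-31", freq="D"))
monthly_sum = daily.resample("MS").sum()

# Upsampling: one observation per month, forward-filled onto a daily grid.
monthly = pd.Series(
    [10.0, 20.0, 30.0], index=pd.date_range("2010-01-01", periods=3, freq="MS")
)
daily_filled = monthly.resample("D").ffill()
print(monthly_sum)
print(daily_filled.tail())
```

Downsampling needs an aggregation (here `sum`), while upsampling needs a fill rule (here `ffill`) because the new grid has more points than the data.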
[pandas time series operations]
ref: [A complete guide to time series forecasting (with Python code)]
[Complete guide to create a Time Series Forecast (with Codes in Python)]