pandas notes: time series analysis and processing with pandas (Timeseries)

http://blog.csdn.net/pipisorry/article/details/52209377

Other packages related to time series processing

[P4J 0.6: Periodic light curve analysis tools based on Information Theory]

[p4j github]

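Reading time series data files with pandas

import pandas as pd
from datetime import datetime
# date_parser matches the discussion below; pandas 2.0+ prefers date_format='%Y-%m'
dateparse = lambda dates: datetime.strptime(dates, '%Y-%m')
data = pd.read_csv('AirPassengers.csv', parse_dates=['Month'], index_col='Month', date_parser=dateparse)
print(data.head())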

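Time series parameters of read_csv

parse_dates: specifies the column(s) that contain date/time information. As mentioned above, the column here is named "Month".
index_col: the key idea behind using pandas for time series data is that the index becomes the variable describing the time information. This parameter therefore tells pandas to use the "Month" column as the index.
date_parser: specifies a function that converts the input strings into datetime values. By default pandas expects data in the format 'YYYY-MM-DD HH:MM:SS'; if the data to be read is not in the default format, the format has to be defined manually, which is what the dateparse function above does. The default uses dateutil.parser.parser to do the conversion. In pandas 2.0 and later, date_parser is deprecated in favor of date_format.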
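
The parameters above can be tried without the original file. The following is a minimal sketch, assuming a tiny inline sample in place of AirPassengers.csv (the column name Passengers and the three rows are placeholders for illustration). Instead of date_parser it converts the index with pd.to_datetime and an explicit format, which should behave the same across pandas versions.

```python
import io

import pandas as pd

# Tiny inline sample standing in for AirPassengers.csv (column names are assumed).
csv_text = "Month,Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"

# Use the Month column as the index, then convert it with an explicit format.
data = pd.read_csv(io.StringIO(csv_text), index_col="Month")
data.index = pd.to_datetime(data.index, format="%Y-%m")

print(data.head())
print(type(data.index))
```

After the conversion the index is a DatetimeIndex, so the time-based indexing and resampling described later apply to it.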

[pandas.read_csv]

[Python modules: time handling modules]


Time series analysis and processing

pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (e.g., converting secondly data into 5-minutely data). This is extremely common in, but not limited to, financial applications.

Generating and representing time series data

c = pd.Timestamp('2012-01-01 00:00:08')

In [103]: rng = pd.date_range('1/1/2012', periods=100, freq='S')
In [104]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [105]: ts.resample('5Min').sum()
Out[105]:
2012-01-01    25083
Freq: 5T, dtype: int32
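
Because the session above uses random data, its sum cannot be reproduced. The sketch below is a deterministic variant under the same idea, using the values 0..99 so the result can be checked by hand; the lowercase aliases "s" and "5min" are an assumption chosen to avoid deprecation warnings in recent pandas.

```python
import numpy as np
import pandas as pd

# 100 one-second observations holding the values 0..99
rng = pd.date_range("2012-01-01", periods=100, freq="s")
ts = pd.Series(np.arange(100), index=rng)

# All 100 seconds fall into the first 5-minute bin, so one row comes back.
five_min = ts.resample("5min").sum()
print(five_min)
```

The single bin is labelled with its left edge, 2012-01-01 00:00:00, and holds 0 + 1 + ... + 99.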

Time zone representation

In [106]: rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
In [107]: ts = pd.Series(np.random.randn(len(rng)), rng)
In [108]: ts
Out[108]:
2012-03-06    0.464000
2012-03-07    0.227371
2012-03-08   -0.496922
2012-03-09    0.306389
2012-03-10   -2.290613
Freq: D, dtype: float64
In [109]: ts_utc = ts.tz_localize('UTC')
In [110]: ts_utc
Out[110]:
2012-03-06 00:00:00+00:00    0.464000
2012-03-07 00:00:00+00:00    0.227371
2012-03-08 00:00:00+00:00   -0.496922
2012-03-09 00:00:00+00:00    0.306389
2012-03-10 00:00:00+00:00   -2.290613
Freq: D, dtype: float64

Time series conversion

Convert to another time zone

In [111]: ts_utc.tz_convert('US/Eastern')
Out[111]:
2012-03-05 19:00:00-05:00    0.464000
2012-03-06 19:00:00-05:00    0.227371
2012-03-07 19:00:00-05:00   -0.496922
2012-03-08 19:00:00-05:00    0.306389
2012-03-09 19:00:00-05:00   -2.290613
Freq: D, dtype: float64
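
The localize-then-convert pattern can be sketched with fixed values so the effect on the index is easy to see. This is an illustrative example, not part of the original session; it assumes the zone name "America/New_York", which refers to the same zone as 'US/Eastern', and that time zone data is installed.

```python
import pandas as pd

rng = pd.date_range("2012-03-06", periods=5, freq="D")
ts = pd.Series(range(5), index=rng)   # naive timestamps, no time zone

ts_utc = ts.tz_localize("UTC")                  # attach a zone, wall time unchanged
ts_ny = ts_utc.tz_convert("America/New_York")   # same instants, different wall time

print(ts_utc.index[0])
print(ts_ny.index[0])
```

tz_localize only labels the existing times, while tz_convert changes the displayed wall-clock time and leaves both the instants and the values untouched.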

Converting between time span representations

In [112]: rng = pd.date_range('1/1/2012', periods=5, freq='M')
In [113]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [114]: ts
Out[114]:
2012-01-31   -1.134623
2012-02-29   -1.561819
2012-03-31   -0.260838
2012-04-30    0.281957
2012-05-31    1.523962
Freq: M, dtype: float64
In [115]: ps = ts.to_period()
In [116]: ps
Out[116]:
2012-01   -1.134623
2012-02   -1.561819
2012-03   -0.260838
2012-04    0.281957
2012-05    1.523962
Freq: M, dtype: float64
In [117]: ps.to_timestamp()
Out[117]:
2012-01-01   -1.134623
2012-02-01   -1.561819
2012-03-01   -0.260838
2012-04-01    0.281957
2012-05-01    1.523962
Freq: MS, dtype: float64

Converting between period and timestamp enables some convenient arithmetic functions to be used. In the following example, we convert a quarterly frequency with year ending in November to 9am of the end of the month following the quarter end:

In [118]: prng = pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
In [119]: ts = pd.Series(np.random.randn(len(prng)), prng)
In [120]: ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
In [121]: ts.head()
Out[121]:
1990-03-01 09:00   -0.902937
1990-06-01 09:00    0.068159
1990-09-01 09:00   -0.057873
1990-12-01 09:00   -0.368204
1991-03-01 09:00   -1.144073
Freq: H, dtype: float64

[pandas-docs/stable/timeseries]

[pandas cookbook Timeseries]

皮皮blog



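pandas time series types

The pandas Timestamp

The most basic date/time object in pandas is Timestamp, a subclass of Python's datetime.datetime. It is highly compatible with datetime objects, and values can be converted with pd.to_datetime() (usually from datetime to Timestamp).

>>> from datetime import datetime
>>> import numpy as np
>>> import pandas as pd
>>> now = datetime.now()
>>> pd.to_datetime(now)
Timestamp('2014-06-17 15:56:19.313193', tz=None)
>>> pd.to_datetime(np.nan)
NaT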


Time series in pandas

The most basic time series type in pandas is a Series whose index elements are timestamps (Timestamp).

>>> from pandas import Series
>>> dates = [datetime(2011,1,1),datetime(2011,1,2),datetime(2011,1,3)]
>>> ts = Series(np.random.randn(3),index=dates)
>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
dtype: float64
>>> type(ts)
>>> ts.index
[2011-01-01, ..., 2011-01-03]
Length: 3, Freq: None, Timezone: None
>>> ts.index[0]
Timestamp('2011-01-01 00:00:00', tz=None)

Arithmetic between time series is automatically aligned on the timestamps.
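
A small sketch of this alignment, using two made-up series whose date ranges only partly overlap (the values are chosen for illustration):

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2011-01-01", periods=3, freq="D"))
b = pd.Series([10.0, 20.0, 30.0],
              index=pd.date_range("2011-01-02", periods=3, freq="D"))

# Values are matched by timestamp, not by position.
total = a + b
print(total)
```

The result covers the union of both indexes, and dates present in only one of the series come out as NaN.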

Indexing, selection, and subsetting

A time series is just a Series with a special kind of index, so the usual indexing operations still work. What is special is that index operations are optimized for time series, for example indexing with strings in various formats:

>>> ts['20110101']
0.36228897878097266
>>> ts['2011-01-01']
0.36228897878097266
>>> ts['01/01/2011']
0.36228897878097266

For longer series, you can also pass just a year or a year and month to select a slice:

>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64
>>> ts['2012']
2012-12-25    0.111869
dtype: float64
>>> ts['2011-1-2':'2012-12']
2011-01-02    0.586695
2011-01-03   -0.154522
2012-12-25    0.111869
dtype: float64

Besides string slicing, there is also an instance method: ts.truncate(after='2011-01-03').

Note that the string timestamps used for slicing do not have to exist in the index; for example, ts.truncate(before='3055') is also valid.
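
The selection styles above can be sketched on a small series with known values. This is an illustrative example with made-up data; it uses .loc for the partial-string lookups, an assumption made because explicit label-based access is the least ambiguous spelling across pandas versions.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2012-12-25"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

only_2012 = ts.loc["2012"]                  # every row that falls in 2012
span = ts.loc["2011-01-02":"2012-12"]       # endpoints need not be in the index
head = ts.truncate(after="2011-01-03")      # keep rows up to and including this date

print(only_2012)
print(span)
print(head)
```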

Time/Date Components

There are several time/date properties that one can access from Timestamp or a collection of timestamps like a DatetimeIndex.

Property           Description
year               The year of the datetime
month              The month of the datetime
day                The days of the datetime
hour               The hour of the datetime
minute             The minutes of the datetime
second             The seconds of the datetime
microsecond        The microseconds of the datetime
nanosecond         The nanoseconds of the datetime
date               Returns datetime.date
time               Returns datetime.time
dayofyear          The ordinal day of year
weekofyear         The week ordinal of the year
week               The week ordinal of the year
dayofweek          The number of the day of the week with Monday=0, Sunday=6
weekday            The number of the day of the week with Monday=0, Sunday=6
weekday_name       The name of the day in a week (ex: Friday)
quarter            Quarter of the date: Jan-Mar = 1, Apr-Jun = 2, etc.
days_in_month      The number of days in the month of the datetime
is_month_start     Logical indicating if first day of month (defined by frequency)
is_month_end       Logical indicating if last day of month (defined by frequency)
is_quarter_start   Logical indicating if first day of quarter (defined by frequency)
is_quarter_end     Logical indicating if last day of quarter (defined by frequency)
is_year_start      Logical indicating if first day of year (defined by frequency)
is_year_end        Logical indicating if last day of year (defined by frequency)

Furthermore, if you have a Series with datetimelike values, then you can access these properties via the .dt accessor, see the docs.
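
A short sketch of reading these components, both from a single Timestamp and through the .dt accessor of a Series. The dates are chosen for illustration. It uses day_name() for the weekday name, since the weekday_name property listed above was removed in later pandas releases.

```python
import pandas as pd

stamp = pd.Timestamp("2012-01-01 00:00:08")
print(stamp.year, stamp.month, stamp.day, stamp.second)
print(stamp.dayofweek, stamp.quarter, stamp.is_month_start)

# The same components are available element-wise on a Series via .dt
s = pd.Series(pd.to_datetime(["2012-01-01", "2012-02-15"]))
print(s.dt.month.tolist())
print(s.dt.day_name().tolist())
print(s.dt.days_in_month.tolist())
```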

[Time/Date Components]

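Date ranges, frequencies, and shifting

Time series in pandas are generally assumed to be irregular, that is, without a fixed frequency. For analysis, however, a series can be converted to a fixed frequency, with the missing time points filled in. A quick way to do this is the .resample(rule) method. In current pandas it returns a Resampler object, so a method such as .asfreq() is chained to obtain the result:

>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.resample('D').asfreq()
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-04         NaN
2011-01-05         NaN
2011-01-06    0.222958
Freq: D, dtype: float64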
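
The regularization step can be sketched with fixed values, followed by one possible way of filling the gaps. The forward fill is an assumption added for illustration and is not part of the original example.

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-06"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

daily = ts.resample("D").asfreq()   # fixed daily frequency, gaps become NaN
filled = daily.ffill()              # carry the last known value forward

print(daily)
print(filled)
```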

Generating date ranges

pd.date_range() generates a DatetimeIndex of a specified length. The arguments can be a start and an end date, or a single date plus a number of periods. The endpoints are inclusive.

>>> pd.date_range('20100101','20100110')
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(start='20100101',periods=10)
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None
>>> pd.date_range(end='20100110',periods=10)
[2010-01-01, ..., 2010-01-10]
Length: 10, Freq: D, Timezone: None

By default, date_range produces one time point per day. This can be changed with the freq parameter; for example, "BM" stands for business month end.

>>> pd.date_range('20100101','20100601',freq='BM')
[2010-01-29, ..., 2010-05-31]
Length: 5, Freq: BM, Timezone: None
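
The three calling styles can be compared directly, as in the sketch below. For the non-daily case it uses the month-start alias "MS" rather than "BM", an assumption made because the spelling of the month-end aliases has changed between pandas releases.

```python
import pandas as pd

by_ends = pd.date_range("2010-01-01", "2010-01-10")          # start and end
from_start = pd.date_range(start="2010-01-01", periods=10)   # start and length
to_end = pd.date_range(end="2010-01-10", periods=10)         # end and length

# A non-daily frequency: the first day of each month, endpoints included
month_starts = pd.date_range("2010-01-01", "2010-06-01", freq="MS")

print(by_ends)
print(month_starts)
```

All three daily calls describe the same ten dates.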


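Frequencies and date offsets

A frequency in pandas consists of a base frequency and a multiplier. The base frequency is usually written as a string alias, such as "BM" in the example above. Each base frequency has a corresponding object called a date offset. A frequency can be created by instantiating a date offset:

>>> from pandas.tseries.offsets import Hour, Minute
>>> Hour()
<Hour>
>>> Hour(2)
<2 * Hours>
>>> Hour(1) + Minute(30)
<90 * Minutes>

In general there is no need to go to this trouble; the string aliases mentioned earlier are enough to create a frequency:

>>> pd.date_range('00:00','12:00',freq='1h20min')
[2014-06-17 00:00:00, ..., 2014-06-17 12:00:00]
Length: 10, Freq: 80T, Timezone: None

The available aliases can be looked up via help() or the documentation, so they are not listed here.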
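
A minimal sketch relating offset objects to string aliases. It assumes an explicit date in the range endpoints, so the result does not depend on the day the code is run, and writes the 80-minute frequency as "80min".

```python
import pandas as pd
from pandas.tseries.offsets import Hour, Minute

# Fixed-length offsets can be added together.
ninety = Hour(1) + Minute(30)
print(ninety)

# 00:00 to 12:00 in 80-minute steps
rng = pd.date_range("2014-06-17 00:00", "2014-06-17 12:00", freq="80min")
print(rng)
```

Twelve hours is 720 minutes, so the range holds 720 / 80 + 1 = 10 points, matching the length shown in the session above.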

Shifting (leading and lagging) data

Shifting means moving data forward or backward along the time axis. Both Series and DataFrame have a .shift() method that performs a plain shift, leaving the index unchanged:

>>> ts
2011-01-01    0.362289
2011-01-02    0.586695
2011-01-03   -0.154522
2011-01-06    0.222958
dtype: float64
>>> ts.shift(2)
2011-01-01         NaN
2011-01-02         NaN
2011-01-03    0.362289
2011-01-06    0.586695
dtype: float64
>>> ts.shift(-2)
2011-01-01   -0.154522
2011-01-02    0.222958
2011-01-03         NaN
2011-01-06         NaN
dtype: float64

The shift above produces NA values. The other way to shift is to move the index and keep the data unchanged. This requires an additional freq argument that specifies the frequency to shift by:

>>> ts.shift(2,freq='D')
2011-01-03    0.362289
2011-01-04    0.586695
2011-01-05   -0.154522
2011-01-08    0.222958
dtype: float64
>>> ts.shift(2,freq='3D')
2011-01-07    0.362289
2011-01-08    0.586695
2011-01-09   -0.154522
2011-01-12    0.222958
dtype: float64
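
The two kinds of shift can be contrasted on a series with fixed values (the data is made up for illustration):

```python
import pandas as pd

idx = pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03", "2011-01-06"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

lagged = ts.shift(2)            # values move down two rows, index stays put
moved = ts.shift(2, freq="D")   # index moves two days, values stay put

print(lagged)
print(moved)
```

Shifting the values discards data at one end and introduces NaN at the other, while shifting the index keeps every value.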

Periods and period arithmetic

The period concept used in this section differs from the timestamp used earlier: it denotes a span of time. In use, though, there is little difference. The pd.Period constructor still takes a timestamp together with a freq argument. freq gives the length of the period, and the timestamp gives the period's position on the calendar timeline.

>>> p = pd.Period(2010,freq='M')
>>> p
Period('2010-01', 'M')
>>> p + 2
Period('2010-03', 'M')

In the example above, the constructor was given a timestamp with year resolution and a monthly freq, so pandas interpreted 2010 as 2010-01.

The period_range function creates a regular range of periods:

>>> pd.period_range('2010-01','2010-05',freq='M')
freq: M
[2010-01, ..., 2010-05]
length: 5

The PeriodIndex class stores a sequence of periods and can be used as an axis index in any pandas data structure:

>>> Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
2010-01    0.755961
2010-02   -1.074492
2010-03   -0.379719
2010-04    0.153662
2010-05   -0.291157
Freq: M, dtype: float64
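
A compact sketch of period construction and arithmetic with fixed values. It passes the period as a string ("2010-01") instead of the bare integer used above, a choice made here so that the intended month is explicit.

```python
import pandas as pd

p = pd.Period("2010-01", freq="M")
later = p + 2                     # integer arithmetic moves by whole periods
print(p, later)
print(p.start_time, p.end_time)   # the span of time the period covers

prng = pd.period_range("2010-01", "2010-05", freq="M")
ts = pd.Series(range(5), index=prng)
print(ts)
```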


Period frequency conversion

Period and PeriodIndex objects can both be converted to another frequency with their .asfreq(freq, method=None, how=None) method.

>>> p = pd.Period('2007',freq='A-DEC')
>>> p.asfreq('M',how='start')
Period('2007-01', 'M')
>>> p.asfreq('M',how='end')
Period('2007-12', 'M')
>>> ts = Series(np.random.randn(1),index=[p])
>>> ts
2007   -0.112347
Freq: A-DEC, dtype: float64
>>> ts.asfreq('M',how='start')
2007-01   -0.112347
Freq: M, dtype: float64
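
The same conversion can be sketched with a quarterly period instead of the annual one above. The swap is an assumption made only because the quarterly and monthly period aliases are spelled the same way across pandas versions.

```python
import pandas as pd

q = pd.Period("2007Q1", freq="Q-DEC")
first_month = q.asfreq("M", how="start")   # first month inside the quarter
last_month = q.asfreq("M", how="end")      # last month inside the quarter
print(first_month, last_month)

# The same conversion applied to a whole PeriodIndex
quarters = pd.period_range("2007Q1", periods=2, freq="Q-DEC")
month_ends = quarters.asfreq("M", how="end")
print(month_ends)
```

The how argument decides which sub-period represents the longer period after conversion.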


Converting between timestamps and periods

Series and DataFrame objects indexed by timestamps or by periods have a pair of methods, .to_period() and .to_timestamp(how='start'), for converting the index type in either direction. Converting from period to timestamp involves choosing which end of the period to use, so an extra how argument is needed; it defaults to 'start':

>>> ts = Series(np.random.randn(5),index=pd.period_range('201001','201005',freq='M'))
>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp()
2010-01-01   -0.312160
2010-02-01    0.962652
2010-03-01   -0.959478
2010-04-01    1.240236
2010-05-01   -0.916218
Freq: MS, dtype: float64
>>> ts.to_timestamp(how='end')
2010-01-31   -0.312160
2010-02-28    0.962652
2010-03-31   -0.959478
2010-04-30    1.240236
2010-05-31   -0.916218
Freq: M, dtype: float64
>>> ts.to_timestamp().to_period()
2010-01-01 00:00:00.000   -0.312160
2010-02-01 00:00:00.000    0.962652
2010-03-01 00:00:00.000   -0.959478
2010-04-01 00:00:00.000    1.240236
2010-05-01 00:00:00.000   -0.916218
Freq: L, dtype: float64
>>> ts.to_timestamp().to_period('M')
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
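
A round trip between the two index types, sketched with fixed values. The exact time of day returned for how='end' differs between pandas versions, so the sketch only inspects the calendar date of those timestamps.

```python
import pandas as pd

prng = pd.period_range("2010-01", "2010-05", freq="M")
ts = pd.Series(range(5), index=prng)

starts = ts.to_timestamp()            # first instant of each month
ends = ts.to_timestamp(how="end")     # last day of each month
back = starts.to_period("M")          # timestamps back to monthly periods

print(starts.index[0])
print(ends.index[1].normalize())
print(back.index)
```

Passing the frequency explicitly to to_period avoids depending on what pandas infers from the timestamps.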


Resampling and frequency conversion

Resampling is the process of converting a time series from one frequency to another. pandas objects have a .resample(freq, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0) method for this. This is the signature from older pandas versions; in current versions the how and fill_method arguments are gone, and the aggregation or fill method is chained after .resample() instead.

Earlier in this article, resample was used to regularize a time series. That was a gap-filling operation, because the frequency of the original index was the same as the freq argument. resample is more commonly used when the frequency changes, in which case the operation is either upsampling or downsampling. The differences between the two show up in the arguments.

>>> ts
2010-01   -0.312160
2010-02    0.962652
2010-03   -0.959478
2010-04    1.240236
2010-05   -0.916218
Freq: M, dtype: float64
>>> ts.resample('D').ffill()  # upsampling
2010-01-01   -0.31216
2010-01-02   -0.31216
2010-01-03   -0.31216
2010-01-04   -0.31216
2010-01-05   -0.31216
2010-01-06   -0.31216
2010-01-07   -0.31216
2010-01-08   -0.31216
2010-01-09   -0.31216
2010-01-10   -0.31216
2010-01-11   -0.31216
2010-01-12   -0.31216
2010-01-13   -0.31216
2010-01-14   -0.31216
2010-01-15   -0.31216
...
2010-05-17   -0.916218
2010-05-18   -0.916218
2010-05-19   -0.916218
2010-05-20   -0.916218
2010-05-21   -0.916218
2010-05-22   -0.916218
2010-05-23   -0.916218
2010-05-24   -0.916218
2010-05-25   -0.916218
2010-05-26   -0.916218
2010-05-27   -0.916218
2010-05-28   -0.916218
2010-05-29   -0.916218
2010-05-30   -0.916218
2010-05-31   -0.916218
Freq: D, Length: 151
>>> ts.resample('A-JAN').sum()  # downsampling
2010   -0.312160
2011    0.327191
Freq: A-JAN, dtype: float64
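
Upsampling and downsampling can be sketched on a timestamp-indexed series with fixed values, so that the results can be checked by hand. Using a DatetimeIndex with the month-start alias "MS" is an assumption made for this illustration; the session above uses a PeriodIndex.

```python
import pandas as pd

monthly = pd.Series([1.0, 2.0, 3.0],
                    index=pd.date_range("2010-01-01", periods=3, freq="MS"))

# Upsampling: monthly to daily, repeating each value until the next one
daily = monthly.resample("D").ffill()

# Downsampling: daily back to monthly, summing within each month
back = daily.resample("MS").sum()

print(len(daily))
print(back)
```

The daily series runs from 2010-01-01 to 2010-03-01, so the monthly sums are 31 × 1, 28 × 2 and 1 × 3.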

[pandas time series operations]

from: http://blog.csdn.net/pipisorry/article/details/52209377

ref: [Complete guide to time series forecasting (with Python code)]

[Complete guide to create a Time Series Forecast (with Codes in Python)]

