ARIMA data preprocessing - Linux大棚


1 Data preprocessing

Generating time series data

import datetime as dt

import pandas as pd

import numpy as np

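date_range:

You can specify the start time and the number of periods; freq sets the spacing between points.

H: hour

D: day

M: month

# Accepted date string formats: '2016 Jul 1', '7/1/2016', '1/7/2016' (day first), '2016-07-01', '2016/07/01'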

rng = pd.date_range('2016-07-01', periods = 10, freq = '3D')

rng

DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',

'2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',

'2016-07-25', '2016-07-28'],

dtype='datetime64[ns]', freq='3D')
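The date strings listed in the comment above are not all interchangeable, because a slash-separated string such as '1/7/2016' is ambiguous. The following is a minimal sketch (variable names are illustrative, not from the original) of how pandas reads each form, assuming the default month-first behaviour of pd.to_datetime:

```python
import pandas as pd

target = pd.Timestamp(2016, 7, 1)

# These spellings all parse to 1 July 2016 with default settings.
unambiguous = ['2016 Jul 1', '2016-07-01', '2016/07/01', '7/1/2016']
parsed = [pd.Timestamp(s) for s in unambiguous]

# '1/7/2016' is read month-first by default (7 January 2016);
# dayfirst=True is needed to get 1 July 2016.
month_first = pd.to_datetime('1/7/2016')
day_first = pd.to_datetime('1/7/2016', dayfirst=True)

# Start, number of periods, and frequency fully determine the index.
rng = pd.date_range('2016-07-01', periods=10, freq='3D')
print(rng[0], rng[-1])
```

Using ISO-style strings such as '2016-07-01' avoids the day/month ambiguity altogether.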

time=pd.Series(np.random.randn(20),

index=pd.date_range(dt.datetime(2016,1,1),periods=20))

print(time)

2016-01-01 -0.129379

2016-01-02 0.164480

2016-01-03 -0.639117

2016-01-04 -0.427224

2016-01-05 2.055133

2016-01-06 1.116075

2016-01-07 0.357426

2016-01-08 0.274249

2016-01-09 0.834405

2016-01-10 -0.005444

2016-01-11 -0.134409

2016-01-12 0.249318

2016-01-13 -0.297842

2016-01-14 -0.128514

2016-01-15 0.063690

2016-01-16 -2.246031

2016-01-17 0.359552

2016-01-18 0.383030

2016-01-19 0.402717

2016-01-20 -0.694068

Freq: D, dtype: float64


Filtering with truncate

time.truncate(before='2016-1-10')

2016-01-10 -0.005444

2016-01-11 -0.134409

2016-01-12 0.249318

2016-01-13 -0.297842

2016-01-14 -0.128514

2016-01-15 0.063690

2016-01-16 -2.246031

2016-01-17 0.359552

2016-01-18 0.383030

2016-01-19 0.402717

2016-01-20 -0.694068

Freq: D, dtype: float64

time.truncate(after='2016-1-10')

2016-01-01 -0.129379

2016-01-02 0.164480

2016-01-03 -0.639117

2016-01-04 -0.427224

2016-01-05 2.055133

2016-01-06 1.116075

2016-01-07 0.357426

2016-01-08 0.274249

2016-01-09 0.834405

2016-01-10 -0.005444

Freq: D, dtype: float64

print(time['2016-01-15':'2016-01-20'])

2016-01-15 0.063690

2016-01-16 -2.246031

2016-01-17 0.359552

2016-01-18 0.383030

2016-01-19 0.402717

2016-01-20 -0.694068

Freq: D, dtype: float64
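As the outputs above suggest, both truncate and label-based slicing keep the boundary date itself. A small sketch with deterministic values (the series s and its integer values are made up for illustration) makes this easy to check:

```python
import pandas as pd

idx = pd.date_range('2016-01-01', periods=20, freq='D')
s = pd.Series(range(20), index=idx)

head = s.truncate(after='2016-01-10')    # keeps 2016-01-01 through 2016-01-10
tail = s.truncate(before='2016-01-10')   # keeps 2016-01-10 through 2016-01-20

# Label slicing on a DatetimeIndex includes both endpoints as well.
same_tail = s['2016-01-10':]
window = s['2016-01-15':'2016-01-20']

print(len(head), len(tail), len(window))
```

Note that 2016-01-10 appears in both head and tail, unlike positional slicing where the end is excluded.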

data=pd.date_range('2010-01-01','2011-01-01',freq='M')

print(data)

DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',

'2010-05-31', '2010-06-30', '2010-07-31', '2010-08-31',

'2010-09-30', '2010-10-31', '2010-11-30', '2010-12-31'],

dtype='datetime64[ns]', freq='M')

# Specify the index

rng = pd.date_range('2016 Jul 1', periods = 10, freq = 'D')

rng

pd.Series(range(len(rng)), index = rng)

2016-07-01 0

2016-07-02 1

2016-07-03 2

2016-07-04 3

2016-07-05 4

2016-07-06 5

2016-07-07 6

2016-07-08 7

2016-07-09 8

2016-07-10 9

Freq: D, dtype: int32


Specifying an index of periods

periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]

ts = pd.Series(np.random.randn(len(periods)), index = periods)
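To see what a period-based index means in practice, here is a minimal sketch that replaces the random values with 0, 1, 2 (an assumption made only so the result is reproducible) and inspects the first label:

```python
import numpy as np
import pandas as pd

periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]
ts = pd.Series(np.arange(len(periods)), index=periods)

first = ts.index[0]
print(first)             # the label stands for a whole month, not a single day
print(first.start_time)  # first instant covered by the period
print(first.end_time)    # last instant covered by the period

# Values are looked up by period label.
february = ts.loc[pd.Period('2016-02')]
```

Each label is a span of time with a start and an end, which is the main difference from a timestamp index.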


Timestamps and periods can be converted into each other

ts = pd.Series(range(10), pd.date_range('07-10-16 8:00', periods = 10, freq = 'H'))

2016-07-10 08:00:00 0

2016-07-10 09:00:00 1

2016-07-10 10:00:00 2

2016-07-10 11:00:00 3

2016-07-10 12:00:00 4

2016-07-10 13:00:00 5

2016-07-10 14:00:00 6

2016-07-10 15:00:00 7

2016-07-10 16:00:00 8

2016-07-10 17:00:00 9

Freq: H, dtype: int32

ts_period = ts.to_period()

ts_period

2016-07-10 08:00 0

2016-07-10 09:00 1

2016-07-10 10:00 2

2016-07-10 11:00 3

2016-07-10 12:00 4

2016-07-10 13:00 5

2016-07-10 14:00 6

2016-07-10 15:00 7

2016-07-10 16:00 8

2016-07-10 17:00 9

Freq: H, dtype: int32

ts_period['2016-07-10 08:30':'2016-07-10 11:45']

2016-07-10 08:00 0

2016-07-10 09:00 1

2016-07-10 10:00 2

2016-07-10 11:00 3

Freq: H, dtype: int32

ts['2016-07-10 08:30':'2016-07-10 11:45']

2016-07-10 09:00:00 1

2016-07-10 10:00:00 2

2016-07-10 11:00:00 3

Freq: H, dtype: int32
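The two slices above differ because a timestamp is a point in time while a period is an interval: 08:00 lies before 08:30, but the hourly period starting at 08:00 contains 08:30. The sketch below reproduces this with explicit Period bounds; the use of pd.offsets.Hour() instead of the 'H' alias is a choice made here so the code does not depend on which frequency aliases a given pandas version accepts.

```python
import pandas as pd

hourly = pd.offsets.Hour()
ts = pd.Series(range(10),
               index=pd.date_range('2016-07-10 08:00', periods=10, freq=hourly))
ts_period = ts.to_period()

# Timestamp index: 08:00 is earlier than 08:30, so it is excluded.
stamp_slice = ts['2016-07-10 08:30':'2016-07-10 11:45']

# Period index: the hourly period containing 08:30 is the one starting at 08:00.
first = pd.Period('2016-07-10 08:30', freq=hourly)
last = pd.Period('2016-07-10 11:45', freq=hourly)
period_slice = ts_period.loc[first:last]

print(stamp_slice.tolist(), period_slice.tolist())
```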


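2 Resampling

Converting time series data from one frequency to another:

Downsampling: from a higher frequency to a lower one

Upsampling: from a lower frequency to a higher one

rng = pd.date_range('1/1/2011', periods=90, freq='D')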

ts = pd.Series(np.random.randn(len(rng)), index=rng)

ts.head()

2011-01-01 -1.025562

2011-01-02 0.410895

2011-01-03 0.660311

2011-01-04 0.710293

2011-01-05 0.444985

Freq: D, dtype: float64

ts.resample('M').sum()

2011-01-31 2.510102

2011-02-28 0.583209

2011-03-31 2.749411

Freq: M, dtype: float64

ts.resample('3D').sum()

2011-01-01 0.045643

2011-01-04 -2.255206

2011-01-07 0.571142

2011-01-10 0.835032

2011-01-13 -0.396766

2011-01-16 -1.156253

2011-01-19 -1.286884

2011-01-22 2.883952

2011-01-25 1.566908

2011-01-28 1.435563

2011-01-31 0.311565

2011-02-03 -2.541235

2011-02-06 0.317075

2011-02-09 1.598877

2011-02-12 -1.950509

2011-02-15 2.928312

2011-02-18 -0.733715

2011-02-21 1.674817

2011-02-24 -2.078872

2011-02-27 2.172320

2011-03-02 -2.022104

2011-03-05 -0.070356

2011-03-08 1.276671

2011-03-11 -2.835132

2011-03-14 -1.384113

2011-03-17 1.517565

2011-03-20 -0.550406

2011-03-23 0.773430

2011-03-26 2.244319

2011-03-29 2.951082

Freq: 3D, dtype: float64

day3Ts = ts.resample('3D').mean()

day3Ts

2011-01-01 0.015214

2011-01-04 -0.751735

2011-01-07 0.190381

2011-01-10 0.278344

2011-01-13 -0.132255

2011-01-16 -0.385418

2011-01-19 -0.428961

2011-01-22 0.961317

2011-01-25 0.522303

2011-01-28 0.478521

2011-01-31 0.103855

2011-02-03 -0.847078

2011-02-06 0.105692

2011-02-09 0.532959

2011-02-12 -0.650170

2011-02-15 0.976104

2011-02-18 -0.244572

2011-02-21 0.558272

2011-02-24 -0.692957

2011-02-27 0.724107

2011-03-02 -0.674035

2011-03-05 -0.023452

2011-03-08 0.425557

2011-03-11 -0.945044

2011-03-14 -0.461371

2011-03-17 0.505855

2011-03-20 -0.183469

2011-03-23 0.257810

2011-03-26 0.748106

2011-03-29 0.983694

Freq: 3D, dtype: float64
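Because the series above is random, it is hard to verify the aggregated numbers by eye. A minimal sketch with the values 0 to 8 (chosen here purely for illustration) shows how each 3-day bin is formed and aggregated:

```python
import pandas as pd

rng = pd.date_range('2011-01-01', periods=9, freq='D')
ts = pd.Series(range(9), index=rng)  # values 0..8, one per day

# Each bin covers three consecutive days and is labelled by its first day.
three_day_sum = ts.resample('3D').sum()    # 0+1+2, 3+4+5, 6+7+8
three_day_mean = ts.resample('3D').mean()  # mean of each group of three

print(three_day_sum)
print(three_day_mean)
```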

## Upsampling: going from 3-day bins back to daily data leaves the new slots as NaN

print(day3Ts.resample('D').asfreq())

2011-01-01 0.015214

2011-01-02 NaN

2011-01-03 NaN

2011-01-04 -0.751735

2011-01-05 NaN

2011-01-06 NaN

2011-01-07 0.190381

2011-01-08 NaN

2011-01-09 NaN

2011-01-10 0.278344

2011-01-11 NaN

2011-01-12 NaN

2011-01-13 -0.132255

2011-01-14 NaN

2011-01-15 NaN

2011-01-16 -0.385418

2011-01-17 NaN

2011-01-18 NaN

2011-01-19 -0.428961

2011-01-20 NaN

2011-01-21 NaN

2011-01-22 0.961317

Freq: D, Length: 88, dtype: float64


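ffill: fill a missing value with the previous value

bfill: fill a missing value with the next value

interpolate: fill missing values by linear interpolation

day3Ts.resample('D').ffill(limit=1)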

2011-01-01 0.015214

2011-01-02 0.015214

2011-01-03 NaN

2011-01-04 -0.751735

2011-01-05 -0.751735

2011-01-06 NaN

2011-01-07 0.190381

2011-01-08 0.190381

2011-01-09 NaN

2011-01-10 0.278344

2011-01-11 0.278344

day3Ts.resample('D').bfill(limit=1)

2011-01-01 0.015214

2011-01-02 NaN

2011-01-03 -0.751735

2011-01-04 -0.751735

2011-01-05 NaN

2011-01-06 0.190381

2011-01-07 0.190381

2011-01-08 NaN

2011-01-09 0.278344

2011-01-10 0.278344

2011-01-11 NaN

2011-01-12 -0.132255

2011-01-13 -0.132255

day3Ts.resample('D').interpolate('linear')

2011-01-01 0.015214

2011-01-02 -0.240435

2011-01-03 -0.496085

2011-01-04 -0.751735

2011-01-05 -0.437697

2011-01-06 -0.123658

2011-01-07 0.190381

2011-01-08 0.219702

2011-01-09 0.249023

2011-01-10 0.278344

2011-01-11 0.141478

2011-01-12 0.004611

2011-01-13 -0.132255

2011-01-14 -0.216643

2011-01-15 -0.301030
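The three fill strategies are easier to compare on a tiny series with known values. The following sketch uses a made-up 3-day series (0, 3, 6) and upsamples it to daily frequency with each method:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2011-01-01', periods=3, freq='3D')
coarse = pd.Series([0.0, 3.0, 6.0], index=idx)

raw = coarse.resample('D').asfreq()               # new daily slots are NaN
fwd = coarse.resample('D').ffill(limit=1)         # copy the previous value into one slot
bwd = coarse.resample('D').bfill(limit=1)         # copy the next value into one slot
lin = coarse.resample('D').interpolate('linear')  # straight line between known points

print(pd.DataFrame({'raw': raw, 'ffill': fwd, 'bfill': bwd, 'linear': lin}))
```

With limit=1 only one of the two gaps between known points is filled, which matches the NaN pattern in the outputs above.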


3 Rolling windows

Rolling window computation

%matplotlib inline

import matplotlib.pylab

import numpy as np

import pandas as pd

df = pd.Series(np.random.randn(600), index = pd.date_range('7/1/2016', freq = 'D', periods = 600))

df.head()

2016-07-01 -0.192140

2016-07-02 0.357953

2016-07-03 -0.201847

2016-07-04 -0.372230

2016-07-05 1.414753

Freq: D, dtype: float64

r = df.rolling(window = 10)

# Other aggregations are available too: r.max, r.median, r.std, r.skew, r.sum, r.var

print(r.mean())

2016-07-01 NaN

2016-07-02 NaN

2016-07-03 NaN

2016-07-04 NaN

2016-07-05 NaN

2016-07-06 NaN

2016-07-07 NaN

2016-07-08 NaN

2016-07-09 NaN

2016-07-10 0.300133

2016-07-11 0.284780

2016-07-12 0.252831

2016-07-13 0.220699

2016-07-14 0.167137

2016-07-15 0.018593

2016-07-16 -0.061414

2016-07-17 -0.134593

2016-07-18 -0.153333

2016-07-19 -0.218928

2016-07-20 -0.169426

2016-07-21 -0.219747

2016-07-22 -0.181266

2016-07-23 -0.173674

2016-07-24 -0.130629

2016-07-25 -0.166730

2016-07-26 -0.233044

2016-07-27 -0.256642

2016-07-28 -0.280738

2016-07-29 -0.289893

2016-07-30 -0.379625

...

2018-01-22 -0.211467

2018-01-23 0.034996

2018-01-24 -0.105910

2018-01-25 -0.145774

2018-01-26 -0.089320

2018-01-27 -0.164370

2018-01-28 -0.110892

2018-01-29 -0.205786

2018-01-30 -0.101162

2018-01-31 -0.034760

2018-02-01 0.229333

2018-02-02 0.043741

2018-02-03 0.052837

2018-02-04 0.057746

2018-02-05 -0.071401

2018-02-06 -0.011153

2018-02-07 -0.045737

2018-02-08 -0.021983

2018-02-09 -0.196715

2018-02-10 -0.063721

2018-02-11 -0.289452

2018-02-12 -0.050946

2018-02-13 -0.047014

2018-02-14 0.048754

2018-02-15 0.143949

2018-02-16 0.424823

2018-02-17 0.361878

2018-02-18 0.363235

2018-02-19 0.517436

2018-02-20 0.368020

Freq: D, Length: 600, dtype: float64
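The first nine values above are NaN because a window of 10 needs 10 observations before it can produce a result. A short sketch with a 5-point series and a window of 3 (both picked here for illustration) shows this, along with the min_periods parameter that relaxes the requirement:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.date_range('2016-07-01', periods=5, freq='D'))

# Default: no result until the window holds 3 values.
strict = s.rolling(window=3).mean()

# min_periods=1: average whatever is available so far.
relaxed = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'value': s, 'strict': strict, 'relaxed': relaxed}))
```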


Visualization

import matplotlib.pyplot as plt

%matplotlib inline

plt.figure(figsize=(15, 5))

df.plot(style='r--')

df.rolling(window=10).mean().plot(style='b')


4 ARIMA forecasting

Data preprocessing

import pandas_datareader

import datetime

import matplotlib.pylab as plt

import seaborn as sns

from matplotlib.pylab import style

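# Note: statsmodels.tsa.arima_model was removed in statsmodels 0.13; newer releases provide ARIMA in statsmodels.tsa.arima.model.

from statsmodels.tsa.arima_model import ARIMA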

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

style.use('ggplot')

plt.rcParams['font.sans-serif'] = ['SimHei']

plt.rcParams['axes.unicode_minus'] = False

stockFile = 'data/T10yr.csv'

stock = pd.read_csv(stockFile, index_col=0, parse_dates=[0])

stock.head(10)


stock_week = stock['Close'].resample('W-MON').mean()

stock_train = stock_week['2000':'2015']

stock_train.plot(figsize=(12,8))

plt.legend(bbox_to_anchor=(1.25, 0.5))

plt.title("Stock Close")

sns.despine()


stock_diff = stock_train.diff()

stock_diff = stock_diff.dropna()

plt.figure()

plt.plot(stock_diff)

plt.title('First-order difference')

plt.show()
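First-order differencing replaces each value with its change from the previous one, which is what the d=1 term in an ARIMA order refers to. A minimal sketch on a four-point weekly series (the numbers are invented for illustration) shows the operation and how to undo it:

```python
import pandas as pd

levels = pd.Series([1.0, 3.0, 6.0, 10.0],
                   index=pd.date_range('2000-01-03', periods=4, freq='W-MON'))

diff = levels.diff()        # first entry is NaN: there is nothing to subtract from
diff_clean = diff.dropna()  # the week-over-week changes

# Undo the differencing: starting value plus the running sum of the changes.
rebuilt = levels.iloc[0] + diff_clean.cumsum()

print(diff_clean)
print(rebuilt)
```

The differenced series is one observation shorter than the original, which is why dropna is applied before plotting.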


acf = plot_acf(stock_diff, lags=20)

plt.title("ACF")

acf.show()


pacf = plot_pacf(stock_diff, lags=20)

plt.title("PACF")

pacf.show()
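The ACF plot shows, for each lag, how strongly the series correlates with a shifted copy of itself. As a rough sketch of that quantity, the helper below (sample_acf is a hypothetical name, not a statsmodels function) computes the sample autocorrelation with the common biased estimator, which divides every lag by the full sum of squares:

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    return np.array([
        np.sum(centered[lag:] * centered[:len(x) - lag]) / denom
        for lag in range(nlags + 1)
    ])

# A strictly alternating series is negatively correlated at lag 1
# and positively correlated at lag 2.
alternating = np.array([1, -1, 1, -1, 1, -1, 1, -1])
acf_values = sample_acf(alternating, nlags=2)
print(acf_values)
```

Lags whose bars rise clearly above the confidence band in the ACF and PACF plots are the usual starting point for choosing the q and p terms of the model.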


model = ARIMA(stock_train, order=(1, 1, 1),freq='W-MON')

result = model.fit()

#print(result.summary())

pred = result.predict('20140609', '20160701',dynamic=True, typ='levels')

print(pred)

2014-06-09 2.463559

2014-06-16 2.455539

2014-06-23 2.449569

2014-06-30 2.444183

2014-07-07 2.438962

2014-07-14 2.433788

2014-07-21 2.428627

2014-07-28 2.423470

2014-08-04 2.418315

2014-08-11 2.413159

2014-08-18 2.408004

2014-08-25 2.402849

2014-09-01 2.397693

2014-09-08 2.392538

2014-09-15 2.387383

plt.figure(figsize=(6, 6))

plt.xticks(rotation=45)

plt.plot(pred)

plt.plot(stock_train)



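Copyright notice: this article, titled "ARIMA data preprocessing", was contributed by a site user and reflects only the author's own views. To republish it, contact the author and cite the source: http://www.roclinux.cn/p/1700324230a397121.html. This site only provides information storage, does not own the content, and assumes no related legal liability. If you find content on this site that is suspected of plagiarism, infringement, or violating laws or regulations, it will be removed immediately once verified.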
