
R: Reducing the frequency of time series data by aggregating values into an OHLC series

Tag: r. Source: hx1

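I have a high-frequency dataset of exchange rates, timestamped down to the millisecond, that I want to convert in R to a lower-frequency, regular time series, e.g. a 1-minute or 5-minute OHLC series (open, high, low, close). The original dataset has four columns: one for the currency pair, one for the timestamp (date and time), and one each for the bid and ask prices. The data was imported from a .csv file.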

head(GBPUSD) and tail(GBPUSD) return the following:

# A tibble: 6 x 4
  X1      X2                       X3      X4
  <chr>   <dttm>                <dbl>   <dbl>
1 GBP/USD 2017-06-01 00:00:00 1.28756 1.28763
2 GBP/USD 2017-06-01 00:00:00 1.28754 1.28760
3 GBP/USD 2017-06-01 00:00:00 1.28754 1.28759
4 GBP/USD 2017-06-01 00:00:00 1.28753 1.28759
5 GBP/USD 2017-06-01 00:00:00 1.28753 1.28759
6 GBP/USD 2017-06-01 00:00:00 1.28753 1.28759


# A tibble: 6 x 4
  X1      X2                       X3      X4
  <chr>   <dttm>                <dbl>   <dbl>
1 GBP/USD 2017-06-30 20:59:56 1.30093 1.30300
2 GBP/USD 2017-06-30 20:59:56 1.30121 1.30300
3 GBP/USD 2017-06-30 20:59:56 1.30100 1.30390
4 GBP/USD 2017-06-30 20:59:56 1.30146 1.30452
5 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
6 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447


=========== Solution:

I changed the OP's original dataset slightly for teaching purposes:

df <- data.frame(
X1=c("GBP/USD"), 
X2=c("2017-06-01 00:00:00", "2017-06-01 00:00:00", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:02", "2017-06-30 20:59:52", "2017-06-30 20:59:54", "2017-06-30 20:59:54", "2017-06-30 20:59:56", "2017-06-30 20:59:56", "2017-06-30 20:59:56"), 
X3=c(1.28756, 1.28754, 1.28754, 1.28753, 1.28752, 1.28757, 1.30093, 1.30121, 1.30100, 1.30146, 1.30145,1.30145), 
X4=c(1.28763, 1.28760, 1.28759, 1.28758, 1.28755, 1.28760,1.30300, 1.30300, 1.30390, 1.30452, 1.30447, 1.30447), 
stringsAsFactors=FALSE) 

df 

        X1                  X2      X3      X4
1  GBP/USD 2017-06-01 00:00:00 1.28756 1.28763
2  GBP/USD 2017-06-01 00:00:00 1.28754 1.28760
3  GBP/USD 2017-06-01 00:00:01 1.28754 1.28759
4  GBP/USD 2017-06-01 00:00:01 1.28753 1.28758
5  GBP/USD 2017-06-01 00:00:01 1.28752 1.28755
6  GBP/USD 2017-06-01 00:00:02 1.28757 1.28760
7  GBP/USD 2017-06-30 20:59:52 1.30093 1.30300
8  GBP/USD 2017-06-30 20:59:54 1.30121 1.30300
9  GBP/USD 2017-06-30 20:59:54 1.30100 1.30390
10 GBP/USD 2017-06-30 20:59:56 1.30146 1.30452
11 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447
12 GBP/USD 2017-06-30 20:59:56 1.30145 1.30447

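Now, in the lower-frequency data, rows that share the same timestamp collapse into one group. So we have to find the index at which each group starts and the index at which it ends: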

indices <- seq_along(df[,2])[!(duplicated(df[,2]))] # 1 3 6 7 8 10; the beginnings of groups (observations) 
indices - 1 # 0 2 5 6 7 9; for finding the endings of groups 
numberoflowfreq <- length(indices) # 6: number of groupings (obs.) for Low Freq data 

Writing the pattern out explicitly:

mean(df[1:((indices -1)[2]),3]) # from 1 to 2 
mean(df[indices[2]:((indices -1)[3]),3]) # from 3 to 5 
mean(df[indices[3]:((indices -1)[4]),3]) # from 6 to 6 
mean(df[indices[4]:((indices -1)[5]),3]) # from 7 to 7 
mean(df[indices[5]:((indices -1)[6]),3]) # from 8 to 9 
mean(df[indices[6]:nrow(df),3]) # from 10 to 12 

The pattern, simplified:

mean3rdColumn_1st <- mean(df[1:((indices -1)[2]),3]) # from 1 to 2 
mean3rdColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),3])) 
mean3rdColumn_Last <- mean(df[indices[numberoflowfreq]:nrow(df),3]) # from 10 to 12; last group, without hard-coding its position
# 3rd column in low frequency data:  
c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last) 
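The first/between/last split above can be avoided by building a vector of group ends alongside the group starts. The following is a minimal sketch of that idea, not the answerer's code; the names key, x, starts, ends and group_means are hypothetical, and it assumes that rows with the same timestamp are contiguous (i.e. the data is sorted by time).

```r
# Hypothetical toy input: six bid prices and their (already sorted) timestamps.
x   <- c(1.28756, 1.28754, 1.28754, 1.28753, 1.28752, 1.28757)
key <- c("00:00:00", "00:00:00", "00:00:01", "00:00:01", "00:00:01", "00:00:02")

# Index of the first row of each group.
starts <- which(!duplicated(key))

# Each group ends one row before the next group starts; the last group
# ends at the final row.
ends <- c(starts[-1] - 1, length(key))

# One mean per group, with no special handling for the first or last group.
group_means <- mapply(function(s, e) mean(x[s:e]), starts, ends)
group_means
```

Because starts and ends always have the same length, the same call works whether there is one group or many.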

Likewise for the 4th column:

mean4thColumn_1st <- mean(df[1:((indices -1)[2]),4]) # from 1 to 2 
mean4thColumn_Between <- sapply(2:(numberoflowfreq-1), function(i) mean(df[indices[i]:((indices -1)[i+1]),4])) 
mean4thColumn_Last <- mean(df[indices[numberoflowfreq]:nrow(df),4]) # from 10 to 12; last group, without hard-coding its position
# 4th column in low frequency data: 
c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last) 

Now, putting it all together:

LowFrqData <- data.frame(X1=c("GBP/USD"), X2=df[indices,2], X3=c(mean3rdColumn_1st, mean3rdColumn_Between, mean3rdColumn_Last), x4=c(mean4thColumn_1st, mean4thColumn_Between, mean4thColumn_Last), stringsAsFactors=FALSE) 
LowFrqData 

       X1                  X2       X3       x4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487

Now column X2 holds unique timestamps, and the third and fourth columns are formed as the means of the corresponding cells.
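For comparison, the same per-timestamp means can be obtained in one call with base R's aggregate(). This is a sketch of an alternative, not the answerer's method; LowFrqAlt is a hypothetical name, and the data frame is repeated here only so the snippet runs on its own.

```r
# The same toy data as above (X3 = bid, X4 = ask).
df <- data.frame(
  X1 = "GBP/USD",
  X2 = c("2017-06-01 00:00:00", "2017-06-01 00:00:00", "2017-06-01 00:00:01",
         "2017-06-01 00:00:01", "2017-06-01 00:00:01", "2017-06-01 00:00:02",
         "2017-06-30 20:59:52", "2017-06-30 20:59:54", "2017-06-30 20:59:54",
         "2017-06-30 20:59:56", "2017-06-30 20:59:56", "2017-06-30 20:59:56"),
  X3 = c(1.28756, 1.28754, 1.28754, 1.28753, 1.28752, 1.28757,
         1.30093, 1.30121, 1.30100, 1.30146, 1.30145, 1.30145),
  X4 = c(1.28763, 1.28760, 1.28759, 1.28758, 1.28755, 1.28760,
         1.30300, 1.30300, 1.30390, 1.30452, 1.30447, 1.30447),
  stringsAsFactors = FALSE)

# Mean of X3 and X4 for every distinct (X1, X2) combination.
LowFrqAlt <- aggregate(cbind(X3, X4) ~ X1 + X2, data = df, FUN = mean)
LowFrqAlt
```

The result is sorted by the grouping columns, so with timestamps in "YYYY-MM-DD HH:MM:SS" form the rows come out in chronological order.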

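Also note: not every minute within a given range will necessarily have a value. For those cases you can fill in NA. On the other hand, one might choose to ignore the irregularity, because the spacing between observations may be the same for many of them, so the series is not very irregular. Also keep in mind that converting the data to equally spaced observations by linear interpolation can introduce significant and hard-to-quantify biases (see Scholes and Williams).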

M. Scholes and J. Williams, “Estimating betas from nonsynchronous data”, Journal of Financial Economics 5: 309–327, 1977.
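The NA-filling option mentioned above can be sketched as follows: build an evenly spaced grid and left-join the aggregated observations onto it. The data frame obs, its columns, and the explicit UTC time zone are assumptions made for this illustration only.

```r
# Hypothetical per-minute means in which the 00:02 minute has no observation.
obs <- data.frame(
  minute = as.POSIXct(c("2017-06-01 00:00:00", "2017-06-01 00:01:00",
                        "2017-06-01 00:03:00"), tz = "UTC"),
  bid = c(1.28755, 1.28753, 1.28757))

# Evenly spaced grid from the first to the last observed minute.
grid <- data.frame(minute = seq(min(obs$minute), max(obs$minute), by = "1 min"))

# Left join: grid minutes without an observation get NA in the bid column.
regular <- merge(grid, obs, by = "minute", all.x = TRUE)
regular
```

The gaps are left as NA rather than interpolated, which keeps the series regular without adding the interpolation bias noted above.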

Now for the regular 5-minute series:

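as.numeric(as.POSIXct("1970-01-01 03:00:00")) # 0 only in a UTC+3 session time zone; "1970-01-01 03:01:00" then equals 60
as.numeric(as.POSIXct("2017-06-01 00:00:00")) # 1496264400 in that same UTC+3 time zone; other time zones give a different number
# Seconds elapsed since the first observation in the dataset.
# Subtracting the first timestamp instead of the constant 1496264400 avoids depending on the session time zone.
PassedSecs <- as.numeric(as.POSIXct(LowFrqData$X2)) - as.numeric(as.POSIXct(LowFrqData$X2[1]))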

LowFrq5minuteRaw <- cbind(LowFrqData, PassedSecs, stringsAsFactors=FALSE) 
LowFrq5minuteRaw 

       X1                  X2       X3       x4 PassedSecs
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615          0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573          1
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600          2
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000    2581192
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450    2581194
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487    2581196

5 minutes means 5 * 60 = 300 seconds. So "having the same quotient under integer division by 300" groups the observations into 5-minute intervals.
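A small worked example of that integer division, using made-up elapsed-second values (only the last one, 2581192, comes from the data above):

```r
# Elapsed seconds since the first observation (illustrative values).
secs <- c(0, 1, 2, 299, 300, 599, 600, 2581192)

# Integer division by 300: every value in [0, 300) maps to bucket 0,
# [300, 600) to bucket 1, and so on.
bucket <- secs %/% 300
bucket        # 0 0 0 0 1 1 2 8603

# Start of each bucket, again in elapsed seconds.
bucket_start <- bucket * 300
bucket_start  # 0 0 0 0 300 300 600 2580900
```

Changing 300 to 60 gives 1-minute buckets in the same way.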

LowFrq5minuteRaw2 <- cbind(LowFrqData, PassedSecs, QbyDto300 = PassedSecs%/%300, stringsAsFactors=FALSE) 
LowFrq5minuteRaw2 

       X1                  X2       X3       x4 PassedSecs QbyDto300
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287615          0         0
2 GBP/USD 2017-06-01 00:00:01 1.287530 1.287573          1         0
3 GBP/USD 2017-06-01 00:00:02 1.287570 1.287600          2         0
4 GBP/USD 2017-06-30 20:59:52 1.300930 1.303000    2581192      8603
5 GBP/USD 2017-06-30 20:59:54 1.301105 1.303450    2581194      8603
6 GBP/USD 2017-06-30 20:59:56 1.301453 1.304487    2581196      8603

indices2 <- seq_along(LowFrq5minuteRaw2[,6])[!(duplicated(LowFrq5minuteRaw2[,6]))] # 1 4; the beginnings of groups 

LowFrq5minute <- data.frame(X1=c("GBP/USD"), X2=LowFrq5minuteRaw2[indices2,2], X3=aggregate(LowFrqData[,3] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2], X4=aggregate(LowFrqData[,4] ~ QbyDto300, LowFrq5minuteRaw2, mean)[,2]) 
LowFrq5minute 

       X1                  X2       X3       X4
1 GBP/USD 2017-06-01 00:00:00 1.287550 1.287596
2 GBP/USD 2017-06-30 20:59:52 1.301163 1.303646

X2 holds, for each 5-minute interval, the timestamp of the first observation that falls within it.
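The answer above aggregates by mean, whereas the question asks for open, high, low and close. The same 300-second bucketing can produce those four values; the following is a minimal base-R sketch under assumptions, not part of the original answer. The ticks data frame, its column names and the UTC time zone are hypothetical, and only a single price column (bid) is aggregated.

```r
# Hypothetical tick data: five bid quotes spanning two 5-minute intervals.
ticks <- data.frame(
  time = as.POSIXct(c("2017-06-01 00:00:00", "2017-06-01 00:01:30",
                      "2017-06-01 00:04:59", "2017-06-01 00:05:00",
                      "2017-06-01 00:07:10"), tz = "UTC"),
  bid = c(1.28756, 1.28760, 1.28752, 1.28757, 1.28749))

# Ticks must be in time order so that "first" and "last" mean open and close.
ticks <- ticks[order(ticks$time), ]

# 5-minute bucket index, counted from the first tick.
secs   <- as.numeric(ticks$time) - as.numeric(ticks$time[1])
bucket <- secs %/% 300

# One row per bucket; start is the timestamp of the first tick in the bucket,
# following the same convention as X2 above.
ohlc <- data.frame(
  start = ticks$time[!duplicated(bucket)],
  open  = as.numeric(tapply(ticks$bid, bucket, function(x) x[1])),
  high  = as.numeric(tapply(ticks$bid, bucket, max)),
  low   = as.numeric(tapply(ticks$bid, bucket, min)),
  close = as.numeric(tapply(ticks$bid, bucket, function(x) x[length(x)])))
ohlc
```

Buckets that contain no ticks simply do not appear in the result; they can be added back as NA rows by joining onto a regular grid if a strictly regular series is required.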

