| Title: | Nonlinear Nonparametric Statistics |
|---|---|
| Description: | NNS (Nonlinear Nonparametric Statistics) leverages partial moments – the fundamental elements of variance that asymptotically approximate the area under f(x) – to provide a robust foundation for nonlinear analysis while maintaining linear equivalences. Designed for real-world data that violates symmetry, linearity, or distributional assumptions, NNS delivers a comprehensive suite of advanced statistical techniques, including: Numerical integration, Numerical differentiation, Clustering, Correlation, Dependence, Causal analysis, ANOVA, Regression, Classification, Seasonality, Autoregressive modeling, Normalization, Stochastic superiority / dominance and Advanced Monte Carlo sampling. All routines based on: Viole, F. and Nawrocki, D. (2013), Nonlinear Nonparametric Statistics: Using Partial Moments (ISBN: 1490523995, Second edition: <https://ovvo-financial.github.io/NNS/book/>). |
| Authors: | Fred Viole [aut, cre], Roberto Spadim [ctb] |
| Maintainer: | Fred Viole <[email protected]> |
| License: | GPL-3 |
| Version: | 12.1 |
| Built: | 2026-06-08 05:34:32 UTC |
| Source: | https://github.com/ovvo-financial/nns |
Computes the co‑lower partial moment (lower‑left quadrant 4) between two equal‑length numeric vectors at any degree and target.
Co.LPM(degree_lpm, x, y, target_x, target_y, degree_y = NULL)
degree_lpm |
numeric; degree for x ("degree_x"). degree = 0 gives frequency, degree = 1 gives area. |
x |
numeric vector of observations. |
y |
numeric vector of the same length as x. |
target_x |
numeric vector; thresholds for x (defaults to mean(x)). |
target_y |
numeric vector; thresholds for y (defaults to mean(y)). |
degree_y |
numeric; optional degree for y. If omitted, 'degree_lpm' is used for both x and y. |
Numeric vector of co‑LPM values.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
    set.seed(123)
    x <- rnorm(100); y <- rnorm(100)
    Co.LPM(0, x, y, mean(x), mean(y))
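A minimal base-R sketch of what a co-lower partial moment measures. It assumes the usual partial-moment definition (the mean of the product of below-target deviations, each raised to its degree) and assumes that degree 0 counts observations at or below the target. `co_lpm_sketch` is a hypothetical helper for illustration only, not the package's implementation, and tie handling in NNS may differ.

```r
# Hypothetical helper: co-lower partial moment for scalar targets.
co_lpm_sketch <- function(degree_x, x, y, target_x, target_y, degree_y = degree_x) {
  stopifnot(length(x) == length(y))
  # Below-target deviation; degree 0 is treated as an indicator (assumption).
  dev <- function(v, target, degree) {
    if (degree == 0) as.numeric(v <= target) else pmax(target - v, 0)^degree
  }
  mean(dev(x, target_x, degree_x) * dev(y, target_y, degree_y))
}

set.seed(123)
x <- rnorm(100); y <- rnorm(100)
freq <- co_lpm_sketch(0, x, y, mean(x), mean(y))  # share of points in the lower-left quadrant
area <- co_lpm_sketch(1, x, y, mean(x), mean(y))  # mean product of lower deviations
```

Under these assumptions the degree 0 value is simply the fraction of observations that fall below both targets.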
This function generates an n‑dimensional co‑lower partial moment (n >= 2) for any degree or target.
Co.LPM_nD(data, target, degree = 0, norm = TRUE)
data |
A numeric matrix with observations in rows and variables in columns. |
target |
A numeric vector, length equal to ncol(data). |
degree |
numeric; degree for lower deviations (0 = frequency, 1 = area). |
norm |
logical; if |
Numeric; the n‑dimensional co‑lower partial moment.
    ## Not run:
    mat <- matrix(rnorm(200), ncol = 4)
    Co.LPM_nD(mat, rep(0, ncol(mat)), degree = 1, norm = FALSE)
    ## End(Not run)
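The n-dimensional case can be sketched in base R by extending the bivariate idea row by row. This is an illustration under assumptions: `co_lpm_nd_sketch` is a hypothetical helper, it returns the unnormalized value only (the effect of `norm` is not modeled here), and degree 0 is assumed to count rows at or below every target.

```r
# Hypothetical helper: unnormalized n-dimensional co-lower partial moment.
co_lpm_nd_sketch <- function(data, target, degree = 0) {
  data <- as.matrix(data)
  stopifnot(length(target) == ncol(data))
  centered <- sweep(data, 2, target)  # data - target, column by column
  # Per-cell lower deviation; degree 0 becomes a 0/1 indicator (assumption).
  part <- if (degree == 0) (centered <= 0) * 1 else pmax(-centered, 0)^degree
  mean(apply(part, 1, prod))          # product across variables, averaged over rows
}

set.seed(123)
mat <- matrix(rnorm(200), ncol = 4)
nd_freq <- co_lpm_nd_sketch(mat, rep(0, ncol(mat)), degree = 0)
nd_area <- co_lpm_nd_sketch(mat, rep(0, ncol(mat)), degree = 1)
```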
Computes the co‑upper partial moment (upper‑right quadrant 1) between two equal‑length numeric vectors at any degree and target.
Co.UPM(degree_upm, x, y, target_x, target_y, degree_y = NULL)
degree_upm |
numeric; degree for x ("degree_x"). degree = 0 gives frequency, degree = 1 gives area. |
x |
numeric vector of observations. |
y |
numeric vector of the same length as x. |
target_x |
numeric vector; thresholds for x (defaults to mean(x)). |
target_y |
numeric vector; thresholds for y (defaults to mean(y)). |
degree_y |
numeric; optional degree for y. If omitted, 'degree_upm' is used for both x and y. |
Numeric vector of co‑UPM values.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
    set.seed(123)
    x <- rnorm(100); y <- rnorm(100)
    Co.UPM(0, x, y, mean(x), mean(y))
This function generates an n‑dimensional co‑upper partial moment (n >= 2) for any degree or target.
Co.UPM_nD(data, target, degree = 0, norm = TRUE)
data |
A numeric matrix with observations in rows and variables in columns. |
target |
A numeric vector, length equal to ncol(data). |
degree |
numeric; degree for upper deviations (0 = frequency, 1 = area). |
norm |
logical; if |
Numeric; the n‑dimensional co‑upper partial moment.
    ## Not run:
    mat <- matrix(rnorm(200), ncol = 4)
    Co.UPM_nD(mat, rep(0, ncol(mat)), degree = 1, norm = FALSE)
    ## End(Not run)
Computes the divergent lower partial moment (lower‑right quadrant 3) between two equal‑length numeric vectors.
D.LPM(degree_lpm, degree_upm, x, y, target_x, target_y)
degree_lpm |
numeric; LPM degree = 0 gives frequency, = 1 gives area. |
degree_upm |
numeric; UPM degree = 0 gives frequency, = 1 gives area. |
x |
numeric vector of observations. |
y |
numeric vector of the same length as x. |
target_x |
numeric vector; thresholds for x (defaults to mean(x)). |
target_y |
numeric vector; thresholds for y (defaults to mean(y)). |
Numeric vector of divergent LPM values.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
    set.seed(123)
    x <- rnorm(100); y <- rnorm(100)
    D.LPM(0, 0, x, y, mean(x), mean(y))
Computes the divergent upper partial moment (upper‑left quadrant 2) between two equal‑length numeric vectors.
D.UPM(degree_lpm, degree_upm, x, y, target_x, target_y)
degree_lpm |
numeric; LPM degree = 0 gives frequency, = 1 gives area. |
degree_upm |
numeric; UPM degree = 0 gives frequency, = 1 gives area. |
x |
numeric vector of observations. |
y |
numeric vector of the same length as x. |
target_x |
numeric vector; thresholds for x (defaults to mean(x)). |
target_y |
numeric vector; thresholds for y (defaults to mean(y)). |
Numeric vector of divergent UPM values.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
    set.seed(123)
    x <- rnorm(100); y <- rnorm(100)
    D.UPM(0, 0, x, y, mean(x), mean(y))
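The four quadrant moments (co-lower, co-upper, divergent lower, divergent upper) can be related to each other with a short base-R sketch. It assumes the standard partial-moment definitions and assigns ties to the lower side, which is an assumption rather than documented NNS behavior. At degree 0 the four frequencies partition the sample; at degree 1 with mean targets, the co-moments minus the divergent moments reproduce the population covariance (divisor n, not n - 1).

```r
set.seed(123)
x <- rnorm(100); y <- rnorm(100)
tx <- mean(x); ty <- mean(y)

# Degree 0: quadrant membership indicators (ties go to the lower side, an assumption).
lo_x <- x <= tx; lo_y <- y <= ty
freq <- c(co_lpm = mean(lo_x & lo_y),    # lower-left
          co_upm = mean(!lo_x & !lo_y),  # upper-right
          d_lpm  = mean(!lo_x & lo_y),   # x above, y below: lower-right
          d_upm  = mean(lo_x & !lo_y))   # x below, y above: upper-left

# Degree 1: products of one-sided deviations within each quadrant.
dn <- function(v, t) pmax(t - v, 0)
up <- function(v, t) pmax(v - t, 0)
area <- c(co_lpm = mean(dn(x, tx) * dn(y, ty)),
          co_upm = mean(up(x, tx) * up(y, ty)),
          d_lpm  = mean(up(x, tx) * dn(y, ty)),
          d_upm  = mean(dn(x, tx) * up(y, ty)))

# Co-moments minus divergent moments give the population covariance.
pop_cov <- unname(area["co_lpm"] + area["co_upm"] - area["d_lpm"] - area["d_upm"])
```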
This function generates the aggregate n‑dimensional divergent partial moment (n >= 2) for any degree or target.
DPM_nD(data, target, degree = 0, norm = TRUE)
data |
A numeric matrix with observations in rows and variables in columns. |
target |
A numeric vector, length equal to ncol(data). |
degree |
numeric; degree for upper deviations (0 = frequency, 1 = area). |
norm |
logical; if |
Numeric; the n-dimensional divergent partial moment.
    ## Not run:
    mat <- matrix(rnorm(200), ncol = 4)
    DPM_nD(mat, rep(0, ncol(mat)), degree = 1, norm = FALSE)
    ## End(Not run)
Returns the numerical partial derivative of y with respect to [wrt] any regressor for a point of interest. The finite difference method is used, with NNS.reg estimates serving as the f(x + h) and f(x - h) values.
dy.d_(x, y, wrt, eval.points = "obs", mixed = FALSE, messages = TRUE)
x |
a numeric matrix or data frame. |
y |
a numeric vector with compatible dimensions to |
wrt |
integer; Selects the regressor to differentiate with respect to (vectorized). |
eval.points |
numeric or options: ("obs", "apd", "mean", "median", "last"); Regressor points to be evaluated.
|
mixed |
logical; |
messages |
logical; |
Returns column-wise matrix of wrt regressors:
dy.d_(...)[, wrt]$First the 1st derivative
dy.d_(...)[, wrt]$Second the 2nd derivative
dy.d_(...)[, wrt]$Mixed the mixed derivative (for two independent variables only).
For binary regressors, it is suggested to use eval.points = seq(0, 1, .05) for a better resolution around the midpoint.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Vinod, H. and Viole, F. (2020) "Comparing Old and New Partial Derivative Estimates from Nonlinear Nonparametric Regressions" doi:10.2139/ssrn.3681104
    ## Not run:
    set.seed(123) ; x_1 <- runif(1000) ; x_2 <- runif(1000) ; y <- x_1 ^ 2 * x_2 ^ 2
    B <- cbind(x_1, x_2)
    ## To find derivatives of y wrt 1st regressor for specific points of both regressors
    dy.d_(B, y, wrt = 1, eval.points = t(c(.5, 1)))
    ## To find average partial derivative of y wrt 1st regressor, only supply 1 value in [eval.points], or a vector of [eval.points]:
    dy.d_(B, y, wrt = 1, eval.points = .5)
    dy.d_(B, y, wrt = 1, eval.points = fivenum(B[,1]))
    ## To find average partial derivative of y wrt 1st regressor, for every observation of 1st regressor:
    apd <- dy.d_(B, y, wrt = 1, eval.points = "apd")
    plot(B[,1], apd[,1]$First)
    ## 95% Confidence Interval to test if 0 is within
    ### Lower CI
    LPM.VaR(.025, 0, apd[,1]$First)
    ### Upper CI
    UPM.VaR(.025, 0, apd[,1]$First)
    ## End(Not run)
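The finite difference step described for dy.d_ can be illustrated with a small base-R sketch. Here a known function stands in for the NNS.reg estimates, and the step size `h` is an arbitrary choice made for illustration; how the package actually selects `h` is not stated above. `fd_sketch` is a hypothetical helper.

```r
# Central finite differences; `f` stands in for the fitted regression function.
fd_sketch <- function(f, x0, h = 1e-3) {
  f_plus  <- f(x0 + h)
  f_mid   <- f(x0)
  f_minus <- f(x0 - h)
  c(first  = (f_plus - f_minus) / (2 * h),
    second = (f_plus - 2 * f_mid + f_minus) / h^2)
}

# y = x_1^2 * x_2^2 with x_2 held at 1, so dy/dx_1 = 2 * x_1 and the second derivative is 2.
est <- fd_sketch(function(x1) x1^2 * 1^2, x0 = 0.5)
```

With an estimated regression in place of the exact function, the accuracy of both derivatives depends on the quality of the fit near the evaluation point.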
Returns the numerical partial derivative of y wrt x for a point of interest.
dy.dx(x, y, eval.point = NULL)
x |
a numeric vector. |
y |
a numeric vector. |
eval.point |
numeric or ("overall"); |
Returns a data.table of eval.point along with both 1st and 2nd derivative.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Vinod, H. and Viole, F. (2017) "Nonparametric Regression Using Clusters" doi:10.1007/s10614-017-9713-5
    ## Not run:
    x <- seq(0, 2 * pi, pi / 100) ; y <- sin(x)
    dy.dx(x, y, eval.point = 1.75)
    # First derivative
    dy.dx(x, y, eval.point = 1.75)[ , first.derivative]
    # Second derivative
    dy.dx(x, y, eval.point = 1.75)[ , second.derivative]
    # Vector of derivatives
    dy.dx(x, y, eval.point = c(1.75, 2.5))
    ## End(Not run)
This function generates a univariate lower partial moment for any degree or target.
LPM(degree, target, variable, excess_ret = FALSE)
degree |
numeric; |
target |
numeric; Set to |
variable |
a numeric vector. data.frame or list type objects are not permissible. |
excess_ret |
logical; |
LPM of variable
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    set.seed(123)
    x <- rnorm(100)
    LPM(0, mean(x), x)
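A base-R sketch of the univariate partial moments, assuming the standard definitions (mean of one-sided deviations raised to the degree, with degree 0 as a frequency). `lpm_sketch` and `upm_sketch` are hypothetical helpers and ignore `excess_ret`. The sketch also checks the "elements of variance" relationship: with the mean as target, the degree 2 lower and upper partial moments sum to the population variance.

```r
# Hypothetical helpers for scalar targets.
lpm_sketch <- function(degree, target, variable) {
  if (degree == 0) mean(variable <= target) else mean(pmax(target - variable, 0)^degree)
}
upm_sketch <- function(degree, target, variable) {
  if (degree == 0) mean(variable > target) else mean(pmax(variable - target, 0)^degree)
}

set.seed(123)
x <- rnorm(100)
below_share <- lpm_sketch(0, mean(x), x)  # frequency at or below the mean
pop_var <- lpm_sketch(2, mean(x), x) + upm_sketch(2, mean(x), x)
```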
This function generates a standardized univariate lower partial moment of any non‑negative degree for a given target.
LPM.ratio(degree, target, variable)
degree |
numeric; degree = 0 gives frequency (CDF), degree = 1 gives area. |
target |
numeric vector; threshold(s). Defaults to mean(variable). |
variable |
numeric vector or data‑frame column to evaluate. |
Numeric vector of standardized lower partial moments.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
Viole, F. (2017) Continuous CDFs and ANOVA with NNS. doi:10.2139/ssrn.3007373
    set.seed(123)
    x <- rnorm(100)
    LPM.ratio(0, mean(x), x)
    ## Not run:
    plot(sort(x), LPM.ratio(0, sort(x), x))
    plot(sort(x), LPM.ratio(1, sort(x), x))
    ## End(Not run)
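A sketch of the standardization, assuming the ratio is the lower partial moment divided by the sum of the lower and upper partial moments at the same degree and target. `lpm_ratio_sketch` is a hypothetical base-R helper, not the package routine. Under that assumption, degree 0 coincides with the empirical CDF and degree 1 gives a smoother, still monotone, curve.

```r
# Hypothetical helper: standardized LPM, vectorized over targets.
lpm_ratio_sketch <- function(degree, target, variable) {
  vapply(target, function(t) {
    if (degree == 0) return(mean(variable <= t))
    lower <- mean(pmax(t - variable, 0)^degree)
    upper <- mean(pmax(variable - t, 0)^degree)
    lower / (lower + upper)
  }, numeric(1))
}

set.seed(123)
x <- rnorm(100)
grid <- sort(x)
cdf0 <- lpm_ratio_sketch(0, grid, x)  # step function, matches ecdf(x)
cdf1 <- lpm_ratio_sketch(1, grid, x)  # area-based, continuous in the target
```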
Generates a value at risk (VaR) quantile based on the Lower Partial Moment ratio.
LPM.VaR(percentile, degree, x)
percentile |
numeric [0, 1]; The percentile for left-tail VaR (vectorized). |
degree |
integer; |
x |
a numeric vector. |
Returns the numeric value below which "percentile" of the area of x lies.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    ## For 5th percentile, left-tail
    LPM.VaR(0.05, 0, x)
    ## End(Not run)
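One way to picture a VaR "based on the Lower Partial Moment ratio" is as the target at which that ratio equals the requested percentile. The base-R sketch below makes that assumption explicit and solves for the target with `uniroot`; the search method, the helper names `lpm_ratio_at` and `lpm_var_sketch`, and the restriction to degree >= 1 are all choices made for this illustration, not the package's algorithm. At degree 0 the same idea reduces to an empirical quantile.

```r
# Hypothetical helper: LPM ratio at a single target t (degree >= 1).
lpm_ratio_at <- function(t, degree, x) {
  lower <- mean(pmax(t - x, 0)^degree)
  upper <- mean(pmax(x - t, 0)^degree)
  lower / (lower + upper)
}

# Hypothetical helper: target whose LPM ratio equals `percentile`.
lpm_var_sketch <- function(percentile, degree, x) {
  stopifnot(degree >= 1, percentile > 0, percentile < 1)
  # The ratio runs from 0 at min(x) to 1 at max(x), so a root exists in range(x).
  uniroot(function(t) lpm_ratio_at(t, degree, x) - percentile,
          interval = range(x), tol = 1e-10)$root
}

set.seed(123)
x <- rnorm(100)
v <- lpm_var_sketch(0.05, 1, x)
check <- lpm_ratio_at(v, 1, x)  # close to 0.05 by construction
```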
Missing values (NA, Inf, NaN) are added at the end of the vector as the last bin returned if missinglast is set to TRUE
NNS_bin(x, width, origin = 0, missinglast = FALSE)
x |
A numeric vector of observations to bin. |
width |
The width of the bins |
origin |
The starting point for the bins. Any number smaller than origin will be disregarded |
missinglast |
Boolean. Should the missing observations be added as a separate element at the end of the returned count vector. |
A list with elements counts (the frequencies), origin (the origin), width (the width), missing (the number of missings), and last_bin_is_missing (boolean) indicating whether missinglast is TRUE.
    ## Not run:
    set.seed(1)
    x <- sample(10, 20, replace = TRUE)
    NNS_bin(x, 15)
    ## End(Not run)
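Fixed-width binning of this kind can be sketched in base R. The sketch assumes left-closed bins starting at `origin`, drops values below `origin` as described above, and counts missings separately; `bin_sketch` is a hypothetical helper and does not reproduce the `missinglast` behavior.

```r
# Hypothetical helper: fixed-width bin counts from `origin` upward.
bin_sketch <- function(x, width, origin = 0) {
  ok  <- is.finite(x) & x >= origin             # drop NA/NaN/Inf and values below origin
  idx <- floor((x[ok] - origin) / width) + 1    # 1-based bin index, bins are [lo, lo + width)
  list(counts = tabulate(idx), origin = origin, width = width,
       missing = sum(!is.finite(x)))
}

set.seed(1)
x <- sample(10, 20, replace = TRUE)
b <- bin_sketch(x, 15)  # values 1..10 all fall in the single bin [0, 15)
```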
Performs a distribution-free ANOVA using partial-moment statistics to assess
differences between control and treatment groups. Depending on the setting of
means.only, the procedure tests either differences in central tendency
(means or medians) or differences across the full empirical distributions.
NNS.ANOVA(control, treatment, means.only = FALSE, medians = FALSE, confidence.interval = 0.95, tails = "Both", pairwise = FALSE, plot = TRUE, robust = FALSE)
control |
Numeric vector of control group observations |
treatment |
Numeric vector of treatment group observations |
means.only |
Logical; |
medians |
Logical; |
confidence.interval |
Numeric [0,1]; confidence level for effect size bounds (e.g., 0.95) |
tails |
Character; specifies CI tail(s): "both", "left", or "right" |
pairwise |
logical; |
plot |
Logical; |
robust |
logical; |
The key output is the Certainty metric, a calibrated probability in [0, 1]
representing the likelihood that the groups being compared are
the *same* with respect to the chosen comparison mode:
If means.only = TRUE: Certainty is the probability that
the group means (or medians, if medians = TRUE) are the same.
If means.only = FALSE: Certainty is the probability that
the two entire distributions are the same.
This makes Certainty the conceptual inverse of a classical p-value.
A *low* Certainty (e.g., < 0.10) indicates strong evidence of difference,
while a *high* Certainty (e.g., > 0.90) indicates strong evidence of similarity.
Returns a list containing:
Control_Statistic: Mean/median of control group
Treatment_Statistic: Mean/median of treatment group
Grand_Statistic: Grand mean/median
Control_CDF: CDF value at grand statistic (control)
Treatment_CDF: CDF value at grand statistic (treatment)
Certainty: Probability that the groups are the same
(means-only or full distribution depending on means.only).
Effect_Size_LB: Lower bound of treatment effect (if confidence.interval requested)
Effect_Size_UB: Upper bound of treatment effect (if confidence.interval requested)
Confidence_Level: Confidence level used (if confidence.interval requested)
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Viole, F. (2017) "Continuous CDFs and ANOVA with NNS" doi:10.2139/ssrn.3007373
    ## Not run:
    ### Binary analysis and effect size
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.ANOVA(control = x, treatment = y)
    ### Two variable analysis with no control variable
    A <- cbind(x, y)
    NNS.ANOVA(A)
    ### Medians test
    NNS.ANOVA(A, means.only = TRUE, medians = TRUE)
    ### Multiple variable analysis with no control variable
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100) ; z <- rnorm(100)
    A <- cbind(x, y, z)
    NNS.ANOVA(A)
    ### Different length vectors used in a list
    x <- rnorm(30) ; y <- rnorm(40) ; z <- rnorm(50)
    A <- list(x, y, z)
    NNS.ANOVA(A)
    ## End(Not run)
Autoregressive model incorporating nonlinear regressions of component series.
NNS.ARMA(variable, h = 1, training.set = NULL, seasonal.factor = TRUE, weights = NULL, best.periods = 1, modulo = NULL, mod.only = TRUE, negative.values = FALSE, method = "nonlin", dynamic = FALSE, shrink = FALSE, plot = TRUE, seasonal.plot = TRUE, pred.int = NULL)
variable |
a numeric vector. |
h |
integer; 1 (default) Number of periods to forecast. |
training.set |
numeric;
|
seasonal.factor |
logical or integer(s); |
weights |
numeric or |
best.periods |
integer; 1 (default) used in conjunction with |
modulo |
integer(s); NULL (default) Used to find the nearest multiple(s) in the reported seasonal period. |
mod.only |
logical; |
negative.values |
logical; |
method |
options: ("lin", "nonlin", "both", "means"); |
dynamic |
logical; |
shrink |
logical; |
plot |
logical; |
seasonal.plot |
logical; |
pred.int |
numeric [0, 1]; |
Returns a vector of forecasts of length (h) if no pred.int is specified. Otherwise, returns a data.table with the forecasts as well as lower and upper prediction intervals per forecast point.
For monthly data series, increased accuracy may be realized from forcing seasonal factors to multiples of 12. For example, if the best periods reported are: {37, 47, 71, 73} use
(seasonal.factor = c(36, 48, 72)).
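The rounding in this note can be reproduced with a few lines of base R. This is only an illustration of the arithmetic, with `best_periods` and `modulo` as local variable names; it does not call NNS.ARMA.

```r
# Snap each reported period to the nearest multiple of 12 and drop duplicates.
best_periods <- c(37, 47, 71, 73)
modulo <- 12
snapped <- unique(round(best_periods / modulo) * modulo)
snapped  # 36 48 72
```

Both 71 and 73 round to 72, which is why three seasonal factors result from four reported periods.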
(seasonal.factor = FALSE) can be a very computationally expensive exercise due to the number of seasonal periods detected.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Viole, F. (2019) "Forecasting Using NNS" doi:10.2139/ssrn.3382300
    ## Nonlinear NNS.ARMA using AirPassengers monthly data and 12 period lag
    ## Not run:
    NNS.ARMA(AirPassengers, h = 45, training.set = 100, seasonal.factor = 12, method = "nonlin")
    ## Linear NNS.ARMA using AirPassengers monthly data and 12, 24, and 36 period lags
    NNS.ARMA(AirPassengers, h = 45, training.set = 120, seasonal.factor = c(12, 24, 36), method = "lin")
    ## Nonlinear NNS.ARMA using AirPassengers monthly data and 2 best periods lag
    NNS.ARMA(AirPassengers, h = 45, training.set = 120, seasonal.factor = FALSE, best.periods = 2)
    ## End(Not run)
Wrapper function for optimizing any combination of a given seasonal.factor vector in NNS.ARMA. Minimum sum of squared errors (forecast-actual) is used to determine optimum across all NNS.ARMA methods.
NNS.ARMA.optim(variable, h = NULL, training.set = NULL, seasonal.factor, lin.only = FALSE, negative.values = FALSE, obj.fn = expression(mean((predicted - actual)^2)/(NNS::Co.LPM(1, predicted, actual, target_x = mean(predicted), target_y = mean(actual)) + NNS::Co.UPM(1, predicted, actual, target_x = mean(predicted), target_y = mean(actual)))), objective = "min", linear.approximation = TRUE, ncores = NULL, pred.int = 0.95, print.trace = TRUE, plot = FALSE)
variable |
a numeric vector. |
h |
integer; |
training.set |
integer; |
seasonal.factor |
integers; Multiple frequency integers considered for NNS.ARMA model, i.e. |
lin.only |
logical; |
negative.values |
logical; |
obj.fn |
expression;
|
objective |
options: ("min", "max") |
linear.approximation |
logical; |
ncores |
integer; value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1. |
pred.int |
numeric [0, 1]; 0.95 (default) Returns the associated prediction intervals for the final estimate. Constructed using the maximum entropy bootstrap NNS.meboot on the final estimates. |
print.trace |
logical; |
plot |
logical; |
Returns a list containing:
$period a vector of optimal seasonal periods
$weights the optimal weights of each seasonal period between an equal weight or NULL weighting
$obj.fn the objective function value
$method the method identifying which NNS.ARMA method was used.
$shrink whether to use the shrink parameter in NNS.ARMA.
$nns.regress whether to smooth the variable via NNS.reg before forecasting.
$bias.shift a numerical result of the overall bias of the optimum objective function result. To be added to the final result when using the NNS.ARMA with the derived parameters.
$errors a vector of model errors from internal calibration.
$results a vector of length h.
$lower.pred.int a vector of lower prediction intervals per forecast point.
$upper.pred.int a vector of upper prediction intervals per forecast point.
Typically, (training.set = 0.8 * length(variable)) is used for optimization. Smaller samples could use (training.set = 0.9 * length(variable)) (or larger) in order to preserve information.
The number of combinations can grow prohibitively large, so the seasonal.factor vector should be kept as small as possible. A seasonal.factor containing an element that is too large will result in an error; in that case, reduce the maximum seasonal.factor.
Set (ncores = 1) if routine is used within a parallel architecture.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Nonlinear NNS.ARMA period optimization using 2 yearly lags on AirPassengers monthly data
    ## Not run:
    nns.optims <- NNS.ARMA.optim(AirPassengers[1:132], training.set = 120, seasonal.factor = seq(12, 24, 6))
    ## To predict out of sample using best parameters:
    NNS.ARMA.optim(AirPassengers[1:132], h = 12, seasonal.factor = seq(12, 24, 6))
    ## Incorporate any objective function from external packages (such as Metrics::mape)
    NNS.ARMA.optim(AirPassengers[1:132], h = 12, seasonal.factor = seq(12, 24, 6), obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min")
    ## End(Not run)
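The general shape of the optimization (hold out the tail of the series, score each candidate with an `obj.fn` expression written in terms of `predicted` and `actual`, keep the minimum) can be sketched in base R. Everything below is a stand-in: the forecaster is a seasonal naive rule rather than NNS.ARMA, the objective is plain mean squared error rather than the default shown in the usage, and each period is scored alone rather than in combinations.

```r
variable <- as.numeric(AirPassengers[1:132])
training.set <- 120
train  <- variable[seq_len(training.set)]
actual <- variable[(training.set + 1):length(variable)]  # held-out observations
h <- length(actual)

obj.fn  <- expression(mean((predicted - actual)^2))  # simplified objective (assumption)
periods <- seq(12, 24, 6)

# Stand-in forecaster (seasonal naive), NOT NNS.ARMA: repeat the last `p` training values.
scores <- vapply(periods, function(p) {
  predicted <- rep(tail(train, p), length.out = h)
  # Evaluate the expression with `predicted` and `actual` bound to this candidate.
  eval(obj.fn, envir = list(predicted = predicted, actual = actual))
}, numeric(1))

best <- periods[which.min(scores)]  # objective = "min"
```

Swapping `which.min` for `which.max` corresponds to objective = "max" for score-type metrics.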
Ensemble method for classification using the NNS multivariate regression NNS.reg as the base learner instead of trees.
NNS.boost(IVs.train, DV.train, IVs.test = NULL, type = NULL, depth = NULL, learner.trials = 100, epochs = NULL, CV.size = NULL, balance = FALSE, ts.test = NULL, threshold = NULL, obj.fn = expression(sum((predicted - actual)^2)), objective = "min", extreme = FALSE, features.only = FALSE, feature.importance = TRUE, pred.int = NULL, status = TRUE)
IVs.train |
a matrix or data frame of variables of numeric or factor data types. |
DV.train |
a numeric or factor vector with compatible dimensions to |
IVs.test |
a matrix or data frame of variables of numeric or factor data types with compatible dimensions to |
type |
|
depth |
options: (integer, NULL, "max"); |
learner.trials |
integer; 100 (default) Sets the number of trials to obtain an accuracy |
epochs |
integer; |
CV.size |
numeric [0, 1]; |
balance |
logical; |
ts.test |
integer; NULL (default) Sets the length of the test set for time-series data; typically |
threshold |
numeric; |
obj.fn |
expression;
|
objective |
options: ("min", "max") |
extreme |
logical; |
features.only |
logical; |
feature.importance |
logical; |
pred.int |
numeric [0,1]; |
status |
logical; |
Returns a vector of fitted values for the dependent variable test set $results, prediction intervals $pred.int, and the final feature loadings $feature.weights, along with final feature frequencies $feature.frequency.
Like a logistic regression, the (type = "CLASS") setting is not necessary for a target variable with two classes, e.g. [0, 1]. The response variable base category should be 1 for classification problems.
Incorporate any objective function from external packages (such as Metrics::mape) via NNS.boost(..., obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min")
Fred Viole, OVVO Financial Systems
Viole, F. (2016) "Classification Using NNS Clustering Analysis" doi:10.2139/ssrn.2864711
    ## Using 'iris' dataset where test set [IVs.test] is 'iris' rows 141:150.
    ## Not run:
    a <- NNS.boost(iris[1:140, 1:4], iris[1:140, 5], IVs.test = iris[141:150, 1:4], epochs = 100, learner.trials = 100, type = "CLASS", depth = NULL, balance = TRUE)
    ## Test accuracy
    mean(a$results == as.numeric(iris[141:150, 5]))
    ## End(Not run)
Returns the causality from observational data between two variables.
NNS.caus(x, y = NULL, factor.2.dummy = FALSE, tau = 0, plot = FALSE, p.value = FALSE, nperm = 100L, permute = c("y", "x", "both"), seed = NULL, conf.int = 0.95)
x |
a numeric vector, matrix or data frame. |
y |
|
factor.2.dummy |
logical; |
tau |
options: ("cs", "ts", integer); 0 (default) Number of lagged observations to consider (for time series data). Otherwise, set |
plot |
logical; |
p.value |
logical; |
nperm |
integer; number of permutations to use when |
permute |
one of "both", "y", or "x"; which variable(s) to shuffle when constructing the null distribution. |
seed |
optional integer seed for reproducibility of the permutation test. |
conf.int |
numeric; 0.95 (default) confidence level for the partial-moment based interval computed on the permutation null distribution. |
If p.value=FALSE returns the original causation vector of length 3 (directional given/received and net), named either "C(x—>y)" or "C(y—>x)" in the third slot. If p.value=TRUE returns a list with components:
* causation: the original causation vector as above.
* p.value: a list with empirical two-sided and one-sided p-values (x_causes_y, y_causes_x), the null distribution, the observed signed statistic, and metadata (permute, nperm).
If p.value=TRUE for a matrix, the function returns a list with components:
* causality: the causality matrix.
* lower_CI: matrix of lower confidence bounds (partial-moment based).
* upper_CI: matrix of upper confidence bounds (partial-moment based).
* p.value: matrix of empirical two-sided p-values.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    ## x causes y...
    set.seed(123)
    x <- rnorm(1000) ; y <- x ^ 2
    NNS.caus(x, y, tau = "cs")
    ## Causal matrix without per factor causation
    NNS.caus(iris, tau = 0)
    ## Causal matrix with per factor causation
    NNS.caus(iris, factor.2.dummy = TRUE, tau = 0)
    ## End(Not run)
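The permutation logic behind an empirical p-value (shuffle one variable, recompute the statistic, compare with the observed value) can be sketched in base R. The statistic here is Pearson correlation, used purely as a stand-in for the causation measure, and the add-one correction in the p-value is a common convention assumed for this sketch; `perm_pvalue_sketch` is a hypothetical helper covering only the permute = "y" case.

```r
# Hypothetical helper: two-sided permutation p-value for a generic statistic.
perm_pvalue_sketch <- function(x, y, stat = function(a, b) cor(a, b),
                               nperm = 100L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  observed <- stat(x, y)
  null <- replicate(nperm, stat(x, sample(y)))  # shuffle y to break any dependence
  list(observed = observed,
       null = null,
       two_sided = (1 + sum(abs(null) >= abs(observed))) / (nperm + 1))
}

set.seed(123)
x <- rnorm(200); y <- x + rnorm(200, sd = 0.5)
res <- perm_pvalue_sketch(x, y, nperm = 100L, seed = 42)
```

With `nperm` permutations the smallest attainable p-value under this convention is 1 / (nperm + 1), so more permutations are needed to resolve very small p-values.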
This function generates an empirical CDF using partial moment ratios LPM.ratio, and resulting survival, hazard and cumulative hazard functions.
NNS.CDF(variable, degree = 0, target = NULL, type = "CDF", plot = TRUE)
variable |
a numeric vector or data.frame of >= 2 variables for joint CDF. |
degree |
numeric; |
target |
numeric; |
type |
options("CDF", "survival", "hazard", "cumulative hazard"); |
plot |
logical; plots CDF. |
Returns:
"Function" a data.table containing the observations and resulting CDF of the variable.
"target.value" value from the target argument.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Viole, F. (2017) "Continuous CDFs and ANOVA with NNS" doi:10.2139/ssrn.3007373
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    NNS.CDF(x)
    ## Empirical CDF (degree = 0)
    NNS.CDF(x)
    ## Continuous CDF (degree = 1)
    NNS.CDF(x, 1)
    ## Joint CDF
    x <- rnorm(5000) ; y <- rnorm(5000)
    A <- cbind(x,y)
    NNS.CDF(A, 0)
    ## Joint CDF with target
    NNS.CDF(A, 0, target = rep(0, ncol(A)))
    ## End(Not run)
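The link between partial moment ratios and the CDF can be sketched in base R. This assumes the textbook definitions of lower and upper partial moments and the ratio LPM / (LPM + UPM); the helper names `lpm`, `upm`, and `lpm_ratio` are hypothetical and are not the package functions.

```r
# Lower and upper partial moments of x about a target (textbook definitions).
lpm <- function(degree, target, x) mean((x <= target) * abs(target - x)^degree)
upm <- function(degree, target, x) mean((x > target) * abs(x - target)^degree)

# Partial moment ratio: share of the total partial moment at or below the target.
lpm_ratio <- function(degree, target, x) {
  l <- lpm(degree, target, x)
  l / (l + upm(degree, target, x))
}

set.seed(123)
x <- rnorm(100)
targets <- sort(x)

# Degree 0 counts observations, so the ratio is the step-function empirical CDF.
cdf_deg0 <- vapply(targets, function(t) lpm_ratio(0, t, x), numeric(1))
# Degree 1 weights observations by distance from the target, giving a smoother curve.
cdf_deg1 <- vapply(targets, function(t) lpm_ratio(1, t, x), numeric(1))
# Survival function as the complement of the CDF.
survival <- 1 - cdf_deg0
```

Under these assumptions the degree 0 curve coincides with `ecdf(x)`, which is why degree 0 is described as the empirical CDF and degree 1 as the continuous one.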
Determines higher-dimension dependence coefficients based on ratios of co-partial moment matrices.
NNS.copula( X, target = NULL, continuous = TRUE, plot = FALSE, independence.overlay = FALSE )
X |
a numeric matrix or data frame. |
target |
numeric; Typically the mean of Variable X for classical statistics equivalences, but does not have to be. (Vectorized) |
continuous |
logical; |
plot |
logical; |
independence.overlay |
logical; |
Returns a multivariate dependence value in [0, 1].
Fred Viole, OVVO Financial Systems
Viole, F. (2016) "Beyond Correlation: Using the Elements of Variance for Conditional Means and Probabilities" doi:10.2139/ssrn.2745308.
    ## Not run:
    set.seed(123)
    x <- rnorm(1000) ; y <- rnorm(1000) ; z <- rnorm(1000)
    A <- data.frame(x, y, z)
    NNS.copula(A, target = colMeans(A), plot = TRUE, independence.overlay = TRUE)
    ### Target 0
    NNS.copula(A, target = rep(0, ncol(A)), plot = TRUE, independence.overlay = TRUE)
    ## End(Not run)
Returns the dependence and nonlinear correlation between two variables based on higher order partial moment matrices measured by frequency or area.
NNS.dep(x, y = NULL, asym = FALSE, p.value = FALSE, print.map = FALSE)
x |
a numeric vector, matrix or data frame. |
y |
|
asym |
logical; |
p.value |
logical; |
print.map |
logical; |
Returns the bi-variate "Correlation" and "Dependence" or correlation / dependence matrix for matrix input.
For asymmetrical (asym = TRUE) matrices, directional dependence is returned as ([column variable] —> [row variable]).
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.dep(x, y)
    ## Correlation / Dependence Matrix
    x <- rnorm(100) ; y <- rnorm(100) ; z <- rnorm(100)
    B <- cbind(x, y, z)
    NNS.dep(B)
    ## End(Not run)
Determines the numerical derivative of a given univariate function using secant lines projected onto the y-axis. These projected points infer the finite steps h used in the finite step method.
NNS.diff( f, point, h = abs(point) * 0.1 + 0.01, tol = 1e-10, max.iter = NULL, digits = 12, print.trace = FALSE, plot = FALSE )
f |
an expression or call or a formula with no lhs. |
point |
numeric; Point at which the derivative of the given function is evaluated. |
h |
numeric [0, ...]; Initial step for secant projection. Defaults to abs(point) * 0.1 + 0.01. |
tol |
numeric; Sets the tolerance for the stopping condition of the inferred |
max.iter |
integer; |
digits |
numeric; Sets the number of digits in the output. Defaults to 12. |
print.trace |
logical; |
plot |
logical; plots range, secant lines and y-intercept convergence. |
Returns a matrix of values, intercepts, derivatives, and inferred step sizes for multiple methods of estimation.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    f <- function(x) sin(x) / x
    NNS.diff(f, 4.1)
    ## Noisy function with explicit iteration cap
    f_noisy <- function(x) sin(x) + rnorm(1, 0, 0.001)
    NNS.diff(f_noisy, 1.0, max.iter = 100)
    ## End(Not run)
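For comparison, the plain finite step method that the description refers to can be sketched as a central difference with a shrinking step. This is not the secant projection used by NNS.diff; the function name `central_diff` is hypothetical, and only the defaults for `h` and `tol` are borrowed from the usage above.

```r
# Central difference estimate, halving h until successive estimates agree within tol.
central_diff <- function(f, point, h = abs(point) * 0.1 + 0.01,
                         tol = 1e-10, max.iter = 50) {
  estimate <- (f(point + h) - f(point - h)) / (2 * h)
  for (i in seq_len(max.iter)) {
    h <- h / 2
    refined <- (f(point + h) - f(point - h)) / (2 * h)
    converged <- abs(refined - estimate) < tol
    estimate <- refined
    if (converged) break
  }
  estimate
}

f <- function(x) sin(x) / x
approx_slope <- central_diff(f, 4.1)

# Analytic derivative of sin(x) / x, used only to check the approximation.
exact_slope <- (4.1 * cos(4.1) - sin(4.1)) / 4.1^2
abs(approx_slope - exact_slope)
```

The iteration cap plays the same role as `max.iter` in the noisy-function example: with noise in `f`, successive estimates may never agree within `tol`, so a hard stop is needed.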
Internal kernel function for NNS multivariate regression NNS.reg parallel instances.
NNS.distance(rpm, dist.estimate, k = "all", class = NULL)
rpm |
REGRESSION.POINT.MATRIX from NNS.reg |
dist.estimate |
Vector to generate distances from. |
k |
|
class |
if classification problem. |
Returns sum of weighted distances.
Bi-directional test of first degree stochastic dominance using lower partial moments.
NNS.FSD(x, y, type = "discrete", plot = TRUE)
x |
a numeric vector. |
y |
a numeric vector. |
type |
options: ("discrete", "continuous"); |
plot |
logical; |
Returns one of the following FSD results: "X FSD Y", "Y FSD X", or "NO FSD EXISTS".
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
Viole, F. (2017) "A Note on Stochastic Dominance." doi:10.2139/ssrn.3002675.
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.FSD(x, y)
    ## End(Not run)
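The three possible results can be illustrated with the textbook definition of first degree stochastic dominance: X dominates Y when the CDF of X is never above the CDF of Y and is strictly below it somewhere. The sketch below uses base R `ecdf` rather than lower partial moments, and the function name `fsd_check` is hypothetical; NNS.FSD may apply additional conditions.

```r
# Bi-directional first degree stochastic dominance check on empirical CDFs.
fsd_check <- function(x, y) {
  grid <- sort(unique(c(x, y)))   # every point where either CDF can jump
  Fx <- ecdf(x)(grid)
  Fy <- ecdf(y)(grid)
  if (all(Fx <= Fy) && any(Fx < Fy)) return("X FSD Y")
  if (all(Fy <= Fx) && any(Fy < Fx)) return("Y FSD X")
  "NO FSD EXISTS"
}

# A pure upward shift dominates the original sample.
y <- c(1, 2, 3, 4, 5)
x <- y + 1
shifted <- fsd_check(x, y)

# CDFs that cross: neither sample dominates.
crossing <- fsd_check(c(0, 10), c(4, 5))
```

At degree 0 the lower partial moment is the frequency of observations at or below the target, so comparing empirical CDFs is a reasonable stand-in for the "discrete" type.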
Uni-directional test of first degree stochastic dominance using lower partial moments, as used in the SD Efficient Set routine.
NNS.FSD.uni(x, y, type = "discrete")
x |
a numeric vector. |
y |
a numeric vector. |
type |
options: ("discrete", "continuous"); |
Returns (1) if "X FSD Y", else (0).
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012
Viole, F. (2017) "A Note on Stochastic Dominance." doi:10.2139/ssrn.3002675
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.FSD.uni(x, y)
    ## End(Not run)
Alternative central tendency measure more robust to outliers.
NNS.gravity(x, discrete = FALSE)
x |
vector of data. |
discrete |
logical; |
Returns a numeric value representing the central tendency of the distribution.
Fred Viole, OVVO Financial Systems
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    NNS.gravity(x)
    ## End(Not run)
Monte Carlo sampling from the maximum entropy bootstrap routine NNS.meboot, ensuring the replicates are sampled from the full [-1,1] correlation space.
NNS.MC( x, reps = 30, lower_rho = -1, upper_rho = 1, by = 0.01, exp = 1, type = "spearman", drift = TRUE, target_drift = NULL, target_drift_scale = NULL, xmin = NULL, xmax = NULL, ... )
x |
vector of data. |
reps |
numeric; number of replicates to generate, |
lower_rho |
numeric |
upper_rho |
numeric |
by |
numeric; |
exp |
numeric; |
type |
options("spearman", "pearson", "NNScor", "NNSdep"); |
drift |
logical; |
target_drift |
numerical; |
target_drift_scale |
numerical; instead of calculating a |
xmin |
numeric; the lower limit for the left tail. |
xmax |
numeric; the upper limit for the right tail. |
... |
possible additional arguments to be passed to NNS.meboot. |
* ensemble: average observation over all replicates, as a vector.
* replicates: maximum entropy bootstrap replicates, as a list for each rho.
Vinod, H.D. and Viole, F. (2020) Arbitrary Spearman's Rank Correlations in Maximum Entropy Bootstrap and Improved Monte Carlo Simulations. doi:10.2139/ssrn.3621614
    ## Not run:
    # To generate a set of MC sampled time-series to AirPassengers
    MC_samples <- NNS.MC(AirPassengers, reps = 10, lower_rho = -1, upper_rho = 1, by = .5, xmin = 0)
    ## End(Not run)
Adapted maximum entropy bootstrap routine from meboot https://cran.r-project.org/package=meboot.
NNS.meboot( x, reps = 999, rho = NULL, type = "spearman", drift = TRUE, target_drift = NULL, target_drift_scale = NULL, trim = 0.1, xmin = NULL, xmax = NULL, reachbnd = TRUE, expand.sd = TRUE, force.clt = TRUE, scl.adjustment = FALSE, sym = FALSE, elaps = FALSE, digits = 6, colsubj, coldata, coltimes, ... )
x |
vector of data. |
reps |
numeric; number of replicates to generate. |
rho |
numeric [-1,1] (vectorized); A |
type |
options("spearman", "pearson", "NNScor", "NNSdep"); |
drift |
logical; |
target_drift |
numerical; |
target_drift_scale |
numerical; instead of calculating a |
trim |
numeric [0,1]; The mean trimming proportion, defaults to 0.1. |
xmin |
numeric; the lower limit for the left tail. |
xmax |
numeric; the upper limit for the right tail. |
reachbnd |
logical; If |
expand.sd |
logical; If |
force.clt |
logical; If |
scl.adjustment |
logical; If |
sym |
logical; If |
elaps |
logical; If |
digits |
integer; 6 (default) number of digits to round output to. |
colsubj |
numeric; the column in |
coldata |
numeric; the column in |
coltimes |
numeric; an optional argument indicating the column that contains the times at which the observations for each individual are observed. It is ignored if the input data |
... |
possible argument |
Returns the following row names in a matrix:
* x: original data provided as input.
* replicates: maximum entropy bootstrap replicates.
* ensemble: average observation over all replicates.
* xx: sorted order stats (xx[1] is minimum value).
* z: class intervals limits.
* dv: deviations of consecutive data values.
* dvtrim: trimmed mean of dv.
* xmin: data minimum for ensemble = xx[1] - dvtrim.
* xmax: data maximum for ensemble = xx[n] + dvtrim.
* desintxb: desired interval means.
* ordxx: ordered x values.
* kappa: scale adjustment to the variance of ME density.
* elaps: elapsed time.
The rho and drift parameters cannot both be vectorized in the same call; vectorize one and hold the other to a single value. Also, do not specify target_drift = NULL.
Vinod, H.D. and Viole, F. (2020) Arbitrary Spearman's Rank Correlations in Maximum Entropy Bootstrap and Improved Monte Carlo Simulations. doi:10.2139/ssrn.3621614
Vinod, H.D. (2013), Maximum Entropy Bootstrap Algorithm Enhancements. doi:10.2139/ssrn.2285041
Vinod, H.D. (2006), Maximum Entropy Ensembles for Time Series Inference in Economics, Journal of Asian Economics, 17(6), pp. 955-978.
Vinod, H.D. (2004), Ranking mutual funds using unconventional utility theory and stochastic dominance, Journal of Empirical Finance, 11(3), pp. 353-377.
    ## Not run:
    # To generate an orthogonal rank correlated time-series to AirPassengers
    boots <- NNS.meboot(AirPassengers, reps = 100, rho = 0, xmin = 0)
    # Verify correlation of replicates ensemble to original
    cor(boots["ensemble",]$ensemble, AirPassengers, method = "spearman")
    # Plot all replicates
    matplot(boots["replicates",]$replicates , type = 'l')
    # Plot ensemble
    lines(boots["ensemble",]$ensemble, lwd = 3)
    # Plot original
    lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
    ### Vectorized drift with a single rho
    boots <- NNS.meboot(AirPassengers, reps = 10, rho = 0, xmin = 0, target_drift = c(1,7))
    matplot(do.call(cbind, boots["replicates", ]), type = "l")
    lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
    ### Vectorized rho with a single target drift
    boots <- NNS.meboot(AirPassengers, reps = 10, rho = c(0, .5, 1), xmin = 0, target_drift = 3)
    matplot(do.call(cbind, boots["replicates", ]), type = "l")
    lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
    ### Vectorized rho with a single target drift scale
    boots <- NNS.meboot(AirPassengers, reps = 10, rho = c(0, .5, 1), xmin = 0, target_drift_scale = 0.5)
    matplot(do.call(cbind, boots["replicates", ]), type = "l")
    lines(1:length(AirPassengers), AirPassengers, lwd = 3, col = "red")
    ## End(Not run)
Mode of a distribution, either continuous or discrete.
NNS.mode(x, discrete = FALSE, multi = TRUE)
x |
vector of data. |
discrete |
logical; |
multi |
logical; |
Returns a numeric value representing the mode of the distribution.
Fred Viole, OVVO Financial Systems
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    NNS.mode(x)
    ## End(Not run)
This function returns the first 4 moments of the distribution.
NNS.moments(x, population = TRUE)
x |
a numeric vector. |
population |
logical; |
Returns:
"$mean" mean of the distribution.
"$variance" variance of the distribution.
"$skewness" skewness of the distribution.
"$kurtosis" excess kurtosis of the distribution.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    NNS.moments(x)
    ## End(Not run)
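As a reference point for the four returned values, the classical population moments can be computed directly in base R. This sketch uses the standard central moment formulas rather than partial moments, covers only the population case, and the function name `population_moments` is hypothetical.

```r
# Classical population moments: mean, variance, skewness, and excess kurtosis.
population_moments <- function(x) {
  dev <- x - mean(x)
  m2 <- mean(dev^2)                       # population variance (divides by n)
  list(
    mean     = mean(x),
    variance = m2,
    skewness = mean(dev^3) / m2^1.5,      # standardized third central moment
    kurtosis = mean(dev^4) / m2^2 - 3     # excess kurtosis (normal distribution = 0)
  )
}

m <- population_moments(c(1, 2, 3, 4, 5))
m
```

For the symmetric sample 1 to 5 this gives a mean of 3, a variance of 2, zero skewness, and an excess kurtosis of -1.3.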
Normalizes a matrix of variables using a nonlinear scaling normalization method.
NNS.norm(X, linear = FALSE, chart.type = NULL, location = "topleft")
X |
a numeric matrix or data frame, or a list. |
linear |
logical; |
chart.type |
options: ("l", "b"); |
location |
Sets the legend location within the plot, per the |
Returns a data.frame of normalized values.
Vectors of unequal length provided in a list will only generate linear=TRUE normalized values.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    A <- cbind(x, y)
    NNS.norm(A)
    ### Normalize list of unequal vector lengths
    vec1 <- c(1, 2, 3, 4, 5, 6, 7)
    vec2 <- c(10, 20, 30, 40, 50, 60)
    vec3 <- c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)
    vec_list <- list(vec1, vec2, vec3)
    NNS.norm(vec_list)
    ## End(Not run)
Creates partitions based on partial moment quadrant centroids, iteratively assigning identifications to observations based on those quadrants (an unsupervised partitional and hierarchical clustering method). It is the basis for the correlation and dependence (NNS.dep) and regression (NNS.reg) routines.
NNS.part( x, y, Voronoi = FALSE, type = NULL, order = NULL, obs.req = 8, min.obs.stop = TRUE, noise.reduction = "off" )
x |
a numeric vector. |
y |
a numeric vector with compatible dimensions to |
Voronoi |
logical; |
type |
|
order |
integer; Number of partial moment quadrants to be generated. |
obs.req |
integer; (8 default) Required observations per cluster where quadrants will not be further partitioned if observations are not greater than the entered value. Reduces minimum number of necessary observations in a quadrant to 1 when |
min.obs.stop |
logical; |
noise.reduction |
the method of determining regression points options for the dependent variable |
Returns:
"dt" a data.table of x and y observations with their partition assignment "quadrant" in the 3rd column and their prior partition assignment "prior.quadrant" in the 4th column.
"regression.points" the data.table of regression points for that given (order = ...).
"order" the order of the final partition given "min.obs.stop" stopping condition.
min.obs.stop = FALSE will not generate regression points due to unequal partitioning of quadrants from individual cluster observations.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.part(x, y)
    ## Data.table of observations and partitions
    NNS.part(x, y, order = 1)$dt
    ## Regression points
    NNS.part(x, y, order = 1)$regression.points
    ## Voronoi style plot
    NNS.part(x, y, Voronoi = TRUE)
    ## Examine final counts by quadrant
    DT <- NNS.part(x, y)$dt
    DT[ , counts := .N, by = quadrant]
    DT
    ## End(Not run)
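A single level of the partitioning idea can be sketched in base R: split the observations into four quadrants at the means of x and y, then take each quadrant's centroid. This is only an illustration under that assumption; the quadrant labels are hypothetical, and NNS.part repeats the split within each quadrant and uses its own identifiers and stopping rules.

```r
set.seed(123)
x <- rnorm(100)
y <- rnorm(100)

# Assign each observation to a quadrant relative to (mean(x), mean(y)).
quadrant <- ifelse(x > mean(x),
                   ifelse(y > mean(y), "upper-right", "lower-right"),
                   ifelse(y > mean(y), "upper-left", "lower-left"))

# Quadrant centroids: the mean of x and y within each quadrant.
obs <- data.frame(x, y, quadrant)
centroids <- aggregate(cbind(x, y) ~ quadrant, data = obs, FUN = mean)

# Observation counts per quadrant, comparable to the counts-by-quadrant example above.
counts <- table(quadrant)
centroids
```

Repeating the same split inside each quadrant, using that quadrant's own means, is what turns the four initial partitions into a finer, hierarchical set.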
Generates a nonlinear regression based on partial moment quadrant means.
NNS.reg( x, y, factor.2.dummy = TRUE, order = NULL, dim.red.method = NULL, tau = NULL, type = NULL, point.est = NULL, location = "top", return.values = TRUE, plot = TRUE, plot.regions = FALSE, residual.plot = TRUE, confidence.interval = NULL, threshold = 0, n.best = NULL, smooth = FALSE, noise.reduction = "off", dist = "L2", ncores = NULL, point.only = FALSE, multivariate.call = FALSE )
x |
a vector, matrix or data frame of variables of numeric or factor data types. |
y |
a numeric or factor vector with compatible dimensions to |
factor.2.dummy |
logical; |
order |
integer; Controls the number of partial moment quadrant means. Users are encouraged to try different |
dim.red.method |
options: ("cor", "NNS.dep", "NNS.caus", "all", "equal", |
tau |
options("ts", NULL); |
type |
|
point.est |
a numeric or factor vector with compatible dimensions to |
location |
Sets the legend location within the plot, per the |
return.values |
logical; |
plot |
logical; |
plot.regions |
logical; |
residual.plot |
logical; |
confidence.interval |
numeric [0, 1]; |
threshold |
numeric [0, 1]; |
n.best |
integer; |
smooth |
logical; |
noise.reduction |
the method of determining regression points options: ("mean", "median", "mode", "off"); In low signal:noise situations, |
dist |
options:("L1", "L2", "FACTOR") the method of distance calculation; Selects the distance calculation used. |
ncores |
integer; value specifying the number of cores to be used in the parallelized procedure. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1. |
point.only |
Internal argument for abbreviated output. |
multivariate.call |
Internal argument for multivariate regressions. |
UNIVARIATE REGRESSION RETURNS THE FOLLOWING VALUES:
"R2" provides the goodness of fit;
"SE" returns the overall standard error of the estimate between y and y.hat;
"Prediction.Accuracy" returns the correct rounded "Point.est" used in classifications versus the categorical y;
"derivative" for the coefficient of the x and its applicable range;
"Point.est" for the predicted value generated;
"pred.int" lower and upper prediction intervals for the "Point.est" returned using the "confidence.interval" provided;
"regression.points" provides the points used in the regression equation for the given order of partitions;
"Fitted.xy" returns a data.table of x, y, y.hat, resid, NNS.ID, gradient;
MULTIVARIATE REGRESSION RETURNS THE FOLLOWING VALUES:
"R2" provides the goodness of fit;
"equation" returns the numerator of the synthetic X* dimension reduction equation as a data.table consisting of regressor and its coefficient. Denominator is simply the length of all coefficients > 0, returned in last row of equation data.table.
"x.star" returns the synthetic X* as a vector;
"rhs.partitions" returns the partition points for each regressor x;
"RPM" provides the Regression Point Matrix, the points for each x used in the regression equation for the given order of partitions;
"Point.est" returns the predicted value generated;
"pred.int" lower and upper prediction intervals for the "Point.est" returned using the "confidence.interval" provided;
"Fitted.xy" returns a data.table of x,y, y.hat, gradient, and NNS.ID.
Please ensure point.est has dimensions compatible with x; an error message is raised otherwise.
Like a logistic regression, the (type = "CLASS") setting is not necessary for target variable of two classes e.g. [0, 1]. The response variable base category should be 1 for classification problems.
For low signal:noise instances, increasing the dimension may yield better results using NNS.stack(cbind(x,x), y, method = 1, ...).
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Vinod, H. and Viole, F. (2017) "Nonparametric Regression Using Clusters" doi:10.1007/s10614-017-9713-5
Vinod, H. and Viole, F. (2018) "Clustering and Curve Fitting by Line Segments" doi:10.20944/preprints201801.0090.v1
Viole, F. (2020) "Partitional Estimation Using Partial Moments" doi:10.2139/ssrn.3592491
Dana, J., and Dawes, R. M. (2004). The Superiority of Simple Alternatives to Regression for Social Science Predictions. Journal of Educational and Behavioral Statistics, 29(3), 317–331.
    ## Not run:
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.reg(x, y)
    ## Manual {order} selection
    NNS.reg(x, y, order = 2)
    ## Maximum {order} selection
    NNS.reg(x, y, order = "max")
    ## x-only partitioning (Univariate only)
    NNS.reg(x, y, type = "XONLY")
    ## For Multiple Regression:
    x <- cbind(rnorm(100), rnorm(100), rnorm(100)) ; y <- rnorm(100)
    NNS.reg(x, y, point.est = c(.25, .5, .75))
    ## For Multiple Regression based on Synthetic X* (Dimension Reduction):
    x <- cbind(rnorm(100), rnorm(100), rnorm(100)) ; y <- rnorm(100)
    NNS.reg(x, y, point.est = c(.25, .5, .75), dim.red.method = "cor", ncores = 1)
    ## IRIS dataset examples:
    # Dimension Reduction:
    NNS.reg(iris[,1:4], iris[,5], dim.red.method = "cor", order = 5, ncores = 1)
    # Dimension Reduction using causal weights:
    NNS.reg(iris[,1:4], iris[,5], dim.red.method = "NNS.caus", order = 5, ncores = 1)
    # Multiple Regression:
    NNS.reg(iris[,1:4], iris[,5], order = 2, noise.reduction = "off")
    # Classification:
    NNS.reg(iris[,1:4], iris[,5], point.est = iris[1:10, 1:4], type = "CLASS")$Point.est
    ## To call fitted values:
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.reg(x, y)$Fitted.xy
    ## To call partial derivative (univariate regression only):
    NNS.reg(x, y)$derivative
    ## End(Not run)
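The idea of regressing through partition means can be sketched in base R for the x-only, univariate case. This is a simplified illustration, not the NNS.reg algorithm: it splits x at its mean and then at the mean of each half (four segments), uses the segment means as regression points, and joins them with straight lines. All names are hypothetical, and the package handles partitioning, endpoints, and extrapolation differently.

```r
set.seed(123)
x <- runif(200, -2, 2)
y <- x^2 + rnorm(200, sd = 0.1)

# Split x at its mean, then split each half at its own mean (4 segments).
first_cut <- mean(x)
cuts <- c(mean(x[x <= first_cut]), first_cut, mean(x[x > first_cut]))
segment <- findInterval(x, cuts)          # segment ids 0 to 3

# One regression point per segment: the segment means of x and y.
reg_x <- tapply(x, segment, mean)
reg_y <- tapply(y, segment, mean)

# Join the regression points with line segments; rule = 2 holds the end values flat.
y_hat <- approx(reg_x, reg_y, xout = x, rule = 2)$y

# Goodness of fit of the piecewise-linear approximation.
r2 <- cor(y, y_hat)^2
r2
```

Adding further rounds of splitting produces more regression points, which is the role the `order` argument plays: higher orders follow the data more closely at the cost of fitting noise.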
Rescale a vector using either min-max scaling or risk-neutral adjustment.
NNS.rescale(x, a, b, method = "minmax", T = NULL, type = "Terminal")
x |
numeric vector; data to rescale (e.g., terminal prices for risk-neutral method). |
a |
numeric; defines the scaling target:
- For |
b |
numeric; defines the scaling range or rate:
- For |
method |
character; scaling method: |
T |
numeric; time to maturity in years (required for |
type |
character; for |
Returns a rescaled distribution:
- For "minmax": values scaled linearly to the range [a, b].
- For "riskneutral": values scaled multiplicatively to a risk-neutral mean (\( S_0 e^{rT} \) if type = "Terminal", or \( S_0 \) if type = "Discounted").
Fred Viole, OVVO Financial Systems
    ## Not run:
    set.seed(123)
    # Min-max scaling: a = lower limit, b = upper limit
    x <- rnorm(100)
    NNS.rescale(x, a = 5, b = 10, method = "minmax") # Scales to [5, 10]
    # Risk-neutral scaling (Terminal): a = S_0, b = r # Mean approx 105.13
    prices <- 100 * exp(cumsum(rnorm(100, 0.001, 0.02)))
    NNS.rescale(prices, a = 100, b = 0.05, method = "riskneutral", T = 1, type = "Terminal")
    # Risk-neutral scaling (Discounted): a = S_0, b = r # Mean approx 100
    NNS.rescale(prices, a = 100, b = 0.05, method = "riskneutral", T = 1, type = "Discounted")
    ## End(Not run)
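Both scaling rules described above are short enough to write out in base R. This is a sketch that follows the stated return values (a linear map to [a, b], and a multiplicative shift of the mean to S_0 e^{rT} or S_0); the function names and the argument name `maturity` are hypothetical and do not mirror the package internals.

```r
# Min-max scaling: linear map of x onto the range [a, b].
minmax_rescale <- function(x, a, b) {
  a + (b - a) * (x - min(x)) / (max(x) - min(x))
}

# Risk-neutral scaling: multiply x so that its mean equals the risk-neutral target.
riskneutral_rescale <- function(x, S0, r, maturity,
                                type = c("Terminal", "Discounted")) {
  type <- match.arg(type)
  target_mean <- if (type == "Terminal") S0 * exp(r * maturity) else S0
  x * target_mean / mean(x)
}

set.seed(123)
x <- rnorm(100)
scaled <- minmax_rescale(x, a = 5, b = 10)

prices <- 100 * exp(cumsum(rnorm(100, 0.001, 0.02)))
terminal   <- riskneutral_rescale(prices, S0 = 100, r = 0.05, maturity = 1, type = "Terminal")
discounted <- riskneutral_rescale(prices, S0 = 100, r = 0.05, maturity = 1, type = "Discounted")

range(scaled)
c(mean(terminal), mean(discounted))
```

Because the risk-neutral adjustment is multiplicative, it changes the mean without altering the relative spacing of the values.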
Clusters a set of variables by iteratively extracting Stochastic Dominance (SD)-efficient sets, subject to a minimum cluster size.
NNS.SD.cluster( data, degree = 1, type = "discrete", min_cluster = 1, dendrogram = FALSE )
data |
A numeric matrix or data frame of variables to be clustered. |
degree |
Numeric options: (1, 2, 3). Degree of stochastic dominance test. |
type |
Character, either |
min_cluster |
Integer. The minimum number of elements required for a valid cluster. |
dendrogram |
Logical; |
The function applies NNS.SD.efficient.set iteratively, peeling off the SD-efficient set at each step
if it meets or exceeds min_cluster in size, until no more subsets can be extracted or all variables are exhausted.
Variables in each SD-efficient set form a cluster, with any remaining variables aggregated into the final cluster if it meets
the min_cluster threshold.
A list with the following components:
Clusters: A named list of cluster memberships where each element is the set of variable names belonging to that cluster.
Dendrogram (optional): If dendrogram = TRUE, an hclust object is also returned.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
Viole, F. (2017) "A Note on Stochastic Dominance." doi:10.2139/ssrn.3002675
    ## Not run:
    set.seed(123)
    x <- rnorm(100)
    y <- rnorm(100)
    z <- rnorm(100)
    A <- cbind(x, y, z)
    # Perform SD-based clustering (degree 1), requiring at least 2 elements per cluster
    results <- NNS.SD.cluster(data = A, degree = 1, min_cluster = 2)
    print(results$Clusters)
    # Produce a dendrogram as well
    results_with_dendro <- NNS.SD.cluster(data = A, degree = 1, min_cluster = 2, dendrogram = TRUE)
    ## End(Not run)
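The peeling procedure can be sketched in base R for degree 1 only. This is a simplified illustration under stated assumptions: dominance is tested with empirical CDFs instead of lower partial moments, whatever remains when peeling stops is placed in a final cluster regardless of size, and the names `fsd_dominates`, `efficient_set`, and `sd_peel` are hypothetical.

```r
# TRUE when x first degree dominates y: F_x <= F_y everywhere, strictly somewhere.
fsd_dominates <- function(x, y) {
  grid <- sort(unique(c(x, y)))
  Fx <- ecdf(x)(grid)
  Fy <- ecdf(y)(grid)
  all(Fx <= Fy) && any(Fx < Fy)
}

# Names of the columns that no other column dominates.
efficient_set <- function(data) {
  vars <- colnames(data)
  dominated <- vapply(vars, function(v) {
    others <- setdiff(vars, v)
    any(vapply(others, function(o) fsd_dominates(data[, o], data[, v]), logical(1)))
  }, logical(1))
  vars[!dominated]
}

# Peel off efficient sets one at a time; each peeled set becomes a cluster.
sd_peel <- function(data, min_cluster = 1) {
  clusters <- list()
  remaining <- colnames(data)
  while (length(remaining) > 0) {
    set <- efficient_set(data[, remaining, drop = FALSE])
    if (length(set) < min_cluster || length(set) == length(remaining)) {
      clusters[[length(clusters) + 1]] <- remaining   # final cluster
      break
    }
    clusters[[length(clusters) + 1]] <- set
    remaining <- setdiff(remaining, set)
  }
  names(clusters) <- paste0("Cluster_", seq_along(clusters))
  clusters
}

# Three shifted copies of the same sample: a dominates b, and b dominates c.
A <- cbind(a = 3:7, b = 2:6, c = 1:5)
clusters <- sd_peel(A, min_cluster = 1)
clusters
```

With this input each variable is peeled off in turn, giving three single-member clusters ordered from most to least dominant.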
Determines the set of stochastic dominant variables for various degrees.
NNS.SD.efficient.set(x, degree, type = "discrete", status = TRUE)
x |
a numeric matrix or data frame. |
degree |
numeric options: (1, 2, 3); Degree of stochastic dominance test from (1, 2 or 3). |
type |
options: ("discrete", "continuous"); |
status |
logical; |
Returns set of stochastic dominant variable names.
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
Viole, F. (2017) "A Note on Stochastic Dominance." doi:10.2139/ssrn.3002675
    ## Not run: 
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100) ; z <- rnorm(100)
    A <- cbind(x, y, z)
    NNS.SD.efficient.set(A, 1)
    ## End(Not run)
Seasonality test based on the coefficient of variation for the variable and lagged component series. A result of 1 signifies no seasonality present.
    NNS.seas(variable, modulo = NULL, mod.only = TRUE, plot = TRUE)

| Argument | Description |
|---|---|
| variable | a numeric vector. |
| modulo | integer(s); NULL (default) Used to find the nearest multiple(s) in the reported seasonal period. |
| mod.only | logical; |
| plot | logical; |
Returns a matrix of all periods exhibiting less coefficient of variation than the variable with "all.periods"; and the single period exhibiting the least coefficient of variation versus the variable with "best.period"; as well as a vector of "periods" for easy call into NNS.ARMA.optim. If no seasonality is detected, NNS.seas will return ("No Seasonality Detected").
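To make the coefficient-of-variation comparison concrete, the following base-R sketch scores each candidate period by averaging the coefficient of variation of its component series (every p-th observation) and compares that score with the coefficient of variation of the full series. It is an illustration under assumptions, not the NNS.seas implementation: the helpers `cv` and `seasonal_cv`, the way component series are formed, and the averaging step are all hypothetical.

```r
# Hypothetical helpers for illustration only; NNS.seas may form and score
# its lagged component series differently.
cv <- function(v) sd(v) / abs(mean(v))

seasonal_cv <- function(x, period) {
  n <- length(x)
  # One component series per starting offset: every `period`-th observation.
  comps <- lapply(seq_len(period), function(s) x[seq(s, n, by = period)])
  mean(vapply(comps, cv, numeric(1)))
}

# A series that repeats exactly every 4 observations.
x <- rep(c(10, 20, 30, 40), times = 10)
periods <- 2:10

cvs <- vapply(periods, function(p) seasonal_cv(x, p), numeric(1))

# Periods whose component series vary less than the series as a whole.
candidates <- periods[cvs < cv(x)]

# Period with the least coefficient of variation (first one on ties).
best <- periods[which.min(cvs)]
```

In this toy series the component series for period 4 are constant, so that period has a coefficient of variation of zero and is selected.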
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    ## Not run: 
    set.seed(123)
    x <- rnorm(100)

    ## To call strongest period based on coefficient of variation:
    NNS.seas(x, plot = FALSE)$best.period

    ## Using modulos for logical seasonal inference:
    NNS.seas(x, modulo = c(2,3,5,7), plot = FALSE)
    ## End(Not run)
Computes stochastic superiority between two numeric vectors as the empirical
probability that an observation from x exceeds an observation from
y, with optional tie adjustment and optional confidence intervals via
maximum entropy bootstrap.
    NNS.SS(x, y, confidence.interval = FALSE, reps = 999, ci = 0.95, rho = 1)

| Argument | Description |
|---|---|
| x | a numeric vector. |
| y | a numeric vector. |
| confidence.interval | logical; |
| reps | numeric; number of maximum entropy bootstrap replicates used when |
| ci | numeric in |
| rho | numeric; dependence target passed to |
NNS.SS returns the exceedance probability $P(X > Y)$, the tie probability $P(X = Y)$, and the tie-adjusted stochastic superiority measure $P^* = P(X > Y) + \tfrac{1}{2} P(X = Y)$.
When confidence.interval = TRUE, confidence bounds for P^*
are computed from NNS.meboot bootstrap replicates using
LPM.VaR and UPM.VaR with degree = 0.
Missing values are removed from both x and y using
stats::na.omit. The empirical estimates are computed via a fast sorted
comparison routine rather than explicit pairwise expansion of all
x-y combinations.
For continuous data, p_tie will typically be zero, so p_star
and p_gt will be identical up to numerical precision. For discrete
data, p_star provides the standard tie-adjusted superiority measure.
When confidence.interval = TRUE, the interval is constructed from the
empirical bootstrap distribution of p_star, where
$\alpha = 1 - \mathrm{ci}$. The lower bound is obtained from
LPM.VaR evaluated at $\alpha / 2$, and the upper bound is
obtained from UPM.VaR evaluated at $\alpha / 2$, both with
degree = 0.
If confidence.interval = FALSE, returns a list containing:

p_gt: empirical probability that x > y.

p_tie: empirical probability that x = y.

p_star: tie-adjusted stochastic superiority probability.

If confidence.interval = TRUE, returns a list containing:

p_gt: empirical probability that x > y.

p_tie: empirical probability that x = y.

p_star: tie-adjusted stochastic superiority probability.

lower: lower confidence bound for p_star.

upper: upper confidence bound for p_star.

ci: confidence level used.

reps: number of bootstrap replicates used.

boot_vals: bootstrap replicate values of p_star.
This function measures stochastic superiority as a pairwise exceedance
probability. This is distinct from first-, second-, or third-degree
stochastic dominance; see NNS.FSD, NNS.SSD, and
NNS.TSD for dominance testing.
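The pairwise exceedance probability described above can be sketched in base R. The helper `stochastic_superiority` is hypothetical and is not the NNS routine: it expands all x-y pairs explicitly instead of using a sorted comparison, and it assumes the common tie adjustment that counts each tie as half an exceedance.

```r
# Hypothetical helper for illustration; not part of NNS.
stochastic_superiority <- function(x, y) {
  # Drop missing values from each vector.
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]

  # Explicit pairwise differences: simple, but O(length(x) * length(y)) memory.
  diffs <- outer(x, y, "-")

  p_gt  <- mean(diffs > 0)   # share of pairs with x > y
  p_tie <- mean(diffs == 0)  # share of pairs with x == y

  # Assumed tie adjustment: each tie counts as half an exceedance.
  list(p_gt = p_gt, p_tie = p_tie, p_star = p_gt + 0.5 * p_tie)
}

res <- stochastic_superiority(c(1, 2, 3), c(2, 2, 4))
```

With these inputs, two of the nine pairs have x > y and two are ties, so p_gt and p_tie are both 2/9 and p_star is 1/3.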
Fred Viole, OVVO Financial Systems
Vinod, H.D. and Viole, F. (2020) Arbitrary Spearman's Rank Correlations in Maximum Entropy Bootstrap and Improved Monte Carlo Simulations. doi:10.2139/ssrn.3621614
Viole, F. and Nawrocki, D. (2013) Nonlinear Nonparametric Statistics: Using Partial Moments. ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/.
    ## Not run: 
    set.seed(123)
    x <- rnorm(200, mean = 0.4, sd = 1)
    y <- rnorm(200, mean = 0.0, sd = 1)

    # Empirical stochastic superiority
    NNS.SS(x, y)

    # With confidence intervals
    NNS.SS(x, y, confidence.interval = TRUE, reps = 999, ci = 0.95)

    # Discrete example with ties
    x <- sample(1:5, 100, replace = TRUE)
    y <- sample(1:5, 100, replace = TRUE)
    NNS.SS(x, y)
    ## End(Not run)
Bi-directional test of second degree stochastic dominance using lower partial moments.
    NNS.SSD(x, y, plot = TRUE)

| Argument | Description |
|---|---|
| x | a numeric vector. |
| y | a numeric vector. |
| plot | logical; |
Returns one of the following SSD results: "X SSD Y", "Y SSD X", or "NO SSD EXISTS".
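As a rough illustration of how lower partial moments can drive a second-degree test, the base-R sketch below compares degree-1 lower partial moments of two samples at every observed value. The helpers `lpm`, `ssd_uni`, and `ssd` are hypothetical names, and the decision rule (weakly smaller LPM everywhere, strictly smaller somewhere) is an assumption about a textbook SSD check rather than a description of the NNS.SSD internals.

```r
# Hypothetical helpers for illustration; not the NNS implementation.

# Degree-`degree` lower partial moment of x about `target` (degree >= 1 here).
lpm <- function(degree, target, x) mean(pmax(target - x, 0)^degree)

# Uni-directional check: 1 if x dominates y at second degree, else 0.
ssd_uni <- function(x, y) {
  # For empirical samples it is enough to compare at the pooled observations.
  targets <- sort(unique(c(x, y)))
  lpm_x <- vapply(targets, function(t) lpm(1, t, x), numeric(1))
  lpm_y <- vapply(targets, function(t) lpm(1, t, y), numeric(1))
  as.integer(all(lpm_x <= lpm_y) && any(lpm_x < lpm_y))
}

# Bi-directional wrapper returning the same three labels as the text above.
ssd <- function(x, y) {
  if (ssd_uni(x, y) == 1L) {
    "X SSD Y"
  } else if (ssd_uni(y, x) == 1L) {
    "Y SSD X"
  } else {
    "NO SSD EXISTS"
  }
}

x <- c(2, 3, 4)
y <- c(1, 2, 3)
forward  <- ssd(x, y)
backward <- ssd(y, x)
same     <- ssd(x, x)
```

Here x is y shifted up by one unit, so its degree-1 lower partial moment is never larger than that of y and the forward test reports dominance.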
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
    ## Not run: 
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.SSD(x, y)
    ## End(Not run)
Uni-directional test of second degree stochastic dominance using lower partial moments used in SD Efficient Set routine.
    NNS.SSD.uni(x, y)

| Argument | Description |
|---|---|
| x | a numeric vector. |
| y | a numeric vector. |
Returns (1) if "X SSD Y", else (0).
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
    ## Not run: 
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.SSD.uni(x, y)
    ## End(Not run)
Prediction model using the predictions of the NNS base models NNS.reg as features (i.e. meta-features) for the stacked model.
    NNS.stack(
      IVs.train,
      DV.train,
      IVs.test = NULL,
      type = NULL,
      obj.fn = expression(sum((predicted - actual)^2)),
      objective = "min",
      optimize.threshold = TRUE,
      dist = "L2",
      CV.size = NULL,
      balance = FALSE,
      ts.test = NULL,
      folds = 5,
      order = NULL,
      method = c(1, 2),
      stack = TRUE,
      dim.red.method = "cor",
      pred.int = NULL,
      status = TRUE,
      ncores = NULL
    )

| Argument | Description |
|---|---|
| IVs.train | a vector, matrix or data frame of variables of numeric or factor data types. |
| DV.train | a numeric or factor vector with compatible dimensions to |
| IVs.test | a vector, matrix or data frame of variables of numeric or factor data types with compatible dimensions to |
| type | |
| obj.fn | expression; |
| objective | options: ("min", "max") |
| optimize.threshold | logical; |
| dist | options: ("L1", "L2", "DTW", "FACTOR"); selects the distance calculation used. |
| CV.size | numeric [0, 1]; |
| balance | logical; |
| ts.test | integer; NULL (default) Sets the length of the test set for time-series data; typically |
| folds | integer; |
| order | options: (integer, "max", NULL); |
| method | numeric options: (1, 2); Select the NNS method to include in stack. |
| stack | logical; |
| dim.red.method | options: ("cor", "NNS.dep", "NNS.caus", "equal", "all") method for determining synthetic X* coefficients. |
| pred.int | numeric [0,1]; |
| status | logical; |
| ncores | integer; value specifying the number of cores to be used in the parallelized subroutine NNS.reg. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1. |
Returns a vector of fitted values for the dependent variable test set for all models.
"NNS.reg.n.best" returns the optimum "n.best" parameter for the NNS.reg multivariate regression. "SSE.reg" returns the SSE for the NNS.reg multivariate regression.
"OBJfn.reg" returns the obj.fn for the NNS.reg regression.
"NNS.dim.red.threshold" returns the optimum "threshold" from the NNS.reg dimension reduction regression.
"OBJfn.dim.red" returns the obj.fn for the NNS.reg dimension reduction regression.
"probability.threshold" returns the optimum probability threshold for classification, else 0.5 when set to FALSE.
"reg" returns NNS.reg output.
"reg.pred.int" returns the prediction intervals for the regression output.
"dim.red" returns NNS.reg dimension reduction regression output.
"dim.red.pred.int" returns the prediction intervals for the dimension reduction regression output.
"stack" returns the output of the stacked model.
"pred.int" returns the prediction intervals for the stacked model.
Incorporate any objective function from external packages (such as Metrics::mape) via NNS.stack(..., obj.fn = expression(Metrics::mape(actual, predicted)), objective = "min")
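The obj.fn argument is an unevaluated R expression written in terms of the names predicted and actual. The base-R sketch below shows how such an expression can be evaluated against a pair of vectors; how NNS.stack evaluates it internally is not shown in this manual, so the eval() call and the toy vectors are assumptions made for illustration.

```r
# Toy vectors standing in for model predictions and observed values.
actual    <- c(1, 2, 3)
predicted <- c(1.5, 2, 2)

# The default objective from the usage above: sum of squared errors.
obj.fn <- expression(sum((predicted - actual)^2))

# Evaluate the expression with `predicted` and `actual` bound to the vectors.
sse <- eval(obj.fn, list(predicted = predicted, actual = actual))

# Any other expression in the same two names works the same way,
# for example a mean absolute error to be minimized.
mae.fn <- expression(mean(abs(predicted - actual)))
mae <- eval(mae.fn, list(predicted = predicted, actual = actual))
```

Because the expression is only evaluated once both names are available, the same obj.fn can be reused across every model that is being compared.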
As with logistic regression, the (type = "CLASS") setting is not necessary for a target variable with two classes, e.g. [0, 1]. For multiple-class problems, the base category of the response variable should be 1.
Missing data should be handled beforehand using na.omit or complete.cases on the full dataset.
If the error "Error in is.data.frame(x) : object 'RP' not found" is received, reduce the CV.size.
Fred Viole, OVVO Financial Systems
Viole, F. (2016) "Classification Using NNS Clustering Analysis" doi:10.2139/ssrn.2864711
    ## Using 'iris' dataset where test set [IVs.test] is 'iris' rows 141:150.
    ## Not run: 
    NNS.stack(iris[1:140, 1:4], iris[1:140, 5], IVs.test = iris[141:150, 1:4], type = "CLASS", balance = TRUE)

    ## Using 'iris' dataset to determine [n.best] and [threshold] with no test set.
    NNS.stack(iris[ , 1:4], iris[ , 5], type = "CLASS")
    ## End(Not run)
Bi-directional test of third degree stochastic dominance using lower partial moments.
    NNS.TSD(x, y, plot = TRUE)

| Argument | Description |
|---|---|
| x | a numeric vector. |
| y | a numeric vector. |
| plot | logical; |
Returns one of the following TSD results: "X TSD Y", "Y TSD X", or "NO TSD EXISTS".
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
    ## Not run: 
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.TSD(x, y)
    ## End(Not run)
Uni-directional test of third degree stochastic dominance using lower partial moments used in SD Efficient Set routine.
    NNS.TSD.uni(x, y)

| Argument | Description |
|---|---|
| x | a numeric vector. |
| y | a numeric vector. |
Returns (1) if "X TSD Y", else (0).
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2016) "LPM Density Functions for the Computation of the SD Efficient Set." Journal of Mathematical Finance, 6, 105-126. doi:10.4236/jmf.2016.61012.
    ## Not run: 
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100)
    NNS.TSD.uni(x, y)
    ## End(Not run)
Nonparametric vector autoregressive model incorporating NNS.ARMA estimates of variables into NNS.reg for a multi-variate time-series forecast.
    NNS.VAR(
      variables,
      h,
      tau = 1,
      dim.red.method = "cor",
      naive.weights = TRUE,
      obj.fn = expression(mean((predicted - actual)^2)/(NNS::Co.LPM(1, predicted, actual, target_x = mean(predicted), target_y = mean(actual)) + NNS::Co.UPM(1, predicted, actual, target_x = mean(predicted), target_y = mean(actual)))),
      objective = "min",
      status = TRUE,
      ncores = NULL,
      nowcast = FALSE
    )

| Argument | Description |
|---|---|
| variables | a numeric matrix or data.frame of contemporaneous time-series to forecast. |
| h | integer; 1 (default) Number of periods to forecast. |
| tau | positive integer [ > 0]; 1 (default) Number of lagged observations to consider for the time-series data. Vector for single lag for each respective variable or list for multiple lags per each variable. |
| dim.red.method | options: ("cor", "NNS.dep", "NNS.caus", "all") method for reducing regressors via NNS.stack. |
| naive.weights | logical; |
| obj.fn | expression; |
| objective | options: ("min", "max") |
| status | logical; |
| ncores | integer; value specifying the number of cores to be used in the parallelized subroutine NNS.ARMA.optim. If NULL (default), the number of cores to be used is equal to the number of cores of the machine - 1. |
| nowcast | logical; |
Returns the following matrices of forecasted variables:
"interpolated_and_extrapolated" Returns a data.frame of the linear interpolated and NNS.ARMA extrapolated values to replace NA values in the original variables argument. This is required for working with variables containing different frequencies, e.g. where NA would be reported for intra-quarterly data when indexed with monthly periods.
"relevant_variables" Returns the relevant variables from the dimension reduction step.
"univariate" Returns the univariate NNS.ARMA forecasts.
"multivariate" Returns the multi-variate NNS.reg forecasts.
"ensemble" Returns the ensemble of both "univariate" and "multivariate" forecasts.
"Error in { : task xx failed -}" should be re-run with NNS.VAR(..., ncores = 1).
Not recommended for factor variables, even after they have been transformed to numeric. NNS.reg is better suited for factor or binary regressor extrapolation.
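The tau argument selects which lagged observations of each series are considered. As a rough picture of what a set of lags looks like as regressors, the base-R sketch below builds a lag matrix with stats::embed(). The helper `lag_matrix` is hypothetical, and nothing here is taken from the NNS.VAR internals, which may construct and use lags differently.

```r
# Hypothetical helper for illustration; not part of NNS.
lag_matrix <- function(v, lags) {
  max_lag <- max(lags)
  # embed() puts v[t] in column 1 and v[t - k] in column k + 1,
  # for t = max_lag + 1, ..., length(v).
  e <- embed(v, max_lag + 1)
  out <- e[, lags + 1, drop = FALSE]
  colnames(out) <- paste0("lag", lags)
  out
}

v <- 1:6

# Lags 1 and 3 of the series, aligned with the observations v[4], v[5], v[6].
lags_1_3 <- lag_matrix(v, c(1, 3))

# A list of lag sets, one per variable, mirrors the list form of tau.
series   <- list(x = 1:6, y = 11:16)
lag_sets <- list(c(1, 2), c(3))
per_variable <- Map(lag_matrix, series, lag_sets)
```

Each row of the result lines up the chosen past values with one current observation, which is the shape a regression on lagged values needs.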
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
Viole, F. (2019) "Multi-variate Time-Series Forecasting: Nonparametric Vector Autoregression Using NNS" doi:10.2139/ssrn.3489550
Viole, F. (2020) "NOWCASTING with NNS" doi:10.2139/ssrn.3589816
Viole, F. (2019) "Forecasting Using NNS" doi:10.2139/ssrn.3382300
Vinod, H. and Viole, F. (2017) "Nonparametric Regression Using Clusters" doi:10.1007/s10614-017-9713-5
Vinod, H. and Viole, F. (2018) "Clustering and Curve Fitting by Line Segments" doi:10.20944/preprints201801.0090.v1
    ## Not run: 
    ####################################################
    ### Standard Nonparametric Vector Autoregression ###
    ####################################################
    set.seed(123)
    x <- rnorm(100) ; y <- rnorm(100) ; z <- rnorm(100)
    A <- cbind(x = x, y = y, z = z)

    ### Using lags 1:4 for each variable
    NNS.VAR(A, h = 12, tau = 4, status = TRUE)

    ### Using lag 1 for variable 1, lag 3 for variable 2 and lag 3 for variable 3
    NNS.VAR(A, h = 12, tau = c(1,3,3), status = TRUE)

    ### Using lags c(1,2,3) for variables 1 and 3, while using lags c(4,5,6) for variable 2
    NNS.VAR(A, h = 12, tau = list(c(1,2,3), c(4,5,6), c(1,2,3)), status = TRUE)

    ### PREDICTION INTERVALS
    # Store NNS.VAR output
    nns_estimate <- NNS.VAR(A, h = 12, tau = 4, status = TRUE)

    # Create bootstrap replicates using NNS.meboot
    replicates <- NNS.meboot(nns_estimate$ensemble[,1], rho = seq(-1,1,.25))["replicates",]
    replicates <- do.call(cbind, replicates)

    # Apply UPM.VaR and LPM.VaR for desired prediction interval...95 percent illustrated
    # Tail percentage used in first argument per {LPM.VaR} and {UPM.VaR} functions
    lower_CIs <- apply(replicates, 1, function(z) LPM.VaR(0.025, 0, z))
    upper_CIs <- apply(replicates, 1, function(z) UPM.VaR(0.025, 0, z))

    # View results
    cbind(nns_estimate$ensemble[,1], lower_CIs, upper_CIs)

    #########################################
    ### NOWCASTING with Mixed Frequencies ###
    #########################################
    library(Quandl)
    econ_variables <- Quandl(c("FRED/GDPC1", "FRED/UNRATE", "FRED/CPIAUCSL"), type = 'ts', order = "asc", collapse = "monthly", start_date = "2000-01-01")

    ### Note the missing values that need to be imputed
    head(econ_variables)
    tail(econ_variables)

    NNS.VAR(econ_variables, h = 12, tau = 12, status = TRUE)
    ## End(Not run)
Builds a list containing CUPM, DUPM, DLPM, CLPM and the overall covariance matrix.
    PM.matrix(LPM_degree, UPM_degree, target, variable, pop_adj, norm = FALSE)

| Argument | Description |
|---|---|
| LPM_degree | numeric; lower partial moment degree (0 = freq, 1 = area). |
| UPM_degree | numeric; upper partial moment degree (0 = freq, 1 = area). |
| target | numeric vector; thresholds for each column (defaults to colMeans). |
| variable | numeric matrix or data.frame. |
| pop_adj | logical; TRUE adjusts population vs. sample moments. |
| norm | logical; default FALSE. If TRUE, each quadrant matrix is cell-wise normalized so their sum is 1 at each (i,j). |
Partial Moment Matrix
A list: $cupm, $dupm, $dlpm, $clpm, $cov.matrix.
When norm = TRUE, each cell (i,j) of the four quadrant matrices
is normalized so that their sum equals 1. In this case,
$cov.matrix is computed as
$cupm + $clpm - $dupm - $dlpm, yielding a dimensionless,
signed dependence measure bounded between -1 and 1.
This representation discards magnitude information and is therefore
a lossy nonlinear correlation matrix. A higher fidelity nonlinear
correlation matrix is available via the NNS.dep function.
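The relationship between the four quadrants and the covariance can be checked for a single pair of variables in base R. The sketch below is an illustration under assumptions, not the PM.matrix code: `co_pm_quadrants` is a hypothetical helper, it uses degree 1 with the means as targets and population (divide by n) moments, and it labels the two divergent quadrants neutrally because this manual does not state which one maps to $dupm and which to $dlpm.

```r
# Hypothetical helper for illustration; not part of NNS.
co_pm_quadrants <- function(x, y, tx = mean(x), ty = mean(y)) {
  up_x <- pmax(x - tx, 0); lo_x <- pmax(tx - x, 0)  # deviations above / below target
  up_y <- pmax(y - ty, 0); lo_y <- pmax(ty - y, 0)
  c(cupm        = mean(up_x * up_y),  # both above their targets
    clpm        = mean(lo_x * lo_y),  # both below their targets
    div_x_above = mean(up_x * lo_y),  # x above, y below
    div_x_below = mean(lo_x * up_y))  # x below, y above
}

set.seed(123)
x <- rnorm(100)
y <- rnorm(100)
q <- co_pm_quadrants(x, y)

# Concordant quadrants minus divergent quadrants recovers the covariance.
cov_pm  <- q[["cupm"]] + q[["clpm"]] - q[["div_x_above"]] - q[["div_x_below"]]
pop_cov <- mean((x - mean(x)) * (y - mean(y)))

# Normalizing the four quadrants to sum to 1 gives a signed measure in [-1, 1].
q_norm <- q / sum(q)
dep <- q_norm[["cupm"]] + q_norm[["clpm"]] - q_norm[["div_x_above"]] - q_norm[["div_x_below"]]
```

The normalized value keeps only the balance between concordant and divergent quadrants, which is why the text above calls it lossy.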
    set.seed(123)
    A <- cbind(rnorm(100), rnorm(100), rnorm(100))

    # Uses norm = FALSE by default
    PM.matrix(1, 1, target = NULL, variable = A, pop_adj = TRUE)

    # Enable normalization
    PM.matrix(1, 1, target = NULL, variable = A, pop_adj = TRUE, norm = TRUE)

    # Use 0's for targets
    PM.matrix(1, 1, target = rep(0, ncol(A)), variable = A, pop_adj = TRUE)

    # Use variable medians as targets
    PM.matrix(1, 1, target = apply(A, 2, "median"), variable = A, pop_adj = TRUE)
This function generates a univariate upper partial moment for any degree or target.
    UPM(degree, target, variable, excess_ret = FALSE)

| Argument | Description |
|---|---|
| degree | numeric; |
| target | numeric; Set to |
| variable | a numeric vector. data.frame or list type objects are not permissible. |
| excess_ret | logical; |
UPM of variable
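A univariate partial moment can be written in a few lines of base R, which also makes it easy to verify that the degree-2 upper and lower partial moments about the mean add up to the population variance. The helpers `upm` and `lpm` below are hypothetical stand-ins rather than the NNS functions: they ignore excess_ret, accept a single target, and assume that degree 0 counts observations strictly above the target (upper) or at or below it (lower).

```r
# Hypothetical stand-ins for illustration; not the NNS implementations.
upm <- function(degree, target, x) {
  if (degree == 0) return(mean(x > target))   # frequency above the target
  mean(pmax(x - target, 0)^degree)
}

lpm <- function(degree, target, x) {
  if (degree == 0) return(mean(x <= target))  # frequency at or below the target
  mean(pmax(target - x, 0)^degree)
}

set.seed(123)
x <- rnorm(100)
m <- mean(x)

# Degree 0: the two frequencies partition the sample.
freq_total <- upm(0, m, x) + lpm(0, m, x)

# Degree 2 about the mean: upper plus lower equals the population variance.
pm_var  <- upm(2, m, x) + lpm(2, m, x)
pop_var <- mean((x - m)^2)
```

The degree-0 case is handled separately because 0^0 evaluates to 1 in R, which would otherwise count every observation.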
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    set.seed(123)
    x <- rnorm(100)
    UPM(0, mean(x), x)
This function generates a standardized univariate upper partial moment of any non‑negative degree for a given target.
    UPM.ratio(degree, target, variable)

| Argument | Description |
|---|---|
| degree | numeric; degree = 0 gives frequency, degree = 1 gives area. |
| target | numeric vector; threshold(s). Defaults to mean(variable). |
| variable | numeric vector or data‑frame column to evaluate. |
Numeric vector of standardized upper partial moments.
Fred Viole, OVVO Financial Systems
Viole, F. & Nawrocki, D. (2013) *Nonlinear Nonparametric Statistics: Using Partial Moments* (ISBN:1490523995)
    set.seed(123)
    x <- rnorm(100)
    UPM.ratio(0, mean(x), x)

    ## Not run: 
    plot3d(x, y, Co.UPM(0, sort(x), sort(y), x, y), …)
    ## End(Not run)
Generates an upside value at risk (VaR) quantile based on the Upper Partial Moment ratio
    UPM.VaR(percentile, degree, x)

| Argument | Description |
|---|---|
| percentile | numeric [0, 1]; The percentile for right-tail VaR (vectorized). |
| degree | integer; |
| x | a numeric vector. |
Returns a numeric value representing the point above which "percentile" of the area of x lies.
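For degree 0 the "area" is a simple frequency, so the right-tail point behaves like an upper empirical quantile. The base-R sketch below illustrates that idea with stats::quantile(); the helpers `upper_tail_point` and `lower_tail_point` are hypothetical, and the exact quantile definition used by UPM.VaR and LPM.VaR is not stated in this manual, so type = 1 is an assumption.

```r
# Hypothetical base-R stand-ins for degree-0 tail points; not the NNS code.
upper_tail_point <- function(percentile, x) {
  # Point with roughly `percentile` of the observations above it.
  unname(quantile(x, probs = 1 - percentile, type = 1))
}

lower_tail_point <- function(percentile, x) {
  # Point with roughly `percentile` of the observations at or below it.
  unname(quantile(x, probs = percentile, type = 1))
}

set.seed(123)
x <- rnorm(100)

# Right-tail point for the 5th percentile and the share of x above it.
v <- upper_tail_point(0.05, x)
share_above <- mean(x > v)

# A two-sided 95 percent interval uses the same tail percentage on each side.
interval <- c(lower = lower_tail_point(0.025, x),
              upper = upper_tail_point(0.025, x))
```

Passing the tail percentage to both helpers mirrors the way a symmetric interval is usually assembled from a lower-tail and an upper-tail point.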
Fred Viole, OVVO Financial Systems
Viole, F. and Nawrocki, D. (2013) "Nonlinear Nonparametric Statistics: Using Partial Moments" (ISBN: 1490523995, 2nd edition: https://ovvo-financial.github.io/NNS/book/)
    set.seed(123)
    x <- rnorm(100)

    ## For 5th percentile, right-tail
    UPM.VaR(0.05, 0, x)