lookout.Rd
This function identifies outliers using the lookout algorithm, an outlier detection method based on leave-one-out kernel density estimates and generalized Pareto distributions.
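To make the leave-one-out idea concrete, here is a minimal base R sketch of a one-dimensional leave-one-out kernel density estimate. It is not the package's implementation: the helper name `loo_kde`, the Gaussian kernel, and the fixed bandwidth `h` are all assumptions made for illustration (lookout selects its bandwidth itself and works on multivariate data).

```r
# Leave-one-out Gaussian KDE in one dimension (illustrative only).
# `loo_kde` and the fixed bandwidth `h` are hypothetical, not part of lookout.
loo_kde <- function(x, h) {
  n <- length(x)
  # Pairwise kernel values K((x_i - x_j) / h) / h
  k <- dnorm(outer(x, x, "-") / h) / h
  kde <- rowSums(k) / n                     # ordinary KDE evaluated at each x_i
  loo <- (rowSums(k) - diag(k)) / (n - 1)   # same estimate with x_i left out
  data.frame(x = x, kde = kde, lookde = loo)
}

# Five clustered points and one isolated point
x <- c(-0.2, -0.1, 0, 0.1, 0.2, 5)
res <- loo_kde(x, h = 0.5)
res
```

In this toy data the isolated point at 5 has a noticeable ordinary KDE value only because of its own kernel; once it is left out, its density drops to almost zero, which is the contrast a leave-one-out approach can exploit.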
lookout(X, alpha = 0.05, unitize = TRUE, bw = NULL, gpd = NULL, fast = TRUE)
The input data, as a data frame, matrix or tibble.
The level of significance. Default is `0.05`.
An option to normalize the data. Default is `TRUE`, which normalizes each column to [0,1].
Bandwidth parameter. If `NULL` (the default), the bandwidth is found using persistent homology.
Generalized Pareto distribution parameters. If `NULL` (the default), these are estimated from the data.
If `TRUE`, speeds up the computation by subsetting the data for the bandwidth calculation.
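As an illustration of normalizing each column to [0,1], the following base R sketch applies min-max scaling column by column. The helper `unitize_columns` is hypothetical and its handling of constant columns is an assumption; the package's internal normalization may differ in detail.

```r
# Min-max scale every column of a numeric data frame or matrix to [0, 1].
# `unitize_columns` is a hypothetical helper, not a lookout function.
unitize_columns <- function(X) {
  X <- as.matrix(X)
  apply(X, 2, function(col) {
    rng <- range(col)
    # Constant column: return zeros instead of dividing by zero (assumed choice)
    if (rng[1] == rng[2]) return(rep(0, length(col)))
    (col - rng[1]) / (rng[2] - rng[1])
  })
}

X <- data.frame(x = c(1, 2, 3), y = c(10, 20, 40))
U <- unitize_columns(X)
U
```

Scaling of this kind keeps a column measured in large units from dominating the distances that a kernel density estimate depends on.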
A list with the following components:
outliers
The set of outliers.
outlier_probability
The GPD probability of the data.
outlier_scores
The outlier scores of the data.
bandwidth
The bandwidth selected using persistent homology.
kde
The kernel density estimate values.
lookde
The leave-one-out kde values.
gpd
The fitted GPD parameters.
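For readers unfamiliar with generalized Pareto distributions, the sketch below computes an upper-tail probability under a GPD and compares it with a significance level. It rests on assumptions not stated on this page: that the reported probability behaves like a tail probability, and that the parameters are a location, a scale and a shape. The function `gpd_survival` and the parameter values are hypothetical.

```r
# Upper-tail (survival) probability P(Z > x) under a generalized Pareto
# distribution. `gpd_survival` is a hypothetical helper, not a lookout function.
gpd_survival <- function(x, location, scale, shape) {
  z <- pmax((x - location) / scale, 0)        # values below the location map to 0
  if (abs(shape) < 1e-8) return(exp(-z))      # shape = 0 is the exponential case
  pmax(1 + shape * z, 0)^(-1 / shape)         # 0 beyond the upper end when shape < 0
}

# Made-up parameters and scores, for illustration only
scores <- c(0.5, 2, 8)
p <- gpd_survival(scores, location = 0, scale = 1, shape = 0.1)
alpha <- 0.05
flagged <- p < alpha   # scores whose tail probability falls below alpha
```

Under this reading, a smaller probability means a more extreme observation, and `alpha` acts as the cutoff below which a point is reported as an outlier.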
X <- rbind(
  data.frame(x = rnorm(500),
             y = rnorm(500)),
  data.frame(x = rnorm(5, mean = 10, sd = 0.2),
             y = rnorm(5, mean = 10, sd = 0.2))
)
lo <- lookout(X)
lo
#> Leave-out-out KDE outliers using lookout algorithm
#>
#> Call: lookout(X = X)
#>
#>   Outliers Probability
#> 1      501  0.02261325
#> 2      502  0.02276128
#> 3      503  0.02294858
#> 4      504  0.02299057
#> 5      505  0.02289632
#>
autoplot(lo)