Generalized Isolation Forest for Anomaly Detection

Abstract

This letter introduces a generalization of Isolation Forest (IF) based on the existing Extended IF (EIF). EIF has advantages over IF, for instance greater robustness to some artefacts. However, some information can be lost when building the EIF trees, since the sampled threshold might lead to empty branches. This letter introduces a generalized isolation forest algorithm, called Generalized IF (GIF), to overcome this issue. GIF is faster than EIF with similar performance, as shown by several simulation results on reference datasets used for anomaly detection.
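
The empty-branch issue mentioned in the abstract can be illustrated with a minimal sketch. It assumes the EIF-style split of Hariri et al., where a random normal vector is drawn from a Gaussian and the intercept is drawn uniformly in the bounding box of the data. The second function draws the threshold between the smallest and largest projected values instead; this is only one simple way to avoid empty branches and is not claimed to be the splitting rule used in gif.py. All function names below are hypothetical and do not come from eif_old.py or gif.py.

```python
import random


def random_normal(dim, rng):
    """Draw a random direction with independent standard Gaussian coordinates."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def eif_style_split(points, rng):
    """Split with a random hyperplane whose intercept lies in the bounding box."""
    dim = len(points[0])
    normal = random_normal(dim, rng)
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    intercept = [rng.uniform(lows[d], highs[d]) for d in range(dim)]
    left, right = [], []
    for p in points:
        # Sign of (x - intercept) . normal decides the branch.
        side = sum((p[d] - intercept[d]) * normal[d] for d in range(dim))
        (left if side <= 0 else right).append(p)
    return left, right


def projected_range_split(points, rng):
    """Illustrative alternative: threshold drawn within the projected data range."""
    dim = len(points[0])
    normal = random_normal(dim, rng)
    projections = [sum(p[d] * normal[d] for d in range(dim)) for p in points]
    threshold = rng.uniform(min(projections), max(projections))
    left = [p for p, s in zip(points, projections) if s <= threshold]
    right = [p for p, s in zip(points, projections) if s > threshold]
    return left, right


def count_empty_splits(split_fn, points, trials=1000, seed=0):
    """Count how many random splits leave one of the two branches empty."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        left, right = split_fn(points, rng)
        if not left or not right:
            empty += 1
    return empty


# Ten points on the diagonal of the square [0, 9] x [0, 9]: the bounding box
# is much larger than the region actually occupied by the data.
diagonal = [(float(i), float(i)) for i in range(10)]
print("EIF-style empty splits:", count_empty_splits(eif_style_split, diagonal))
print("Projected-range empty splits:", count_empty_splits(projected_range_split, diagonal))
```

With data concentrated along a diagonal, an intercept drawn near an unoccupied corner of the bounding box can put every point on the same side of the hyperplane, which wastes a tree level. Drawing the threshold inside the projected range keeps at least one point on each side whenever the projections are not all equal.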

Publication
Pattern Recognition Letters

The Python folder contains:

  • datasets, a folder containing all the datasets described in the paper
  • eif_old.py, the class to create EIF as provided by Hariri et al.
  • gif.py, our implementation of the proposed GIF
  • images, a folder to save the images
  • Main_2D_blob.ipynb, a Jupyter notebook to process the single blob 2D dataset, as inspired by Hariri et al.
  • Main_2D_double_blob.ipynb, a Jupyter notebook to process the double blob 2D dataset, as inspired by Hariri et al.
  • Main_3D_blob.ipynb, a Jupyter notebook to process the single blob 3D dataset, as inspired by Hariri et al.
  • Main_4D_blob.ipynb, a Jupyter notebook to process the single blob 4D dataset, as inspired by Hariri et al.
  • main_lettre.ipynb, a Jupyter notebook to process all the datasets with a Monte-Carlo approach. The corresponding outputs are provided in the results folder, since the computations can take a while
  • main_lettre_parrallel.ipynb, a parallelised version of the previous notebook
  • process_results_2D.ipynb, a Jupyter notebook to process the 2D datasets
  • process_results.ipynb, a Jupyter notebook to process the other datasets, which requires running the main notebook first to obtain the Monte-Carlo results
  • results, a folder containing the Monte-Carlo results to save some time
Julien LESOUPLE
Lecturer/Researcher

My research interests include statistical signal processing applied to navigation.