Paper

Dimensionality reduction for clustering of nonlinear industrial data: A tutorial
Author
Hae Rang Roh, Chae Sun Kim, Yongseok Lee, Jong Min Lee*
Journal
Korean Journal of Chemical Engineering
Page
987-1001
Year
2025

Dimensionality reduction is essential for industrial process data with numerous nonlinear variables, as it retains only the important features for visualization or subsequent tasks. This study serves as a tutorial that demonstrates how various dimensionality reduction techniques perform as the complexity of the process variables in toy examples increases. Some of the variables contain fault signals, so the examples also illustrate how a fault detection task is carried out. Evaluated against three criteria, Uniform Manifold Approximation and Projection (UMAP) performed notably well, particularly on sparse and noisy data, while also offering adequate robustness to out-of-sample test data. This tutorial provides guidance on selecting an appropriate dimensionality reduction technique based on data complexity, ultimately enabling more effective execution of subsequent tasks.