Optimizing neural networks: a comparative study of activation functions in deep learning
Abstract
Activation functions play a pivotal role in deep learning (DL) models, shaping their learning capacity, convergence behavior, and generalization performance. In many applications, however, activation functions are selected without systematic evaluation, which limits model performance. An inappropriate activation function may cause gradients to vanish or explode during backpropagation, hindering effective learning. To address this problem, this paper presents a comprehensive empirical investigation of nine activation functions, including traditional functions such as the rectified linear unit (ReLU), Sigmoid, Tanh, and ELU, and modern nonlinearities such as Swish, Mish, GELU, and SMU. In the proposed methodology, these activation functions are evaluated within two prominent neural network architectures, namely convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs), across the CIFAR-10, CIFAR-100, and MNIST benchmark datasets. The evaluation criteria include validation accuracy, loss, training time, and gradient stability. Experimental results show that the GELU activation function raised MLP accuracy to 98.03% and CNN accuracy to 93.82% while maintaining stable gradients and low loss values of 0.088 and 0.221, respectively. These findings provide practical guidelines for selecting activation functions suited to specific task complexities and model depths, contributing to the design of more efficient and accurate DL systems.
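The paper's full text is not reproduced here; as an illustrative sketch only (standard textbook forms, not the authors' implementation), several of the nonlinearities compared above can be written in a few lines of NumPy:

```python
import numpy as np

# Common closed-form definitions of activation functions discussed in the
# abstract. These are the widely used textbook formulas, assumed for
# illustration; the paper's exact experimental code is not shown here.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def gelu(x):
    # Tanh approximation of the Gaussian error linear unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x**3)))

def swish(x, beta=1.0):
    # Swish (SiLU when beta = 1): x * sigmoid(beta * x).
    return x * sigmoid(beta * x)

def mish(x):
    # Mish: x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x)).
    return x * np.tanh(np.log1p(np.exp(x)))

if __name__ == "__main__":
    xs = np.linspace(-3.0, 3.0, 7)
    for name, f in [("relu", relu), ("elu", elu), ("gelu", gelu),
                    ("swish", swish), ("mish", mish)]:
        print(f"{name:>5}: {np.round(f(xs), 3)}")
```

Note that ReLU is exactly zero for negative inputs, while GELU, Swish, and Mish are smooth and allow small negative outputs, which is one reason they are often compared on gradient stability.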
Keywords
Activation functions; Deep learning; Gaussian error linear unit; Mish; Neural networks; Rectified linear unit; Swish
DOI: http://doi.org/10.11591/ijece.v16i2.pp945-963
Copyright (c) 2026 Ahmed Mobarki, Abdullah Sheikh

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES).