Deep Network Approximation: Beyond ReLU to Diverse Activation Functions
This paper explores the expressive power of deep neural networks with a diverse range of activation functions. An activation function set 𝒜 is defined to encompass the majority of commonly used activation functions, including ReLU, LeakyReLU, ReLU^2, ELU, SELU, Softplus, GELU, SiLU, Swish, Mish, Sigmoid, Tanh, Arctan, Softsign, dSiLU, and SRS. We demonstrate that for any activation function ϱ ∈ 𝒜, a ReLU network of width N and depth L can be approximated to arbitrary precision by a ϱ-activated network of width 6N and depth 2L on any bounded set. This finding enables the extension of most approximation results achieved with ReLU networks to a wide variety of other activation functions, at the cost of slightly larger constants.
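To give a rough feel for why such a result is plausible, the sketch below numerically checks a standard rescaling trick for one member of the family: a single Softplus neuron, softplus(t·x)/t, approximates ReLU(x) uniformly (with error ln(2)/t) on any bounded set as t grows. This is only an illustrative assumption of ours, not the paper's width-6N, depth-2L construction; the scaling parameter t and the helper names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus_relu(x, t):
    """Approximate ReLU with one Softplus unit: softplus(t * x) / t.

    np.logaddexp(0, z) computes log(1 + exp(z)) in a numerically stable way.
    The uniform approximation error is ln(2) / t, attained at x = 0.
    """
    return np.logaddexp(0.0, t * x) / t

if __name__ == "__main__":
    x = np.linspace(-5.0, 5.0, 10001)  # a bounded set containing 0
    for t in (1.0, 10.0, 100.0, 1000.0):
        err = np.max(np.abs(softplus_relu(x, t) - relu(x)))
        print(f"t = {t:7.1f}   sup error = {err:.6f}   ln(2)/t = {np.log(2) / t:.6f}")
```

Replacing each ReLU unit of a given network by a small ϱ-activated gadget of this flavor is, roughly speaking, how a constant-factor blow-up in width and depth (such as the 6N and 2L in the statement above) can arise; the paper's actual constructions differ across the activation functions in 𝒜.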