Notice: This post is based on my own views and experience with machine learning algorithms, which may be orthogonal to the views and experience of everyone else.
Modern deep learning algorithms were not designed to have a nice mathematical theory, nor were they designed to be particularly interpretable or understandable. They were instead designed to perform well, while the inner workings of the neural network are mostly treated as a black box. The end consumer of AI technologies is typically unfamiliar with and uninterested in those inner workings, so developers build AI models that look good on the outside, even though the parameters of these models may not even be publicly available. In practice, neural networks tend to be a hodgepodge of techniques (convolutional layers, regularization, normalization, various activation functions, etc.) that experimentally work well on data in a specific format, rather than a unified structure that mathematicians can prove theorems about. I do not think this is a good thing, as we need to focus more on AI safety and interpretability than on performance alone.
Criteria by which machine learning is mathematical
A fitness function with a unique maximum or few local maxima should be considered more mathematical and interpretable than a fitness function with many local maxima (or, at the very least, each local optimum should be completely describable using few bits of information). I prefer machine learning algorithms where training is pseudodeterministic in the sense that the learned model does not depend very much on the random initialization or on other sources of randomness encountered during training. This pseudodeterminism should also be robust in several ways. For example, the Hessian at the local maximum should not have eigenvalues that are too close to zero, and training should remain pseudodeterministic even when the fitness function is generalized. If these pseudodeterminism requirements are satisfied, then the trained model should be interpretable, and one should be able to investigate it mathematically.
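The notion of pseudodeterministic training can be sketched numerically: run gradient ascent from many random initializations and check whether all runs land on (approximately) the same optimum. The two fitness functions below are my own illustrative choices, not anything from the literature; a unimodal fitness passes the check while a highly multimodal one fails it.

```python
import math
import random

def gradient_ascent(grad, x0, lr=0.1, steps=2000):
    """Plain gradient ascent from a given initialization."""
    x = x0
    for _ in range(steps):
        x += lr * grad(x)
    return x

def is_pseudodeterministic(grad, n_runs=20, tol=1e-3, seed=0):
    """Heuristic check: do many random initializations in [-5, 5]
    all converge to the same optimum?"""
    rng = random.Random(seed)
    results = [gradient_ascent(grad, rng.uniform(-5.0, 5.0)) for _ in range(n_runs)]
    return max(results) - min(results) < tol

# Unimodal fitness f(x) = -(x - 2)^2 with gradient -2(x - 2): one global maximum.
unimodal_grad = lambda x: -2.0 * (x - 2.0)

# Multimodal fitness f(x) = cos(3x) - 0.01 x^2: many local maxima.
multimodal_grad = lambda x: -3.0 * math.sin(3.0 * x) - 0.02 * x

print(is_pseudodeterministic(unimodal_grad))    # True: every run reaches x = 2
print(is_pseudodeterministic(multimodal_grad))  # False: runs end in different basins
```

Under this heuristic, a training procedure whose outcome survives reruns from fresh random seeds is the kind of procedure one could hope to analyze mathematically.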
The nature of neural networks
I consider neural networks with ReLU activation to be mathematical objects in the sense that the functions computed by such networks are precisely tropical rational functions (differences of maxima of affine maps), so perhaps the connection between ReLU networks and tropical geometry may be explored further.
I have performed a simple experiment in which I trained a reasonably small neural network twice from the same initialization, taking some measures to ensure that the two trained networks would end up at the same local optimum. But even with these measures, the two networks ended up looking quite different from each other. To make things worse, after training a neural network, one can often remove over 90 percent of the weights without harming its performance. This convinces me that trained neural networks still contain a lot of random information and are difficult to investigate from a pure mathematics perspective. Neural networks are quite noisy, and perhaps this noise is one reason for their lack of interpretability.
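The sensitivity to training noise can be caricatured in one dimension. Below is a toy fitness $f(x)=-(x^2-1)^2$ with two symmetric maxima at $x=\pm 1$; both runs start from the identical initialization $x=0$, and a single tiny perturbation (my stand-in for the accumulated nondeterminism of a real training run) decides which optimum gradient ascent reaches. This is an illustrative sketch, not a reproduction of the actual experiment.

```python
def train(perturbation, lr=0.1, steps=500):
    """Gradient ascent on f(x) = -(x^2 - 1)^2, whose maxima sit at x = +1
    and x = -1.  The run starts at the symmetric point x = 0 plus one tiny
    perturbation standing in for training noise."""
    x = 0.0 + perturbation
    for _ in range(steps):
        x += lr * (-4.0 * x * (x * x - 1.0))   # f'(x) = -4x(x^2 - 1)
    return x

run_a = train(+0.01)   # converges to +1
run_b = train(-0.01)   # converges to -1
print(run_a, run_b)    # same initialization, very different trained "models"
```

Everything about the two runs is identical except a perturbation of size 0.01, yet the final parameters differ by about 2, the full distance between the two optima.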
Alternative mathematical machine learning techniques
There are some machine learning algorithms that do have a nice mathematical theory behind them. Unfortunately, these more mathematical algorithms have not been developed to the point where they can compete with neural networks. They nevertheless play an important role in machine learning, and I believe that the theory and practice of these algorithms can be developed until they help with tasks that today can only be accomplished using deep neural networks.
The PageRank algorithm used by Google and other search engines simply computes the Perron-Frobenius (dominant) eigenvector of a normalized, damped version of the adjacency matrix of a directed graph. The eigenvectors of the Laplacian matrix of a graph can be used to partition the nodes of the graph into clusters. I would therefore consider spectral graph theory (along with related concepts such as the eigenvalues of the Laplacian on a Riemannian manifold) an area of mathematics applicable to machine learning.
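The dominant eigenvector in PageRank is typically computed by power iteration. The sketch below runs it on a hypothetical 4-page web graph of my own invention; the damping factor 0.85 is the commonly cited default.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power iteration for the dominant (Perron-Frobenius) eigenvector of
    the damped, out-degree-normalized adjacency matrix.
    adj[i][j] == 1 means page i links to page j."""
    n = len(adj)
    rank = [1.0 / n] * n
    out_deg = [sum(row) for row in adj]
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n      # teleportation mass
        for i in range(n):
            for j in range(n):
                if adj[i][j]:
                    new[j] += damping * rank[i] / out_deg[i]
        rank = new
    return rank

# Hypothetical 4-page web: pages 1-3 link to page 0; page 0 links to page 1.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
ranks = pagerank(adj)
print(ranks)   # page 0, which everyone links to, gets the largest score
```

The iteration preserves total mass (the scores sum to 1), and pages 2 and 3, which nobody links to, end up with only the teleportation share.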
Making machine learning more mathematical using functions on a complex domain
Suppose that $D=\{z\in\mathbb{C}:|z|\leq 1\}$, and let $f:D^n\rightarrow[-\infty,\infty)$ be a non-constant continuous function that is plurisubharmonic on the interior of $D^n$. Let $g:S_1^n\rightarrow[-\infty,\infty)$ be the restriction of $f$ to the torus $S_1^n$, where $S_1=\{z\in\mathbb{C}:|z|=1\}$.
Let $L_f$ and $L_g$ denote the set of local maxima of $f$ and $g$ respectively. Then $L_f\subseteq L_g$ and $\max f=\max g$ by the maximum principle.
If $\mathbf{z}$ is a typical element of $L_g$, then there is approximately a $0.5^n$ probability that $\mathbf{z}$ also belongs to $L_f$, so $|L_f|\approx(0.5)^n|L_g|$. Since $f$ has far fewer local maxima than $g$, the fitness function $f$ should be easier to investigate mathematically than the function $g$. Now, the main problem is to use fitness functions like $f$ to solve machine learning tasks.
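The role of the maximum principle here can be checked numerically. For a holomorphic $p$, the function $\log|p|$ is plurisubharmonic, so $|p|$ attains its maximum over the closed polydisc on the distinguished boundary, the torus. The polynomial below is my own toy example; a grid search confirms that no interior point beats the torus.

```python
import cmath

def p(z1, z2):
    """A holomorphic function on the bidisc; log|p| is plurisubharmonic."""
    return z1 * z2 + 0.5 * z1 + 0.25

N = 60  # angular grid resolution

# Maximum of |p| over a grid on the torus S_1 x S_1 (includes z1 = z2 = 1).
torus_max = max(
    abs(p(cmath.exp(2j * cmath.pi * a / N), cmath.exp(2j * cmath.pi * b / N)))
    for a in range(N) for b in range(N)
)

# Maximum of |p| over a grid of strictly interior points (radii < 1).
interior_max = max(
    abs(p(r1 * cmath.exp(2j * cmath.pi * a / N),
          r2 * cmath.exp(2j * cmath.pi * b / N)))
    for r1 in (0.3, 0.6, 0.95) for r2 in (0.3, 0.6, 0.95)
    for a in range(N) for b in range(N)
)

print(interior_max < torus_max)   # True: the maximum lives on the torus
```

Here the torus maximum is $|p(1,1)| = 1.75$, while by the triangle inequality every interior point of radius at most $0.95$ is capped well below it, consistent with $\max f=\max g$ above.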
Unfortunately, I have not seen much literature on attempts to use plurisubharmonic fitness functions for machine learning, and there are certainly difficulties with this sort of approach, but it seems that researchers should spend more resources developing fitness functions of several complex variables in order to build safer and more interpretable AI systems.