While adding various backends to TensorLy, I had to deal with the various advantages and oddities of the many deep learning frameworks out there. Here, I am collecting some of the lessons I learnt, along with some thoughts on what an ideal deep learning framework should look like, for those of you also combating this interesting issue.
First and foremost, while diversity is good and tends to encourage novelty, I believe the deep learning field is in dire need of unification. Just as Travis Oliphant unified Numeric and Numarray into what we now know as the ubiquitous NumPy, deep learning frameworks need a major unification effort. Here are some elements to move forward.
Tensors vs. matrices
Even though most frameworks manipulate tensors, a lot of the methods actually focus on matrices, or tensors of small order. Some frameworks even break if you try to manipulate tensors of too high an order... Tensor methods are powerful and multi-linear algebra is a strict superset of traditional linear algebra. As computational resources and tensor methods develop, leveraging the structure (spatio-temporal for instance) in the data will become increasingly important. We need to develop tools that work equally well with tensors of any order.
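To make "works equally well with tensors of any order" concrete, here is a minimal NumPy sketch of the n-mode product (multiplying a tensor by a matrix along one of its modes). The helper below is hand-written for illustration and is not taken from any particular framework; the point is only that nothing in it depends on the order of the tensor.

```python
import numpy as np

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the n-mode product)."""
    # Contract the columns of `matrix` with axis `mode` of `tensor`.
    # tensordot puts the new axis first, so move it back to position `mode`.
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(result, 0, mode)

# The same function handles a matrix (order 2) and an order-5 tensor.
M = np.ones((3, 4))
T5 = np.ones((2, 3, 4, 5, 6))
U = np.ones((7, 4))

out2 = mode_dot(M, U, mode=1)   # shape (3, 7)
out5 = mode_dot(T5, U, mode=2)  # shape (2, 3, 7, 5, 6)
```

For a matrix this reduces to an ordinary matrix product (up to a transpose), which is the sense in which multi-linear algebra contains traditional linear algebra.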
Imperative or symbolic?
While both approaches have their pros and cons, I believe imperative will gain more traction, especially for research and prototyping. It is much more intuitive and flexible than symbolic approaches (that build computational graphs).
It does not have to be slower and can even combine the best of both worlds under the hood, whether with dynamic graphs (e.g. PyTorch) or a hybrid approach (e.g. MXNet's Gluon). Tools like TVM will make it possible to automatically optimize the code and compile it for a wide range of hardware platforms.
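As a rough illustration of the difference, here is a toy comparison. The imperative part is plain NumPy; the symbolic part uses a tiny hypothetical Node class written only for this example, which is far simpler than what any real framework does.

```python
import numpy as np

# Imperative: each line runs immediately and returns a concrete value.
x = np.array([1.0, 2.0, 3.0])
y_imperative = (x * 2.0).sum()

# Symbolic (toy version): first describe the computation as a graph of
# nodes, then evaluate it later by feeding in data.
class Node:
    def __init__(self, fn, *parents):
        self.fn = fn            # None marks a placeholder
        self.parents = parents

    def run(self, feed):
        # Placeholders are looked up in `feed`; other nodes evaluate
        # their parents first and then apply their own function.
        if self.fn is None:
            return feed[self]
        return self.fn(*(p.run(feed) for p in self.parents))

x_sym = Node(None)                          # placeholder, no value yet
doubled = Node(lambda a: a * 2.0, x_sym)    # nothing is computed here
y_sym = Node(lambda a: a.sum(), doubled)

y_symbolic = y_sym.run({x_sym: x})          # computation happens now
```

Both versions give the same result, but in the imperative one you can inspect or print any intermediate value with ordinary Python, which is a large part of why it feels more natural for prototyping.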
The ndarray structure
Rather than being monolithic, the framework should be broken down into a core multi-dimensional array structure, on which the rest of the framework would be built. NumPy has a proven, well-established interface, which makes it the best candidate for the API.
One thing to consider is how to specify the context (or device) on which the tensor should reside. I do not want to have to explicitly call a method on my tensor every time, such as:
T = tensor(data, dtype).gpu()
Not only is it annoying but, putting aside the relative lack of elegance of that solution, it also makes it harder to abstract the hardware away. You can no longer set the context as one of the variables. Instead, you have to either set a global context (e.g. module.use_gpu()) or have conditional statements everywhere…
if use_gpu: tensor.gpu()
This needlessly crowds the code with logic that should be abstracted away in a context argument.
Ideally, the structure should transparently support CPU and GPU (and even multiple CPUs or GPUs), via an additional argument, such as:
T = tensor(data, dtype, context)
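Here is a minimal sketch of what such a constructor could look like. Everything in it is hypothetical (the registry, register_context and the string-valued context are my own choices for illustration), and only the "cpu" context is actually backed by a library, namely NumPy.

```python
import numpy as np

# Hypothetical registry mapping a context name to a function that creates
# an array on that device. Only "cpu" is backed by a real library here.
_BACKENDS = {"cpu": np.asarray}

def register_context(name, constructor):
    """Register a constructor for a new context, e.g. a GPU array library."""
    _BACKENDS[name] = constructor

def tensor(data, dtype=None, context="cpu"):
    """Create a tensor on the requested context (illustrative API only)."""
    try:
        constructor = _BACKENDS[context]
    except KeyError:
        raise ValueError(f"Unknown context: {context!r}") from None
    return constructor(data, dtype=dtype)

# The device is now just a variable that can be passed around.
context = "cpu"
T = tensor([[1, 2], [3, 4]], dtype=np.float32, context=context)
```

With this shape of API, the choice of hardware lives in a single variable rather than in conditionals scattered through the code. A GPU context could be added by registering a constructor from a GPU array library, assuming one is installed.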
An important aspect is to have parameter and function names that are as intuitive and informative as possible and, above all, consistent. The choices of dim instead of axis, or view instead of reshape, while justified in some cases, are the best way to create errors and drive users mad.
In supporting GPU, frameworks should be transparent about how they use the memory. If, by default, all the memory of a GPU is reserved, this should be explicit and there should be a clear variable or method to change that behaviour. Similarly, the garbage collector should work as expected, regardless of the device on which the array is declared.
While designing the ndarray structure, it is important not to over-simplify. Amongst necessary functionality are slicing and fancy indexing: the ability to index an array by slices or using, for instance, a list of indexes. Importantly, these operations should be differentiable.
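For reference, this is the kind of indexing I have in mind, shown with NumPy. NumPy itself does not track gradients, so the sketch only illustrates the indexing semantics a differentiable ndarray would need to reproduce.

```python
import numpy as np

T = np.arange(24).reshape(2, 3, 4)

# Slicing: keep everything along the first two axes and every other
# element along the last one.
sliced = T[:, :, ::2]       # shape (2, 3, 2)

# Fancy indexing: pick rows 2 and 0 (in that order) along the second axis.
picked = T[:, [2, 0], :]    # shape (2, 2, 4)

# Boolean masks are another form of fancy indexing.
evens = T[T % 2 == 0]       # 1-D array of the even entries
```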
The actual deep learning / tensor frameworks should build on such a structure and leverage it. Chainer is a great example of this.
In addition, a mature, drop-in replacement for NumPy with GPU and multi-machine support will be a tremendous addition for the whole Python community.
Zeroth order tensors (or 0-dimensional)
One important, most often neglected aspect of tensors is zeroth-order tensors, aka scalars. If you fully contract a tensor, for instance by taking its norm, an inner product or a full tensor contraction, you get a scalar. This needs to be differentiable, i.e. have gradients attached. However, it also needs to be comparable to other scalars: if it is a loss, for instance, is it higher than a certain threshold?
This might be a technical challenge to get right, but it must be addressed.
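NumPy shows the behaviour I would expect on the "comparable" side (it has no gradients, so the differentiable side is not covered here). In this small sketch, a full contraction yields a zero-dimensional result that can be compared directly with a Python float.

```python
import numpy as np

T = np.arange(6, dtype=float).reshape(1, 2, 3)

# Contracting every axis of T with itself leaves no free axis,
# so the result is zero-dimensional.
inner = np.tensordot(T, T, axes=T.ndim)
order = np.ndim(inner)      # 0

# The norm is the square root of that inner product.
norm = np.sqrt(inner)

# A zeroth-order result should compare naturally with plain scalars.
threshold = 10.0
below_threshold = bool(norm < threshold)
```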
Data: loading and augmentation
Data loading and processing, as well as augmentation, should be first-class citizens in any serious framework, not an afterthought, as is sometimes the case.
In particular, if the framework has an ndarray structure as described above, a lot of the processing can be done directly on the target device. What we do not want is a pipeline where the image is loaded, for instance, with OpenCV, processed on CPU, converted to NumPy and finally to the target framework…
The other aspect is multi-processing: rather than having separate classes for separate cases, a single parameter could control the number of threads (think n_jobs in Scikit-Learn). Again, this makes the code more flexible and clearer.
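A minimal sketch of that idea, using only the Python standard library. The names load_batch and fake_load are made up for this example, and the n_jobs parameter only borrows its name from Scikit-Learn: it does not implement all of Scikit-Learn's conventions (such as negative values).

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(paths, load_fn, n_jobs=1):
    """Apply `load_fn` to every path, using `n_jobs` worker threads."""
    if n_jobs == 1:
        # Plain sequential loading, no pool overhead.
        return [load_fn(p) for p in paths]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Executor.map returns results in the same order as `paths`.
        return list(pool.map(load_fn, paths))

def fake_load(path):
    # Stand-in for real decoding and augmentation work.
    return len(path)

paths = ["a.png", "bb.png", "ccc.png"]
sequential = load_batch(paths, fake_load, n_jobs=1)
parallel = load_batch(paths, fake_load, n_jobs=3)
```

The calling code is identical in both cases; only the value of n_jobs changes.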
Whilst there might be imprecisions in the list above, the goal is mainly to collect thoughts towards designing better frameworks for tensor methods and deep learning, to allow us to focus on the algorithms without having lower level considerations in the way.
I will update this list as I go but I am curious to get your thoughts, so don’t hesitate to leave a comment!