Since TensorLy was refactored to support multiple backends, adding a new one is fairly easy, so as a proof of concept I put together a PyTorch backend. There are most likely a few optimisations left to do, and some things could be done better, but all the tests pass. Here is a quick demonstration.

## Requirements

For the PyTorch backend you will need the master version of TensorLy as well as the master version of PyTorch.

We need PyTorch Variables to have a shape property, as added in this pull-request, so either install PyTorch from master or make the modification yourself!

## Tucker decomposition using SGD and autograd

Let's see how we can use TensorLy and the PyTorch backend to perform Tucker tensor decomposition via gradient descent.

First let's import all the necessary stuff:

```
>> import numpy as np
>> import torch
>> from torch.autograd import Variable
>> import tensorly as tl
>> from tensorly.tucker_tensor import tucker_to_tensor
>> from tensorly.random import check_random_state
Using mxnet backend.
```

Now let's switch to the PyTorch backend:

```
>> tl.set_backend('pytorch')
Using pytorch backend.
```

We fix the random seed for reproducibility:

```
>> random_state = 1234
>> rng = check_random_state(random_state)
```
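
As a minimal sketch of why fixing the seed helps, the snippet below uses plain NumPy. It assumes that, for an integer seed, `check_random_state` behaves like constructing a seeded `np.random.RandomState`; the variable names are illustrative only.

```python
import numpy as np

# Assumption: an integer seed passed to check_random_state behaves like
# building a seeded NumPy RandomState directly.
rng_a = np.random.RandomState(1234)
rng_b = np.random.RandomState(1234)

sample_a = rng_a.random_sample((2, 3))
sample_b = rng_b.random_sample((2, 3))

# Two generators created with the same seed produce the same draws,
# so the random tensor and the initial factors are repeatable across runs.
same = np.array_equal(sample_a, sample_b)
print(same)
```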

Define a random tensor which we will try to decompose. We wrap our tensors in Variables so we can backpropagate through them:

```
>> shape = [5, 5, 5]
>> tensor = Variable(tl.tensor(rng.random_sample(shape)), requires_grad=True)
```

Initialise a random Tucker decomposition of that tensor:

```
>> ranks = [5, 5, 5]
>> core = Variable(tl.tensor(rng.random_sample(ranks)), requires_grad=True)
>> factors = [Variable(tl.tensor(rng.random_sample((tensor.shape[i], ranks[i]))),
              requires_grad=True) for i in range(tl.ndim(tensor))]
```
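
For intuition about what the core and factors represent, here is a NumPy-only sketch of a Tucker reconstruction for a third-order tensor: the core is multiplied along each mode by the corresponding factor matrix. The helper `tucker_reconstruct_3d` is a hypothetical name used for illustration, not part of TensorLy, and the sketch is not meant to mirror how `tucker_to_tensor` is implemented.

```python
import numpy as np


def tucker_reconstruct_3d(core, factors):
    """Hypothetical helper: rebuild a 3rd-order tensor from a core and 3 factors."""
    u0, u1, u2 = factors
    # rec[i, j, k] = sum over a, b, c of core[a, b, c] * u0[i, a] * u1[j, b] * u2[k, c]
    return np.einsum('abc,ia,jb,kc->ijk', core, u0, u1, u2)


rng = np.random.RandomState(1234)
shape, ranks = (5, 5, 5), (5, 5, 5)
core = rng.random_sample(ranks)
factors = [rng.random_sample((shape[i], ranks[i])) for i in range(3)]

rec = tucker_reconstruct_3d(core, factors)
print(rec.shape)
```

With identity matrices as factors the reconstruction is just the core itself, which is a handy sanity check.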

Now we just iterate through the training loop and backpropagate...

```
n_iter = 10000
lr = 0.00005
penalty = 0.1
optimizer = torch.optim.Adam([core]+factors, lr=lr)
for i in range(1, n_iter):
    # Important: do not forget to reset the gradients
    optimizer.zero_grad()
    # Reconstruct the tensor from the decomposed form
    rec = tucker_to_tensor(core, factors)
    # squared l2 loss
    loss = (rec - tensor).pow(2).sum()
    # squared l2 penalty on the factors of the decomposition
    for f in factors:
        loss = loss + penalty * f.pow(2).sum()
    loss.backward()
    optimizer.step()
    if i % 1000 == 0:
        rec_error = tl.norm(rec.data - tensor.data, 2)/tl.norm(tensor.data, 2)
        print("Epoch %s,. Rec. error: %s" % (i, rec_error))
```

You should see the reconstruction error go down:

    Epoch 1000,. Rec. error: 9.85501529153
    Epoch 2000,. Rec. error: 5.4266791947
    Epoch 3000,. Rec. error: 2.93432695168
    Epoch 4000,. Rec. error: 1.58708802561
    Epoch 5000,. Rec. error: 1.03465270384
    Epoch 6000,. Rec. error: 0.94976522999
    Epoch 7000,. Rec. error: 0.979246423375
    Epoch 8000,. Rec. error: 0.996610962433
    Epoch 9000,. Rec. error: 0.999994015288
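
To make the two quantities in the training loop concrete, here is a NumPy-only sketch of the penalised objective and of the relative reconstruction error. The function names `penalised_loss` and `relative_error` are hypothetical and chosen for illustration; the toy inputs are not taken from the run above.

```python
import numpy as np


def penalised_loss(rec, tensor, factors, penalty):
    """Squared l2 fit term plus a squared l2 penalty on each factor matrix."""
    fit = np.sum((rec - tensor) ** 2)
    reg = penalty * sum(np.sum(f ** 2) for f in factors)
    return fit + reg


def relative_error(rec, tensor):
    """l2 norm of the residual divided by the l2 norm of the target tensor."""
    return np.linalg.norm(rec - tensor) / np.linalg.norm(tensor)


# Toy example: an all-ones target and an all-zero reconstruction
tensor = np.ones((2, 2, 2))
rec = np.zeros((2, 2, 2))
factors = [np.ones((2, 2)) for _ in range(3)]

loss = penalised_loss(rec, tensor, factors, penalty=0.1)
err = relative_error(rec, tensor)
print(loss, err)
```

Note that a relative error of exactly 1 corresponds to an all-zero reconstruction, while a perfect reconstruction gives 0, so the value is easy to interpret when monitoring training.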

## What next?

This is very much a proof of concept, so there might be bugs. If you see something that can be improved, or if you have any suggestions, feel free to comment here or open an issue on the GitHub page.