
No, you're wrong about PyTorch. If your custom op is a combination of existing ops, you don't need to define a custom backward pass. This is true for any DL framework with autodiff. For more details, see this answer [1].
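
As a concrete illustration, here's a minimal sketch (the swish function is just an example composite op, not taken from the linked answer):

    import torch

    # A "custom op" built only from existing differentiable ops:
    # autograd records each primitive, so no custom backward is needed.
    def swish(x):
        return x * torch.sigmoid(x)

    x = torch.randn(4, requires_grad=True)
    swish(x).sum().backward()   # gradient computed automatically
    print(x.grad)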

Regarding the claim that the PyTorch update-step code is more verbose, compare:

In Tensorflow:

    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_logits)
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
    sess.run(optimizer)

In PyTorch:

    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    loss = nn.CrossEntropyLoss()(output, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In my opinion, PyTorch makes the parameter update process a lot easier to understand, control, and modify if needed. For example, what if you want to modify gradients right before the weight update? In PyTorch I'd do it right in my own code, after the loss.backward() call (see the sketch below), while in TF I'd have to modify the optimizer code. Which option would you prefer?
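
A minimal sketch of that, reusing the names from the PyTorch snippet above; the gradient clipping and rescaling here are just illustrative choices:

    optimizer.zero_grad()
    loss = nn.CrossEntropyLoss()(output, label)
    loss.backward()

    # Gradients now live in p.grad; modify them before the update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # illustrative clipping
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_(0.5)  # illustrative rescaling

    optimizer.step()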

[1] https://stackoverflow.com/questions/44428784/when-is-a-pytor...


