DistributedDataParallel non-floating point dtype parameter with requires_grad=False · Issue #32018 · pytorch/pytorch · GitHub
🐛 Bug Using DistributedDataParallel on a model that has at least one non-floating-point dtype parameter with requires_grad=False, with a WORLD_SIZE <= nGPUs/2 on the machine, results in the error "Only Tensors of floating point dtype can require gradients".
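A minimal reproduction sketch consistent with the description above, assuming at least two visible GPUs, the NCCL backend, and the older single-process multi-device DDP mode (WORLD_SIZE=1 with device_ids spanning two GPUs, i.e. WORLD_SIZE <= nGPUs/2). The module name, sizes, and port are illustrative, not taken from the issue itself.

```python
# Reproduction sketch (assumptions: 2+ GPUs, NCCL backend, older DDP
# single-process multi-device mode; names/sizes are illustrative).
import os
import torch
import torch.distributed as dist
import torch.nn as nn


class IntParamModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)
        # Non-floating-point parameter; requires_grad must be False because
        # only floating point tensors can require gradients.
        self.counts = nn.Parameter(
            torch.zeros(8, dtype=torch.long), requires_grad=False
        )

    def forward(self, x):
        return self.fc(x)


def main():
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=0, world_size=1)

    model = IntParamModel().cuda(0)
    # With WORLD_SIZE <= nGPUs/2 each process owns more than one GPU, so DDP
    # replicates the module across its devices; per the report, that
    # replication path trips over the integer parameter with
    # "Only Tensors of floating point dtype can require gradients".
    ddp = nn.parallel.DistributedDataParallel(model, device_ids=[0, 1])
    out = ddp(torch.randn(4, 8, device="cuda:0"))
    print(out.shape)

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```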
parameters() is empty in forward when using DataParallel · Issue
Torch 2.1 compile + FSDP (mixed precision) + LlamaForCausalLM
RuntimeError: Only Tensors of floating point and complex dtype can
Training on 16bit floating point - PyTorch Forums
Achieving FP32 Accuracy for INT8 Inference Using Quantization
Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First
Wrong gradients when using DistributedDataParallel and autograd
Cannot convert a MPS Tensor to float64 dtype as the MPS framework
Tensor data dtype ComplexFloat not supported for NCCL process