I am serving deep learning models on four 2080 Ti GPUs using Django, Gunicorn and Nginx. Most requests finish in around 200 ms, but a few take more than 2 s. This happens only occasionally, and I have not found a specific setting that reproduces it reliably. How can I fix this problem?
BTW, the QPS is just 1~2, so it is not the result of busy GPU/CPU usage.
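One way to narrow down where the extra time goes is to time each stage of a request separately and log the breakdown for the slow ones. The following is only a minimal sketch using the Python standard library. The names `StageTimer` and `handle_request`, the three-stage split (preprocess, inference, postprocess) and the 1 s threshold are all assumptions for illustration and are not taken from the actual service.

```python
import time
from contextlib import contextmanager

SLOW_THRESHOLD_S = 1.0  # assumed cutoff for a "slow" request


class StageTimer:
    """Hypothetical helper: accumulates wall-clock time per named stage."""

    def __init__(self):
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.stages.values())

    def slowest(self):
        # Name of the stage that consumed the most time.
        return max(self.stages, key=self.stages.get)


def handle_request(timer, preprocess, infer, postprocess, payload):
    """Hypothetical request path; the three callables stand in for real code."""
    with timer.stage("preprocess"):
        x = preprocess(payload)
    with timer.stage("inference"):
        y = infer(x)
    with timer.stage("postprocess"):
        result = postprocess(y)
    if timer.total() > SLOW_THRESHOLD_S:
        # Only slow requests are reported, so the log stays small at 1~2 QPS.
        print(f"slow request: {timer.stages}, slowest={timer.slowest()}")
    return result
```

If the slow requests consistently show the extra time in one stage, that points at the component to look into; if the stage times stay small while the client still sees 2 s, the delay is more likely outside the handler (for example in the worker or proxy layer).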