Inference speed

#5
by omarabb315 - opened

I tried to run the model on a T4 GPU but couldn't get the output in under 0.5 seconds, even though the output is short (about 20 to 30 tokens).
