LLM in a flash: Efficient Large Language Model Inference with Limited Memory (huggingface.co)
kenna@lemmy.dbzer0.com to human centered computing@lemmy.dbzer0.com · 1 year ago
cross-posted to: hackernews@derp.foo