Machine Learning Engineer Internship, Quantization - US Remote
Hugging Face
At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.
About the Role
Quantization is a technique that reduces the computational and memory costs of running inference by representing weights and activations with low-precision data types such as 8-bit integer (int8) instead of the usual 32-bit floating point (float32). It is a promising technique because it makes it possible to run and fine-tune LLMs on consumer-grade hardware with minimal performance degradation.
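To make the idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization; the helper names are purely illustrative and real schemes used in the Hugging Face ecosystem (e.g. bitsandbytes, GPTQ, AWQ) are considerably more sophisticated:

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization of float32 weights.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 weights to int8 values plus a single per-tensor scale."""
    scale = np.abs(weights).max() / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max absolute error:", np.abs(w - dequantize(q, scale)).max())
```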
This internship sits at the intersection of software engineering, machine learning engineering, and education. The focus will be on integrating new quantization methods into the Hugging Face ecosystem (transformers, accelerate, peft, diffusers), ...
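For context, quantization backends are typically exposed to users through a configuration object passed to `from_pretrained`. A minimal sketch using the existing bitsandbytes integration in transformers is shown below; the checkpoint name is only a placeholder, and a GPU plus the bitsandbytes package are assumed:

```python
# Illustrative only: loading a causal LM with 8-bit weights via the
# bitsandbytes integration in transformers.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",              # placeholder checkpoint
    quantization_config=quant_config,
    device_map="auto",                # let accelerate place layers on available devices
)
```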