ML Research Engineer Internship, FineWeb - US Remote
Hugging Face
At Hugging Face, we’re on a journey to democratize good AI. We are building the fastest-growing platform for AI builders, with over 5 million users and 100k organizations who have collectively shared over 1M models, 300k datasets, and 300k apps. Our open-source libraries have more than 400k stars on GitHub.
About the Role
High-quality datasets are the foundation of strong LLMs, yet most labs releasing state-of-the-art models are vague about their pretraining data. At Hugging Face, we want to enable the whole community to build the best models by building and open-sourcing the finest datasets. FineWeb and FineWeb-Edu are examples of very strong, web-scale datasets we released this year, while also open-sourcing the distributed processing library datatrove.
During this internship, you will work alongside the FineWeb team and build the next generation of high-quality web data by running distributed data processing and ablating the data...