What's in the RedPajama-Data-1T LLM training set
RedPajama is “a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens”. It’s a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, …
Open-Sourced Training Datasets for Large Language Models (LLMs)
LLM360, A true Open Source LLM
Exploring the training data behind Stable Diffusion
RedPajama Project: An Open-Source Initiative to Democratizing LLMs
The Latest Open Source LLMs and Datasets
Easily Train a Specialized LLM: PEFT, LoRA, QLoRA, LLaMA-Adapter
RedPajama-Data-v2: An open dataset with 30 trillion tokens for
Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens