When training machine learning models, finding the perfect combination of hyper parameters can feel overwhelming however if you do it well, it can turn a good model into a great one! Hyper parameter sweeps help you find the best performing model for the least amount of compute or time spent training - think of them as your systematic approach to testing every variation to uncover the best result. In this tutorial, we’ll walk through training Llama 3.2, using Wandb (Weights and Biases) to run hyper parameter sweeps to optimize its performance and we’ll leverage Cerebrium to scale our experiments across serverless GPUs, allowing us to find the best-performing model faster than ever. If you would like to see the final version of this tutorial, you can view it on Github here. Read this section if you’re unfamiliar with sweeps.Documentation Index
Fetch the complete documentation index at: https://cerebrium-fix-make-entrypoint-docs-explicit.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Analogy: Pizza Topping Sweep
Forget about ML for a second. Imagine you’re making pizzas, and you want to discover the most delicious combination of toppings. You can change three things about your pizza: • Type of Cheese (mozzarella, cheddar, parmesan) • Type of Sauce (tomato, pesto) • Extra Topping (pepperoni, mushrooms, olives) There are 12 possible combinations of pizzas you can make. One of them will taste the best! To find out which pizza is the tastiest, you need to try all the combinations and rate them. This process is called a hyperparameter sweep. Your three hyperparameters are the cheese, sauce, and extra topping. If you do it one pizza at a time, it could take hours. But if you had 12 ovens, you could bake all the pizzas at once and find the best one in just a few minutes! If a kitchen is a GPU, then you need 12 GPUs to run each experiment to see which cookie is the best. The power of Cerebrium is the ability to run sweeps like this on 12 different GPUs (or 1,000 GPUs if you’d like) to get you the best version of a model fast.Setup Cerebrium
If you don’t have a Cerebrium account, you can run the following in your cli:- main.py - Our entrypoint file where our code lives.
- cerebrium.toml - A configuration file that contains all our build and environment settings Add the following pip packages near the bottom of your cerebrium.toml. This will be used in creating our deployment environment.
Setup Wandb
Weights & Biases (Wandb) is a powerful tool for tracking, visualizing, and managing machine learning experiments in real-time. It helps you log hyperparameters, metrics, and results, making it easy to compare models, optimize performance, and collaborate effectively with your team.- Sign up for a free account and then log in to your wandb account by running the following in your CLI.
- Key: WANDB_API_KEY
- Value: The value you copied from the Wandb website.

Training Script
To train with Llama 3.2, you’ll need:-
Model access permission:
- Visit the Llama 3.2 model page on Hugging Face
- Accept all permissions
-
Hugging Face token:
- Click your profile image (top right)
- Select “Access token”
- Create a new token if needed
- Add to Cerebrium Secrets:
- Key:
HF_TOKEN - Value: Your Hugging Face token
- Key:
- Click “Save All Changes”

requirements.txt file with these dependencies:
cerebrium.toml to include:
- The requirements.txt path
- Hardware requirements for training
- A 1-hour max timeout using
response_grace_period
main.py:
- This code sets up a fine-tuning pipeline for a Large Language Model (specifically Llama 3.2) using several modern training techniques:
- The function takes a dictionary of parameters for flexibility in training configurations - this is our hyper parameter sweep.
- We load a customer support dataset from Hugging Face and format the data into a chat template format
- We implement QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning.
- We use Weights & Biases (wandb) for experiment tracking logging results to our Wandb dashboard as they are available.
- At the end, we saves the final model to our Cerebrium volume and return a “success” message to show that the training was successful.
- Sets up the environment with required packages
- Deploys the training script as an endpoint
- Returns a POST URL (save this for later)
Hyperparameter Sweep
Let us create a run.py file that we will use to run locally. Put the following code in there:- Create a .env file and add your Inference API key from your Cerebrium Dashboard.
- Update the Cerebrium endpoint based on your project ID and the function name your wish to call. You will see we append this url with “?async=true”. This means its a fire-and-forgot request that can run up to 12 hours. You can read more here.
- We then define a Bayesian optimization sweep configuration that will search through different hyperparameters including:
- Learning rate (log uniform distribution between ~4.54e-5 and ~9.12e-4)
- Batch size (1, 2, or 4)
- Gradient accumulation steps (2, 4, or 8)
- LoRA parameters (r, alpha, and dropout)
- Maximum sequence length (512 or 1024)
- We create this sweep in the “Llama-3.2-Customer-Support” W&B project
- For each sweep iteration:
- We initialize a new W&B run
- Combines the sweep’s hyperparameters with fixed parameters (like model name and dataset)
- Sends the parameters to a Cerebrium endpoint for training that happens asynchronously.
- Logs the results back to W&B
- Run these combinations across 10 experiments (10 concurrent GPU’s is the limit on Cerebrium’s Hobby plan)


Next Steps
-
Export model:
- Copy to AWS S3 using Boto3
- Download locally using Cerebrium Python package
-
Quality assurance:
- Run CI/CD tests on model outputs
- Use Cerebrium’s webhook functionality
-
Deployment:
- Create inference endpoint
- Load model directly from Cerebrium volume