This is a simple example of running and fine-tuning a large language model on the Wahab cluster. It uses the Mistral-7B-Instruct-v0.2 model from Hugging Face.
Start an interactive session on a GPU node:
salloc -c 8 -p gpu --gres gpu:1
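Once the allocation is granted, the GPU's visibility can optionally be confirmed with nvidia-smi:
nvidia-smi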
Load the PyTorch GPU container module:
module load container_env pytorch-gpu/2.2.0
Several libraries are required for running and fine-tuning, including transformers and bitsandbytes. First create a Python environment, then install the packages into it:
crun.pytorch-gpu -c -p ~/envs/llm
crun.pytorch-gpu -p ~/envs/llm pip install transformers accelerate bitsandbytes
crun.pytorch-gpu -p ~/envs/llm pip install trl peft # for fine tuning
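Before downloading a large model, it can be worth verifying that the GPU and libraries are visible inside the container. A minimal sketch (the file name check_env.py is just for illustration):
# check_env.py -- sanity-check the environment (illustrative)
import torch
import transformers

print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

crun.pytorch-gpu -p ~/envs/llm python check_env.py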
The model will be downloaded automatically the first time it is used through an Auto* class (AutoTokenizer/AutoModelFor*), or it can be downloaded manually with git:
crun.pytorch-gpu git clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
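Note that cloning the repository with git requires git-lfs to fetch the large weight files. Alternatively, the huggingface_hub package (installed as a dependency of transformers) can download the model; a small sketch:
# download_model.py -- alternative download via huggingface_hub (illustrative)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="Mistral-7B-Instruct-v0.2",  # directory name the scripts below expect
)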
Below is a Python script, hello.py, that loads the model and generates text from a prompt in an interactive loop.
# hello.py
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch
model_name = "Mistral-7B-Instruct-v0.2"  # local clone; use "mistralai/Mistral-7B-Instruct-v0.2" to download automatically
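# 4-bit NF4 quantization (with double quantization) so the 7B model fits
# comfortably within a single GPU's memory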
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
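# Wrap the quantized model and its tokenizer in a text-generation pipeline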
pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
device_map="auto"
)
prompt = "Hello world! and I want to die."
while True:
    sequences = pipe(
        prompt,
        do_sample=True,
        max_new_tokens=1024,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        num_return_sequences=1,
    )
    print(sequences[0]['generated_text'])
    prompt = input("Prompt: ")
Run the script with crun; the model loads and generates a response:
crun.pytorch-gpu -p ~/envs/llm/ python hello.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:03<00:00, 1.08s/it]
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Hello world! and I want to die. I know, I know, it's a bit extreme, but I've always wanted to say it. It's not that I want to die, it's just that I wish things were different. I wish I had a purpose, or at least a reason to keep living.
I've always felt like an outsider, like I don't belong. I've never really fit in anywhere. I've tried to make friends, but it never seems to work out. I've tried to find hobbies, but nothing really interests me. I've even tried to find a career, but I can't seem to find one that I'm passionate about.
I've always felt like I'm just going through the motions, living my life on autopilot. I get up, go to work, come home, and then do it all over again the next day. I feel like I'm just existing, not really living. And I don't know how to change that.
I've thought about ending it all, but I know that's not the answer. I just don't know what the answer is. I wish I had someone to talk to, someone who could help me figure things out. But I don't have anyone. I'm all alone in this world, and it's a very lonely place.
I guess all I can do is keep trying, keep searching for something that makes me feel alive. I don't know if I'll ever find it, but I'll keep looking. I'll keep hoping. And maybe, just maybe, I'll find something that makes it all worthwhile.
Until then, I'll just keep living, one day at a time. And I'll keep writing, because maybe, just maybe, someone out there will read my words and find comfort in knowing they're not alone.
So, if you're feeling lost, or lonely, or like you don't belong, know that you're not alone. We're all in this together, and we'll get through it, one day at a time.
And if you ever need to talk, I'll be here. I may not have all the answers, but I'll listen. And maybe, just maybe, together, we'll find the answers we're looking for.
Until next time,
Your Friendly Neighborhood Introvert
Prompt:
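As an aside, instruction-tuned Mistral models generally respond better when the prompt is wrapped in the model's chat template rather than passed as raw text. A minimal sketch using the tokenizer's built-in template support (the exact template string is model-dependent):
# chat_prompt.py -- wrap a user message in the model's chat template (illustrative)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Mistral-7B-Instruct-v0.2")
messages = [{"role": "user", "content": "Hello world!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # something like "<s>[INST] Hello world! [/INST]"
The resulting string can be passed to the pipeline above in place of the raw prompt.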
The script below, fine_tune.py, fine-tunes the model on the GLUE MRPC paraphrase dataset, using the same 4-bit quantization plus LoRA adapters (the QLoRA approach).
# fine_tune.py
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer
import torch
import peft
import datasets
model_name = "Mistral-7B-Instruct-v0.2"
fine_tuned_name = "Mistral-7B-Instruct-v0.2-mrpc-rcc"
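# Mistral's tokenizer ships without a pad token, so reuse the EOS token and
# pad on the right for training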
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config,
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
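# LoRA configuration: train rank-64 adapters on the attention and gate
# projection layers rather than updating the full model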
peft_config = peft.LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
task_type="CAUSAL_LM",
target_modules=["q_proj", "k_proj", "v_proj", "o_proj","gate_proj"]
)
model = peft.prepare_model_for_kbit_training(model)
model = peft.get_peft_model(model, peft_config)
model.config.use_cache=False
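# Optional sanity check (a built-in PEFT helper): report how few parameters
# the adapters train relative to the full model
model.print_trainable_parameters()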
def formatting_func(example):
    text = f"### Question: {example['sentence1']}\n ### Answer: {example['sentence2']}"
    return [text]
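# Each MRPC record provides 'sentence1' and 'sentence2'; the function above turns
# one record into a single "### Question: ... ### Answer: ..." training string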
dataset = datasets.load_dataset('glue', 'mrpc', split='train')
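# Training hyperparameters; paged_adamw_32bit is the paged optimizer from the
# QLoRA recipe and helps avoid out-of-memory errors from optimizer-state spikes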
training_arguments = TrainingArguments(
output_dir=fine_tuned_name,
num_train_epochs=10,
per_device_train_batch_size=4,
gradient_accumulation_steps=1,
optim="paged_adamw_32bit",
save_steps=25,
logging_steps=25,
learning_rate=2e-4,
weight_decay=0.001,
fp16=False,
bf16=False,
max_grad_norm=0.3,
max_steps=-1,
warmup_ratio=0.03,
group_by_length=True,
lr_scheduler_type="constant",
report_to="none",
gradient_checkpointing=True,
gradient_checkpointing_kwargs={'use_reentrant':True},
)
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=peft_config,
max_seq_length=1024,
tokenizer=tokenizer,
args=training_arguments,
formatting_func=formatting_func,
packing=False,
)
trainer.train()
trainer.model.save_pretrained(fine_tuned_name)
crun.pytorch-gpu -p ~/envs/llm/ python fine_tune.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:06<00:00, 2.01s/it]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3668/3668 [00:00<00:00, 3713.17 examples/s]
{'train_runtime': 147.0695, 'train_samples_per_second': 0.272, 'train_steps_per_second': 0.068, 'train_loss': 1.824334716796875, 'epoch': 10.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [02:27<00:00, 14.71s/it]
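Once training completes, the saved LoRA adapter can be loaded back on top of the quantized base model for inference. Below is a minimal sketch (the file name use_adapter.py and the prompt are illustrative):
# use_adapter.py -- run the fine-tuned adapter (illustrative)
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import peft
import torch

model_name = "Mistral-7B-Instruct-v0.2"
fine_tuned_name = "Mistral-7B-Instruct-v0.2-mrpc-rcc"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA weights saved by trainer.model.save_pretrained(...)
model = peft.PeftModel.from_pretrained(model, fine_tuned_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "### Question: The company reported strong quarterly results.\n ### Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))

crun.pytorch-gpu -p ~/envs/llm python use_adapter.py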