Learn to use pdb, the command-line Python debugger. It is useful when the VS Code debugger does not work, and it lets you capture stack traces and other information as plain text, which is handy for sharing in forums etc. when you are asking for help.
Common pdb commands:
- b filename:lineno will set a breakpoint
- c will continue
- p variable will print a variable
- q will quit
- bt or w will print the stack trace. Python's stack trace is reversed: the topmost line in the stack trace is the frame at the bottom of the call stack.
- u or d will move up or down the call stack. Python's convention is the opposite of what you may expect, so u actually moves down the call stack (but is consistent with the bt output).
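Besides launching a script under pdb with python -m pdb (as we do below), you can also drop into the debugger from inside your own code. A minimal sketch (function names are just for illustration):

# minimal sketch: pausing execution inside your own code
import pdb

def buggy(x):
    pdb.set_trace()   # execution pauses here with a (Pdb) prompt
    return x + 1

# Python 3.7+ also provides the built-in breakpoint(), which does the same thing
def also_buggy(x):
    breakpoint()
    return x + 1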
Sample debug session
Below is the output of a sample debug session to illustrate how to use pdb. It actually illustrates two things: how to use pdb, and what per_device_train_batch_size does. The source code comment is really helpful (sarcasm intended): The batch size per GPU/TPU core/CPU for training.
Objective: find out what per_device_train_batch_size does.
Hypothesis: when we train a model, we pass it a training dataset. Think of it as a dataframe whose rows are the examples used for training; these are essentially the data points to which we are trying to fit the model. The hypothesis is that per_device_train_batch_size selects M out of N data points in each iteration of the forward/backward pass, where N is the total number of rows (examples) in the dataset and M = per_device_train_batch_size.
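In plain Python terms, the hypothesis is that the trainer walks over the N rows in chunks of M. A toy illustration (not the actual Trainer implementation):

# toy illustration of the hypothesis, not the actual Trainer code
N = 8   # total rows (examples) in the training dataset
M = 3   # per_device_train_batch_size
rows = list(range(N))
batches = [rows[i:i + M] for i in range(0, N, M)]
print(batches)   # [[0, 1, 2], [3, 4, 5], [6, 7]] -> one forward/backward pass per batch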
How will we test the hypothesis: by setting a breakpoint at /transformers/models/gpt2/modeling_gpt2.py:770 and monitoring the size of input_ids as per_device_train_batch_size is varied. If you are using a different model, set the breakpoint accordingly.
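If you are not sure where transformers is installed on your machine (the site-packages path will differ from the one shown below), one way to find it is:

python3 -c "import transformers, os; print(os.path.dirname(transformers.__file__))"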
Begin by launching a pdb session:
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python \
python3 -m pdb fine-tune.py \
--model gpt2 \
--file-type jsonl \
--file quotes.jsonl \
--data-key quote \
--per-device-train-batch-size 1 \
--gradient-accumulation-steps 1 \
--debug
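As a rough orientation, a script like fine-tune.py typically boils down to something like the sketch below. This is a hypothetical, simplified version (argument parsing omitted), not the author's actual script; the stack trace later in this section shows the real script also wraps the model with peft/LoRA.

# hypothetical, simplified sketch of what a script like fine-tune.py does
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")
# (the real script also wraps the model with peft/LoRA; omitted here for brevity)

data = load_dataset("json", data_files="quotes.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["quote"]), batched=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,     # the parameter we are investigating
    gradient_accumulation_steps=1,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()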
We don't list the actual source code for fine-tune.py, but it is basically doing the same thing as this. Running the above command gets us to:
> /Users/siddjain/llm/fine-tune.py(2)<module>()
-> import torch
(Pdb)
Now set the breakpoint using b:
(Pdb) b /Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:770
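A couple of related forms of b are worth knowing (this is an aside, not part of the original session): b with no arguments lists the current breakpoints, a trailing condition makes a breakpoint conditional, and cl clears a breakpoint by number:

(Pdb) b
(Pdb) b /Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py:770, input_ids is not None
(Pdb) cl 2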
Let the program run (continue) by typing c. It will break at:
(Pdb) c
...
> /Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py(770)forward()
-> output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
(Pdb)
To see the call stack we can type w:
(Pdb) w
/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/bdb.py(580)run()
-> exec(cmd, globals, locals)
<string>(1)<module>()
/Users/siddjain/llm/fine-tune.py(154)<module>()
-> fine_tune(trainer)
/Users/siddjain/llm/fine-tune.py(149)fine_tune()
-> trainer.train()
/Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/trainer.py(1536)train()
-> return inner_training_loop(
/Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/trainer.py(1801)_inner_training_loop()
-> tr_loss_step = self.training_step(model, inputs)
/Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/trainer.py(2646)training_step()
-> loss = self.compute_loss(model, inputs)
/Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/trainer.py(2671)compute_loss()
-> outputs = model(**inputs)
/Users/siddjain/llm/.env/lib/python3.9/site-packages/torch/nn/modules/module.py(1501)_call_impl()
-> return forward_call(*args, **kwargs)
/Users/siddjain/llm/.env/lib/python3.9/site-packages/peft/peft_model.py(849)forward()
-> return self.base_model(
/Users/siddjain/llm/.env/lib/python3.9/site-packages/torch/nn/modules/module.py(1501)_call_impl()
-> return forward_call(*args, **kwargs)
/Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py(1076)forward()
-> transformer_outputs = self.transformer(
/Users/siddjain/llm/.env/lib/python3.9/site-packages/torch/nn/modules/module.py(1501)_call_impl()
-> return forward_call(*args, **kwargs)
> /Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py(770)forward()
-> output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
(Pdb)
The call stack is reversed; that is just how Python displays it. In VS Code you would see /Users/siddjain/llm/.env/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py(770)forward() at the top of the call stack.
Let’s inspect the size of input_ids using p:
(Pdb) p input_ids.size()
torch.Size([1, 14])
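As an aside, p and pp (pretty-print) accept arbitrary expressions evaluated in the current frame; outputs are omitted here since they are not needed for the hypothesis:

(Pdb) p input_ids.dtype
(Pdb) pp self.config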
The first dimension is the batch dimension, and a value of 1 matches our hypothesis for per_device_train_batch_size = 1 (the 14 is just the token length of this particular example). We should also check how many examples there are in our training dataset. To do that we have to go "up" the call stack; we are using Python terminology here, which is the opposite of the usual convention, so in reality we are going down the call stack:
(Pdb) u 10
> /Users/siddjain/llm/fine-tune.py(149)fine_tune()
-> trainer.train()
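To re-orient yourself after moving frames, l (list) or ll (longlist) prints the source around the current line of the frame you landed in (output omitted here):

(Pdb) ll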
Now let's see how many training examples we have:
(Pdb) p data
Dataset({
features: ['text', 'input_ids', 'attention_mask'],
num_rows: 2508
})
Good. This is the number of quotes in the training data. You can verify it here.
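Since data is a Hugging Face datasets.Dataset, len() reports the same row count, so a quicker check is:

(Pdb) p len(data)
2508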
We will abort the program now (by typing q) and re-run with a different value of per-device-train-batch-size, repeating the steps above (the re-run command is shown below). We leave this as an exercise, but I was able to verify that changing per-device-train-batch-size from 1 to 10 and then 100 changed the first dimension of input_ids.size() to 10 and 100 respectively. That confirms our hypothesis, and we learned how to use pdb along the way!
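For reference, the re-run for this exercise is the same launch command with only the batch-size flag changed, for example:

PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python \
python3 -m pdb fine-tune.py \
--model gpt2 \
--file-type jsonl \
--file quotes.jsonl \
--data-key quote \
--per-device-train-batch-size 10 \
--gradient-accumulation-steps 1 \
--debug

At the breakpoint, p input_ids.size() should then report 10 as its first dimension.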