Instruction tuning often generalizes better than typical task-specific supervised fine-tuning because it trains a model to follow instructions rather than just reproduce answers from a particular dataset.
In supervised fine-tuning, a model learns to map inputs to outputs from examples of one specific task. This can improve performance on that task, but the model becomes closely fitted to the training distribution and may struggle to generalize when the format, language, or context changes.
Instruction tuning is broader. The model is trained on a wide range of tasks, each presented as a natural language instruction. Rather than learning only task-specific patterns, it learns the underlying skill of understanding what the user wants and responding appropriately. This helps the model adapt to tasks it has not seen before.
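To make the contrast concrete, here is a minimal sketch of how a mixed-task dataset might be phrased as natural language instructions. The `Example` class, the `to_training_pair` helper, the prompt template, and the sample data are all hypothetical and chosen for illustration; real instruction-tuning datasets use their own templates.

```python
from dataclasses import dataclass


@dataclass
class Example:
    instruction: str  # what the model is asked to do, in plain language
    input_text: str   # the content the instruction applies to
    output: str       # the desired response


def to_training_pair(ex: Example) -> tuple[str, str]:
    """Render one example as a (prompt, target) pair using an assumed template."""
    prompt = f"Instruction: {ex.instruction}\nInput: {ex.input_text}\nResponse:"
    return prompt, " " + ex.output


# Different tasks share one format; only the instruction tells them apart.
MIXED_TASKS = [
    Example("Classify the sentiment as positive or negative.",
            "The plot dragged and the ending fell flat.", "negative"),
    Example("Translate the sentence into French.",
            "Good morning.", "Bonjour."),
    Example("Extract the city mentioned in the text.",
            "The conference takes place in Lisbon next spring.", "Lisbon"),
]

pairs = [to_training_pair(ex) for ex in MIXED_TASKS]

for prompt, target in pairs:
    print(prompt + target)
    print("---")
```

A task-specific fine-tuning set would instead contain only one kind of row, for example review text paired with a sentiment label, with no instruction field at all.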
For instance, a model fine-tuned only on sentiment analysis will likely work well as long as inputs arrive in the format it was trained on. An instruction-tuned model, because it has learned to read instructions rather than memorize task formats, can often handle a range of tasks such as summarization, classification, extraction, translation, or question answering.
Another benefit is improved few-shot and zero-shot performance. Because the model has practiced following a variety of instructions during training, it can generalize to new tasks with few or no additional examples.
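The difference between zero-shot and few-shot use comes down to what goes into the prompt at inference time. The sketch below assumes a simple hypothetical `build_prompt` helper and an illustrative "Input:/Output:" layout; neither belongs to any particular library.

```python
def build_prompt(instruction, query, demonstrations=()):
    """Build a prompt from an instruction, optional worked examples, and a query.

    With no demonstrations the prompt is zero-shot; with a few it is few-shot.
    """
    parts = [instruction]
    for demo_input, demo_output in demonstrations:
        parts.append(f"Input: {demo_input}\nOutput: {demo_output}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


INSTRUCTION = "Classify the sentiment as positive or negative."
QUERY = "The battery died within a day."

# Zero-shot: the instruction alone describes the task.
zero_shot = build_prompt(INSTRUCTION, QUERY)

# Few-shot: the same instruction plus two worked examples.
few_shot = build_prompt(
    INSTRUCTION,
    QUERY,
    demonstrations=[("Great screen.", "positive"), ("Arrived broken.", "negative")],
)

print(zero_shot)
print("======")
print(few_shot)
```

Either string would be sent to the model as-is; no weights are updated in either case.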