To manage the memory use and performance of a generative AI model, implement the following code:
 
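The code above uses four techniques: gradient checkpointing (recomputing activations during the backward pass instead of storing them), inference mode (running the model without gradient tracking), cache clearing, and variable management (deleting references to tensors that are no longer needed). Together they lower peak memory use, which makes it easier to handle large models on limited hardware.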
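The four techniques named above can be sketched as follows. This is a minimal illustration that assumes PyTorch, not a reconstruction of the original listing; `TinyBlock`, `TinyModel`, `free_memory`, `train_step`, and `generate` are hypothetical names, and the tiny model stands in for a real generative model.

```python
import gc

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class TinyBlock(nn.Module):
    """Hypothetical stand-in for one block of a large generative model."""

    def __init__(self, dim: int = 64) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)


class TinyModel(nn.Module):
    """Hypothetical model: a stack of blocks with optional checkpointing."""

    def __init__(self, dim: int = 64, depth: int = 4) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([TinyBlock(dim) for _ in range(depth)])
        self.use_checkpointing = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.use_checkpointing and self.training:
                # Gradient checkpointing: skip storing this block's
                # activations and recompute them during the backward pass.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def free_memory() -> None:
    """Cache clearing: collect garbage, then release cached GPU memory."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def train_step(model: TinyModel, batch: torch.Tensor) -> float:
    model.train()
    model.use_checkpointing = True
    loss = model(batch).pow(2).mean()  # placeholder loss for the sketch
    loss.backward()
    value = loss.item()
    # Variable management: drop the reference so the graph can be freed.
    del loss
    free_memory()
    return value


def generate(model: TinyModel, batch: torch.Tensor) -> torch.Tensor:
    model.eval()
    # Inference mode: no autograd bookkeeping, so less memory and overhead.
    with torch.inference_mode():
        return model(batch)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyModel().to(device)
batch = torch.randn(8, 64, device=device)

loss_value = train_step(model, batch)
output = generate(model, batch)

del batch  # variable management: release the input once it is no longer used
free_memory()
```

Checkpointing pays for its memory savings with extra compute, so it is usually enabled only during training. `torch.cuda.empty_cache()` returns cached blocks to the GPU driver but does not free tensors that are still referenced, which is why the `del` statements come before the call to `free_memory()`.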