You can evaluate the quality of generated outputs using the following techniques:
- BLEU score[14]: used for code-generation outputs.
 
- ROUGE score: used to evaluate the quality of generated text summaries.
 
Here is the code reference:
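As an illustrative sketch only (not a full BLEU or ROUGE implementation), the n-gram overlap that both scores build on can be computed with the standard library. The function names `ngrams`, `clipped_precision`, and `rouge_n_recall` are hypothetical, whitespace tokenization is an assumption, and full BLEU additionally combines several n-gram orders and applies a brevity penalty, which this sketch omits.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n=1):
    """BLEU-style modified precision: clipped overlap / candidate n-grams."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count per n-gram (the "clipping").
    overlap = sum((cand & ref).values())
    return overlap / total


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped overlap / reference n-grams."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((cand & ref).values())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat is on the mat"
    print("unigram precision:", clipped_precision(candidate, reference, n=1))
    print("bigram precision:", clipped_precision(candidate, reference, n=2))
    print("ROUGE-1 recall:", rouge_n_recall(candidate, reference, n=1))
```

Precision asks how much of the generated text appears in the reference, while recall asks how much of the reference the generated text covers, which is why the first suits BLEU-style scoring and the second suits summary evaluation.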

Note that you can use the ROUGE score, perplexity, and human-aligned metrics such as coherence, sentiment, or relevance to the content.
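Perplexity can be sketched in a few lines, assuming you already have the natural-log probability that a language model assigned to each token of the output. The helper name `perplexity` and the input format are assumptions for illustration; how the log-probabilities are obtained depends on the model you use.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Hypothetical log-probabilities for a four-token output.
    log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
    print("perplexity:", perplexity(log_probs))
```

Lower perplexity means the model found the text less surprising; it measures fluency under the model rather than correctness against a reference.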
By applying these techniques, you can evaluate the quality of generated output.