You can reduce latency in real-time applications that use language models such as GPT-3 or GPT-4 with the following techniques:
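- Batching: group concurrent requests into a single forward pass, so requests spend less time waiting in a queue under load.
- Quantization: store model weights in lower precision (for example, 8-bit integers) to cut memory traffic and speed up inference.
- Hardware optimization: run inference on accelerators such as GPUs, using kernels tuned for that hardware.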
 
Used together, these techniques reduce response latency in real-time applications.
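
To make the batching idea concrete, here is a minimal sketch of a micro-batcher in Python. It assumes the model can process a list of prompts in one call. The names `MicroBatcher`, `batch_fn`, `max_batch_size`, and `max_wait_s` are hypothetical, and `fake_model` is only a stand-in for a real batched inference call, not an actual GPT API.

```python
import asyncio


class MicroBatcher:
    """Collects requests for a short window and runs them as one batch."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        # batch_fn is an assumed callable: list of prompts in, list of outputs out.
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, prompt):
        """Enqueue one prompt and wait for its result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self):
        """Worker loop: build a batch, call the model once, fan results out."""
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            items = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Keep collecting until the batch is full or the window closes.
            while len(items) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.batch_fn([prompt for prompt, _ in items])
            for (_, fut), out in zip(items, outputs):
                if not fut.done():
                    fut.set_result(out)


def fake_model(prompts):
    # Stand-in for a batched model call (assumption, not a real API).
    return [p.upper() for p in prompts]


async def main():
    batcher = MicroBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(p) for p in ["hello", "world"]))
    worker.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The `max_wait_s` window is the trade-off to tune: a longer window yields larger batches but adds waiting time to each request, so for real-time use it is usually kept to a few milliseconds.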