1. Python backend
2. Web-based frontend
3. Integration with Microsoft's Speech Software Development Kit (SDK)
4. Proof of concept (POC): stream two minutes of text and display the output in the browser as it arrives
It is fine to simplify the architecture relative to the one shown in the provided image.
The budget is no more than $150.
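
To make the streaming requirement concrete, here is a rough sketch of how the Python backend might push text to the browser as it arrives. It is only an illustration under assumptions: it uses Server-Sent Events as the transport (one possible choice, not a requirement), and `fake_text_source` and `to_sse` are hypothetical names. The stub source stands in for whatever the Speech SDK would produce, so no SDK calls are shown.

```python
import time
from typing import Iterable, Iterator


def fake_text_source(text: str, words_per_chunk: int = 3) -> Iterator[str]:
    """Hypothetical stand-in for the speech service: yields a few words at a time."""
    words = text.split()
    for i in range(0, len(words), words_per_chunk):
        yield " ".join(words[i:i + words_per_chunk])


def to_sse(chunks: Iterable[str], max_seconds: float = 120.0) -> Iterator[str]:
    """Format each chunk as a Server-Sent Events message.

    Stops after max_seconds (two minutes by default, per the POC scope)
    and always finishes with a "done" event so the browser can close the stream.
    """
    deadline = time.monotonic() + max_seconds
    for chunk in chunks:
        if time.monotonic() >= deadline:
            break
        # An SSE message is a "data:" line followed by a blank line.
        yield f"data: {chunk}\n\n"
    yield "event: done\ndata: end\n\n"


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    for message in to_sse(fake_text_source(sample)):
        print(message, end="")
```

In a real build, a web framework would return the `to_sse(...)` generator as a `text/event-stream` response, and the frontend could read it with the browser's `EventSource` API and append each message to the page.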