There is another major architectural issue that I would like to draw special attention to: requested resources are not allocated before provisioning, so the quota can be exceeded by requests that arrive within a short time of each other.
Quota usage is recorded only after the resources are provisioned. As a result, a large number of simultaneous requests can significantly exceed the allocated quotas. For example, a relatively small number of users (e.g. one hundred) can each, simultaneously or within a short time of each other, submit a request to provision one small virtual machine.
The quota validation workflow checks each request individually, so each small request passes the quota check even though all the requests in aggregate can significantly exceed the limits.
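The behaviour described above can be illustrated with a minimal sketch. It is a simplified model, not the product's actual workflow: the names `Tenant`, `passes_quota_check` and `submit_burst`, and the quota of 10 virtual machines, are assumptions made up for the illustration. The only property taken from the description is that usage is counted after provisioning, so requests still in flight are invisible to the check.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    quota_vms: int            # hypothetical limit, chosen for the illustration
    provisioned_vms: int = 0  # updated only after provisioning completes


def passes_quota_check(tenant: Tenant, requested_vms: int) -> bool:
    # The check sees only resources that are already provisioned;
    # requests that were approved but not yet provisioned are not counted.
    return tenant.provisioned_vms + requested_vms <= tenant.quota_vms


def submit_burst(tenant: Tenant, request_count: int, vms_per_request: int = 1) -> int:
    # Phase 1: every request is validated before any provisioning has finished.
    approved = [
        vms_per_request
        for _ in range(request_count)
        if passes_quota_check(tenant, vms_per_request)
    ]
    # Phase 2: provisioning completes and usage is finally recorded.
    for vms in approved:
        tenant.provisioned_vms += vms
    return len(approved)


tenant = Tenant(quota_vms=10)
approved = submit_burst(tenant, request_count=100)
print(approved, tenant.provisioned_vms, tenant.quota_vms)  # 100 100 10
```

In this model all one hundred single-VM requests pass the check individually, and the tenant ends up with one hundred virtual machines against a quota of ten.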
How to reproduce:
Two users are enough to reproduce this problem. Each user submits a provisioning request. The second user must send the request before the virtual machines of the first user have been provisioned.
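The two-user sequence can be modelled as follows. This is only a sketch under assumptions: `QuotaLedger`, `two_user_scenario` and the limit of one virtual machine are hypothetical and do not correspond to any real API. The `reserve_on_request` flag contrasts the behaviour described here (usage counted after provisioning) with a variant that reserves the requested resources at the moment the request is accepted.

```python
class QuotaLedger:
    """Hypothetical quota bookkeeping, used only to illustrate the scenario."""

    def __init__(self, limit: int, reserve_on_request: bool):
        self.limit = limit
        self.reserve_on_request = reserve_on_request
        self.provisioned = 0  # resources that already exist
        self.reserved = 0     # resources approved but not yet provisioned

    def request(self, amount: int) -> bool:
        # With reserve_on_request=False, self.reserved always stays 0,
        # so in-flight requests are never taken into account.
        if self.provisioned + self.reserved + amount > self.limit:
            return False
        if self.reserve_on_request:
            self.reserved += amount
        return True

    def finish_provisioning(self, amount: int) -> None:
        if self.reserve_on_request:
            self.reserved -= amount
        self.provisioned += amount


def two_user_scenario(reserve_on_request: bool):
    ledger = QuotaLedger(limit=1, reserve_on_request=reserve_on_request)
    first_ok = ledger.request(1)   # first user asks for one VM
    second_ok = ledger.request(1)  # second user asks before the first VM exists
    for ok in (first_ok, second_ok):
        if ok:
            ledger.finish_provisioning(1)
    return first_ok, second_ok, ledger.provisioned


print(two_user_scenario(reserve_on_request=False))  # (True, True, 2)
print(two_user_scenario(reserve_on_request=True))   # (True, False, 1)
```

Without reservation both requests are approved and two virtual machines are provisioned against a limit of one; with reservation the second request is rejected. In a real system the check and the reservation would also have to be performed atomically (for example inside a lock or a database transaction), which this single-threaded sketch does not show.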