We seriously need a service that is as cheap and fast as the OpenAI/Anthropic APIs but allows us to run the various community fine-tuned versions of Mixtral and LLaMA 3 that are uncensored or less censored.
Such services already exist. I don't want to promote any in particular, but if you do some research on pay-as-you-go inference for e.g. Mixtral or LLaMA 3, you will find providers that offer an API and charge just cents for a given number of tokens, exactly as OpenAI does.
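To get a feel for what per-token billing means in practice, here is a rough sketch of the arithmetic. The function name `estimate_cost` and the prices are made up for illustration and are not quoted from any provider. Actual rates vary by provider and model, and input and output tokens are often priced differently.

```python
from decimal import Decimal


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  usd_per_million_input: Decimal,
                  usd_per_million_output: Decimal) -> Decimal:
    """Estimate the USD cost of one request under per-token billing."""
    million = Decimal(1_000_000)
    # Input (prompt) and output (completion) tokens are billed separately.
    input_cost = Decimal(prompt_tokens) * usd_per_million_input / million
    output_cost = Decimal(completion_tokens) * usd_per_million_output / million
    return input_cost + output_cost


# Hypothetical prices (assumption): $0.50 per million tokens in each direction.
cost = estimate_cost(1_200, 400, Decimal("0.50"), Decimal("0.50"))
print(f"Estimated cost: ${cost:.6f}")
```

With these assumed prices, a request of that size comes to a small fraction of a cent, so you only pay for what you actually send and receive.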