LLM would act as the only person/thing making a refund judgement based on only user input?
Easy answer is two LLMs. One that takes input from user and one that makes the decisions. The decision making llm is told the trust level of first LLM (are they verified / logged in / guest) and filters accordingly. The decision making llm has access to non-public data it will never share but will use.
Running two llms can be expensive today but won't be tomorrow.
> The decision making llm has access to non-public data it will never share but will use.
Yes, if you've already solved prompt injection as this implies, using two LLMs, one of which applies the solution, will also solve prompt injection.
However, if you haven't solved prompt injection, you have to be concerned that the input to the first LLM will produce output to the second LLM that itself will contain a prompt injection that will cause the second LLM to share data that it should not.
> Running two llms can be expensive today but won't be tomorrow.
Running two LLMs doesn't solve prompt injection, though it might make it harder through security by obscurity, since any successful two-model injection needs to create the prompt injection targeting the second LLM in the output of the first.
> LLM would act as the only person/thing making a refund judgement based on only user input?
> Easy answer is two LLMs.
I think the easier answer is add a human to the loop. Instead of employees having to reply to customer emails themselves... the LLM drafts a reply, which the employee then has to review, with the opportunity to modify it before sending, or choose not to send it all.
Reviewing proposed replies from an LLM is still likely to be less work than writing the reply by hand, so the employee can get through more emails than they would with manual replying. It may also have other benefits, such as a more consistent communication style.
Even if the customer commits a prompt injection attack, hopefully the employee notices it and refuses to send that reply.
Easy answer is two LLMs. One that takes input from user and one that makes the decisions. The decision making llm is told the trust level of first LLM (are they verified / logged in / guest) and filters accordingly. The decision making llm has access to non-public data it will never share but will use.
Running two llms can be expensive today but won't be tomorrow.