LLM would act as the only person/thing making a refund judgement based on only u...

dragonwriter · on Dec 8, 2023

> The decision making llm has access to non-public data it will never share but will use.

Yes, if you've already solved prompt injection as this implies, using two LLMs, one of which applies the solution, will also solve prompt injection.

However, if you haven't solved prompt injection, you have to be concerned that the input to the first LLM will produce output to the second LLM that itself will contain a prompt injection that will cause the second LLM to share data that it should not.

> Running two llms can be expensive today but won't be tomorrow.

Running two LLMs doesn't solve prompt injection, though it might make it harder through security by obscurity, since any successful two-model injection needs to create the prompt injection targeting the second LLM in the output of the first.

skissane · on Dec 8, 2023

> LLM would act as the only person/thing making a refund judgement based on only user input?

> Easy answer is two LLMs.

I think the easier answer is add a human to the loop. Instead of employees having to reply to customer emails themselves... the LLM drafts a reply, which the employee then has to review, with the opportunity to modify it before sending, or choose not to send it all.

Reviewing proposed replies from an LLM is still likely to be less work than writing the reply by hand, so the employee can get through more emails than they would with manual replying. It may also have other benefits, such as a more consistent communication style.

Even if the customer commits a prompt injection attack, hopefully the employee notices it and refuses to send that reply.

simonw · on Dec 8, 2023

Yeah, human in the loop is one of the best options we have for many potential prompt injection problems at the moment.

(Aside from prompt injection, I think putting a human in the loop before sending a message for other people to read is good manners anyway.)

simonw · on Dec 7, 2023

That can work provided not a single sentence of text from an untrusted source is passed to the decision making LLM.

I call that the Dual LLM pattern: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/