Yes, you'll be able to run a lightweight Wit runtime locally very soon. Learning will still happen on the server: the embedded client will upload its usage data to feed training, and download updated models (for both speech and natural-language understanding).
Any chance for those of us with lots of spare CPU and memory lying around to get hold of a standalone server package? This is out of an interest in not relying on third-party services, a compulsion for DIY, and a slight, completely illogical unease about sending personal training data to a potentially untrusted source.