Looks really cool. I'm sad that I'll never know; Intel + Linux + non-VM excludes literally every computer I have access to. We're all AMD + Windows, and my only access to Intel Linux machines would be a cloud VM.
There are also the regular performance traces you can capture with wpr and friends. I don't think these provide function-level traces, and I don't think it's possible to get them that way (but I could be wrong). You just get sampled call stacks, which may or may not be enough for your needs.
In my experience on Windows you need to instrument applications to get function-level tracing.
Also, KVM, VMware (and maybe Xen?) allow PMU access too (but not VirtualBox). Both VMware ESXi and VMware Server/Workstation/Fusion allow PMU access; you just need to make sure it's enabled in the settings.
A quick way to check from inside the Linux guest whether PMU access is enabled:
dmesg | grep "Performance Events"
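If you want to script that check, here is a rough Python sketch that classifies the "Performance Events:" boot line. The function name pmu_status is made up for illustration, and the "software events only" / "no PMU driver" wording is an assumption based on what kernels typically print when no hardware PMU is visible; the exact text may differ between kernel versions.

```python
import re


def pmu_status(dmesg_text):
    """Classify the kernel's 'Performance Events:' boot line.

    Returns 'hardware', 'software-only', or 'unknown' (line not found).
    """
    for line in dmesg_text.splitlines():
        match = re.search(r"Performance Events:\s*(.*)", line)
        if not match:
            continue
        detail = match.group(1).lower()
        # Assumption: without a usable PMU the kernel reports something like
        # "no PMU driver, software events only".
        if "software events only" in detail or "no pmu driver" in detail:
            return "software-only"
        # Anything else is treated as a hardware PMU driver announcement.
        return "hardware"
    return "unknown"
```

You can feed it the output of subprocess.run(["dmesg"], capture_output=True, text=True).stdout. Note that reading dmesg may require root on some distributions, in which case the result will come back as 'unknown'.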
Edit: Oh, unfortunately VMware Fusion 12 (on Intel Macs) no longer exposes the performance counters to the VM, as it uses Apple's Hypervisor framework instead of its own kernel module.