It's always bugged me that AF_UNIX addresses have a relatively short length limit. If you look at the actual kernel ABI, you'll see that the sockaddr_un structure can be made as long as necessary. The kernel could support long unix socket addresses without new APIs.
I also don't like abstract sockets at all. They're vulnerable to squatting attacks. The nice thing about non-abstract unix sockets is that they take advantage of the filesystem permissions that everyone already understands. When possible, I prefer putting things into a single namespace.
Unix sockets (of the non-abstract sort) also need a way to atomically unlink an existing socket and bind in its place. Doing that as two operations is pretty common, but it's racy.
> It's always bugged me that AF_UNIX addresses have a relatively short length limit.
I agree, it's not pretty. Some unixes have an extremely short limit, so short that you can't really use absolute paths portably except somewhere like /tmp.
But there is a workaround.
The path doesn't have to be absolute, so if you chdir(2) first and use a relative name with bind() and connect(), that gets you a socket in any directory.
(See also bindat() and connectat() in FreeBSD).
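For what it's worth, here is a rough Python sketch of the chdir() approach. The helper names (bind_in_dir, connect_in_dir) are made up for illustration, and it assumes nothing else in the process depends on the working directory while the call runs, since chdir() is process-wide.

```python
import os
import socket


def _in_dir(directory, action):
    """Run action() with the working directory temporarily set to directory."""
    saved = os.open(".", os.O_RDONLY)  # handle on the current directory
    try:
        os.chdir(directory)
        try:
            return action()
        finally:
            os.fchdir(saved)  # always go back, even if action() raised
    finally:
        os.close(saved)


def bind_in_dir(directory, name, backlog=16):
    """Bind and listen on directory/name, passing only the short name to bind()."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        _in_dir(directory, lambda: sock.bind(name))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock


def connect_in_dir(directory, name):
    """Connect to directory/name, passing only the short name to connect()."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        _in_dir(directory, lambda: sock.connect(name))
    except OSError:
        sock.close()
        raise
    return sock
```

In a threaded program you would want a lock around _in_dir, or one of the other workarounds, because every thread sees the directory change.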
If you don't want to chdir(), you can make a temporary symbolic link in /tmp to your directory of choice for bind(), and directly to the socket for connect().
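A sketch of the symlink variant, again in Python with hypothetical helper names. One assumption of mine that goes beyond the comment: the link is created inside a fresh private directory from tempfile.mkdtemp() rather than directly in /tmp, so that another user cannot swap the link out. It also assumes the temp directory path itself is short.

```python
import os
import socket
import tempfile


def _short_link(target):
    """Create a symlink to target inside a fresh private temp directory."""
    link_dir = tempfile.mkdtemp(prefix="sk")  # created with mode 0700
    link = os.path.join(link_dir, "l")
    os.symlink(os.path.abspath(target), link)
    return link_dir, link


def _cleanup(link_dir, link):
    os.unlink(link)
    os.rmdir(link_dir)


def bind_via_symlink(directory, name, backlog=16):
    """Bind at directory/name through a short link that points at the directory."""
    link_dir, link = _short_link(directory)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(os.path.join(link, name))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    finally:
        _cleanup(link_dir, link)  # the link is only needed during bind()
    return sock


def connect_via_symlink(socket_path):
    """Connect through a short link that points directly at the socket."""
    link_dir, link = _short_link(socket_path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(link)
    except OSError:
        sock.close()
        raise
    finally:
        _cleanup(link_dir, link)
    return sock
```

The asymmetry matters: bind() needs the final path component to not exist yet, so the link has to stand in for the directory, while connect() is happy to follow a link to the socket itself.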
If the last path component is too long for bind(), renaming afterwards may work (or hard linking on the same filesystem then unlinking).
All of the above assumes that AF_UNIX sockets are indexed by the inode on the filesystem, so that different paths, rename, etc. work. I'm not sure that is true on every old and weird kernel, if you're going for high portability.
> Unix sockets (of the non-abstract sort) also need a way to atomically unlink an existing socket and bind in its place. Doing that as two operations is pretty common, but it's racy.
To atomically unlink an existing socket and bind another in place, without races, you should be able to first bind a new socket to a temporary name in the same directory, and then use rename(2) to atomically replace the existing name.
There may be queued connections that need accept(2) in the old socket after the rename, but that's not a race condition.
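A minimal sketch of the bind-then-rename idea in Python. The function name and the temporary naming scheme are my own choices for illustration. It calls listen() before the rename so that clients never find a bound socket that isn't accepting yet.

```python
import os
import socket


def replace_socket(path, backlog=16):
    """Bind a new listening socket and atomically move it over path."""
    directory, name = os.path.split(path)
    # Hypothetical naming scheme: a hidden per-process name in the same directory,
    # so that rename() stays on one filesystem.
    tmp = os.path.join(directory or ".", ".%s.%d.tmp" % (name, os.getpid()))
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(tmp)
    except OSError:
        sock.close()
        raise
    try:
        sock.listen(backlog)
        os.rename(tmp, path)  # atomically replaces whatever was at path
    except OSError:
        sock.close()
        os.unlink(tmp)
        raise
    return sock
```

The old server keeps its file descriptor and can still accept() anything that was already queued; new connections to the path go to the new socket.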
My complaint wasn't detailed enough. What I really want is a way to atomically rebind a socket if nobody else is listening on that socket: the stale process cleanup case. I want the bind to fail if the socket is live. Does that make sense? Haven't had enough coffee yet today.
It makes sense but I don't think it actually gains you anything because of a subtlety around races.
If you want the rebind to fail only when the socket is bound by another process, it may be that the other process is about to close the socket, in which case your atomic rebind would fail and leave the socket bound by neither process.
That outcome is identical to the outcome from the available method, where you attempt connect() first and only bind-and-rename if the connect() fails with an appropriate error. When there's no race, you always get the desired outcome of the socket bound by exactly one process, but if the processes are racing, the result can be the socket not bound by either process.
Because both methods produce the same outcome in the race case, and the only difference is an unobservable difference in each process' logical clocks (you can't tell which events really happened first), both methods actually have the same race properties.
If you want to always end up with exactly one server running, and to always avoid taking away the socket from an existing server which is running just fine, I think you need to involve some protocol. You can try connect() and then if it succeeds, send a request "are you shutting down?". An appropriate type of connect() error or "yes" means it's safe to bind a new socket and rename over, otherwise leave the socket alone.
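Roughly, in Python, the probe-then-replace logic might look like the sketch below. The wire format (a "SHUTTING_DOWN?" line answered with "yes") is entirely made up for illustration, as are the function names. Anything ambiguous, such as a timeout, is treated as "leave the socket alone".

```python
import os
import socket


def server_is_alive(path, timeout=1.0):
    """Return False only when it looks safe to take over the socket at path."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(path)
        probe.sendall(b"SHUTTING_DOWN?\n")  # hypothetical protocol request
        return probe.recv(16).strip() != b"yes"
    except (ConnectionRefusedError, FileNotFoundError):
        return False  # stale socket file, or nothing there at all
    except OSError:
        return True   # timeout or anything unclear: be conservative
    finally:
        probe.close()


def claim(path, timeout=1.0, backlog=16):
    """Return a new listening socket at path, or None if a live server owns it."""
    if server_is_alive(path, timeout):
        return None
    directory, name = os.path.split(path)
    tmp = os.path.join(directory or ".", ".%s.%d.tmp" % (name, os.getpid()))
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(tmp)
    except OSError:
        sock.close()
        raise
    try:
        sock.listen(backlog)
        os.rename(tmp, path)  # atomic takeover of the name
    except OSError:
        sock.close()
        os.unlink(tmp)
        raise
    return sock
```

Two newcomers racing each other can still both decide the old server is gone and rename over one another, so this only addresses the case of not stealing the socket from a healthy server.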
Abstract sockets sound, to me, kind of like an attempt to retrofit microkernel-like ports back onto UNIX domain sockets for the lack of a better mechanism. It might be worthwhile to consider giving up on that idea entirely and introduce a new IPC primitive (that must be compatible with D-Bus for it to find any kind of widespread adoption).
You inherit all the weirdness from UNIX domain sockets, such as having the semantics of SOCK_STREAM, SOCK_DGRAM or SOCK_SEQPACKET (or finding out that your OS of choice doesn't support the one you wanted) depending on which one you picked.
Abstract sockets have no permissions. Not having to deal with the filesystem at all is nice, but having absolutely no security model whatsoever also seems dangerous.
You have the ugly issue of having to deal with handling zeroes in your name, as noted in the article. This can and will break something, somewhere on the first attempt of writing it.
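To make the zero-handling point concrete, here is a small Linux-only Python sketch (bind_abstract is a made-up helper). In the abstract namespace the leading NUL selects the namespace, and every byte after it counts, including trailing NULs, so two names that look identical as C strings are different addresses.

```python
import socket


def bind_abstract(name: bytes):
    """Bind a datagram socket to an abstract-namespace name (Linux only)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # Leading NUL = abstract namespace; the rest is the name, byte for byte.
        sock.bind(b"\0" + name)
    except OSError:
        sock.close()
        raise
    return sock
```

With this, bind_abstract(b"demo") and bind_abstract(b"demo\0") both succeed and coexist, which is exactly the kind of thing that goes wrong when one side pads the address with zeroes and the other doesn't.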
When you want microkernel IPC, you usually want a message-oriented reply-and-receive primitive to build your main loop around. Emulating this with sockets is error-prone since send(2) and recv(2) make extremely weak promises about their behavior on error. sendmsg(2) and recvmsg(2), which are necessary to pass kernel objects (i.e. file descriptors) around, are very difficult to use.
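For a sense of what file descriptor passing involves, here is a sketch using Python's wrappers around sendmsg(2) and recvmsg(2). The helper names are mine, and it handles exactly one descriptor per message. Even at this level you have to size the ancillary buffer, check for truncation, and pick the right control message out of the list.

```python
import array
import socket


def send_fd(sock, fd, payload=b"x"):
    """Send one file descriptor plus a small payload over an AF_UNIX socket."""
    fds = array.array("i", [fd])
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=64):
    """Receive a payload and exactly one file descriptor."""
    fds = array.array("i")
    # Ancillary buffer sized for a single descriptor.
    data, ancdata, flags, _ = sock.recvmsg(bufsize, socket.CMSG_LEN(fds.itemsize))
    if flags & socket.MSG_CTRUNC:
        raise RuntimeError("ancillary data was truncated")
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if len(fds) != 1:
        raise RuntimeError("expected exactly one file descriptor")
    return data, fds[0]
```

The receiver owns the new descriptor and has to close it; forgetting that on an error path is a common way to leak descriptors.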
Sockets are nice because you can select(2) etc. on them, however. I'd expect a replacement would interoperate smoothly with at least those system calls.
All of those communication modes are actually useful. Seqpacket works especially well. You can build whatever high level facilities you want on top of them: I have several times. Ease of use at the raw system call level is not relevant: application developers shouldn't be working at that level anyway.
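As a quick illustration of why seqpacket is pleasant to build on, here is a Python sketch (exchange is a made-up name) that assumes a platform where AF_UNIX supports SOCK_SEQPACKET, such as Linux. Each send() arrives as its own message, and the connection still gives you a clean end-of-stream when the peer closes.

```python
import socket


def exchange(messages):
    """Send each message over a SOCK_SEQPACKET pair and return what arrives."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    try:
        for message in messages:
            a.send(message)      # one send() == one record
        a.close()                # peer sees end-of-stream after queued records
        received = []
        while True:
            record = b.recv(4096)
            if not record:       # empty read: the sender closed
                break
            received.append(record)
        return received
    finally:
        a.close()
        b.close()
```

Calling exchange([b"hello", b"world!"]) gives back the two records separately rather than one merged byte stream, so there is no framing layer to write. The caveat is that an empty record is indistinguishable from end-of-stream here, so a real protocol would avoid zero-length messages.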
If you want object IPC with a reply and receive operation, you can use Android's binder, which is already in mainline.
> you can use Android's binder, which is already in mainline
This sounds interesting, and as a regular Linux dev, I’ve never heard of it. Is there a tutorial for this that doesn’t assume you’re coming from an Android background?
> It's always bugged me that AF_UNIX addresses have a relatively short length limit. If you look at the actual kernel ABI, you'll see that the sockaddr_un structure can be made as long as necessary. The kernel could support long unix socket addresses without new APIs.
There is no limit except the typical file path limit. The kernel does support longer paths. It will obey the third, socklen_t, parameter to both connect(2) and bind(2). The size of .sun_path from the kernel's perspective extends to the end of the sockaddr structure declared by the socklen_t parameter.
This applies not only to Linux but all Unix-like systems except, IIRC, Minix.
The flip side is that you have to check the return values of accept(2), getsockname(2), and getpeername(2) for truncation. The full path of the socket may not fit in the size of the structure you pass.