Or how about the opposite? Give me a CLI tool to pipe implicitly-tabular, space-padded text into (a smart cut(1)) where I can say "give me column 3" and it analyzes the document as a whole (or at least a running sample of a dozen lines or so), models the correct column boundaries, and extracts the contents of that column. (That would also include trimming any space-padding off the content. I want the data, not a fixed-width field containing it!)
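To make that concrete, here is a rough sketch of the "pile of heuristics" version in Python. The function names `infer_columns` and `cut_column` are made up for illustration, and the only heuristic is "a column boundary is a character position that is blank in every sampled line", so it will wrongly split a column whose values all happen to have a space in the same spot. It also assumes tabs have already been expanded.

```python
def infer_columns(lines):
    """Guess column spans from a sample of space-padded lines.

    A position counts as "occupied" if any sampled line has a non-space
    character there; each maximal run of occupied positions is one column.
    Returns a list of (start, end) slices, end exclusive.
    """
    width = max((len(line) for line in lines), default=0)
    occupied = [False] * width
    for line in lines:
        for i, ch in enumerate(line):
            if ch != " ":
                occupied[i] = True

    spans, start = [], None
    for i, occ in enumerate(occupied):
        if occ and start is None:
            start = i                     # a column begins here
        elif not occ and start is not None:
            spans.append((start, i))      # a column ended just before i
            start = None
    if start is not None:
        spans.append((start, width))      # last column runs to the end
    return spans


def cut_column(lines, n):
    """Return column n (1-based, like cut -f) with the padding trimmed."""
    start, end = infer_columns(lines)[n - 1]
    return [line[start:end].strip() for line in lines]
```

Wiring this up to stdin and a column-number argument is left out; a real tool would also want to sample lazily rather than read everything, and probably require a gap of two or more blanks before trusting a boundary.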
For that matter, give me a CLI tool that takes in an entire such table, and lets me say "give me rows 4-6 of column Foo" — and it reads the table's header (even through fancy box-drawing line-art) to determine which column is Foo, ignores any horizontal dividing lines, etc.
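Again as a sketch only, with hypothetical names (`parse_table`, `select`): the version below assumes the table uses vertical bars, ASCII `|` or the Unicode box-drawing verticals, as cell separators, and treats any line made up entirely of border characters as a horizontal divider. Tables that are purely space-padded would need column inference instead, and a data row consisting only of dashes would be mistaken for a divider.

```python
import re

# Characters assumed to appear only in borders and rules: ASCII art
# plus the whole Unicode Box Drawing block (U+2500..U+257F).
RULE_CHARS = set("+-=|") | {chr(c) for c in range(0x2500, 0x2580)}

# Vertical bars treated as cell separators (light, heavy, double).
CELL_SEP = re.compile(r"[|\u2502\u2503\u2551]")


def parse_table(text):
    """Split a drawn table into (header, body_rows), skipping dividers."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if all(ch in RULE_CHARS or ch == " " for ch in stripped):
            continue  # blank line or horizontal dividing line
        cells = [cell.strip() for cell in CELL_SEP.split(stripped)]
        # The outer borders produce an empty first and last cell; drop them.
        if cells and cells[0] == "":
            cells = cells[1:]
        if cells and cells[-1] == "":
            cells = cells[:-1]
        rows.append(cells)
    return rows[0], rows[1:]


def select(text, column, first, last):
    """Return rows first..last (1-based, inclusive) of the named column."""
    header, body = parse_table(text)
    idx = header.index(column)  # raises ValueError if no such column
    return [row[idx] for row in body[first - 1:last]]
```

With this, "give me rows 4-6 of column Foo" becomes `select(table_text, "Foo", 4, 6)`.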
I'm not sure whether these tasks actually require full-on ML — probably just a pile of heuristics would work. Anything would be better than the low-level tools we have today.