The Human Futilities
Purpose
The Human Futilities are a set of file-oriented command-line utilities, primarily useful for handling large filesets.
Three need-cases have been the primary drivers for developing these: (1) Woozle-compatible grep/find, (2) merging an old folderset, and (3) replacing Nextcloud's desktop client.
1. Woozle Compatibility
I find it very difficult to remember how to use the "grep" and "find" utilities, and I find the available help pages (e.g. «man grep» and «grep --help») overwhelming. I'm not sure how the situation ended up this way, but I do know how I'd like and expect a file-finding application (or any CLI utility, really) to behave -- so I wrote FF to behave in that way.
Its main purpose is finding/identifying files that meet specific criteria, and the input and output formats reflect that. If you're looking for text within the file, it won't go into a lot of detail about what it finds that matches. If you're looking for a file by date, it will show the actual timestamps that matched, for the matching files it finds. It will also optionally show its progress in a non-overwhelming way that doesn't require any pre-crawling of the folderset.
The specific reason I needed this utility was to find a file I knew I had created on a specific date in 2017, but could not find in my Nextcloud folderset. I didn't know exactly what I had called it, but I knew the date it would have been created and what the file extension would almost certainly be.
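To make that kind of search concrete, here is a minimal Python sketch of finding files by date and extension. It is not FF's actual implementation or interface: the function name `find_by_date` is hypothetical, and it matches on modification time, on the assumption that a true creation time is not reliably available on many Linux filesystems.

```python
import os
from datetime import date, datetime


def find_by_date(root, day, extension):
    """Yield (path, timestamp) for each file under root that has the given
    extension and whose modification time falls on the given calendar day."""
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            # Case-insensitive extension match, e.g. ".TXT" matches ".txt".
            if not name.lower().endswith(extension.lower()):
                continue
            path = os.path.join(folder, name)
            # Assumption: modification time stands in for creation time.
            stamp = datetime.fromtimestamp(os.stat(path).st_mtime)
            if stamp.date() == day:
                yield path, stamp


# Hypothetical usage (placeholder folder, date, and extension):
#   for path, stamp in find_by_date("/path/to/folderset", date(2017, 1, 1), ".txt"):
#       print(stamp, path)
```

Because it yields each match together with the timestamp that matched, the output shows why a file was selected, and because it walks the tree lazily, no pre-crawl is needed.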
2. Archive Merging
When I found it, it was in an archive of a folderset from OwnCloud (the predecessor to Nextcloud) which had apparently never been completely merged into our current (Nextcloud) folderset. Since the contents had been rearranged since the time when we last used OwnCloud, I couldn't just merge the folders in by the usual method (using the same relative paths) without ending up with a lot of duplication and/or misplaced files. The archive is 651.2 GB, according to Caja, so we can't really afford to just have duplicates until we can manually sort things out. (At ~1 TB, the current Nextcloud folderset is already straining or past the limits of various devices that use it.)
FTI, FIC, and FCC were developed as a way of accomplishing this kind of merge. By indexing foldersets, comparing them ("what's in A but missing from B"), and then being able to do a folder-relative copy on the comparison results, we can accomplish this in a series of relatively simple and transparent (and therefore debuggable) steps.
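The index / compare / copy sequence can be sketched in Python as follows. This is only an illustration of the idea, not how FTI, FIC, or FCC are actually written: the function names are hypothetical, and it assumes files are matched by SHA-256 content hash, so that a file which was moved or renamed in B still counts as present.

```python
import hashlib
import os
import shutil


def index_folderset(root):
    """Map content hash -> list of root-relative paths for every file under root."""
    index = {}
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files are never held in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            index.setdefault(digest.hexdigest(), []).append(
                os.path.relpath(path, root)
            )
    return index


def missing_from(index_a, index_b):
    """Return the relative paths in A whose contents appear nowhere in B."""
    return sorted(
        path
        for digest, paths in index_a.items()
        if digest not in index_b
        for path in paths
    )


def copy_relative(paths, source_root, target_root):
    """Copy each path from source_root to the same relative path under target_root."""
    for rel in paths:
        target = os.path.join(target_root, rel)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        shutil.copy2(os.path.join(source_root, rel), target)
```

Each step produces a plain data structure (an index, then a list of paths) that can be inspected before the next step runs, which is what keeps the merge debuggable.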
3. Firing the Nextcloud Client
While I was in the process of working all this out, it became apparent that the Nextcloud desktop client simply wasn't able to keep up -- and even when it does keep up, the UI it provides is often deeply problematic. Initially, it kept crashing partway through the "checking" process (which, I'm guessing, is where it compares the local folderset to the one on the server in order to determine what needs synching from A to B and vice-versa) and always had to start over. Weeks went by when no new files on either side were being copied to the other.
After an upgrade of my local system (from Mint 20 to Mint 21), it appeared to start working again, and did in fact synchronize at least some files from the server, but it got stuck this time dealing with "file conflicts" which for some reason it couldn't resolve. There seemed to be two main cases:
- files with the same name, timestamp, and size
  - Nextcloud does not indicate whether it checked the contents, but I have no reason to think they aren't identical. Why couldn't Nextcloud determine this, and just skip them?
- files where one version is zero bytes
  - In this case, I could see maybe being a little cautious -- but I'd think a reasonable default would be to assume the file with zero bytes should be overwritten, as it's very easy to reconstruct a zero-byte file. Perhaps the sync client could write out a list of such overwrites, in case it was important to know which files were zero bytes... but this seems like a very unlikely edge-case.
In any case, there was a very large number of these files -- more than would fit in the non-resizable dialog box they provide -- and there are a number of issues with the UI provided for resolving the problem:
- Each file has to be examined individually, in a separate dialog; there's no way to select a group of files and say which version (server vs. local) to use.
- The popup dialog seems to crash a lot -- comes up blank, goes into the background, is invisible...
- After approving a single file, the whole process seems to reset -- making it impossible to approve additional files until it regenerates the list. This makes the whole process very slow, to say the least.
I want something which will (a) not crash, (b) let me write rules for handling "conflicts", and (c) actually compare contents to see if there even really is a conflict.
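As an illustration of points (b) and (c), the two conflict cases described above could be handled by a rule like the following Python sketch. The function name and the returned action strings are hypothetical, and it assumes both versions are available as local files (for example, with the server's copy downloaded to a temporary location).

```python
import filecmp
import os


def resolve_conflict(local_path, server_path):
    """Return 'skip', 'use_local', 'use_server', or 'ask' for one conflicting pair."""
    local_size = os.path.getsize(local_path)
    server_size = os.path.getsize(server_path)
    # Same size and byte-for-byte identical contents: not really a conflict.
    if local_size == server_size and filecmp.cmp(
        local_path, server_path, shallow=False
    ):
        return "skip"
    # One side is empty: prefer the version that actually has content.
    if local_size == 0:
        return "use_server"
    if server_size == 0:
        return "use_local"
    # Anything else is a genuine conflict that needs a further rule or a person.
    return "ask"
```

A caller could log every 'use_local' and 'use_server' decision, which would give the list of zero-byte overwrites suggested above.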
The code to address this need-case is still in progress, but the basic idea is that we will have FTI running on both sides (server and client), a process will download the server's latest index and use FIC to compare it to the local index, and FCC will somehow provide a means to synchronize the necessary files (there are at least a couple of different ways to do this).
Future
Thinking about how Nextcloud works has led me to realize its shortcomings and how overspecialized it is. I'm thinking that each piece of it can eventually be replaced by much more flexible tools.
Commands
- FF: find files by mask, date, contents
- FCC: file collection copy
- FIC: file index comparison
- FTI: file tree index
- FTS: file tree sync
Other
- lib: class library
- ui standards: user interface standards and conventions