Futilities/human: Difference between revisions

{{fmt/title|The Human Futilities}}
==Purpose==
The Human Futilities are a set of file-oriented command-line utilities, primarily useful for handling large filesets.
 
==Pages==
Three need-cases have been the primary drivers for developing these: (1) Woozle-compatible grep/find, (2) merging of an old folderset, and (3) replacing Nextcloud's desktop client.
* {{l/sub|goals}}: what these are for
===1. Woozle Compatibility===
I find it very difficult to remember how to use the "{{l/htyp|grep}}" and "find" utilities, and I find the available help pages (e.g. {{fmt/code|man grep}} and {{fmt/code|grep --help}}) overwhelming. I'm not sure how the situation ended up this way, but I do know how I'd like and expect a file-finding application ({{l/sub|or any CLI utility, really}}) to behave -- so I wrote {{l/sub|ff}} to behave in that way.
 
Its main purpose is ''finding/identifying files that meet specific criteria'', and the input and output formats reflect that. If you're looking for text within the file, it won't go into a lot of detail about what it finds that matches. If you're looking for a file by date, it will show the actual timestamps that matched, for the matching files it finds. It will also optionally show its progress in a non-overwhelming way that doesn't require any pre-crawling of the folderset.
 
The ''specific'' reason I needed this utility was to find a file I knew I had created on a specific date in 2017, but could not find in my Nextcloud folderset. I didn't know exactly what I had called it, but I knew the date it would have been created and what the file extension would almost certainly be.
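
As a rough illustration of that kind of search (this is not how {{l/sub|ff}} is implemented), here is a minimal Python sketch that walks a folderset and reports files with a given extension whose timestamp falls on a given day. The function name and parameters are hypothetical. It matches on modification time, which is an assumption: creation time is not reliably available on Linux filesystems.

```python
import datetime
import os
from pathlib import Path


def find_by_date_and_ext(root, ext, day):
    """Yield (path, timestamp) for files under root that have the given
    extension and were last modified on the given calendar day.

    Hypothetical helper for illustration; uses modification time (local
    timezone) because creation time is not portable.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Compare extensions case-insensitively (".ODT" matches ".odt").
            if path.suffix.lower() != ext.lower():
                continue
            try:
                mtime = datetime.datetime.fromtimestamp(path.stat().st_mtime)
            except OSError:
                continue  # unreadable or vanished file: skip it
            if mtime.date() == day:
                # Report the actual timestamp that matched, not just the name.
                yield path, mtime
```

For example, <code>find_by_date_and_ext("/some/folderset", ".odt", datetime.date(2017, 3, 4))</code> would list matching files along with the timestamps that matched; the path, extension, and date here are placeholders, not the actual values from the search described above.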
===2. Archive Merging===
When I found it, it was in an archive of a folderset from OwnCloud (the predecessor to Nextcloud) which had apparently never been completely merged into our current (Nextcloud) folderset. Because the contents had been rearranged since we last used OwnCloud, I couldn't just merge the folders in by the usual method (using the same relative paths) without ending up with a lot of duplication and/or misplaced files. The archive is 651.2 GB, according to Caja, so we can't really afford to just have duplicates until we can manually sort things out. (At ~1 TB, the current Nextcloud folderset is already straining or past the limits of various devices that use it.)
 
FTI, FIC, and FLC were developed as a way of accomplishing this kind of merge. By indexing foldersets, comparing them ("what's in A but missing from B"), and then being able to do a folder-relative copy on the comparison results, we can accomplish this in a series of relatively simple and transparent (and therefore debuggable) steps.
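
The index / compare / copy sequence can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions, not the actual FTI, FIC, or FLC code: the function names are hypothetical, SHA-256 is an assumed choice of hash, and the index is kept in memory as a plain dictionary rather than in whatever file format the real tools use.

```python
import hashlib
import os
import shutil
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Indexing step: map content hash -> sorted list of root-relative paths."""
    root = Path(root)
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            index.setdefault(hash_file(path), []).append(rel)
    for paths in index.values():
        paths.sort()
    return index


def missing_from(index_a, index_b):
    """Comparison step: one representative path from A for each content
    hash that appears nowhere in B, regardless of where B keeps its files."""
    return sorted(
        paths[0] for digest, paths in index_a.items() if digest not in index_b
    )


def copy_relative(rel_paths, src_root, dst_root):
    """Copy step: copy each listed file, keeping its path relative to the root."""
    for rel in rel_paths:
        target = Path(dst_root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(src_root) / rel, target)
```

Because the comparison is by content hash rather than by path, a file that was moved or renamed in the newer folderset is not reported as missing, which is the property needed when the two foldersets have been arranged differently.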
===3. Firing the Nextcloud Client===
While I was in the process of working all this out, it became apparent that the Nextcloud desktop client simply wasn't able to keep up -- and even when it can, the UI it provides is often deeply problematic. Initially, it kept crashing part way through the "checking" process (which, I'm guessing, is where it compares the local folderset to the one on the server in order to determine what needs synching from A to B and vice-versa) and always had to start over. Weeks went by when no new files on either side were being copied to the other.
 
After an upgrade of my local system (from Mint 20 to Mint 21), it appeared to start working again, and did in fact synchronize at least ''some'' files from the server, but it got stuck this time dealing with "file conflicts" which for some reason it couldn't resolve. There seemed to be two main cases:
* files with the same name, timestamp, and size
** Nextcloud does not indicate whether it checked the contents, but I have no reason to think they aren't identical. Why couldn't Nextcloud determine this, and just skip them?
* files where one version is zero bytes
** In this case, I could see maybe being a little cautious -- but I'd think a reasonable default would be to assume the file with zero bytes should be overwritten, as it's very easy to reconstruct a zero-byte file. Perhaps the sync client could write out a list of such overwrites, in case it was important to know ''which'' files were zero bytes... but this seems like a very unlikely edge-case.
 
In any case, there was a ''very large number'' of these files -- more than would fit in the non-resizable dialog box they provide -- and there are a number of issues with the UI provided for resolving the problem:
* Each file has to be examined individually, in a separate dialog; there's no way to select a group of files and say which version (server vs. local) to use.
* The popup dialog seems to crash a lot -- comes up blank, goes into the background, is invisible...
* After approving a single file, the whole process seems to reset -- making it impossible to approve additional files until it regenerates the list. This makes the whole process very slow, to say the least.
 
I want something which will (a) not crash, (b) let me write rules for handling "conflicts", and (c) actually compare contents to see if there even really ''is'' a conflict.
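
To make points (b) and (c) concrete, here is a minimal Python sketch of a rule for the two conflict cases described above. It is an assumption-laden illustration, not part of any existing tool: the function name and the returned labels are hypothetical, and it assumes both versions of the file are available as local paths (for instance, the server copy has been downloaded to a temporary location).

```python
import filecmp
import os


def classify_conflict(local_path, server_path):
    """Decide how to handle a reported conflict between two versions.

    Returns one of the hypothetical labels "identical", "use_local",
    "use_server", or "needs_review".
    """
    local_size = os.path.getsize(local_path)
    server_size = os.path.getsize(server_path)

    # Rule: a zero-byte version loses to a non-empty one, since an
    # empty file is trivial to reconstruct.
    if local_size == 0 and server_size > 0:
        return "use_server"
    if server_size == 0 and local_size > 0:
        return "use_local"

    # Rule: same size and byte-for-byte equal contents means there is
    # no real conflict. shallow=False forces an actual content comparison
    # instead of trusting size and timestamp.
    if local_size == server_size and filecmp.cmp(
        local_path, server_path, shallow=False
    ):
        return "identical"

    # Anything else is left for a human (or a further rule) to decide.
    return "needs_review"
```

A caller could log every "use_local" or "use_server" decision to a file, which would cover the edge case where it matters which files were zero bytes.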
 
Work to address this need-case is still in progress, but the basic idea is that we will have FTI running on both sides (server and client), a process will download the server's latest index and use FIC to compare it to the local index, and FLC will somehow provide a means to synchronize the necessary files (there are at least a couple of different ways to do this).
===Future===
Thinking about how Nextcloud works has led me to realize its shortcomings and how overspecialized it is. I'm thinking that each piece of it can eventually be replaced by much more flexible tools.
==Commands==
* {{l/sub|ff}}: find files by mask, date, contents
* {{l/sub|fti}}: file tree index
* {{l/sub|fic}}: file index comparison
* {{l/sub|flc}}: file list copy
==Other==
* {{l/sub|lib}}: class library
* {{l/sub|options}}: some options are common to all apps
* {{l/sub|ui standards}}: user interface standards and conventions
==Terminology==
* A '''file tree''' is a given folder and all of the files and folders inside of it.
* A '''file index''' is a collection of file-content hashes and the filespecs for the files that had those hashes at the time of indexing.
* A '''file collection''' is all the files referred to by a particular content-hash index.
==Commands==
{| class="wikitable sortable"
|-
! name || seq || spider? || hash || description
|-
| {{l/sub/lc|FF}} || 0 || Y || - || find files by mask, date, contents
|-
| {{l/sub/lc|FCM}} || 3 || n || I || file collection merge
|-
| {{l/sub/lc|FIC}} || 2 || n || I || file index comparison
|-
| {{l/sub/lc|FTM}} || 0 || Y || n || file-tree mover
|-
| {{l/sub/lc|FTI}} || 1 || Y || O || file tree index
|-
| {{l/sub/lc|FTS}} || 3 || n || I (opt) || file tree sync
|}
I might end up splitting FTS into multiple parts...

Latest revision as of 13:51, 22 October 2022

The Human Futilities

Purpose

The Human Futilities are a set of file-oriented command-line utilities, primarily useful for handling large filesets.

Pages

  • goals: what these are for
  • lib: class library
  • options: some options are common to all apps
  • ui standards: user interface standards and conventions

Terminology

  • A file tree is a given folder and all of the files and folders inside of it.
  • A file index is a collection of file-content hashes and the filespecs for the files that had those hashes at the time of indexing.
  • A file collection is all the files referred to by a particular content-hash index.

Commands

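{| class="wikitable sortable"
|-
! name || seq || spider? || hash || description
|-
| FF || 0 || Y || - || find files by mask, date, contents
|-
| FCM || 3 || n || I || file collection merge
|-
| FIC || 2 || n || I || file index comparison
|-
| FTM || 0 || Y || n || file-tree mover
|-
| FTI || 1 || Y || O || file tree index
|-
| FTS || 3 || n || I (opt) || file tree sync
|}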

I might end up splitting FTS into multiple parts...