2022/07/21/FutilityCloud process design

From Woozle Writes Code
Revision as of 13:07, 24 September 2022 by Woozle
Codeblog

Getting back to work on the Human Futilities Nextcloud-client replacement functionality...

Thinking this over last night, I realized it is a somewhat complex problem -- or at least that issues may come up which require complex solutions.

I think it can be divided up into layers of complexity, where each layer deals with what it can resolve and then defers the stuff needing more analysis for the next layer to handle.

Layers

From simpler to more complex:

Layer 1 just does a straight 1:1 folder-tree comparison. Where the same file exists in the same folder on both sides, it checks whether the two copies are identical; if they are not, it keeps the newer one (and archives the older one). Where they are identical but have different timestamps, it keeps the older timestamp.

Where a file is missing from one, we have to look deeper -- is it missing because it's new, or because it was recently (intentionally) deleted, or because it's been moved? You can't get a timestamp for a file that isn't there. So we defer those to Layer 2.
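A minimal sketch of how Layer 1 might classify files, in Python. The function names, the result keys, and the choice to defer files whose contents differ but whose timestamps tie are all assumptions made for illustration; the archiving step is left out, and this only reports what should happen rather than copying anything.

```python
import filecmp
import os
from pathlib import Path


def list_files(root: Path) -> set[str]:
    """Return the relative (POSIX-style) paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def layer1_compare(side_a: Path, side_b: Path) -> dict[str, list[str]]:
    """Classify every relative path found on either side (hypothetical helper)."""
    files_a, files_b = list_files(side_a), list_files(side_b)
    result = {"identical": [], "a_newer": [], "b_newer": [], "deferred": []}

    for rel in sorted(files_a & files_b):
        pa, pb = side_a / rel, side_b / rel
        if filecmp.cmp(pa, pb, shallow=False):  # byte-for-byte comparison
            result["identical"].append(rel)
            continue
        ma, mb = pa.stat().st_mtime_ns, pb.stat().st_mtime_ns
        if ma > mb:
            result["a_newer"].append(rel)
        elif mb > ma:
            result["b_newer"].append(rel)
        else:
            # Assumption: different contents with equal timestamps is ambiguous.
            result["deferred"].append(rel)

    # Present on only one side: new, deleted, or moved? Layer 1 can't tell.
    result["deferred"].extend(sorted(files_a ^ files_b))
    return result


def keep_older_timestamp(pa: Path, pb: Path) -> None:
    """For identical files, set both modification times to the older of the two."""
    oldest = min(pa.stat().st_mtime_ns, pb.stat().st_mtime_ns)
    for p in (pa, pb):
        os.utime(p, ns=(p.stat().st_atime_ns, oldest))
```

The "deferred" list is what would be handed on to Layer 2.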

Layer 2 looks at a recent tree-index (FTI) to see if the missing file has just been moved somewhere. If it has, though, how do we decide which repo has the newer information? We could look at the timestamp on each folder, but if both folders contain more than one change, that's inconclusive. In the edge-case where there's at least one folder that has no other changes in it and existed both before and after, we could let that timestamp determine which is more recent. I don't know if this will happen very often.
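One way the move detection could work, sketched in Python. This assumes the tree-index is simply a mapping from relative path to a content hash, which is a guess at the format rather than anything decided above; `build_index` and `find_moves` are hypothetical names. It compares an older index of one side against a fresh one and reports files whose contents vanished from one path and turned up at another. It does not address which side has the newer information.

```python
import hashlib
from pathlib import Path


def build_index(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    index = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(root).as_posix()
            index[rel] = hashlib.sha256(p.read_bytes()).hexdigest()
    return index


def find_moves(old_index: dict[str, str], new_index: dict[str, str]) -> list[tuple[str, str]]:
    """Return (old_path, new_path) pairs that look like unambiguous moves."""
    vanished: dict[str, list[str]] = {}
    appeared: dict[str, list[str]] = {}
    for path, digest in old_index.items():
        if path not in new_index:
            vanished.setdefault(digest, []).append(path)
    for path, digest in new_index.items():
        if path not in old_index:
            appeared.setdefault(digest, []).append(path)

    moves = []
    for digest, old_paths in vanished.items():
        new_paths = appeared.get(digest, [])
        # Only a one-to-one match counts; duplicate contents stay unresolved.
        if len(old_paths) == 1 and len(new_paths) == 1:
            moves.append((old_paths[0], new_paths[0]))
    return sorted(moves)
```

Anything that vanished without a matching new path (a real deletion, or a move combined with an edit) would still need manual attention or Layer 3.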

For more certainty, though, we will need...

Layer 3 attempts to keep a log of changes as they occur on each end, hopefully by using a filesystem hook to receive notifications when files are changed or moved. Where the log is discontinuous, we'll have to defer back to Layer 2 methods. For timespans where the log is available on both ends, though, we can see what the actual sequence of events was, and make sure it's replicated on both ends. (There will be cases where the two sides contradict each other, but those should be rare... we'll treat that as another possible layer.)
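A rough sketch of the "timespans where the log is available on both ends" idea, in Python. It assumes each side records the (start, end) spans during which its watcher was actually running, and that the clocks on both ends are close enough for timestamps to be compared; the `Event` fields and function names are invented for illustration, and the filesystem hook itself is not shown.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    when: float      # timestamp of the change
    side: str        # which end logged it, e.g. "a" or "b"
    kind: str        # e.g. "modify", "move", "delete"
    path: str
    dest: str = ""   # new path, for moves only


def covered_overlap(spans_a, spans_b):
    """Intersect two sorted, non-overlapping lists of (start, end) spans."""
    out = []
    i = j = 0
    while i < len(spans_a) and j < len(spans_b):
        start = max(spans_a[i][0], spans_b[j][0])
        end = min(spans_a[i][1], spans_b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever span finishes first.
        if spans_a[i][1] < spans_b[j][1]:
            i += 1
        else:
            j += 1
    return out


def merged_events(log_a, log_b, spans):
    """Interleave two time-sorted logs, keeping only events inside the given spans."""
    def inside(ev):
        return any(start <= ev.when < end for start, end in spans)

    merged = heapq.merge(log_a, log_b, key=lambda ev: ev.when)
    return [ev for ev in merged if inside(ev)]
```

Events falling outside the shared spans are the ones that would drop back to Layer 2 methods.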

Deferral: Ideally, we'll have a decent UI for manually resolving ambiguity at any layer. It should also be possible to disable automatic resolution, where it exists, and relatively easy to modify the heuristics for automatic resolution.

Overall

This lets me build the functionality incrementally. I can start with the simplest layer, have it write a list of files with issues needing next-layer attention, and temporarily use manual resolution to deal with those. They should be only a small percentage of the total, I think -- but even if not, this lets me deal with the backlog of new files on each end. Eventually I can automate that layer and move on to the next.
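The incremental structure could be as simple as a chain of functions, each returning what it resolved and what it deferred. This is only a sketch in Python; `run_layers` and the (name, function, enabled) tuple layout are assumptions, with the enabled flag standing in for the ability to switch off automatic resolution per layer.

```python
from typing import Callable

# A layer takes the items handed to it and returns (resolved, deferred).
Layer = Callable[[list[str]], tuple[list[str], list[str]]]


def run_layers(items: list[str], layers: list[tuple[str, Layer, bool]]):
    """Pass unresolved items down the chain; whatever is left needs manual review."""
    resolved: dict[str, list[str]] = {}
    pending = list(items)
    for name, layer, enabled in layers:
        if not enabled:
            continue  # automatic resolution is switched off for this layer
        done, pending = layer(pending)
        resolved[name] = done
    return resolved, pending
```

With only the first layer written, the second return value is the list to resolve by hand; adding a later layer just shrinks that list.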