2022/07/21/FutilityCloud process design
Revision as of 21:51, 21 September 2022
Getting back to work on the Human Futillities Nextcloud-client replacement functionality...
Thinking this over last night, it is a bit of a complex problem -- or at least there are issues which may come up that require complex solutions.
I think it can be divided up into layers of complexity, where each layer deals with what it can resolve and then defers the stuff needing more analysis for the next layer to handle.
Layers
From simpler to more complex:
Layer 1 just does a straight 1:1 folder-tree comparison. Where the same file exists in the same folder, it checks to see if they are identical; if they are not, it keeps the newer one (and archives the older one). Where they are identical but with different timestamps, it keeps the older timestamp.
Where a file is missing from one, we have to look deeper -- is it missing because it's new, or because it was recently (intentionally) deleted, or because it's been moved? You can't get a timestamp for a file that isn't there. So we defer those to Layer 2.
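As a rough sketch of what Layer 1 might look like (in Python, purely for illustration; the function name `layer1_compare`, the action labels, and the choice to defer same-timestamp conflicts are all my assumptions, not settled design), the comparison can return one list of things it can resolve and one list of things it hands on to Layer 2:

```python
import filecmp
from pathlib import Path


def layer1_compare(root_a, root_b):
    """Straight 1:1 tree comparison; returns (actions, deferred)."""
    root_a, root_b = Path(root_a), Path(root_b)

    def rel_files(root):
        # Every regular file under root, as a path relative to root.
        return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}

    files_a, files_b = rel_files(root_a), rel_files(root_b)
    actions = []   # what Layer 1 can resolve on its own
    deferred = []  # what needs next-layer (or manual) attention

    for rel in sorted(files_a | files_b):
        if rel not in files_a:
            deferred.append((rel, "missing_from_a"))
            continue
        if rel not in files_b:
            deferred.append((rel, "missing_from_b"))
            continue

        path_a, path_b = root_a / rel, root_b / rel
        time_a, time_b = path_a.stat().st_mtime, path_b.stat().st_mtime

        if filecmp.cmp(path_a, path_b, shallow=False):
            # Identical contents: keep the older timestamp on both sides.
            if time_a != time_b:
                actions.append(("use_older_timestamp", rel, min(time_a, time_b)))
        elif time_a == time_b:
            # Assumption: different contents with equal timestamps is not
            # something Layer 1 can decide, so it is deferred.
            deferred.append((rel, "same_timestamp_conflict"))
        else:
            newer = "a" if time_a > time_b else "b"
            actions.append(("keep_newer_archive_older", rel, newer))

    return actions, deferred
```

This only decides what should happen; actually copying, archiving, and resetting timestamps would be a separate step, which also makes it easy to review the lists before anything is touched.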
Layer 2 looks at a recent tree-index (FTI) to see if the missing file has just been moved somewhere. If it has, though, how do we decide which repo has the newer information? We could look at the timestamp on each folder, but if both folders contain more than one change, that's inconclusive. In the edge-case where there's at least one folder that has no other changes in it and existed both before and after, we could let that timestamp determine which is more recent. I don't know if this will happen very often.
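To make the move-detection idea concrete, here is a minimal Python sketch. It assumes (and this is only an assumption) that the tree-index can be reduced to a mapping from relative path to content hash, and that we have both an older index and a current one for the side where the file is missing. The name `classify_missing` and the result labels are hypothetical. It does not try to answer the "which repo is newer" question, only "what probably happened to this file".

```python
def classify_missing(rel_path, old_index, current_index):
    """Guess why rel_path is absent from one side.

    old_index / current_index: {relative_path: content_hash} for the side
    that is missing the file, as of the earlier index and as of now.
    """
    old_hash = old_index.get(rel_path)
    if old_hash is None:
        # This side never had the file, so it is new on the other side.
        return ("new_on_other_side", None)

    # The file used to be here; look for the same contents elsewhere.
    candidates = sorted(
        path for path, digest in current_index.items() if digest == old_hash
    )
    if len(candidates) == 1:
        return ("moved", candidates[0])
    if candidates:
        # Several files share those contents; can't tell which is the move.
        return ("ambiguous", candidates)
    # Assumption: it was here before and its contents are gone, so it
    # was probably deleted (or edited after being moved).
    return ("probably_deleted", None)


old = {"docs/a.txt": "h1", "docs/b.txt": "h2"}
now = {"archive/a.txt": "h1"}
print(classify_missing("docs/a.txt", old, now))  # ('moved', 'archive/a.txt')
print(classify_missing("docs/b.txt", old, now))  # ('probably_deleted', None)
print(classify_missing("docs/c.txt", old, now))  # ('new_on_other_side', None)
```

Anything that comes back as "ambiguous" or "probably_deleted" is a natural candidate for deferral rather than automatic action.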
For more certainty, though, we will need...
Layer 3 attempts to keep a log of changes as they occur on each end, ideally by using a filesystem hook to receive notifications when files are changed or moved. Where the log is discontinuous, we'll have to fall back to Layer 2 methods. For timespans where the log is available on both ends, though, we can see what the actual sequence of events was, and make sure it's replicated on both ends. (There will be cases where the two sides contradict each other, but those should be rare... we'll treat that as another possible layer.)
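A very rough Python sketch of the log-comparison part, assuming each end's log is already a time-sorted list of `(timestamp, side, operation, path)` tuples. That record shape, the name `merge_logs`, and the definition of "contradiction" used here (the same path touched on both ends within the span) are all placeholders of mine; the real hook output will look different.

```python
import heapq
from collections import defaultdict


def merge_logs(log_a, log_b):
    """Interleave two time-sorted event logs and flag possible conflicts."""
    # heapq.merge keeps the combined sequence in timestamp order,
    # provided each input is already sorted by timestamp.
    merged = list(heapq.merge(log_a, log_b, key=lambda event: event[0]))

    sides_by_path = defaultdict(set)
    for _timestamp, side, _operation, path in merged:
        sides_by_path[path].add(side)

    # Crude stand-in for "the two sides contradict each other":
    # any path that was changed on both ends.
    conflicts = sorted(p for p, sides in sides_by_path.items() if len(sides) > 1)
    return merged, conflicts


log_a = [(10, "a", "modify", "notes.txt"), (30, "a", "delete", "old.txt")]
log_b = [(20, "b", "modify", "notes.txt"), (40, "b", "create", "new.txt")]
merged, conflicts = merge_logs(log_a, log_b)
print([event[0] for event in merged])  # [10, 20, 30, 40]
print(conflicts)                       # ['notes.txt']
```

The merged sequence is what would get replayed on both ends; the conflict list is what would go to the next layer or to manual resolution.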
Deferral: Ideally, we'll have a decent UI for manually resolving ambiguity at any layer. It should also be possible to disable automatic resolution, where it exists, and relatively easy to modify the heuristics for automatic resolution.
Overall
This lets me build the functionality incrementally. I can start with the simplest piece, have it write a list of files with issues needing next-layer attention, and temporarily use manual resolution to deal with those (they should be only a small percentage of the total, I think -- but even if not, this lets me deal with the backlog of new files on each end).