Futilities/human/fic: Difference between revisions

From Woozle Writes Code
No edit summary
 
Line 1: Line 1:
{{fmt/title|FIC: File Index Comparator}}
{{fmt/title|FIC: File Index Comparator}}
==About==
==About==
===old description===
This looks at a list of file-tree indexes (in the format that {{l/same/lc|FTI}} outputs) and outputs a data-file which lists:
This is a pretty simple utility, so not a lot of options. It takes two file-tree indexes (in the format that {{l/same/lc|FTI}} outputs) and produces a simple text listing of all files listed in the first index that are not present in the second index.
* all the indexes inspected; for each index, it lists:
 
** an internally-assigned shortcode (A, B, etc.)
This file does not have any structures that need parsing (as JSON or XML would), so it can be processed line by line rather than having to read the entire file into memory before processing. This design decision may or may not turn out to be a good idea; the current design of FIC requires the entire list to be held in memory before outputting ''anyway'', and not preprocessing it as input means we can't estimate percentage complete -- but if that becomes important, it's easy enough to switch to a read-to-memory-and-process mode.
** the index's filespec
** the path that was examined to build the index
* all the hashes found; for each hash, it lists:
** the shortcode of the index containing it
** the paths of the matching files
===Command===
This uses the [[../options/spider|standard spidering options]].
==History==
==History==
* '''2022-10-13''' It seems to be spidering only a portion of the tree, so I'm adding some [[../options/spider|options]] to log what it spidered (which should also be useful in other contexts).
* '''2022-10-07''' Revising how this works -- I think what I want is: look at two file-indexes ({{l/same/lc|FTI}} output), and save (in JSON) the following: (1) hashes found in both, (2) hashes found in A but not B, (3) hashes found in B but not A. Each hash should list the filespec(s) where it was found. Since we're showing the results of a cross-comparison, it gets too complicated to compare more than two indexes (a feature I've not ended up using, regardless), so we'll now only allow 2 files (model and compare)... but let's change the names to A and B, since all calculations are now symmetrical.
* '''2022-10-07''' Revising how this works -- I think what I want is: look at two file-indexes ({{l/same/lc|FTI}} output), and save (in JSON) the following: (1) hashes found in both, (2) hashes found in A but not B, (3) hashes found in B but not A. Each hash should list the filespec(s) where it was found. Since we're showing the results of a cross-comparison, it gets too complicated to compare more than two indexes (a feature I've not ended up using, regardless), so we'll now only allow 2 files (model and compare)... but let's change the names to A and B, since all calculations are now symmetrical.
** ...although now that I think about it, the output could be organized by hash, with each hash having a list of the places it was found (with filespecs). Hrm. Maybe wait for a need-case...
** ...although now that I think about it, the output could be organized by hash, with each hash having a list of the places it was found (with filespecs). Hrm. Maybe wait for a need-case...

Latest revision as of 19:15, 13 October 2022

FIC: File Index Comparator

About

This looks at a list of file-tree indexes (in the format that FTI outputs) and outputs a data-file which lists:

  • all the indexes inspected; for each index, it lists:
    • an internally-assigned shortcode (A, B, etc.)
    • the index's filespec
    • the path that was examined to build the index
  • all the hashes found; for each hash, it lists:
    • the shortcode of the index containing it
    • the paths of the matching files
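
As a rough illustration of the layout described above, here is a minimal Python sketch. It is not FIC's actual code: the function name `build_report`, the key names (`indexes`, `hashes`, `filespec`, `root`), and the assumption that each index has already been parsed into `(hash, path)` pairs are all hypothetical, since this page does not specify the FTI file format or the output schema.

```python
from collections import defaultdict
from string import ascii_uppercase


def build_report(indexes):
    """Build a report shaped like the data-file described above.

    `indexes` is a list of dicts (hypothetical input shape), each with:
      'filespec' - where the index file lives
      'root'     - the path that was examined to build the index
      'entries'  - an iterable of (hash, path) pairs already parsed from the index
    """
    index_info = {}
    # hash -> shortcode -> list of paths where that hash was found
    hashes = defaultdict(lambda: defaultdict(list))

    # Assign shortcodes A, B, ... in input order.
    # zip() stops at Z, so this sketch handles at most 26 indexes.
    for code, index in zip(ascii_uppercase, indexes):
        index_info[code] = {"filespec": index["filespec"], "root": index["root"]}
        for file_hash, path in index["entries"]:
            hashes[file_hash][code].append(path)

    # Convert the nested defaultdicts to plain dicts so the result serializes cleanly.
    return {
        "indexes": index_info,
        "hashes": {h: dict(by_code) for h, by_code in hashes.items()},
    }
```

Organizing the result by hash means each hash appears once, with every index that contains it listed underneath, so files that match across indexes end up side by side.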

Command

This uses the standard spidering options.

History

  • 2022-10-13 It seems to be spidering only a portion of the tree, so I'm adding some options to log what it spidered (which should also be useful in other contexts).
  • 2022-10-07 Revising how this works -- I think what I want is: look at two file-indexes (FTI output), and save (in JSON) the following: (1) hashes found in both, (2) hashes found in A but not B, (3) hashes found in B but not A. Each hash should list the filespec(s) where it was found. Since we're showing the results of a cross-comparison, it gets too complicated to compare more than two indexes (a feature I've not ended up using, regardless), so we'll now only allow 2 files (model and compare)... but let's change the names to A and B, since all calculations are now symmetrical.
    • ...although now that I think about it, the output could be organized by hash, with each hash having a list of the places it was found (with filespecs). Hrm. Maybe wait for a need-case...
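
The three-way split proposed in the 2022-10-07 entry could be sketched roughly as follows. This is an assumption-laden illustration in Python, not the utility's real implementation: `group_by_hash`, `compare_indexes`, and the result keys `both`, `only_a`, and `only_b` are made-up names, and the input is assumed to be `(hash, path)` pairs already read from the two indexes.

```python
import json


def group_by_hash(entries):
    """Map each hash to the list of filespecs where it was found."""
    grouped = {}
    for file_hash, path in entries:
        grouped.setdefault(file_hash, []).append(path)
    return grouped


def compare_indexes(entries_a, entries_b):
    """Split hashes into: found in both, found only in A, found only in B."""
    a = group_by_hash(entries_a)
    b = group_by_hash(entries_b)
    # dict key views support set operations, which keeps the comparison symmetrical.
    return {
        "both": {h: {"A": a[h], "B": b[h]} for h in a.keys() & b.keys()},
        "only_a": {h: a[h] for h in a.keys() - b.keys()},
        "only_b": {h: b[h] for h in b.keys() - a.keys()},
    }


if __name__ == "__main__":
    # Tiny made-up example; the hashes and paths are placeholders.
    result = compare_indexes(
        [("h1", "docs/a.txt"), ("h2", "docs/b.txt")],
        [("h1", "backup/a.txt"), ("h3", "backup/c.txt")],
    )
    print(json.dumps(result, indent=2, sort_keys=True))
```

Because each of the three groups is keyed by hash, the alternative mentioned in the follow-up note (one list organized by hash) would mostly be a matter of merging these groups rather than a different computation.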