Ferret File System/v0.1/SQL/firev: Difference between revisions

From Woozle Writes Code
(pasting original version before modifying it, just for historical reference)
 
m (Woozle moved page FileFerret/SQL/firev to Ferret File System/v0.1/SQL/firev: obsolete)
 
(13 intermediate revisions by the same user not shown)
===fideals===
"[[/fideal|Fideal]]" is short for "ideal file". A fideal is the abstract idealization of the particular set of bytes contained within a file which is a perfect representation of the original – i.e. it either ''is'' the original, or it is a perfect copy. ("Original file" might be as good a name, but in some cases the "original" may become corrupted and one of the copies may be more accurate; the "fideal" is how the file is ''supposed'' to be.) Fideal records are generally only created when multiple copies of the same file are found, though they may also be created manually in order to track more information about a file.
<mysql>CREATE TABLE `fideals` (
   `ID`        INT NOT NULL AUTO_INCREMENT,
   `Title`     VARCHAR(255) COMMENT "unique name for the file, wherever it may be found",
   `Descr`     VARCHAR(255) COMMENT "description, or pointer to wiki page",
   `AutoTitle` VARCHAR(255) COMMENT "automatically-generated title",
   `AutoDescr` VARCHAR(255) COMMENT "automatically-generated description",
   `FileSize`  INT          COMMENT "correct file size in bytes",
   `FileCkSum` INT          COMMENT "correct file checksum",
   PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;</mysql>
* Basically, if you want to describe a file, you don't; you describe the fideal. Files are localized instances (possibly imperfect copies) of fideals.
* '''Title''' will probably end up being some form of the filename, possibly with disambiguating text prepended.
* At some point I'll probably have some kind of syntax whereby the '''Descr''' field can refer to a wiki page for more info. Maybe just straight HTML?
* '''AutoDescr''' can be generated by the application which first goes looking for the fideal and which therefore may have more understanding of the fideal's purpose in life
* It's not clear whether we need something more elaborate than just a 4-byte checksum, because there are so many different ways of generating this.

Latest revision as of 13:40, 25 February 2024

About

  • Purpose: tracks revisions to files (see terms/firev)
  • Future:
    • We may want to support multiple types of hash, in which case there would be a separate firev_hash table containing the output of each hash-type for each firev hashed.
    • It is not clear whether FirstFound and LastFound should be tracked here. I expect them to be useful, but they might be better determined by looking at the event records.
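
To make the multiple-hash idea concrete, here is a minimal sketch of the rows a separate firev_hash table might hold, one per hash type per firev. The column names (ID_Firev, HashType, HashValue) and the chosen algorithms are assumptions for illustration; the notes above do not specify either.

```python
import hashlib

# Hypothetical set of hash types to record; the design notes do not name any.
HASH_TYPES = ("sha256", "sha512", "blake2b")

def firev_hash_rows(firev_id, data, hash_types=HASH_TYPES):
    """Return one (ID_Firev, HashType, HashValue) tuple per hash type.

    HashValue is the raw digest (bytes), not its hexadecimal form.
    """
    rows = []
    for name in hash_types:
        digest = hashlib.new(name, data).digest()
        rows.append((firev_id, name, digest))
    return rows

rows = firev_hash_rows(1, b"example file contents")
for firev_id, name, digest in rows:
    print(firev_id, name, len(digest), "bytes")
```

With this layout, adding a new hash type means adding rows, not altering the firev table.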

Fields

  • isDelib: TRUE if this is known to be a deliberate user edit of the parent firev (must be FALSE if isAccid is TRUE)
  • isAccid: TRUE if this is known to be an accidental corruption of the parent firev (must be FALSE if isDelib is TRUE)
  • Descr should describe, if known, what distinguishes this firev from other firevs of the same fideal (or from the parent firev, if there is one).
  • FileHash is a fingerprint (hash) of the file's contents, computed with an algorithm for which even the smallest change to the file is very unlikely to produce the same hash. See PHP's hash() (http://us3.php.net/manual/en/function.hash.php) for details; apparently the output can be as long as 128 hexadecimal digits, i.e. 64 bytes.
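
As a rough illustration of the size figure for FileHash, the sketch below hashes a file in chunks and shows that a SHA-512 digest is 64 raw bytes, or 128 hexadecimal digits. The choice of SHA-512 and the helper name hash_file are assumptions; the notes only point at PHP's hash() and do not fix an algorithm.

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="sha512", chunk_size=65536):
    """Return the raw digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

# Demonstrate on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example file contents")
raw = hash_file(tmp.name)
os.remove(tmp.name)

print(len(raw), "bytes;", len(raw.hex()), "hex digits")  # 64 bytes; 128 hex digits
```

Storing the raw digest instead of its hex form is what lets the longest hash fit in a 64-byte binary column.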

I think we're also going to want a field, possibly called FileMean, which changes very little (or not at all) when there are small changes to the data. This should make it easier to connect firevs to a fideal.
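
One very simple reading of the FileMean idea, offered only as a sketch: take the average byte value of the file. This is an assumption about what the field might contain, not a decision recorded here. It barely moves when a few bytes change, but it is also a weak signal, since unrelated files can have similar means.

```python
def file_mean(data: bytes) -> float:
    """Average byte value (0..255) of the data; 0.0 for empty data."""
    if not data:
        return 0.0
    return sum(data) / len(data)

original = bytes(range(256)) * 4      # 1024 bytes of sample data
corrupted = bytearray(original)
corrupted[10] ^= 0xFF                 # flip every bit of a single byte

mean_original = file_mean(original)
mean_corrupted = file_mean(bytes(corrupted))
print(mean_original, mean_corrupted, abs(mean_corrupted - mean_original))
```

A cryptographic hash of the two buffers would differ completely, while the two means stay within a fraction of a unit of each other.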

Rules

  • isDelib and isAccid can both be FALSE if the cause of the firev is not known (i.e. we know there's a new firev, but not whether it was deliberate or accidental), but they cannot both be TRUE.
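
The rule above could be enforced by the database itself with a CHECK constraint. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a cut-down table; the constraint and the try_insert helper are illustrative assumptions, and MySQL only enforces CHECK constraints from version 8.0.16 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE firev (
        ID      INTEGER PRIMARY KEY,
        isDelib INTEGER DEFAULT 0,
        isAccid INTEGER DEFAULT 0,
        -- both flags may be 0, but they may not both be 1
        CHECK (NOT (isDelib AND isAccid))
    )
""")

def try_insert(is_delib, is_accid):
    """Insert a row; return True if accepted, False if the rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO firev (isDelib, isAccid) VALUES (?, ?)",
            (int(is_delib), int(is_accid)),
        )
        return True
    except sqlite3.IntegrityError:
        return False

for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, try_insert(*flags))
```

Only the last combination, both flags TRUE, is rejected.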

History

  • 2012-12-25 Adapted/created from fideals
  • 2016-02-27
    • Added FirstFound, LastFound timestamps.
    • Scrapped Fideal, Parent.
    • Renamed "firevs" -> "firev".

SQL

CREATE TABLE `firev` (
  `ID`        INT  NOT NULL AUTO_INCREMENT,
  `isDelib`   BOOL DEFAULT FALSE COMMENT "TRUE = this is known to be a deliberate user edit of the parent",
  `isAccid`   BOOL DEFAULT FALSE COMMENT "TRUE = this is known to be an accidental corruption of the parent",
  `Descr`     VARCHAR(255)       COMMENT "description of differences",
  `FileSize`  INT                COMMENT "correct file size in bytes",
  `FileHash`  VARBINARY(64)      COMMENT "unique hash",
  `FirstFound` DATETIME NOT NULL  COMMENT "time/date when file was first found during a scan",
  `LastFound`  DATETIME NOT NULL  COMMENT "time/date when file was most recently found during a scan",
  PRIMARY KEY(`ID`)
)
ENGINE = InnoDB;
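
As a sketch of how a scan might maintain FirstFound and LastFound, the example below again uses sqlite3 as a stand-in for MySQL with a reduced version of the table. Matching an existing firev by FileHash plus FileSize, and the helper name record_sighting, are assumptions made for illustration; the notes do not say how a scanned file is matched to a firev.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE firev (
        ID         INTEGER PRIMARY KEY,
        FileSize   INTEGER,
        FileHash   BLOB,
        FirstFound TEXT NOT NULL,
        LastFound  TEXT NOT NULL
    )
""")

def record_sighting(conn, file_hash, file_size, when):
    """Create the firev on first sighting, otherwise only bump LastFound."""
    stamp = when.isoformat(sep=" ", timespec="seconds")
    row = conn.execute(
        "SELECT ID FROM firev WHERE FileHash = ? AND FileSize = ?",
        (file_hash, file_size),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO firev (FileSize, FileHash, FirstFound, LastFound) "
            "VALUES (?, ?, ?, ?)",
            (file_size, file_hash, stamp, stamp),
        )
        return cur.lastrowid
    conn.execute("UPDATE firev SET LastFound = ? WHERE ID = ?", (stamp, row[0]))
    return row[0]

first_id = record_sighting(conn, b"\x01\x02", 2, datetime(2016, 2, 27, 9, 0, 0))
second_id = record_sighting(conn, b"\x01\x02", 2, datetime(2016, 3, 1, 18, 30, 0))
print(conn.execute("SELECT ID, FirstFound, LastFound FROM firev").fetchall())
```

The second call finds the existing row, so FirstFound keeps the earlier timestamp and only LastFound changes.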

Scrapped

These might get used later, but I suspect they don't belong here:

  `ID_Fideal` INT  DEFAULT NULL  COMMENT "ID of ideal file",
  `ID_Parent` INT  DEFAULT NULL  COMMENT "ID of parent firev, if any",