Ferret File System/v0.1
Introduction
FileFerret is a software system intended to tackle the problem of keeping track of large numbers of files, including those archived to offline media. It can also provide services to other applications which may need to track large numbers of files for their own purposes.
My original notes are on HypertWiki, but the official documentation will be here (HTYP) as I create it.
Preliminary Table Designs
These are based on what I had created in MS Access 97.
fideals
"Fideal" is short for "ideal file". A fideal is the abstract idealization of the particular set of bytes contained within a file which is a perfect representation of the original – i.e. it either is the original, or it is a perfect copy. ("Original file" might be as good a name, but in some cases the "original" may become corrupted and one of the copies may be more accurate; the "fideal" is how the file is supposed to be.) Fideal records are generally only created when multiple copies of the same file are found, though they may also be created manually in order to track more information about a file. <mysql>CREATE TABLE `fideals` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Title` VARCHAR(255) COMMENT "unique name for the file, wherever it may be found",
  `Descr` VARCHAR(255) COMMENT "description, or pointer to wiki page",
  `AutoTitle` VARCHAR(255) COMMENT "automatically-generated title",
  `AutoDescr` VARCHAR(255) COMMENT "automatically-generated description",
  `FileSize` INT COMMENT "correct file size in bytes",
  `FileCkSum` INT COMMENT "correct file checksum",
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
- Basically, if you want to describe a file, you don't; you describe the fideal. Files are localized instances (possibly imperfect copies) of fideals.
- Title will probably end up being some form of the filename, possibly with disambiguating text prepended.
- At some point I'll probably have some kind of syntax whereby the Descr field can refer to a wiki page for more info. Maybe just straight HTML?
- AutoDescr can be generated by the application which first goes looking for the fideal, and which may therefore have more understanding of the fideal's purpose in life.
- It's not clear whether a single 4-byte checksum is enough, or whether we need something more elaborate, because there are so many different ways of generating a checksum.
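Since fideal records are mostly created when multiple copies of the same file turn up, a scan could look for candidates with a query along the following lines. This is only a sketch: it assumes that matching FileSize and FileCksum is good enough evidence of identical content, which depends on whatever checksum algorithm is eventually chosen.

<mysql>-- Sketch: groups of files not yet linked to a fideal that share size and checksum
SELECT `FileSize`, `FileCksum`, COUNT(*) AS `Copies`
FROM `files`
WHERE `ID_Fideal` IS NULL
  AND `FileSize` IS NOT NULL
  AND `FileCksum` IS NOT NULL
GROUP BY `FileSize`, `FileCksum`
HAVING COUNT(*) > 1
ORDER BY `Copies` DESC;</mysql>

Each row returned is a group of apparent duplicates for which one fideal record could be created.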
files
<mysql>CREATE TABLE `files` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `ID_Folder` INT NOT NULL COMMENT "Folders.ID of parent folder",
  `ID_Fideal` INT DEFAULT NULL COMMENT "Fideals.ID for this file",
  `FileName` VARCHAR(255) COMMENT "just the filename.ext, not the full path",
  `FileSize` INT DEFAULT NULL COMMENT "number of bytes in file",
  `FileCksum` INT DEFAULT NULL COMMENT "file's checksum (algorithm to be worked out later)",
  `WhenCreated` DATETIME DEFAULT NULL COMMENT "file's creation timestamp",
  `WhenChanged` DATETIME DEFAULT NULL COMMENT "file's modification timestamp",
  `FirstFound` DATETIME NOT NULL COMMENT "time/date when file was first found during a scan",
  `LastFound` DATETIME NOT NULL COMMENT "time/date when file was most recently found during a scan",
  `isFound` BOOL DEFAULT FALSE COMMENT "TRUE = was found on most recent scan",
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
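Because a file may be an imperfect copy of its fideal, one obvious use of these two tables together is to list the copies that disagree with the recorded "correct" values. A minimal sketch, assuming both tables use the same checksum algorithm:

<mysql>-- Sketch: files whose size or checksum differs from their fideal's.
-- Rows where either side is NULL are skipped, since <> yields NULL for them.
SELECT f.`ID`, f.`FileName`,
       f.`FileSize`, d.`FileSize` AS `IdealSize`,
       f.`FileCksum`, d.`FileCkSum` AS `IdealCksum`
FROM `files` AS f
JOIN `fideals` AS d ON d.`ID` = f.`ID_Fideal`
WHERE f.`FileSize` <> d.`FileSize`
   OR f.`FileCksum` <> d.`FileCkSum`;</mysql>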
folders
<mysql>CREATE TABLE `folders` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `ID_Parent` INT DEFAULT NULL COMMENT "parent folder; NULL = filesystem root",
  `Name` VARCHAR(255) COMMENT "filesystem's name for the folder",
  `Descr` VARCHAR(255) COMMENT "description for human consumption (optional; mainly for local roots)",
  `WhenCreated` DATETIME DEFAULT NULL COMMENT "folder's creation timestamp, where available",
  `WhenChanged` DATETIME DEFAULT NULL COMMENT "folder's modification timestamp, where available",
  `FirstFound` DATETIME NOT NULL COMMENT "time/date when folder was first found during a scan",
  `LastFound` DATETIME NOT NULL COMMENT "time/date when folder was most recently found during a scan",
  `isFound` BOOL DEFAULT FALSE COMMENT "TRUE = was found on most recent scan",
  `noScan` BOOL DEFAULT FALSE COMMENT "TRUE = for whatever reason, don't bother scanning inside this folder",
  `wasDenied` BOOL DEFAULT FALSE COMMENT "TRUE = filesystem did not allow access on last scan attempt",
  `isRecur` BOOL DEFAULT FALSE COMMENT "TRUE = this folder is a repeat of a parent folder, probably due to a recursive link",
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
- Name is not the full path, just the folder's own name; for a volume's root folder, it may be NULL or a device name (c:), depending on the OS.
- noScan is for blacklisting folders whose contents we really don't want to track, such as temp and trash areas which may become full of useless clutter, or areas where things that aren't really files are kept (e.g. "/dev").
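Since each folder stores only its own name, a full path has to be rebuilt by following ID_Parent up to the root. The sketch below shows one way to do that. It assumes MySQL 8.0 or later (for WITH RECURSIVE), a "/" separator, and a hypothetical starting folder ID of 42; none of these are settled parts of the design.

<mysql>-- Sketch: walk from one folder up to its root, building the path as we go
WITH RECURSIVE `chain` (`ID`, `ID_Parent`, `Path`) AS (
  -- start at the folder of interest; CAST leaves room for the path to grow
  SELECT `ID`, `ID_Parent`, CAST(COALESCE(`Name`, '') AS CHAR(4000))
  FROM `folders`
  WHERE `ID` = 42
  UNION ALL
  -- prepend each parent's name (a root's Name may be NULL)
  SELECT p.`ID`, p.`ID_Parent`, CONCAT(COALESCE(p.`Name`, ''), '/', c.`Path`)
  FROM `chain` AS c
  JOIN `folders` AS p ON p.`ID` = c.`ID_Parent`
)
SELECT `Path`
FROM `chain`
WHERE `ID_Parent` IS NULL;</mysql>

The walk stops on its own once it reaches a folder whose ID_Parent is NULL, because the join then finds no further parent.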
locations
I'm going to try eliminating this. It's basically just all folders where ID_Parent IS NULL.
<mysql>CREATE TABLE `locations` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `ID_Folder` INT NOT NULL,
  `Name` VARCHAR(63),
  `Descr` VARCHAR(255),
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
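If the table is dropped, the same information could be exposed as a view over the folders table. This is a sketch only; the view name locations_v is made up for illustration, and it assumes the root folder's own Name and Descr are acceptable substitutes for the columns above.

<mysql>-- Sketch: root folders presented in roughly the shape of the locations table
CREATE VIEW `locations_v` AS
SELECT `ID` AS `ID_Folder`, `Name`, `Descr`
FROM `folders`
WHERE `ID_Parent` IS NULL;</mysql>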
machines
<mysql>CREATE TABLE `machines` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `NetName` VARCHAR(63) NOT NULL COMMENT "network name of machine",
  `FirstConnected` DATETIME DEFAULT NULL COMMENT "when this machine was first connected to the DB",
  `LastConnected` DATETIME DEFAULT NULL COMMENT "when this machine was last connected to the DB",
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
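A client could keep the two timestamps current each time it connects with something like the following. This is a sketch under assumptions: the machine's row already exists, and 'examplehost' is a placeholder network name.

<mysql>-- Sketch: record a connection; FirstConnected is only filled in once
UPDATE `machines`
SET `FirstConnected` = COALESCE(`FirstConnected`, NOW()),
    `LastConnected` = NOW()
WHERE `NetName` = 'examplehost';</mysql>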
mappings
<mysql>CREATE TABLE `mappings` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `ID_Folder` INT NOT NULL COMMENT "folders.ID",
  `ID_Machine` INT NOT NULL COMMENT "machines.ID",
  `Name` VARCHAR(63) NOT NULL COMMENT "short name for lists",
  `Descr` VARCHAR(255) DEFAULT NULL COMMENT "longer description",
  `FileSpec` VARCHAR(255) NOT NULL COMMENT "must not have terminating slash; will be used as prefix for folder chains",
  `isPrimary` BOOL DEFAULT FALSE COMMENT "is Machine primarily responsible for maintaining (scanning) this Folder?",
  `isDrive` BOOL DEFAULT FALSE COMMENT "TRUE = can be queried for volume and capacity information; may be removable.",
  PRIMARY KEY(`ID`)
) ENGINE = MYISAM;</mysql>
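As an illustration of how the three tables fit together, a scanner running on a given machine could look up the folders it is primarily responsible for, along with the local prefix to use for each. This is a sketch; 'examplehost' is a placeholder network name.

<mysql>-- Sketch: mappings that the named machine is primarily responsible for scanning
SELECT m.`Name` AS `MappingName`,
       m.`FileSpec`,
       f.`ID` AS `ID_Folder`,
       f.`Name` AS `FolderName`
FROM `mappings` AS m
JOIN `machines` AS h ON h.`ID` = m.`ID_Machine`
JOIN `folders` AS f ON f.`ID` = m.`ID_Folder`
WHERE h.`NetName` = 'examplehost'
  AND m.`isPrimary` = TRUE;</mysql>

FileSpec has no terminating slash, so a folder chain below the mapped folder would be appended to it with a separator.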