FactFerret: Difference between revisions

Revision as of 18:32, 22 March 2013

About

DataFerret is a way to store arbitrary data from multiple sources in a single database, while preserving the meaning of each datum.

The purpose is to allow multi-layered querying (e.g. "display rate of all A that didn't include B from year C to year D") without foreknowledge of what data is available. This is in some ways similar to what GapMinder does, except it should be relatively easy to add new datasets (one database schema should be able to accommodate any assertion of fact) and it should be capable of data-dependent conditionals ("all A that didn't include B" is just a very simple example).
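As a rough illustration of the kind of data-dependent conditional meant here, the sketch below filters a handful of made-up records in plain Python. It reads "rate of all A that didn't include B from year C to year D" as "per year, the fraction of A records that did not include B", which is only one possible interpretation. The record layout and the function name `rate_excluding` are assumptions for the example, not part of DataFerret.

```python
from collections import defaultdict

# Made-up records for illustration only: each one is an "A" with a year
# and the set of things it included.
RECORDS = [
    {"year": 2006, "includes": {"B"}},
    {"year": 2006, "includes": set()},
    {"year": 2007, "includes": {"B", "X"}},
    {"year": 2007, "includes": {"X"}},
    {"year": 2007, "includes": set()},
    {"year": 2009, "includes": set()},
]


def rate_excluding(records, excluded, first_year, last_year):
    """Per year in [first_year, last_year], return the fraction of
    records that did not include `excluded`."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for rec in records:
        if first_year <= rec["year"] <= last_year:
            totals[rec["year"]] += 1
            if excluded not in rec["includes"]:
                matches[rec["year"]] += 1
    return {year: matches[year] / totals[year] for year in sorted(totals)}


rates = rate_excluding(RECORDS, "B", 2006, 2008)
```

Years with no records in the requested range are simply absent from the result rather than reported as zero.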

It also bears some similarity to Semantic MediaWiki, except that data is entered and updated independently of a wiki page, via both manual and automated methods.

Schema

This is a preliminary schema, just to give an idea of how it works.

point

ID (auto)
ID_Series

value

ID (auto)
ID_Axis

series_attrib

ID (auto)
ID_Series
ID_Attrib
Value

axis

ID (auto)
ID_Series
Name
ID_Unit

series

ID (auto)
Name
Descrip
ID_Source
possibly other attributes

source

ID (auto)
ID_Entity -- organization or individual who created the data
URL -- (optional) web page where data may be found
When_Retrieved

unit

ID (auto)
Name
ID_Handler -- sprintf(), date(), custom code...
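
One way to read the ID_Handler field is as a key into a table of formatting functions, each of which takes a template string and a raw value. The sketch below is an assumption about how such a dispatch might look, using Python's `%` operator as a stand-in for sprintf() and `strftime` as a stand-in for date(). The names `HANDLERS` and `format_value` are hypothetical.

```python
from datetime import date

# Hypothetical handler table: handler name -> function(template, value).
HANDLERS = {
    "sprintf": lambda tplt, value: tplt % value,
    "date": lambda tplt, value: value.strftime(tplt),
}


def format_value(handler, tplt, value):
    """Format `value` with the named handler and template string."""
    return HANDLERS[handler](tplt, value)


formatted_date = format_value("date", "%y/%m/%d", date(2013, 3, 22))
formatted_pct = format_value("sprintf", "%.1f%%", 12.345)
```

Custom code would be registered as another entry in the same table.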

unit_format

ID (auto)
ID_Unit
Name -- a name for the format, e.g. "ISO xxxx"
Tplt -- template string to pass to unit handler (e.g. "%y/%m/%d")
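
To make the listing above more concrete, here is a minimal sketch of the same tables as SQLite DDL, driven from Python's standard `sqlite3` module. Column types and foreign keys are assumptions, since the listing gives only table and field names. The value table as listed has no field linking it to a point or holding the datum itself, so the sketch adds two hypothetical columns, `ID_Point` and `Datum`, purely so that a point can be stored and read back.

```python
import sqlite3

# Types and REFERENCES clauses are assumptions; only the table and field
# names come from the preliminary schema.
SCHEMA = """
CREATE TABLE source (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Entity INTEGER,      -- organization or individual who created the data
    URL TEXT,               -- optional
    When_Retrieved TEXT
);
CREATE TABLE series (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT,
    Descrip TEXT,
    ID_Source INTEGER REFERENCES source(ID)
);
CREATE TABLE unit (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT,
    ID_Handler INTEGER      -- no handler table is listed; left as a plain integer
);
CREATE TABLE unit_format (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Unit INTEGER REFERENCES unit(ID),
    Name TEXT,
    Tplt TEXT
);
CREATE TABLE axis (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Series INTEGER REFERENCES series(ID),
    Name TEXT,
    ID_Unit INTEGER REFERENCES unit(ID)
);
CREATE TABLE series_attrib (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Series INTEGER REFERENCES series(ID),
    ID_Attrib INTEGER,      -- no attrib table is listed; left as a plain integer
    Value TEXT
);
CREATE TABLE point (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Series INTEGER REFERENCES series(ID)
);
CREATE TABLE "value" (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    ID_Axis INTEGER REFERENCES axis(ID),
    ID_Point INTEGER REFERENCES point(ID),  -- ASSUMPTION: not in the listing
    Datum TEXT                              -- ASSUMPTION: not in the listing
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One hypothetical series with two axes and a single point on it.
conn.execute("INSERT INTO series (Name) VALUES ('example series')")
conn.execute("INSERT INTO axis (ID_Series, Name) VALUES (1, 'year')")
conn.execute("INSERT INTO axis (ID_Series, Name) VALUES (1, 'rate')")
conn.execute("INSERT INTO point (ID_Series) VALUES (1)")
conn.executemany(
    'INSERT INTO "value" (ID_Axis, ID_Point, Datum) VALUES (?, ?, ?)',
    [(1, 1, "2008"), (2, 1, "0.25")],
)

# Read the point back as (axis name, datum) pairs.
rows = conn.execute(
    'SELECT axis.Name, v.Datum FROM "value" AS v '
    "JOIN axis ON axis.ID = v.ID_Axis "
    "WHERE v.ID_Point = 1 ORDER BY axis.ID"
).fetchall()
```

Under these assumptions a point is just a set of values, one per axis of its series, which is what lets a single schema hold datasets of different shapes.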

Views

Represent any 2 axes as a graph/chart -- basically, spreadsheet graphing functionality:
- variety of graph/chart formats available
- eventually, add more dimensions (color, size, slider) a la GapMinder
- restrict range or show entire range
Answer questions written in English-like syntax, with graphs or scalars:
- "During the 2008 mortgage crisis, what percent of loan defaults came from CRA-inspired loans?" (scalar output)
- "Display rate of default for CRA-inspired loans versus all loans during the 2008 mortgage crisis." (graph output, restricted range)
- "Display {profitability of loans to minorities} and {profitability of loans overall} by year." (graph output, unrestricted range)
Offer sources for all data presented.
Where data from multiple sources differ: offer to average them, present each source in its own output, or show each source as a separate element of the same output (e.g. as a differently-colored line).
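
The choice between averaging and keeping sources apart can be sketched as two small functions over (source, x, y) observations. The observations, source names, and function names below are made up for illustration and are not taken from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (source, year, value).
OBSERVATIONS = [
    ("source 1", 2007, 4.0),
    ("source 2", 2007, 6.0),
    ("source 1", 2008, 10.0),
]


def averaged(observations):
    """Collapse all sources into one line by averaging values per x."""
    by_x = defaultdict(list)
    for _source, x, y in observations:
        by_x[x].append(y)
    return {x: mean(ys) for x, ys in sorted(by_x.items())}


def per_source(observations):
    """Keep one line per source, e.g. to plot each in its own color."""
    lines = defaultdict(dict)
    for source, x, y in observations:
        lines[source][x] = y
    return dict(lines)


one_line = averaged(OBSERVATIONS)
many_lines = per_source(OBSERVATIONS)
```

Either result still carries enough information to cite sources: the per-source form directly, the averaged form by listing every source that contributed to it.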

