New data model
A metadata table to explain the different types of data that we want to record, for example:
A metadata values tables will list the acceptable values for a piece of data:
A simplified country table is required with just:
Column |
Type |
Description |
ID |
Integer |
A unique “key” for this country |
Code |
String |
2-letter code |
Short Name |
String |
The short name in English |
Full name |
String |
The full official name in English |
Region |
Integer |
The region this country belongs to, stored as a reference to the regions table |
Finally a data table that actually records all the different types of data:
Column |
Type |
Description |
Integer |
A unique “key” for this piece of data |
|
Type |
Integer |
The ID from the metadata table to explain this data |
Country |
Integer |
The ID of the country |
Occurred |
Date |
The date that this new data occurred in real life. |
Changed |
Date |
The date that this new data was entered into the database (this may be useful for double-checking the work of interns etc.) |
Who |
String |
The email address of the staff member (or intern) who entered this data into the database. (This can be the email address of a visitor who gave feedback) |
DataString |
String |
The actual data itself but only if the metadata type is a String, otherwise null |
DataNumber |
Float |
The actual data is the metadata type is Number, Decimal or Boolean, otherwise null |
Source |
String |
The official source of this version of the data, including a web link where possible |
CommentsInternal |
String |
Comments added by the staff member which is for internal reference |
CommentsPublic |
String |
Comments which are useful background information for this data point, which should be shown as a footnote or popup on the website. |
Status |
Integer |
The status of this item of data. 0 = Pending user confirmation 1 = Pending approval 2 = Approved and visible on the site 10 = Rejected (perhaps lacking credible source, but kept so that the submitter is still on record) 11 = Visible but contested |
ReviewedBy |
String |
The email address of the staff member who reviewed this entry and made the approve/reject decision. |
The reason why there are 2 Data columns is that we need to store the numeric data in the database in a special format. By doing this we can very quickly and flexibly do advanced queries such as “Show me the average voter turnout in the Americas, between 1970 and 1980”, or “Show me the highest percentage of invalid votes anywhere in the world since 1990”
Having both an occurred and changed column means that for specific types of data, we can say when the last update was. For example we can automatically print out that this data for ESD type for Italy was last updated 2007-10-12. If we have the human resources to check each field, we can even update the “Changed” value to today’s date to indicate that the data has been checked for accuracy on this date, without changing the value. For example, we checked the ESD of Italy on 2007-10-12 and confirmed that the last change was still in 2005.
