Whitepaper: International IDEA's Unified Database

IDEA front page

The customer

The International Institute for Democracy and Electoral Assistance (International IDEA) is an intergovernmental organization that supports democracy worldwide. Their mission is to support sustainable democratic change by providing comparative knowledge, assisting in democratic reform, and influencing policies and politics. Read more

Motivation

The situation with IDEA databases was problematic for both users and staff. For visitors to IDEA’s site, the databases each appeared as completely distinct entities with different navigation and appearance (maps, tables, menus etc.) with only minimum connection between the databases. Furthermore, there was no clear indication of how up-to-date each database is, and no clear workflow on how visitors can contribute to improve the quality of each database.

Better staff interface

For staff, each database had a different interface to browse and edit data. The editing interfaces had little or no help or documentation, and no error checking code. This had led to interns entering incorrect data or data in the wrong format (i.e. typing decimals with the comma) which in turn led to errors in the publicly visible database interface. Now there is a single location for IDEA data. Staff can browse any field whether it previously resided in the VT, ESD, EDR or any other IDEA database. This makes it a valuable tool for in-house research and comparison.

Systematic data definitions

By completely reviewing the databases at IDEA and building a central database structure to house all the results, we could build a single consistent interface to both web visitors and staff. During the creation of this platform, all data fields were specified in more detail (i.e. given a title, an example, notes to staff about the correct format, and formatting instructions for how each field should be displayed on the web). Most of this meta-data was ascertained from the existing formatting of the different databases, but teams were involved to ensure accuracy, and used the opportunity to update some of the glossary of terms.

Consistent data lookups

Part of the work was the creation of a set of functions for querying the database in different ways. You can think of these as query “templates”. For example, there may be a template for querying and displaying the average value for a certain field (say Turnout/VAP) for a set of Countries (which is a parameter to the query). This same template would be used to display data for all the different data themes. The implication is that looking up data from the VT, ESD or EDR webpage would function the same and be displayed in the same way.

Consistent naming

Previously, most of IDEA’s databases already shared a country database table, which included the standard 2-letter country codes and the “IDEA” names of countries. So, changing the name of a country (from “Taiwan, province of China” to “Taiwan”) could already be done once and affect the whole site. However, it was preferable to have both an official short name (e.g. “Congo DRC”) and an official long name (e.g. “Congo, the Democratic Republic of”) that we can use as the space allows (e.g. in a map we decided to use short names because of lack of space)

Unfortunately, there was no simple solution for regional breakdowns. Some databases did not provide a breakdown by region at all, and those that did differed to the rest of the IDEA website. Creating standard regions in the unified structure, allowed us to keep regional names and breakdowns consistent.

Cross-thematic data lookup

While the unified database could be used to bring consistency to the current database web pages, it could also be used to provide whole new options for web visitors to explore IDEA’s data. Since all the data is in a shared format and location, it became possible to query combinations of data from what were previously separate databases. For example, a query can list the electoral system, the voting age population, and whether a country has remote voting all in a single table on screen.

Shared improvements

Another benefit of having a single library for all the databases, is that when we add a new feature, all the thematic interfaces can immediately make use of the new feature with a bare minimum of changes. For example, if we create an new RSS feature to track new related database entries, it would just be a matter of adding a new link to each thematic page for that feature to come into effect.

Image: 

Technology platform

Since IDEA’s website and intranet use Adobe’s ColdFusion as the programming platform, it was decided to continue with this platform. This enabled us to reuse parts of the existing editing interface and display code for the new system.

Wherever possible, software development standards were used. The database itself is a standard relational database, currently Microsoft SQL Server 2008, but compatible with free open-source alternatives. Migrating the database component to one of these free alternatives is trivial.

The interactive components of the pages are built using Javascript which is cross-platform and also completely independent of ColdFusion. The styling of the pages is done with CSS technology which is also independent.

The Query library was built entirely in the ColdFusion programming language (which is built on top of the Java platform), but properly documented, it also would be reasonably easy to migrate to a Microsoft or open-source platform.

Structure of the system

Databases before

Problems with previous design

  • ESD data is recorded in the Countries database and so does not allow for historic data to be kept. For
    example, when the president changes, we must overwrite the old president’s name.

Patterns to follow

The Turnout table was a good starting point for a better data management system. This table allowed us to
record voter turnout for each and every election that has every occurred. The VoteYear columns acts as a
“version number” to lets us track changes over time.

Data that we could not previously record

There was a high cost in changing the data fields that we record. If the elections team decided that they
wanted to start recording and display the remote voting system used, we needed to alter the database tables
involved, then change both the pages that display the data, but also update the editing interface so that
the team can enter that data. With a unified model, anyone can specify a new data field and start entering
data immediately. Displaying that new data type inside a web page requires just a single line of code.

Requirements for the new model

  • The data is highly granular (i.e. we can track particular values such as the parliamentary gender quota
    of a particular country as it happens, and even know the date that the quota changed, and perhaps even
    have a comment field to explain the background behind the change)

New data model

A metadata table to explain the different types of data that we want to record, for example:

Column

Type

Description

ID

Integer

A unique “key” for this type data

Name

String

The name of this data type in English, e.g. “Remote voting system”

Theme

String

The theme that this data is related to, e.g. “Electoral processes” or “Voter turnout”

Format

String

This explain the expected format of the data – String, Number, Decimal, Boolean etc.

Weight

Integer

A value indicating how important this data is – the “lighter” the value, the higher it appears in a list of results.

Multiple

Boolean

Can there be multiple values for this metadata – e.g. Can a country have 2 ESD types?

Description

String

Information to explain the details of the metadata

Link

String

A web link for more explanatory information.

Active

Boolean

Setting this to false will indicate that this type of data is no longer being maintained by IDEA, or has been replaced by something else

A metadata values tables will list the acceptable values for a piece of data:

Column

Type

Description

ID

Integer

The metadata key

Value

String

The possible value, e.g. “First past the post” would be a value for the electoral system.

Description

String

A longer description (in English) of what this value actually means.

Color

String

The color associated with this value (in HTML RRGGBB format), which matches the color used in the related handbook.

Weight

Integer

A value indicating how important this data is – the “lighter” the value, the higher it appears in a list of results.

A simplified country table is required with just:

Column

Type

Description

ID

Integer

A unique “key” for this country

Code

String

2-letter code

Short Name

String

The short name in English

Full name

String

The full official name in English

Region

Integer

The region this country belongs to, stored as a reference to the regions table

Finally a data table that actually records all the different types of data:

Column

Type

Description

ID

Integer

A unique “key” for this piece of data

Type

Integer

The ID from the metadata table to explain this data

Country

Integer

The ID of the country

Occurred

Date

The date that this new data occurred in real life.

Changed

Date

The date that this new data was entered into the database (this may be useful for double-checking the work of interns etc.)

Who

String

The email address of the staff member (or intern) who entered this data into the database. (This can be the email address of a visitor who gave feedback)

DataString

String

The actual data itself but only if the metadata type is a String, otherwise null

DataNumber

Float

The actual data is the metadata type is Number, Decimal or Boolean, otherwise null

Source

String

The official source of this version of the data, including a web link where possible

CommentsInternal

String

Comments added by the staff member which is for internal reference

CommentsPublic

String

Comments which are useful background information for this data point, which should be shown as a footnote or popup on the website.

Status

Integer

The status of this item of data.

0 = Pending user confirmation

1 = Pending approval

2 = Approved and visible on the site

10 = Rejected (perhaps lacking credible source, but kept so that the submitter is still on record)

11 = Visible but contested

ReviewedBy

String

The email address of the staff member who reviewed this entry and made the approve/reject decision.

The reason why there are 2 Data columns is that we need to store the numeric data in the database in a special format. By doing this we can very quickly and flexibly do advanced queries such as “Show me the average voter turnout in the Americas, between 1970 and 1980”, or “Show me the highest percentage of invalid votes anywhere in the world since 1990”

Having both an occurred and changed column means that for specific types of data, we can say when the last update was. For example we can automatically print out that this data for ESD type for Italy was last updated 2007-10-12. If we have the human resources to check each field, we can even update the “Changed” value to today’s date to indicate that the data has been checked for accuracy on this date, without changing the value. For example, we checked the ESD of Italy on 2007-10-12 and confirmed that the last change was still in 2005.

User Administration

The previous databases never had a way to add user profiles, set passwords and remove access for old staff
members. The web administrators previously had to manually create new accounts, or people (particularly interns)
had to log in with other peoples’ passwords. Neither option was very practical.

Part of the new database is an interface where staff are able to deactivate (but not delete) old user
accounts. Deleting accounts is not suitable in this case because we would lose information about who edited
old data. This information is useful to track old errors by certain past interns, for example. Disabling an
account will mean that no one will be able to log in with that username. It will be easy to re-enable this
account if necessary.

Active accounts will be able to create new user accounts and set a password. A password reminder function
will also be created, so that staff can receive a password reminder to their IDEA email account if they
forget.

Community feedback

An additional feedback table is not required to take feedback from visitors to the website. We
can reuse the data table, and add a Status field to flag feedback from visitors into the same
table, but marked as pending. All visible data (including summaries, searches and exports) will check the
Status field and only consider approved data).

A view that is only visible to database administrators has been created, listing all the pending data items,
with an option to approve or reject. The internal Comments fields can be used to record the reason
for a rejection. The email address from the Who field is used when automatically sending a thank
you email (both when they submit the feedback and when it is accepted or rejected).

The ReviewedBy field is used for recording who made the reject or accept decision. The changed date
will also be updated to reflect that date of the decision, and thus becomes the date that the data is shown
as “last updated”.

The data table therefore also acts as a history of review decisions, as well as its primary role tracking how
the values change over time. From that data, reports can be generated, such as “Country that has had most
community feedback”, “Countries where the “voting system” has changed since 2000”

Outstanding issues

Translation

It will be quite easy to translate the metadata names, but translating the actual data may be trickier. We could just use a single translation table like this:

Column

Type

Description

English

String

The metadata name, value, or DataString value

Language

String

The language of the translation

Translation

String

The translation itself

Previously, there were around 8 databases each with about 12 fields on average. This means that there were around 100 fields that need to be labelled, described and given formatting preferences. It is easy to translate these 100 fields x 2 (one for label and one for description). The data entered, however, can’t be specified in advance. Fields that contain text, such as the comments to QP’s “Constitutional Quota for National Parliaments?” will each need to be translated separately. Every time data with text is added or changed, re-translation will need to occur.

Copyright

We have assumed that all data is owned by IDEA, in instances where the data has come from partnerships, we could add an extra column to the data table which is “Source” or “Copyright owner” and record the data there

Predefined searches

For simplicity, staff from each thematic area will be asked to agree on “preset” searches for common situations. “All countries with compulsory voting”, “All election data for countries with presidential elections“.

Teams will also prepare a list of “Top 25” searches, such as “Top 25 countries by highest turnout in the latest parliamentary or presidential election”. These searches are useful for the media and will hopefully generate good press for IDEA.

Using the database

I

Advanced search page

V

Understanding the search results

[

  1. The preview window introduction explains how to use the window to edit or export data.
  2. This view ("Quick view") shows the most recent data point for each country and field. The year that the data occurred is displayed in brackets underneath. The most recent version of this data point of this occurrence is shown, meaning that any approved correction supplied by the community is automatically displayed.
  3. For fields with pre-defined values, a legend is shown at the bottom of each field’s column.
  4. You can see any additional notes about each data point by clicking on the cell. You will also have the option to add missing data or update/correct existing data.

Suggest update or correction

<

Steps (continued)

  1. The update page is pre-configured for a particular country and data field. The year refers to the time that the data was recorded or measured. For example, if we are correcting the voter turnout for an election that occurred in 1996, then 1996 should be in selected here.
  2. Contributors enter their e-mail address so that we can confirm that the submission is not made by a spammer. It also helps the team at IDEA to get more information and sources when confirming the submission.
  3. Pressing the “Save” button submits the new data to IDEA. The responsible team is alerted to the update, and will review and approve the change as soon as possible. Submissions will not appear in the search results until approved by a staff member.

Most pages showing data include an edit icon that will take users this page.

Conclusion

In 2010, information and data from previously separate databases and resources covering topics such as voter turnout, electoral system design, electoral gender quotas, direct democracy and more, have been incorporated into one common interface, the “unified database” (www.idea.int/uid). Navigation and appearance (such as maps, tables, menus, etc.) are now common for all.

The major benefits of this change are to streamline and standardize the Institute’s data collection, maintenance and presentation of information. For partners and users, the unified database now provides a standard web search interface and workflow to access the data that is stored in either text or numerical form.

If you want to learn more about the unified database project or discuss a new project, please contact us.

About International IDEA

The International Institute for Democracy and Electoral Assistance (International IDEA) is an intergovernmental organization that supports democracy worldwide. Their mission is to support sustainable democratic change by providing comparative knowledge, assisting in democratic reform, and influencing policies and politics.

International IDEA was launched on 27 February 1995 in Stockholm as an intergovernmental organization numbering 14 countries. As of 2010, there are 25 Member States: Australia, Barbados, Belgium, Botswana, Canada, Cape Verde, Chile, Costa Rica, Denmark, Finland, Germany, Ghana, India, Mauritius, Mexico, Namibia, The Netherlands, Norway, Peru, Portugal, South Africa, Spain, Sweden, Switzerland and Uruguay. In addition, Japan is an Observer State.

Member States are all democracies spanning many cultures and countries. They provide both financial and political support to the work of the Institute.

IDEA is led by a Secretary-General, with a governing body composed of a Council of its Member States. The Board of Advisers are eminent personalities and experts in the field of democracy support.

As well as the Stockholm headquarters, International IDEA has regional offices in Africa, Latin America and Asia. An Office of the Permanent Observer for International IDEA to the United Nations is maintained in New York.

Networks and partnerships with a wide variety of organizations and policy makers continue to be developed in many countries. Examples are United Nations agencies, regional political entities such as the European Union, the Organization of American States and the African Union, electoral management bodies, political parties and civil society organizations.

The Institute has four main areas of expertise:

  1. electoral processes
  2. constitution building processes
  3. political parties, participation and representation
  4. democracy and development.

Issues and themes that cut across the four areas are gender, conflict and security, and diversity. International IDEA’s State of Democracy assessment methodology is used by citizens to assess the quality of democracy in their country.

IDEA offers expertise and knowledge by providing comparative data derived from practical experience on democracy building processes from diverse contexts around the world

Transmachina has been working with International IDEA since 2004, assisting with database management, web publishing and translation management.