Website Proposal for Historians to Create Databases on
the Web
This essay proposes a website for
the creation of databases for historians on the web using industry standard
methodologies and crowd sourcing for the input of data. The essay is divided
into three main sections with a final section forming a conclusion. The first
section discusses the proposed site and metadata standards as well as giving
links to a dummy site for demonstration purposes. The second section discusses
some advantages and disadvantages of crowd sourcing and possible methods for
attracting database managers and contributors. The third section discusses
some further aspects of crowd sourcing and possible solutions, as well as ways to
fund such a site. The final section
reviews the previous ones and forms a conclusion to this essay.
The design for the site is based
along the lines of Wikipedia; however, it is a dedicated database construction
site as opposed to a free encyclopaedia. The reasoning behind this is that many
small groups of historians do not necessarily have the technological expertise
to set up a website for a reasonably large set of data. If a template could be
constructed that allowed them, and others with an interest in the subject,
to build a searchable database via crowd sourcing, it would reduce
costs and create a valuable resource. For example, the 1851
census for a single town could run to many pages of information. If the site
could cut it up into small packages and allow people to input them in a
controlled and searchable way, it would break the task down immensely. This is
the basic idea; however, it can be expanded so that more than alphanumeric
information can be entered. Extending it logically to use the benefits
of the web, images, video and links to books or other websites could be
included as pages in that database. Over time the site could become a federated
search for a subject (for an example of a federated search site please see www.nines.org), searching a number of different
databases that share a similar interest. The controlling aspect would be
a form that a user could set up for their project, which would describe each
entry and set up metadata that would be searchable and measurable within the
database. For example, on a site for World War I the form could have an option button
for images; selecting it would offer a dropdown list for tanks,
soldiers, airplanes, etc. This would create searchable data linked to that
page that can be quickly managed or indexed by a search engine. However, an
option to add further searchable terms should be provided to aid this process,
with permissions set by the project manager. Forms can have error checking
built into them, requiring fields to be completed before they can be uploaded.
Repeat items with small differences could be carried over from one page to
another, with only the differences changed, rather than filling out a completely new form.
Large-scale data could be entered via tables, if this proved easier, and
added to a data set in batches.
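As an illustration of the form checks described above, the following is a minimal sketch in Python rather than part of the demonstration site. The field names, the `IMAGE_SUBJECTS` list and both helper functions are hypothetical; a real project manager would define their own fields and vocabulary.

```python
# Hypothetical controlled vocabulary for the image dropdown on a World War I project.
IMAGE_SUBJECTS = {"tank", "soldier", "airplane"}

# Hypothetical fields that must be completed before an entry can be uploaded.
REQUIRED_FIELDS = ("title", "item_type")


def validate_entry(entry: dict) -> list:
    """Return a list of error messages; an empty list means the entry may be uploaded."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not str(entry.get(name, "")).strip():
            errors.append(f"missing required field: {name}")
    # The subject dropdown only applies when the entry is an image.
    if entry.get("item_type") == "image" and entry.get("subject") not in IMAGE_SUBJECTS:
        errors.append(f"subject must be one of {sorted(IMAGE_SUBJECTS)}")
    return errors


def copy_with_changes(previous: dict, **changes) -> dict:
    """Carry a repeat item over to a new page, changing only what differs."""
    return {**previous, **changes}


first = {"title": "Tank photograph 1", "item_type": "image", "subject": "tank"}
second = copy_with_changes(first, title="Tank photograph 2")

print(validate_entry(first))   # no errors, so the list is empty
print(validate_entry({"title": "", "item_type": "image", "subject": "ship"}))
```

Keeping the vocabulary in one place means the same list can drive both the dropdown shown to the contributor and the check made before upload.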
A very basic site to show the potential layout of a project has been set up at Weebly.com,
with a description of the site on the homepage and a mock-up
login. A single project page and an individual
item page have also been added for demonstration purposes, and a form page with
tick boxes and drop-down lists has been included. Depending on the size of a
particular database and whether wider public access is allowed for
data input, it would probably be advisable to have a history
page showing previous amendments or discussions on that particular item. Please
note that none of the
underlying code required for database creation has been added and these pages
only give an example for the purposes of this essay. In the original
research, it was envisaged that the MARC 21
electronic library cataloguing system would be interlinked with the forms page,
thus limiting the individual form inputs via predictive text. The example used on
the website for MARC 21 is a book on cats, where the form would
suggest only the word string 'CATS' rather than
'FELINE' or 'CAT'. However, further research brought up the Open Archives
Initiative Protocol for Metadata Harvesting (OAI-PMH)
and the use of Dublin
Core. Whether OAI-PMH could be used to build up cross-indexing
across many databases remains to be seen. However, the adoption of
Dublin Core as at least a basis for setting up the site would perhaps make this
a future possibility. Dublin Core was thought to be a better choice
than MARC 21 as it is a metadata format, commonly expressed in XML, that allows many resources
available on the web and beyond to be described. However, it was also thought
that the majority of this formatting should be done by the application rather
than the user, to keep the system as easy as possible to use. For
example, if the name of the page is input by the user on the form page, the application
and not the user would write this as the 'title' element, with the relevant markup,
directly into the metadata. This way the page would be Dublin Core compliant
and the user would hardly be aware that it was happening. With regard to the
kind of database management system required, I would expect an
existing system to be modified to the site's needs. Perhaps MySQL or
a similar system would be suitable; however, a commercial licence may be
required if not using an open source program. This in turn would add to costs
and possible fees, which are discussed further on in this essay.
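To make the idea of the application writing the metadata more concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree` module. The form field names, the `FIELD_TO_DC` mapping and the `record` wrapper element are assumptions made for illustration; only the namespace and the element names (`title`, `contributor`, `subject`, `description`, `date`) come from the Dublin Core element set. The page name is stored in `title`, which is the Dublin Core element intended for the name given to a resource.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from form field names to Dublin Core elements.
FIELD_TO_DC = {
    "page_name": "title",
    "contributor": "contributor",
    "keywords": "subject",
    "summary": "description",
    "date": "date",
}


def to_dublin_core(form: dict) -> str:
    """Build a small XML record from submitted form values."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for field_name, dc_element in FIELD_TO_DC.items():
        value = form.get(field_name)
        if not value:
            continue  # leave out fields the contributor did not fill in
        # A field such as keywords may hold several values; each gets its own element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            ET.SubElement(record, f"{{{DC_NS}}}{dc_element}").text = str(item)
    return ET.tostring(record, encoding="unicode")


form = {
    "page_name": "1851 census, page 1",
    "keywords": ["census", "1851"],
    "contributor": "example_user",
}
print(to_dublin_core(form))
```

The contributor only ever sees the form; the mapping table is the single place where the site decides which Dublin Core element each field feeds.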
Although there are advantages in
cost with crowd sourcing, there are disadvantages in that input can contain errors
that are difficult to find. The ability to edit items other than
one's own entries could be limited by the project instigator, almost in the vein
of sub-editors checking content before publication (where this idea is similar to
Wikipedia). However, the degree of control would depend entirely on the
particular database manager. Unless editing is open to
everyone, correcting errors, or contacting someone to correct them, could be
off-putting. This is where the history or discussion page could be helpful, as it
would show previous versions of a particular entry or discussions about it.
Whether a lock-out function, as used by Wikipedia, would be necessary is debatable;
please see further on in this essay for
discussion of this. A positive aspect of crowd sourcing is the community
aspect, where users may find people with similar interests locally,
nationally or possibly even internationally. The crowd applications for the site
would hopefully make the dissemination of information easier. The amateur
historian who is active locally could have recourse to national expertise via
discussion boards or contact details on other databases (Twitter or email accounts).
Similarly, national historians could have access to current international trends
and possibly ongoing work. If historians are working on a database for
themselves, an automated interface which can translate their work and
put it online for all to see would be a significant additional application.
The main difficulty would be to encourage not only database managers to set up a
site (when there are already blog sites and online community sites like Twitter
or Facebook) but also casual users who might want to help input information. A
reward system could be put in place whereby free web space on the system is offered
in return for data entry on other databases. For example, input ten individual pages
and get one point; get ten points and get one gigabyte of web space or some
other reward. The key to attracting database managers would be
the service level offered: useful features and backup facilities, indexing and
system caching, gadgets and applications that make the project page look
attractive and display current progress, etc. Whether limits on the size
of an individual database would be required would depend on whether the site
was provided charitably (i.e. via an educational grant of some kind) or required
funds to purchase space from a web space provider. A fee of some kind may be
required (perhaps refundable if related to an educational establishment or
project); please see the next section for further discussion of this
topic. A subscription would make the site just like many other database hosting
providers, so perhaps donations could be asked for instead.
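The reward arithmetic in the example above (ten pages for a point, ten points for a gigabyte) can be sketched in a few lines of Python. The function name and constants are hypothetical, and the rates are only the illustrative ones suggested in this essay.

```python
# Illustrative rates taken from the example in the text.
PAGES_PER_POINT = 10
POINTS_PER_GIGABYTE = 10


def reward_for(pages_entered: int) -> tuple:
    """Return (points earned, gigabytes of web space earned) for a contributor."""
    if pages_entered < 0:
        raise ValueError("pages_entered cannot be negative")
    points = pages_entered // PAGES_PER_POINT
    gigabytes = points // POINTS_PER_GIGABYTE
    return points, gigabytes


print(reward_for(250))  # (25, 2)
```

At these rates a contributor has to enter one hundred pages to earn the first gigabyte, which gives a rough sense of the storage the site would be committing to per unit of volunteer work.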
Further possible downsides to
crowd sourcing, apart from the inaccurate data previously discussed, are fake sites,
or sites deliberately set up to falsify information. Wikipedia suffered from
this in its early days; whether the design of the proposed website would
put such individuals off is a key question. A database would
have to be set up on a specific historical topic (if not already set up) and a
form page configured; whether this effort would put such people off is debatable. The site would
have to rely on the community to report fake projects, and the
management would then have to remove them as quickly as possible. One possibility is a
requirement for text message or email verification, so that people on a
black list would be refused an account. However, there are ways
around this (a new SIM card or email address) and once again the community would
have to be relied on to report such actions. A further negative possibility is
the deliberate input of false or inaccurate information into project databases
already in existence. This, like unintentional
inaccuracy, is more difficult to detect unless the project manager or another authorised user is checking
individual pages on a regular basis. As
with the previous problem, the community would have to check for such activity
and report any deliberate vandalism. If an individual page within a database is
subject to repeated deliberate attacks, a lock-out function may be necessary
(as used by Wikipedia) to dissuade such attempts.
One way around this would be to use a method used in peer-to-peer sharing of
information, which is to make the site like a club, whereby members are
referred by existing members. If a member has to be excluded from the site, the
member who referred them could also be excluded or downgraded in some way.
Perhaps a ratings or statistics system of some kind (like eBay's) covering
accuracy, number of pages input, etc. could be implemented as an incentive.
However, for historical subjects, where there is
disagreement on many aspects of a single subject, whether this would work could be a
moot point. It could also be argued that if a historian has the
technical knowledge to compile a database of some kind, they will probably
already know how to set one up on an existing hosting website. There are
also wiki sites available (e.g. http://www.wikispaces.com/)
which could be set up for an individual project, although they can be
limited by price or size. The difference is that the proposed site would probably
be open source, with an already existing user interface and the added advantage
of crowd sourcing of data. This does bring up the question of funding; in
the current economic climate it is doubtful whether a provider would
give free resources to a site of this kind, which would require large amounts of
space and processing power. Setting up one's own server could be costly; however,
perhaps a one-off charge could be levied, or a not-for-profit company set up, to cover
such costs. Many internet service providers do like to be seen as charitable,
so perhaps some sort of accommodation could be reached.
This website project was inspired
by Wikipedia, although restructured towards a more academic and
searchable application of history. The hypothesis is that not many
historians are familiar with the technicalities of creating an online database and
that an easy-to-use application would make this process much simpler. The site
aims for an IMDb-style interface, with an
easy way to create searchable tables of data from many different sources on one
particular topic or locality. Linking a form page to an individual page
with preset identifiers seemed to make sense and would make compiling a
database from them that much more straightforward. The use of Dublin Core in
the metadata from the outset would make any future development of the site that
much easier. Crowd sourcing was an obvious solution; as previously
discussed in the essay, its downsides are outweighed by
the use of a not-for-profit labour force. If the site and individual projects were
set up correctly, I would envisage that the negative aspects could be kept to a
minimum. The major reservation that has recurred throughout this essay is
the subject of funding. Given the current economic climate, I do not
see a site of this kind being easily set up. The
programming skill needed to set up the user interface and database servers
requires funding. Perhaps a collaboration between a number of different sources
could make this a completely open source project, without the necessity to
charge a fee for its use. It would certainly make the task of inputting a
historian's work that much easier, displaying it and enabling it to reach as
wide an audience as possible.
Total Word Count 2268
Bibliography
A Wiki-page set up site;
Consulted 3/4/12
Picture for website in Weebly (Newport rising)
http://www.cottontimes.co.uk/charto.htm
Consulted 3/4/12
Weebly example site
http://212926218626269637.weebly.com/
published 3/4/12
What historians don't know about database design...
Consulted 2/4/12
MARC 21 Standards and information
http://www.loc.gov/marc/
Consulted 6/4/12
http://www.loc.gov/marc/umb/
Consulted 6/4/12
http://tools.ietf.org/html/rfc5013
Consulted 7/4/12
http://dublincore.org/documents/dcmi-terms/
Consulted 7/4/12
http://wiki.dublincore.org/index.php/User_Guide
Consulted 9/4/12
Open Archives Initiative Protocol for Metadata Harvesting
(OAI-PMH)
Consulted 7/4/12
MySQL FAQ
http://dev.mysql.com/doc/refman/5.1/en/what-is-mysql.html
Consulted 13/4/12
IMDb - Internet Movie Database
http://www.imdb.com/ Consulted 13/4/12
Database website
http://www.databasecorner.com/
Consulted 13/4/12