Dissertation
on
ISSUES INVOLVED IN SETTING UP AN INSTITUTIONAL E-PRINT REPOSITORY WITH SPECIAL REFERENCE TO THE UNIVERSITY OF KALYANI
Submitted by: Chandan Saha
Course: M.L.I.Sc
Session: 2005
Examination Roll: 99/MLI No. 050009
Department of Library and Information Science
The University of Kalyani
Kalyani, Nadia
GUIDE: MR. ARUP ROYCHOWDHURY
ABSTRACT
Scholarly publications are becoming costlier day by day, and no library or information center can afford to collect them all. Other factors also restrict access to scholarly publications. To overcome these barriers and make access easier and barrier-free, e-print repositories are the need of the hour. Many issues affect the setting up of an institutional e-print repository, but the most important of them is its policies. A well-thought-out policy can make the repository successful and acceptable to all concerned. Policies and issues relating to institutional repositories are discussed with special reference to KU, and recommendations for setting up an e-print repository are made. Other key factors that govern the setting up of institutional repositories are also considered. Advocacy methods are discussed in brief. Some problems relating to setup, such as legal issues, the role of the University or parent body, metadata issues, control over archives, selection criteria, administration, submissions and the impact on staff, are explained in brief. An attempt has been made to formulate a model guideline for the University of Kalyani to set up an institutional e-print repository. Some possible solutions to the problems are also indicated. Technical aspects, including LAN and software selection, are also dealt with.
PREFACE
Researchers publish their work so that every interested person can learn about their findings. But publishers have created a barrier to access by demanding high access tolls. There are other types of barriers also. Information and communication technology has developed to a great extent. The Internet has become a popular medium of communication, and exchanging views among peer groups is very easy through it. A worldwide movement has begun to free research output from the grip of publishers and make access barrier-free. The Internet can act as the communication medium for this purpose.

Open source software came into play to free computing from the monopoly of commercial organizations. This has resulted in a variety of open source software, including open source operating systems. Such software and operating systems can be downloaded free of cost and, being open source, can be customized to requirements. This enabled advocates of the Open Access movement to set up e-print repositories where authors can archive an electronic copy of their research output for toll-free access. This gave rise to the concept of institutional e-print repositories.

Institutional e-print repositories are repositories set up by an institution to archive its research output. Various software packages that can be used to set up an institutional repository are freely downloadable from the Internet. But the most important part of a repository is its policies. Only a foolproof, well-thought-out policy can sustain an institutional e-print repository in the long run. In this dissertation, these policies and the factors that can affect them are discussed on the basis of the existing literature on open access archives and repositories. Technicalities and software-related issues are also taken into consideration. An attempt has been made to prepare a brief guideline on policy issues for setting up an institutional repository for the University of Kalyani. An attempt has also been made to install the DSpace software to show that it may serve the purpose.

This work has been done in partial fulfillment of the M.L.I.Sc course. It is a very vast job and requires a long time to consider all factors. It is next to impossible to enumerate all aspects of policies and all related issues in a time-limited project. Forty-five days are not even enough to consider all aspects of the technical issues, so I had to restrict myself to installation only. I could not find time to customize the software for the requirements of the KU repository. Much scope remains for further development relating to those issues and for finding problems and their solutions.
ACKNOWLEDGEMENT
I took up my dissertation on 'Issues involved in setting up an Institutional E-print Repository with special reference to the University of Kalyani', which was a vast job for me, especially within just 45 days, a very short period. During its preparation, my guide was Mr. Arup Roychowdhury, Deputy Librarian of the Information and Documentation Center, ISI, B.T. Road, Calcutta. He not only guided me but also helped me complete this vast job within a very short period. He helped me the most. I will remain ever thankful to him for his kind guidance and for the hard labor he put in for me.

I want to express my cordial thanks to Dr. A.R.D. Prasad, Associate Professor, DRTC, ISI (Bangalore). Without his help, technical problems could have stopped me from installing the DSpace software. I thank him very much for kindly directing me to the package they built for ease of installation.

I want to thank my teachers of the MLISc course, Mr. Dibyendu Paul, Mr. Sabuj Dasgupta (former Head of the Dept.) and Mr. Bidhan Chandra Biswas (Head of the Dept.), for their kind cooperation during the course of study. I also want to thank Mr. Swapan Kumar Roy and the other staff of the department for their cooperation during the course of study.

I have to thank Mr. Mriganka Mondal, Assistant Librarian (Library in-charge), and Mr. Swapan Dasgupta of the University Internet Center for kindly permitting me to use their personal information resources during the project. I want to use this opportunity to thank Mr. Joydip Chandra, our senior friend, and my other classmates, who encouraged me at different times during the course of study.
Name of the student: Chandan Saha
Course: M.L.I.Sc
Session: 2004-05
Examination Roll: 99/MLI No. 050009
List of Abbreviations Used
Abbreviation    Full form
Archive         e-print archive (here)
DCMES           Dublin Core Metadata Element Set
DSpace          DSpace software
EPrints         EPrints software
e.g.            for example
etc.            et cetera
HDD             Hard Disk Drive
H/W             Hardware
Internet        Inter-network
IRs             Institutional Repositories
KU              The University of Kalyani
OA              Open Access
OAI             Open Archives Initiative
OAI-PMH         Open Archives Initiative Protocol for Metadata Harvesting
OS              Operating System
RAM             Random Access Memory
ROM             Read Only Memory
S/W             Software
GLOSSARY
DSpace: Free software for producing an archive of eprints. Provided by http://sourceforge.net/projects/dspace/
eprint: An electronically published research paper (or other literary item).
EPrints: Free software for producing an archive of eprints. Provided by www.eprints.org/
eprint archive: An online archive of preprints and post prints. It may or may not run on EPrints software.
OA: Open Access; restriction-free access to documents for academic purposes (here, in electronic archives).
OAI: Open Archives Initiative. From their mission statement: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting. A way for an archive to share its metadata with harvesters, which offer searches across the data of many OAI-compliant archives.
OAI-compliant: An archive that has correctly implemented the OAI protocol.
Post print: The digital text of an article that has been peer-reviewed and accepted for publication by a journal.
Preprint: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
INTRODUCTION
Writing is a method devised by human beings to preserve their intellect and carry it to the next generation. From the ancient past to the present era of artificial computing, writing has been a proven way to spread one's experiences and share one's knowledge with others. By writing about their work and experiences, authors also hope to gain recognition. With the invention of technology, spreading knowledge became easier, and the publication of journals for scholarly communication began. Its aim was to disseminate research information and intellectual work and to share knowledge among peer groups. But there are many barriers to this noble purpose: geographical distance, communication gaps, lack of access, and lack of awareness of previous work are some of them. To overcome some of these, publishers began producing and distributing scholarly journals commercially all over the world. This created another barrier: access tolls. Nowadays, most scholarly publications are controlled by commercial organizations for profit. This restricts access to scholarly publications for most people, leading to duplication of work and waste of time, money and energy.

During the last decade of the 20th century, the Internet became a popular medium of communication and a new platform for publication. With the advent of information and communication technology and tremendous developments in computing, the Internet became very popular and affordable to most people, even in developing countries. Simultaneously, the world of publishing has undergone many changes. Purely paper-based publication is shifting towards electronic publication for its extra advantages, such as ease of retrieval and worldwide accessibility. Publishers have started e-copy services for patrons. Authors report their research output and findings in articles, and governments and other agencies fund the research. Authors write for recognition; generally they are not paid by publishers for writing. Yet it is the publishers who make money from these articles. They control access to the publications and force scholars to pay for it. Libraries and information centers are the agencies responsible for providing information services to scholars. To serve that purpose, they have to collect articles and pay vendors very large sums as access tolls. The rising cost of journals forces libraries and information centers to cut their lists of preferred journals to fit their budgets. This again creates barriers to access to scholarly communication.

Open access is the only probable solution to this problem. Many renowned persons worldwide have spoken in favor of it. Technology has made it possible to set up electronic document archives that follow internationally accepted open standards and use open source software. Experience shows that open access articles are read more and cited more, so the impact of open access articles in scholarly communication is increasing. Although the open access movement has started worldwide, such archives are still few in number and small in volume. Many researchers are still unaware of it, and so they too suffer from access problems. Libraries have an important role in making the activity popular. Institutions such as universities and research organizations can also play a very important role. They can set up institutional repositories, preserve their research output, and make it freely accessible worldwide. This will in turn present their activities to the world. Such repositories can act as active components of a worldwide scholarly storage area network and, in future, will remove barriers to access to scholarly communication to a large extent.

The most important part of starting an institutional repository is its policies and issues. Well-planned policies can avoid many unwanted problems during implementation. Moreover, policies should be made with an eye on the future of the repository. Taking advantage of the latest technology and open source software may be an important way to reduce costs and make a quick start easy.
Scope and coverage of the work:
In this dissertation, an attempt has been made to discuss the policies and issues involved in setting up an institutional repository, with a brief look at its technological aspects. A brief guideline on policies for setting up an institutional repository at the University of Kalyani is then presented.
Research problem:
To prepare a guideline for setting up an institutional repository for the University of Kalyani. This includes a discussion of the policies and issues involved, including technical aspects, so that the University need not make major changes to the guideline when it sets up the repository.
Literature study:
Literature available on the Internet on e-print repositories and institutional repositories, their issues and policies, and the experiences of practitioners was studied, including some forum newsletters.
Relevance of the study:
Institutional repositories are the need of the hour. The University of Kalyani authorities are thinking of setting one up, and they will have to deliberate on the policies and issues involved. This dissertation has been prepared with that in mind. If a model guideline can be prepared for the University, it may be adopted without major changes. This will help the University avoid many unnecessary problems in future and save a lot of time. The dissertation also includes general discussions of various issues and policies, which will help the authorities form a preliminary idea of those issues. It may also be helpful to any other university or institution planning to set up an institutional e-print repository.
Objective:
To prepare a model guideline for setting up an institutional e-print repository for the University of Kalyani.
Hypothesis:
A model guideline for setting up an institutional repository can be prepared by studying the existing literature on the experiences of others.
Area of study:
Open access, open source, e-print archives, institutional repositories, the standards related to them (OAI-PMH), their legal issues, administrative structure and policies, technicalities, open archive software, and hardware requirements were studied. Their scope of application in the University of Kalyani is then presented as a model guideline.
Tools and techniques of data collection:
Data were collected by searching the Internet with the Google search engine. Links from various institutional repositories and articles available in some open access e-journals were also used for data collection and software download. Personal resources were also used.
Open access:
Open access means that electronic versions of scholarly materials are available free at the point of use to anyone who wants to read them. Open access basically calls for scholarly publications to be made freely available to libraries and end users. This can be done in two ways [Oppenheim, Charles. 2005. Open access and UK Science and Technology select committee report: free for all? Journal of Librarianship and Information Science. 37, 1. p. 4]:
· By publishing in an open access journal, or
· By depositing in an electronic repository that is searchable from remote locations without any restriction on access, and whose resources can be used for academic purposes free of cost.
In 1989, the first fully peer-reviewed open access (i.e. no subscription price) electronic journal, 'Psycoloquy', was launched. Stevan Harnad was the editor of the journal. At present, there are around o thousand open access journals on the web.
At present, Stevan Harnad is one of the leading advocates of open access e-print repositories. Repositories are good alternatives to open access e-journals. E. M. Corrado [Corrado, E. M. 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html] said that J. Willinsky has identified nine aspects of open access, as follows:
a) E-print archives (authors self-archive preprints or post prints);
b) Unqualified (immediate and full open access version of a journal);
c) Dual mode (both a print subscription and an open access version of a journal);
d) Delayed open access (open access is available after a certain period);
e) Author fee (authors pay a fee to support open access);
f) Partial open access (some articles of a journal are available via open access);
g) Per-capita (open access is available to countries based on per-capita income);
h) Abstract (only abstracts and tables of contents are available for open access);
i) Co-operative (institutional members support open access journals).
The advantages of open access are:
· A moral and ethical argument: it allows people all over the world to gain access at no cost.
· The argument that the article is seen by more people and therefore has a greater impact.
· It ensures long-term access to scholarly articles. Libraries and others can create local copies and repositories of such literature, and so ensure continued access through their repositories in the distant future.
· Its message is diffused more widely than that of subscription-based journals.
It is observed that articles available online free of cost are cited many times more than those that are not.
Open access initiative
The open access movement is the worldwide effort to provide free online access to scientific and scholarly literature, especially peer-reviewed articles and their preprints (http://www.earlham.edu/~peters/fos/timeline.htm). The concept was not very new, but the movement started in 1990. Stevan Harnad, a renowned professor of Philosophy and the first editor of an open access journal, is its strongest advocate. The Los Alamos arXiv database, an archive of preprints and post prints in physics, is the oldest such archive and has been running successfully for more than 10 years. The movement gained strength at the dawn of the 21st century.
With the advent of the Internet and telecommunication technology, channels of communication among scholars worldwide have opened. The demand for access to all scholarly publications can therefore be met by establishing e-print archives. Papers archived by authors in their institutional archives, together with a cross-search facility among such archives, will give scholars access to the world of scholarly publications irrespective of their actual location. Institutional repositories will be interlinked to produce a global database of scholarly publications. To serve this purpose, various archiving software packages are available for building such databases. In building a database, the most important thing is the incorporation of metadata. Metadata are fields that are indexed and can be searched.
Metadata:
Metadata is 'data about data'. According to the Association of American Publishers, "Metadata is information that describes content. An every day example is a card catalog in a library, an entry in a book catalog, or the information in an online index" [MICI Metadata Clearinghouse (Interactiv) (homepage). Available at http://www.metadatainformation.org/]. W3C, however, describes it as "Metadata is machine understandable information for the web" [Metadata and Resource description. Available at http://www.w3c.org/metadata/]. Though this does not capture the full meaning, it is adequate for our purpose.
At present, different metadata schemes are in use. The most popular of them is the Dublin Core Metadata Element Set (DCMES). Others include the Visual Resources Association Core Categories (VRA Core) and the Encoded Archival Description (EAD). DCMES is a simple set of descriptive data elements intended to be generally applicable to all types of resources. It is developed by the Dublin Core Metadata Initiative. It includes some qualifiers that extend its scope of application. But as yet it is not sufficient by itself to describe all types of bibliographic elements with all the necessary fields. So, for e-print archives and repositories, it cannot serve every purpose. Local variations cannot be recommended, for the sake of international data search and interoperability.
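To make the idea of a Dublin Core description concrete, the following minimal sketch builds a record for one e-print using a handful of unqualified Dublin Core elements. It uses only the Python standard library. The wrapper element name `record` and the sample values (including the `type` and `language` values) are assumptions made for illustration; they are not taken from any repository software.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build an XML element with one dc:* child per (element, value) pair.

    Dublin Core elements are optional and repeatable, so `fields` is a
    list of pairs rather than a dictionary.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return root


# Illustrative values only.
sample_fields = [
    ("title", "Issues involved in setting up an institutional e-print repository"),
    ("creator", "Saha, Chandan"),
    ("date", "2005"),
    ("type", "Dissertation"),
    ("language", "en"),
]

record = build_dc_record(sample_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The sketch does not check element names against the element set; a real repository would normally validate them before exposing the record.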
OAI-PMH:
For the repository to provide access to the broader research community, users outside the institution must be able to find and retrieve information from the repository. Therefore, systems must be able to support interoperability in order to provide access via multiple search engines and other discovery tools. An institution does not necessarily need to implement searching and indexing functionality to satisfy this demand [Crow, Raym. SPARC institutional repository checklist & resource guide. Available at http://www.arl.org/sparc/IR/IR_Guide.html]. It could simply maintain and expose metadata, allowing other services to harvest and search the content. This simplicity lowers the barrier to repository operation for many institutions, as it only requires a file system to hold the content and the ability to create and share metadata with external systems.
Interoperability requires persistent naming, standardized metadata formats, and a metadata harvesting protocol. The metadata harvesting protocol allows third-party services to gather the metadata from distributed repositories and conduct searches against the assembled metadata to identify and ultimately retrieve documents. These mechanisms can be applied to any type of compliant e-print repository or digital library, creating a global network of digital research materials.
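A harvesting protocol of this kind works through plain HTTP requests. As a minimal sketch, the function below builds request URLs for OAI-PMH, the protocol discussed in this section: a base URL plus a `verb` argument and further arguments such as `metadataPrefix`. The six verb names come from version 2.0 of the protocol; the base URL is a hypothetical placeholder, and the sketch does not check which arguments each verb permits.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL of an OAI-PMH request for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: %s" % verb)
    query = {"verb": verb}
    # Skip arguments that were passed as None.
    query.update({key: value for key, value in arguments.items() if value is not None})
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, for illustration only.
BASE_URL = "http://repository.example.org/oai/request"

# Ask for all records in unqualified Dublin Core.
list_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# 'from' is a Python keyword, so it is passed through a dictionary.
incremental_url = build_oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", **{"from": "2005-01-01"}
)
print(list_url)
print(incremental_url)
```

The second request illustrates selective harvesting by date, which lets a harvester fetch only the records added or changed since its last visit.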
The Open Archives movement spawned the Open Archives Initiative (OAI), which was established to develop and promote interoperability solutions to facilitate the dissemination of content. The OAI is a collaborative effort to develop interoperability mechanisms that facilitate access to distributed digital content in the academic environment. The OAI provides the framework for facilitating the discovery of content in distributed repositories.
The OAI developed a set of interoperability standards called the OAI Protocol for Metadata Harvesting (OAI-PMH), which allows repositories to create metadata to describe content stored in the repository and make it available to others who wish to use it. The OAI-PMH supports the interoperability of digital repositories irrespective of type (institutional, discipline-specific, commercial, etc.) or content.
Making repositories OAI compliant:
The OAI maintains a list of OAI-compliant repositories from which OAI Service Providers can harvest metadata. To participate in this process, a repository must register with the OAI, once the institution's repository infrastructure is in place. The OAI certifies that a repository is fully compliant by validating the repository's metadata using a program that issues periodic OAI queries. Once these checks are complete, the OAI confirms the registration with the repository and adds the repository to the list of data providers.
The OAI protocol requires that repositories offer the 15 metadata elements employed in unqualified Dublin Core metadata. However, the OAI protocol supports parallel metadata sets, allowing repositories to expose additional metadata specific to the repository's needs. Repositories that add domain-specific metadata sets to the Dublin Core should do so in consultation with other repositories to ensure a standardized presentation of these extended metadata sets.
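The split between the mandatory unqualified Dublin Core set and a parallel, repository-specific set can be sketched as follows. The 15 element names are those of the Dublin Core Metadata Element Set; the function name, the extra fields `department` and `supervisor`, and the sample values are hypothetical and serve only as an illustration.

```python
# The 15 elements of unqualified Dublin Core.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def split_for_exposure(fields):
    """Split an item's fields into a Dublin Core part and an extension part.

    The first part can be exposed as unqualified Dublin Core; the second
    would have to be offered through a parallel metadata set.
    """
    core = {name: values for name, values in fields.items() if name in DC_ELEMENTS}
    extension = {name: values for name, values in fields.items() if name not in DC_ELEMENTS}
    return core, extension


# Illustrative item with two locally defined fields.
item = {
    "title": ["A sample thesis"],
    "creator": ["Author, A."],
    "department": ["Library and Information Science"],
    "supervisor": ["Guide, G."],
}

core, extension = split_for_exposure(item)
print(sorted(core))
print(sorted(extension))
```

Keeping the two parts separate means that every harvester can rely on the Dublin Core part, while services that understand the extension can make use of the additional fields.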
Metadata harvesting means gathering metadata. Data providers create metadata for the e-prints they archive. Service providers in turn collect these metadata to build a large, combined, searchable and user-friendly interface. But they can gather metadata only if the archives are OAI-compliant. This whole process is popularly known as metadata harvesting.
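What a service provider receives during harvesting is an XML document. The sketch below parses a small, hand-written response to a `ListRecords` request and extracts each record's identifier and Dublin Core fields. The three namespace URIs are those used by OAI-PMH 2.0 and Dublin Core; the sample response, its identifier and its field values are invented for illustration, and a real response carries further elements (such as `responseDate` and `request`) that are omitted here.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified, invented response for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:123</identifier>
        <datestamp>2005-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample preprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return a list of {'identifier': ..., 'metadata': {...}} dictionaries."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # a record without metadata yields empty fields
            for child in dc:
                name = child.tag.split("}", 1)[1]  # drop the namespace part
                fields.setdefault(name, []).append(child.text or "")
        records.append({"identifier": identifier, "metadata": fields})
    return records


harvested = parse_list_records(SAMPLE_RESPONSE)
print(harvested)
```

Each field maps to a list because Dublin Core elements are repeatable, as the two `creator` entries in the sample show.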
Data providers and service providers
The OAI framework posits a publishing model that separates data providers (including institutional repositories) from service providers (metadata harvesters, search/retrieval, and other value-added access tools). Institutional repositories may serve both roles. Data providers provide metadata for harvesting. Service providers gather all those metadata together and build services on them, such as a search facility for users. The efficiency of a service provider thus depends on the data providers as well. So it is the data providers' responsibility to make their archives OAI-PMH compliant. Together, data providers and service providers both play a crucial role in serving users.
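The division of labour between the two roles can be illustrated with a toy service provider that merges records harvested from several data providers into a single searchable index. This is a minimal in-memory sketch, not the design of any existing harvester; the class name, repository names, identifiers and titles are all hypothetical.

```python
from collections import defaultdict


class ServiceProvider:
    """Toy service provider: merges harvested metadata and answers searches."""

    def __init__(self):
        self.records = {}              # identifier -> record
        self.index = defaultdict(set)  # word -> identifiers containing it

    def harvest(self, repository_name, records):
        """Store records from one data provider and index every word."""
        for record in records:
            identifier = record["identifier"]
            self.records[identifier] = dict(record, repository=repository_name)
            for values in record["metadata"].values():
                for value in values:
                    for word in value.lower().split():
                        self.index[word].add(identifier)

    def search(self, query):
        """Return identifiers of records containing every word of the query."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(word, set()) for word in words))
        return sorted(hits)


# Two hypothetical data providers expose their metadata.
provider_a = [{"identifier": "oai:a.example.org:1",
               "metadata": {"title": ["Open access repositories"]}}]
provider_b = [{"identifier": "oai:b.example.org:7",
               "metadata": {"title": ["Institutional repositories and metadata"]}}]

service = ServiceProvider()
service.harvest("Repository A", provider_a)
service.harvest("Repository B", provider_b)

print(service.search("repositories"))
print(service.search("metadata repositories"))
```

A single query reaches the holdings of both repositories, which is the benefit the separation of roles is meant to deliver; the quality of the results still depends on the metadata each data provider supplies.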
The term eprint/e-print bears different meanings for different people. The EPrints glossary at http://www.eprint.org/glossary defines an e-print as "An electronically published research paper (or other literary item)." E-prints are electronic copies of academic research papers. The Budapest Open Access Initiative FAQ says that e-prints are the digital texts of peer-reviewed research articles, before and after refereeing.
These e-prints are divided into two categories:
Preprints: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
Post prints: The digital text of an article that has been peer-reviewed and accepted for publication by a journal. This includes the author's own final, revised, accepted digital draft; the publisher's edited, marked-up version, possibly in PDF; and any subsequent revised, corrected updates of the peer-reviewed final draft. The watershed separating preprints from post prints is whether they are before or after peer review and acceptance for publication.
E-prints include both preprints and post prints.
They may be:
· Journal articles,
· Conference papers,
· Research reports,
· Book chapters bearing research output, and
· Other forms of research output.
An e-print archive is simply an online repository of research output, in either preprint or post print form. It is a collection of digital documents. Eprint.org defines an 'e-print archive' as 'an online archive of preprints and post prints. Possibly, but not necessarily, running on Eprints software'. Generally such archives are available free of cost over the web. OAI-compliant e-print archives share the same metadata, making their contents interoperable with one another. Their metadata can then be harvested into global "virtual" archives that are seamlessly navigable by any user.
E-print archives may be institutionally located and administered, in which case they are usually called institutional e-print archives. Or they may be subject-specific archives physically located at a suitable site and commonly mirrored elsewhere. The content is open to access by all. An archive may hold preprints only, or both preprints and post prints.
The main purpose of e-print archives is to provide access to the scholarly publications archived in them. Institutional e-print repositories provide scope for archiving:
· Animation;
· Article;
· Book;
· Book chapter;
· Course materials;
· Conference papers, posters, proceedings etc.;
· Dataset;
· Learning Object;
· Image; Image, 3-D;
· Map;
· Newsletters;
· Plans, Blueprints etc.;
· Preprint;
· Presentations;
· Recording, acoustical; Recording, musical; Recording, oral;
· Research reports;
· Software;
· Technical Report;
· Thesis and Dissertations;
· Video;
· Working Paper;
· Others.
An e-print archive helps to:
· Preserve materials,
· Enable self-archiving,
· Increase the impact of research output,
· Show the productivity of the organization,
· Increase access to archived materials,
· Disseminate information faster,
· Provide scope for enhanced citation analysis, etc.
Thus, an e-print archive may become another face of the institution, as well as of its research workers, on the web of scholarly publications.
E-prints comprise preprints and post prints. Post prints are articles that have been published in a peer-reviewed journal or accepted for publication. That means the write-up has gone through a screening and reviewing process, so the information in it is authentic and has been accepted by a group of peers. Researchers may rely on it without hesitation.
Preprints have not yet been published or accepted for publication in any peer-reviewed journal, so the authenticity of their content may be open to criticism. As the content has not been discussed among reviewers and peers have not commented on it, researchers hesitate to use such data because doing so may raise questions about the authenticity and reliability of their own work. Preprints thus lose citations. To handle this problem, institutional repositories may follow some reviewing policy. But this raises some questions.
The reviewers may be subject specialists or teachers of that subject in the institution. Specialists and educationalists from other institutions may be involved, if possible. This may be a totally voluntary service for the sake of the enhancement of knowledge.
This should be prepared by a group of experts and experienced reviewers, and should be subject to revision in the course of time.
· Deadline for review
· Number of specialists engaged in the review process
· Impartiality of the reviewer (this question arises in the case of institutional repositories. As a reviewer from the institution will know each submitter personally, this may influence him to be lenient or harsh with some submitters. Involving specialists from other institutions, or review by more than one specialist, can reduce this risk.)
From the perspective of archiving agencies and materials, e-print archives may be distinguished as:
1. Subject-based e-print archives,
2. Institution-based e-print repositories.
In subject-based archives, only documents dealing with the particular subject are archived. Their model of collection is centralized, and they try to collect every document published on that subject. A good example is the Los Alamos arXiv database, a preprint and post print repository of articles covering various branches of physics.
Institutional repositories, by contrast, are set up to archive and provide access to the publications of members of the institution. This is a way of measuring the total productivity of that institution. Distributed, institution-based self-archiving benefits institutions in different ways [p.28…..]:
a) It maximizes the visibility and impact of their refereed research output.
b) It maximizes researchers' access to the fully peer-reviewed research output.
c) By providing such access, the library can reduce its subscriptions to serials to some extent.
d) It highlights the research activity of the institution at a glance.
Administrative Issues
An institutional repository (IR) is a digital archive of an academic institution's intellectual output. Institutional repositories adhere to an open access model by centralizing and preserving the knowledge of an academic institution and making it accessible to anyone with Internet access. Setting up an IR is not a very tough job. The most important part is preparing a foolproof plan and then executing it. This involves the various steps listed below.
First, we have to decide its purpose. Institutional repositories are not discipline-specific; they aim to archive the entire range of a university's intellectual output. So the specific requirements have to be jotted down. Some of them are that the archive:
· Should be open access in nature, accessible to any person with an Internet connection,
· Should form part of a larger global system of repositories,
· Should be indexed in a standardized way,
· Should be searchable through one interface and be user-friendly,
· Should provide the foundation for a new model of scholarly publication,
· Should help to build a worldwide distributed database of scholarly publications,
· Should help the development of knowledge,
· Should accept submissions from remote places.
Software:
Software should be selected on the basis of all the needs of the institution. At present, different software packages are available for this purpose, e.g. CDSWare, DSpace, EPrints, Fedora etc. Some priced, customized software is also available, distributed by vendors on specific conditions. But during the last decade, a lot of open source software has become available free of cost over the Internet. The supporting software (web server, programming language, compiler, database builder etc.) is also available over the Internet free of cost, and most of it is open source too. Most web browsers also support these packages. [A brief comparison of different institutional repository software is available in 'OSI Guide to Institutional Repository Software v2.0'.] These packages work very well, and many repositories and open access e-print archives have been running on them for a long time. As they are open source, one can customize them as required. So the choice of software and the corresponding operating system is not a trivial job.
Hardware:
Once the software and operating system
are decided, the corresponding hardware requirements are to be checked on their
web sites. It is found that, in general, no specific requirements are mentioned
on the sites of the different software packages. But the service speed and reliability of the archive
depend on the quality of the hardware. So, it can be suggested that a recent
configuration with a very high speed processor, a large amount of Random Access
Memory (more than 1 GB), and a high capacity SCSI hard disk drive with a high rotation
speed (7200 RPM) may ease its functioning. More than one physical hard disk
drive is recommended for security purposes, so that the crash of one physical
disk will not damage all the data. A good speed modem is also essential.
Network
Infrastructure:
The existing
network infrastructure should be considered. The IR server requires a 24 x 7 connection
with high internet speed. The server should also be available over the LAN and
a Storage Area Network (SAN) [OSI Guide to Institutional Repository Software
v2.0].
Customizing Software:
While
selecting software, some more points [OSI Guide to Institutional Repository
Software v2.0] should also be considered, like:
Ø
Programming language in which it has been developed: mostly
Perl, PHP and Java.
Ø
Staff requirements: UNIX systems administrator, Java
programmer, Perl programmer, Python programmer, network knowledge etc.
Ø
How many software components are to be installed,
Ø
Available as a package or requires separate downloads,
Ø
System registration,
Ø
Allowing registration with specific interests,
Ø
Help with password recovery for users who have forgotten their
account password,
Ø
E-mail alerts to registered members about
newly archived material related to their interests,
Ø
Requires distribution license or not,
Ø
Content submission, administration etc.
Ø
Submission support: e-mail notification to the
administrator, or personalized system access for registered users for submission,
etc.
Ø
Ease of content export and import, restrictions on upload
file size, file format restrictions, support for uploading multiple compressed
files together, etc.
Ø
Metadata support,
Ø
Indexing facilities : limited in words or supports
full-text indexing,
Ø
Supports modification of user interface or not,
Ø
Multiple language interface supports or not,
Ø
Discussion forum support,
Ø
Search facilities like: Boolean logic, truncation,
wildcard in metadata and in full texts etc.
Ø
Browsing by authors, title, subject, issue date,
collection type etc. should be supported.
Ø
Scope of customizing search facilities by
administrator for whole database,
Ø
Cross searching in multiple databases at a time,
Ø
Searching in more than one language database at a time,
Ø
Availability of help desk, etc.
Availability
of all these is not mandatory, but they will be helpful in providing the best
possible services to users.
Metadata:
Institutional
repositories contain various types of bibliographic materials, like articles,
dissertations, theses, research reports or even study materials. To make them
searchable, institutional repositories must incorporate, index, and search
items from diverse collections in diverse formats. They have to deal with
writings of different levels (e.g. dissertations for Master's degrees, M.Phil, PhD etc.). They have to deal with standard
vocabularies from many different fields of study, and attach metadata to all
types of content. Unqualified Dublin Core (http://www.dublincore.org/)
is the minimum metadata required for OAI interoperability [A Guide to
Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html];
however, depending on the type of content it holds, a repository may include other
metadata sets.
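As an illustration of the unqualified Dublin Core record mentioned above, the following is a minimal sketch in Python using only the standard library. The two namespace URIs are the ones used by the OAI `oai_dc` format and the Dublin Core element set; the function name `build_oai_dc` and the sample values (title, author, year) are hypothetical and are not taken from any particular repository software.

```python
import xml.etree.ElementTree as ET

# Namespaces of the oai_dc container and of the Dublin Core elements.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of unqualified Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_oai_dc(fields):
    """Build an oai_dc record from a dict of element name -> list of values."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        for value in values:  # every element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record for a dissertation.
record = build_oai_dc({
    "title": ["A hypothetical M.Phil dissertation title"],
    "creator": ["Author, Example"],
    "type": ["Thesis"],
    "date": ["2005"],
    "language": ["en"],
})
print(record)
```

Because every element is optional and repeatable, the same sketch covers an article with several creators or a thesis with several subject terms; richer metadata sets would need a different container.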
OAI is based
on the exchange of metadata. So, to make the archive effectively OAI compliant,
incorporating the right metadata is essential. Most repository software is
OAI compliant, so the Dublin Core metadata element set can be used in
general. But it may not work well for
some types of publications like research papers, theses etc. In India, the UGC, in its
Higher Education Information Systems Project (HISP) [available at http://www.ugc.ac.in/new_initiatives/hisp.html],
has planned to develop a knowledge repository. It proposes to develop a
mechanism for tracking academic information resources such as learning
resources, curricula, question banks, national theses etc., published in
various formats, through a systematic, internationally used metadata
framework for tagging such resources. It has taken the initiative to create a
Knowledge Repository for communities of teachers and researchers in the
country. This could help to overcome the problem, as the same
metadata set would be implemented all over India.
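Since OAI works by exchanging metadata, a service provider harvests a repository by sending it simple HTTP requests. The sketch below only builds such a request URL and does not contact any server. The six verbs and the `metadataPrefix` and `from` arguments belong to the OAI-PMH protocol; the base URL `http://eprints.example.edu/oai2` and the function name are hypothetical.

```python
from urllib.parse import urlencode

# The six request types (verbs) defined by OAI-PMH.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url, verb, arguments=None):
    """Return the GET URL for an OAI-PMH request (no network access)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments or {})
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address; ask for all Dublin Core records
# added or changed since the start of 2005.
url = oai_request_url(
    "http://eprints.example.edu/oai2",
    "ListRecords",
    {"metadataPrefix": "oai_dc", "from": "2005-01-01"},
)
print(url)
```

A harvester would fetch this URL and parse the XML response; that step is left out here because it depends on the repository actually being online.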
Document
Types To Be Archived:
Institutional archives may decide to
incorporate documents like:
Ø
E-prints,
Ø
E-books,
Ø
Working papers,
Ø
Journal articles,
Ø
Pre-prints and post prints,
Ø
Theses and dissertations (of various levels),
Ø
Research and technical reports,
Ø
Departmental and news centers newsletters and
bulletins,
Ø
Project reports,
Ø
Seminar volumes,
Ø
Conference reports,
Ø
Important guidelines/instructions,
Ø
Committee reports and memoranda,
Ø
Papers in support of grant applications,
Ø
Surveys,
Ø
Technical documentations,
Ø
Study materials/course materials for different levels,
Ø
Photographs,
Ø
Audio/video recordings,
Ø
Statistical reports,
Ø
Supplementary information for University
publications, etc.
All these may be collection types of
the University's institutional repository.
But all of them should have the following
characteristics [Crow, Raym. 2002. Institutional Repository: Checklist & Resource Guide. (Washington,
DC: SPARC).
Available from http://www.arl.org/sparc/]:
Ø
Scholarly—the material is research- or teaching-oriented;
Ø
Produced, submitted, or sponsored by an institution’s
faculty (and, optionally, Students) or other authorized agent;
Ø
Non-ephemeral—the work must be in a complete form,
ready for dissemination;
Ø
Licensable in perpetuity—the author must be able and
willing to grant the institution the right to preserve and distribute the work
via the repository.
Materials that satisfy the above requirements might
include working papers; conference presentations; monographs; course materials;
annotated series of images; audio and video clips; published (or pre-published)
peer-reviewed research papers; and supporting material for published or
unpublished papers (for example, datasets, models, and simulations) etc. While
repository content may thus be defined broadly, some repositories may elect to
focus initially on text-based materials, even though they anticipate broadening
coverage over time. Additionally, in the interest of encouraging participation
and acquiring material to populate pilot and demonstration projects, some
repositories may choose to adopt more relaxed (and possibly temporary)
guidelines for content in the repository’s initial stages.
Document Format:
Generally it is found that archive software
supports PostScript, PDF, ASCII, HTML, etc.
Ø
PostScript: PostScript (PS) is a page description
language used primarily in the electronic and desktop publishing areas. There
are a number of advantages to using PS as the display system. It helps in
printing the document and allows for the "dumbing down" of printers. But
the main advantage in using PostScript as a windowing system is that it allows
one to write desktop publishing (DTP) and other graphically-intensive
applications with a single set of graphics routines. The same code that is
drawing to the window can be used to draw to the printer without any
translation. DTP applications on traditional systems require the programmer to
construct the GUI editor in the platform's own graphics system (for example,
QuickDraw on the Macintosh, or GDI on Microsoft Windows) and then write
additional code to translate the graphics into proper PostScript for printing.
This often takes up the majority of the programming effort on such projects and
is a major source of bugs [PostScript, from Wikipedia,
the free encyclopedia, available at http://en.wikipedia.org/wiki/PostScript.HTML].
Ø
PDF : Portable
Document Format (PDF) is a file format developed by Adobe Systems for
representing documents in a manner that is independent of the original
application software, hardware, and operating system used to create those
documents. A PDF file can describe documents containing any combination of
text, graphics, and images in a device independent and resolution independent
format. These documents can be one page or thousands of pages, very simple or
extremely complex with a rich use of fonts, graphics, colour,
and images. PDF is an open standard, and anyone may write applications that can
read or write PDFs royalty-free.
In addition to encapsulating text and
graphics, PDF files are most appropriate for encoding the exact look of a
document in a device-independent way. Free readers for
many platforms are available for download from the Adobe website (www.adobe.com/products/acrobat/).
PDF is primarily the combination of three technologies: a
cut-down form of PostScript for generating the layout and graphics, a
font-embedding/replacement system to allow fonts to travel with the documents,
and a structured storage system to bundle these elements into a single file,
with data compression where appropriate [Portable Document Format, from Wikipedia, the free encyclopedia, available at http://en.wikipedia.org/wiki/PostScript.HTML].
Ø
ASCII: the term
stands for American Standard Code for Information Interchange. It is
independent of platforms and application software. Any piece of writing can be
stored in this format, but it has some limitations too. It cannot embed images,
graphics or links, and it cannot carry formatting to make the text look attractive.
Ø
HTML: stands for HyperText Markup Language. This is
an open standard used to create web pages. Its major advantage is
hyperlinks to connected pages. It is the most widely used web formatting
language and is easy to use. But the way a page is displayed varies between browsers.
All these formats can be used to
accept writings. XML or MS Word formats can also be
accepted. PDF should be preferred and encouraged. Otherwise, the administrator can
convert other formats to PDF. This should be clearly mentioned in
the guide to submission section.
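The format policy described above (accept several formats, prefer PDF, let the administrator convert the rest) can be expressed as a small check on submitted file names. This is only a sketch: the mapping of file extensions to formats and the function name `check_submission` are assumptions made for illustration, and the actual conversion to PDF is not shown.

```python
from pathlib import Path

# Assumed mapping of file extensions to the formats named in the text.
ACCEPTED_FORMATS = {
    ".ps": "PostScript",
    ".pdf": "PDF",
    ".txt": "ASCII",
    ".html": "HTML",
    ".htm": "HTML",
    ".xml": "XML",
    ".doc": "MS Word",
}
PREFERRED_EXTENSION = ".pdf"


def check_submission(filename):
    """Return (decision, detected_format) for a submitted file name."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ACCEPTED_FORMATS:
        return ("reject", None)
    if suffix == PREFERRED_EXTENSION:
        return ("accept", ACCEPTED_FORMATS[suffix])
    # Accepted, but the administrator should convert it to PDF.
    return ("convert_to_pdf", ACCEPTED_FORMATS[suffix])


for name in ("thesis.PDF", "paper.doc", "setup.exe"):
    print(name, check_submission(name))
```

Checking only the extension is a simplification; a real repository would also inspect the file content before accepting it.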
Subject
Headings:
Incorporation of subject headings is
the most crucial job. The recall and precision of
searches by subject depend largely on how well it is done. So, identifying a useful set of subject
headings is one of the major challenges for repository implementers. In
institutional repositories, various subjects will come up depending on the nature of the
publications. Moreover, a repository is going to archive research papers, which generally
deal with very micro thoughts, sometimes in interdisciplinary subjects.
So, no existing list of subject headings can exhaustively provide exact subject
headings for them. Broad subject headings may be appropriate for a single
institutional repository. In this case, LCSH can be used. However, as access to
institutional repositories becomes federated, this becomes more problematic [A
Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html]. A user cannot
profitably browse papers from a variety of repositories that use very different
subject terminologies to represent a single concept. So, when considering
worldwide accessibility and cross searching facilities, one has to think of
internationally acceptable and widely used subject headings for every subject. Therefore,
international conferences or discussions should be held on the matter.
Another way
to reach uniformity is to develop an open subject heading list that cumulates widely
used subject terms in standard forms at the international level. This should be
accompanied by an exhaustive vocabulary control device (e.g. a thesaurus)
containing all the local variations of each standard term. The software should
be compliant with that subject heading list and should include the thesaurus
within it. Whenever a query comes in a nonstandard form, the software should
convert it into the standard term and recall all the entries indexed under that
term. The software interface should also offer
the list of standard terms, so that one can select terms from that
list.
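The conversion of a nonstandard query term into its standard subject heading, as proposed above, can be sketched as a simple lookup. The thesaurus entries, the document identifiers and the function names below are invented for illustration; a real vocabulary control device would be far larger and would also handle broader and narrower terms.

```python
# Hypothetical thesaurus: standard heading -> local variants.
THESAURUS = {
    "Library science": {"librarianship", "library studies"},
    "Institutional repositories": {
        "e-print archives", "eprint repositories", "digital repositories",
    },
}

# Hypothetical subject index: standard heading -> archived documents.
SUBJECT_INDEX = {
    "Institutional repositories": ["doc-001", "doc-007"],
    "Library science": ["doc-003"],
}


def build_lookup(thesaurus):
    """Map every standard term and variant (case-folded) to its standard term."""
    lookup = {}
    for standard, variants in thesaurus.items():
        lookup[standard.casefold()] = standard
        for variant in variants:
            lookup[variant.casefold()] = standard
    return lookup


LOOKUP = build_lookup(THESAURUS)


def normalise(term):
    """Return the standard heading for a query term, or None if unknown."""
    cleaned = " ".join(term.split()).casefold()  # collapse spaces, ignore case
    return LOOKUP.get(cleaned)


def search_by_subject(query):
    """Recall all entries indexed under the standard form of the query."""
    standard = normalise(query)
    return SUBJECT_INDEX.get(standard, []) if standard else []


print(search_by_subject("E-Print  Archives"))
```

The keys of `THESAURUS` can also be shown to the user as the pick-list of standard terms.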
In India, the UGC may
take the initiative to prepare guidelines on subject heading lists
for institutional repositories. Only in this way can uniformity be brought among
the different institutional repositories spread over India. Another way
is to incorporate a set of descriptors that are helpful for both specialists in
the field and new learners of the subject [A Guide to Setting-Up an
Institutional Repository, available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html].
But in that situation there is a chance of high recall at the cost of precision. Until initiatives
are taken to form a freely available, popular list of subject headings in all
subjects, reflecting even micro thoughts, this seems to be the only
acceptable solution.
Some
software, like EPrints, loads its subject heading
hierarchy into the database, and it is a very laborious job to alter it after
some entries have been uploaded under it. So, before uploading starts, a good number of
documents should be collected together. This can help to avoid
repetition and corrections to the meta tags of
entries. So, the selection of the subject heading list is essential, and any modification
should be done in the starting phase. Moreover, before selection, some existing
archives should be checked for information on their choice of subject
heading lists. This can help a lot in developing policies.
Preparing Committees
Keeping the
above reasons in view, the committee structure should be chalked out. The committee may be structured in two
layers. The highest and most powerful body, the Executive Committee, should
consist of the VC of the University, selected members of the faculties and union
representatives of teachers, students and researchers. High level
administrative officials, finance officials, the legal officer and other selected
members of the administrative body should be included in this committee. The Chief
Librarian will represent the library staff there.
In the working
group also, some faculty members and legal officials have to be included.
But this committee will be headed by the Chief Librarian and assisted by
experienced library staff who will operate the
program. (Details are included in the committee structure recommended for KU.) [Crow, Raym. Institutional Repository: Checklist & Resource Guide. (Washington,
DC: SPARC). Available from http://www.arl.org/sparc/].
Securing
Faculty Participation & Administration Support
Institutional repositories offer considerable
benefits to the institutions that sponsor them and to the faculty, researchers,
students, librarians, and others that participate in them. At the same time,
institutional repositories might encounter resistance from administrators,
faculty, and others who fail to understand the benefits that such
repositories can deliver. Equally, understanding and
systematically addressing the objections raised to repositories will prove
crucial to faculty participation and to the ultimate success of each repository
implementation.
The perceptions and attitudes of university
administrators are critical to gaining the support necessary to validate a
repository’s standing within an institution. Even where a repository is
implemented and managed entirely as a library initiative, the nature and extent
of the efforts required to gain faculty awareness and participation in the
repository presuppose the buy-in of an institution’s administration and its
willingness to reallocate resources and/or provide additional funding. The rationale for universities and colleges
implementing institutional repositories rests on two interrelated propositions (SPARC): one that supports a
broad, future-oriented benefit and another that offers direct and immediate
benefits to each institution that implements a repository. Administrators
secure funds for any type of initiative. They decide whether to take up new
proposals for the advancement of the institution, and they can permit or deny them. They
have the power to implement rules. The highest body of the University
can modify rules in its court meeting according to requirements. So,
their role in building an institutional repository in a University is very
important. They may become interested in setting up an institutional repository in the
University if the library can convince them of its advantages. Some of the
issues may be:
Ø
Increasing costs of journals: Libraries subscribe to
different journals published on specific subjects. They want to provide
researchers with the latest developments in their areas of interest. But the major
barrier is the high cost of journal subscriptions. Both printed and online
journals are used today, which demands a very large amount of spending every year.
Thus, the high cost of journals forces libraries to restrict themselves to a very
short list of choices. Even big libraries cannot subscribe to every journal of any
specific discipline. Articles published in journals that are not
purchased remain unavailable to researchers. Thus they lose a very large number
of publications which may be relevant to them. If every institution sets up an
institutional repository and makes it OAI-PMH compliant, so that every
archive can be searched, then access to research output will be easier
and almost free. This will help to reduce libraries’ journal budgets. It can
also bring potential future cost savings as the marketplace responds to
institutional initiatives, in addition to the direct benefits, both tangible and
intangible, that a successful repository delivers to its host institution. It
can help institutions reach the corresponding industries, which may come to them for help
in R&D and recruit their students/scholars for troubleshooting. After
all, the administrators have to pay something if the institution is to retain
its high stature and reputation for innovation.
Governments and institutions fund research.
Publishers publish it in journals and sell them to make a profit. In most
cases, authors get no monetary benefit from the article. But
the library has to pay to purchase the journal (including
that article). Thus the same agency that funded the researcher's
work has to pay again for the same output in published form. Here publishers
make a profit just by publishing and distributing the work on demand. This
duplicate expenditure can be avoided if an institutional repository is set up.
The researcher can publish the output in any journal, but has to submit one
copy to the IR, which will be freely available to all,
and thus reduce the expenditure in the long run.
Ø
Ensuring barrier free access: As the IR will be OAI compliant, every person having an internet
connection can access it. It will be indexed by web crawlers and
will be accessible to everyone. Cross searching among different repositories
and different databases connected through worldwide registration will be
possible. Thus, everybody interested can access the resources archived in the IR. Moreover, the repository, as a
long-term investment in changing the structure of scholarly communication, helps to
change the current scholarly communication model and to weaken publisher
monopolies on faculty generated content. That can ensure barrier free access
for members of the institution, and in future it can reduce restrictions on
access to scholarly publications.
Ø
Institutional visibility and prestige: As producers
of primary research, it is only to be expected that academic institutions would
take an interest in capturing, disseminating, and preserving the intellectual
output of their faculty, students, and staff. Currently, much of each
institution’s intellectual output is diffused through thousands of scholarly
journals. While faculty publication in these journals reflects positively on
the host university, an institutional repository concentrates the intellectual
product created by a university’s researchers, making a clearer demonstration
of its scientific, educational, social, and economic value. This brings the
institution to the world. Universities having IRs
will be listed in the registration lists of repository software, alongside those
running in developed countries. This will make everyone aware of the existence,
productivity and relevance of the research work of different organizations. An
institutional repository and supporting metrics provide university
administrators with demonstrable evidence of the institution’s quality.
Institutional repositories help university and college administrators—including
Development and Marketing officers—reinforce an institution’s brand position
and prestige.
Ø
New platform of getting to the world: While
institutional repositories centralize, preserve, and make accessible an
institution’s intellectual capital, at the same time they will—ideally—form
part of a global system of distributed, interoperable repositories that
provides the foundation for a new disaggregated model of scholarly publishing.
Ø
Ultimate future of the publications: Experts say
that institutional repositories have a bright future. They are considered to be a
well known platform for archiving research output and making it accessible,
barrier free, to all interested for a long time. Institutional repositories will serve as the bricks
from which a bridge to a global knowledge base is built.
Administrators
can support IRs by:
Ø
Funding the setup of the institutional repository. This
includes the startup cost and continuing expenditure on internet services and
hardware, staff training, new recruitment (if necessary), digitization
of older theses/dissertations, advertising, organizing seminars etc.
Ø
Preparing new rules: this may be essential to gather
all the scholarly publications of the authors. For example, submission of one
e-copy of each article could be made mandatory for scholars to receive their next
allotted research fund. This will compel them to submit an e-copy of their writing to the
University’s institutional repository.
Ø
Implementing those rules for everyone concerned: students,
research scholars, teachers, other staff etc., and watching whether somebody is
avoiding submitting a copy.
Ø
Modifying rules according to needs: if it is found that
enough scope is left for anyone concerned to bypass the rules, then those rules
may need to be modified. For example, a student writes an article during his course
of study and publishes it in some journal, but is not willing to submit a
copy to the IR. Such situations can be avoided
with well thought rules, strong implementation of them, and very good user
education. There are different ways to
make people aware of the benefits of IRs.
Ø
Helping to overcome intellectual property issues: This
is a headache for a lot of people in this electronic era. Publication has become
easier over the internet. So, one can prepare a document on any
topic by copying from others without mentioning them in the references. This is
simply theft. It can be avoided by making people aware of what references
are and why they are to be added. But
the most important question lies elsewhere. An author writes and sends the work to
a publisher for publication. For publication, the author has to sign some
sort of declaration. Publishers generally have authors sign a statement that the
author has not submitted any other copy for publication elsewhere, and cannot
publish it somewhere else without the prior written permission of the publisher. This
may stop authors from submitting the same piece of writing to the IR. But the fact is, an IR is an archive of the article, not a publication.
Alternatively, to avoid unwanted situations, preprints may be
accepted. After review and publication in a journal, authors can modify
the preprints or add some more relevant information and update the database. They may also add some sort of addenda to
show the changes or corrections. Publishers cannot prevent this approach, and it
seems to make the output freely available. Some publishers allow submission of post
print articles under specific conditions. So, while selecting a publisher for
an article, one can check its policies. The library can compile a list of
publishers allowing post print submission.
v
Software administration policies: these involve
various aspects and policies, but largely depend upon the nature of the software
used. As IR software is open source and permits
customization, it can be adjusted to local requirements.
So local variations, and the related policy changes, are possible.
Ø
Author’s registration policy: every author has to
register by filling in a simple form sent by the software. This is an
authentic process of communicating with the person concerned. Here comes another
question: who can register? In the case of institutional repositories, it may be
restricted
·
to students, teachers, research scholars and
staff of that university;
·
may extend to old students or faculty members;
·
may include members of colleges affiliated to the
university;
·
or may permit any person within a geographical area
(say within the State) having a minimum specified qualification; etc.
For the first
three cases, the university registration number may be the credential asked for,
along with residential address and contact number. For the fourth case, an e-copy of any proven identity document (like an electoral
id card or passport), or a letter from the employer of that person proving
the authenticity of that person’s skills and qualification, or from where he got
his PhD etc., may be asked for. But this may seem to be a barrier to
submission, and it is recommended not to impose such barriers.
Ø
Submission policy: distributed submission and
centralized uploading. This means registered authors can submit their
copies, and the administrator will check the metadata incorporated, standardize subject
terms, and change the file format if required (authors should permit that for the sake of
technical ease and policies). Then the writing may be uploaded to the database
by the administrator. At the same time, the author should be informed
through e-mail.
Ø
Editing policy: Post prints of any type do not
require any editing. Dissertations, theses etc. are presented to and verified by a
well organized body of academicians. So, they too do not need to be edited.
But, if authors want to modify some portions as recommended during the presentation
and evaluation process, it could be done as an errata/additional chapter
attached separately to the document. As the IR is
proposed to incorporate preprints, editing becomes an issue for discussion. Any
preprint that has been accepted for publication, and is just a matter of time from coming
out, again need not be edited, because it has gone through a screening
process by some authorized body (in case it was submitted to some peer reviewed
journal). But when it has just been submitted, and has not gone through the review
process, it requires the comments of an editorial board.
·
Editorial board: The University should form an
editorial board consisting of senior teachers, research guides, academicians
having editing experience and subject specialists of the University. It may
include specialists from outside the institution. But the work will be
totally voluntary, and the interest of those concerned is highly expected.
·
As various
subjects are taught in a University, and research is a part of its
activities, the board will face many different types of writings in different
subjects. It is simply not possible to form a very large body of editors
comprising more than one subject specialist from every subject
field. So, a core editing committee of experienced editors and senior
professors/deans of the faculties is recommended. They will send
requests to others to help as guest editors when required. A list of
potential editors/subject specialists has to be prepared for that purpose.
·
Editing should be done by the core committee and at
least two different specialists, one from the University and another from
outside the University. This will make the process more acceptable, and avoid
any clash of views between an internal specialist and the author (as they might
be known to each other and their views may not match; two is better than
one). The committee may ask for some changes before uploading.
·
This is a lengthy process and a troublesome job, too.
It is important for keeping up the standard of materials in the database. But
another question arises at the same time: whether the work will be considered a
University publication or not, because it is edited by a body formed by the
University authority, which can recommend changes in the writing.
·
This whole process of editing can be avoided by
simply refusing to accept preprints that have not been accepted for publication by the
date of submission. But this will hinder the purpose of archiving. Moreover,
publication is a very lengthy and time consuming process. Delay in publication
may lead to duplication of work. Another way
of bypassing the troublesome process is to mark them as preprints. But
again, researchers would not rely upon their data, and may be misled if they are
published somewhere.
Therefore,
before selecting the types of materials to be accepted, the highest committee has to
decide editorial policies and prepare a clear management policy for archiving.
This should also include some advice/conditions for authors about updating their submissions
after publication. The committee also has to decide a weeding out policy. If a post print
or updated version is archived, the preprint can be removed. It can decide to
remove publications that are no longer valid, e.g. older rules and
regulations once newer ones have come into force.
Ø
Metadata incorporation: who will incorporate
metadata is an important question.
o
Authors can do it by simply filling in a standard
form available in the database, and administrators will check it. This can
be done once authors are aware of how to do it. But in the beginning,
library personnel can do it for authors and show them how to do it.
o
If enough library staff are available, the whole process
can be done by them. They just get the required information from authors and fill
in the meta tags. This will lead to more
authenticity, and can avoid the need for verification.
o
Library staff can also work for authors as a ‘proxy’ in situations where meta tags are to
be filled in by authors and they are unable to do so.
v
Standardized indexing: this is a process required to
make the database effectively searchable. A good index can enhance
the recall-precision ratio. These features come bundled with the software. They can
also be customized based on requirements. As a library and information
science student, I would not recommend free text indexing, as it will produce
high recall and remove the effectiveness of good subject headings.
v
Searching: Searching can be broadly divided into two
kinds: first, metadata searching; and second, full-text
searching. Metadata searching will be done by
service providers to facilitate cross-searching among all connected databases.
Full-text searching will be done by users. Both should support Boolean,
truncation, wildcard and any-term searching. Searching on meta tags that are not
indexed should also be facilitated.
v
Maintenance: regular creation and updating of backups,
mirror sites, indexing by widely used search engines
and directories such as Google and Yahoo, and listing in
scholarly search services such as Google Scholar
should be ensured.
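The search requirements listed above (Boolean, truncation, wildcard and any-term searching) can be illustrated with a minimal Python sketch. It is not how EPrints or DSpace implement searching; the function names and the record layout (a dictionary of metadata fields) are assumptions made for the example. A trailing `*` truncates a term and `?` stands for exactly one character.

```python
import re
from fnmatch import fnmatchcase


def term_matches(pattern, words):
    """True if any word matches the pattern ('*' truncates, '?' is one character)."""
    pattern = pattern.lower()
    return any(fnmatchcase(w, pattern) for w in words)


def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    hits = []
    for rec in records:
        # Look at every metadata field, so any term in any field can be found.
        words = set(re.findall(r"\w+", " ".join(rec.values()).lower()))
        if (all(term_matches(t, words) for t in all_of)
                and (not any_of or any(term_matches(t, words) for t in any_of))
                and not any(term_matches(t, words) for t in none_of)):
            hits.append(rec)
    return hits
```

For example, `search(records, all_of=["repositor*"], none_of=["thesis"])` would return records mentioning "repository" or "repositories" in any field, excluding those that mention "thesis".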
Advocacy Methods
Advocacy
methods may be divided into two areas:
A. Within the Institution
B. Outside the Institution
This advocacy
is not a one-time job. Libraries and institutions have to do it
continuously, every year, as part of their user education activities. This will
make every newcomer to the institution aware of the repository. In universities, the freshers'
welcome ceremony may work as a platform for informing new students about the
repository. Every departmental head
may inform students about it in the first address to a new batch. The library may
hand them a leaflet when they come for their user's card, and may include
discussion of IRs in its user education.
The library may
place a notice board above its computerized catalogue/catalogue cabinet, written
in attractive colours, describing how to use/subscribe to the IR.
The library
should take the initiative to help and guide authors in posting their writings to the e-print
repository in the beginning.
Authors write to share their
experience and knowledge on a particular issue or branch of knowledge, aiming to become
known among their peer group. They want to be regarded as resource persons in that
particular area of study. They want the opinions of their peers on their work.
That leads them to the hall of fame. Today, an author's success is measured
not only by the volume of work produced or the number of publications in peer-reviewed
journals, but also by the number of citations received. An author,
while writing a research paper, draws on a number of documents and finally
cites them with their bibliographic details. Citation
implies a relationship between a part or the whole of the cited document and a
part or the whole of the citing document. Thus a citation is the acknowledgement that one
document receives from another. [Bibliometric studies : on Indian library & information science
literature / Gayatri Mahapatra.
– New Delhi : Crest, 2000 p7] Therefore, one of the aims of authors is
to get more citations.
One of the primary conditions for getting more
citations is to reach almost every person interested in the topic over a long
period. If authors rely on the traditional print version only, limited
circulation prevents them from reaching a major portion of their peer group. Subscription-based
online publications also suffer from limited access. The rising
cost of serials, databases and online journals has created the ‘serials crisis’. [Callan (Paula). The development and implementation of a
university-wide self-archiving policy at Queensland University of Technology (QUT): Insights from the frontline. In
Institutional Repositories: The Next Stage. Workshop
presented by SPARC & SPARC EUROPE, November 18–19, 2004, Washington,
D.C.]. Even large
institutions cannot afford all the core journals of any specific subject, so
they have to cut their list of journals to fit their budget. This creates an
access barrier for scholars. Besides cost, limited circulation, restricted
access, conditions of use, retrieval problems, time restrictions, etc. are other
barriers to disseminating writings among peers. Moreover, such publications
cannot satisfy the needs of the future user community because of their limited archiving policies. Open
access journals are widely used, but they also have limited archiving policies.
It is found that, after a certain period, they provide only the contents pages of
back volumes instead of the full text. Thus some citations are
lost.
Again,
readers cannot access, or are not even aware of the existence of, many publications of
interest to them. The resulting repetition of work, loss of time, money and
energy, and waste of manpower and intellect slow down the development of society
and knowledge. Thus readers also suffer a lot. Libraries cannot gather all
publications, and so their services are also restricted to a very narrow
range.
Open
access journals have brought some fresh air into this restricted environment. But
different Open Access Journals (OAJs) have
their own policies, conditions and limitations, e.g. a limited
archiving period. So it is next to impossible to publish all the works of an
institution in any one OAJ. Moreover, they do not publish dissertations, theses,
etc. Some official decisions, important work guides (e.g. guidelines for Ph.D.), tutorials, etc. may also need to be archived and
made accessible to all.
Open
Access institutional e-print archives/repositories are the only solution in
this situation. Open Access institutional repositories can:
Ø
Collect all publications of the members of the
institution,
Ø
Organize/process them in a standard way,
Ø
Show the total output of the institution,
Ø
Make them accessible to all through the internet,
Ø
Provide 24x7, unconditional,
barrier-free access,
Ø
Let search engine spiders crawl them and
include them in their web indexes,
Ø
Be registered, so that users can search from
any part of the world,
Ø
Archive them for a longer period.
Thus, IRs ensure enhanced accessibility of publications. They
also help readers to find all relevant material together, and improve authors' chances
of getting more citations. Steve Lawrence [Lawrence (Steve).
Online or invisible. Available at :
http://www.neci.nec.com/~lawrence/papers/online-nature01/
] investigated the impact of free online availability by analyzing citation
rates. He observed that, in computer science, the more highly cited articles tend to be
those available online. He said that online articles may be more highly cited because
they are easier to access and thus more visible and more likely to be read. He
opined that free online availability facilitates access in multiple ways,
including online archives, direct connections between scientists or research
groups, hassle-free links from e-mails, discussion groups, and other services,
indexing by web search engines, and the creation of third-party search
services. Free online availability of scientific literature offers substantial
benefits to science and society. In IRs, all the work will be freely available to all
searchers. This will also enrich the ability of libraries and information centres to
find scholarly publications from all over the world and to produce tailored, personalized
services for each individual user.
In future, when all institutions
set up their own repositories and enable cross-searching among them, the entire
intellectual output of this planet will form a large,
exhaustive and exclusive bank of
knowledge and make access totally
barrier free.
Intellectual Property Rights
This is a
major concern about IRs. Generally, the following issues
arise:
·
Publisher's permission to submit a copy somewhere
else;
·
Copying or
duplication/theft of thought content.
Generally,
most publishers do not allow a copy to be submitted elsewhere. This poses a
problem for authors. Sometimes publishers demand a fee from authors to make
their work freely available.
Duplication
of thought content without citing the work is simply theft of the original work.
As institutional e-print repositories will be available to every person
worldwide, the scope for such acts, called plagiarism, will increase. Such acts
violate intellectual property rights. Only the awareness and honesty of
users can solve the problem. Users have to be made aware that they can
use data from those writings, but must cite them.
Research
institutions and national governments fund R&D activities. So they may
frame rules requiring that a copy be deposited in institutional repositories.
What
the essential quality of the writing to be archived should be is a major issue. It may be:
Preservation Policy
Preservation
is another important question in archiving. In the case of print media,
the ability to last over a long period is proven. Digital media, however,
have existed for only a short period and are not yet proven to be more reliable than
print media. Two factors arise when preserving in digital
media: longevity and technological support. Longevity has not yet been proven,
as the medium is new. File format is an important issue relating to
technological support. The tremendous growth
of technology brings a newer version of the same software almost every year. It is
feared that, after a decade or two, present formats may no longer be readable
for lack of technological support. Hardware peripherals may also
change to a large extent. Moreover, events like 9/11 and the tsunami have shown
that the belief that everything on the web will last forever is mistaken. So the issue of remote backup also
arises. Preservation through multiple copies in distant places is another
approach. Some think that a large-scale power failure, virus activity, hacking
and other intentional human activities may destroy the database.
But most of these fears concern
accidents or matters of chance. Even in the case of paper, nobody can
guarantee that it will last forever and provide access to all concerned. It is
found that, after a long time, paper works require special preservation techniques
and restricted, careful handling. This hinders one of the most important aspects
of preservation: accessibility. If anybody interested cannot use the document,
what is the utility of preserving it? Digital media are the better option in
this case. The library may decide to preserve a copy in digital media without
restricting its accessibility, because it is easy to copy and circulate
without affecting the original archive copy. Moreover, retrieval from archives is much easier than finding one printed article in a
heap of back volumes.
In institutional repositories, data
are stored in a software-independent format and migrated through successive
hardware regimes. Data are stored together with the hardware and software
required to create or use them. So it is found that data preservation is easier
in institutional repositories/archives than on individual digital media.
But there
should be a strong backup policy. Backups should be taken at fixed intervals on a
regular basis. They can rely on optical media, magnetic media, or a remote
hard disk connected through the network. There should be good-quality
antivirus software and firewalls to protect the data. To avoid problems
relating to power failure, a large inverter (power backup) should be in place. If data
are lost due to some unexpected situation, the server in-charge should try to restore
them from the preserved backup, or may consult specialists to recover the data if
the backup too is lost (in unexpected situations such as natural calamity).
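A regular backup of the kind recommended above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the `backup-` folder naming and the default of keeping seven copies are all assumptions. The sketch copies the archive directory to a timestamped folder (which could sit on a remote disk mounted over the network), verifies every copied file by checksum, and prunes the oldest copies.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Checksum of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source_dir, backup_root, keep=7):
    """Copy source_dir into a timestamped folder, verify it, prune old copies."""
    source_dir, backup_root = Path(source_dir), Path(backup_root)
    target = backup_root / datetime.now().strftime("backup-%Y%m%d-%H%M%S-%f")
    shutil.copytree(source_dir, target)
    # Verify every copied file against the original.
    for src in source_dir.rglob("*"):
        if src.is_file():
            copy = target / src.relative_to(source_dir)
            if sha256_of(src) != sha256_of(copy):
                raise IOError(f"backup verification failed for {src}")
    # Keep only the newest `keep` backups (keep >= 1); names sort chronologically.
    old = sorted(p for p in backup_root.glob("backup-*") if p.is_dir())[:-keep]
    for p in old:
        shutil.rmtree(p)
    return target
```

Such a routine would normally be run by a scheduler at the interval fixed in the backup policy, with the backup root kept on a different machine or site from the server.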
Faculty Workload
The work of
starting a repository is a vast job, and the workload needs to be distributed among
faculty and library professionals while dealing with the backlog of authors'
writings. Although the archiving software is designed for author
self-archiving, self-posting through the system requires several steps. Given
the significant disparity in technical proficiency among faculty, potential
contributors might not have the expertise, or the inclination, to deposit
materials themselves. Not surprisingly, then, early repository implementers
consider library mediation of content submissions to be the only practical
method of managing the archive, at least initially. This library management of the
document contribution process typically includes:
Raym Crow [SPARC] opines that one way to ease and encourage faculty and departmental
participation is to frame participation in such a manner that it addresses a problem
the faculty wishes to solve. By helping collect and host papers for a
university-sponsored conference, assuming responsibility for departmental
working paper series, or taking on digital production and archiving
responsibility for existing programs, repository implementers can lessen the
workload of faculty while actively encouraging their participation. At the same time, such projects will
have to be sensitive to the perceptions and apprehensions of the departmental
support staff currently responsible for them. The user community orientation
adopted by DSpace provides another alternative: each DSpace community designs a workflow process that
accommodates the needs of its faculty and staff. In this way, administrative
and technical responsibilities can be shared by the community’s resources,
coordinated with the library.
Development & Operational Costs
Expenses
cover labour (and the equivalent if some skill requirements are met via
out-sourcing), software, hardware, network, etc.
The technical
support costs of developing and operating an institutional repository will
depend on the service level agreement the repository has with the institution’s
technical support operations, and possibly, with third parties. Implementers of
EPrints software indicate that the staff time
required to install and configure the software is approximately four to five
FTE days. While other library staff can perform much of the policy-based
component of the repository, setting up the repository technical
infrastructure, even using a largely turn-key solution such as the EPrints software, requires the assistance of a technical
systems administrator. At KU, faculty and staff of the Computer Application Section
may take on this responsibility at the initial stage.
Software
costs will depend on a basic “build or buy” (or “borrow”) decision, which has
economic, strategic, and many practical considerations. At present, a number of
proven, dependable, flexible, low-cost software solutions are available.
EPrints and DSpace are proven to work
well for this purpose, and both are freely downloadable. They are open source
and can be customized. Their supporting software is also open source, so no
additional expenditure on software will be needed.
Hardware
costs depend on the performance, storage, and other attributes of the
configuration selected. EPrints can run on a basic hardware
configuration, although disk storage, server capacity, and perhaps other
specifications would need to be upgraded as the repository moved from a pilot
stage into public operation and heavy use. Hardware specifications for DSpace are not yet available. However, system hardware
costs for either system will vary with the fault tolerance that the repository
is willing to accept (for example, low downtime tolerance might require an
inventory of replacement drives, etc.), backup capabilities, and other
requirements. The cost of such services will typically depend on the existing
capabilities of such units and the extent to which the repository
implementation can achieve operating efficiencies with existing technical
operations. The same is true of networking, which should be a modest
incremental expense to the institution’s existing network.
On-going
technology labor costs, such as for system administration, are generally
allocated as an increment of existing human resources and programs. Initially,
non-technical staffing may also be handled via resource allocation, although
larger initiatives will need to commit to staffing long-term program management
positions.
Obviously,
proponents of the new institutional repository will need to present
a full budget and probably multi-year forecasts at some point
in their interaction with university and library administration.
The software
should support switching over to newer versions, or even to totally different
software, through a common data structure. But in-house customization may call
for expert help, which may require additional expenditure.
LIBRARY’S ROLE IN INSTITUTIONAL REPOSITORIES
In institutions
like universities, libraries always play an important role in providing
information services to everyone concerned: students, research scholars, staff,
research guides, teachers. So the library is the common place where everyone concerned
comes to fulfil their information needs. Libraries serve them
through their resources. In this era of electronic resources, their performance
has been enhanced through the use of such resources. They subscribe to e-journals
and let users search those databases. Day by day, the cost of
journals is rising, forcing libraries to trim their list of preferred
journals to stay within their budget. Thus, the scope for providing services also
decreases.
As an essential part of research
assistance, libraries can take on most of the responsibility for setting up and
maintaining the institutional e-print repository, and act as advocates of self-archiving.
Stevan Harnad [HARNAD (Stevan). For whom
the gate tolls?... Available at http://www.cogsi.soton.ac.uk/~harnad/
] has divided the work of setup into two waves: first, to set up the archives and
work as proxy for the author; and second, to maintain and popularize them.
The library could be the advocate for setting up the
institutional e-print repository. This will give library professionals another
opportunity to serve patrons by using IT, and will enhance their image among all
concerned. They can apply their collective, consortial power to maintain the
archives, handle day-to-day problems, arrange material properly and prepare proposals to
overcome difficulties. They can help authors to archive their writings at the first
stage; later, they may instruct authors how to do it themselves. They may also act as proxy
for authors who are very busy or elderly, in exchange for a minimal,
negotiable charge. This is purely a policy matter, and the executive body should
prepare clear instructions on it. But it is expected that, with
personalized, individual attention and easy forms and archiving steps,
the task will come to seem as easy as writing an e-mail.
Library staff can administer the
server. They have ample skills in the organization of knowledge. With their
professional skills, they can manage metadata matters
more easily. But at the same time, this demands some additional skills, such as:
ü
Maintaining the server,
ü
Troubleshooting errors that arise during data
handling,
ü
Maintaining uniformity in standards and selection of
metadata,
ü
Providing enhanced scope for searching within the
campus,
ü
Making the contents searchable by the spiders of search engines,
ü
Making them OAI-PMH compliant and enabling cross-search
facilities,
ü
Educating users on how to get maximum benefit from the
archive,
ü
Gathering information on publishers' policies and informing
authors about them, as well as about how to work around those problems and submit to the
archive,
ü
Taking the initiative to gather copies of authors'
earlier works and add them to the archive,
ü
Keeping an eye on the regularity of submissions
by authors, etc.
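To make the OAI-PMH item above concrete: a service provider harvests a compliant repository by sending HTTP requests that carry one of the six protocol verbs. The sketch below only builds such request URLs; the base URL shown in the usage note is a hypothetical placeholder, and the helper name is an assumption for illustration.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for a repository's base URL."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in arguments.items():
        # "from" is a Python keyword, so callers pass it as "from_".
        params[key.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"
```

For example, `oai_request_url("http://eprints.example.org/oai2", "ListRecords", metadataPrefix="oai_dc", from_="2005-01-01")` asks for all Dublin Core records added or changed since the given date.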
Besides professional knowledge, this
also requires some further skills. Library professionals have ample patience and
the ability to talk to every interested individual. They can make others understand
the utility of the archive. But maintenance of the server needs some
advanced IT skills. Here library staff
may feel insecure while handling a server. But proper training at short
intervals will help them gain confidence. They have to acquire knowledge
of installation, LAN operations, Internet connectivity, virus problems, access
control, backup techniques, and a working knowledge of programming and server
administration, etc.
Therefore,
it is recommended that more than one member of the library staff (at least one assistant
librarian) be trained at regular intervals in these matters.
The Librarian is proposed as a member of both the Executive Committee and the working group,
so that he can convince the Executive Committee of the need to train library
staff for this purpose instead of appointing a computer professional for the
work. Here, for K.U., working group members are proposed with an eye on the
troubleshooting factors relating to IT skills. Charge of server maintenance is
proposed for someone other than the Librarian because he is not only a highly experienced
professional but also acts as the administrative officer of the library, and is
already burdened with library operations. He will function as a high-level
supervisor, receive reports from the server in-charge, and carry them to the Executive Committee.
He will bring its decisions to the working group and supervise the implementation of policies.
He may advise the server in-charge in critical situations, but should not
take on the whole burden of maintaining the server.
Technical Aspects
Open Source software
Open source software
is software that includes source code and is usually available at no charge [Corrado, Edward M. Spring 2005. The
importance of open access, open source, and open standards for libraries. Issues
in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html
]. There are additional requirements besides the availability of source
code that a program must meet before it is considered open source including:
Libraries can
realize many advantages by using open source software. One of the most obvious
advantages is the initial cost. Open source software is generally available for
free (or at a minimal cost) and it is not necessary to purchase additional
licenses for every computer that the program is to be installed on or for every
person who is going to use the software. Open source software not only has a
lower acquisition cost than proprietary software, it often has lower
implementation and support costs as well.
It is easier
to evaluate open source software than proprietary software. Since open source
software is typically freely available to download, librarians and systems
administrators can install complete production-ready versions of software and
evaluate competing packages. This can be done not only without any license
fees, but also without having to stick to a vendor's trial period, evaluate a
limited version of the software, or deal with the vendor's sales personnel. If
the library likes an overall open source package but would like a few added
features they can add these features themselves. This is possible because the
source code is available. Even if a library does not have in-house expertise
they can benefit from source code availability because another library may be
able to provide them the fix or they can hire a consultant to make the changes
that they desire. It is to be noted that if a proprietary program "is
deficient in some way [the user] must wait until the vendor decides it is
financially viable to develop the enhancement -- an event that may never
occur." With open source software the user can develop the enhancement
themselves.
Open source
software allows for more support options. Proprietary software vendors often
package service with the product. This is particularly true of proprietary
library-specific software. When support from a vendor is inadequate it is an
additional expense to purchase another tier of support, assuming that it is
even available. Open source software allows for different vendors to compete
for support contracts based on quality of service and on price. Access to the
source code also allows for self-support when practical and desired.
The amount of
vendor lock-in is dramatically reduced with open source software. The large
initial costs often associated with proprietary software makes it difficult to
reevaluate the choice of software when it does not live up to expectations.
Proprietary software can lead to a single point of failure. If a vendor goes
out of business or decides not to support a program anymore there is often
nothing a user can do. Organizations using the
software could provide self support or other vendors can come in and fill the
void left by the previous vendor if the program were available as open source
software.
For the
present purpose, the library can use entirely open source software. Many
packages are available, but in practice it is found that EPrints and DSpace are the most
widely used and discussed for their functionality. So the discussion here is
confined to these two. EPrints and DSpace both run on the operating systems Linux (7.2 -9.3) or
Fedora Core (1-4). Both EPrints
(www.eprints.org/)
and DSpace (http://libraries.mit.edu/dspace-mit/technology/
) are open source software. Their supporting software can also be downloaded from
their respective links.
The Ability to Migrate and Survive
When considering a technical
implementation for an institutional repository, it is important to remember
that the explicit expectation is that the content managed by the system will
survive the system itself and can migrate as new technologies evolve. In any
event, switching costs from one repository technical solution to another would
typically be high. Also, switching systems and solutions can be quite risky.
Therefore, institutions will want to select their implementation path
carefully. Even though several of the solutions are open source, they still
involve database mapping and other customizations that would require additional
investment if the infrastructure were changed.
Therefore, the system must be
content-centric: applying standards and protocols that facilitate ongoing
access to the information itself must be central to the system’s conception.
The design and implementation of both the EPrints
software and the DSpace system have been based on
such standards. EPrints can export the archive
metadata in XML in a structured format that facilitates migrating to a
subsequent system. Both EPrints and DSpace are based on
open source software licensing principles.
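The idea of content outliving the system can be illustrated with a small export routine that writes a record's metadata as plain Dublin Core XML, a format any later system can read. This is a sketch only: it does not reproduce the actual export format of EPrints or DSpace, and the wrapper element name `record`, the function name and the sample values in the usage note are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record):
    """Serialise a metadata dictionary as simple Dublin Core XML."""
    root = ET.Element("record")  # hypothetical wrapper element
    for field, values in record.items():
        # A field may repeat (e.g. several creators), so accept lists too.
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_dublin_core({"title": "A sample thesis", "creator": ["Author, A.", "Author, B."], "date": "2005"})` yields one `dc:title`, two `dc:creator` and one `dc:date` element, independent of the database that originally held the record.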
EPrints and DSpace offer off-the-shelf systems that allow an
institution to implement a complete framework for an OAI-compliant repository
without resorting to in-house technical development. Both systems can be
customized to meet local requirements, allowing an institution to configure
metadata formats, design subject hierarchies, define acceptable file formats,
and register with OAI.
The Open Archives Forum provides the following
comparison of the characteristics and basic features of EPrints and DSpace (http://www.oaforum.org/resources/tvtoolscomp.php ):
Feature
|
Eprints
|
Dspace
|
Installation
|
Eprints is easy to
set up: An installation script automates most of the installation processes.
It is
possible to choose between a source or
binary installation. With the source installation the software has to be compiled by
the programmer. The binary one is precompiled for specific architectures such as
Solaris SPARC systems. The programmer only needs to
configure the software.
MySQL, Apache
and mod_perl, the components necessary
for implementation, install smoothly whether a source or
binary installation is chosen. Installing the additional required Perl modules needs more time, because their dependencies have to be resolved.
There are
two possibilities to support the system: One installation variant is a
Solaris environment. The second variant, Linux, is easier to maintain.
If any
installation problems arise, comprehensive support is available. GNU Eprints has a separate website containing documentation,
downloads, a demonstration server and mailing lists: http://software.eprints.org/
|
The installation of DSpace
requires a little more effort. But in fact DSpace
is easy to run and maintain for any experienced systems engineer.
To run DSpace, the
following software must be installed and configured
first: Java 1.3, Tomcat 4.0+, Apache 1.3, PostgreSQL
7.3+, Ant 1.5. Details of the requirements can be
viewed at: http://dspace.org/technology/system-docs/install.html#prerequisite
If the programmer follows the installation
documentation step by step, Java, Ant and PostgreSQL are easy to
install successfully.
To set up DSpace one needs to
compile the DSpace source code with the Java tool Ant. The Tomcat
server must be started by user "dspace"
and user "dspace" should then create a
database named "dspace".
During installation some common problems arose, e.g.
that Tomcat did not work once DSpace was
connected to it. Some changes in the configuration script solved that
problem.
There is no support service for DSpace
installation. But there is detailed system documentation at: http://dspace.org/technology/system-docs/index.html.
A public mailing list for installation questions is also available.
|
Programming
language
|
Perl
|
Java
|
Operating
system
|
Both
environment variants have been tested: Solaris and Linux.
Furthermore,
it is also possible to install Eprints2 on any computer running a
GNU/Linux or UNIX operating system.
|
DSpace has been tested on SuSE Linux
7.3.
In general DSpace can run on Solaris, Linux and Windows systems.
|
Functions
|
EPrints is free software which creates online archives.
It is
possible to store documents in any common format that the archive
administrator defined to be accepted. Each individual research paper/ eprint/ ... can be stored in more than one document
format.
The archive
can use any metadata schema; the administrator decides what metadata fields
are held about each eprint. This is specified in
three or four stages:
- Definition
of a maximal set of metadata fields that should be stored (e.g. authors,
title, journal, journal volume, etc.)
- Definition
of different types of eprints (e.g. refereed
journal article, thesis, technical report, unpublished preprint, etc.)
- Specification
for each type which metadata fields should be stored, and which of those
fields are mandatory.
- Decide
how these metadata fields should be projected into the Open Archives
world. (If necessary, interoperability can be switched off, but this is
strongly discouraged.)
More
functions can be viewed at http://software.eprints.org/
|
DSpace can be used for self archiving by
institutions and faculties. It provides long-term physical storage and
management of digital items in a repository.
DSpace is organised
into "Communities" and "Collections", each of which
retains its identity within the repository. It supports a variety of digital
formats and content types including text, images, audio, and video and allows
contributors to limit access to items in DSpace.
All these items can be organised by an
administration interface.
DSpace supports the OAI protocol 2.0 as a
data provider. This OAI support was implemented using OCLC's
OAICat open-source software to make DSpace item records available for harvesting.
Currently DSpace supports only
the Dublin Core metadata element set with a few qualifications conforming to
the library application profile. But there are plans under development to
support a subset of the IMS/SCORM element set (for describing education
material) in the coming year.
More details of DSpace
functionality can be found at http://libraries.mit.edu/dspace-mit/technology/functionality.pdf
|
Re-usage
|
Eprints is in use all over the world. As of August 2003
there were 72 archives worldwide running Eprints
software officially listed (http://software.eprints.org/).
|
It is not reported how many archives are running DSpace software. One example of a
European repository that has implemented DSpace is
"Erasmus University: Research Online".
|
Technology
|
Eprints uses traditional technologies and runs on pure
Open Source systems: MySQL is the world's most
popular open source database, recognized for its speed and reliability, and
Apache has been the most popular web server on the Internet since April
1996.
Eprints is programmed in the scripting language "Perl", which is low level but powerful.
|
DSpace operates with new technologies such as the Postgres database, which is more advanced than MySQL, and Tomcat for JSP/Java
web applications, which has higher performance than Eprints.
DSpace also supports and includes a handle server, which ensures
that each document has a unique and persistent URL.
Optionally,
DSpace can be protected by the security features
(SSL) of Tomcat. It is also possible to use the redirect function (port
number can be omitted) from Apache referring to Tomcat.
|
Interoperability
|
Eprints is freely distributable and subject to the GNU
General Public License. This means that its source code is open and freely
modifiable by any programmer who wishes to modify it (on condition that
modifications are all free and open).
Therefore, in principle, adaptation to any environment is possible, even one different from the recommended one. Naturally, this may involve substantial expenditure.
Although Eprints offers no supporting documents, there are mailing lists for support.
|
The DSpace system is freely available as open-source software. This allows an institution to make any necessary changes to the downloaded copy. The system was designed to make adaptations for individual organisations as easy as possible.
In fact, several modules in DSpace will probably be customised by organisations using this tool (e.g. it might be necessary to provide authorization and authentication for more than one person). Some organisations may also want to adopt a different environment from the one recommended (e.g. replacing PostgreSQL with MySQL or Oracle). At the moment, substituting a different relational database for PostgreSQL requires just a few changes to the system's Browse module.
DSpace provides documented Java APIs that can be enhanced to allow interoperation with other systems that an institution might be running (e.g. auto-depositing in DSpace from a department's web document system or the campus data warehouse).
|
Search
|
Eprints allows each of the metadata field types in the database to be searched through simple or advanced search. Any metadata field can be searched with fine granularity by querying the database with SQL.
|
DSpace offers two levels of text search: simple and advanced. Its submission process also allows a qualified version of the Dublin Core metadata schema to be used for the description of each item. These descriptions are stored in a relational database, which is used by the search engine to retrieve items.
|
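The two levels of search described in the comparison above can be illustrated with a minimal sketch. It assumes a tiny in-memory list of Dublin Core style records; the sample records and the function names simple_search and advanced_search are hypothetical and are not part of the Eprints or DSpace software.

```python
# Minimal sketch: simple vs. advanced search over Dublin Core style metadata.
# The records and function names are illustrative assumptions only.

RECORDS = [
    {"title": "Open access in Indian universities", "creator": "A. Author",
     "subject": "open access", "date": "2004"},
    {"title": "Metadata harvesting with OAI", "creator": "B. Writer",
     "subject": "metadata", "date": "2005"},
]


def simple_search(records, term):
    """Return records in which any metadata field contains the term."""
    term = term.lower()
    return [r for r in records
            if any(term in value.lower() for value in r.values())]


def advanced_search(records, **criteria):
    """Return records in which every named field contains its given term."""
    return [r for r in records
            if all(term.lower() in r.get(field, "").lower()
                   for field, term in criteria.items())]


if __name__ == "__main__":
    print(simple_search(RECORDS, "oai"))
    print(advanced_search(RECORDS, creator="writer", date="2005"))
```

A simple search looks at every field at once, while an advanced search narrows the result field by field; a real repository would run the same kind of query against its relational database.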
Open Standards
The term "open standard" means different things to different people. Three key characteristics of open standards [Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html ] are:
1) Anyone can use the standard to develop software,
2) Anyone can acquire the standard for free or without a significant cost, and
3) The standard has been developed in a way in which anyone can participate.
When a standard has the first two of these characteristics (the ability to use the standard and to obtain it without a significant cost), it can be said to be an open standard in a utility sense. That is to say, an open standard is a standard that is not encumbered by a patent, does not require proprietary software, and can be utilized by anyone without cost.
Proprietary standards can sometimes be expensive, and it may be cost-prohibitive to purchase access to a proprietary standard if it is ever needed. Many people consider a standard to be sufficiently open as long as it is open in a utility sense. Others take this a step further and consider a standard to be open only if it is also created and modified in an open process. Dublin Core is a completely open standard that is open both in utility and in process. All one has to do is show up and participate in order to contribute to the development of Dublin Core.
It is important for libraries and other cultural institutions to ensure long-term access to digital information. The rapid growth in digital technologies has led to new and improved applications for digital preservation. At the same time, however, it has also led to some problems. Two of these are obsolescence and dependency. The obsolescence problem is caused by advances in hardware and software that make many computers obsolete within a very few years. Dependency problems can arise if tools that are needed to communicate between systems or to read file formats become unavailable. To account for obsolescence and dependency problems, organizations must be able to migrate data into new systems. Data migration, however, cannot occur without access to data file formats.
Properly created open standards for file formats are less likely to become obsolete and are more reliable and stable than proprietary formats. In the event that an open standard file format does become obsolete, having access to the file format would allow anyone to easily, and legally, create a data conversion utility. File formats that use open standards can assist in long-term archiving because they allow for software and hardware independence. Open standards help alleviate issues caused by obsolescence or dependency problems, since files created in formats that adhere to open standards are more likely than proprietary formats to be readable twenty or fifty years from now. This allows for greater flexibility and easy migration to different systems in the future.
The use of open standards can help assure interoperability of diverse systems. Various software packages are being used to create digital libraries, online library catalogs, and other resources that libraries rely on. These systems need to be able to interact in order to provide the best possible service to patrons. The way to make certain that these diverse systems, and any future systems, can communicate with each other is by using open standards to help achieve the "free flow of information through interoperability" (The Open Group. 2005. Developer Declaration of Independence. Available: http://www.opengroup.org/declaration/declaration.htm ).
Some library-centric initiatives, including the Open Archives Initiative (OAI), also support open standards. OAI's mission is to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content. DCMES (the Dublin Core Metadata Element Set) is an open standard supported by OAI.
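To make the interoperability point concrete, the sketch below builds the kind of request URL that an OAI-PMH harvester sends to a data provider. The six verbs and the oai_dc metadata prefix are defined by OAI-PMH 2.0; the base URL http://repository.example.org/oai/request and the function name oai_request_url are assumptions made only for illustration, and no network request is made.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL; raise ValueError for an unknown verb."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    base = "http://repository.example.org/oai/request"  # hypothetical base URL
    print(oai_request_url(base, "Identify"))
    print(oai_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

Because every compliant archive answers the same small set of requests, one harvester can collect Dublin Core records from Eprints, DSpace or any other OAI-compliant system without knowing which software runs behind it.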
Hardware and Operating System:
From the above study, it is found that hardware requirements are not specified in either case. To start with, the following may be used:
1. Processor: Intel P4 2.8 GHz
2. Corresponding original Intel motherboard, monitor, etc.
3. Hard disk: SCSI 120 GB HDD with 7200 RPM speed
4. RAM: 1 GB DDR RAM
5. Multimedia: 52X CD writer/DVD writer
6. High-speed Internet connection
7. A tape drive may be used for data backup.
8. An audio output facility is also required in case of audio data backup.
Using more than one physical hard disk drive may minimise the loss of data in case of a crash.
Operating System: Red Hat Linux 9.3 or Fedora Core 4
Present India
India is a vast country that is moving fast towards developed status. The UGC (http://www.ugc.ac.in/new_initiatives/hisp.htm) says that India has a large and complex higher education system. It comprises nearly 310 universities and a large group of research and development organizations. Universities in India are set up by an Act of either Parliament or a State Legislature. In addition, some institutions are conferred deemed-to-be-University status by the Central Government. Universities are either unitary or affiliating. 131 universities in the country are of the affiliating type; together they affiliate around 15,500 colleges. Total student enrolment is around 92 lakh.
In addition, there are several professional councils that maintain standards in their respective fields. Some of these professional councils also maintain a Central Register of Professionals in their fields. In 1956, the Central Government set up the University Grants Commission (UGC) to discharge its constitutional mandate of coordination, determination, and maintenance of standards in higher education.
The higher education system in the country is a loose configuration of heterogeneous organizational units: universities, colleges, professional councils, etc. This diversity is a source of excellence and makes the system vibrant. Coordinating such a diverse system of education is tricky, yet necessary to ensure its credibility. The UGC therefore intends to create a Knowledge Repository for communities of teachers and researchers in the country.
The UGC is developing a mechanism for tracking academic information resources such as learning resources, curricula, question banks, national theses, etc., published in various formats, through a systematic, internationally used metadata framework for tagging such resources.
In such a situation, an institutional repository would be a very good, timely and logical effort for the University of Kalyani.
Proposed Line of Work
Proposed Administrative Committee Structure
Any undertaking needs a strong, well-thought-out policy if it is to be organized successfully. Setting up an institutional repository needs a well-organized decision-making body so that policies can be implemented properly. So, before other factors are considered, a committee should be formed to study the possibility, scope and coverage of the work, as well as to administer it. The committee should have at least two levels: an Executive committee and an Advisory committee.
The Executive committee should/may consist of at least:
Other members:
· One member from the Teachers’ council,
The Working Group may consist of:
Other
members:
Purpose of the Executive committee:
- To decide the scope of the archive,
- To discuss problems related to the serials crisis,
- To discuss budgetary constraints relating to serials purchase,
- To enhance the scope of access to scholarly publications,
- To discuss the motto of the archive,
- To discuss standards for archived materials,
- To discuss the existing rules and regulations of the University relating to such archiving,
- To take decisions on changing or modifying existing rules relating to the submission of papers, theses and dissertations,
- To implement the rules for everyone concerned,
- To discuss problems arising from the new rules,
- To ensure budget/funding for the work,
- To decide the level of work (as a project or on a large scale) and set deadlines for evaluating its progress,
- To ensure continuous advocacy,
- To ensure service standards,
- To assure contributors that their intellectual contributions will be safeguarded,
- To discuss problems raised by authors,
- To ensure worldwide accessibility,
- To ensure standardized services to all concerned,
- To evaluate the archive against existing ones in developed countries,
- To discuss the advantages and disadvantages of this archive compared with existing archives worldwide,
- To discuss legal issues,
- To ensure assistance to authors/contributors,
- To make policies relating to submission,
- To select staff for working on and administering the institutional repository server,
- To solve staff problems,
- To prepare policies relating to training facilities for staff development (if necessary),
- To meet at regular intervals to discuss the existing situation,
- To keep an eye on the development of archiving worldwide and develop its own archive to keep pace with it,
- To prepare future plans for the archive,
- To set goals and timelines for the work, etc.
The Working group will discuss:
- Software selection criteria,
- Existing hardware and hardware requirements,
- The condition of the campus LAN structure, recommending any necessary modifications,
- Server setup and maintenance issues,
- Metadata selection criteria,
- Technical problems related to archiving,
- Methods of archiving,
- Advocacy methods and their impact,
- Troubleshooting of software/network problems,
- Proposed modifications to the structure of the archives,
- Format/way of archiving,
- Impact of this repository on citations of scholarly publications,
- Processing,
- Minimum standard format of submitted materials,
- How to overcome IPR problems,
- Format of display,
- File format of submissions,
- Ease of understanding the policies,
- Ensuring that every staff member, research guide and scholar is aware of these rules.
This is a proposed structure and is subject to change as required. The Executive Council may take on these responsibilities instead of forming an Executive Committee for this matter. But formation of the working group is strongly recommended, as it will handle day-to-day activities and deal with problems on a regular basis.
The main activity of the Executive committee comes at the beginning, while policies are taking shape. After that, it may meet once or twice a year to discuss progress and suggest developments. The working group should meet at least once a month. This is important because it will be responsible for the repository's success or failure.
Proposed Workload Distribution in KU
The early repository implementers consider library mediation of content submissions to be the only practical method of managing the archive, at least initially. The workload needs to be distributed among faculty and library professionals while dealing with backlogs of authors' writings. This library management of the document contribution process typically includes:
Although the archiving software is associated with author self-archiving, self-posting through the system requires several steps. Given the significant disparity of technical proficiency amongst faculty, not every potential contributor can be expected to complete these steps unaided.
Raym Crow [SPARC] suggests that one way to ease and encourage faculty and departmental participation is to frame participation so that it addresses a problem the faculty wishes to solve.
In KU, students from the Department of LIS can be assigned project work as part of their academic curriculum: collecting and hosting papers for a university-sponsored conference, taking responsibility for a departmental working paper series, or taking on digital production and archiving of the various kinds of backlogged bibliographic material waiting to go into the archive. In this way repository implementers can lessen the workload of faculty while actively encouraging their participation.
The user community orientation adopted by DSpace provides another alternative: each DSpace community designs a workflow process that accommodates the needs of its faculty and staff. In this way, administrative and technical responsibilities can be shared among the community's resources, coordinated with the library.
Advocacy Methods
1. Within the Institution
· By distributing literature: Distributing literature that describes the advantages of IRs and how others can benefit from them may encourage writers to submit to institutional repositories.
· By distributing leaflets: These can reach every member of the institution and will enhance awareness as well as use of IRs.
· Through university magazines: These may work as a good platform for reaching every member of the institution.
· Through library newsletters: These will encourage library users.
· Through notices on each department's notice board: Sending a notice to each department's notice board will make teachers and staff aware of it.
· Through amending university rules: The University should introduce new regulations on the submission of articles to the institutional e-print repository, as researchers are funded by the university authority. Submission may be made an essential condition for receiving research funds. This is also applicable to research guides, staff and other students of the institution.
· Through user education in libraries: This is a regular process and will make each member aware of the advantages of IRs.
· By explaining in researchers' meetings: This gives scope for face-to-face discussion with researchers, to convince them that their works will be used more widely.
· By inspiring research guides/teachers to encourage students to submit e-prints to institutional repositories: Research guides can instruct scholars to submit an e-copy of their work to IRs. Teachers can encourage students to write on some topics and post a copy to IRs. Internal seminars also produce a lot of literature. Teachers can encourage students to publish it in a journal and send a copy to the institutional e-print repository. This will help students to achieve an identity among others.
· Through internal seminars, departmental meetings, etc.: These may be used to inform students, specialists and staff of the institution.
· Incorporating information in the library user's card, instruction sheet, etc.: This will make users conscious of the issue.
· Organizing special advocacy events for university staff: For example at annual meetings, debating competitions, the annual sports day, etc.
2. Outside the Institution
· Collaborating with other existing e-print archives,
· Registering in the OAI registration list,
· Sending information about the establishment of the e-print archive to discussion forums, newsgroups, specialist associations and research organizations,
· Posting newsletters in LIS forums to make other librarians aware of its existence,
· Presenting papers about the establishment of institutional repositories at different seminars,
· Displaying banners about its existence at regional/national/international seminars,
· Sending information about it to professional associations, etc.
This advocacy is not a one-time job. Libraries and institutions should do it continuously, every year, as part of their user education activities. This will make every newcomer to the institution aware of it. In universities, the freshers' welcome ceremony may work as a platform for informing new students about the repository. Every departmental head may inform students about it in the first address to a new batch. The library may hand them a leaflet when they come for their user's card. The library may also include discussions about IRs in its user education.
The library may place a notice board, written in attractive colors, above its computerized catalog or catalog cabinet, describing how to use and submit to IRs.
The library should take the initiative to help and guide writers in posting their writings to the e-print repository in the beginning.
Proposed Submission Process:
Distributed submission with centralized management is the recommended policy. At the start, to spare users from problems, the library may accept deposits on CD, convert the files to PDF and upload them after properly incorporating the metadata. In that case, the metadata of the articles should be collected in the same format that an author has to fill in when posting to the archive himself. The library may also act as a proxy for authors, for a nominal charge, in the case of older authors or those who face problems they cannot solve themselves. The charge should be very low, so that authors are not put off on hearing the amount.
Proposed Preservation Policy
With long-term preservation in mind, KU may follow the Open Archival Information System model (OAIS model, available at http://ssdoo.gsfc.nasa.gov/host/isoas/ ). Switchover policies are to be kept in mind.
There should be weeding-out policies. It is recommended that:
· If the postprint is deposited, the preprint should be removed.
· After a new version of rules and regulations has been issued and implemented, older versions can be removed.
· After a new edition comes out, old editions of books can also be removed.
· Theses and dissertations that have lost their relevance over a long period should be removed, but their metadata and abstracts must remain to record their existence.
Documents removed from the archive due to lack of space must be preserved somewhere else for historical purposes and service on demand.
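The weeding recommendations above can be read as simple rules. The following is only a sketch under stated assumptions: each item is a dictionary with the hypothetical keys work (an identifier shared by all versions of one work), kind and version, and the function name weed is invented for illustration. It covers the preprint and older-version rules; judging whether a thesis has lost its relevance is left to people.

```python
def weed(items):
    """Split items into (kept, removed) following the weeding rules.

    Assumed item keys: 'work' (identifier shared by all versions of a work),
    'kind' (e.g. 'preprint', 'postprint', 'regulation', 'book') and
    'version' (integer, higher means newer).
    """
    works_with_postprint = {i["work"] for i in items if i["kind"] == "postprint"}

    # Find the newest version of each (work, kind) pair.
    latest = {}
    for i in items:
        key = (i["work"], i["kind"])
        latest[key] = max(latest.get(key, 0), i["version"])

    kept, removed = [], []
    for i in items:
        superseded = i["version"] < latest[(i["work"], i["kind"])]
        replaced_preprint = (i["kind"] == "preprint"
                             and i["work"] in works_with_postprint)
        (removed if superseded or replaced_preprint else kept).append(i)
    return kept, removed


if __name__ == "__main__":
    sample = [
        {"work": "paper-1", "kind": "preprint", "version": 1},
        {"work": "paper-1", "kind": "postprint", "version": 1},
        {"work": "rules", "kind": "regulation", "version": 1},
        {"work": "rules", "kind": "regulation", "version": 2},
    ]
    kept, removed = weed(sample)
    print(len(kept), len(removed))
```

Items in the removed list would be moved to separate storage rather than deleted, in line with the recommendation that removed documents be preserved elsewhere.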
Metadata Selections
The archive has to be OAI compliant. DCMES (the Dublin Core Metadata Element Set) may be used for articles. In the case of theses and dissertations, UGC norms should be followed.
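As an illustration of what an OAI-compliant Dublin Core description looks like, the sketch below serialises the metadata of one article in the oai_dc format. The two namespace URIs and the fifteen element names are those of OAI-PMH and DCMES; the function name to_oai_dc and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set (DCMES).
DCMES_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
                  "contributor", "date", "type", "format", "identifier",
                  "source", "language", "relation", "coverage", "rights")


def to_oai_dc(metadata):
    """Serialise a dict of DCMES values (str or list of str) as oai_dc XML.

    Keys that are not DCMES elements are ignored.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for element in DCMES_ELEMENTS:
        values = metadata.get(element, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, element))
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({"title": "A sample e-print",
                     "creator": ["A. Author", "B. Writer"],
                     "date": "2005"}))
```

Every DCMES element is optional and repeatable, which is why the sketch accepts either a single value or a list for each element.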
Metadata Checking
Metadata entered by authors should be very carefully checked by the administrators. They can edit or change it, or even stop the upload at that stage. They have to notify the author by e-mail about errors in the metadata the author has entered.
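Part of this checking could be automated before an administrator looks at a deposit. The sketch below is a minimal example, assuming that title, creator and date are the required fields (KU would define its own list); the function names check_metadata and notification are hypothetical, and the message is only composed, not sent.

```python
# Assumed minimum set of required fields; a real policy would define its own.
REQUIRED_FIELDS = ("title", "creator", "date")


def check_metadata(metadata):
    """Return a list of human-readable problems found in a deposit's metadata."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(metadata.get(field, "")).strip():
            problems.append("missing " + field)
    date = str(metadata.get("date", "")).strip()
    if date and not (len(date) >= 4 and date[:4].isdigit()):
        problems.append("date should start with a four-digit year")
    return problems


def notification(author_email, problems):
    """Compose the plain-text notice an administrator could e-mail to the author."""
    lines = ["To: " + author_email,
             "Subject: Problems in the metadata of your deposit", ""]
    lines += ["- " + p for p in problems]
    return "\n".join(lines)


if __name__ == "__main__":
    found = check_metadata({"title": " ", "creator": "A. Author",
                            "date": "May 2005"})
    if found:
        print(notification("author@example.org", found))
```

An empty list from check_metadata means the deposit passed these basic checks; the administrator still reviews the content of each field by hand.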
Subject heading:
Many subjects are taught and discussed in KU. Researchers work on problems in different subjects as well as in interdisciplinary subjects. No single existing subject heading list can serve the purpose, and it is not possible for KU at present to collect special subject heading lists on every subject. So, until international open standards are formed and an international open-access subject heading list covering every micro-thought appears (or one that permits new standard term lists to be formed for interdisciplinary subjects), LCSH may be a handy choice.
Who can Participate
As this is an institutional repository, only members of the institution (students, teachers, staff, research scholars) may have access. There are some colleges under the jurisdiction of KU. As extended family members of KU, they will also have permission to submit here.
At present, very few repositories are at work. Many people write a good number of articles every year, though they are not directly connected to the University at present. I recommend permitting them to deposit in the archive. The University may ask for authentication of the author's qualifications and identity. It may even (though it should not) ask for a very small amount for each deposit.
Registration
Every user should register by filling in a very short form by e-mail. This will provide the user with an account for searching the archive.
Confirming Global Access
The archive itself should also register with open archive (http://www.openarchive.org/ ) or the eprints archive (http://www.eprints.org/ ). They will add it to their lists of archives. This will help metadata harvesters to harvest the archive and make it accessible. Links should also be submitted to search engine sites. DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp) is software that can translate OAI-compliant metadata into search-engine-friendly data.
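What a metadata harvester does with a registered archive can be sketched as follows. The example parses a simplified ListRecords response and extracts the identifier and title of each record. The sample response is hand-written for illustration (a real response also carries elements such as responseDate and request), the identifier oai:repository.example.org:1 is invented, and the function name harvest_titles is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used in OAI-PMH responses carrying Dublin Core metadata.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

# Simplified, hand-written sample of a ListRecords response.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def harvest_titles(response_xml):
    """Return (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(response_xml.strip())
    pairs = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


if __name__ == "__main__":
    for identifier, title in harvest_titles(SAMPLE_RESPONSE):
        print(identifier, title)
```

A service provider would store such pairs in its own index, which is how records from a registered archive become searchable outside the institution.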
Costs
A server is essential to start the archive. The hardware is not going to be very costly. The cost of manpower will be the largest, followed by advocacy costs. The software is freely downloadable and costs almost nothing. Backup and networking will also demand a good amount.
For Kalyani University, library professionals, with the help of computer experts from the Computer Application Department, may reduce labor costs. If rules are made requiring e-copies, preferably in PDF, to be submitted to the library/departments on CD, no conversion will be necessary in future. But retrospective conversion of backlogs will demand a great deal of money, time and manpower.
Handling Backlogs: a tricky proposal
If this conversion is assigned to students as project work (especially to students of Library and Information Science), and it is made mandatory for every LIS student to upload a certain number of back volumes of theses and dissertations, then over time the workload can be reduced to a great extent.
Use of Hardware and Software:
An up-to-date machine that supports a SCSI bus HDD is recommended. The configuration may be as follows (as a test bed):
1. Processor: latest Intel processor
2. Hard disk: SCSI 120 GB HDD with 7200 RPM speed
3. RAM: 2 GB DDR RAM
4. Multimedia: 52X CD writer/DVD writer
5. Corresponding original Intel motherboard
6. Color monitor
7. Internet keyboard
8. Optical mouse, etc.
9. High-speed Internet connection
10. Tape drive (optional)
(This is not a rigid configuration and can be customized as required.)
For software, I recommend open source software for both the OS and the archive. Red Hat Linux 9.3 or Fedora Core 4 may be used as the OS.
A brief comparison of the features and functionality of DSpace and EPrints is given at a glance. Either of them may be selected.
In practice, I tried both EPrints and DSpace. I faced a lot of problems with the EPrints installation: I could not establish a connection between EPrints and the MySQL server. In the end, I had to move to DSpace. To install it, I used a script available from DRTC and the pg73jdbc2.jar file.
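# Stop if the PostgreSQL init script is not installed.
if test ! -f /etc/rc.d/init.d/postgresql
then
    echo " postgreSQL not loaded, load from Linux CDs; Exiting ..."
    exit
fi
# Automatic detection of the JDBC driver directory is commented out in this
# excerpt, so $jdbcdir is not set by the lines shown here.
#if test -f /usr/share/java/*jdbc1.jar
#then jdbcdir="/usr/share/java"
# else
#if test -f /usr/share/pgsql/*jdbc1.jar
#then jdbcdir="/usr/share/pgsql"
# else
# echo "postgreSQL jdbc drivers are not loaded"
# echo "Use Linux CDs to load them"
# exit
#fi
#fi
echo "JDBC drivers are in $jdbcdir"
echo
# Ask for the mail server host name, showing this machine's name as a hint.
read -p "Enter your mail server hostname ($HOSTNAME): " mailhost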
· Then I extracted the .jar file under ‘/root/dspace/dspace-1.3.1-source/lib’.
· Then I ran the setup (./setup) from the dspace folder.
· It asked for some parity checking, then for the mail server host name. Here I entered my machine’s predefined IP.
· I skipped the password protection token to avoid complexity.
· Then it created a user with the default password “dspace”.
· Then it asked for the DNS server name. There I entered the machine’s pre-assigned name.
· It went through the different installation modules automatically, asked for the administrator’s e-mail id and password, and then processed them.
· At last it provided two links to open in a browser: one for using dspace as a user, and the other for working as administrator.
· The whole process completed within a very few minutes (2 min. and 9 seconds here).
· I repeated it on different types of machines (PII, PIII and P4, with different hard disk capacities, from 10 GB to 40 GB assigned to Red Hat Linux, and with 128 MB and 256 MB RAM). They all worked well and took less than 5 minutes.
· Access was tested and worked through the LAN. Due to lack of infrastructure, accessibility through the Internet could not be judged.
· Due to lack of time, its functionality and other activities could not be worked out. Clear instructions are available in the documentation bundled with it. Information on various aspects may also be collected from the DSpace System Documentation: http://libraries.mit.edu/dspace-it/technology/system-docs/
No. | References (unedited version) | Page Number
1 | Oppenheim, Charles. 2005. Open access and UK Science and Technology select committee report: free for all? Journal of Librarianship and Information Science. 37,1. p4 | 6
2 | Corrado, E M. 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html | 6
3 | http://www.earlham.edu/~peters/fos/timeline.htm | 7
4 | MICI Metadata Clearinghouse (Interactiv) (homepage). Available at http://www.metadatainformation.org/ | 8
5 | Metadata and Resource description. Available at http://www.w3c.org/metadata/ | 8
6 | Crow, Raym. SPARC institutional repository checklist & resource guide. Available at http://www.arl.org/sparc/IR/IR_Guide.html | 9
7 | at http://www.eprint.org/glossary/ | 10
8 | A brief comparison of different institutional repository software is available in ‘OSI Guide to Institutional Repository Software v2.0’ | 16
9 | OSI Guide to Institutional Repository Software v2.0 | 17
10 | OSI Guide to Institutional Repository Software v2.0 | 17
11 | http://www.dublincore.org/ | 18
12 | A Guide to Setting-Up an Institutional Repository. Available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html | 19
13 | HISP project. Available at http://www.ugc.ac.in/new_initiatives/hisp.html | 19
14 | Crow, Raym. 2002. Institutional Repository: checklist & resource guide. (Washington, DC: SPARC). Available from http://www.arl.org/sparc/ | 20
15 | PostScript, from Wikipedia, the free encyclopedia. Available at http://en.wikipedia.org/wiki/PostScript.HTML | 21
16 | Portable Document Format, from Wikipedia, the free encyclopedia. Available at http://en.wikipedia.org/wiki/PostScript.HTML | 22
17 | A Guide to Setting-Up an Institutional Repository. Available at http://www.carl-abrc.ca/projects/institutionalrepositories/setupguide-e.html | 23
18 | A Guide to Setting-Up an Institutional Repository. Available at http://www.carl-abrc.ca/projects/institutionalrepositories/setupguide-e.html | 24
19 | Crow, Raym. Institutional Repository: checklist & resource guide. (Washington, DC: SPARC). Available from http://www.arl.org/sparc/ | 24,25,39,55
20 | Harnad, Steven. For whom the gate tolls?... Available at http://www.cogsi.soton.ac.uk/~harnad/ | 32,42
21 | Mahapatra, Gayatri. Bibliometric studies: on Indian library & information science literature. New Delhi: Crest, 2000. p7 | 34
22 | Callan, Paula. The development and implementation of a university-wide self-archiving policy at Queensland University of Technology (QUT): Insights from the frontline. In Institutional Repositories: The Next Stage. Workshop presented by SPARC & SPARC EUROPE, November 18-19, 2004, Washington, D.C. | 34
23 | Lawrence, Steve. Online or invisible. Available at http://www.neci.nec.com/~lawrence/papers/online-nature01/ | 35
24 | Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html | 45,51
25 | The Open Group. 2005. Developer Declaration of Independence. Available: http://www.opengroup.org/declaration/declaration.htm | 52
26 | http://www.openarchive.org/ | 63
27 | http://www.eprints.org/ | 63
*******************************************************************************************
N.B. This version is not exactly what I submitted to the University for examination. It is a preprint version; the submitted copy was edited in one or two places after this version was copied. For personal reasons, I could not provide the exact copy I submitted there. This version lacks the TOC and some other specific areas.
***********************************************************************
*****Do not hesitate to mail me with any query.*****
Visit my sites at
http://chandansaha.tripod.com/
http://chandans-ejournal.tripod.com/
https://chandansezone.tripod.com/
or
My blog at http://chandans-ejournal.blogspot.com/