ISSUES INVOLVED IN SETTING UP AN
INSTITUTIONAL E-PRINT REPOSITORY WITH SPECIAL REFERENCE TO THE
Submitted by :
Course : M.L.I.Sc
Session : 2005
Examination Roll: 99/MLI No.050009
Department of Library and Information Science
GUIDE: MR. ARUP ROYCHOWDHURY
Scholarly publications are becoming costlier day by day, and no library or information center can afford to collect them all. Other factors also restrict access to scholarly publications. To overcome these barriers and make access easier and barrier free, e-print repositories are the need of the hour. Many issues affect the setup of an institutional e-print repository, but the most important of them is its policies. A well thought out policy can make the repository successful and acceptable to all concerned. Policies and issues relating to institutional repositories are discussed with special reference to KU, and recommendations for setting up an e-print repository are made. Other key factors that govern the setup of institutional repositories are also considered, and advocacy methods are discussed in brief. Problems relating to setup, such as legal issues, the role of the University or parent body, metadata issues, control over archives, selection criteria, administration, submissions and the impact on staff, are explained in brief. An attempt has been made to formulate a model guideline for
Researchers publish their work so that every interested person can learn about their findings. But publishers have created an access barrier by demanding high access tolls. There are other types of barriers as well. Information and communication technology has developed to a great extent. The Internet has become a popular medium of communication, and exchanging views among peer groups is very easy through it. A worldwide movement has begun to free research output from the grip of publishers and make access barrier free. The Internet can act as the communication medium for this purpose.
To free computing from the monopoly of commercial organizations, open source software came into play. This resulted in a variety of open source software, including open source operating systems. Such software and operating systems can be downloaded free of cost and, being open source, can be customized to requirements. This allowed advocates of the open access movement to set up e-print repositories where authors can archive an electronic copy of their research output for toll-free access. This gave rise to the concept of institutional e-print repositories.
Institutional e-print repositories are repositories set up by an institution to archive its research output. Various software packages are freely downloadable over the Internet and can be used to set up an institutional repository. But the most important part of a repository is its policies. Only a foolproof, well thought out policy can make an institutional e-print repository last in the long run. In this dissertation, these policies and the factors that can affect them are discussed on the basis of existing literature on open access archives and repositories. Technical and software related issues are also taken into consideration. An attempt has been made to prepare a brief guideline on policy issues for setting up an institutional repository for the
This work has been done in partial fulfillment of the M.L.I.Sc course. It is a very vast job and requires a long time to consider all factors. It is next to impossible to enumerate all aspects of policies and all related issues in a time-limited project. Even 45 days are not enough to consider all aspects of the technical issues, so I had to restrict myself to installation only. I could not find time to customize the software for the requirements of the KU repository. Much scope is left for further development relating to those issues and for finding problems and their solutions.
I have taken my dissertation on 'Issues involved in setting up an
Institutional E-print Repository with special references to the
want to express my cordial thanks to Dr. A.R.D. Prasad, Associate professor,
DRTC, ISI (
I want to thank my teachers of the MLISc course, Mr. Dibyendu Paul, Mr. Sabuj Dasgupta (former Head of the Dept.) and Mr. Bidhan Chandra Biswas (Head of the Dept.), for their kind cooperation during the course of study. I also want to thank Mr. Swapan Kumar Roy and the other staff of the department for their cooperation during the course of study.
I have to thank Mr. Mriganka Mondal, Assistant Librarian (Library in-charge), and Mr. Swapan Dasgupta of the University Internet center for kindly permitting me to use their personal information resources during the project. I want to use this opportunity to thank Mr. Joydip Chandra, our senior friend, and my other classmates, who encouraged me at different times during the course of study.
Name of the student
Examination Roll. 99/MLI No.050009
LIST OF CONTENTS
List of Abbreviations Used
Abbreviation   Full form
Archive        e-print archive (here)
DSpace         DSpace software
EPrints        EPrints software
HDD            Hard Disk Drive
H/W            Hardware
IRs            Institutional Repositories
OA             Open Access
OAI            Open Archives Initiative
OAI-PMH        Open Archives Initiative Protocol for Metadata Harvesting
OS             Operating System
RAM            Random Access Memory
ROM            Read Only Memory
DSpace: Free software for producing an archive of eprints. Provided by http://sourceforge.net/projects/dspace/
eprint: An electronically published research paper (or other literary item).
EPrints: Free software for producing an archive of eprints. Provided by www.eprints.org/
eprint archive: An online archive of preprints and post prints. It may or may not run on EPrints software.
OA: Open Access. Restriction-free access to documents for academic purposes (here, in electronic archives).
OAI: Open Archives Initiative. From their mission statement: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting. A way for an archive to share its metadata with harvesters, which offer searches across the data of many OAI-compliant archives.
OAI compliant: An archive which has correctly implemented the OAI protocol.
Post print: The digital text of an article that has been peer-reviewed and accepted for publication by a journal.
Preprint: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
Writing is a method devised by human beings to preserve their intellect and carry it to the next generation. From the ancient past to the present era of artificial computing, writing has been a proven way to spread one’s experiences and share knowledge with others. By writing about their work and experiences, authors also seek recognition. With the invention of new technology, spreading knowledge became easier, and journals began to be published for scholarly communication. Their aim was to disseminate research information and intellectual work and to share knowledge among peer groups. But many barriers stand in the way of this noble purpose: geographical distance, communication gaps, lack of access and lack of awareness of previous work are some of them. To overcome some of these barriers, publishers started commercial production and distribution of scholarly journals all over the world. This created another barrier: access tolls. Nowadays most scholarly publications are controlled by commercial organizations for profit. This restricts access to scholarly publications for the wider community, leading to duplication of work and waste of time, money and energy.
During the last decade of the 20th century, the Internet became a popular medium of communication and a new platform for publication. With the advent of information and communication technology and tremendous developments in computing, the Internet has become popular and affordable to most people, even in developing countries. At the same time, the world of publishing has undergone many changes. Purely paper-based publication is shifting towards electronic publication for extra advantages such as ease of retrieval and worldwide accessibility, and publishers have started offering electronic copies to patrons. Authors report their research output and findings in articles, and governments and other agencies fund the research. Authors write for recognition; generally they are not paid by publishers for writing. Publishers, however, make money from these articles. They control access to the publications and force scholars to pay for it. Libraries and information centers are responsible for providing information services to scholars. To do so, they have to collect articles and pay vendors very large sums as access tolls. The rising cost of journals forces libraries and information centers to cut their lists of preferred journals to stay within budget. This in turn creates barriers to accessing scholarly communication.
Open access is the only probable solution to this problem, and many renowned persons worldwide have spoken in its favor. Technology has made it possible to set up electronic document archives that follow internationally accepted open standards and use open source software. Results so far show that open access articles are cited more and read more, so the impact of open access articles in scholarly communication is increasing. Although the open access movement has started worldwide, it is still small in number and volume. Many researchers are still unaware of it, so they too suffer from access problems. Libraries have an important role in making the activity popular. Institutions such as universities and research organizations can also play a very important role. They can set up institutional repositories, preserve their research output and make it freely accessible worldwide, which in turn presents their activities to the world. These repositories can act as active components of a worldwide scholarly storage network and, in future, remove access barriers to scholarly communication to a large extent.
The most important part of starting an institutional repository is its policies. Well planned policies can prevent many unwanted problems during implementation. Moreover, policies should be made with an eye on the future of the repository. Taking advantage of the latest technology and open source software can reduce costs and make it easy to start quickly.
Scope and coverage of the work:
In this dissertation, an attempt has been made to discuss the policies and issues involved in setting up an institutional repository, with a brief look at its technological aspects. After that, a brief guideline has been prepared on policies for setting up an institutional repository at ‘The University of Kalyani’.
Prepare, “a guideline for setup an institutional
repository for ‘The
Literature available over the Internet on e-print repositories, institutional repositories, their issues and policies, and the experiences of practitioners has been studied, including some forum newsletters.
Relevance of the Study:
Institutional repositories are the need of the hour. The
Prepare a model guideline for setting up an institutional e-print repository for ‘The University of Kalyani’.
Model guidelines for setting up an institutional repository can be prepared by studying existing literature on the experiences of others.
Area of Study:
Open access, open source, e-print archives, institutional repositories, standards related to them (OAI-PMH), their legal issues, administrative structure and policies, technicalities, open archive software, requirements relating to hardware peripherals, etc. are studied. Then their scope of applications in the
Tools and Techniques of data collection:
Data were collected by searching the Internet with the Google search engine. Links from different institutional repositories and articles available in some open access e-journals were also used for data collection and software download. Personal resources were also used.
Open access means that electronic versions of scholarly materials are available free at the point of use to anyone who wants to read them. Open access basically calls for scholarly publications to be made freely available to libraries and end users. This can be done in two ways [Oppenheim, Charles. 2005. Open access and UK Science and Technology select committee report: free for all? Journal of Librarianship and Information Science. 37, 1. p4]:
Ø Publishing in an open access journal, or
Ø Depositing in an electronic repository that can be searched from remote locations without any access restrictions, and whose resources can be used for academic purposes free of cost.
In 1989, the first open access (i.e. no subscription price), fully peer-reviewed electronic journal, ‘Psycoloquy’, was launched. Stevan Harnad was the editor of the journal. At present, around o thousand open access journals are available on the web.
At present, Stevan Harnad is one of the leading advocates of open access e-print repositories. Repositories are a good alternative to open access e-journals. E. M. Corrado [Corrado, E. M. 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html ] said that J. Willinsky has identified nine aspects of open access as follows:
a) E-print archives (author’s self-archiving pre or post prints);
b) Unqualified (immediate and full open access versions of a journal);
c) Dual mode (both print subscription and open access version of a journal);
d) Delayed open access (open access is available after a certain period);
e) Author fee (authors pay a fee to support open access);
f) Partial open access (some articles of a journal are available via open access);
g) Per-capita (open access available to countries based on per-capita income);
h) Abstract (only abstracts and table of contents are available for open access);
i) Co-operational (institutional members support open access journals).
The advantages of open access are:
Ø A moral/ethical argument: it allows people all over the world to gain access at no cost.
Ø The argument that the article is seen by more people and therefore has a greater impact.
Ø It ensures long-term access to scholarly articles. Libraries and others can create local copies and repositories of such literature, and can ensure continued access through their repositories in the distant future.
Ø Its message is diffused more widely than by subscription-based journals.
It has been observed that articles that are freely available online are cited many times more than those that are not.
The open access movement is the worldwide effort to provide free online access to scientific and scholarly literature, especially peer-reviewed articles and their preprints (http://www.earlham.edu/~peters/fos/timeline.htm ). The concept was not very new, but the movement started in 1990. Stevan Harnad, a renowned professor of cognitive science and first editor of an open access journal, is its strongest advocate. The Los Alamos arXiv database, an archive of physics preprints and post prints, is the oldest such archive and has been running successfully for more than 10 years. At the dawn of the 21st century, the movement gained strength.
With the advent of the Internet and telecommunication technology, channels of communication among scholars worldwide have opened. The demand for access to all scholarly publications can therefore be met by establishing e-print archives. Papers archived by authors in their institutional archives, together with a cross-search facility among such archives, will give scholars access to the world of scholarly publications irrespective of their actual location. Institutional repositories will be interlinked to produce a global database of scholarly publications. To serve this purpose, different archiving software packages are available to build such databases. To build a database, the most important thing is
At present, several metadata schemes are in use. The most popular of them is ‘The Dublin Core Metadata Element Set (DCMES)’. Others include ‘The Visual Resources Association Core Categories (VRA Core)’ and ‘The Encoded Archival Description (EAD)’. DCMES is a simple set of descriptive data elements intended to be generally applicable to all types of resources. It is developed by the Dublin Core Metadata Initiative and includes qualifiers that extend its scope of application. But so far it is not sufficient on its own to describe all types of bibliographic material with all necessary fields, so it cannot serve every purpose of e-print archives and repositories. Local variations cannot be recommended, for the sake of international searching and interoperability.
For the repository to provide access to the broader research community, users outside the institution must be able to find and retrieve information from the repository. Therefore, systems must be able to support interoperability in order to provide access via multiple search engines and other discovery tools. An institution does not necessarily need to implement searching and indexing functionality to satisfy this demand [Crow, Raym. SPARC institutional repository checklist & resource guide. Available at ]. It could simply maintain and expose metadata, allowing other services to harvest and search the content. This simplicity lowers the barrier to repository operation for many institutions, as it only requires a file system to hold the content and the ability to create and share metadata with external systems.
Interoperability requires persistent naming, standardized metadata formats, and a metadata harvesting protocol. The metadata harvesting protocol allows third-party services to gather the metadata from distributed repositories and conduct searches against the assembled metadata to identify and ultimately retrieve documents. These mechanisms can be applied to any type of compliant e-print repository or digital library, creating a global network of digital research materials.
The Open Archives movement spawned the Open Archives Initiative (OAI), which was established to develop and promote interoperability solutions to facilitate the dissemination of content. The OAI is a collaborative effort to develop interoperability mechanisms that facilitate access to distributed digital content in the academic environment. The OAI provides the framework for facilitating the discovery of content in distributed repositories.
The OAI developed a set of interoperability standards called the OAI Protocol for Metadata Harvesting (OAI-PMH), which allows repositories to create metadata to describe content stored in the repository and make it available to others who wish to use it. The OAI-PMH supports the interoperability of digital repositories irrespective of type (institutional, discipline-specific, commercial, etc.) or content.
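As an illustration of how a harvester talks to an OAI-PMH repository, the sketch below builds the HTTP GET URLs for two requests. The six verbs and the `oai_dc` metadata prefix come from version 2.0 of the protocol; the base URL `http://repository.example.org/oai` and the helper name `build_oai_request` are assumptions made only for this example and do not refer to any real repository or library.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH version 2.0.
OAI_VERBS = {"Identify", "ListMetadataFormats", "ListSets",
             "ListIdentifiers", "ListRecords", "GetRecord"}


def build_oai_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical base URL, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

# Ask the repository to describe itself.
identify_url = build_oai_request(BASE_URL, "Identify")

# Ask for all records in unqualified Dublin Core.
records_url = build_oai_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

A service provider would fetch such URLs and read the XML responses; the repository itself only has to answer these requests, which is why the barrier to becoming a data provider is low.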
The OAI maintains a list of OAI-compliant repositories from which OAI Service Providers can harvest metadata. To participate in this process, a repository must register with the OAI, once the institution's repository infrastructure is in place. The OAI certifies that a repository is fully compliant by validating the repository's metadata using a program that issues periodic OAI queries. Once these checks are complete, the OAI confirms the registration with the repository and adds the repository to the list of data providers.
The OAI protocol requires that repositories offer the 15 metadata elements employed in unqualified Dublin Core metadata. However, the OAI protocol supports parallel metadata sets, allowing repositories to expose additional metadata specific to the repository's specific needs. Repositories that add domain-specific metadata sets to the Dublin Core should do so in consultation with other repositories to ensure a standardized presentation of these extended metadata sets.
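To make the 15 unqualified Dublin Core elements concrete, the following sketch builds one `oai_dc` metadata record with Python's standard `xml.etree.ElementTree` module. The namespace URIs are those used by OAI-PMH for `oai_dc`; the function name `make_oai_dc_record` and the sample e-print (title, creator, date) are invented for illustration and are not taken from any actual repository.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of unqualified Dublin Core.
DC_ELEMENTS = ("title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights")


def make_oai_dc_record(fields):
    """Build an oai_dc element from a dict of {element name: [values]}."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not an unqualified Dublin Core element: {name}")
        for value in values:  # every element is optional and repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


# Invented sample e-print, for illustration only.
record = make_oai_dc_record({
    "title": ["A sample preprint on institutional repositories"],
    "creator": ["Author, A.", "Author, B."],
    "date": ["2005"],
    "type": ["Preprint"],
    "format": ["application/pdf"],
})
record_xml = ET.tostring(record, encoding="unicode")
```

A repository that needs richer description (for example, degree level for theses) would expose an additional metadata format alongside `oai_dc` rather than inventing new elements inside it.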
Metadata harvesting means gathering metadata. Data providers create metadata for the e-prints they archive. Service providers then collect this metadata from many archives to build a large, combined, user-friendly search interface. But they can gather the metadata only if the archives are OAI-compliant. This whole process is popularly known as metadata harvesting.
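The harvesting step can be sketched as follows: a service provider receives an OAI-PMH `ListRecords` response and extracts the identifier and Dublin Core fields of each record. The XML below is a hand-written sample in the shape defined by the protocol (real responses also carry `responseDate` and `request` elements); its identifiers, titles and authors are invented, and `parse_list_records` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; all values are invented for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2005-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First sample e-print</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:2</identifier>
        <datestamp>2005-06-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second sample e-print</dc:title>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return a list of dicts with the identifier, titles and creators."""
    root = ET.fromstring(xml_text)
    harvested = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        # Deleted records have a header but no metadata block.
        titles = [e.text for e in dc.findall("dc:title", NS)] if dc is not None else []
        creators = [e.text for e in dc.findall("dc:creator", NS)] if dc is not None else []
        harvested.append({"identifier": identifier,
                          "titles": titles,
                          "creators": creators})
    return harvested


harvested = parse_list_records(SAMPLE_RESPONSE)
```

Because every compliant archive answers in this same shape, one parser is enough to harvest from any number of repositories.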
The OAI framework posits a publishing model that separates data providers (including institutional repositories) from service providers (metadata harvesters, search/retrieval, and other value-added access tools). Institutional repositories may serve both roles. Data providers expose metadata for harvesting. Service providers gather all that metadata together and build services on it, such as a search facility for users. The efficiency of a service provider thus also depends on the data providers, so it is the data providers’ responsibility to make their archives OAI-PMH compliant. Together, data providers and service providers play a crucial role in serving users.
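The division of labour between data providers and service providers can be pictured with a small in-memory model. This is only a sketch under simplifying assumptions: the class name `ServiceProvider`, the two repositories and their records are invented, harvesting is simulated by passing in lists of records rather than by OAI-PMH requests, and the search is a plain word match on titles.

```python
from collections import defaultdict


class ServiceProvider:
    """Toy service provider: pools harvested metadata and searches titles."""

    def __init__(self):
        self.records = []               # (repository name, record) pairs
        self.index = defaultdict(set)   # title word -> positions in self.records

    def harvest(self, repository_name, records):
        """Add the metadata exposed by one data provider to the pool."""
        for record in records:
            position = len(self.records)
            self.records.append((repository_name, record))
            for word in record["title"].lower().split():
                self.index[word].add(position)

    def search(self, word):
        """Return every pooled record whose title contains the word."""
        positions = sorted(self.index.get(word.lower(), ()))
        return [self.records[p] for p in positions]


# Two invented data providers, each exposing its own metadata.
repository_a = [{"title": "Metadata issues in e-print archives"}]
repository_b = [{"title": "Policy issues for repositories"},
                {"title": "Harvesting metadata at scale"}]

service = ServiceProvider()
service.harvest("Repository A", repository_a)
service.harvest("Repository B", repository_b)
hits = service.search("metadata")
```

Here a single search returns records from both repositories, which is the benefit users get only if each data provider exposes its metadata in the agreed form.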
The term eprint/e-print means different things to different people. The EPrints glossary at http://www.eprint.org/glossary defines an e-print as “An electronically published research paper (or other literary item).” E-prints are electronic copies of academic research papers. The Budapest Open Access Initiative FAQ says e-prints are the digital texts of peer-reviewed research articles, before and after refereeing. E-prints are divided into two categories:
Preprints: The digital text of a paper that has not yet been peer-reviewed and accepted for publication by a journal.
Post prints: The digital text of an article that has been peer-reviewed and accepted for publication by a journal. This includes the author's own final, revised, accepted digital draft; the publisher's edited, marked-up version, possibly in PDF; and any subsequent revised, corrected updates of the peer-reviewed final draft. The watershed separating preprints from post prints is whether they come before or after peer review and acceptance for publication.
An e-print archive is simply an online repository of research output, in either preprint or post print form. It is a collection of digital documents. Eprint.org defines ‘e-print archive’ as ‘an online archive of preprints and post prints. Possibly, but not necessarily, running on Eprints software’. Generally such archives are available free of cost over the web. OAI-compliant e-print archives share the same metadata format, making their contents interoperable with one another. Their metadata can then be harvested into global “virtual” archives that users can navigate seamlessly.
E-print archives may be institutionally located and administered, in which case they are usually called institutional e-print archives. Or they may be subject-specific archives physically located at a suitable site and commonly mirrored elsewhere. The content is open to access by all. They may be preprint-only archives, or contain both preprints and post prints.
We know that e-print = preprint + post print. Post prints are articles that have been published in a peer-reviewed journal, or accepted for publication. This means the write-up has gone through a screening and reviewing process, so its information content is considered authentic and accepted by a group of peers. Researchers may rely on it without hesitation.
Preprints that have not yet been published or accepted for publication in any peer-reviewed journal may be open to criticism regarding the authenticity of their content. As the content has not been discussed by reviewers and peers have not commented on it, researchers hesitate to use the data because it may raise questions about the authenticity and reliability of their own work. Thus preprints lose citations. To handle this problem, institutional repositories may follow some reviewing policy like
It may be subject specialists or teachers of that subject in that institution. Specialists and educationalists from other institutions may be involved, if possible. This may be totally voluntary service for the sake of knowledge enhancement.
Ø Impartiality of the reviewer (this question arises in the case of institutional repositories. A reviewer from the same institution will know each contributor personally, which may lead him to be softer or harsher towards some contributors. Involving specialists from other institutions, or review by more than one specialist, can reduce this risk.)
In subject-based archives, only documents dealing with the particular subject are archived. Their collection model is centralized, and they try to collect all the documents published on that subject. A good example is the Los Alamos arXiv database, a preprint and post print repository of articles covering various branches of physics.
Institutional repositories, by contrast, are set up to archive and provide access to the publications of institutional members. This is also a way of measuring the total productivity of the institution. Distributed, institution-based self-archiving benefits institutions in different ways [p.28…..]:
An institutional repository (IR) is a digital archive of an academic institution's intellectual output. Institutional repositories adhere to an open access model by centralizing and preserving the knowledge of an academic institution and making it accessible to anyone with Internet access. Setting up an IR is not a very tough job. The most important part is preparing a foolproof plan and then executing it. This involves the steps listed below.
First, we have to decide its purpose. Institutional repositories are not discipline-specific, and aim to archive the entire range of a university's intellectual output. So specific requirements have to be noted down. Some of them are:
Software should be selected on the basis of all the needs of the institution. At present, different software packages are available for this purpose, e.g. CDSWare, DSpace, EPrints, Fedora etc. Some priced, customized software is also available, distributed by vendors on specific conditions. But during the last decade, a lot of open source software has become available free of cost over the Internet. The supporting software (web server, programming language, compiler, database system etc.) is also available free of cost over the Internet, and most of it is open source too. Most web browsers also support them. [A brief comparison of different institutional repository software is available in ‘OSI Guide to Institutional Repository Software v2.0’.] These packages work very well, and many repositories and open access e-print archives have been running on them for a long time. As they are open source, one can customize them as required. So the choice of software and the corresponding operating system is not a trivial job.
Once the software and operating system are decided, the corresponding hardware requirements should be checked on their web sites. In general, no specific requirements are mentioned on the sites of the different software packages. But the speed and reliability of the archive depend on the quality of the hardware. So a recent configuration with a high-speed processor, a large amount of Random Access Memory (more than 1 GB) and a high-capacity SCSI hard disk drive with a high rotation speed (7200 R.P.M.) may ease its functioning. More than one physical hard disk drive is recommended for security purposes, so that the crash of one physical disk will not damage all the data. A good-speed modem is also essential.
Existing network infrastructure should be considered. The IR server requires a 24 x 7 connection with high Internet speed. The server should also be available over the LAN and the Storage Area Network (SAN) [OSI Guide to Institutional Repository Software v2.0].
Institutional repositories contain various types of bibliographic materials, like articles, dissertations, theses, research reports or even study materials. To make them searchable, institutional repositories must incorporate, index, and search items from diverse collections in diverse formats. They have to deal with writings of different levels (e.g. dissertations for Master's, M.Phil and PhD degrees). They have to deal with standard vocabularies from many different fields of study, and include metadata for all types of content. Unqualified Dublin Core (http://www.dublincore.org/ ) is the minimum metadata required for OAI interoperability [A Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/projects/institutional_repositories/setup_guide-e.html]; however, depending on the type of content, a repository may include other metadata sets.
OAI is based on the exchange of metadata. So, to make the archive effectively OAI compliant, incorporating the right metadata is essential. Most repository software is OAI compliant, so the Dublin Core metadata element set can be used in general. But it may not work well for some types of publications like research papers, theses etc. In
Materials that satisfy the above requirements might include working papers; conference presentations; monographs; course materials; annotated series of images; audio and video clips; published (or pre-published) peer-reviewed research papers; and supporting material for published or unpublished papers (for example, datasets, models, and simulations) etc. While repository content may thus be defined broadly, some repositories may elect to focus initially on text-based materials, even though they anticipate broadening coverage over time. Additionally, in the interest of encouraging participation and acquiring material to populate pilot and demonstration projects, some repositories may choose to adopt more relaxed (and possibly temporary) guidelines for content in the repository’s initial stages.
Ø PostScript: PostScript (PS) is a page description language used primarily in the electronic and desktop publishing areas. There are a number of advantages to using PS as the display system. It helps in printing the document and allows for the "dumbing down" of printers. But the main advantage in using PostScript as a windowing system is that it allows one to write desktop publishing (DTP) and other graphically-intensive applications with a single set of graphics routines. The same code that is drawing to the window can be used to draw to the printer without any translation. DTP applications on traditional systems require the programmer to construct the GUI editor in the platform's own graphics system (for example, QuickDraw on the Macintosh, or GDI on Microsoft Windows) and then write additional code to translate the graphics into proper PostScript for printing. This often takes up the majority of the programming effort on such projects and is a major source of bugs [Postscript from Wikipedia, the free encyclopedia available at http://en.wikipedia.org/wiki/PostScript.HTML ].
Ø PDF : Portable Document Format (PDF) is a file format developed by Adobe Systems for representing documents in a manner that is independent of the original application software, hardware, and operating system used to create those documents. A PDF file can describe documents containing any combination of text, graphics, and images in a device independent and resolution independent format. These documents can be one page or thousands of pages, very simple or extremely complex with a rich use of fonts, graphics, colour, and images. PDF is an open standard, and anyone may write applications that can read or write PDFs royalty-free.
In addition to encapsulating text and graphics, PDF files are most appropriate for encoding the exact look of a document in device-independent way.Free readers for many platforms are available for download from the Adobe website (www.adobe.com/products/ acrobat/ ).PDF is primarily the combination of three technologies: a cut-down form of PostScript for generating the layout and graphics, a font-embedding/replacement system to allow fonts to travel with the documents, and a structured storage system to bundle these elements into a single file, with data compression where appropriate. [Portable Document Format from Wikipedia, the free encyclopedia. available at http://en.wikipedia.org/wiki/PostScript.HTML]
Ø ASCII: The term stands for American Standard Code for Information Interchange. It is independent of platforms and application software. Any piece of writing can be stored in this format, but it has limitations too. It cannot embed images, graphics or links, and cannot be made to look attractive.
Ø HTML: Stands for HyperText Markup Language. It is an open standard used to create web pages. Its major advantage is hyperlinks to connected pages. It is the most widely used web formatting language and is easy to use, but the way a page is displayed varies between browsers.
All these formats can be used to accept writings. XML or MS Word formats can also be accepted. PDF should be preferred and encouraged; otherwise, the administrator can convert other formats to PDF. This should be clearly mentioned in the guide to submission section.
Incorporation of subject headings is the most crucial job, because the recall and precision of subject searches depend largely on how well it is done. Identifying a useful set of subject headings is therefore one of the major challenges for repository implementers. An institutional repository will receive documents on many subjects, depending on the nature of the publications. Moreover, it will archive research papers, which generally deal with very narrow topics and are sometimes interdisciplinary. No existing list of subject headings can therefore supply exact headings for all of them. Broad subject headings may be appropriate for a single institutional repository, and in that case LCSH can be used. However, as access to institutional repositories becomes federated, this becomes more problematic [A Guide to Setting-Up an Institutional Repository, available at http://www.carl-abrc.ca/projects/institutionalrepositories/setupguide-e.html]. A user cannot profitably browse papers from a variety of repositories that use very different subject terminologies for a single concept. So, for worldwide accessibility and cross-searching, one has to consider internationally accepted and widely used subject headings for each subject. International conferences or discussions should therefore be held on the matter.
Another way to reach uniformity is to develop an open subject heading list that cumulates widely used subject terms in standard forms at the international level. It should be accompanied by an exhaustive vocabulary control device (e.g. a thesaurus) containing all local variations of each standard term. The software should be compliant with that subject heading list and should include the thesaurus within it. Whenever a query arrives in a nonstandard form, the software should convert it into the standard term and recall all entries indexed under that term. The software interface should also offer a list of standard terms, so that one can select terms from that list.
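The vocabulary control described above can be sketched in a few lines. This is only an illustration under stated assumptions: the terms in THESAURUS and the record identifiers in INDEX are hypothetical and are not taken from LCSH or from any repository software; a real thesaurus would be far larger and maintained separately.

```python
# Minimal sketch of thesaurus-based query conversion.
# All terms and record identifiers below are hypothetical examples.

THESAURUS = {
    # standard term: local or nonstandard variants
    "institutional repositories": ["e-print archives", "eprint repositories"],
    "open access publishing": ["oa publishing", "free online access"],
}

INDEX = {
    # standard term: entries indexed under that term
    "institutional repositories": ["paper-001", "thesis-014"],
    "open access publishing": ["paper-007"],
}


def normalize(term):
    """Lower-case a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())


def build_lookup(thesaurus):
    """Map every standard term and every variant to its standard term."""
    lookup = {}
    for standard, variants in thesaurus.items():
        lookup[normalize(standard)] = standard
        for variant in variants:
            lookup[normalize(variant)] = standard
    return lookup


def search(query, lookup, index):
    """Convert the query to its standard term and recall the entries under it."""
    standard = lookup.get(normalize(query))
    if standard is None:
        return []  # term not covered by the thesaurus
    return index.get(standard, [])


LOOKUP = build_lookup(THESAURUS)
```

For example, `search("E-Print  Archives", LOOKUP, INDEX)` is resolved through the variant list to the standard heading and returns the entries indexed under it, while a term absent from the thesaurus returns an empty list.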
Software like EPrints loads its subject heading hierarchy into the database, and it is a very laborious job to alter it after some entries have been uploaded under it. So, before uploading starts, a good number of documents should be collected together. This can help to avoid repetition and corrections on
Keeping the above reasons in view, the committee structure should be chalked out. The committee may be structured in two layers. The highest and most powerful body, the Executive Committee, should consist of the VC of the university, selected members of the faculties, and union representatives of teachers, students and researchers. High-level administrative officials, finance officials, the legal officer and other selected members of the administrative body should also be included in this committee. The Chief Librarian will represent the library staff there.
group also, some faculty members and legal officials have to be incorporated. But this committee will be headed by the Chief Librarian and assisted by experienced library staff who will operate the program. (Details are included in the committee structure recommended for KU.) [Crow, Raym. Institutional Repository: checklist & resource Guide. (
Institutional repositories offer considerable benefits to the institutions that sponsor them and to the faculty, researchers, students, librarians, and others who participate in them. At the same time, institutional repositories might encounter resistance from administrators, faculty, and others who fail to understand the benefits that such repositories can deliver. Understanding and systematically addressing the objections raised to repositories will prove crucial to faculty participation and to the ultimate success of each repository implementation.
The perceptions and attitudes of university administrators are critical to gaining the support necessary to validate a repository’s standing within an institution. Even where a repository is implemented and managed entirely as a library initiative, the nature and extent of the efforts required to gain faculty awareness and participation presuppose the buy-in of the institution’s administration and its willingness to reallocate resources and/or provide additional funding. The rationale for universities and colleges implementing institutional repositories rests on two interrelated propositions (SPARC): one that supports a broad, future-oriented benefit, and another that offers direct and immediate benefits to each institution that implements a repository. Administrators secure funds for any type of initiative. They decide whether to take up new proposals for the advancement of the institution, and they can permit or deny them. They have the power to implement rules, and the highest body of the University can modify rules in its court meeting as required. Their role in building an institutional repository in a University is therefore very important. They may become interested in setting up an institutional repository if the library can convince them of its advantages. Some of the issues may be:
Ø Increasing costs of journals: Libraries subscribe to journals on specific subjects because they want to provide researchers with the latest developments in their areas of interest. The major barrier is the high cost of journal subscriptions. Both printed and online journals are used today, and together they demand a very large sum every year. High costs thus force libraries to restrict themselves to a very short list of titles. Even big libraries cannot subscribe to every journal in a given discipline. Articles published in journals that are not purchased remain unavailable to researchers, who thus lose a very large number of publications that may be relevant to them. If every institution sets up an institutional repository and makes it OAI-PMH compliant, so that every archive can be searched, access to research output will be easier and almost free. This will help to reduce libraries’ journal budgets. The case made to administrators can thus point to potential future cost savings as the marketplace responds to institutional initiatives, and adduce the direct benefits, both tangible and intangible, that a successful repository delivers to its host institution. A repository can also help institutions reach the corresponding industries, which may then come to them for help in R&D and recruit their students and scholars for troubleshooting. After all, the administrators have to pay something if the institution is to retain its high stature and reputation for innovation.
Governments and institutions fund research. Publishers publish the results in journals and sell them to make a profit. In most cases, authors get no monetary benefit from the article. But the library has to pay to purchase the journal that contains it. Thus the same agency that funded the research has to pay again for the same output in published form, while publishers profit from merely publishing and distributing it on demand. This duplicate expenditure can be avoided if an institutional repository is set up. The researcher can publish the output in any journal, but has to submit one copy to the IR, where it will be freely available to all, which reduces expenditure in the long run.
Ø Ensuring barrier free access: Since the IR will be OAI compliant, every person with an internet connection can access it. It will be indexed by web crawlers and will be accessible to everyone. Cross-searching among different repositories and databases connected through worldwide registration will be possible. Thus, everybody interested can access the resources archived in IRs. Moreover, the repository can be seen as a long-term investment in changing the structure of scholarly communication: it helps change the current scholarly communication model and weaken publisher monopolies on faculty-generated content. That can ensure barrier-free access for members of the institution and, in future, reduce restrictions on access to scholarly publications.
Ø Institutional visibility and prestige: As producers of primary research, academic institutions can be expected to take an interest in capturing, disseminating, and preserving the intellectual output of their faculty, students, and staff. Currently, much of each institution’s intellectual output is diffused through thousands of scholarly journals. While faculty publication in these journals reflects positively on the host university, an institutional repository concentrates the intellectual product created by a university’s researchers, making a clearer demonstration of its scientific, educational, social, and economic value. This brings the institution to the world. Universities that have IRs will be listed in the registration lists of repository software, alongside those running in developed countries. This will make everyone aware of the existence, productivity and relevance of the research work of different organizations. An institutional repository and supporting metrics provide university administrators with demonstrable evidence of the institution’s quality. Institutional repositories help university and college administrators, including Development and Marketing officers, reinforce an institution’s brand position and prestige.
Ø New platform of getting to the world: While institutional repositories centralize, preserve, and make accessible an institution’s intellectual capital, at the same time they will—ideally—form part of a global system of distributed, interoperable repositories that provides the foundation for a new disaggregated model of scholarly publishing.
Ø Ultimate future of the publications: Experts say that institutional repositories have a bright future. They are considered a well-known platform for archiving research output and keeping it accessible, barrier free, to all interested for a long time. Institutional repositories will serve as the bricks from which a bridge to a global knowledge base is built.
Ø Funding for setting up institutional repositories: This includes the start-up cost and continuing expenditure on internet services and hardware peripherals, staff training, new recruitment (if necessary), digitization of older theses/dissertations, advertising, organizing seminars, etc.
Ø Preparing new rules: This may be essential in order to gather all scholarly publications by the authors. For example, submission of one e-copy of each article could be made mandatory for scholars to receive their next allotted research fund. This will oblige them to submit one e-copy of their writing to the university’s institutional repository.
Ø Modifying rules according to needs: If it is found that the rules leave enough scope for anyone concerned to bypass them, they may need to be modified. For example, a student may write an article during his course of study and publish it in a journal, but be unwilling to submit a copy to the IR. Such situations can be avoided with well-thought-out rules, strong implementation of them, and very good user education. There are different ways to make people aware of the benefits of IRs.
Ø Helping to overcome intellectual property issues: This is a headache for a lot of people in this electronic era. Publication has become easier over the internet, so one can prepare a document on any topic by copying from others without mentioning them in the references. This is simply theft, and it can be avoided by making people aware of what references are and why they are to be added. But the most important question lies elsewhere. An author writes and sends the work to a publisher for publication, and for publication has to sign some sort of declaration. Publishers generally have authors sign a statement that they have not submitted any other copy for publication elsewhere and cannot publish the work elsewhere without the publisher’s prior written permission. This may stop authors from submitting the same piece of writing to IRs. But the fact is that an IR is an archive of the article, not a publication.
Otherwise, to avoid unwanted situations, preprints may be accepted. After review and publication in a journal, authors can modify the preprint, or add more relevant information, and update their entry in the database. They may also add addenda to show the changes or corrections made. Publishers cannot prevent this approach, and it makes the output freely available. Some publishers allow postprint articles to be submitted under specific conditions. So, while selecting a publisher for an article, one can check its policies. The library can compile a list of publishers that allow postprint submission.
v Software administration policies: These involve various aspects and policies, but depend largely on the nature of the software used. As IR software is open source and permits customization, it can be adjusted to local requirements. Local variations, and the related policy changes, are therefore possible.
Ø Author’s registration policy: Every author has to register by filling in a simple form sent by the software. This is an authentic way of communicating with the person concerned. Here another question arises: who can register? In the case of institutional repositories, it may be restricted
For the first three cases, the university registration number may be the identifier asked for, along with residential address and contact number. For the fourth case, an e-copy of any proven identity document may be asked for (such as an electoral ID card or passport, or a letter from the person’s employer attesting to that person’s skills and qualifications, or stating where he got his PhD, etc.). But this may prove to be a barrier to submission, and it is recommended not to impose such barriers.
Ø Submission policy: Distributed submission and centralized uploading. This means that registered authors can submit their copy, and the administrator will check the metadata supplied, standardize subject terms, and change the file format if required (authors should permit this for the sake of technical ease and policy). The administrator may then upload the writing to the database. The author should be informed by e-mail at the same time.
Ø Editing policy: Postprints of any type do not require editing. Dissertations, theses, etc. are presented to and verified by a well-organized body of academicians, so they do not need to be edited either. If authors want to modify portions as recommended during the presentation and evaluation process, this can be done as errata or an additional chapter attached separately to the document. As the IR is proposed to incorporate preprints, editing becomes an issue for discussion. A preprint that has been accepted for publication, and is only a matter of time from appearing, again need not be edited, because it has already gone through screening by an authorized body (in case it was submitted to a peer-reviewed journal). But a preprint that has only been submitted, and has not gone through the review process, requires the comments of an editorial board.
· Editorial board: The University should form an editorial board consisting of senior teachers, research guides, academicians with editing experience and subject specialists of the University. It may include specialists from outside the institution. But the work will be entirely voluntary, and the interest of those concerned is highly expected.
· As various subjects are taught in a University, and research is part of its activities, the repository will face many different types of writing in different subjects. It is simply not possible to form a very large body of editors with more than one subject specialist from every field. So it is recommended to form a core editing committee of experienced editors and senior professors/deans of the faculties. They will request others concerned to help as guest editors when required. A list of potential editors/subject specialists has to be prepared for that purpose.
· Editing should be done by the core committee and at least two different specialists, one from the University and another from outside it. This will make the process more acceptable and avoid any clash of views between a single specialist and the author (they might be known to each other, and their views may not match; two reviewers are better than one). The committee may ask for changes before uploading.
· This is a lengthy process and a tedious job, too. It is important for keeping up the standard of materials in the database. But another question arises at the same time: whether the result will be considered a University publication or not, because it is edited by a body formed by the university authority, which can recommend changes to the writing.
· This whole process of editing can be avoided by simply declining to accept preprints that have not been accepted for publication by the date of submission. But this will hinder the purpose of archiving. Moreover, publication is a very lengthy and time-consuming process, and delay in publication may lead to duplication of work. Another way of bypassing the tedious process is to label such items as preprints. But then again, researchers would not rely on the data in them, and could be misled if the work is published somewhere.
Therefore, before selecting the types of material to be accepted, the highest committee has to decide editorial policies and prepare a clear management policy for archiving. This should also include advice or conditions for authors about updating their submissions after publication. The committee also has to decide a weeding-out policy. If a postprint or updated version is archived, the preprint can be removed. The committee can also decide to remove publications that are no longer valid, e.g. older rules and regulations once newer ones come into force.
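The submission policy (distributed submission, centralized uploading) and the weeding-out rule for superseded preprints can be sketched as follows. This is a minimal illustration, not the behaviour of EPrints or DSpace: the Submission class, the REQUIRED_FIELDS list and the status names are assumptions made for the example, and matching a preprint to its postprint by title is a simplification.

```python
from dataclasses import dataclass, field

# Assumed minimum metadata an administrator checks before uploading.
REQUIRED_FIELDS = ("title", "author", "subject")


@dataclass
class Submission:
    metadata: dict
    file_format: str   # e.g. "pdf", "doc", "html"
    version: str       # "preprint" or "postprint"
    status: str = "submitted"
    notes: list = field(default_factory=list)


def review(sub, standard_subjects):
    """Administrator's check: metadata, subject terms, file format."""
    missing = [name for name in REQUIRED_FIELDS if not sub.metadata.get(name)]
    if missing:
        sub.status = "returned"
        sub.notes.append("missing metadata: " + ", ".join(missing))
        return sub
    # Replace a local subject term with the standard heading when one is known.
    key = sub.metadata["subject"].strip().lower()
    sub.metadata["subject"] = standard_subjects.get(key, sub.metadata["subject"])
    # Only flag the conversion here; converting the file is outside this sketch.
    if sub.file_format != "pdf":
        sub.notes.append(f"convert {sub.file_format} to pdf before upload")
    sub.status = "approved"
    sub.notes.append("notify author by e-mail")
    return sub


def weed_preprints(archive):
    """Drop preprints for which a postprint with the same title is archived."""
    postprint_titles = {s.metadata["title"] for s in archive if s.version == "postprint"}
    return [
        s for s in archive
        if not (s.version == "preprint" and s.metadata["title"] in postprint_titles)
    ]
```

A submission with incomplete metadata is returned to the author with a note, while a complete one is approved for upload with its subject term standardized. Whether superseded preprints are actually removed remains a decision for the committee.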
o Authors can do it by simply filling in a standard form available in the database, and administrators will check the entries. This can be done once authors become aware of how to do it. In the beginning, library personnel can do it for authors and show them how.
If enough library staff are available, the whole process can be done by them. They just get the required information from authors and fill
v Standardized indexing: This is the process required to make the database effectively searchable. A good index can enhance the recall-precision ratio. These features come bundled with the software, and can also be customized to local requirements. As a library and information science student, I would not recommend free-text indexing, as it produces high recall and removes the effectiveness of good subject headings.
Searching: Searching can be broadly divided into two
v Maintenance: regular backup creation, updating of backups, mirroring of sites, indexing through widely used search engines and directories (like Google and Yahoo), and listing in scholarly publication search services like Google Scholar, etc., should be ensured.
This advocacy is not a one-time job. Libraries and institutions have to carry it out continuously, every year, as part of their user education activities. This will make every newcomer to the institution aware of the repository. In universities, the freshers’ welcome ceremony may serve as a platform for informing new students about it. Every departmental head may tell students about it in the first address to a new batch. The library may hand them a leaflet when they come for their user’s card, and may include discussions about IRs in its user education.
Authors write to share their experience and knowledge on a particular issue or branch of knowledge, aiming to become known among their peer groups. They want to be regarded as human resources in that area of study, and they want their peers’ opinions on their work. That leads them to the hall of fame. Today, an author’s success is measured not only by the volume of work they produce or the number of publications in peer-reviewed journals, but also by the number of citations they receive. An author writing a research paper takes the help of a number of documents and finally cites them with their bibliographic details. Citation implies a relationship between a part or the whole of the cited document and a part or the whole of the citing document. Thus citation is the acknowledgement that one document receives from another. [Bibliometric studies : on Indian library & information science literature / Gayatri Mahapatra.
One of the primary conditions for getting more citations is to reach almost every person interested in the topic, over a long time. If authors rely on the traditional print version only, limited circulation prevents them from reaching a major portion of their peer groups. Subscription-based online publications also have a limited-access problem. The rising cost of serials, databases and online journals has created the ‘serials crisis’. [Callan(Paula).The development and implementation of a university-wide self-archiving policy at
Institutional Repositories: The Next Stage. Workshop
presented by SPARC & SPARC EUROPE,
Again, readers cannot access, or even be aware of the existence of, many publications of interest to them. The resulting repetition of work, loss of time, money and energy, and wastage of manpower and intellect slow down the development of society and knowledge. Thus readers also suffer a lot. Libraries cannot gather all publications, and so their services are also restricted within a very narrow lane.
Open access journals have brought some fresh air into this restricted environment. But different Open Access Journals (OAJs) have their own policies, conditions and limitations, e.g. a limited archiving period. So it is next to impossible to publish all the works of an institution in any one OAJ. Moreover, they do not publish dissertations, theses, etc. Some official decisions, important work guides (e.g. guidelines for Ph.D.), tutorials, etc. may also need to be archived and made accessible to all.
Thus, IRs ensure enhanced accessibility of publications. They also help readers to find all relevant material together, and they widen authors’ scope for getting more citations. Steve Lawrence [Lawrence(Steve). Online or invisible. Available at : http://www.neci.nec.com/~lawrence/papers/online-nature01/ ] investigated the impact of free online availability by analyzing citation rates. He observed that the more highly cited articles in computer science are more likely to be available online. He said that online articles may be more highly cited because they are easier to access and thus more visible and more likely to be read. He argued that free online availability facilitates access in multiple ways, including online archives, direct connections between scientists or research groups, hassle-free links from e-mails, discussion groups and other services, indexing by web search engines, and the creation of third-party search services. Free online availability of scientific literature offers substantial benefits to science and society. In IRs, all the work will be freely available to all searchers. This will in turn enrich the ability of libraries and information centers to find scholarly publications from all over the world and to provide tailored, personalized services to each individual user.
In future, when all institutions have set up their own repositories and enabled cross-searching among them, all the intellectual output of this planet will form a large, exhaustive and exclusive bank of knowledge and make access totally barrier free.
Duplicating thought content without citing the work is simply theft of the original work. As institutional e-print repositories become available to every person worldwide, the scope for such acts, called plagiarism, will increase. These violate intellectual property rights. Only the awareness and honesty of users can solve the problem. Users have to be made aware that they can use data from these writings, but that doing so requires a citation.
Preservation is another important question in archiving. Printed media have proven their ability to last over a long time. Digital media, however, have existed only for a short period and have not proven to be more reliable than print. Two factors arise when preserving in digital media: longevity and technological support. Longevity has not yet been proven, as the medium is new. File format is an important technological issue. The tremendous growth of technology brings a newer version of the same software almost every year. It is feared that, after a decade or two, present formats may no longer be readable for lack of technological support. Hardware peripherals may also change to a large extent. Moreover, events like 9/11 and the Tsunami have shown that the belief that everything on the web will last for ever is mistaken. So the issue of remote backup also arises. Preservation through multiple copies in distant places is another consideration. Some think that a large-scale power failure, virus activity, hacking or other intentional human activity may destroy the database.
But most of these fears concern accidents or matters of chance. Even in the case of paper media, nobody can be certain that a document will last for ever and remain accessible to all concerned. After a long time, paper works require special preservation techniques and restricted, careful handling. This hinders one of the most important aspects of preservation: accessibility. If those interested cannot use the document, what is the utility of preserving it? Digital media are the better option in this respect. A library may decide to preserve a document in digital form without restricting its accessibility, because it is easy to copy and circulate without affecting the original archive copy. Moreover, retrieval from archives is much easier than finding one printed article in a heap of back volumes.
In institutional repositories, data are stored in a software-independent format and migrated through successive hardware regimes. Data is stored together with the hardware and software required to make or use it. Data preservation is therefore found to be easier in institutional repositories/archives than on individual digital media.
But there should be a strong backup policy. Backups should be taken at a fixed interval on a regular basis. They can rely on optical media, magnetic media, or a remote hard disc connected through the network. There should be good-quality antivirus software and firewalls to protect the data. To avoid problems from power failure, a large inverter backup should be in place. If data is lost in some unexpected situation, the server in-charge should try to restore it from the preserved database, or consult specialists to recover the data if the backup is lost too (in unexpected situations like a natural calamity).
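A regular backup with a basic integrity check can be sketched as below. This is a minimal sketch using only the Python standard library; the function names, the dated-folder layout and the idea of backing up a single file are assumptions for illustration, and a real repository would follow the backup procedure recommended for its own software and database.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, backup_root, today=None):
    """Copy `source` into a folder named after the date and verify the copy."""
    today = today or date.today()
    target_dir = Path(backup_root) / today.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    shutil.copy2(source, target)
    # A backup that does not match the original is worse than none.
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


def restore_file(backup_path, destination):
    """Restore a file from a preserved backup copy."""
    shutil.copy2(backup_path, destination)
    return Path(destination)
```

Pointing `backup_root` at a network-mounted remote disc gives the remote backup mentioned above, and running `backup_file` from a scheduler provides the fixed interval.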
Starting a repository is a vast job, and the workload of dealing with the backlog of authors’ writings needs to be distributed among faculties and library professionals. Although the archiving software is associated with author self-archiving, self-posting through the system requires several steps. Given the significant disparity in technical proficiency amongst faculty, potential contributors might not have the expertise, or the inclination, to deposit materials themselves. Not surprisingly, then, early repository implementers consider library mediation of content submissions to be the only practical method of managing the archive, at least initially. This library management of the document contribution process typically includes:
Raym Crow opines [SPARC] that one way to ease and encourage faculty and departmental participation is to frame participation so that it addresses a problem the faculty wishes to solve. By helping collect and host papers for a university-sponsored conference, assuming responsibility for departmental working paper series, or taking on digital production and archiving responsibility for existing programs, repository implementers can lessen the workload of faculty while actively encouraging their participation. At the same time, such projects will have to be sensitive to the perceptions and apprehensions of the departmental support staff currently responsible for them. The user community orientation adopted by DSpace provides another alternative: each DSpace community designs a workflow process that accommodates the needs of its faculty and staff. In this way, administrative and technical responsibilities can be shared by the community’s resources, coordinated with the library.
The technical support costs of developing and operating an institutional repository will depend on the service level agreement the repository has with the institution’s technical support operations and, possibly, with third parties. Implementers of EPrints software indicate that the staff time required to install and configure the software is approximately four to five FTE days. While other library staff can perform much of the policy-based component of the repository, setting up the technical infrastructure requires the assistance of a technical systems administrator, even with a largely turn-key solution such as the EPrints software. At KU, faculty staff from the Computer Application Section may take on this responsibility at the initiation stage.
Software costs will depend on a basic “build or buy” (or “borrow”) decision, which has economic, strategic, and many practical considerations. At present, a number of proven, dependable, flexible, low-cost software solutions are available.
EPrints and DSpace have proven to work well for this purpose, and both are freely downloadable. They are open source and can be customized. Their supporting software is also open source, so no additional expenditure is needed on software.
Configuration selected. EPrints can run on a basic hardware configuration, although disk storage, server capacity, and perhaps other specifications would need to be upgraded as the repository moved from a pilot stage into public operation and heavy use. Hardware specifications for DSpace are not yet available. However, system hardware costs for either system will vary with the fault tolerance that the repository is willing to accept (for example, low downtime tolerance might require an inventory of replacement drives, etc.), backup capabilities, and other requirements. The cost of such services will typically depend on the existing capabilities of such units and the extent to which the repository implementation can achieve operating efficiencies with existing technical operations. The same is true of networking, which should be a modest incremental expense to the institution’s existing network.
On-going technology labor costs, such as for system administration, are generally allocated as an increment of existing human resources and programs. Initially, non-technical staffing may also be handled via resource allocation, although larger initiatives will need to commit to staffing long-term program management positions.
The software should support switching over to newer versions, or even to totally different software, through a common data structure. In-house customization, however, may require expert help, which may mean additional expenditure.
In institutions like universities, libraries always play an important role in providing information services to everyone concerned: students, research scholars, staff, research guides and teachers. The library is thus the common place where all of them come to fulfil their information needs. Libraries serve them through their resources, and in this era of electronic resources their performance has been enhanced. They subscribe to e-journals and let users search those databases. Day by day, the costs of journals are rising, forcing libraries to trim their lists of preferred journals to stay within budget. Thus, the scope for providing services also decreases.
As an essential part of research assistance, libraries can take most of the responsibility for setting up and maintaining an institutional e-print repository, and can act as advocates of self-archiving. Steven Harnad [HARNAD (Steven). For whom the gate tolls?... Available at http://www.cogsi.soton.ac.uk/~harnad/ ] has divided the work of setup into two waves: first, to set up the archives and work as proxy for the author; and second, to maintain and popularize them.
The library could be an advocate for setting up an institutional e-print repository. This will give library professionals another opportunity to serve patrons by using IT, and will enhance their image among everyone concerned. They can apply their collective, consortial power to maintain archives, handle day-to-day problems, arrange the archives properly and prepare proposals to overcome difficulties. They can help authors to archive their writings at the first stage and, in future, instruct them how to do it themselves. They may also act as proxy for authors who are very busy or elderly, in exchange for a minimal, negotiable charge. This is solely a policy matter, and the executive body should prepare clear instructions relating to it. It is expected that, with personalized, individual attention and easy form filling, archiving may come to seem as easy as writing an e-mail.
Library staff can administer the server. They have sufficient skills in the organization of knowledge. With their professional skills, they can manage
Besides professional knowledge, this also requires some further skills. Library professionals have the patience and ability to talk to every interested individual, and they can make others understand the utility of the archive. Maintenance of the server, however, needs some advanced IT skills, and library staff may feel insecure while handling a server. Proper training at short intervals will help them grow confident. They have to acquire knowledge of installation, LAN operations, Internet connectivity, virus problems, access control and backup techniques, along with a working knowledge of programming and server administration.
Therefore, it is recommended that more than one member of the library staff (at least one assistant librarian) be trained at regular intervals in these matters. The Librarian is proposed as a member of both the Executive Committee and the working group, so that he can convince the Executive Committee of the need to train library staff for this purpose instead of appointing a computer professional for the work. Here, for K.U., working group members are proposed with an eye on troubleshooting needs relating to IT skills. Charge of server maintenance is proposed for someone other than the Librarian because he is not only a highly experienced professional but also acts as administrative officer of the library, and is already burdened with library operations. He will function as a high-level supervisor, receive reports from the server in-charge and carry them to the Executive Committee. He will bring decisions to the working group and supervise the implementation of policies. He may advise the server in-charge in critical situations, but should not take on the whole burden of maintaining the server.
Open source software is software that includes source code and is usually available at no charge [Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html ]. Besides the availability of source code, there are additional requirements that a program must meet before it is considered open source.
Libraries can realize many advantages by using open source software. One of the most obvious advantages is the initial cost. Open source software is generally available for free (or at a minimal cost) and it is not necessary to purchase additional licenses for every computer that the program is to be installed on or for every person who is going to use the software. Open source software not only has a lower acquisition cost than proprietary software, it often has lower implementation and support costs as well.
It is easier to evaluate open source software than proprietary software. Since open source software is typically freely available to download, librarians and systems administrators can install complete production-ready versions of software and evaluate competing packages. This can be done not only without any license fees, but also without having to stick to a vendor's trial period, evaluate a limited version of the software, or deal with the vendor's sales personnel. If the library likes an open source package overall but would like a few added features, it can add these features itself, because the source code is available. Even a library without in-house expertise can benefit from source code availability, because another library may be able to provide the fix, or a consultant can be hired to make the desired changes. It is to be noted that if a proprietary program "is deficient in some way [the user] must wait until the vendor decides it is financially viable to develop the enhancement -- an event that may never occur." With open source software, users can develop the enhancement themselves.
Open source software allows for more support options. Proprietary software vendors often package service with the product. This is particularly true of proprietary library-specific software. When support from a vendor is inadequate it is an additional expense to purchase another tier of support, assuming that it is even available. Open source software allows for different vendors to compete for support contracts based on quality of service and on price. Access to the source code also allows for self-support when practical and desired.
The amount of vendor lock-in is dramatically reduced with open source software. The large initial costs often associated with proprietary software make it difficult to reevaluate the choice of software when it does not live up to expectations. Proprietary software can lead to a single point of failure: if a vendor goes out of business or decides not to support a program anymore, there is often nothing a user can do. If the program were available as open source software, organizations using it could provide self-support, or other vendors could come in and fill the void left by the previous vendor.
For the present purpose, the library can rely entirely on open source software. Many packages are available, but in practice EPrints and DSpace are the most widely used and discussed for their functionality, so the discussion here is confined to these two. Both EPrints and DSpace run on the Linux (7.2 -9.3) or Fedora Core (1-4) operating systems. EPrints (www.eprints.org/) and DSpace (http://libraries.mit.edu/dspace-mit/technology/ ) are themselves open source, and their supporting software can also be downloaded from their respective links.
When considering a technical implementation for an institutional repository, it is important to remember that the explicit expectation is that the content managed by the system will survive the system itself and can migrate as new technologies evolve. In any event, switching costs from one repository technical solution to another would typically be high. Also, switching systems and solutions can be quite risky. Therefore, institutions will want to select their implementation path carefully. Even though several of the solutions are open source, they still involve database mapping and other customizations that would require additional investment if the infrastructure were changed.
Therefore, the system must be content-centric: applying standards and protocols that facilitate ongoing access to the information itself must be central to the system’s conception. The design and implementation of both the EPrints software and the DSpace system have been based on such standards. EPrints can export the archive metadata in XML in a structured format that facilitates migrating to a subsequent system. Both EPrints and DSpace are based on open source software licensing principles.
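As a rough illustration of what a structured, standards-based metadata export can look like, the following Python sketch serializes one record as unqualified Dublin Core inside an oai_dc container. It is only a sketch under stated assumptions: the function name and the sample record are hypothetical, and the exact XML that EPrints itself exports is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespaces of unqualified Dublin Core and of the OAI Dublin Core container.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"


def record_to_oai_dc(record):
    """Serialize a dict of Dublin Core fields to an oai_dc XML string.

    `record` maps an element name (e.g. "title") to a list of values,
    because Dublin Core elements are repeatable.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, for illustration only.
sample = {
    "title": ["A sample e-print"],
    "creator": ["Author, A."],
    "date": ["2005"],
    "format": ["application/pdf"],
}
xml_text = record_to_oai_dc(sample)
```

Because the output uses only openly documented namespaces and plain XML, any later system that understands Dublin Core could read such a file without depending on the software that wrote it.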
EPrints and DSpace offer off-the-shelf systems that allow an institution to implement a complete framework for an OAI-compliant repository without resorting to in-house technical development. Both systems can be customized to meet local requirements, allowing an institution to configure metadata formats, design subject hierarchies, define acceptable file formats, and register with OAI.
It is possible to choose between a source and a binary installation. With a source installation, the software has to be compiled by the programmer. A binary installation is precompiled for specific architectures, such as Solaris SPARC systems, so the programmer only needs to configure the software.
MySQL, Apache and mod_perl, the components necessary for implementation, install smoothly whether a source or a binary installation is chosen. Installing the additional required Perl modules takes more time, because their dependencies have to be resolved.
If installation problems arise, comprehensive support is available. GNU EPrints has a separate website containing documentation, downloads, a demonstration server and mailing lists: http://software.eprints.org/
In order to run DSpace, the following software must be installed and configured beforehand: Java 1.3, Tomcat 4.0+, Apache 1.3, PostgreSQL 7.3+ and Ant 1.5. Details of the requirements can be viewed at: http://dspace.org/technology/system-docs/install.html#prerequisite
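Before attempting an installation, it may help to compare the versions found on the server with this list. The Python sketch below does so with plain tuple comparison. It assumes that every listed version is a minimum (the text marks only Tomcat and PostgreSQL with "+"), and the installed versions shown are hypothetical values typed in by hand, not detected from a real machine.

```python
# Versions from the prerequisite list, all treated here as minimums (an assumption).
MINIMUMS = {
    "Java": "1.3",
    "Tomcat": "4.0",
    "Apache": "1.3",
    "PostgreSQL": "7.3",
    "Ant": "1.5",
}


def parse_version(text):
    """Turn a dotted version string such as '7.3.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def missing_prerequisites(installed):
    """Return the names of components that are absent or older than required."""
    problems = []
    for name, minimum in MINIMUMS.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            problems.append(name)
    return problems


# Hypothetical inventory of one server, for illustration only.
installed = {"Java": "1.4.2", "Tomcat": "4.1", "Apache": "1.3", "PostgreSQL": "7.2.4"}
problems = missing_prerequisites(installed)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "7.10" sorting before "7.3".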
There is no support service for DSpace installation, but detailed system documentation is available at http://dspace.org/technology/system-docs/index.html, and a public mailing list for installation questions is maintained.
Documents can be stored in any common format that the archive administrator has defined as acceptable. Each individual research paper/ eprint/ ... can be stored in more than one document format.
DSpace is organised into "Communities" and "Collections", each of which retains its identity within the repository. It supports a variety of digital formats and content types including text, images, audio, and video and allows contributors to limit access to items in DSpace. All these items can be organised by an administration interface.
Currently DSpace supports only the Dublin Core metadata element set, with a few qualifications conforming to the library application profile. There are plans, still under development, to support a subset of the IMS/SCORM element set (for describing educational material) in the coming year.
EPrints uses traditional technologies and runs on purely open source systems: MySQL is the world's most popular open source database, recognized for its speed and reliability, and Apache has been the most popular web server on the Internet since April of 1996.
Eprints is freely distributable and subject to the GNU General Public License. This means that its source code is open and freely modifiable by any programmer who wishes to modify it (on condition that modifications are all free and open).
The DSpace system is freely available as open-source software. This makes it possible to make any necessary changes to the downloaded copy. The system was designed to make adaptation for individual organisations as easy as possible.
In fact, several modules in DSpace will probably be customised by organisations using this tool (e.g. it might be necessary to provide authorization and authentication for more than one person). Some organisations may also want to adopt a different environment from the recommended one (e.g. replacing PostgreSQL with MySQL or Oracle). At the moment, substituting a different relational database for PostgreSQL requires just a few changes to the system's Browse module.
DSpace provides documented Java APIs that can be enhanced to allow interoperation with other systems that an institution might be running (e.g. auto-depositing into DSpace from a department's web document system, or from the campus data warehouse).
DSpace offers two levels of text search: simple and advanced. Its submission process also allows a qualified version of the Dublin Core metadata schema to be used for the description of each item. These descriptions are stored in a relational database, which is used by the search engine to retrieve items.
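The idea of keeping qualified Dublin Core descriptions in a relational database and searching over them can be sketched in a few lines of Python with the standard sqlite3 module. This is not DSpace's actual schema or search engine: the table layout, the sample rows and the two function names are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database with one row per (item, element, qualifier, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata ("
    "item_id INTEGER, element TEXT, qualifier TEXT, value TEXT)"
)
rows = [
    (1, "title", None, "Open access and libraries"),
    (1, "contributor", "author", "Author, A."),
    (2, "title", None, "Metadata for e-print archives"),
    (2, "contributor", "author", "Author, B."),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", rows)


def simple_search(term):
    """Simple search: match the term against every metadata value."""
    cur = conn.execute(
        "SELECT DISTINCT item_id FROM metadata "
        "WHERE value LIKE ? ORDER BY item_id",
        (f"%{term}%",),
    )
    return [row[0] for row in cur]


def advanced_search(element, term):
    """Advanced search: match the term only within one Dublin Core element."""
    cur = conn.execute(
        "SELECT DISTINCT item_id FROM metadata "
        "WHERE element = ? AND value LIKE ? ORDER BY item_id",
        (element, f"%{term}%"),
    )
    return [row[0] for row in cur]
```

In this sketch, searching for "author" with `simple_search` matches both items through their contributor fields, while `advanced_search("title", "author")` matches none, which is the practical difference between the two levels of search.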
The term "open standard" means different things to different people. Three key characteristics [Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html ] of open standards are:
1) Anyone can use the standard.
2) Anyone can obtain the standard without a significant cost.
3) The standard has been developed in a way in which anyone can participate.
When a standard has the first two of these characteristics (the ability to use the standard and to obtain it without a significant cost) it can be said to be an open standard in a utility sense. That is to say, an open standard is a standard that is not encumbered by a patent, does not require proprietary software, and can be utilized by anyone without cost.
Proprietary standards can sometimes be expensive, and it may be cost-prohibitive to purchase access to a proprietary standard if it is ever needed. Many people consider a standard to be sufficiently open as long as it is open in a utility sense. Others take this a step further and consider a standard to be open only if it is also created and modified in an open process. Dublin Core is a completely open standard that is open both in utility and in process. All one has to do is show up and participate in order to contribute to the development of Dublin Core.
It is important for libraries and other cultural institutions to ensure long-term access to digital information. The rapid growth in digital technologies has led to new and improved applications for digital preservation. However, it has also led to some problems. Two of these are obsolescence and dependency. The obsolescence problem is caused by advances in hardware and software that make many computers obsolete within a very few years. Dependency problems can arise if tools that are needed to communicate between systems or to read file formats become unavailable. In order to account for obsolescence and dependency problems, organizations must be able to migrate data into new systems. Data migration, however, cannot occur without access to data file formats.
Properly created open standards for file formats are less likely to become obsolete and are more reliable and stable than proprietary formats. In the event that an open standard file format does become obsolete, having access to the file format would allow anyone to easily, and legally, create a data conversion utility. File formats that use open standards can assist in long-term archiving because they allow for software and hardware independence. Open standards help alleviate issues caused by obsolescence or dependency problems, since files created in formats that adhere to open standards are more likely than proprietary formats to be readable twenty or fifty years from now. This allows for greater flexibility and easy migration to different systems in the future.
The use of open standards can help assure interoperability of diverse systems. There are various software packages that are being used to create digital libraries, online library catalogs, and other resources that libraries rely on. These various systems need to be able to interact in order to provide the best possible service to patrons. The way to make certain that these diverse systems, and any future systems, can communicate with each other is by using open standards to help achieve the "free flow of information through interoperability" (The Open Group. 2005. Developer Declaration of
Some library-centric initiatives, including the Open Archives Initiative (OAI), also support open standards. OAI's mission is to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content. The Dublin Core Metadata Element Set (DCMES), itself an open standard, is supported by OAI.
The higher education system in the country is a loose configuration of heterogeneous organizational units: universities, colleges, professional councils, etc. This diversity is a source of excellence and makes the system vibrant. Coordinating such a diverse system of education is tricky, yet necessary to ensure its credibility. The aim, therefore, is to create a knowledge repository for communities of teachers and researchers in the country.
The UGC is developing a mechanism for tracking academic information resources such as learning resources, curricula, question banks, national theses, etc., published in various formats, through a systematic, internationally used metadata framework for tagging such resources.
Any undertaking needs a strong, well thought out policy to be organized successfully. Setting up an institutional repository needs a well-organized decision-making body so that policies can be implemented properly. So, before turning to other factors, a committee should be formed to study the feasibility, scope and coverage of the work, as well as to administer it. The committee should have at least two levels: an Executive committee and an Advisory committee.
This is a proposed structure, subject to change as per requirements. The Executive Council may take on these responsibilities instead of forming an Executive Committee on this matter. Formation of a working group is, however, strongly recommended, as it will handle day-to-day activities and deal with problems on a regular basis.
The main activity of the Executive committee comes at the beginning, while policies are taking shape. After that, it may meet once or twice a year to discuss progress and suggest developments. The working group should meet at least once a month. This is important because the working group will be responsible for the repository's success or failure.
The early repository implementers consider library mediation of content submissions to be the only practical method of managing the archive, at least initially. The workload needs to be distributed among faculty and library professionals while dealing with backlogs of authors' writings. This library management of the document contribution process typically includes:
Although the archiving software is associated with author self-archiving, self-posting through the system requires several steps. Given the significant disparity of technical proficiency amongst faculty, some potential contributors may need help with these steps.
In KU, students from the Department of LIS can be assigned project work as part of their academic curriculum: collecting and hosting papers for a university-sponsored conference, taking responsibility for a departmental working paper series, or taking on digital production and archiving of the backlog of bibliographic materials of various kinds waiting to go into the archive. In this way, repository implementers can lessen the workload of faculty while actively encouraging their participation.
The user community orientation adopted by DSpace provides another alternative: each DSpace community designs a workflow process that accommodates the needs of its faculty and staff. In this way, administrative and technical responsibilities can be shared by the community’s resources, coordinated with the library.
· By amending university rules: The University should frame new regulations on the submission of articles to the institutional e-print repository, as researchers are funded by the university authority. Submission may be made an essential condition for receiving research funds. The same applies to research guides, staff and other students of the institution.
· By inspiring research guides/teachers to encourage students to submit e-prints to institutional repositories: Research guides can instruct scholars to submit an e-copy of their work to IRs. Teachers can encourage students to write on particular topics and post a copy to IRs. Internal seminars also produce a lot of literature; teachers can encourage students to publish it in a journal and send a copy to the institutional e-print repository. This will help students to establish an identity among their peers.
This advocacy is not a one-time job. Libraries and institutions should carry it out continuously, every year, as part of their user education activities, so that every newcomer to the institution becomes aware of the repository. In universities, the freshers' welcome ceremony may serve as a platform for informing new students about the repository. Every departmental head may tell students about it in the first address to a new batch. The library may hand them a leaflet when they come for their user's card, and may include discussion of IRs in its user education.
Distributed submission with centralized management is the recommended policy. At the initial stage, to spare users from difficulties, the library may accept deposits on CD, convert the files to PDF and upload them after proper incorporation of metadata. In that case, the metadata of the articles should be collected in the same format that the author would fill in when posting to the archive himself. Working as proxy for authors may also be practised, for a nominal charge, for older people or those who face many problems and cannot solve them themselves. The charge should be very low, so that authors are not put off on hearing the amount.
Many subjects are taught and discussed in KU. Researchers work on problems in different subjects as well as in interdisciplinary areas. No existing subject heading list can fully serve the purpose, and it is not possible for KU at present to collect special subject heading lists for every subject. So, until international open standards are formed and an international open-access subject heading list covering every micro-thought appears (or new standard term lists for interdisciplinary subjects are permitted), LCSH may be a handy choice.
As this is an institutional repository, only members of the institution (students, teachers, staff, research scholars) may have access. There are some colleges under the jurisdiction of KU. As members of KU's extended family, they will also have permission to submit here.
At present, very few repositories are at work. Many people write a good number of articles every year though they are not directly connected to the University at present. I recommend permitting them to deposit in the archive. The university may ask for authentication of the author's qualifications and identity. It may even (though it should not) ask for a very small amount for each deposit.
Again, the archive itself should be registered with open archive (http://www.openarchive.org/ ) / eprints’ archive (http://www.eprints.org/ ), which will add it to their lists of archives. This will help metadata harvesters to harvest the archive and make it accessible. Links should also be submitted to search engine sites. DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp) is software that can translate OAI-compliant metadata into search-engine-friendly data.
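Harvesters reach a registered archive through ordinary HTTP requests defined by the OAI protocol for metadata harvesting, in which a `verb` parameter names the operation. The Python sketch below only builds two such request URLs and sends nothing over the network. The base URL is a made-up placeholder, not the address of any real KU archive, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL of the repository's OAI interface.
BASE_URL = "http://repository.example.org/oai"

# "Identify" asks the archive to describe itself.
identify_url = oai_request_url(BASE_URL, "Identify")

# "ListRecords" with metadataPrefix=oai_dc asks for records in unqualified Dublin Core.
list_records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

A registry or harvester would typically issue the first request to confirm that the archive responds, and the second to collect its metadata.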
A server is essential to start the archive. Hardware peripherals are not going to be very costly. The cost of manpower will be the highest, followed by advocacy costs. The software is freely downloadable and involves almost no cost. Backup and networking will also demand a considerable amount.
If this conversion is assigned to students as project work (especially students of Library and Information Science), and every LIS student is required to upload a certain number of back volumes of theses and dissertations, then over time the workload can be reduced to a great extent.
In practice, I tried both EPrints and DSpace. I faced a lot of problems with the EPrints installation: I could not establish a connection between EPrints and the MySQL server. In the end, I had to move to DSpace. I used a script available from DRTC and the pg73jdbc2.jar file to install it.
· I repeated it on different types of machines (PII, PIII and P4, with hard disk capacity from 10 GB to 40 GB assigned to Red Hat Linux, and with 128 MB and 256 MB RAM). The installation worked well on all of them and took less than 5 minutes.
· Due to lack of time, its functionality and other activities could not be worked out. Clear instructions are available in the documentation bundled with it. Information on various aspects may also be collected from the DSpace System Documentation: http://libraries.mit.edu/dspace-it/technology/system-docs/
Corrado, Edward M. Spring 2005. The importance of open access, open source, and open standards for libraries. Issues in Science and Technology Librarianship. Available at http://www.istl.org/05-spring/article2.html
Callan (Paula). The development and implementation of a university-wide self-archiving policy at Queensland University of Technology (QUT): Insights from the frontline. In Institutional Repositories: The Next Stage. Workshop presented by SPARC & SPARC EUROPE,
N.B. This version is not exactly what I submitted to the University for examination. It is a preprint version, which was edited in one or two places after this version was copied. For personal reasons, I could not provide the exact copy submitted there. This version lacks the table of contents and some other specific sections.