The Lin|gu|is|tik portal

The Lin|gu|is|tik portal is a research tool for general and comparative linguistics as well as for the linguistics of European and non-European languages. Here you can find various types of discipline-specific scholarly resources: conventional printed and digital secondary literature, online resources, and research data.

Furthermore, the Lin|gu|is|tik portal offers services in the areas of Linked Open Data, electronic publishing, acquisition of literature, and licensing of online resources.

The Lin|gu|is|tik portal is developed and maintained by the University Library Johann Christian Senckenberg (University Library Frankfurt) with financial support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation). It is the official web presence of the Fachinformationsdienst Linguistik (FID Linguistik, Specialised Information Service Linguistics).

The Lin|gu|is|tik portal as a virtual library

The Lin|gu|is|tik portal originates from the Special Subject Collection General Linguistics, Comparative Linguistics. From 1950 to 2015, the University Library Frankfurt hosted the Special Subject Collection and acquired international literature from the field as extensively as possible. The initial goal of the Lin|gu|is|tik portal was to enhance the visibility of the collected publications. It was conceptualised as a virtual library (Virtuelle Fachbibliothek) and developed with the support of the DFG.

The development of the virtual library started in 2012 as a collaborative project between the University Library Frankfurt, the Leibniz-Institut für Deutsche Sprache (IDS) Mannheim, and the LINSE Linguistik-Server of the University of Duisburg-Essen. Various modules were implemented during the first project phase (2012-2014). Directories of websites, research projects, and online dictionaries were created. Linguistically relevant electronic journals and online databases were manually selected from the EZB (Elektronische Zeitschriftenbibliothek, Electronic Journals Library) and DBIS (Datenbank-Infosystem, Database Information System) respectively, and indexed by object language and subject area. In addition, an index-based search function was implemented that combines the directories of online resources with several library catalogues, bibliographies, and repositories.

The development of the virtual library continued during the second project phase (2015-2017). The directories of online resources were selectively expanded, and twelve additional library catalogues and bibliographies were integrated into the virtual catalogue. Thanks to Prof. Chiarcos from the Applied Computational Linguistics Lab at Goethe University Frankfurt, who joined the project during that period, the Lin|gu|is|tik portal adopted Semantic Web technologies as a new focal point. Work began on interlinking the Lin|gu|is|tik portal with Linked Open Data (LOD), which made it possible to integrate linguistically relevant LOD resources into the search index.

The Lin|gu|is|tik portal and Linked Open Data

An increasing number of open, linguistically relevant resources has been created and published in accordance with the Linked Data principles. The Linguistic Linked Open Data (LLOD) cloud is designed and maintained by the Open Linguistics Working Group of the Open Knowledge Foundation. It comprises dictionaries, terminologies, knowledge bases, corpora, and typological databases. The LLOD cloud aims to establish interoperability between resources and thus to enable automatic information retrieval.

To connect the Lin|gu|is|tik portal with the LLOD cloud, the thesaurus of the Bibliography of Linguistic Literature (BLL Thesaurus) has been linked to a repository in the cloud. The BLL Thesaurus provides the thematic classification as well as the standardised vocabulary used for classification and indexing within the Lin|gu|is|tik portal. The Ontologies of Linguistic Annotations (OLiA) serve as the connecting point in the LLOD cloud. A central node in this modular repository is the OLiA Reference Model, which comprises terminology definitions and acts as a mediator between different annotation schemes.

To link it to the cloud, the BLL Thesaurus was modelled according to the LOD principles. The connection to the OLiA Reference Model was established by manually linking BLL subject terms to OLiA concepts. This laid the foundation for an LOD-based search function within the Lin|gu|is|tik portal.
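To make the idea of "modelling a thesaurus term as LOD and linking it to OLiA" concrete, the following minimal sketch serialises a single subject term as RDF in N-Triples syntax and attaches one link to an OLiA concept. It is an illustration only, not the portal's actual data model: the BLL namespace (`http://example.org/bll/`), the function name `term_to_ntriples`, the use of SKOS for the thesaurus term, and the choice of `skos:closeMatch` as the linking property are all assumptions.

```python
# Hypothetical sketch: one thesaurus term as RDF (N-Triples), linked to OLiA.
# The BLL namespace is a placeholder; SKOS and skos:closeMatch are assumed choices.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BLL_BASE = "http://example.org/bll/"  # hypothetical namespace, not the real BLL URIs


def term_to_ntriples(term_id: str, label: str, lang: str, olia_uri: str) -> list[str]:
    """Return N-Triples lines describing a term and its link to an OLiA concept."""
    subject = f"<{BLL_BASE}{term_id}>"
    # Escape backslashes and quotes so the label is a valid N-Triples literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .",
        f'{subject} <{SKOS}prefLabel> "{escaped}"@{lang} .',
        f"{subject} <{SKOS}closeMatch> <{olia_uri}> .",
    ]


if __name__ == "__main__":
    for line in term_to_ntriples("noun", "Noun", "en", "http://purl.org/olia/olia.owl#Noun"):
        print(line)
```

Because the link is just another triple, the manual linking step amounts to curating one such statement per BLL term and OLiA concept pair.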

First, the LLOD cloud was systematically searched for relevant data. During this search, all LLOD resources connected to the OLiA Reference Model were indexed. In the next step, these resources were examined for the presence of concepts corresponding to BLL terms. The results were then seamlessly integrated into the existing catalogue search, providing low-threshold access to the LLOD resources.
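The two-step procedure described above (first keep only resources connected to OLiA, then check which BLL terms they correspond to) can be sketched as a filter followed by a join over the manual BLL-to-OLiA links. This is a simplified illustration under stated assumptions: resources are reduced to the set of concept URIs they reference, the namespace prefix `http://purl.org/olia/` is assumed to identify OLiA concepts, and the function names and sample data are hypothetical.

```python
# Hypothetical sketch of the two-step LLOD search; names and data are illustrative.
OLIA_NS = "http://purl.org/olia/"  # assumed prefix covering the OLiA modules


def olia_connected(resources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Step 1: keep resources that reference at least one OLiA concept."""
    return {rid: uris for rid, uris in resources.items()
            if any(u.startswith(OLIA_NS) for u in uris)}


def match_bll_terms(resources: dict[str, set[str]],
                    bll_to_olia: dict[str, set[str]]) -> dict[str, list[str]]:
    """Step 2: map each OLiA-connected resource to the BLL terms it corresponds to."""
    index = {}
    for rid, uris in olia_connected(resources).items():
        # A BLL term matches if any of its linked OLiA concepts occurs in the resource.
        terms = sorted(t for t, olia in bll_to_olia.items() if olia & uris)
        if terms:
            index[rid] = terms
    return index


if __name__ == "__main__":
    resources = {
        "corpus-a": {"http://purl.org/olia/olia.owl#Noun", "http://example.org/other#X"},
        "lexicon-b": {"http://example.org/other#Y"},
    }
    bll_to_olia = {
        "Noun": {"http://purl.org/olia/olia.owl#Noun"},
        "Verb": {"http://purl.org/olia/olia.owl#Verb"},
    }
    print(match_bll_terms(resources, bll_to_olia))  # {'corpus-a': ['Noun']}
```

Only resources that pass both steps receive BLL terms, which is what allows them to be merged into the catalogue search alongside conventionally indexed records.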

At the end of the "Virtuelle Fachbibliothek" project, the resulting databases (for details, see BLL LOD edition) were published so that the LOD community can reuse them.

The Lin|gu|is|tik portal and FID Linguistik

Since 2017, the Lin|gu|is|tik portal has been developed further as part of the DFG programme "Fachinformationsdienste für die Wissenschaft" (Specialised Information Services). FID Linguistik is a project of the University Library Frankfurt in cooperation with the Applied Computational Linguistics Lab at Goethe University Frankfurt.

In close collaboration with the research community, the FIDs develop information services tailored to researchers' needs. The goal of FID Linguistik is to expand the existing infrastructure, enhance its functions and services, and thus establish a powerful information supply system for the linguistics community in Germany.

FID Linguistik (2017-2019)

During the first project period (2017-2019), the focus was on the areas of Open Access and Research Data. Various measures were taken to optimise the search for research data, increase its visibility, and improve the availability of research data subject to licence fees.

Work in the field of LOD continued: the interlinking of the Lin|gu|is|tik portal with the LLOD cloud was expanded, and the LOD search was extended and improved. First, the BLL language identifiers were modelled as LOD and linked to the LLOD repositories Lexvo and Glottolog.

In addition, routines were developed for the automatic assignment of keywords to freely available digital corpora and dictionaries. The development focused on language resources whose formal metadata (title, author, etc.) is available as LOD via metadata portals such as Datahub, Linghub, or the CLARIN Virtual Language Observatory. Relevant resources found on these portals are analysed with regard to object language and annotation scheme (tag set or vocabulary), and the results are stored in a newly created metadata repository (Annohub). Indexing is based on the Annohub data: the concepts of the detected tag sets or vocabularies and the identified languages are matched to corresponding BLL subject terms via the existing links to OLiA, Lexvo, and Glottolog. Once BLL subject terms have been assigned, the resources are integrated into the search index of the Lin|gu|is|tik portal. These measures expanded the search space both qualitatively and quantitatively.
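The Annohub-based subject indexing can be pictured as a lookup over two link tables: one from annotation concepts (via OLiA) to BLL subject terms, and one from language URIs (via Lexvo or Glottolog) to BLL language terms. The sketch below is a hypothetical simplification, not Annohub's real schema: the `AnnohubRecord` layout, the function `assign_subject_terms`, and the BLL term strings are invented for illustration, and it assumes the tags found in a resource have already been resolved to OLiA concept URIs.

```python
# Hypothetical sketch of subject-term assignment from Annohub-style metadata.
from dataclasses import dataclass, field


@dataclass
class AnnohubRecord:  # illustrative layout, not the real Annohub schema
    resource_id: str
    olia_concepts: set[str] = field(default_factory=set)  # from detected tag sets/vocabularies
    language_uris: set[str] = field(default_factory=set)  # Lexvo or Glottolog URIs


def assign_subject_terms(record: AnnohubRecord,
                         olia_to_bll: dict[str, set[str]],
                         lang_to_bll: dict[str, str]) -> list[str]:
    """Collect BLL subject terms reachable from the record's concepts and languages."""
    terms: set[str] = set()
    for concept in record.olia_concepts:
        terms |= olia_to_bll.get(concept, set())  # unmapped concepts contribute nothing
    for lang in record.language_uris:
        if lang in lang_to_bll:
            terms.add(lang_to_bll[lang])
    return sorted(terms)


if __name__ == "__main__":
    record = AnnohubRecord(
        "corpus-x",
        {"http://purl.org/olia/olia.owl#Noun", "http://example.org/tagset#FOO"},
        {"http://lexvo.org/id/iso639-3/deu"},
    )
    olia_to_bll = {"http://purl.org/olia/olia.owl#Noun": {"Noun"}}
    lang_to_bll = {"http://lexvo.org/id/iso639-3/deu": "German"}
    print(assign_subject_terms(record, olia_to_bll, lang_to_bll))  # ['German', 'Noun']
```

A resource whose result list is non-empty would then be eligible for the search index; an empty list signals that no existing link covers its annotation scheme or language.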

To improve the visibility of linguistic research data and of the scholarly work based on it, a bibliographical subproject was started: the metadata of relevant publications in the Bibliography of Linguistic Literature is linked to the corpora used in the research these publications describe. As a prerequisite for this linking, authority records are created for the relevant language resources. Each subject heading is linked directly to the web presence of the respective corpus. This approach makes it possible to display directly the publications related to a selected language resource.
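Displaying "the publications related to a selected language resource" is essentially an inverted index from corpus authority records to bibliographic records. The following sketch shows that idea under simplifying assumptions: the `AuthorityRecord` structure, the record IDs, the example URL, and the functions `build_corpus_index` and `display` are all hypothetical and do not describe the portal's actual catalogue format.

```python
# Hypothetical sketch: invert publication-to-corpus links for per-corpus display.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:  # illustrative structure only
    record_id: str
    heading: str  # subject heading naming the corpus
    url: str      # web presence of the corpus


def build_corpus_index(publications: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each authority record ID to the IDs of publications linked to it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for pub_id, corpus_ids in publications.items():
        for cid in corpus_ids:
            index[cid].add(pub_id)
    return dict(index)


def display(record: AuthorityRecord, index: dict[str, set[str]]) -> list[str]:
    """Heading with its corpus link, followed by the related publication IDs."""
    return [f"{record.heading} <{record.url}>"] + sorted(index.get(record.record_id, set()))


if __name__ == "__main__":
    pubs = {"bll-001": {"auth-1"}, "bll-002": {"auth-1", "auth-2"}}
    idx = build_corpus_index(pubs)
    rec = AuthorityRecord("auth-1", "Example Corpus", "https://example.org/corpus")
    for line in display(rec, idx):
        print(line)
```

Building the index once means a lookup per selected corpus is a single dictionary access rather than a scan over all bibliographic records.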

FID Linguistik started a pilot project to improve the availability of research data: the FID grants researchers in Germany licences for corpora provided by the European Language Resources Association (ELRA). A new module (Licences for corpora) was created in the Lin|gu|is|tik portal for this service. The module describes the general terms and conditions and the procedure, and provides a web form for requesting individual corpus licences.

Another service of FID Linguistik is the acquisition of printed or electronic specialist literature that complements the library holdings of German universities. Acquisition is driven by the needs of the research community; to this end, a web form for book suggestions was created.

FID Linguistik supports open access as a publication model. To strengthen the open access infrastructure, a new service was introduced: the hosting of electronic journals. FID Linguistik provides, free of charge, the technical platform, long-term preservation of the content, and integration into relevant databases and catalogue systems. In this way, FID Linguistik supports scholarly editorial boards in publishing linguistically relevant open access journals. Several journals already use this hosting service, including the newly founded Journal for Media Linguistics and the International Journal of Literary Linguistics.

In addition, scholars from all over the world can use the Linguistik-Repository to publish their linguistically relevant research electronically and in accordance with the principles of open access. This document server has been established over the past few years with the support of the DFG.

Since the beginning of the "FID Linguistik" project, the number of services and resources provided via the Lin|gu|is|tik portal has grown steadily. At the same time, user expectations have changed, especially regarding the use of mobile devices. To meet these needs and expectations, the web interface underwent a technical and visual revision, which led to the relaunch of the Lin|gu|is|tik portal.

FID Linguistik (2020-2022)

With the support of the DFG, a follow-up project has been launched for the period 2020-2022. During this second funding phase, FID Linguistik will expand the range of its information services and activities.

The content-related and technical development of the Lin|gu|is|tik portal remains a top priority. Among other things, an ontology-based search function will be implemented. In addition, new modules with topic-specific functions regarding languages and corpora will be established. Moreover, we will facilitate the export of all bibliographic records, both for end users and for other interested portals or related specialised information services.

The existing interlinking with LOD will be extended to include links to further relevant resources. One potential target is PHOIBLE, a repository of cross-linguistic phonological inventory data for more than 2,000 distinct languages.

More digital resources, both research data and secondary literature, will be indexed automatically. The indexing of research data will be expanded mainly in quantitative terms, by applying the existing routines in combination with the extraction of relevant URLs from mailing lists, search engines, online proceedings, etc. Procedures for the automatic extraction of formal metadata and the assignment of subject terms (i.e. algorithmic subject indexing) will be developed and applied to the indexing of scientific literature.

The linking of research data to related literature will continue. In future, not only corpora but also resources such as lexical databases, electronic dictionaries, and tools used in corpus analysis will be taken into consideration.

FID Linguistik also plans additional steps in the area of Open Access: continuing the hosting service for electronic journals and increasing the integration of freely available e-books. For example, a catalogue of linguistically relevant e-books will be established in cooperation with publishers.

Furthermore, FID Linguistik will provide supra-regional licences for selected, highly specialised databases and journals.