<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "JATS-journalpublishing1.dtd">
<article article-type="research-article" xml:lang="EN" xmlns:xlink="http://www.w3.org/1999/xlink">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">LIBER</journal-id>
<journal-title-group>
<journal-title>LIBER QUARTERLY</journal-title>
</journal-title-group>
<issn pub-type="epub">2213-056X</issn>
<publisher>
<publisher-name>openjournals.nl</publisher-name>
<publisher-loc>The Hague, The Netherlands</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">lq.10938</article-id>
<article-id pub-id-type="doi">10.53377/lq.10938</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Words Algorithm Collection &#x2013; Finding Closely Related Open Access Books using Text Mining Techniques</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-9260-4941</contrib-id>
<name>
<surname>Snijder</surname>
<given-names>Ronald</given-names>
</name>
<email>r.snijder@oapen.org</email>
<xref ref-type="aff" rid="aff1"/>
</contrib>
<aff id="aff1">OAPEN Foundation, The Hague, Netherlands</aff>
</contrib-group>
<pub-date pub-type="epub">
<month>8</month>
<year>2021</year>
</pub-date>
<volume>31</volume>
<fpage>1</fpage>
<lpage>22</lpage>
<permissions>
<copyright-statement>Copyright 2021, The copyright of this article remains with the author</copyright-statement>
<copyright-year>2021</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://www.liberquarterly.eu/article/10.53377/lq.10938"/>
<abstract>
<p>Open access platforms and retail websites are both trying to present the most relevant offerings to their patrons. Retail websites deploy recommender systems that collect data about their customers. These systems are successful but intrude on privacy. As an alternative, this paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters. The algorithm splits the full text into trigrams. It removes all trigrams containing words that are commonly used in everyday language and in (open access) book publishing. The most frequently occurring remaining trigrams are distinctive to the publication and indicate the themes of the book. The next step is finding publications that share one or more of the trigrams. The strength of the connection can be measured by counting &#x2013; and ranking &#x2013; the number of shared trigrams. The algorithm was used to find connections between 10,997 titles: 65% in English, 29% in German and 6% in Dutch or a combination of languages. The algorithm is able to find connected books across languages. It is possible to use the algorithm for several use cases, not just recommender systems. Creating benchmarks for publishers or creating a collection of connected titles for libraries are other possibilities. Apart from the OAPEN Library, the algorithm can be applied to other collections of open access books or even open access journal articles. Combining the results across multiple collections will enhance its effectiveness.</p>
</abstract>
<kwd-group>
<kwd>Open access</kwd>
<kwd>recommendations</kwd>
<kwd>books</kwd>
<kwd>algorithms</kwd>
<kwd>text mining</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<title>1. Introduction</title>
<p>Open access platforms and retail websites have one thing in common: they are trying to present the most relevant offerings possible to their patrons. Retail websites &#x2013; such as Amazon.com &#x2013; deploy recommender systems based on data collected about their customers. These systems improve with the amount of data available: the more is known about the customers, the better the system can predict which other merchandise will appeal to them.</p>
<p>For open access platforms, this is not a viable solution. First, these platforms are designed to lower as many barriers as possible to make sure that the largest group of people have access to the publications. Forcing people to identify themselves and tracking their actions on the website is a serious barrier. Second, and more importantly, protecting privacy is an important principle in the library community, which at the very least overlaps with the open access community.</p>
<p>Recommender systems are successful but using open access platforms to track people is not acceptable. Therefore, a different solution is needed. Compared to retail websites, open access platforms have a unique advantage: they are able to use the complete contents of the publications they host. So, the question arises if it is possible to create a recommender system based on the contents of freely available documents, instead of personal data. This paper presents an algorithm that uses text mining techniques to find the most important themes of an open access book or chapter. By locating other publications that share one or more of these themes, it is possible to recommend closely related books or chapters.</p>
<p>The algorithm splits the full text of the book or chapter into sets of three consecutive words: trigrams. Then it removes all trigrams containing words that are commonly used in everyday language and the trigrams containing terms that are commonly used in (open access) book publishing. When a trigram contains a word &#x2013; or multiple words &#x2013; that is commonly used, the whole trigram is discarded.</p>
<p><xref ref-type="fig" rid="fg001">Figure 1</xref> illustrates this using a simple sentence: &#x201C;The quick brown fox jumps over the lazy dog&#x201D;. Converting the sentence to trigrams results in seven sets of three words, the trigrams. Removing all trigrams that contain commonly used words brings the remaining number back to two. Deploying this procedure to the complete text of a book still creates a large set of trigrams, hence the need for additional filtering using terms that are common for open access academic books.</p>
<fig id="fg001">
<label>Fig. 1:</label>
<caption><p>Trigrams example.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig1.jpg"/>
</fig>
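<p>The example in <xref ref-type="fig" rid="fg001">Figure 1</xref> can be reproduced with a minimal sketch. It is written in Python for illustration only; the implementation described in this paper uses R. The two-word stop word list and the function names <monospace>trigrams</monospace> and <monospace>drop_common</monospace> are assumptions chosen to keep the example small.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re

# Tiny illustrative stop word list (an assumption); a real list would hold
# well over a thousand entries per language.
STOP_WORDS = {"the", "over"}


def trigrams(text):
    """Return every set of three consecutive lowercase words in the text."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def drop_common(grams, stop_words=STOP_WORDS):
    """Discard a trigram as soon as any one of its words is a stop word."""
    return [g for g in grams if not any(w in stop_words for w in g)]


sentence = "The quick brown fox jumps over the lazy dog"
all_grams = trigrams(sentence)
kept = drop_common(all_grams)
print(len(all_grams), [" ".join(g) for g in kept])
```
]]></preformat>
<p>Under these assumptions, the sentence yields seven trigrams, of which only &#x201C;quick brown fox&#x201D; and &#x201C;brown fox jumps&#x201D; remain after filtering.</p>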
<p>The remaining trigrams are distinctive to the book or chapter, and the most frequently occurring of those trigrams indicate the concepts the author is discussing. The next step is finding publications that share one or more of the trigrams; the more trigrams they share, the closer the connection between them. The strength of the connection can be measured by simply counting the number of shared trigrams.</p>
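<p>Counting shared trigrams can be sketched as follows. This is an illustration in Python, not the R code used in the experiment. The titles, the trigrams assigned to them and the cut-off <monospace>n</monospace> for the most frequent trigrams are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations


def top_trigrams(grams, n=3):
    """Return the set of the n most frequently occurring trigrams."""
    return {g for g, _ in Counter(grams).most_common(n)}


def shared_counts(themes):
    """Map each pair of titles to the number of trigrams they share."""
    counts = {}
    for a, b in combinations(sorted(themes), 2):
        shared = len(themes[a] & themes[b])
        if shared:  # pairs without a shared trigram are not connected
            counts[(a, b)] = shared
    return counts


# Hypothetical titles and their distinctive trigrams, for illustration only.
themes = {
    "book_a": {"greenhouse gas emissions", "sea level rise",
               "civil society organizations"},
    "book_b": {"greenhouse gas emissions", "sea level rise"},
    "book_c": {"greenhouse gas emissions", "rice straw management"},
    "book_d": {"medieval trade routes"},
}
links = shared_counts(themes)
print(links)
```
]]></preformat>
<p>In this toy data set, the first two titles share two trigrams and are therefore more closely connected than either is to the third; the fourth title has no connections.</p>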
<p>In contrast to black box technologies such as machine learning<xref ref-type="fn" rid="fn1">1</xref>, the algorithm is completely transparent. Every term used is open to scrutiny and can be updated. Furthermore, the algorithm is tool agnostic: it is not tied to a specific coding environment.</p>
<p>The solution described in this paper is based on standard open-source software. It is built using a combination of DSpace 6 and the R programming language. The open access platform &#x2013; based on DSpace 6 &#x2013; is the OAPEN Library; the data set used consists of nearly 11,000 open access books and chapters. The OAPEN Library enables data extraction through an API (application programming interface). A text mining algorithm written in the R programming language uses the full text of the publications, filters out the trigrams and creates an overview of closely related books and chapters.</p>
<p>Different users may have different needs: a reader might be interested in finding a few select titles, while a library might want to download a larger collection of books around a certain topic. These use cases are discussed in section 4.4.</p>
</sec>
<sec id="s2">
<title>2. Background</title>
<p>As mentioned in the previous section, the set of publications is provided by the OAPEN Library. The OAPEN Library is a platform &#x2013; launched in 2010 &#x2013; hosting open access books and chapters. It is managed by the OAPEN Foundation<xref ref-type="fn" rid="fn2">2</xref>. As of June 2021, the collection consists of over 17,000 titles. This background section discusses privacy in libraries, recommender systems, ngrams and previous experiments run on the OAPEN collection.</p>
<sec id="s2a">
<title>2.1. Libraries and Privacy</title>
<p>Libraries &#x2013; whether physical or online &#x2013; have been protecting the privacy of their patrons for quite some time, as reflected, for instance, in the American Library Association (ALA) Code of Ethics of 1938 (<xref ref-type="bibr" rid="r42">Witt, 2017</xref>). This position is shared among the <xref ref-type="bibr" rid="r14">International Federation of Library Associations and Institutions (2016)</xref>, the <xref ref-type="bibr" rid="r5">American Library Association (2014)</xref> and several other national library associations. Privacy in libraries is associated with protection from unwanted government attention (<xref ref-type="bibr" rid="r15">Jaeger et al., 2004</xref>), but also from commercial organisations (<xref ref-type="bibr" rid="r6">Corrado, 2007</xref>; <xref ref-type="bibr" rid="r20">Maceli, 2018</xref>).</p>
</sec>
<sec id="s2b">
<title>2.2. Recommender Systems</title>
<p>Recommender systems are used to provide suggestions about items that are valuable to a person. While there are several techniques for building recommender systems, most are based on the same principle: create a profile of the user and her peers, extend this as much as possible and update it over time. This enables the system to know the preferences of the user and thus predict other items (<xref ref-type="bibr" rid="r26">Pazzani &#x0026; Billsus, 2007</xref>; <xref ref-type="bibr" rid="r29">Ricci et al., 2011</xref>; <xref ref-type="bibr" rid="r32">Schafer et al., 1999</xref>). <xref ref-type="bibr" rid="r19">Linden et al. (2003)</xref> and <xref ref-type="bibr" rid="r35">Smith and Linden (2017)</xref> discuss their experiences at Amazon, spanning two decades.</p>
<p>For those who do not feel comfortable with the lack of privacy in connection to this type of system, <xref ref-type="bibr" rid="r16">Jeckmans et al. (2013)</xref> have listed countermeasures. These include raising awareness about privacy issues and invoking specific laws dealing with personal information. As it might take quite some time before this will take effect, the authors also describe technical measures such as anonymisation, randomisation and the use of cryptography.</p>
<p>Instead of recommending titles based on personal data, here the contents of the titles will be used. The texts of the books and chapters are analysed using ngrams.</p>
</sec>
<sec id="s2c">
<title>2.3. Ngrams</title>
<p>Ngrams are based on the relationships between words, either by examining which words tend to follow others immediately, or by looking at words that co-occur within the same documents. Two consecutive words are called &#x201C;bigrams&#x201D;, three consecutive words are called &#x201C;trigrams&#x201D;. Naturally, the number of trigrams in a text is lower compared to bigrams, while the trigrams are more specific. As we are examining a large text corpus &#x2013; the text of almost 11,000 books and chapters &#x2013; the total number of possible trigrams is still large.</p>
<p>Ngrams are used in different types of research. One application is document clustering: creating related groups of documents. Each document is represented by a numerical value. The k-means algorithm is typically used to calculate the distance between the documents and a &#x2018;cluster means&#x2019;; the goal is to place all documents in clusters with the smallest numerical distance (<xref ref-type="bibr" rid="r24">Miao et al., 2005</xref>). Furthermore, the authors looked at the performance of several types of ngrams &#x2013; ranging from bigrams to 5-grams &#x2013; used in document clustering. They conclude that trigrams are roughly as accurate as 4-grams and 5-grams but are more economical in their resource usage.</p>
<p>Apart from clustering documents, ngrams are also deployed for author attribution. This technique aims to find the characteristics of a writer&#x2019;s style and use that to define whether a certain text is written by that author. Here, the ngrams are not based on clusters of words, but clusters of characters (<xref ref-type="bibr" rid="r17">Ke&#x0161;elj et al., 2003</xref>). <xref ref-type="bibr" rid="r7">Eve (2019)</xref> is critical of the application of this technique to identify authors and uses it to distinguish literary genres instead.</p>
<p>The best-known use of ngrams is probably the Google Books Ngram Viewer. This vast corpus of books is used for cultural research. The most cited example is written by <xref ref-type="bibr" rid="r25">Michel et al. (2011)</xref>. In this paper, the authors examine the change of language over time, but also cultural changes: the rise and demise of the celebrity of certain persons and suppression of ideas over time. This is far from the only paper based on the Google Books Ngram Viewer: a recent search on this subject in the Google Scholar search engine resulted in over 3,700 titles<xref ref-type="fn" rid="fn3">3</xref>.</p>
<p>The experiment of this paper does not quite fit within these three research types. It is clearly not meant to discover long term trends, in the manner of the Google Books research. Finding authors is also not necessary: this information is provided in the metadata of the OAPEN Library. The k-means algorithm is a more general-purpose application, aimed to be useful in various situations.</p>
<p>The most closely connected experiments are those aiming to extract keywords from an article or book (<xref ref-type="bibr" rid="r31">Rohini &#x0026; Ambati, 2007</xref>; <xref ref-type="bibr" rid="r39">Souza &#x0026; Raghavan, 2014</xref>). The authors describe the use of statistical methods to find distinctive words. However, the text corpora used are small and no attempt is made to connect multiple titles.</p>
<p>The algorithm used in this experiment is optimised for a very specific purpose: instead of creating amorphous groups, it aims to find exact relations for each individual title. These relations &#x2013; based on the number of shared trigrams &#x2013; are ranked. The ranking and the number of shared trigrams can be used to create services for digital libraries with an open access collection. This algorithm is not general-purpose but optimised for one specific environment.</p>
</sec>
<sec id="s2d">
<title>2.4. Other Experiments</title>
<p>Several other experiments have been conducted on the OAPEN Library collection: creating groups of books based on usage data (<xref ref-type="bibr" rid="r37">Snijder, 2019</xref>) or categorising titles based on Wikipedia pages (<xref ref-type="bibr" rid="r38">Snijder, 2021</xref>). In the first experiment, the download patterns are analysed to find which books are regularly selected together. So, instead of looking at individual preferences, social network analysis was deployed to find the preferences of groups of people. The more recent investigation aimed to categorize books by automatically finding the Wikipedia pages that describe their contents.</p>
<p>Grouping books based on usage data has drawbacks: apart from the reliance on external usage data, the results need to be interpreted. The interpretation depends on analysing aspects of the books and the users. This cannot be automated, making it hard to upscale, and the analysis might be open to bias. Furthermore, using data captured over different time periods leads to different results.</p>
<p>Another way to discover similar titles is by adding standardised metadata. Most libraries use a classification for this purpose, which is standardised but rigid. Another option is using uncontrolled keywords that are flexible but lack standardisation. Wikipedia was used as &#x2018;middle ground&#x2019;: a standardised but very broad set of keywords. Adding Wikipedia pages to book records in the OAPEN Library is also reliant on external data, which must be provided by a separate service. Furthermore, manual &#x2018;culling&#x2019; of the results was necessary.</p>
<p>Both methods cannot be implemented completely automatically, rely on external services and need extra effort to scale up. This makes them less desirable for production. The solution described in this paper does not rely on external services but uses the strength of open access publishing: direct access to the contents of the documents.</p>
</sec>
</sec>
<sec id="s3">
<title>3. Finding Related Titles by Algorithm</title>
<p>This section describes the algorithm used and the data set. The text mining techniques deployed are built using the work by <xref ref-type="bibr" rid="r33">Silge and Robinson (2016</xref>, <xref ref-type="bibr" rid="r34">2017</xref>). The authors created a set of tools (&#x201C;package&#x201D;) in the programming language R (<xref ref-type="bibr" rid="r28">R Core Team, 2020</xref>) aimed to simplify text mining. The R package creates the trigrams, which are manipulated to find the related documents.</p>
<sec id="s3a">
<title>3.1. The Algorithm</title>
<p>Our goal is to find relevant open access titles, when a book or chapter has been selected. Relevant titles discuss the same concept or concepts that are closely connected. The algorithm is based on two assumptions: 1. The terms describing the themes of the title are frequently occurring in the text; 2. Books and chapters on the same subject use similar terms. In other words: if titles share relevant terms, they are connected. The number of shared relevant terms is an indication of the strength of the connection. <xref ref-type="fig" rid="fg002">Figure 2</xref> displays the complete algorithm.</p>
<fig id="fg002">
<label>Fig. 2:</label>
<caption><p>The algorithm as flow chart.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig2.jpg"/>
</fig>
<p>The next question is what terms to use. In this experiment, the terms are sets of three words &#x2013; trigrams. In a text, the number of trigrams is relatively small &#x2013; compared to bigrams &#x2013; while they are more specific. This leads to a more &#x2018;workable&#x2019; set of possible items. However, not all trigrams are relevant for our purpose, and therefore it is important to filter out the ones that are not needed.</p>
<p>The first set of trigrams to discard contains words that are too common: stop words. Examples are &#x201C;a&#x201D;, &#x201C;able&#x201D;, &#x201C;about&#x201D; and &#x201C;above&#x201D;, along with almost 1,200 more words for the English language. Comparable sets of stop words for German and Dutch were also deployed.</p>
<p>The next set to filter out is trigrams that contain parts of words. When the contents of the books are converted to text, hyphens are converted to spaces, leading to trigrams such as &#x201C;diff ere nt&#x201D;, &#x201C;inso fe rn&#x201D; or &#x201C;werkge legenhe id&#x201D;. These are not three words, but just one.</p>
<p>Furthermore, trigrams that are specific to open access publishing or academic writing are discarded. These are descriptions of Creative Commons licenses, or terms that are quite common in academic books, but are meaningless in themselves, such as &#x201C;pdf letzter zugriff&#x201D;, &#x201C;pdf zuletzt gepr&#x00FC;ft&#x201D;, &#x201C;phd diss university&#x201D; or &#x201C;phd thesis university&#x201D;.</p>
<p>Also, the parts of references that contain only the publisher&#x2019;s name are filtered out. For instance, the trigram &#x201C;manchester university press&#x201D; does not convey which title is cited. As Manchester University Press has published hundreds of titles on many different subjects, linking books using this term does not describe any subject-related connection. Of course, this also applies to many other academic publishers.</p>
<p>It is important to note that the terms to be excluded are a clearly visible part of the algorithm. This ensures maximum transparency: each person working with the algorithm has direct access to the &#x2018;filtering terms&#x2019; and might choose to update them.</p>
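<p>The filtering steps above can be sketched as a chain of plain, editable exclusion lists. The sketch is written in Python for illustration and is not the R code used in the experiment. The list entries are the examples quoted in this section; treating word fragments as list entries, the cut-off <monospace>n</monospace> and all names in the code are assumptions.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Exclusion lists kept as visible, editable data. The entries are only the
# examples quoted in the text; real lists would be far longer.
STOP_WORDS = {"a", "able", "about", "above"}
WORD_FRAGMENTS = {"diff ere nt", "inso fe rn", "werkge legenhe id"}
PUBLISHING_TERMS = {"pdf letzter zugriff", "pdf zuletzt geprüft",
                    "phd diss university", "phd thesis university"}
PUBLISHER_NAMES = {"manchester university press"}
EXCLUDED = WORD_FRAGMENTS | PUBLISHING_TERMS | PUBLISHER_NAMES


def is_relevant(trigram):
    """Keep a trigram only if none of the filters excludes it."""
    if any(word in STOP_WORDS for word in trigram.split()):
        return False  # one common word is enough to discard the trigram
    return trigram not in EXCLUDED


def distinctive(trigram_list, n=10):
    """Return the n most frequent trigrams that survive all filters."""
    counts = Counter(t for t in trigram_list if is_relevant(t))
    return counts.most_common(n)


# Hypothetical trigrams extracted from one title.
sample = (["greenhouse gas emissions"] * 3
          + ["manchester university press"] * 5
          + ["about greenhouse gas"]
          + ["phd thesis university"] * 2
          + ["sea level rise"])
top = distinctive(sample, n=2)
print(top)
```
]]></preformat>
<p>Because the lists are ordinary data rather than learned parameters, anyone working with such a sketch can inspect or update them, which mirrors the transparency argument made above.</p>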
</sec>
<sec id="s3b">
<title>3.2. The Data Set</title>
<p>At the start, 12,224 titles in the OAPEN Library were selected. The selection was based on one criterion: language. The books and chapters were published in English, German, Dutch or a combination of these languages. Choosing these three languages was pragmatic: over 90 percent of the OAPEN Library collection is published in either English, German or Dutch, ensuring a sizable set of titles to analyse. Having a data set spanning multiple languages also enables possible connections between books in several languages. In one of the examples in section 4.2, we will find two closely connected books: an English language translation of a German book.</p>
<p>The first phase of data gathering was an attempt to download the full text of the titles. From each text, the most relevant trigrams were selected and lastly, for each title it was determined whether it could be matched to one or more other books or chapters. During this process, some texts could not be extracted, or no matching title could be found. This led to a dropout rate of around 10 percent, resulting in the 10,997 titles of the data set.</p>
<p>The data set is dominated by books (see <xref ref-type="table" rid="tb001">Table 1</xref>); only 4% are chapters. Within the data set, English and German stand out, with a small percentage &#x2013; 6% &#x2013; of titles in Dutch or in multiple languages.</p>
<table-wrap id="tb001" position="float" orientation="portrait">
<label>Table 1:</label>
<caption><p>Publication types and languages.</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top">Publication type</th>
<th align="left" valign="top">Language</th>
<th align="left" valign="top">Amount</th>
<th align="left" valign="top">Percentage</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Book</td>
<td align="left" valign="top">English</td>
<td align="left" valign="top">6,736</td>
<td align="left" valign="top">61%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">German</td>
<td align="left" valign="top">3,155</td>
<td align="left" valign="top">29%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">Dutch</td>
<td align="left" valign="top">521</td>
<td align="left" valign="top">5%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">Multiple languages</td>
<td align="left" valign="top">97</td>
<td align="left" valign="top">1%</td>
</tr>
<tr>
<td align="left" valign="top">Total book</td>
<td align="left" valign="top"/>
<td align="left" valign="top"><bold>10,509</bold></td>
<td align="left" valign="top"><bold>96%</bold></td>
</tr>
<tr>
<td align="left" valign="top">Chapter</td>
<td align="left" valign="top">English</td>
<td align="left" valign="top">467</td>
<td align="left" valign="top">4%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">Dutch</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">0%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">German</td>
<td align="left" valign="top">9</td>
<td align="left" valign="top">0%</td>
</tr>
<tr>
<td align="left" valign="top"/>
<td align="left" valign="top">Multiple languages</td>
<td align="left" valign="top">3</td>
<td align="left" valign="top">0%</td>
</tr>
<tr>
<td align="left" valign="top">Total chapter</td>
<td align="left" valign="top"/>
<td align="left" valign="top"><bold>488</bold></td>
<td align="left" valign="top"><bold>4%</bold></td>
</tr>
<tr>
<td align="left" valign="top">Total</td>
<td align="left" valign="top"/>
<td align="left" valign="top"><bold>10,997</bold></td>
<td align="left" valign="top"><bold>100%</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Each book or chapter in the data set is connected to one or more titles. The majority of the titles &#x2013; over 7,000 &#x2013; are closely related to 50 titles or fewer. Another 1,986 are connected to 100 titles or fewer. When the largest group is subdivided, it becomes clear that 4,498 books or chapters, or about 40% of the titles, are closely connected to 20 titles or fewer.</p>
<p>Each title shares one or more trigrams with another publication. As is clearly visible in the histogram (<xref ref-type="fig" rid="fg003">Figure 3</xref>), most books and chapters are connected to 21 titles or more. Most of these connections vary in the number of shared trigrams. The number of shared trigrams is an indication of the strength of the connection: a higher number indicates a stronger connection.</p>
<fig id="fg003">
<label>Fig. 3:</label>
<caption><p>Histogram, detailed.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig3.jpg"/></fig>
<p>These connections could be ranked. For instance, if a book is connected to 25 books &#x2013; two books with three shared trigrams, five books with two shared trigrams and the rest with one shared trigram &#x2013; these could be ranked first, second and third. However, we could also imagine several books that share a higher number of trigrams, where the first ranked titles share ten trigrams, the next six etc. Thus, the connections between the publications can be ranked, and their relative strength can be measured. This enables us to make specific selections, based on these parameters. The next section describes some examples.</p>
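<p>The ranking described above can be sketched as a dense ranking on the number of shared trigrams. The sketch is in Python and for illustration only; the function name <monospace>rank_connections</monospace> and the title labels are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def rank_connections(shared):
    """Dense-rank related titles: the highest shared-trigram count is rank 1."""
    levels = sorted(set(shared.values()), reverse=True)
    rank_of = {count: position + 1 for position, count in enumerate(levels)}
    return {title: rank_of[count] for title, count in shared.items()}


# The 25-book example from the text, with hypothetical title labels:
# two titles share three trigrams, five share two, the rest share one.
shared = {f"title_{i:02d}": 3 for i in range(2)}
shared.update({f"title_{i:02d}": 2 for i in range(2, 7)})
shared.update({f"title_{i:02d}": 1 for i in range(7, 25)})

ranks = rank_connections(shared)
per_rank = Counter(ranks.values())
print(sorted(per_rank.items()))
```
]]></preformat>
<p>The rank gives the relative position of a connection, while the underlying count of shared trigrams gives its strength; keeping both allows selections such as &#x201C;first rank only&#x201D; or &#x201C;at least three shared trigrams&#x201D;.</p>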
</sec>
</sec>
<sec id="s4">
<title>4. Finding Connected Titles</title>
<p>Using the data about the relative strength of the connections, it is possible to select publications based on several options. The first example consists of the titles connected to a single book. This could be used for recommender systems, showing a few closely connected titles to a book. After that, we will explore other possibilities, based on groups of publications.</p>
<sec id="s4a">
<title>4.1. Single Book</title>
<p>This example is based on the book &#x201C;Complexity, Security and Civil Society in East Asia&#x201D; (<xref ref-type="bibr" rid="r9">Hayes &#x0026; Yi, 2015</xref>), which discusses complex global problems such as urban insecurity, energy, and climate change. It shares three trigrams with four titles; one of them is the book &#x201C;Loss and Damage from Climate Change&#x201D; (<xref ref-type="bibr" rid="r23">Mechler et al., 2019</xref>). These four titles are part of the first rank. Moving on to the second rank, there are 29 titles. One of those titles is &#x201C;Louisiana&#x2019;s response to extreme weather&#x201D; (<xref ref-type="bibr" rid="r18">Laska, 2020</xref>). Furthermore, it shares one trigram with 217 titles, among them the book &#x201C;Sustainable rice straw management&#x201D; (<xref ref-type="bibr" rid="r8">Gummert et al., 2020</xref>); here the connection with insecurity and climate change seems weaker. However, the trigram both books share is &#x201C;greenhouse gas emissions&#x201D;.</p>
<p><xref ref-type="fig" rid="fg004">Figure 4</xref> displays the three ranks:</p>
<fig id="fg004">
<label>Fig. 4:</label>
<caption><p>Book with related titles, ranked.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig4.jpg"/></fig>
<p>When looking at the connection between this book and the closely related books, the first question is what trigrams they share. These are listed in <xref ref-type="table" rid="tb002">Table 2</xref>, <italic>Shared trigrams</italic>. The common theme connecting these books is quite clear: global warming and its effects.</p>
<table-wrap id="tb002" position="float" orientation="portrait">
<label>Table 2:</label>
<caption><p>Shared trigrams.</p></caption>
<table>
<thead>
<tr>
<th rowspan="2" align="left" valign="top">Trigram</th>
<th colspan="3" align="left" valign="top">Title<hr/></th>
</tr>
<tr>
<th align="left" valign="top">Loss and damage from climate change</th>
<th align="left" valign="top">Louisiana&#x2019;s response to extreme weather</th>
<th align="left" valign="top">Sustainable rice straw management</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">greenhouse gas emissions</td>
<td align="left" valign="top">X</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">X</td>
</tr>
<tr>
<td align="left" valign="top">climate change adaptation</td>
<td align="left" valign="top">X</td>
<td align="left" valign="top">X</td>
<td align="left" valign="top">-</td>
</tr>
<tr>
<td align="left" valign="top">sea level rise</td>
<td align="left" valign="top">X</td>
<td align="left" valign="top">X</td>
<td align="left" valign="top">-</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>However, the book &#x201C;Complexity, Security and Civil Society in East Asia&#x201D; does not only focus on climate change, and the trigrams reflect that. The most common trigrams are &#x201C;civil society organizations&#x201D; (occurring 78 times); &#x201C;rok foreign policy&#x201D; (occurring 57 times) and &#x201C;world economic forum&#x201D; (occurring 43 times). The first &#x2018;shared&#x2019; trigram is &#x201C;greenhouse gas emissions&#x201D;, which is mentioned 29 times. The term &#x201C;climate change adaptation&#x201D; is mentioned 20 times &#x2013; the almost identical trigram &#x201C;climate change mitigation&#x201D; was counted 17 times. Lastly, &#x201C;sea level rise&#x201D; could be found 14 times.</p>
<p>It is also interesting to look at which trigrams could not be linked. Several of them are related to policy making, which became clear from the top three trigrams and several mentions of the Nautilus Institute for Security and Sustainability, a public policy think-tank. Furthermore, nuclear energy and energy security are also mentioned in several trigrams. The complete list can be found in the Appendix.</p>
</sec>
<sec id="s4b">
<title>4.2. Groups</title>
<p>The previous section showed the titles related to one book. Another possibility is to examine groups of publications and their relations. What books are closely connected, and does their relative &#x2018;distance&#x2019; display subtopics within a larger collection&#x003F; <xref ref-type="fig" rid="fg005">Figure 5</xref> shows a selection of books that share three trigrams or more. Each of the groups consists of closely connected books and chapters.</p>
<fig id="fg005">
<label>Fig. 5:</label>
<caption><p>Groups of books sharing three or more trigrams.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig5.jpg"/></fig>
<p>Randomly selecting titles based on the number of shared connections does not lead to very useful results. Starting with one book, it makes sense to search for related titles. In order to find more relevant results for groups, it is necessary to use additional metadata, in this case the metadata of the OAPEN Library.</p>
<p>Using the metadata of the OAPEN Library enables us to search using several characteristics. In the next example, <xref ref-type="fig" rid="fg006">Figure 6</xref> displays books published by Language Science Press. This publisher specialises in linguistics and all titles are part of a series; the colour of the cover denotes a series which helps to visualise the relations further. For instance, the green covers are part of the series &#x201C;Studies in diversity linguistics&#x201D;, and the dark blue covers indicate the &#x201C;Computational models of language evolution&#x201D; series. Moreover, the thickness of the connecting line is an indication of the number of shared trigrams.</p>
<fig id="fg006">
<label>Fig. 6:</label>
<caption><p>Connected books, published by Language Science Press.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig6.jpg"/></fig>
<p>Instead of focussing on a single publisher, we could also look at the open access titles that received financial support from the same funder. If the funder has an underlying policy regarding the titles &#x2013; see for instance <xref ref-type="bibr" rid="r30">Rieck (2019)</xref> &#x2013; is that reflected in the publications&#x003F; <xref ref-type="fig" rid="fg007">Figure 7</xref> displays books funded by the Austrian Science Fund (FWF). Here, several smaller groups of closely connected books are noticeable.</p>
<fig id="fg007">
<label>Fig. 7:</label>
<caption><p>Connected books, funded by FWF.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/LIBER_2021_31_Snijder_fig7.jpg"/></fig>
<p>Furthermore, the two titles in the bottom right are translations of each other: &#x201C;Revolution and transition : Cultural policy in Bulgaria, 1989&#x2013;2012&#x201D; (<xref ref-type="bibr" rid="r3">Alexandrov, 2017a</xref>) and &#x201C;Wende und &#x00DC;bergang : Die Kulturpolitik Bulgariens, 1989&#x2013;2012&#x201D; (<xref ref-type="bibr" rid="r4">Alexandrov, 2017b</xref>). The algorithm is capable of connecting books across languages. Translations are discussed further in the next section.</p>
<p>The graphics in this section were created using NodeXL (<xref ref-type="bibr" rid="r36">Smith et al., 2010</xref>). The data set and the algorithm in the R language are available at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17026/dans-xbm-qr5e">https://doi.org/10.17026/dans-xbm-qr5e</ext-link>.</p>
</sec>
<sec id="s4c">
<title>4.3. Finding Translations</title>
<p>The connection between the two translated books in the set of FWF funded titles is not a coincidence. Within the data set, at least 15 &#x201C;translated couples&#x201D; could be found. This might seem counterintuitive: the algorithm is based on finding exact trigrams, and one would expect translations to use different words to describe the same concepts. However, the analysis of several sets of translated books that share nine or more trigrams shows they often share English language terms, such as &#x201C;adaptive cruise control&#x201D; (<xref ref-type="bibr" rid="r21">Maurer et al., 2015</xref>, <xref ref-type="bibr" rid="r22">2016</xref>); &#x201C;labour force survey&#x201D; (<xref ref-type="bibr" rid="r12">Holtslag et al., 2012</xref>, <xref ref-type="bibr" rid="r13">2013</xref>) or &#x201C;deep packet inspection&#x201D; (<xref ref-type="bibr" rid="r40">Sprenger, 2015a</xref>, <xref ref-type="bibr" rid="r41">2015b</xref>). The shared terms are not restricted to English, however: one example is &#x201C;graf leo thun&#x201D; (<xref ref-type="bibr" rid="r1">Aichner &#x0026; Mazohl, 2017a</xref>, <xref ref-type="bibr" rid="r2">2017b</xref>). Additionally, web addresses function as language-agnostic identifiers. See for instance &#x201C;<ext-link ext-link-type="uri" xlink:href="http://www.siebenbuerger.de">http://www.siebenbuerger.de</ext-link> zeitung&#x201D; (<xref ref-type="bibr" rid="r10">Hermanik, 2016a</xref>, <xref ref-type="bibr" rid="r11">2016b</xref>) or &#x201C;<ext-link ext-link-type="uri" xlink:href="http://www.minfin.bg">http://www.minfin.bg</ext-link> bg&#x201D; (<xref ref-type="bibr" rid="r3">Alexandrov, 2017a</xref>, <xref ref-type="bibr" rid="r4">2017b</xref>).</p>
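<p>Why exact matching can still connect translations may be easier to see in a small sketch. The Python snippet below is an illustration under simplifying assumptions, not the published R implementation: the two sentences are invented, the tokenisation is a plain split into lower-cased words without any stop word handling, and the function name <monospace>word_trigrams</monospace> is hypothetical. An untranslated technical term yields the same trigram in both languages.</p>
<preformat><![CDATA[
```python
import re


def word_trigrams(text):
    """Return the set of word trigrams in text (lower-cased, naive tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    return {" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


# Invented example sentences; the technical term is left untranslated.
german = "Systeme wie Adaptive Cruise Control regeln den Abstand"
english = "Systems such as adaptive cruise control keep the distance"

shared = word_trigrams(german).intersection(word_trigrams(english))
print(shared)
```
]]></preformat>
<p>The only trigram the two sentences have in common is the English term itself; names and web addresses that stay the same in both editions would match in the same way.</p>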
</sec>
<sec id="s4d">
<title>4.4. Use Cases</title>
<p>The previous sections described some of the possible applications of the trigram algorithm, based on a single book or groups of titles. What are possible use cases for the stakeholders involved&#x003F; The first use case is based on the connections surrounding a single title. As discussed in the introduction, this can be used to create a recommender system. For each title, the recommender system might display titles ranked first to third. The selection could also be refined by the number of titles: in the example of section 4.1, the number of third ranked titles linked to the book is 217, which is possibly too many for a single recommendation.</p>
<p>Creating benchmarks for publishers would be another use case. Here, the goal is to compare the usage data of a publication with that of a set of comparable titles. By selecting all connected titles and collecting their usage data, it is possible to establish the average usage for titles comparable to this particular publication. This average can be used as a benchmark. Again, the number of titles to include can be varied by selecting only higher ranked titles.</p>
<p>Libraries might be interested in creating a collection of connected titles. Using metadata such as keywords or classification creates a core set of titles, which can be expanded by selecting connected titles. Once more, the differences in ranking help to determine the extensiveness of the collection. A similar approach could also be used by researchers looking for related titles to be used for citation or usage analysis.</p>
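<p>As a rough illustration of the benchmark idea, the Python sketch below averages the usage of the titles connected to a publication. All values are invented, the function name <monospace>usage_benchmark</monospace> is hypothetical, and the number of shared trigrams is used here as a stand-in for the ranking of connected titles, which is an assumption made for this example.</p>
<preformat><![CDATA[
```python
from statistics import mean

# Hypothetical connected titles: (title, shared trigrams, usage count).
connected = [
    ("title_1", 12, 400),
    ("title_2", 5, 250),
    ("title_3", 3, 100),
]


def usage_benchmark(connected_titles, min_shared=1):
    """Average usage of the connected titles that share at least
    min_shared trigrams; None if no title qualifies."""
    usages = [usage for _, shared, usage in connected_titles
              if shared >= min_shared]
    return mean(usages) if usages else None


benchmark_all = usage_benchmark(connected)
benchmark_close = usage_benchmark(connected, min_shared=5)
print(benchmark_all, benchmark_close)
```
]]></preformat>
<p>Raising the threshold restricts the benchmark to the most closely connected titles, which changes the average against which the publication is compared.</p>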
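<p>The expansion of a core set could look roughly as follows. This Python sketch is a simplified illustration and not part of the published algorithm: the titles, keywords and connection counts are invented, the function name <monospace>expand_collection</monospace> is hypothetical, and a threshold on the number of shared trigrams is assumed to play the role of the ranking.</p>
<preformat><![CDATA[
```python
# Hypothetical metadata: title -> keywords assigned in the catalogue.
keywords_by_title = {
    "title_a": {"linguistics"},
    "title_b": {"linguistics", "typology"},
    "title_c": {"history"},
    "title_d": {"sociology"},
}

# Hypothetical connections: (title, title, number of shared trigrams).
connections = [
    ("title_a", "title_c", 9),
    ("title_b", "title_d", 2),
    ("title_c", "title_d", 12),
]


def expand_collection(keyword, keywords_by_title, connections, min_shared=3):
    """Select a core set by keyword, then add every title directly
    connected to a core title by at least min_shared trigrams."""
    core = {title for title, keywords in keywords_by_title.items()
            if keyword in keywords}
    expanded = set(core)
    for first, second, shared in connections:
        if shared >= min_shared:
            if first in core:
                expanded.add(second)
            if second in core:
                expanded.add(first)
    return core, expanded


core, expanded = expand_collection("linguistics", keywords_by_title, connections)
print(sorted(core), sorted(expanded))
```
]]></preformat>
<p>Only titles directly connected to the core set are added in this sketch; lowering the threshold makes the resulting collection more extensive.</p>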
</sec>
</sec>
<sec id="s5">
<title>5. Conclusion</title>
<p>Recommender systems based on personal data are successful but are not a viable option for those who want to protect the privacy of their users. Deploying an n-gram-based algorithm is a good alternative for open access books, as it relies on the contents of the publications rather than on user data. The algorithm quantifies the connections between the titles, which makes it easy to select a level of connectivity. The results can be used in several scenarios: recommendations for a single title or creating collections based on several conditions.</p>
<p>The use of trigrams and the algorithm to find related titles does not have to be confined to the OAPEN Library. The same method can be applied to other collections of open access books or even open access journal articles. By combining the trigrams and searching for matching titles, the algorithm helps to find relevant titles across multiple collections, enhancing its effectiveness.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgments</title>
<p>The author would like to thank the colleagues of the OAPEN Foundation and Professor Martin Paul Eve of Birkbeck, University of London for commenting on previous versions of this paper. The data of this paper is publicly available through the support of Data Archiving and Networked Services (DANS).</p>
</ack>
<ref-list>
<title>References</title>
<ref id="r1"><mixed-citation>Aichner, C., &#x0026; Mazohl, B. (2017a). <italic>Die Thun-Hohenstein&#x2019;sche Universit&#x00E4;tsreformen 1849&#x2013;1860: Konzeption &#x2013; Umsetzung &#x2013; Nachwirkungen</italic>. B&#x00F6;hlau. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/31673">https://library.oapen.org/handle/20.500.12657/31673</ext-link></mixed-citation></ref>
<ref id="r2"><mixed-citation>Aichner, C., &#x0026; Mazohl, B. (2017b). <italic>The Thun-Hohenstein University reforms 1849&#x2013;1860: Conception &#x2013; Implementation &#x2013; Aftermath</italic>. B&#x00F6;hlau. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/31171">https://library.oapen.org/handle/20.500.12657/31171</ext-link></mixed-citation></ref>
<ref id="r3"><mixed-citation>Alexandrov, A. (2017a). <italic>Revolution and transition: Cultural policy in Bulgaria, 1989&#x2013;2012</italic>. LIT Verlag GmbH &#x0026; Co. KG. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/31422">https://library.oapen.org/handle/20.500.12657/31422</ext-link></mixed-citation></ref>
<ref id="r4"><mixed-citation>Alexandrov, A. (2017b). <italic>Wende und &#x00DC;bergang: Die Kulturpolitik Bulgariens, 1989&#x2013;2012</italic>. LIT Verlag GmbH &#x0026; Co. KG. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/31423">https://library.oapen.org/handle/20.500.12657/31423</ext-link></mixed-citation></ref>
<ref id="r5"><mixed-citation>American Library Association. (2014). <italic>Privacy: An interpretation of the Library Bill of Rights</italic>. <ext-link ext-link-type="uri" xlink:href="http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy">http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy</ext-link></mixed-citation></ref>
<ref id="r6"><mixed-citation>Corrado, E. M. (2007). Privacy and Library 2.0: How do they conflict&#x003F; <italic>Sailing into the Future: Charting our destiny: Proceedings of the Thirteenth National Conference of the Association of College and Research Libraries</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/national/baltimore/papers/330.pdf">https://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/national/baltimore/papers/330.pdf</ext-link></mixed-citation></ref>
<ref id="r7"><mixed-citation>Eve, M. P. (2019). Reading genre computationally. In M. P. Eve (Ed.), <italic>Close Reading with Computers: Textual scholarship, computational formalism, and David Mitchell&#x2019;s Cloud Atlas</italic> (pp. 61&#x2013;95). Stanford University Press. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.21627/9781503609372">https://doi.org/10.21627/9781503609372</ext-link></mixed-citation></ref>
<ref id="r8"><mixed-citation>Gummert, M., Hung, N. V., Chivenge, P., &#x0026; Douthwaite, B. (2020). <italic>Sustainable rice straw management</italic>. Springer Nature. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-030-32373-8">https://doi.org/10.1007/978-3-030-32373-8</ext-link></mixed-citation></ref>
<ref id="r9"><mixed-citation>Hayes, P., &#x0026; Yi, K. (2015). <italic>Complexity, Security and Civil Society in East Asia: Foreign Policies and the Korean Peninsula</italic>. Open Book Publishers. <ext-link ext-link-type="uri" xlink:href="https://www.openbookpublishers.com/product/326">https://www.openbookpublishers.com/product/326</ext-link></mixed-citation></ref>
<ref id="r10"><mixed-citation>Hermanik, K.-J. (2016a). <italic>Deutsche und Ungarn im s&#x00FC;d&#x00F6;stlichen Europa: Identit&#x00E4;ts- und Ethnomanagement</italic>. B&#x00F6;hlau. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/31956">https://library.oapen.org/handle/20.500.12657/31956</ext-link></mixed-citation></ref>
<ref id="r11"><mixed-citation>Hermanik, K.-J. (2016b). <italic>Germans and Hungarians in Southeast Europe: Identity management and ethnomanagement</italic>. B&#x00F6;hlau. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/29393">https://library.oapen.org/handle/20.500.12657/29393</ext-link></mixed-citation></ref>
<ref id="r12"><mixed-citation>Holtslag, J. W., Kremer, M., &#x0026; Schrijvers, E. (2012). <italic>In betere banen</italic>. WRR. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.26530/OAPEN_440005">https://doi.org/10.26530/OAPEN_440005</ext-link></mixed-citation></ref>
<ref id="r13"><mixed-citation>Holtslag, J. W., Kremer, M., &#x0026; Schrijvers, E. (2013). <italic>Making migration work: The future of labour migration in the European Union</italic>. Amsterdam University Press. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/33887">https://library.oapen.org/handle/20.500.12657/33887</ext-link></mixed-citation></ref>
<ref id="r14"><mixed-citation>International Federation of Library Associations and Institutions. (2016). <italic>IFLA Code of ethics for librarians and other information workers (full version)</italic>. <ext-link ext-link-type="uri" xlink:href="http://www.ifla.org/publications/node/11092#privacy">http://www.ifla.org/publications/node/11092#privacy</ext-link></mixed-citation></ref>
<ref id="r15"><mixed-citation>Jaeger, P. T., McClure, C. R., Bertot, J. C., &#x0026; Snead, J. T. (2004). The USA PATRIOT act, the foreign intelligence surveillance act, and information policy research in libraries: Issues, impacts, and questions for libraries and researchers. <italic>The Library Quarterly</italic>, <italic>74</italic>(2), 99&#x2013;121. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1086/382843">https://doi.org/10.1086/382843</ext-link></mixed-citation></ref>
<ref id="r16"><mixed-citation>Jeckmans, A. J. P., Beye, M., Erkin, Z., Hartel, P., Lagendijk, R. L., &#x0026; Tang, Q. (2013). Privacy in recommender systems. In N. Ramzan., R. van Zwol., J.-S. Lee., K. Clover., &#x0026; X.-S. Hua (Eds.), <italic>Social media retrieval</italic> (pp. 263&#x2013;281). Springer. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-1-4471-4555-4_12">https://doi.org/10.1007/978-1-4471-4555-4_12</ext-link></mixed-citation></ref>
<ref id="r17"><mixed-citation>Ke&#x0161;elj, V., Peng, F., Cercone, N., &#x0026; Thomas, C. (2003). N-gram-based author profiles for authorship attribution. In <italic>Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING</italic>, <italic>3</italic> (pp. 255&#x2013;264). <ext-link ext-link-type="uri" xlink:href="https://web.cs.dal.ca/&#x223C;vlado/papers/pacling03.pdf">https://web.cs.dal.ca/&#x223C;vlado/papers/pacling03.pdf</ext-link></mixed-citation></ref>
<ref id="r18"><mixed-citation>Laska, S. (2020). <italic>Louisiana&#x2019;s response to extreme weather: A coastal state&#x2019;s adaptation challenges and successes</italic>. Springer Nature. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-030-27205-0">https://doi.org/10.1007/978-3-030-27205-0</ext-link></mixed-citation></ref>
<ref id="r19"><mixed-citation>Linden, G., Smith, B., &#x0026; York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. <italic>IEEE Internet Computing</italic>, <italic>7</italic>(1), 76&#x2013;80. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/MIC.2003.1167344">https://doi.org/10.1109/MIC.2003.1167344</ext-link></mixed-citation></ref>
<ref id="r20"><mixed-citation>Maceli, M. G. (2018). Encouraging patron adoption of privacy-protection technologies: Challenges for public libraries. <italic>IFLA Journal</italic>, <italic>44</italic>(3), 195&#x2013;202. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1177/0340035218773786">https://doi.org/10.1177/0340035218773786</ext-link></mixed-citation></ref>
<ref id="r21"><mixed-citation>Maurer, M., Gerdes, J. C., Lenz, B., &#x0026; Winner, H. (2015). <italic>Autonomes Fahren: Technische, rechtliche und gesellschaftliche Aspekte</italic>. Springer Nature. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-662-45854-9">https://doi.org/10.1007/978-3-662-45854-9</ext-link></mixed-citation></ref>
<ref id="r22"><mixed-citation>Maurer, M., Gerdes, J. C., Lenz, B., &#x0026; Winner, H. (2016). <italic>Autonomous driving: Technical, legal and social aspects</italic>. Springer Nature. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-662-48847-8">https://doi.org/10.1007/978-3-662-48847-8</ext-link></mixed-citation></ref>
<ref id="r23"><mixed-citation>Mechler, R., Bouwer, L. M., Schinko, T., Surminski, S., &#x0026; Linnerooth-Bayer, J. (2019). <italic>Loss and damage from climate change: Concepts, methods and policy options</italic>. Springer Nature. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/23027">https://library.oapen.org/handle/20.500.12657/23027</ext-link></mixed-citation></ref>
<ref id="r24"><mixed-citation>Miao, Y., Ke&#x0161;elj, V., &#x0026; Milios, E. (2005). Document clustering using character N-grams: A comparative evaluation with term-based and word-based clustering. In <italic>Proceedings of the 14th ACM International Conference on Information and Knowledge Management</italic> (pp. 357&#x2013;358). <ext-link ext-link-type="uri" xlink:href="http://www.ezcodesample.com/SemanticSearchArt/downloads/CS-2005-23.pdf">http://www.ezcodesample.com/SemanticSearchArt/downloads/CS-2005-23.pdf</ext-link></mixed-citation></ref>
<ref id="r25"><mixed-citation>Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Google Books Team, Pickett, J. P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M. A., &#x0026; Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. <italic>Science, 331</italic>(6014), 176&#x2013;182. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1126/science.1199644">https://doi.org/10.1126/science.1199644</ext-link></mixed-citation></ref>
<ref id="r26"><mixed-citation>Pazzani, M., &#x0026; Billsus, D. (2007). Content-based recommendation systems. In P. Brusilovsky., A. Kobsa., &#x0026; W. Nejdl (Eds.), <italic>The Adaptive Web</italic> (pp. 325&#x2013;341). Springer. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-3-540-72079-9_10">https://doi.org/10.1007/978-3-540-72079-9_10</ext-link></mixed-citation></ref>
<ref id="r27"><mixed-citation>Project MUSE. (2021, June 21). <italic>Project MUSE introduces AI-based links, powered by UNSILO, for related content</italic>. <ext-link ext-link-type="uri" xlink:href="https://about.muse.jhu.edu/news/unsilo-ai-based-links-on-muse/">https://about.muse.jhu.edu/news/unsilo-ai-based-links-on-muse/</ext-link></mixed-citation></ref>
<ref id="r28"><mixed-citation>R Core Team. (2020). <italic>The R project for statistical computing.</italic> The R Foundation. <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org/">https://www.R-project.org/</ext-link></mixed-citation></ref>
<ref id="r29"><mixed-citation>Ricci, F., Rokach, L., Shapira, B., &#x0026; Kantor, P. B. (Eds.). (2011). <italic>Recommender systems handbook</italic>. Springer. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-0-387-85820-3">https://doi.org/10.1007/978-0-387-85820-3</ext-link></mixed-citation></ref>
<ref id="r30"><mixed-citation>Rieck, K. (2019). The FWF&#x2019;s Open Access Policy over the last 15 Years &#x2013; Developments and Outlook. <italic>Mitteilungen Der Vereinigung &#x00D6;sterreichischer Bibliothekarinnen Und Bibliothekare</italic>, <italic>72</italic>(2). <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.31263/voebm.v72i2.2837">https://doi.org/10.31263/voebm.v72i2.2837</ext-link></mixed-citation></ref>
<ref id="r31"><mixed-citation>Rohini, U., &#x0026; Ambati, V. (2007). Extracting Keyphrases from books using language modeling approaches. In <italic>Proceedings of the 3rd International Conference on Universal Digital Library</italic>. ulib.isri.cmu.edu/conference/2007/Rohini.pdf</mixed-citation></ref>
<ref id="r32"><mixed-citation>Schafer, J. B., Konstan, J., &#x0026; Riedl, J. (1999). Recommender systems in e-commerce. In <italic>Proceedings of the 1st ACM conference on electronic commerce</italic> (pp. 158&#x2013;166). ACM. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/336992.337035">https://doi.org/10.1145/336992.337035</ext-link></mixed-citation></ref>
<ref id="r33"><mixed-citation>Silge, J., &#x0026; Robinson, D. (2016). Tidytext: Text mining and analysis using tidy data principles in R. <italic>JOSS</italic>, <italic>1</italic>(3), 37. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.21105/joss.00037">https://doi.org/10.21105/joss.00037</ext-link></mixed-citation></ref>
<ref id="r34"><mixed-citation>Silge, J., &#x0026; Robinson, D. (2017). <italic>Text Mining with R: A tidy approach</italic> (1<sup>st</sup> ed.). O&#x2019;Reilly Media. <ext-link ext-link-type="uri" xlink:href="https://www.tidytextmining.com/">https://www.tidytextmining.com/</ext-link></mixed-citation></ref>
<ref id="r35"><mixed-citation>Smith, B., &#x0026; Linden, G. (2017). Two decades of recommender systems at Amazon.com. <italic>IEEE Internet Computing</italic>, <italic>21</italic>(3), 12&#x2013;18. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/MIC.2017.72">https://doi.org/10.1109/MIC.2017.72</ext-link></mixed-citation></ref>
<ref id="r36"><mixed-citation>Smith, M., Ceni, A., Milic-Frayling, N., Shneiderman, B., Mendes Rodrigues, E., Leskovec, J., &#x0026; Dunne, C. (2010). <italic>NodeXL: a free and open network overview, discovery and exploration add-in for Excel</italic>. Social Media Research Foundation. <ext-link ext-link-type="uri" xlink:href="http://www.smrfoundation.org">http://www.smrfoundation.org</ext-link></mixed-citation></ref>
<ref id="r37"><mixed-citation>Snijder, R. (2019). Patterns of information&#x2014;Clustering books and readers in open access libraries. In <italic>The deliverance of open access books: Examining usage and dissemination</italic> (pp. 83&#x2013;103). Amsterdam University Press. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.26530/OAPEN_1004809">https://doi.org/10.26530/OAPEN_1004809</ext-link></mixed-citation></ref>
<ref id="r38"><mixed-citation>Snijder, R. (2021). <italic>OK Computer, what are these books about&#x003F; &#x2013; An experiment in large-scale classification of open access books</italic>. Manuscript submitted for publication.</mixed-citation></ref>
<ref id="r39"><mixed-citation>Souza, R. R., &#x0026; Raghavan, K. S. (2014). <italic>Extraction of keywords from texts: An exploratory study using noun phrases</italic>. <ext-link ext-link-type="uri" xlink:href="https://hdl.handle.net/10438/28306">https://hdl.handle.net/10438/28306</ext-link></mixed-citation></ref>
<ref id="r40"><mixed-citation>Sprenger, F. (2015a). <italic>Politik der Mikroentscheidungen: Edward Snowden, Netzneutralit&#x00E4;t und die Architekturen des Internets</italic>. meson press. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/37577">https://library.oapen.org/handle/20.500.12657/37577</ext-link></mixed-citation></ref>
<ref id="r41"><mixed-citation>Sprenger, F. (2015b). <italic>The Politics of Micro-Decisions: Edward Snowden, net neutrality, and the architectures of the Internet</italic>. meson press. <ext-link ext-link-type="uri" xlink:href="https://library.oapen.org/handle/20.500.12657/37575">https://library.oapen.org/handle/20.500.12657/37575</ext-link></mixed-citation></ref>
<ref id="r42"><mixed-citation>Witt, S. (2017). The evolution of privacy within the American Library Association, 1906&#x2013;2002. <italic>Library Trends</italic>, <italic>65</italic>(4), 639&#x2013;657. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1353/lib.2017.0022">https://doi.org/10.1353/lib.2017.0022</ext-link></mixed-citation></ref>
</ref-list>
<fn-group>
<title>Notes</title>
<fn id="fn1"><p>Recently, Project MUSE announced a recommender system based on artificial intelligence (<xref ref-type="bibr" rid="r27">Project MUSE, 2021</xref>).</p></fn>
<fn id="fn2"><p>OAPEN Foundation. <italic>OAPEN Library</italic>. <ext-link ext-link-type="uri" xlink:href="https://www.oapen.org">https://www.oapen.org</ext-link>.</p></fn>
<fn id="fn3"><p>See <ext-link ext-link-type="uri" xlink:href="https://scholar.google.com/scholar?hl&#x003D;en&#x0026;as_sdt&#x003D;0,5&#x0026;q&#x003D;%22Google+books+ngram%22">https://scholar.google.com/scholar?hl&#x003D;en&#x0026;as_sdt&#x003D;0,5&#x0026;q&#x003D;%22Google+books+ngram%22</ext-link>.</p></fn>
</fn-group>
<app-group>
<app id="app1">
<title>Appendix</title>
<table-wrap id="tb003" position="float" orientation="portrait">
<caption><p>Trigrams of the book &#x201C;Complexity, Security and Civil Society in East Asia&#x201D;.</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top">Trigram</th>
<th align="left" valign="top">Occurrences</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">civil society organizations</td>
<td align="left" valign="top">78</td>
</tr><tr>
<td align="left" valign="top">rok foreign policy</td>
<td align="left" valign="top">57</td>
</tr><tr>
<td align="left" valign="top">world economic forum</td>
<td align="left" valign="top">43</td>
</tr><tr>
<td align="left" valign="top">civil society networks</td>
<td align="left" valign="top">35</td>
</tr><tr>
<td align="left" valign="top">greenhouse gas emissions</td>
<td align="left" valign="top">29</td>
</tr><tr>
<td align="left" valign="top">berkeley nautilus institute</td>
<td align="left" valign="top">24</td>
</tr><tr>
<td align="left" valign="top">east asia institute</td>
<td align="left" valign="top">23</td>
</tr><tr>
<td align="left" valign="top">jeju peace forum</td>
<td align="left" valign="top">23</td>
</tr><tr>
<td align="left" valign="top">north korean nuclear</td>
<td align="left" valign="top">21</td>
</tr><tr>
<td align="left" valign="top">climate change adaptation</td>
<td align="left" valign="top">20</td>
</tr><tr>
<td align="left" valign="top">doi http dx.doi</td>
<td align="left" valign="top">20</td>
</tr><tr>
<td align="left" valign="top">green economy policies</td>
<td align="left" valign="top">19</td>
</tr><tr>
<td align="left" valign="top">climate change mitigation</td>
<td align="left" valign="top">17</td>
</tr><tr>
<td align="left" valign="top">energy security policies</td>
<td align="left" valign="top">17</td>
</tr><tr>
<td align="left" valign="top">seoul nautilus institute</td>
<td align="left" valign="top">17</td>
</tr><tr>
<td align="left" valign="top">united nations development</td>
<td align="left" valign="top">17</td>
</tr><tr>
<td align="left" valign="top">nautilus institute 2010</td>
<td align="left" valign="top">16</td>
</tr><tr>
<td align="left" valign="top">2011 doi http</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">geneva world economic</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">napsnet special report</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">nuclear fuel cycle</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">nuclear power plants</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">security seoul nautilus</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">york united nations</td>
<td align="left" valign="top">15</td>
</tr><tr>
<td align="left" valign="top">energy security seoul</td>
<td align="left" valign="top">14</td>
</tr><tr>
<td align="left" valign="top">energy supply security</td>
<td align="left" valign="top">14</td>
</tr><tr>
<td align="left" valign="top">sea level rise</td>
<td align="left" valign="top">14</td>
</tr><tr>
<td align="left" valign="top">yonhap news agency</td>
<td align="left" valign="top">14</td>
</tr><tr>
<td align="left" valign="top">2008 doi http</td>
<td align="left" valign="top">13</td>
</tr>
<tr>
<td align="left" valign="top">east asia green</td>
<td align="left" valign="top">13</td>
</tr>
</tbody>
</table>
</table-wrap>
</app>
</app-group>
</back>
</article>