Unit I: Introduction to Social Justice in the Digital Humanities
1.5 The SCWAReD Projects: Scholar-Curated Worksets for Analysis, Reuse & Dissemination
This case study is written by Isabella Magni, Ryan Dubnicek, José Eduardo González, Janet Swatscheno, Glen Layne-Worthey, J. Stephen Downie, Maryemma Graham, and John A. Walsh. The page is designed by Anna Villarica.
I. What is SCWAReD?
The HathiTrust Research Center’s project on Scholar-Curated Worksets for Analysis, Reuse & Dissemination, or SCWAReD (pronounced squared), generously supported from 2020 to 2023 by the Mellon Foundation, produced a suite of scholar-curated, targeted worksets of materials from the HathiTrust Digital Library (HTDL). HTRC worksets are user-created collections of HathiTrust volumes that can be treated as data and analyzed using a variety of tools and services. Worksets can be shared and cited, contributing to research reproducibility and durable scholarship. In addition to their intrinsic value as focused digital collections, SCWAReD’s enhanced, scholar-curated worksets also serve as illustrative, reusable research models. They include not only the worksets themselves but also scholarly introductions, derived datasets and related documentation, and research reports, demonstrating the collaborative workset-building, textual analysis, workflow development, and dataset creation activities typically carried out by the HathiTrust Research Center (HTRC).
The special mission of SCWAReD is to highlight and center the work of historically under-resourced and marginalized textual communities. For this purpose, a flagship project and four additional projects were selected; each explores new methods for creating, analyzing, and reusing curated digital library collections and their resulting research data.
II. The SCWAReD Projects
Black Book Interactive Project in HathiTrust
HTRC partnered with SCWAReD co-PI Dr. Maryemma Graham and her team (Jade Harrison, Ashley Simmons, and Brendan Williams-Childs) at the University of Kansas to develop a flagship research model based on the Project on the History of Black Writing (HBW). HBW was founded in 1983 at the University of Mississippi by Dr. Graham and has been hosted since 1998 at the University of Kansas. Its principal activities include the creation of the largest known bibliographic database of African-American literary texts in existence; a robust program of summer institutes for professional development and student engagement; community building; and research publishing. HBW’s more recent project to create and curate digital full-text versions of works documented in the bibliography is subsumed under the Black Book Interactive Project (BBIP). The motivating question for BBIP is this: How well is the largest known bibliography of African-American-authored texts reflected and represented in the largest known academic digital library? With a combination of advanced metadata matching and methods in textual analysis, this SCWAReD flagship project seeks not only to answer this question but also to remediate some of the discovered gaps. It does so by seeking sources for as many missing texts as possible, developing new ways of incorporating as many of these as possible into the HathiTrust Digital Library, devising new ways to derive data from those texts that cannot be ingested into HathiTrust, and creating SCWAReD worksets and research models from all the diverse sources that we are able to gather.
Projects selected for SCWAReD-funded Advanced Collaborative Support
On top of the Black Book Interactive Project, four additional projects were selected through a competitive process to create, document, and analyze the corresponding worksets, in collaboration with HTRC. They are outlined below:
III. Supporting Diversity in Digital Humanities with Worksets
The SCWAReD project helps to answer ongoing calls in the digital humanities to diversify both research and dataset production, partly through its emphasis on collaboration, which Isabel Galina Russell has identified as a fruitful avenue “towards achieving a more inclusive and open” digital humanities (2014, p. 315). In addition to Galina Russell, Risam (2015) and Mahony (2018), among many others, have called for a shift away from North American, English-centric data and study in digital humanities. Assembling and publishing expert-curated worksets of traditionally marginalized items, along with associated scholarly documentation and general sets of derived data, is designed both to inspire new work in non-traditional research areas of DH and to lower the barrier to entry for new scholars to engage in digital humanities research. SCWAReD projects are also designed to serve as a replicable model for meaningful collaboration, especially aimed at including new and non-traditional research and researchers. Prof. Kim Gallon, a SCWAReD project director and prominent voice in Black digital humanities, identifies a new Black digital praxis influenced by the notion of the workset. Key to this praxis are the concepts of community building and participation; recovery and iterative data collection, documentation, and analysis; and viewing the workset as a living product that can support user tags and metadata additions (Gallon, 2021), concepts that are also at the heart of SCWAReD.
Figure 1: Kim Gallon is an Associate Professor of Africana Studies at Brown University.
Filling the gaps
Diverse collaboration can often only take the field so far, as digital humanities research is at the mercy of data quality and availability. Given the legal history of text and data mining (Grimmelmann, 2016) and corresponding digitization policies and initiatives, the search for relevant materials for non-canonical cultural research often ends empty-handed. The SCWAReD project attempts to pilot and examine a workflow for filling gaps in incomplete digital collections, gaps that too often silence voices and perspectives at the periphery. The emphasis on recovering and documenting existing marginalized (or “hidden”) portions of collections implies the discovery and potential remediation of gaps in a digital library collection that hopes to be reasonably comprehensive.
The causes of neglect and marginalization of certain texts and textual communities are many, beginning with historical injustices related to the social, economic, and cultural conditions in which these texts were first produced, distributed, collected, and preserved [1]. What we must recognize, and what our project seeks to address, is that such injustices persist in library collections and continue today in our own library practice. The HathiTrust Digital Library is only as complete as the collections of its contributing member libraries, in which historical injustices are of course reflected. At the same time, similar historical injustices have impeded the development of infrastructures required to allow under-resourced non-member libraries (many of which have collections that could indeed fill important gaps) to join HathiTrust and contribute. The SCWAReD project cannot, of course, even begin to address the deepest of these injustices, nor can it remove the barriers to participation in HathiTrust. But what it can do is identify some of the intellectual consequences of those injustices (“collection gaps,” for want of a better term), and adopt innovative strategies for outreach and engagement in the recovery and inclusion of missing materials through available means [2]. Below we explore in more detail one of the five SCWAReD workset projects, José González’s Worksets for Spanish American Fiction.
IV. Case Study: Period-Specific Worksets for Spanish American Fiction [3]
Rationale for the project
While the majority of volumes currently included in the HathiTrust Digital Library (HTDL) are written in English (51.1%), Spanish accounts for only 6.5% of the 17.6+ million current total volumes [4]. Period-Specific Worksets for Spanish American Fiction (directed by Prof. José Eduardo González) aims to create worksets of relevant items available in HTDL, both in and out of copyright, for scholars interested in Spanish-language literature, and in particular Spanish American fiction. Working towards multilingual digital humanities communities, projects, methods and practices has been at the center of several conferences and publications in the past decade. Isabel Galina Russell’s pivotal question “who is ‘we’?” (Galina, 2014, p. 307) and her remarks on the general status of digital humanities as “a reflection of the way academia works in general with English as the predominant language and with a few countries having a far larger representation and research output” (Galina, 2014, p. 314) still resonate today, though many efforts have been made since then to promote a more inclusive and multilingual DH [5].
An additional goal of González’s project is to incentivise the use of digital and computational methods, and in particular quantitative methods, for the study of Spanish American literature. Lastly, this project will help HathiTrust to identify and potentially rectify gaps in HTDL collections of Spanish American fiction, contributing to a less English-centric and more inclusive HTDL. Period-Specific Worksets for Spanish American Fiction includes six worksets, each focusing on a different literary period, genre or style. A seventh workset of “Unsorted fiction” was added to these, compiled in part as a comparative set to use for analysis purposes, and in part as a continued effort to compile and share lists of Spanish American novels available to researchers in the HTDL, classified by country of origin.
Building worksets
The six (plus one) scholar-curated lists of volumes assembled by Prof. González were used to perform automatic searches of the HTDL. The search process had three main phases: 1) pre-processing to correct errors and normalize data (title/author lists); 2) searching the published HathiFiles (an up-to-date listing of the entire HTDL holdings which includes bibliographic metadata) for matches for each title/author combination from the list provided; 3) reviewing search results to evaluate matches and identify any potential false positives. Searching was done via fuzzy string (text) matching to allow for potential variations in the spelling of both titles and author names. Around 68.9% of volumes searched were found in the HathiTrust Digital Library, with the following breakdown by genre:
Table 1: Coverage of Latin American genres in HathiTrust Digital Library
Figures 1-3: Numbers representing coverage of Latin American genres in HTDL (Figures 1 and 2).
Figure 4: Items in each Latin American genre fiction workset, by year.
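The three search phases described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes Python's standard-library difflib as the fuzzy matcher, an arbitrary similarity threshold of 0.85, and a handful of invented records (with made-up identifiers) standing in for rows of the HathiFiles. Every function name here is hypothetical.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Phase 1: lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(wanted, records, threshold=0.85):
    """Phase 2: keep records whose title AND author both clear the threshold."""
    matches = []
    for record in records:
        title_score = similarity(wanted["title"], record["title"])
        author_score = similarity(wanted["author"], record["author"])
        if title_score >= threshold and author_score >= threshold:
            matches.append((record, min(title_score, author_score)))
    # Best candidates first, so that manual review (phase 3) starts with them.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Invented records; identifiers and spellings are illustrative only.
records = [
    {"id": "vol-001", "title": "Los de abajo", "author": "Azuela, Mariano"},
    {"id": "vol-002", "title": "Los de abajo.", "author": "Azuela, Marianno"},
    {"id": "vol-003", "title": "Pedro Páramo", "author": "Rulfo, Juan"},
]
wanted = {"title": "Los de Abajo", "author": "Azuela, Mariano"}

for record, score in find_matches(wanted, records):
    print(record["id"], round(score, 3))
```

Requiring both fields to clear the threshold trades some recall for fewer false positives; a lower threshold would surface more spelling variants but leave more candidates for the manual review step.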
Analysis results: The case of the Mexican Revolution
Using the seven worksets described above, Prof. González performed analysis aimed at automatically identifying each of the six genres, using the “Unsorted” list as a comparative set. The initial goal was to apply techniques that had previously been successful at studying and detecting literary genres in English-language worksets to texts written in Spanish and produced in Latin America. This process used a predictive “logistic regression model” trained on examples of each genre and then given unseen samples to classify. Below is a brief description of results related to the Narrative of the Mexican Revolution genre.
Characteristics of the genre: One of the most successful genres in Latin American literature is a specific narrative that originated in Mexico during the revolution against the dictatorship of Porfirio Díaz between 1911 and 1921, thereafter labeled as “novel of the Mexican Revolution.” Though critics have long debated whether these novels constitute a “genre”, there are common characteristics that identify this group: shared topics related to the revolutionary uprising, and a tendency to combine literature, politics and history.
Analysis results: The analysis was performed in two different batches: volumes published between 1930 and 1950 (Figure 5), and volumes published between 1950 and 1960 (Figure 6).
Figures 5-6: Analysis results for volumes published between 1930 and 1950 (Figure 5) and between 1950 and 1960 (Figure 6).
As Figure 5 shows, the logistic regression model had trouble distinguishing this genre from randomly chosen volumes published between 1930 and 1950. Not only are some of the random volumes considered part of the genre, but several of the books labeled as novels of the revolution also received low predicted probabilities, potentially undermining the argument for these novels as a distinct genre. This will come as no surprise, especially for those familiar with the history of the Mexican Revolution genre: as González underlines, the high number of noisy labels in the data originates from the fact that many competing literary publications, not easily distinguishable from those considered narratives of the Mexican Revolution, were published during those same first few decades, speaking perhaps to characteristics of the writing of the era rather than of the genre.
Meanwhile, Figure 6 shows that the Mexican revolutionary novel is more easily distinguishable from other products in the Mexican literary field of the 1950s and 1960s. As González underlines in his Report, “this outcome is not simply the result of very few new novels being published in a dying category, but of the nationalistic discourse of the 1930s running its course”.
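To make the classification step more concrete, here is a minimal sketch of a logistic regression classifier that is trained on labeled examples and then returns a predicted probability for unseen samples. It is not González's pipeline: the relative word-frequency features, the plain gradient-descent training loop, and the short invented Spanish phrases are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import Counter


def featurize(text, vocabulary):
    """Relative frequency of each vocabulary word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, learning_rate=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_probability(text, vocabulary, weights, bias):
    """Predicted probability that the text belongs to the genre (label 1)."""
    x = featurize(text, vocabulary)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented toy phrases; label 1 = genre, label 0 = comparison set.
genre_texts = [
    "los soldados de la revolucion toman el pueblo",
    "el general y la tropa de la revolucion",
    "la tropa y los soldados en la batalla",
]
other_texts = [
    "los vecinos de la ciudad visitan el puerto",
    "el doctor y la familia de la ciudad",
    "la familia y los vecinos en la fiesta",
]
training_texts = genre_texts + other_texts
labels = [1] * len(genre_texts) + [0] * len(other_texts)
vocabulary = sorted({word for text in training_texts for word in text.split()})
weights, bias = train([featurize(t, vocabulary) for t in training_texts], labels)

# Unseen samples: the model returns a probability, not a hard label.
p_genre = predict_probability("el general de la revolucion en la batalla", vocabulary, weights, bias)
p_other = predict_probability("el doctor de la ciudad en la fiesta", vocabulary, weights, bias)
print(round(p_genre, 3), round(p_other, 3))
```

Because the model outputs probabilities, volumes labeled as part of a genre can still receive low scores when their vocabulary overlaps heavily with the comparison set, which is the kind of ambiguity discussed above for the 1930-1950 batch.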
The SCWAReD team at the HathiTrust Research Center and the many scholars with whom we are working to develop these curated worksets continue our work and will be releasing additional worksets, data sets, and reports as they become available in the coming months. We hope these five worksets, focused on making available for research volumes relevant to historically under-resourced and marginalized textual communities, will be the first of many, and we look forward to partnering with other scholars in the future to continue this work.
V. Availability
Access to HathiTrust data through HTRC is informed in part by the concept of non-consumptive research. Non-consumptive research simply refers to research that does not involve traditional human reading (or consumption) of the original text and thus does not violate copyright or licensing restrictions. Examples of non-consumptive research include computer processing of texts, such as the generation of word-frequency lists and lists of named entities (e.g., proper nouns).
Most HathiTrust Research Center tools, datasets, and services are freely available to the worldwide scholarly community. Any user may analyze HathiTrust volumes through the various web-based algorithms available on the HTRC Analytics site. These include our Topic Model Explorer, Named Entity Recognizer, and Token Count and Tag Cloud Creator. These algorithms access the full text directly on our servers; users receive the results, but not the raw full text of the volumes.
Figure 7: The interface of the HTRC Analytics web environment
Anyone can access all the derived datasets included in the GitHub repositories for each SCWAReD workset. These include Extracted Features data for the workset, which consists of volume- and page-level metadata and page-level arrays of all the words that occur on the page, with their corresponding part of speech and occurrence counts. These extracted representations of the workset include all the words (or “tokens”) in all its volumes, but since the words don’t appear in their original order they are also non-consumptive, and thus open to all.
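The idea behind this kind of non-consumptive representation can be sketched in a few lines. The example below is an assumption-laden illustration, not the actual HTRC Extracted Features schema or tooling: it omits part-of-speech tagging, uses a naive whitespace tokenizer, and the dictionary layout and function names are hypothetical.

```python
from collections import Counter


def extract_page_features(page_text):
    """Reduce a page to unordered token counts (a bag of words)."""
    tokens = page_text.lower().split()
    # Sort keys alphabetically so the stored counts do not leak word order.
    return dict(sorted(Counter(tokens).items()))


def volume_word_frequencies(pages):
    """Sum the page-level counts into a volume-level word-frequency list."""
    totals = Counter()
    for page in pages:
        totals.update(page["tokens"])
    return totals


# Two invented pages standing in for a digitized volume.
raw_pages = ["la tropa entra en la ciudad", "la ciudad duerme"]
pages = [
    {"page": number, "tokens": extract_page_features(text)}
    for number, text in enumerate(raw_pages, start=1)
]
frequencies = volume_word_frequencies(pages)
print(pages[0]["tokens"])
print(frequencies.most_common(2))
```

Two pages containing the same words in a different order produce identical features, which is why such data supports counting-based analysis without allowing the original text to be read.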
The HTRC Data Capsules provide a secure computing environment for researcher-driven text analysis on the HathiTrust corpus. In a data capsule, researchers may download arbitrary software and libraries or run their own code to analyze the full-text files directly. Data capsules with access to in-copyright volumes are only available to researchers affiliated with HathiTrust member institutions.
Lastly, HathiTrust makes available the full text of all volumes in its Digital Library deemed to be public domain, or not under copyright. A volume’s status as public domain is based on many factors, including the location of the user wishing to access the data. More information about this program and available data can be found on the HathiTrust datasets webpage.
Footnotes
[1]: On this topic, see, among many others: Kara Bledsoe et al. Leading by Diversifying Collections: A Guide for Academic Library Leadership. Ithaka S+R, 2022. https://doi.org/10.18665/sr.317833; Katherine Bode, “Why you can’t model away bias.” Modern Language Quarterly 81.1 (March 2020); Bonnie Mak, “Archaeology of a Digitization.” Journal of the Association for Information Science & Technology, vol. 65, no. 8, Aug. 2014, pp. 1515–26.; Merrilee Proffitt, “Casting a different net: Diversifying print monograph collecting in research libraries.” Hanging Together: the OCLC Research Blog, March 9, 2023. https://hangingtogether.org/casting-a-different-net-diversifying-print-monograph-collecting-in-research-libraries/; Nanna Bonde Thylstrup, The Politics of Mass Digitization. Cambridge, MA: MIT Press, 2019.
[2]: Examples of previous work on this issue, as it relates to HTRC: a 2019 article by Nicole M. Brown, et al., which reports on the use of statistical topic modeling to examine nearly a million documents in the HT and JSTOR collections, resulting in the recovery of 150 items related to African-American women’s experience that could not have been found through traditional metadata (Nicole M. Brown, Ruby Mendenhall, Michael Black, Mark Van Moer, Karen Flynn, Malaika McKee, Assata Zerai, Ismini Lourentzou & ChengXiang Zhai, “In Search of Zora / When Metadata Isn’t Enough: Rescuing the Experiences of Black Women Through Statistical Modeling.” Journal of Library Metadata 19, no 3-4 (2019): 141-162, https://doi.org/10.1080/19386389.2019.1652967); and David Bainbridge, et al., who, with the HTRC Extracted Features Dataset, uncovered many previously unidentified Māori-language texts, tripling the number of Māori-language texts that had been documented in existing library metadata (David Bainbridge, J. Stephen Downie, Hemi Whaanga. An Open Data Approach to Revealing Indigenous Texts in Large-Scale Digital Repositories: A Case-Study of Locating Pages of Māori Text in the HathiTrust. In Laura Estill, Jennifer Guiliano, editors, 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, Ottawa, Canada, July 20-25, 2020, Conference Abstracts. 2020, https://dh2020.adho.org/wp-content/uploads/2020/07/669_AnOpenDataApproachtoRevealingIndigenousTextsinLargeScaleDigitalRepositoriesACaseStudyofLocatingPagesofMoriTextintheHathiTrust.html). These digital scholarly interventions, among many others, have turned a new lens on the age-old problem of incompleteness and gaps in research collections, even those as seemingly comprehensive as the HathiTrust Digital Library.
[3]: Adapted from José Eduardo González's Introduction to the Spanish American Fiction workset and its related Final Report. Table 1 and Figures 4 and 5 are taken directly from González’s Introduction (table) and Final Report (figures).
[4]: Spanish is fourth in terms of language representation in HathiTrust Digital Library (6.5%), after English (51.1%), German (8.6%) and French (7%). Data accessed on April 18, 2023. For full, updated statistics see: https://www.hathitrust.org/visualizations_languages.
[5]: Many recent publications and studies have dealt with issues of language and DH from a variety of different perspectives. Among them are works by Roopika Risam (Roopika Risam. “Other Worlds, Other DHs: Notes towards a DH Accent.” Digital Scholarship in the Humanities 32 no. 2 (2017): 377–84, http://doi.org/10.1093/llc/fqv063; and Roopika Risam, New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Evanston, Illinois: Northwestern University Press, 2018.), Alan Liu (Alan Liu, “Toward a Diversity Stack: Digital Humanities and Diversity as Technical Problem.” PMLA 135 no. 1 (2020): 130–51, http://doi.org/10.1632/pmla.2020.135.1.130), and Domenico Fiormonte, Sukanta Chaudhuri and Paola Ricaurte (Domenico Fiormonte, Sukanta Chaudhuri, and Paola Ricaurte, eds. Global Debates in the Digital Humanities. University of Minnesota Press, 2022, https://doi.org/10.5749/9781452968919; and Domenico Fiormonte, “Digital Humanities and the Geopolitics of Knowledge.” Digital Studies/Le Champ Numérique 7 1 (2017), http://doi.org/10.16995/dscn.274). Additional, selected, literature on the topic includes: Pedro Nilsson-Fernàndez and Quinn Dombrowski, 'Multilingual Digital Humanities', in O'Sullivan, J. (ed.), The Bloomsbury Handbook to the Digital Humanities, New York: Bloomsbury Publishing (2022): 81-90; Quinn Dombrowski, "Preparing Non-English Texts for Computational Analysis". Modern Languages Open 1 (2020): 1–9, https://doi.org/10.3828/mlo.v0i0.294; Paul Joseph Spence and Renata Brandao, “Towards Language Sensitivity and Diversity in the Digital Humanities”, Digital Studies / Le champ numérique 11 1 (2021), https://doi.org/10.16995/dscn.8098; Alex Gil, “The (Digital) Library of Babel,” 2014, http://elotroalex.com/digital-library-babel/.; Alex Gil and Élika Ortega, “Global Outlooks in Digital Humanities: Multilingual Practices and Minimal Computing,” In Doing Digital Humanities: Practice, Training, Research, edited by Constance Crompton, Richard J. Lane, and Ray Siemens (2016): 22–34. New York, NY: Routledge.; Élika Ortega, “Zonas de Contacto: A Digital Humanities Ecology of Knowledges.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein (2019): 179–87. Minneapolis, Minnesota: University of Minnesota Press, http://doi.org/10.5749/j.ctvg251hk.18; Priani Saisó, Ernesto, Paul Spence, Isabel Galina Russell, Elena González-Blanco García, Daniel Alves, José Francisco Barrón Tovar, Marco Antonio Godínez Bustos, and Maria Clara Paixão de Sousa, “Las Humanidades Digitales En Español y Portugués. Un Estudio de Caso: DíaHD/DiaHD.” Anuario Americanista Europeo 0 12 (2014), https://dialnet.unirioja.es/servlet/articulo?codigo=5071434.; Nick Thieberger, “What Remains to Be Done—Exposing Invisible Collections in the Other 7,000 Languages and Why It Is a DH Enterprise.” Digital Scholarship in the Humanities 32 2 (2017): 423–34, http://doi.org/10.1093/llc/fqw006; Josh Brown, “Whose Language? Whose DH? Towards a taxonomy of definitional elusiveness in the digital humanities,” Digital Scholarship in the Humanities (2022), https://doi.org/10.1093/llc/fqac072.; Aliz Horvath, “Enhancing Language Inclusivity in Digital Humanities: Towards Sensitivity and Multilingualism: Includes Interviews with Erzsébet Tóth-czifra and Cosima Wagner,” Modern Languages Open 0 1 (2021): 26, https://doi.org/10.3828/mlo.v0i0.382.
Author Bio*:
Isabella Magni is Lecturer in Digital Humanities at the University of Sheffield. She previously held postdoctoral fellowships at the HathiTrust Research Center, Rutgers University, and the Newberry Library. Dr. Magni works at the intersection of digital humanities, textual studies, philology, and palaeography. She is editor and co-investigator for a number of digital projects including Petrarchive and Italian Paleography.
Ryan Dubnicek is a Digital Humanities Specialist with HTRC, where he works on external research collaboration, tool and service design and user outreach.
José Eduardo González is Associate Professor of Spanish and Ethnic Studies at the University of Nebraska-Lincoln. His research centers on twentieth and twenty-first century Latin American narrative and Digital Humanities approaches to literary history. He is author of Borges and the Politics of Form and Appropriating Theory: Angel Rama’s Critical Work, and co-editor of Primitivism and Identity in Latin America.
Janet Swatscheno is the Digital Scholarship Librarian and Associate Director for Outreach & Education at the HathiTrust Research Center. Before joining HathiTrust, she held the position of Digital Publishing Librarian and Co-Director of the Digital Humanities Initiative at the University of Illinois Chicago (UIC).
J. Stephen Downie is a Professor and Associate Dean for Research at the School of Information Sciences, University of Illinois. He is also the Illinois Co-director of the HathiTrust Research Center (HTRC). Dr. Downie has been an active member of the music information retrieval, digital humanities, and digital libraries research communities for the past 30 years.
Glen Layne-Worthey is Associate Director for Research Support Services in the HathiTrust Research Center, based in the University of Illinois at Urbana-Champaign School of Information Sciences. Formerly, he was Digital Humanities Librarian at Stanford, 1997-2019, and is currently Chair of the Alliance of Digital Humanities Organizations (ADHO) Executive Board.
Maryemma Graham is University Distinguished Professor in the Department of English at the University of Kansas. She is the author/editor of 12 books on teaching, black literature, and literary history, most notably The Cambridge History of African American Literature with Jerry W. Ward, Jr. Known best as the Founding Director of the History of Black Writing (1983), she has been at the forefront of the transformations in Black literary studies. Today HBW is a leading center for literary recovery and engaged scholarship, known for its early use of interactive technologies. The House Where My Soul Lives: The Life of Margaret Walker, the first complete biography of the poet and novelist, was published in December 2022 by Oxford University Press.
John A. Walsh is an Associate Professor of Information and Library Science at Indiana University and Director of the HathiTrust Research Center. His research applies computational methods to the study of literary and historical documents. Research interests include computational literary studies, textual studies, book history, 19th-century British literature, and comic books.
Designer Bio*:
Anna Villarica is a research assistant on the #dariahTeach project. She is a junior lecturer at Maastricht University currently teaching courses on design thinking, digital transformations, the philosophy of technology, research skills, and museology. She received her MA in Media Studies Digital Cultures from Maastricht University and her BA in Communications and New Media from the National University of Singapore. While she does not specialise in anything (yet), she loves all things digital and is always learning and creating.
*Bios and affiliations are accurate at the time of writing
References
- Bainbridge, David, J. Stephen Downie, Hemi Whaanga. “An Open Data Approach to Revealing Indigenous Texts in Large-Scale Digital Repositories: A Case-Study of Locating Pages of Māori Text in the HathiTrust.” In Laura Estill, Jennifer Guiliano, editors, 15th Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2020, Ottawa, Canada, July 20-25, 2020, Conference Abstracts. 2020, https://dh2020.adho.org/wp-content/uploads/2020/07/669_AnOpenDataApproachtoRevealingIndigenousTextsinLargeScaleDigitalRepositoriesACaseStudyofLocatingPagesofMoriTextintheHathiTrust.html.
- “The Black Book Interactive Project.” Accessed April 28, 2023. http://bbip.ku.edu/.
- Bledsoe, Kara, et al. Leading by Diversifying Collections: A Guide for Academic Library Leadership. Ithaka S+R, 2022. https://doi.org/10.18665/sr.317833.
- Bode, Katherine. “Why you can’t model away bias.” Modern Language Quarterly 81.1 (March 2020).
- Mak, Bonnie. “Archaeology of a Digitization.” Journal of the Association for Information Science & Technology, vol. 65, no. 8, Aug. 2014, pp. 1515–26.
- Brown, Josh, “Whose Language? Whose DH? Towards a taxonomy of definitional elusiveness in the digital humanities,” Digital Scholarship in the Humanities (2022), https://doi.org/10.1093/llc/fqac072.
- Brown, Nicole M., Ruby Mendenhall, Michael Black, Mark Van Moer, Karen Flynn, Malaika McKee, Assata Zerai, Ismini Lourentzou & ChengXiang Zhai, “In Search of Zora / When Metadata Isn’t Enough: Rescuing the Experiences of Black Women Through Statistical Modeling.” Journal of Library Metadata 19, no 3-4 (2019): 141- 162. https://doi.org/10.1080/19386389.2019.1652967.
- Dombrowski, Quinn. "Preparing Non-English Texts for Computational Analysis," Modern Languages Open 1 (2020): 1–9, https://doi.org/10.3828/mlo.v0i0.294.
- Fiormonte, Domenico. “Digital Humanities and the Geopolitics of Knowledge.” Digital Studies/Le Champ Numérique 7 1 (2017). http://doi.org/10.16995/dscn.274.
- Fiormonte, Domenico, Sukanta Chaudhuri, and Paola Ricaurte, eds. Global Debates in the Digital Humanities. University of Minnesota Press, 2022. https://doi.org/10.5749/9781452968919.
- Galina Russell, Isabel. 2014. “Geographical and Linguistic Diversity in the Digital Humanities.” Literary and Linguistic Computing 29 (3): 307–316. http://doi.org/10.1093/llc/fqu005.
- Gallon, Kim. “What Black Bibliography and Collection Building Can Teach Us about the Radical Potential of the Workset,” October 21, 2021, keynote presentation at HathiTrust 2021 Member Meeting, 46:07. https://youtu.be/ajk4d2BN2-4?t=4792.
- Gil, Alex. “The (Digital) Library of Babel,” 2014, http://elotroalex.com/digital-library-babel/.
- Gil, Alex, and Élika Ortega, “Global Outlooks in Digital Humanities: Multilingual Practices and Minimal Computing,” In Doing Digital Humanities: Practice, Training, Research, edited by Constance Crompton, Richard J. Lane, and Ray Siemens, 22–34. New York, NY: Routledge, 2016.
- Grimmelmann, James. “Copyright for Literate Robots.” Iowa Law Review 101 (2016): 657-681. https://ssrn.com/abstract=2606731.
- González, José Eduardo, “Final Report: Period-Specific Worksets for Spanish American Fiction.” HathiTrust Research Center, accessed April 28, 2023, https://htrc.github.io/scwared-spanish-american-fiction/final-report.html.
- González, José Eduardo, “Introduction to Spanish American Fiction Workset.” HathiTrust Research Center, accessed April 28, 2023, https://htrc.github.io/scwared-spanish-american-fiction/introduction.html.
- “The History of Black Writing.” Accessed April 28, 2023. https://hbw.ku.edu/.
- Horvath, Aliz, “Enhancing Language Inclusivity in Digital Humanities: Towards Sensitivity and Multilingualism: Includes Interviews with Erzsébet Tóth-czifra and Cosima Wagner,” Modern Languages Open 0 1, (2021): 26, https://doi.org/10.3828/mlo.v0i0.382.
- Liu, Alan. “Toward a Diversity Stack: Digital Humanities and Diversity as Technical Problem.” PMLA 135 no. 1 (2020): 130–51. http://doi.org/10.1632/pmla.2020.135.1.130.
- Mahony, Simon. “Cultural Diversity and the Digital Humanities,” Fudan Journal of the Humanities and Social Sciences 11, no. 3 (September 1, 2018): 371–88. https://doi.org/10.1007/s40647-018-0216-0.
- Nilsson-Fernàndez, Pedro, and Quinn Dombrowski. 'Multilingual Digital Humanities.' In The Bloomsbury Handbook to the Digital Humanities, edited by J. O'Sullivan, 81-90. New York: Bloomsbury Publishing, 2022.
- Ortega, Élika, “Zonas de Contacto: A Digital Humanities Ecology of Knowledges,” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein, 179–87. Minneapolis, Minnesota: University of Minnesota Press, 2019, http://doi.org/10.5749/j.ctvg251hk.18.
- Priani Saisó, Ernesto, Paul Spence, Isabel Galina Russell, Elena González-Blanco García, Daniel Alves, José Francisco Barrón Tovar, Marco Antonio Godínez Bustos, and Maria Clara Paixão de Sousa, “Las Humanidades Digitales En Español y Portugués. Un Estudio de Caso: DíaHD/DiaHD,” Anuario Americanista Europeo 0 12 (2014), https://dialnet.unirioja.es/servlet/articulo?codigo=5071434.
- Proffitt, Merrilee. “Casting a different net: Diversifying print monograph collecting in research libraries.” Hanging Together: the OCLC Research Blog, March 9, 2023. https://hangingtogether.org/casting-a-different-net-diversifying-print-monograph-collecting-in-research-libraries/.
- Risam, Roopika. “Beyond the Margins: Intersectionality and the Digital Humanities,” Digital Humanities Quarterly 9, no. 2 (September 2, 2015).
- Risam, Roopika. “Other Worlds, Other DHs: Notes towards a DH Accent.” Digital Scholarship in the Humanities 32 no. 2 (2017): 377–84. http://doi.org/10.1093/llc/fqv063.
- Risam, Roopika. New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Evanston, Illinois: Northwestern University Press, 2018.
- Spence, Paul Joseph, and Renata Brandao, “Towards Language Sensitivity and Diversity in the Digital Humanities”, Digital Studies / Le champ numérique 11 1 (2021), https://doi.org/10.16995/dscn.8098.
- Thieberger, Nick, “What Remains to Be Done—Exposing Invisible Collections in the Other 7,000 Languages and Why It Is a DH Enterprise,” Digital Scholarship in the Humanities 32 2 (2017): 423–34, http://doi.org/10.1093/llc/fqw006.
- Thylstrup, Nanna Bonde. The Politics of Mass Digitization. Cambridge, MA: MIT Press, 2019.