Digital Humanities Projects
 — by Robert W. Williams

This page was updated for 1 November 2023. I added Section 9, which contains a workflow using regular expression searches to decipher illegible letters in the words of a digitized Du Bois manuscript.

 The General Purpose of the Retextualizer Project

 Retextualizer is a browser-based application for digital humanities (DH) research that is designed to facilitate new interpretations of a text, specifically by disassembling texts into meaningful components (here sentences), and then reassembling the components into different configurations, whether in reverse order or in random arrangements.

 Retextualizer rearranges the original essay by juxtaposing sentences, perhaps jarringly, that were not initially so positioned; it can thereby provide the conditions for new insights into the text, its ideas, and its themes.

 Each Retextualizer web page repeats the project's general purpose, as well as the instructions, which also can be read below. In addition, each project page will contain further information relevant to that specific essay.

 Retextualizing the Works of W.E.B. Du Bois

 "Souls of Black Folk" [SBFI](The Independent, 1904)

 "The Individual and Social Conscience" [IASC](1905)

 "Address to the Country" [ATTC](1906)

 "The Nature of Intellectual Freedom" [IFRE](1949)

 "Apologia", Suppression of the African Slave-Trade [SSTA](1954)

 "Postscript", The Ordeal of Mansart [PSOM](1957)

 The Project Goals of Retextualizer

Digital Humanities Research:

 Texts can be read in sequence as created and/or presented, publicly or otherwise. With computers, we can digitally interact with such works. In their written forms as typescripts or manuscripts, texts can be digitized and then (re-)analyzed and (re-)interpreted via computer software. Section 6 below lists online resources that cover various dimensions of the digital humanities. Many more resources can be located via Internet searches.

 The digital manipulation of texts includes deformance, as Jerome McGann and Lisa Samuels called it in "Deformance and Interpretation" [New Literary History, 30:1 (1999): 25-56; Accessible online]. McGann and Samuels argued that literary works have unstable meanings, and they discussed several methods to use on literary texts, typically poems, including a reversal of the poem's lines.

 Such techniques of deformation have received support, such as:

Robert W. Williams's Research:

 The current Retextualizer application builds on a previous version which had no copying, viewing, or sentence-numbering features. I initially coded the basic randomizing and display functions in May 1999 as a way to create and present randomized versions of essays by Immanuel Kant and Walter Benjamin.

 Since that first version, digital humanities research has come to influence my scholarship, most notably by means of computer applications, such as concordancers and collation software. Those digital tools help me to understand how Du Bois paired words and phrases within their contexts, and also to illustrate how he re-used and modified text in different works over time. The Retextualizer project continues this avenue of my research.

 I have created several hypertext presentations on issues related to DH. [This subsection was posted for the 1 February 2019 update.]

"The Intertextuality of Du Bois's Idea of Humanity: A Collation Analysis": The African American Studies and Research Center at Purdue University hosted the 30th Symposium on African American Culture and Philosophy on 1-3 December 2016. The symposium theme was "Exploring the 'Humanity' in the Digital Humanities".

"Algorithmic Displacement and the Black Atlantic: Retextualizing the 'Souls' Essay by W.E.B. Du Bois": I presented this at the 2018 African American Digital Humanities Conference, held at the University of Maryland, College Park, on 20 October 2018. I covered the use of the Retextualizer application on the SBFI essay.

 Digital Humanities: Online Texts and Related Resources

[Section 7 was created for the 1 October 2020 update.]

Websites: Blogs, Centers, and DH Projects

[Section 8 was created for the 1 March 2022 update.]

Regular Expressions (Regexes)

[Section 9 was created for the 1 November 2023 update.]

Regex Usage: Interpreting Illegible Letters

I created a multi-part tweet in January 2023 on one way to use regular expressions to determine possible characters that otherwise were indecipherable in words arising from printed or handwritten documents. I present that tweet below in its original parts, but with a few additions and reconfigurations to enhance clarity.

#Regex Technique: Unknown Words
The unpublished works of #WEBDuBois sometimes contain handwritten words that I can't read.
To find potential words that fit the sentence's meaning I use regular expression searches of a word list loaded into a text editor or concordancer.
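
The same kind of search can also be scripted instead of run in a text editor. The following is a minimal sketch in Python, not the workflow used in the thread: the function name `search_word_list` and the short `sample_words` list are hypothetical stand-ins for a real word list, and each entry is tested as a whole word.

```python
import re

def search_word_list(pattern, words, ignore_case=True):
    """Return every entry that the pattern matches as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    # fullmatch requires the entire entry to fit the pattern, which plays
    # the role of the \b word boundaries used in a text-editor search.
    return [word for word in words if compiled.fullmatch(word)]

# Hypothetical sample entries; a real word list would be far longer.
sample_words = ["fetch", "fetich", "fetish", "force", "everything"]

# Five-letter words that start with "f".
five_letter_f_words = search_word_list(r"f[a-z]{4}", sample_words)
print(five_letter_f_words)  # ['fetch', 'force']
```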

Case Study: Handwritten Words
I screen-captured one line within what seems to be a never-finished manuscript in which Du Bois discussed his underlying philosophy.

File: Unpublished Du Bois document archived in the Credo repository (UMASS Library's Special Collections).
Title: "Steps Toward a Science of How Men Act"
Typescript: 4 pages + 3 pages of handwritten notes by Du Bois
ID: mums312-b213-i071

What are the 3rd & 4th handwritten words in the image?
Admittedly, those words in the image might be understandable within the context of the sentence fragment.
For purposes of illustrating the #regex technique, I will try to decipher those words in this tweet thread.

#Regex coded by these indicators:
=Discernible letters (based on similarities w/ letters in known words).
=Number of letters in the word (create range of letters if boundaries are indistinct).
=Contractions need to be expanded for the word list, not for the dictionary.

Third word:
=Starting "e"
="g" or other letters with descenders?
=How many letters: 7/8?


Regex: \be[a-z]{1,3}[gjpqy][a-z]{4,5}\b
Regex briefly annotated (metacharacters manage the search):
\b  =word boundary
e  =literal letter "e"
[a-z]{1,3}  =range of 1-3 letters: "a" through "z"
[gjpqy]  =match any 1 of these letters with descenders: "g", "j", "p", "q", "y"
[a-z]{4,5}  =range of 4-5 letters: "a" through "z"
\b  =word boundary
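
The annotated pieces above can be joined into a single pattern. This is a sketch in Python that assumes the pieces are simply concatenated in the order listed; the `sample_words` list is hypothetical and only illustrates which kinds of words pass.

```python
import re

# Pieces in the order given in the annotation.
parts = [
    r"\b",          # word boundary
    r"e",           # literal letter "e"
    r"[a-z]{1,3}",  # 1-3 unknown letters
    r"[gjpqy]",     # one letter with a descender
    r"[a-z]{4,5}",  # 4-5 unknown letters
    r"\b",          # word boundary
]
third_word_pattern = "".join(parts)
third_word_regex = re.compile(third_word_pattern, re.IGNORECASE)

# Hypothetical sample entries, not the word list used in the thread.
sample_words = ["elephant", "engaging", "everything", "example", "force"]
third_word_matches = [w for w in sample_words if third_word_regex.fullmatch(w)]

print(third_word_pattern)   # \be[a-z]{1,3}[gjpqy][a-z]{4,5}\b
print(third_word_matches)   # ['elephant', 'engaging', 'everything']
```

Written this way, the pattern admits words of 7 to 10 letters, which is one reason a large word list returns so many candidates.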

Regex results of 3rd word: 1000+ matches.

With so many results, disambiguation is needed to weed out the irrelevant matches.

The more letters we know, or can hypothesize about, the fewer the matches.

Hypothesize "v" and "h":

Regex: \bev[a-z]{4,5}h[a-z]{1,2}\b
Regex briefly annotated:
\b  =word boundary
ev  =literal letters "e" and "v"
[a-z]{4,5}  =range of 4-5 letters: "a" through "z"
h  =literal letter "h"
[a-z]{1,2}  =range of 1-2 letters: "a" through "z"
\b  =word boundary

Results: 11 matches, including "everythin"
Possibly no final "g" in the original.
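
A short Python sketch can show why the narrowed pattern accepts "everythin" but not "everything". It assumes the annotated pieces are concatenated as listed, and the three candidate words are chosen only for illustration.

```python
import re

# Narrowed pattern: "ev", 4-5 unknown letters, "h", then 1-2 unknown letters.
narrowed_regex = re.compile(r"\bev[a-z]{4,5}h[a-z]{1,2}\b", re.IGNORECASE)

candidates = ["everythin", "everything", "everywhere"]
results = {word: bool(narrowed_regex.fullmatch(word)) for word in candidates}

for word, matched in results.items():
    print(word, matched)
# everythin True
# everything False   (three letters follow the "h", but only 1-2 are allowed)
# everywhere False   (same reason)
```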

Unknown fourth word:
=Starting "f"
=Final letters: "ch"?
=How many letters: 5/6?


Regex: \bfe[a-z]{1,2}ch\b
Regex briefly annotated:
\b  =word boundary
fe  =literal letters "f" and "e"
[a-z]{1,2}  =range of 1-2 letters: "a" through "z"
ch  =literal letters "c" and "h"
\b  =word boundary

Results: 6 matches, including "fetich"
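
The word boundaries matter most when the pattern is run against running text rather than a one-word-per-line list. The Python sketch below assumes the annotated pieces are concatenated as listed; `sample_text` is a hypothetical sentence written for this illustration.

```python
import re

fourth_word_regex = re.compile(r"\bfe[a-z]{1,2}ch\b", re.IGNORECASE)

# Hypothetical running text, invented for this example.
sample_text = "Force - in everything - Fetich. They fetch the fetish while fetching water."

fourth_word_hits = fourth_word_regex.findall(sample_text)
print(fourth_word_hits)  # ['Fetich', 'fetch']
```

The trailing `\b` keeps "fetching" out of the results, because the "ch" there is followed by more letters rather than by the end of a word.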

Plausible interpretation of the fragment in context:
"Force - in everything - Fetich"

I often need to create different regexes: I can
=Change the range of possible letters to match.
=Change the initial or other specified letters to find more possible words.
I then repeat the searching & disambiguation phases.
This #regex technique may not find plausible candidates.
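
The repeated cycle of adjusting and re-running regexes can be sketched as a small loop. This Python example is an illustration under stated assumptions: `build_pattern`, `try_variants`, and the `sample_words` list are hypothetical, and each variant is a guess at the known leading letters, the known final letters, and a range for the unknown letters in between.

```python
import re

def build_pattern(prefix, suffix, min_unknown, max_unknown):
    """Build a whole-word regex: known prefix, unknown middle, known suffix."""
    middle = f"[a-z]{{{min_unknown},{max_unknown}}}"
    return rf"\b{re.escape(prefix)}{middle}{re.escape(suffix)}\b"

def try_variants(words, variants):
    """Map each generated pattern to the entries it matches."""
    found = {}
    for prefix, suffix, low, high in variants:
        pattern = build_pattern(prefix, suffix, low, high)
        regex = re.compile(pattern, re.IGNORECASE)
        found[pattern] = [w for w in words if regex.fullmatch(w)]
    return found

# Hypothetical sample entries; a real word list would be far longer.
sample_words = ["fetch", "fetich", "finch", "flinch", "french"]

variants = [
    ("fe", "ch", 1, 2),  # confident about "fe", narrow range
    ("f", "ch", 2, 4),   # only the "f" is certain, wider range
]

found = try_variants(sample_words, variants)
for pattern, matches in found.items():
    print(pattern, matches)
```

Loosening a hypothesis, as in the second variant, returns more candidates and therefore calls for more disambiguation afterwards.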

#Regex technique assumptions:
=Standardized spelling
=Era-appropriate word list or dictionary
=Case insensitive [configurable]
=Patterns: same letter is written in a recognizable style
=Discernible letter boundaries that permit a counting of the letters (or a number range).
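
Some of these assumptions can be enforced by preparing the word list before searching. The Python sketch below is one possible approach, not the procedure used in the thread: `normalize_word_list` and the `raw_entries` sample are hypothetical. It lowercases every entry so that `[a-z]` classes apply, and it drops entries containing digits or symbols, including unexpanded contractions.

```python
import re

def normalize_word_list(raw_entries):
    """Lowercase entries, keep purely alphabetic ones, drop duplicates, sort."""
    cleaned = set()
    for entry in raw_entries:
        word = entry.strip().lower()
        # Keep only entries made up entirely of the letters a-z.
        if re.fullmatch(r"[a-z]+", word):
            cleaned.add(word)
    return sorted(cleaned)

# Hypothetical raw entries with mixed case, stray whitespace, and symbols.
raw_entries = ["FETICH", "Fetich", "fetch ", "4th", "can't", ""]

normalized = normalize_word_list(raw_entries)
print(normalized)  # ['fetch', 'fetich']
```

Dropping "can't" is a simplification here; the indicators earlier in the thread call for expanding contractions in the word list rather than discarding them.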

Useful Resources: Words
A. Word list: Alphabetical [No numbers or symbols]

B. Project Gutenberg: Webster's Unabridged Dictionary
[Upper- & lowercase]

Useful Resources: Regular Expressions
A. #Regex tutorials & guides
* https://regular-expressions.info  (Jan Goyvaerts)
* https://www.rexegg.com
* https://ryanstutorials.net/regular-expressions-tutorial/
* https://riptutorial.com/regex
* http://regextutorials.com

B. Regex testing
* https://regex101.com
* https://regexr.com

END of thread