Keystone DH Conference highlights.

Having recently returned from the Keystone Digital Humanities Conference organized by UPenn and hosted in the beautiful Kislak Center for Rare Books and Special Collections, I thought I would put together a list of a few projects that intrigued me for their innovative approach or problem-solving capacity. The conference provided an opportunity for many scholars, librarians and developers to share their work, and while I met many wonderful and interesting people, the following projects may be particularly relevant to the work at the Digital Scholarship Unit at UTSC and those interested in the digital humanities more broadly. Not all of the following were necessarily presented at the conference; some were cited within talks, for example. However, all are fascinating work happening in this field!

Tools:

Collections or thematic projects:

Initiatives:

  • Hamilton College’s Digital Humanities Initiative – well thought-out, documented and organized approach to generating digital scholarship at a liberal arts college. Amazing.
  • Penn and Michigan State have partnered to build an open-source, open-access, collaborative, peer-reviewed journal platform for the Public Philosophy Journal. Once built, the platform can be adapted to other disciplines and fields, and it supports a whole lifecycle of review, collaboration, sharing, etc. I particularly appreciated the inclusion of a “collegiality index” as a factor in the review process!
  • DiRT Directory of DH tools and resources

Miriam Kemper’s Keynote address can be found here.

I also went on a couple of tours around UPenn Libraries and Temple Libraries and took some pictures.
1. UPenn Information Commons and Media lab & UPenn Education Commons
2. Temple University’s Digital Scholarship Center

Access 2013 notes: Rachel Frick’s Closing Keynote.

Rachel Frick, the Director of the DLF Forum and a community builder, gave the closing keynote at Access 2013 entitled “Community, understanding, courage, and honesty”. Her diverse skills, experiences and interests were clearly reflected in this uplifting talk, which underscored a few recurring themes in the conference program and also allowed for a deeper reflection on our professional role.

Rachel began by pointing out what an exciting time it is to be in the library community, since so much opportunity and potential to reinvent the profession currently exists. In fact, according to Rachel, we are on the threshold of a new digital age of which we are not even aware. Using Johnny Cash as an icon of staying true to one’s values, she also brought up the notion of the Flipped Library. In other words, whether it is linked data that brings our local knowledge outward or the new ways we think about collection development in the digital age, it is certainly not business as usual in libraries today. While many trends and movements, such as Open Access, DPLA, RDA, Linked Data, MOOCs or Alt-Metrics, are currently unfolding in the library world, it is crucial for us to remember why we are in this line of work and, ultimately, to remember the people we are serving. On that note, David Lankes’ reminder that the mission of librarians is to improve society by facilitating knowledge creation in their communities was particularly fitting.

All of these exciting opportunities are not without challenges, and Rachel outlined three: openness, leadership, and courage. For example, while the discourse of openness in library service is easy to uphold, actually demonstrating it through action is often more difficult. In this way, how we digitize, describe, display, market, integrate, collaborate, and preserve our collections demonstrates our commitment to our values and mission. Traditionally, libraries held a social contract to keep cultural heritage safe, thereby acting as stewards of history. Now, however, we send our knowledge out into the wild, and must therefore think about the impact that has on a variety of potential users. Rachel cited the Rijksstudio in the Netherlands as an example of one organization walking the “openness” talk by making 125,000 high-quality masterpieces publicly available to users on the web and tracking their creative, and often unexpected, use of this material. Will Noel’s Archimedes Palimpsest project, too, was one of the first to anticipate the need for machine-readable data to encourage accessibility and engagement with cultural digital content. Initiatives like these prompt us to stop and question our philosophies, values, and practices. They remind us to continue to align our actions with our larger mission.

Another key area in the changing landscape of library service is leadership – the intersection of creative ideas and people that makes things happen. Frick sees the library technology community, such as folks who come out to Access, as individuals with the aptitude to combine ideas with action. They understand the necessary questions and have the ability to respond, thereby further pushing on our leaders. Perhaps it is because they adopt what Bess Sadler calls the Hacker Epistemology, a pragmatic, problem-solving mindset, where the truth is what works and small acts of disobedience can accomplish big things.

Finally, courage is a necessity in the often-uncertain digital age. While this may be the time for the creatives, many barriers lie in the way of action. For instance, how do we get over the “big frickin’ wall” (Kathy Sierra) that seems to come between our big ideas and the current way of doing things? Rachel’s advice is:

1. get out of our backyards

2. play with others

3. connect strategic thinking to operational practice

Courage means not waiting for an invitation, but showing up to open meetings. Initiatives such as the LODLAM and DPLA summits happen because people who attend them bring their best work and want to be part of the solution. Courage means staying true to a greater social purpose. Historypin, a project that inspires passion and serves a larger community, is a perfect example of digital work guided by a larger social purpose. Courage means overcoming the loneliness of being a passionate person doing something new by finding allies, having faith in one’s colleagues, expecting more from each other, and acting now.

How do we dare greatly? By asking ourselves what’s worth doing even if we fail.

You.

More.

Now.

These were the final words of advice from Rachel, and they succinctly capture the spirit of Access 2013 as we address the challenges ahead. What a great way to wrap up a rockin’ good time in St. John’s! Many thanks to everyone involved for putting this conference together.

Access 2013 notes: highlights.

  • North Carolina State University recently built a new Hunt Library on campus. In order to promote it, the library staff asked its users (students, faculty, and staff of NCSU) to share their interaction with the space through Instagram. The #MyHuntLibrary project demonstrates one way to market library spaces and to interact with users through a social media platform.
  • Finding material in a large library space, especially one that relies on the Library of Congress Classification system, can be intimidating. Ryerson University Libraries created a simple but robust user-oriented app called BookFinder. By labelling every book stack and shelf with a number, recording where each stack starts and ends, and syncing item availability information, the app tells students the exact physical location of their material.
  • A couple of fellas from Montana State University wanted to “push the boundaries of the codex as a container for information” and, in a sense, break apart the book as we think of it. Essentially envisioning the book as a networked API of content, they wanted to experiment with the book “as a medium to be hyperlinked, marked up, styled, and analyzed as a full participant in the web of data”. Here is an example of a cookbook as reinvented by the students at MSU.
  • Riding (literally!) the recent trend of both mobile services and maker/hacker spaces, my friend Kim Martin from Western University discussed the challenges and lessons of buying, equipping and driving a DH Maker Bus around London, ON. Community creation spaces, serving rural communities, establishing partnerships, outreach services, peer training and fostering transliteracies were all goals of this project, which continues to draw supporters locally and across North America.
  • Finally, Jon Voss presented the intersection of digital tools, collective memory and social engagement through his project, Historypin, which allows users to overlay historical photographs on current street views and to annotate and discuss the content shared through this platform. Community outreach, programming and citizen participation are a huge part of Historypin, which very much aligns this work with the values of public libraries.

Organizing historical menus: a data curation experiment

During the weeks of April 14 – May 3, I was fortunate to spend my MLIS practicum at MITH under the supervision of Trevor Munoz. My main reason for applying to MITH as a potential practicum site was my background in the digital humanities and library and information studies, as well as my interest in data curation and research data management in the humanities. I also wanted to apply the skills I learned in the “Data Curation for Digital Humanists” course at the Digital Humanities Winter Institute hosted by MITH and the University of Maryland in January of 2013 (taught by Trevor and Dorothea Salo from UW-Madison). I was pleased, then, to discover what a unique project Trevor had chosen for us during my time at MITH.

Our main project was to imagine ways to curate the publicly available data from the New York Public Library’s “What’s on the Menu?” (WOTM) special collection of digitized historical menus. The Library crowdsourced the transcription of the digitized menus via the project website and compiled the completed menu transcriptions into large spreadsheets, which are updated every two weeks as more data continue to be produced. Recently, NYPL Labs added a geo-tagging tool to add geospatial information to the historical menus, where available. An award-winning digital project, NYPL’s “What’s on the Menu?” is a great resource for exploring data curation in practice as it is free, openly accessible, and a subject of interest to humanists.

Our task as data curators, therefore, involved cleaning up, organizing, classifying, describing, structuring, representing and otherwise making more accessible the data provided on the “What’s on the Menu?” site. A user exploring the project site will quickly discover that, while over 26,000 menus have been digitized and displayed online, one can only browse them by the year the menus were produced, the name of the restaurant that produced the menu, and the total number of dishes that appeared on the menu. In other words, there is no thematic or categorical classification of these menus that would support the discovery of the rich historical and cultural information contained in this unique collection. We decided that categorizing and classifying the WOTM data to support richer browsing was a curatorial activity that might add value.

Before we began to categorize and curate this data, however, it was important to take stock of our data at hand. Essentially, we downloaded two spreadsheets from the WOTM site: one “Menus” set containing over 28,000 rows of data (the names and types of menu pages) and one “Dishes” set containing over 400,000 rows of data (the individual names of dishes that appeared on those menus, including their prices). After assessing the data, we wrote a data management plan (in best Digital Humanist practice) for the work we were about to begin in curating the Menus and the Dishes data.
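Taking stock of a spreadsheet like this can be as simple as counting rows and empty cells per column. The following is only a minimal sketch of that idea, not the procedure we actually followed: the column names and the tiny in-memory sample are hypothetical stand-ins for the real WOTM export.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Count rows and empty values per column for an iterable of dict rows."""
    total = 0
    empty = Counter()
    for row in rows:
        total += 1
        for column, value in row.items():
            # Short rows give None; blank cells give "" or whitespace.
            if value is None or (isinstance(value, str) and not value.strip()):
                empty[column] += 1
    return total, dict(empty)


# Hypothetical sample: these headers are assumptions, not the real WOTM columns.
SAMPLE = """id,sponsor,event,date
1,Hotel Astor,DINNER,1900-04-15
2,,LUNCH,
3,Waldorf Astoria,,1901-02-22
"""

total, empty = profile_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(total, empty)
```

For a downloaded file, the same function could be fed `csv.DictReader(open(path, newline="", encoding="utf-8"))`, where `path` points at whatever the spreadsheet was saved as.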

Use Cases

While we could speculate on the potential usefulness of the data in this collection, it’s always better to have actual evidence of what users want. NYPL Labs graciously shared summaries of the requests they have received from people seeking credentials to use the WOTM application programming interface (API). These short descriptions served as evidence of user needs, which would help guide our decision-making process. Historians, social scientists, journalists, literary food scholars, chefs, novelists, teachers and students, as well as general enthusiasts all showed interest in access to this data. In order to make the menus more accessible for these potential users, we had to imagine what types of questions they might ask of this information. This would inform the general working taxonomy, or information architecture, of our data set.

Data Cleaning

First, however, we needed to clean up the data. As you can imagine, crowdsourced data tends to be messy, containing many spelling variations, typos, ambiguities, and missing fields. We used OpenRefine (formerly Google Refine) software to cluster and rename data fields that we considered of particular use or interest to future users of this collection, principally names of the businesses offering these menus and also, where present, the names of the categories (supplied by original cataloguers?) to which the menus had been assigned. While OpenRefine helped with the initial clustering, the large number of name and spelling variations meant some tedious line-by-line editing. This is also the data curator’s job.
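To give a sense of what clustering does, here is a rough sketch loosely modelled on the key-collision “fingerprint” idea that OpenRefine documents: spellings that reduce to the same normalized key are grouped as likely variants. This is my own simplified approximation, not OpenRefine’s code, and the business names are made-up examples.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a key that ignores case, punctuation, accents and word order."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")  # drop accents
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))  # dedupe and sort the words
    return " ".join(tokens)


def cluster(values):
    """Group spellings by fingerprint; keep only groups with 2+ variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


# Hypothetical name variants of the kind crowdsourced data tends to contain.
names = ["Hotel Astor", "HOTEL ASTOR.", "Astor Hotel",
         "Café Martin", "Cafe Martin", "Delmonico's"]
for key, variants in cluster(names).items():
    print(key, "->", variants)
```

A key-collision pass like this only catches variants that differ in case, punctuation, accents or word order. Genuine typos need a fuzzier method or, as we found, line-by-line editing.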

Categorization

After a general data cleanup, I was excited to launch into the categorizing! While working on my taxonomy of menus and dishes, I learned the difference between taxonomies, thesauri and ontologies, and built up a nice working list of controlled vocabularies to consult for various subjects. Overall, when creating the taxonomy of the menus, I focused on what the potential users of this data might seek in this unique collection. As such, I developed three levels of classification with four broad categories by which to group the menus. These included the Hosting Organization that sponsored the menu, the Type of Meal the menu reflected, the Restaurant Address where the meal was held, as well as the Type of Gathering for which the menu was produced. For example, we decided that users might be interested in exploring political and military meals held in the “What’s on the Menu?” collection, as well as browsing the menus created for special occasions, such as George Washington’s Birthday meals. While our taxonomy structure is not the definitive version of this data, Trevor and I nonetheless believe that providing users with the ability to browse all wedding menus, High Society Banquets or meals held for royal individuals, for example, is a value-added service that data curators can provide. In other words, our categories were additional ways to enter, explore and understand this special collection that lead to discovery and learning.
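A three-level taxonomy of this kind can be pictured as a small nested structure. In the sketch below only the four top-level category names come from our work. The second- and third-level terms are illustrative guesses based on the examples mentioned above, not our actual vocabulary.

```python
# Top level: the four broad categories. Lower levels are hypothetical examples.
TAXONOMY = {
    "Hosting Organization": {
        "Government": ["Political", "Military"],
    },
    "Type of Meal": {
        "Daily Meals": ["Breakfast", "Lunch", "Dinner"],
    },
    "Restaurant Address": {
        "United States": ["New York"],
    },
    "Type of Gathering": {
        "Special Occasions": ["George Washington's Birthday", "Weddings"],
        "Society Events": ["High Society Banquets", "Royal Dinners"],
    },
}


def iter_paths(tree, prefix=()):
    """Yield every root-to-leaf path in the nested taxonomy as a tuple."""
    for label, children in tree.items():
        path = prefix + (label,)
        if isinstance(children, dict):
            yield from iter_paths(children, path)
        else:
            for leaf in children:
                yield path + (leaf,)


def find(term, tree=TAXONOMY):
    """Return the full paths of leaf terms containing the search term."""
    needle = term.lower()
    return [" > ".join(path) for path in iter_paths(tree)
            if needle in path[-1].lower()]


print(find("wedding"))
```

Writing the tree down this way makes it easy to check that every term sits exactly three levels deep and to see where a user would have to click to reach it.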

User testing and evaluation were another important step in the curation process, as they allowed us to see whether we were on the right track with our curatorial practice. While we may have been following good metadata standards, referring to controlled vocabularies and linking URIs to our classification terms, none of this work would ultimately have mattered if the users could not navigate the data quickly and easily. I discovered a useful tool called TreeJack, which allows information architects to test their tree structure (the organization of information) by providing anonymous users with several tasks to complete. We also asked our test subjects to answer three brief survey questions about the choice of terms used to create the categories for the menus. Based on the feedback we received, we changed the placement and labeling of certain categories. Ultimately, a final round of user testing would help evaluate this data curation project as a whole.

Prototyping

Finally, in my last week, we decided to develop a kind of proof-of-concept for our data curation work and display it online. Inspired by Aaron Straup Cope’s talk at the Library of Congress about his work on “Parallel Flickr”, Trevor and I tried to imagine our own “Parallel Menus” to experiment with our newly-minted categories for this data. Initially, after inserting the two spreadsheets into a MySQL database hosted on my University of Alberta web server, I wrote some PHP scripts that queried the database and displayed each category on its own static web page. Trevor, however, taught me about Bootstrap, a front-end design framework that allowed us to make our Parallel Menus a little more interactive through JavaScript. Finally, using the API key which Trevor obtained from NYPL, we were able to get thumbnail images of our menus from the “What’s on the Menu?” API service and insert them into our web code to entice the users of the taxonomy into browsing our site.
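The query-per-category idea can be sketched in a few lines. Note that this is not the code we wrote: it swaps in Python and an in-memory SQLite database for the PHP and MySQL we actually used, and the table layout, column names and sample rows are all assumptions made for illustration.

```python
import sqlite3
from html import escape


def build_db(rows):
    """Load (id, sponsor, category) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE menus (id INTEGER PRIMARY KEY, sponsor TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO menus (id, sponsor, category) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def category_page(conn, category):
    """Render one category as a small static HTML fragment."""
    cursor = conn.execute(
        "SELECT sponsor FROM menus WHERE category = ? ORDER BY sponsor",
        (category,),  # parameterized to avoid SQL injection
    )
    items = "".join(f"<li>{escape(sponsor)}</li>" for (sponsor,) in cursor)
    return f"<h1>{escape(category)}</h1><ul>{items}</ul>"


# Hypothetical rows standing in for the cleaned and categorized Menus data.
ROWS = [
    (1, "Hotel Astor", "Weddings"),
    (2, "Cafe Martin", "Weddings"),
    (3, "Army & Navy Club", "Military"),
]

conn = build_db(ROWS)
print(category_page(conn, "Weddings"))
```

Each call produces the markup for one category page, which could then be written to its own file and styled with a framework such as Bootstrap.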

Next Steps

Clearly, my three weeks at MITH went by quickly, and I didn’t have a chance to work on the Dishes side of our WOTM data. Nonetheless, I kept documentation for my work as we went along, and even developed some categories and controlled vocabulary terms for organizing this much larger data set. Trevor will be teaching several Data Curation workshops in the coming months, and I hope that he will be able to expand on my practicum work by getting students to work on the Dishes part of this project. I hope that the “What’s on the Menu?” Data Curation project will eventually be handed over to NYPL to help make their site even more usable to the many visitors they receive each day. I would be honored if, upon handing it back to the New York Public Library, they used any of our suggestions with regard to organizing and classifying this data.

During my practicum, I appreciated participating both in the administrative duties of digital humanists, such as attending project meetings, policy and grant writing, and data audits, and in more creative tasks, such as attending talks at the Library of Congress, exploring new tools, and participating in the MITH Incubator project by helping librarians develop their own research projects. I especially valued the way Trevor combined the practical with the theoretical, getting me to think about the broader implications of the data curation process: its costs, its biases and limitations, its potential, its role in the creation of new knowledge. Working on this project gave me a whole new perspective on research data in the humanities, and I believe it will help me complete my thesis, as well as further my career. This was definitely a transformative experience.

Overall, I am incredibly grateful to have had the opportunity to put my skills and interests in digital humanities data curation to work at MITH under Trevor’s leadership. In addition to meeting the wonderful MITH team, many visiting scholars and the larger DC digital humanities community, I learned many exciting and useful things in my time at UMD: APIs, jQuery, LCSH Linked Data Service, OpenRefine and its RDF plug-in, TreeJack and tree testing, GitHub, Freebase, along with many others. I look forward to visiting MITH again in the future and launching DH curation projects of my own!