In 2008, in an article about managing recorded sound collections, I wrote this:
“In the United States, before the advent of sound recording devices, the earliest documented traditional heritage of indigenous American cultures was transcribed from first person accounts or oral history dialogues (Crawford 2001, 390). The trend of Westerners to study and understand Native American cultures grew throughout the 19th century. With the advent of sound recording equipment in 1877, a logical progression commenced from transcription to sound recording.
Today, after 130 years of the development of recording technology, it is daunting to imagine the sheer volume of recordings of traditional expression held in US libraries, museums, and archives. While it is reflexive to desire open access to these snapshots of the life and diversity of the human species, it is the responsibility of the stewards of these materials to protect the rights and privacy of the creators and owners of the captured expressions as well as their heirs and their affiliated communities.”
I would amend it today to recognize that the individuals and communities documented in these recordings should have at least equal footing in determining their disposition.
But I also want to focus on a point about open access to information. The above is NOT an anti-open-access argument. It is a reminder for archivists and content managers that, with permission, information can be made open; yet, without permission, all information has creators who are not always visible in the information environment. These creators must be considered. They are present, always, in the archive.
I say this because of my awe of the great open access work being carried out at Harvard University’s DASH project. Permissions and deliberateness are central to open access concepts.
While I still work to bring IASA Journal writings into the public arena (currently, only IASA members and subscribers receive access to the Journal and its contents), I will continue my practice of posting my editorial here to the enjangada blog. Below is my editorial for the newest IASA Journal, Issue 44.
The Work of the Archive is Unending
I began my career as an archivist working in the archive of the late American folklorist and collector, Alan Lomax. My connection to the Alan Lomax archive grew out of my experiences as an independent sound engineer for various music groups in Memphis, Tennessee and other locales in the U.S. For over four years, I worked with colleagues to help bring order to the immensity of Lomax’s collections and to complete efforts to digitize the extent of the recordings that Lomax himself had collected over the course of his 60+ active years of documentary work.
Heavily steeped in the issues of sound archiving, I was led from this experience to acquire further education in a graduate program for Museum Studies at the University of Kansas in the United States. I had begun to see the richness that can be found in archives, but I wanted to see the bigger picture more clearly. I wanted to know more about the construction of archives, about the history, and about the theories that underlie the archival process. I began to read far and wide about archives — all types of archives — and I began to learn about the inherent politics of archives: the fact that memory can be created and erased within an archive, the fact that archivists themselves play central roles in the course of human social memory.
One archivist-author in particular spoke to me in his writings. This author had experienced the political complexity of archives firsthand, in a country that underwent a process of rapid and profound political change in the early 1990s. I learned from this archivist that archives can be agents for healing (individual and social) and for justice. Wrongs can be righted and hidden voices can be discovered within the archives. I also learned the value of looking at what archives do on the ground (the inner workings, the people, the collections, the users) while also analysing what it is that archives symbolize in their locations, be it the community, the state, the country, or the world. In short, I learned that there is a multi-dimensional continuum along which archives can be measured.
At the IASA conference last year in Cape Town, South Africa, we all had the chance to hear from this author firsthand. His keynote spoke of ghosts in the archive, but what I heard were the words of a living South African archivist whose writings had been among the foundational building blocks of my archival philosophies. Verne Harris, Director: Research and Archive, at the Nelson Mandela Foundation, opens this issue of the IASA Journal with a text adapted from his keynote at last year’s annual conference. I am honored to include Harris’ work in the IASA Journal. His text encourages archivists to reach out continually to those who are absent from the archive and to be aware that what is absent is always present, reminding us that there is much to be done to ensure that our archives are serving their intended publics. For Harris, and many others, justice is a central archival purpose. Harris’ text concludes by reminding us that the fight for justice is ongoing.
This issue of the IASA Journal addresses another central challenge for archives today: the rush to preserve sound and audiovisual recordings before they are lost to obsolescence and degradation. This is not only a threat to physical objects. If we agree that archives serve as a source for social memory and heritage, then this is a definite threat to our global memory and heritage — and this fight is also ongoing.
Michael Casey, from Indiana University in the United States, is not only a proponent of organized and intentional action on behalf of archives to salvage recorded heritage; he has also led the fight in the U.S. to secure institutional funding for such activity. At IU, Casey and colleagues succeeded in mounting one of the largest institutionally-funded audiovisual preservation efforts to date. Casey’s call to action in this issue delineates the obstacles, articulates the solutions, and provides evidence of archives in the U.S. that are actively engaged in the process of programmatic reformatting.
New Zealand suffered severe earthquakes in 2010. Marie O’Connell and her colleagues at the New Zealand Archive of Film, Television, and Sound were faced with the task of putting the archive back together after the tremors dislocated and shuffled the contents of the archive into complete disarray. O’Connell shares her experiences and offers a reminder that even as obsolescence and degradation threaten audiovisual collections, we must make sure our disaster preparedness plans are up to date and that they take into account the most likely disasters that could occur in the archive’s vicinity.
This issue of the journal includes two articles from the NOA team in Austria. I think they are important articles to include because they propose arguments for two issues that are currently looming in the audiovisual archive world. In the context of broadcast archives, the first article (Kummer, Kuhnle, and Gabler) proposes a balance between production and preservation in terms of digital file management, specifically that of digital video files. With IASA-TC 06 in process, this article provides useful arguments for scenarios when an archive might select an FFV1 codec over uncompressed or JPEG 2000. The authors also argue for the use of AVI as an interim storage wrapper for broadcast archives. The second NOA contribution comes from Sebastian Gabler, who presents a method for managing digital time-based audiovisual assets using metadata at an abstracted level. In Gabler’s view, for access purposes, an archive needs only to work from one set of digital files. Access to content at full duration, in segments, or any other combination or slicing of files can be provided through a combination of well-designed metadata and automated file processing.
Building on the theme of degradation to audiovisual materials, specifically magnetic media, this issue offers two scientific studies of methods to combat Soft-Binder-Syndrome (SBS) in magnetic tape. Enric Giné Giux from the Sonology Department at the Escola Superior de Música de Catalunya in Barcelona, Spain, provides a study on a batch of 500 compact cassette tapes from the collection of late pianist Alicia de Larrocha. Giné Giux’s article illustrates a re-lubrication method that provided successful outcomes for digitization of the audio contained on the cassettes. Federica Bressan, Sergio Canazza, and Roberta Bertani, all from the University of Padova, Italy, provide a careful evaluation of the effects that thermal treatment (i.e., baking) has on magnetic tape stocks. Although the authors do not condemn the process, they provide evidence that there are risks to the process and that there is not a “one-size-fits-all” recipe for thermal treatment.
Reporting on migration efforts in southern Africa, Thandie Puthologo and Ruth M. Abankwah, at the University of Namibia, have written a comparative analysis of the migration process from analogue to digital formats for broadcasting at Botswana Television and at the Namibian Broadcasting Corporation. This article largely evaluates the readiness of viewers to receive digital vs. analogue broadcast signals. Although slightly tangential to the archives trajectory, the paper is important because it reminds us that technological changes in the generation and delivery of audiovisual content have ramifications at all stages of the access cycle, even within the homes of everyday citizens. Analogue obsolescence, in this case, is unavoidable for TV viewers.
Back in the archive, Austin Schultz, from the Oregon State Archives in the U.S., reports from first-hand experience with technological obsolescence: the Sawyers Rols dictation machine, to be precise. Schultz and his colleagues were faced with zero access to over 1,400 Rols audio recordings. Read Schultz’s article to see how they overcame obsolescence to provide access to the first 20 Rols tapes, and what their plans are for the remainder.
Not all audiovisual archives are broadcast oriented, nor are they completely audiovisual oriented. Actually, most archives are of the sort that contain and provide access to a mixed array of content, including manuscripts, photographs, monographs, sound recordings, films, videos, 3-D materials, and any other type of documentary medium one can imagine. In these types of archives, historically, audiovisual content has been pushed to the side, overlooked, or hidden, as Megan McShea suggests in her contribution in this issue. McShea, of the Smithsonian Archives of American Art in the U.S., shares the results of a three-year project carried out at the Smithsonian to investigate methodologies for processing mixed-media collections with more efficiency and with assurance that audiovisual content receives equal attention and coverage in the process. Additionally, as appendices, McShea provides thorough documentation for processing audiovisual material in mixed-media collections that can be useful to archives looking to improve processing times, minimize backlog, and improve access.
Wrapping up this issue of the IASA Journal is an article from a recent winner of IASA’s Research Grant Award, Gustavo Navarro. Navarro provides a report of his IASA-sponsored work to document and preserve recorded histories of the inhabitants of Southern Patagonia in Argentina in collaboration with the Municipal Archives of the province of Santa Cruz. Navarro’s report includes documentation of his work to collaborate with the Municipal Archives as well as an overview of how the project partners decided to provide access to the recordings that were created and preserved during his project.
As Editor, I want to express my thanks to all the contributors to this issue. The journal received enough expressions of interest for this issue that I was forced to request that twelve articles be placed on hold until the next issue (Issue 45). This is a great problem to have and I hope that the IASA community continues to desire to publish work in the journal. As I have said in past editorials, the IASA Journal is a mouthpiece for the audiovisual archives community and all are welcome to contribute. It is here that we can continue to engage in discourse about important contemporary issues, share information about ongoing activities, and philosophize about what the future holds.
Bertram Lyons, CA
IASA Journal, Editor
Currently, I have the honor of editing the Journal of the International Association of Sound and Audiovisual Archives (IASA). It is a real privilege to work with many wonderful authors and archivists from around the globe on a regular basis. Every issue of the journal is a learning experience for me and I deeply appreciate the opportunity to shepherd it along. The only sadness I feel with this responsibility is the fact that the journal is available only to members and via subscription. It is not yet open-access, which means that often I cannot easily share interesting content from the articles with my archives friends and colleagues. I understand the arguments on both sides of this issue and I am pretty hopeful that sometime in the near future there will be other options for wider access to the content of the IASA Journal. In the meantime, I wanted to share elements of the editorial I wrote for the most recent issue of the journal—issue 43, July 2014.
Editorial, IASA Journal, Issue 43 | Bertram Lyons, Editor | July 2014
Recently I have been in conversations with colleagues from small and medium archives in the US where the term “post-digitization” has been in focus. What does it mean to be in a state of post-digitization? In these conversations, post-digitization is the state of being that follows once an archive has met the rush to digitize its holdings, has established the sustainability of its digital assets at least at the bit-level, and has created descriptive access points and mechanisms for access to the content. In this state of being, an audiovisual archive has accomplished the goals that have been set in place over the past decades: assess, describe, digitize, store, preserve, make available. Post-digitization is the activity of asking, What happens next? What are the next steps for an archive as it looks to the future? This is a question that some organizations have the privilege to begin considering now: organizations that began their digitization journey in the early years of this century.
The future is long. If an archive follows best practices towards sustainable digital preservation of its collections, of course one might expect that the archive will commit to a continual future of obsolescence monitoring (ensuring that formats are accessible and migrating to new formats when at risk), fixity monitoring (verifying that unintended changes to the files in the archive are not occurring and taking action when changes are identified), technology refreshing (updating digital storage technologies as older technologies begin to fail or become obsolete), and, of course, continuous ingest of new digital content into the archive in consistent and manageable ways. But digital preservation is only one piece of the puzzle; continuous access into the future is another topic altogether. I have noticed over the past few years a serious gap in our discourse about the future of audiovisual collections: what are the protocols for keeping pace with the rapidly changing technologies that constitute the surface of the internet? What are we doing to maintain relevance through access?
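To make the idea of fixity monitoring concrete, here is a minimal sketch in Python. It assumes that an archive keeps a manifest (a mapping of file paths to checksums) from each audit; the function name `compare_manifests` and the sample values are hypothetical, invented only for illustration, and do not describe any particular repository system.

```python
def compare_manifests(previous, current):
    """Compare two {path: checksum} manifests from successive fixity audits.

    Both arguments are plain dictionaries (a hypothetical structure).
    Returns the paths whose checksum changed, the paths that disappeared,
    and the paths that are new since the previous audit.
    """
    changed = sorted(
        path for path in previous
        if path in current and previous[path] != current[path]
    )
    missing = sorted(path for path in previous if path not in current)
    added = sorted(path for path in current if path not in previous)
    return {"changed": changed, "missing": missing, "added": added}
```

A "changed" or "missing" entry is the signal to take action, for example by restoring the file from another copy; an "added" entry would normally correspond to newly ingested content.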
The Internet, in many ways, has become the intended future location of the Reading Room. We talk about digitization as an opportunity; it is because we see an opportunity for access via the Internet. In some ways, the semantic web is an answer to the question I pose; but it is not the complete answer. Yes, the post-digitization future is about improved interoperability of data online. The persistence of the connections that we create between collections on the internet is of immense importance. An element of post-digitization will be to join in the development of this larger network of connected knowledge. For many people, however, that is an abstract goal because the technologies that support such infrastructure seem to change at a rapid rate. The technologies that constitute the ecology of the internet—markup languages, coding languages, abstracted database layers, operating systems, digital asset management systems, application programming interfaces—advance at such speed that post-digitization archives wonder how they will ever keep up. How will they keep from becoming obsolete themselves? The plight of post-digitization archives is to stay relevant in the information society by ensuring consistent technological upgrades to their new reading rooms in order to continue providing quality access, upgrades which invariably relate to their institutional web presence. My colleague, Ed Summers at the Library of Congress, recently spoke to the National Digital Forum in Wellington, New Zealand and stressed that “if you are not providing meaningful access in the present to digital content, then you are not preserving it.”
Of course it is a luxury to have this problem already in 2015 as we look down the barrel of a loaded gun called technological obsolescence. For the majority of audiovisual archives across the globe the issue is not one of web technologies—the issue is ultimately a race to digitize the international audiovisual record before it becomes impossible or unaffordable to do so. But today and for the next fifteen years the greatest challenge facing sound and audiovisual collections globally is not exactly format obsolescence and degradation. For now, at a broad level, the field has determined, with fairly unanimous agreement, the best methods and strategies to overcome obsolescence. Caring for physical collections is understood and well-documented. Digitization practices are mature and an entire industry now offers both boutique and high-throughput digitization services for the cultural heritage community. Learning from banks and the information-heavy corporations of the world, archives are equipped with the necessary role models for building, staffing, and sustaining digital repositories worthy of carrying our sound and audiovisual heritage into the future, that is, until we meet our next technological shift. Today, as a community, we are not burdened with the ignorance of how we should proceed or what we should do to save our collections. Today, one of our greatest challenges is determining how we are going to afford to do what we know we need to do to save our vanishing recordings and how we can communicate those needs into arguments that compel people—funders, stakeholders, ourselves, and our colleagues—to act. Funding for digitization, funding for building the necessary digital infrastructure to secure the outcomes of digitization (in addition to the born-digital sound and audiovisual heritage being acquired now and into the future), and the arguments to compel action—these are real obstacles.
The field is in need of tools that help quantify the problems we face and that translate the needs of audiovisual preservation into the rhetoric of business analysis. Administrators, executives, and potential funders need economic arguments surrounding audiovisual preservation that demonstrate why the money must be allocated for digitization and digital preservation and why inaction will result in real financial loss. The opening two articles in this issue of the IASA journal address this issue head on. Chris Lacinak, of AVPreserve in the US, introduces a new tool and a new framework to support convincing funders and administrators that the cost of digitization is of real value to the organization. “The Cost of Inaction” is Lacinak’s answer to a common financial phrase, Return on Investment, and Lacinak offers a new free online tool that can be used to generate graphs and statistics to articulate an organization’s potential loss of investment if digitization is not undertaken. Marcos Sueiro Bal, Senior Archivist at New York Public Radio, brings a theoretical framework to this conversation that uses three factors—signal-to-noise ratio, cost of extraction, and time—to offer logical evidence that “delaying signal extraction amounts to a less effective use of resources.” Together, these two opening articles empower us to advance our arguments for funding for digitization and to ensure the preservation of the international audiovisual record.
In relation to my earlier argument about post-digitization, Guy Maréchal, of the non-profit organization TITAN in Belgium, offers an article in this issue that places the concept of semantic technologies in the context of IASA. Maréchal’s text is a call for further awareness of web technologies and for an adoption of their use by audiovisual archives. IASA has an opportunity to set the foundation for audiovisual archives worldwide, and Maréchal suggests that IASA is entering a third phase of its existence—one where we have already completed the activities of identifying methods for digitization, physical care, cataloging, ethical use of collections, managing digital formats, and storage; one where we can now focus on setting guidelines for semantic interoperability of objects, subjects, and their relationships.
Description is not always for open access, though. As we know, we need various forms of descriptive and technical information to manage collections internally on a day-to-day basis. At the University of Illinois at Urbana-Champaign in the US, John Gough and Myung-Ja K. Han recently completed an effort to define a campus-wide protocol of required and optional metadata elements that can be used to consistently document audiovisual holdings of the University for preservation. Gough and Han also offer an overview of useful tools for automated metadata extraction to support efficient generation of metadata for large quantities of audiovisual content.
Although I wrote so strongly above that we have little further to learn about digital preservation, I must admit that I was exaggerating for effect, because we all know that we never quite know everything about anything. Daniel Teruggi and Luca Bagnoli of the Presto4U project remind us that there is still much work to be done in the audiovisual digital preservation domain, especially with regard to recordings created in music production environments. The “Music and Sound Archives Community of Practice” of the Presto4U project is very interested in working closely with music archives directly related to production because the content of these archives “present a general preservation problem due to the complexity of the production environment and the economical implications this may have for their activity.” Teruggi and Bagnoli report that action is being taken to propose solutions to this issue and that we can expect findings to be presented at conferences this year.
From opposite ends of the globe, this issue of the IASA journal brings forward two articles that discuss a post-custodial effort to open the archive to new agents and to new content, focusing on sensitivity and control. At Universiti Putra Malaysia (UPM), the Audiovisual Research Collection for Performing Arts (ARCPA) team (Ahmad Faudzi Musib, Gisa Jänichen, and Chinthaka Prageeth Meddegoda) discusses the experience of building a music archive in an environment where many colleagues had to be convinced of the value of preserving and providing access to sound recordings. At UPM, the ARCPA team brought collectors into the archive and allowed collectors to describe their own collections, removing the control of description from the hands of archivists and librarians. Most importantly, the ARCPA team reminds us that ethical treatment of performers in description is of continued importance, and that it is imperative that archival principles be integrated into undergraduate and graduate studies at tertiary educational institutions on an equal footing with library information, in order to ensure a future appreciation of the importance of archives.
Meanwhile, at the University of Washington, in Seattle, Washington in the US, John Vallier is opening the academic library to born-digital Rock and Roll and grappling with the revolutionary act of being an archivist who is bringing alternative music into the forefront of library and archive collections while simultaneously struggling to comprehend the revolt of the fans who want unfettered, unchained access to the content. See how things turn out in Vallier’s engaging article in this issue.
Wrapping up this issue is an overview by JA Pryse of the Oklahoma Historical Society of contemporary efforts to reach out to the community through the development of a crowd-sourced description program. The Clara Luper Pilot Program proved to be “an inexpensive and effective manner to provide detailed and accurate data for user retrievability.” Pryse’s text offers a potential model for efficient and cost-effective description using community resources.
As is apparent through the texts in this issue, the IASA community is alive with activity, building tools and frameworks to invite greater funding of our collections, standardizing our descriptive practices for internal preservation and access and for external interoperability, continuing to identify at-risk content for digital preservation, opening the doors of the archives to new collections and new users, and strengthening connections between the archive and its communities. The next issue of the journal, issue 44, will likely cover highlights of this year’s annual conference in Cape Town, South Africa. However, I invite you also to think about my proposition that we are nearing a new phase of the audiovisual archive within the next 10-15 years: the post-digitization phase. When your archive has met the goals of stabilization, digitization, and preservation, what will be the next set of goals? In what areas should we begin to focus, and how will we get there? The deadline for submissions is 31 October 2014. Please consider sharing your work or your research with the IASA community. All are welcome to submit proposals to this, your IASA journal.
This month I had the privilege of leading a one-week intensive course on web and social media archiving for the SLIS at the University of Wisconsin – Madison.
What interests me most in this topic is that websites and social media are inherently digital. To archive web and social media requires a basic understanding of digital information writ large (e.g., formats, processes, and information structures). In order to collect content from social media websites and applications it is imperative that archivists understand the information architectures that support their use and development as well as the data systems that store and manage the underlying information. Websites and social media are also inherently about change and about time. There is a temporal element to the preservation of such content. We can look at the changing nature of such content in the way that we might document the changing landscape of a street corner and the surrounding businesses or buildings. Documenting the corner today may yield a different result than it will tomorrow, or next week, or next year. A primary consideration, then, is the frequency of collecting, which requires planning and a clear effort to document the context/the provenance of each capture.
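As one way to picture the planning involved, here is a minimal Python sketch of a capture record and a collecting-frequency check. Everything in it is an assumption made for illustration: the `CaptureRecord` fields, the function names, and the idea of a fixed interval are hypothetical and are not drawn from any specific web archiving tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CaptureRecord:
    """Context (provenance) recorded for a single capture; fields are hypothetical."""
    url: str
    captured_at: datetime  # when the capture was made (timezone-aware)
    tool: str              # name of the crawler or API client used
    operator: str          # person or unit responsible for the capture


def next_capture_due(last, interval):
    """Return the moment the next capture is due, given the chosen frequency."""
    return last.captured_at + interval


def is_capture_due(last, interval, now):
    """True when the collecting interval has elapsed since the last capture."""
    return now >= next_capture_due(last, interval)
```

For example, with a weekly interval (`timedelta(days=7)`), a capture recorded on 2 June would be due again on 9 June. Keeping the tool and operator alongside the timestamp is what lets a later user understand how and why each snapshot came to exist.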
In the class, I made a concerted effort to teach skills (command line scripting, working with web crawlers, quantifying data at the bit and file levels, understanding WARC as a file format, using and understanding APIs, creating checksums by hand, and using packaging tools to create checksums in batch and to perform automated verification) and to combine those skills with coverage of general digital preservation issues, especially those specifically related to the acquisition and preservation of web and social media data. I coupled selected readings with in-person discussions, and students worked alone and in small groups to complete daily projects that introduced technical skills and reinforced intellectual concepts.
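For readers who want a sense of what the checksum exercises involve, here is a minimal sketch using only Python's standard library. It is not the material used in the course and not the implementation of any particular packaging tool; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Create checksums in batch: map each file's relative path to its checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(directory, manifest):
    """Automated verification: list files that are missing or whose checksum differs."""
    root = Path(directory)
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return sorted(failures)
```

Calling `build_manifest` on a directory of captures and saving the result gives a baseline; running `verify_manifest` later against the same directory returns an empty list when nothing has changed.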
Being an archivist, I also thought it was important to establish an ontological framework for this intensive. Since this was based in a university setting, we used the SAA Guidelines for College and University Archives as a reference point for core archival functions and expectations. Such guidelines and institutional frameworks separate the work of archives from that of academic digital humanities, records management, software development, or commerce, all areas where web and social media data are of great interest. Using core archival functions as guideposts, we worked through the process of developing a comprehensive archival approach to the activity of collecting web and social media content.
Thanks to Jefferson Bailey, I was able to peruse an aggregation of over 50 digital preservation syllabi from the past 5 years in preparation for the reading list. I culled the lists down to what I thought were the most important digital preservation resources and the most salient web archiving and social media archiving publications or presentations. I’m including a link here to a download of the entire syllabus, for those who have interest: http://bit.ly/1sp7m0K
Below is a “best of” from each day of readings and activities, if you just want a quick look at the approach I took to the course:
Day 1 | June 2nd, Monday
The story of the Twitter archive at LC | Introduction to the theme
Maureen Pennock. March 2013. Web-Archiving: DPC Technology Watch Report 13-01. http://dx.doi.org/10.7207/twr13-01.
Lyman, Peter. “Archiving the World Wide Web” in Building a National Strategy for Digital Preservation: Issues in Digital Media Archiving. Washington, DC: Council on Library and Information Resources, 2002, pp. 38-51. http://www.clir.org/pubs/reports/pub106/web.html
John, Jeremy Leighton, Ian Rowlands, Peter Williams, and Katrina Dean. “Digital Lives: Personal Digital Archives for the 21st Century >> an Initial Synthesis.” 2010. [Read: pages vi-xviii] http://britishlibrary.typepad.co.uk/files/digital-lives-synthesis02-1.pdf
U.S. National Archives and Records Administration. “Guidance on Managing Records in Web 2.0/Social Media Platforms,” October 20, 2010, http://www.archives.gov/records-mgmt/bulletins/2011/2011-02.html.
Smithsonian Institution Archives Blog. “To preserve or not to preserve: Social Media.” 2012. http://siarchives.si.edu/blog/preserve-or-not-preserve-social-media.
Society of American Archivists. “Archiving Social Media in Senators’ Offices.” 2012. http://www2.archivists.org/sites/all/files/Archiving_social_media_senators_apx2_drft.pdf.
National Digital Stewardship Alliance / Library of Congress. “Keeping Personal Websites, Blogs and Social Media.” 2012. http://www.digitalpreservation.gov/personalarchiving/websites.html.
Farrell, Susan ed. “A guide to web preservation.” 2010. http://jiscpowr.jiscinvolve.org/wp/files/2010/06/Guide-2010-final.pdf
Toyoda, M., Kitsuregawa, M. (2012). “The History of Web Archiving”. Proceedings of the IEEE, 100 (special centennial issue). doi:10.1109/JPROC.2012.2189920. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6182575
Prom, Chris. “Facilitating the Generation of Archives in the Facebook Era.” 2012. http://e-records.chrisprom.com/draft-facilitating-archives-in-facebook-era/.
O’Sullivan, Catherine. “Diaries, On-line Diaries, and the Future Loss to Archives; or, Blogs and the Blogging Bloggers Who Blog Them.” American Archivist 68 (Spring/Summer): 53-73, 2005. http://archivists.metapress.com/content/7k7712167p6035vt/.
JISC-PoWR Team. PoWR: The Preservation of Web Resources Handbook. 2008. [READ SELECTIVELY] http://www.jisc.ac.uk/media/documents/programmes/preservation/powrhandbookv1.pdf.
Hoffman, Starr. “Preserving Access to Government Websites: Development and Practice in the CyberCemetery.” World Library and Information Congress: 74th IFLA General Conference and Council (10-14 August 2008, Québec, Canada). http://www.ifla.org/IV/ifla74/papers/130-Hoffman-en.pdf.
Glenn, Valerie D. (2007) ‘Preserving Government and Political Information: The Web–at–Risk Project’, First Monday, v.12 no.7: http://journals.uic.edu/ojs/index.php/fm/article/view/1917/1799
Digital Preservation Coalition – handbook on Web Archives: http://www.dpconline.org/advice/web-archiving
Europe’s Blog Forever project has an interesting repository design paper: https://zenodo.org/record/7494/#.U1W75uZdXv0
Madhava, Rakesh, “10 things to know about preserving social media”, 2011, ARMA (from the perspective of a Records Manager), accessible at http://content.arma.org/IMM/September-October2011/10thingstoknowaboutpreservingsocialmedia.aspx.
National Archives and Records Administration. “2004 Presidential Term Web Harvest.” 2005. http://www.webharvest.gov.
PADI Web Archiving Sections 1 and 2; dip into section 3: http://www.nla.gov.au/padi/topics/92.html
Day 2 | June 3rd, Tuesday
Behind the scenes | What are we archiving?
Internet Archive. “Wayback Machine Hits 400,000,000,000 web pages.” 2014. http://blog.archive.org/2014/05/09/wayback-machine-hits-400000000000/.
Kahle, Brewster. “Preserving WordPress Blogs.” Video. 2013. http://wordpress.tv/2013/08/26/brewster-kahle-internet-archive-and-preserving-wordpress-blogs/.
Hedstrom, Margaret and Christopher A. Lee. “Significant properties of digital objects: definitions, applications, implications.” In Proceedings of the DLM-Forum 2002, Barcelona, 6-8 May 2002, 218-227. Luxembourg: Office for Official Publications of the European Communities, 2002. http://www.ils.unc.edu/callee/sigprops_dlm2002.pdf
Understanding JSON: http://code.tutsplus.com/tutorials/understanding-json--active-8817.
Perez, Sarah. “This is What a Tweet Looks Like.” 2010. http://readwrite.com/2010/04/19/this_is_what_a_tweet_looks_like#awesm=~oE6AvtuJaYBlvl.
Internet Archive Frequently Asked Questions. First section: “The Wayback Machine.” http://archive.org/about/faqs.php#The_Wayback_Machine.
Baker, Mary, Kimberly Keeton, Sean Martin. “Why Traditional Storage Systems Don’t Help Us Save Stuff Forever.” 2005. http://www.hpl.hp.com/techreports/2005/HPL-2005-120.pdf.
Brown, Adrian. “Selecting File Formats for Long-Term Preservation.” Digital Preservation Guidance Note 1. London: The National Archives, August 2008. http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf
Caroline R. Arms and Carl Fleischhauer. “Sustainability of Digital Formats: Planning for Library of Congress Collections.” [Familiarize yourself with this website.] http://www.digitalpreservation.gov/formats/index.shtml.
Kirschenbaum, Matthew G., Richard Ovenden, and Gabriela Redwine. “Digital Forensics and Born-Digital Content in Cultural Heritage Collections.” Washington, DC: Council on Library and Information Resources, 2010. http://clir.org/pubs/reports/pub149/pub149.pdf
Washington State Government. “Records Management Advice for Blogs, Twitter, and other social media accounts.” 2013. http://www.sos.wa.gov/_assets/archives/RecordsManagement/Blogs-Twitter-and-Managing-Public-Records-Nov-2013.PDF.
InSPECT, “Investigating the Significant Properties of Electronic Content over Time,” 2009, http://www.significantproperties.org.uk/inspect-finalreport.pdf
Timmer, John. “Preserving science: what data do we keep?” http://arstechnica.com/science/2010/11/preserving-science-choosing-what-data-to-discard/.
1) Compare/contrast social media archive exports: Follow Twitter and Facebook instructions to download your personal archive. Evaluate the results. We will all discuss the type of data these exports return, the quantity of files, the file and data types, and the method of access provided by the packages.
2) Use your browser’s Save As feature to archive a complex Web page, such as The New York Times home page. Or choose a URL on the Internet Archive’s Wayback Machine. Compare the file structure of the original and archived version. Operating your computer offline, try to reconstruct the page in its original form, and explain what if any obstacles you encountered. We will discuss together in class.
1) Data analysis: Download three data packages provided to you by the instructor. Answer questions about each dataset.
– DP1 (csv): file count, file sizes, file types, creation dates, record count
– DP2 (xml): file count, file sizes, file types, creation dates, record count
– DP3 (website): file count, file sizes, file types, creation dates, record count
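The questions above (file count, file sizes, file types, dates, record count) can also be answered with a short script. What follows is a minimal sketch in Python, not the instructor's method: the function names `inventory` and `csv_record_count` and the sample file `records.csv` are hypothetical. It assumes that a CSV's first row is a header, and it reports last-modified time because true creation time is not available on every platform.

```python
import csv
import tempfile
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def inventory(package_dir):
    """Summarize a data package: file count, file types, sizes, and timestamps."""
    files = sorted(p for p in Path(package_dir).rglob("*") if p.is_file())
    # Group files by extension; files without one are counted as "(none)".
    types = Counter(p.suffix.lower() or "(none)" for p in files)
    rows = []
    for p in files:
        stat = p.stat()
        # Last-modified time stands in for creation time (an assumption).
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append((str(p.relative_to(package_dir)), stat.st_size, modified.isoformat()))
    return {"file_count": len(files), "types": dict(types), "files": rows}


def csv_record_count(csv_path):
    """Count data rows in a CSV file, assuming the first row is a header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return max(sum(1 for _ in csv.reader(handle)) - 1, 0)


# Small self-contained demonstration on a throwaway package.
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp)
    (package / "records.csv").write_text("id,title\n1,alpha\n2,beta\n", encoding="utf-8")
    summary = inventory(package)
    print(summary["file_count"], summary["types"])
    print(csv_record_count(package / "records.csv"))
```

Pointing `inventory` at an unzipped data package gives a quick overview before opening any individual file. Record counts for XML or website packages need format-specific handling that this sketch does not attempt.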
Day 3 | June 4th, Wednesday
Technology and Tools | How do we do it?
Grotke, Abbie. NDSA National Agenda Digital Content Area: Web and Social Media. 2014. http://blogs.loc.gov/digitalpreservation/2014/01/ndsa-national-agenda-digital-content-area-web-and-social-media/
Grotke, Abbie. NDSA National Agenda Digital Content Area: Web and Social Media, 2014. http://blogs.loc.gov/digitalpreservation/files/2014/01/NDSACWG_WebSocialMedia_Overview_Grotke.pdf.
Internet Archive. Challenges of Collecting and Preserving the Social Web. 2013. http://blogs.loc.gov/digitalpreservation/files/2014/01/NDSA_CWG_120413_Carpenter.pdf.
UK National Archives. Social media archiving policy Press Release. 2014. http://blog.nationalarchives.gov.uk/blog/archiving-social-media/ & http://www.nationalarchives.gov.uk/news/929.htm
Wikipedia. “Web crawler.” http://en.wikipedia.org/wiki/Web_crawler.
International Organization for Standardization. ISO 28500. WARC format specification. http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf.
Milligan, Ian. “WARC Files: A Challenge for Historians and Finding Needles in Haystacks.” 2012. http://ianmilligan.ca/2012/12/12/warc-files-a-challenge-for-historians-and-finding-needles-in-haystacks/.
SAA Web Archiving Roundtable. Guest post by Alex Duryee on Web Archiving, 2013. http://webarchivingrt.wordpress.com/2013/05/07/113/.
SAA Web Archiving Roundtable. Guest post by Nicholas Taylor: Personal Digital (web) Archiving, 2014. http://webarchivingrt.wordpress.com/2014/04/18/personal-digital-web-archiving-guest-post-by-nicholas-taylor/.
Reyes Ayala, Brenda. “Web Archiving @ UNT: Web Archiving Bibliography 2013.” 2013. http://digital.library.unt.edu/ark:/67531/metadc172362/.
Conversations about Archives Working with Writers to Preserve Their Social Media Content. 2013 Archives Next blog. http://www.archivesnext.com/?p=3691.
Archive-it blog: http://blog.archive-it.org/tag/archive-social-media/.
Archive-it help pages (browse to see issues associated with using Archive-it): https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=3113092
1) Using the command line interface (CLI) on your personal computer (mac and unix/linux environments)
Navigation (pwd, cd)
Printing ( >/)
Create directories/files (mkdir)
Read files (vi or cat)
Delete files (rm or rmdir)
2) Compare/contrast web archiving tools/strategies
– wget (http://en.wikipedia.org/wiki/Wget) (manual: http://www.gnu.org/software/wget/manual/). On Mac OS, install it with Homebrew (brew install wget). Examples of commands: http://www.tecmint.com/10-wget-command-examples-in-linux/ and http://www.kossboss.com/linux---wget-full-website
Use Ian Milligan’s three step instructions to create WARC file and analyze it. (http://ianmilligan.ca/2012/12/13/warc-files-part-two-using-warc-tools/)
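Before analyzing a WARC file, it can help to see how a single record is laid out: a version line, a set of named header fields, a blank line, and then a content block whose size is given by Content-Length. The sketch below is a minimal illustration in Python using only the standard library. It is not Milligan's method and not a full ISO 28500 parser. The sample record is hypothetical and built by hand, and real WARC files are usually gzip-compressed and hold many records, so a dedicated tool is the better choice for actual work.

```python
def parse_warc_record(raw: bytes):
    """Split one uncompressed WARC record into version, headers, and content block."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]  # for example "WARC/1.0"
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as URIs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # The content block is exactly Content-Length bytes long.
    length = int(headers["Content-Length"])
    block = rest[:length]
    return version, headers, block


# Hypothetical sample: an HTTP response wrapped in a WARC "response" record.
payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Date: 2014-06-04T12:00:00Z\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Type: application/http; msgtype=response\r\n"
    b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
    b"\r\n" + payload + b"\r\n\r\n"
)

version, headers, block = parse_warc_record(sample)
print(version)
print(headers["WARC-Type"], headers["WARC-Target-URI"])
print(len(block), "bytes in the content block")
```

The point of the sketch is that a WARC record stores the captured HTTP response together with metadata about the capture (when, from what URI), which is what distinguishes it from a page saved with a browser's Save As feature.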
Day 4 | June 5th, Thursday
Making a case for selection | Why do we collect?
National Archives and Records Administration. “White Paper on Best Practices for the Capture of Social Media Records.” 2013. http://www.archives.gov/records-mgmt/resources/socialmediacapture.pdf.
Lynch, Clifford. “Authenticity and Integrity in the Digital Environment: An Exploratory Analysis of the Central Role of Trust.” In Authenticity in a Digital Environment. Washington, DC: Council on Library and Information Resources, 2000. http://www.clir.org/pubs/reports/pub92/lynch.html.
Duranti, Luciana and Kenneth Thibodeau. “The Concept of Record in Interactive, Experiential and Dynamic Environments: the View of InterPARES,” Archival Science 6(1): 13-68, 2006. http://www.interpares.org/ip2/display_file.cfm?doc=ip2_book_appendix_02.pdf.
Hirtle, Peter B. “The History and Current State of Digital Preservation in the United States.” In: Metadata and Digital Collections: A Festschrift in Honor of Thomas P. Turner. Ithaca, NY: Cornell University, 2010, pp. 121-140. http://cip.cornell.edu/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=cul.pub/123860930.
Hirtle, Peter B. “Digital Preservation and Copyright.” 2003. http://fairuse.stanford.edu/2003/11/10/digital_preservation_and_copyr/.
Coyle, Karen. “Rights in the PREMIS Data Model: A Report for the Library of Congress.” Washington, D.C.: Library of Congress, December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
O’Brien, Jeff. “Electronic records: Basic concepts in preservation and access.” 1998. http://scaa.usask.ca/e-paper.html.
Rothenberg, Jeff. “Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation.” Washington, D.C.: CLIR, 1999. http://www.clir.org/PUBS/reports/rothenberg/pub77.pdf.
Thibodeau, K. “Overview of technological approaches to digital preservation and challenges in coming years.” In The state of digital preservation: An international perspective (pp. 4-31). Washington, DC: Council on Library and Information Resources, 2002. http://www.clir.org/pubs/reports/pub107/pub107.pdf.
Beagrie et al. “Digital preservation policies study.” 2008. http://www.jisc.ac.uk/media/documents/programmes/preservation/jiscpolicy_p1finalreport.pdf.
1) Continued use of tools:
– HTTrack (http://www.httrack.com/page/2/ – mostly Windows only); on Mac OS, install it with Homebrew (brew install httrack). Running httrack -h gets you help and examples in the terminal.
– Heritrix (Internet Archive tool) – really only for Linux (difficult to manage on PC or Mac), which is why we also look at:
– web based options: http://archive.today/ (you can download a zip from the output – but it only goes ONE page deep)
– WARCreate – http://warcreate.com/ (Chrome only) – creates a WARC file but provides no way to look at it, and only goes one page deep as far as I can tell
– Web Archiving Integrated Layer – WAIL – http://matkelly.com/wail/
2) Exploring APIs:
– Using the Giphy API: https://github.com/giphy/GiphyAPI
– Using the DPLA API – learning APIs challenge: (Scalar API Explorer: http://scalar.usc.edu/tools/apiexplorer/ )
If you’re feeling very advanced, look at the instructions here to setup a very useful tool for grabbing Twitter feeds:
Social Feed Manager (Dan Chudnov’s software) http://dicarve.blogspot.com/2014/04/an-relatively-easy-way-for-installing.html
Codecademy on Twitter API – http://www.codecademy.com/tracks/twitter
Codecademy on Soundcloud – http://www.codecademy.com/tracks/soundcloud
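Most of the APIs listed above return their results as JSON, so a first step in exploring any of them is learning to read a JSON response. The sketch below is a minimal illustration in Python. The response text and its field names (`count`, `items`, `id`, `title`, `created`) are invented for the example and do not come from the Giphy, DPLA, Twitter, or Soundcloud documentation. A real call would also need a network request and, usually, an API key.

```python
import json

# Hypothetical API response; the structure is an assumption for illustration.
response_text = """
{
  "count": 2,
  "items": [
    {"id": "a1", "title": "First item", "created": "2014-06-05T09:00:00Z"},
    {"id": "b2", "title": "Second item", "created": "2014-06-05T10:30:00Z"}
  ]
}
"""


def summarize(text):
    """Parse a JSON response and return (id, title) pairs for each item."""
    data = json.loads(text)
    return [(item["id"], item["title"]) for item in data["items"]]


for item_id, title in summarize(response_text):
    print(item_id, title)
```

Once a response has been parsed this way, the same questions asked of the data packages apply: how many records came back, which fields each record carries, and which of those fields are worth preserving.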
Day 5 | June 6th, Friday
Sustaining the collection | Ensuring preservation and understanding the costs
Day, M. “The long-term preservation of Web Content.” In J. Masanes (Ed.), Web Archiving. Berlin: Springer, 2006. http://www.ukoln.ac.uk/preservation/publications/2006/webarchiving/md-final-draft.pdf.
Archive-It. “The Web Archiving Lifecycle Model.” 2013. https://archive-it.org/static/files/archiveit_life_cycle_model.pdf.
Richard Wright, Ant Miller, and Matthew Addis. “The Significance of Storage in the ‘Cost of Risk’ of Digital Preservation.” International Journal of Digital Curation 4/3 (2009). http://www.ijdc.net/index.php/ijdc/article/view/138
McGovern, Nancy Y., Anne R. Kenney, Richard Entlich, William R. Kehoe, and Ellie Buckley. “Virtual Remote Control: Building a Preservation Risk Management Toolbox for Web Resources.” D-Lib Magazine 10, no. 4 (2004). http://dlib.org/dlib/april04/mcgovern/04mcgovern.html.
Sheldon, Madeline. “Digital preservation policies analysis.” 2013. http://blogs.loc.gov/digitalpreservation/2013/08/analysis-of-current-digital-preservation-policies-archives-libraries-and-museums/.
Besser, Howard. “Archiving Occupy Movements.” Video. 2013. http://vimeo.com/43603604.
BagIt specification. http://www.digitalpreservation.gov/documents/bagitspec.pdf. [Please read the specification and try to understand how it works.]
The Blue Ribbon Task Force on Sustainable Digital Preservation and Access. “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information.” 2010. http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf Read Executive Summary; Browse the rest of the document. See also the BRTF website: http://brtf.sdsc.edu/about.html
Jantz, Ronald and Michael Giarlo. “Architecture and Technology for Trusted Digital Repositories.” D-Lib Magazine, 2005. http://www.dlib.org/dlib/june05/jantz/06jantz.html.
California Digital Library. “Guidelines for Digital Objects.” http://www.cdlib.org/inside/diglib/guidelines/.
CCSDS Reference Model for an Open Archival Information System (OAIS), pp. 10-90 only < http://public.ccsds.org/publications/archive/650x0b1.pdf >
Ingest and Data Migration challenge:
Using checksums: create a checksum for a file. Change the file. Create another checksum for the changed version of the original file. Compare the two checksums.
On Mac: md5 [filename] (on Linux: md5sum [filename])
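The same checksum exercise can be scripted. The sketch below is a minimal Python version using the standard library's hashlib module. The helper name `md5_of` and the file `sample.txt` are hypothetical. It writes a file, computes its MD5 checksum, changes the file by one character, and computes the checksum again.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the MD5 checksum of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello archive\n")
    before = md5_of(sample)
    # Change a single character and compute the checksum again.
    sample.write_bytes(b"hello archive!\n")
    after = md5_of(sample)

print(before)
print(after)
print("changed" if before != after else "unchanged")
```

Even a one-character change produces a completely different checksum, which is why checksums are used to confirm that a file has not changed during ingest or migration.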
Create a BagIt bag using Bagger or the BagIt library. We will create bags and dissect them in order to understand their structure. (http://sourceforge.net/projects/loc-xferutils/files/loc-bagger/2.1.3/bagger-2.1.3.zip/download)
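To make the structure of a bag concrete before opening Bagger, the sketch below writes a very small bag by hand and then checks it. It is a minimal illustration in Python using only the standard library, not a replacement for Bagger or the BagIt library. The function names `make_bag` and `verify_bag` are hypothetical, the version string 0.97 is an assumption about which draft of the specification is in use, and the sketch covers only the required pieces: a data/ directory for the payload, a bagit.txt declaration, and an MD5 payload manifest.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload):
    """Write a minimal bag. payload maps flat file names to bytes content."""
    bag = Path(bag_dir)
    data = bag / "data"
    data.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data / name).write_bytes(content)
        checksum = hashlib.md5(content).hexdigest()
        # Each manifest line pairs a checksum with a path relative to the bag.
        manifest_lines.append(f"{checksum}  data/{name}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag


def verify_bag(bag_dir):
    """Return True if every file listed in manifest-md5.txt matches its checksum."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        checksum, relative_path = line.split(maxsplit=1)
        actual = hashlib.md5((bag / relative_path).read_bytes()).hexdigest()
        if actual != checksum:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "my_bag", {"letter.txt": b"Dear archive,\n"})
    print(sorted(p.name for p in bag.iterdir()))
    print(verify_bag(bag))
```

Dissecting a bag made by Bagger should show the same three pieces, usually alongside optional tag files such as bag-info.txt.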
Strongly recommended prerequisite readings/activities (at least skim these resources)
Preserving contemporary news applications in the news: http://www.pbs.org/mediashift/2014/04/future-proofing-news-apps/
Read through the Cornell tutorial on Digital Preservation Management: http://dpworkshop.org/dpm-eng/eng_index.html.
Read Ed Summers’ talk here (Web as a Preservation Medium): http://inkdroid.org/journal/2013/11/26/the-web-as-a-preservation-medium/.
Follow the Brooklyn Museum & Flickr takedown conversations… Ed Summers has had much to say: http://inkdroid.org/journal/.
Read basic information about how the internet works: http://www.w3.org/wiki/How_does_the_Internet_work#Introduction.
Read basic information about HTML (wikipedia): http://en.wikipedia.org/wiki/HTML.
Read Dave Raggett’s introduction to HTML: http://www.w3.org/MarkUp/Guide/.
Read basic information about CSS (wikipedia): http://en.wikipedia.org/wiki/CSS.
Read tutorial on CSS (HTML Dog): http://htmldog.com/guides/css/beginner/.
Read basic information about web archiving (wikipedia): http://en.wikipedia.org/wiki/Web_archiving.
ADDITIONAL GENERAL RESOURCES:
SAA Core Archival Functions (http://www2.archivists.org/node/14804)
Digital Curation Centre: http://www.dcc.ac.uk/resource/curation-manual/chapters/
National Digital Information Infrastructure and Preservation program, http://www.digitalpreservation.gov/
Research Libraries Group, RLG DigiNews, http://www.rlg.org/preserv/diginews/
Digital Preservation Europe: http://www.digitalpreservationeurope.eu/
Digital Preservation Coalition: http://www.dpconline.org/
Preserving Access to Digital Information (PADI), http://www.nla.gov.au/padi/
InterPARES (International Research on Permanent Authentic Records in Electronic Systems), http://www.interpares.org
SAA Web Archiving Roundtable website (http://webarchivingrt.wordpress.com/)
International Internet Preservation Consortium website (http://www.netpreserve.org/)
Web Archiving at the Library of Congress (http://www.infotoday.com/cilmag/dec11/Grotke.shtml/)
In a 2008 study, Geoffrey Yeo, semantically exploring archives, defines records as “persistent representations of occurrents” and as “species of representation” recognizing that there is “no claim made that records are in any sense perfect.”  Occurrents for Yeo imply a temporal entity, and it is important for Yeo to state that records are only one type—species—of representation, explicitly reminding us of their imperfection as representations of past occurrents. Ultimately, for Yeo, records reflect past occurrents; and perceptions of records are diverse, multifaceted, and dependent on cultural and temporal experience. These notions complicate the perceived naturalness of archival records. Archives are not only incomplete, but also differ from what we imagine them to be. Heather Beattie explicates this idea recognizing in her critique of archival description that “the custody that a diary passes through on its way to an archives and any other material that accompanies it or indeed that is not passed along form part of its provenance. This is because the record that arrives in the archives is not necessarily the same as the record that was originally created.”
Because of the nature of this paradox in archival thinking—archives are organic, yet incomplete; fixed, yet mutable—theorists and critics imply that the archive is not quite fully organic. They note that archives are by nature incomplete, yet that archives are organic unities with natural relationships to their creator(s). Geoffrey Yeo demands, “Consider the paradox emerging from recent literature, where records are seen as both fixed and mutable, as providing stable evidence of past events, but also constantly evolving over time.”
Critics hint at this paradox in their writings, which usually focus on imperfections of the archival system or the records, but they commonly ignore the possibility that the paradox lies not in an inconsistency or impossibility on the part of archives, but in the very nature of interpretation and absence themselves. Take, for instance, Carolyn Steedman’s suggestion that “nothing starts in the Archive, nothing, ever at all, though things certainly end up there. You find nothing in the Archive but stories caught half way through: the middle of things; discontinuities.” Steedman continues her analysis reflecting on Derrida, Freud, and human memory:
But the problem in using Derrida discussing Freud in order to discuss Archives, is that an Archive is not very much like human memory, and is not at all like the unconscious mind. An Archive may indeed take in stuff, heterogeneous, undifferentiated stuff…texts, documents, data…and order them by the principles of unification and classification. This stuff, reordered, remade, then emerges—some would say like a memory—when someone needs to find it, or just simply needs it, for new and current purposes. But in actual Archives, though the bundles may be mountainous, there isn’t in fact, very much there. The Archive is not potentially made up of everything, as is human memory; and it is not the fathomless and timeless place in which nothing goes away that is the unconscious. The Archive is made from selected and consciously chosen documentation from the past and also from the mad fragmentations that no one intended to preserve and that just ended up there…
The past comes to us in pieces. It will not be done away with. But it is the archivists who tell the present story of the past by assembling the pieces. The actual stories of the past’s past are for the telling of those whose past it was. Furthermore, these records come to us with another crucial paradox. For theorists, the functions and the context(s) of the records are the unique elements that make archives archival. They are the foundation upon which all archival theory is based. Their histories are the histories that archival practice aims to preserve. But what will we make of their disappearance? What will we make of understanding that the contexts and functions of the past are no longer present? How do they tell their past through the archivists’ eyes? In truth the paradox continues. The creators and the functions—the unique archival characters—are absent. They do not arrive with the records. The records hint of them—the records echo their voices. The archivists are left to tell the story.
For these reasons, I argue that archivists in fact are storytellers. Archivists create narratives about why they manage records the way they do and they simultaneously tell stories about the records in their care, arranging and describing them for use by others. None of this is to say that archival theory is incorrect. I hope, instead, to add depth and poetics to the process of performing theory and the practice of writing archives.
Geoffrey Yeo, “Concepts of Record (2): Prototypes and Boundary Objects,” 135, 136.
 Heather Beattie, “Where Narratives Meet: Archival Description, Provenance, and Women’s Diaries,” 94.
 Geoffrey Yeo, “Custodial History, Provenance, and the Description of Personal Records,” 59.
 Carolyn Steedman, Dust, 45.
 Carolyn Steedman, Dust, 68.
In order to explore archival practice, one must engage the substance of archives: what makes a thing archival? Early archival theorists defined archives according to key concepts of creation and preservation, as well as through ideas of legality, historical value, and public obligation. In the United States, canonical modern archival theorist Theodore Roosevelt Schellenberg defined the essential characteristics of archives in relation to the function of the records—the reasons why records came into being—and the element of selection—the reasons why they were preserved.
These characteristics privilege creator over record. If the reasons records are created meet a certain standard, then they are archival, regardless of their content. This allowed Schellenberg, and other archivists, to distinguish archives from manuscripts as separate species of records. According to Schellenberg the difference between archives and historical manuscripts is a systematic versus a haphazard original arrangement. He notes that “modern archives are kept for the use of others than those that created them, and that conscious decisions must be made as to their value for such use.”
According to this distinction, if an archival institution sees records as appropriate for preservation, then they become archival, regardless of their content. It is no mystery, then, why 1960s social and cultural critics implicated modern archival institutions in hegemonic collecting practices.
Trevor Livelton updated these definitions in his 1996 monograph, Archival Theory, Records, and the Public, challenging Schellenberg’s distinction between archives and manuscripts with a preference for simplicity. Although Livelton’s intent is to define the nature of ‘public’ records, i.e. not personal papers or historical manuscripts, he explores the theoretical foundation for all archives before focusing his gaze on public governance. His contention is that whether or not archivists consciously recognize the effect of theoretical principles on their work, in order to function as an archivist one must hold certain preconceptions about the nature of archival practice and archival materials because “archivists both have and use ideas…. They do their everyday work in certain ways because of the ideas they hold about the nature of the material they work with.”
According to Livelton, theoretical ideas about the nature of records, namely that they are organic, structured, and authentic, “dictate the archival methodology by which a particular [record] is examined by the archivist, which in turn determines the resulting scholarly product.”
For Kathleen D. Roe, contemporary archival practitioner and author of Arranging & Describing Archives & Manuscripts, the “archivalness” of a body of records is determined by the extent to which the “primary and most significant defining characteristic is that it is evidence of the activity for or by which it was created, assembled, or collected.”
 T. R. Schellenberg, Modern Archives, 16.
 Trevor Livelton, Archival Theory, Records, and the Public, 65.
 T. R. Schellenberg, Modern Archives, 18.
 T. R. Schellenberg, Modern Archives, 14.
 Trevor Livelton, Archival Theory, Records, and the Public, 27.
 Trevor Livelton, Archival Theory, Records, and the Public, 45.
 Kathleen D. Roe, Arranging & Describing Archives & Manuscripts, 28. Roe provides an overview of practical approaches in the US and Canada to archival arrangement and description with examples and historical background. She writes for practicing archivists responsible for the management of historical manuscripts, personal papers, and government and institutional records.
Carolyn Steedman, in Dust: The Archive and Cultural History, evaluates the nature of writing history, or narrativizing the past. She distinguishes histories from life-writing, archives from histories, and history from the past. According to Steedman, what we say about the past is never actually what happened. Archives only provide a trace of that past we seek; anything we say about that past is new and never what was.
Steedman’s work, and that of her predecessors, including Jacques Derrida and Hayden White, influences my thinking about archives. They inform my perspective that nothing in an archival institution is unmediated.
Archives, as collections of documents from the past, are first and foremost records created by an entity at a time and place. In the simplest scenario these items are passed from the original creator to the archivist and in this case the archivist becomes the second-stage mediator, an interpreter of another’s interpretations.* In more complex situations, the original items flow through many hands, or interpreters, before finding a home in an archival institution.
By the time the researcher arrives on the scene, the silt is deep.
When I read the history of arrangement and description, I am reading for the foundations of the principles that underlie archival arrangement and description, but I am also reading for assumptions about the nature of these concepts, about the purity of archives.
Archives are always already mediated memories.
*See Heather Beattie’s work on archival description, provenance and women’s diaries, “Where Narratives Meet: Archival Description, Provenance, and Women’s Diaries,” 2008.