Annotated Bibliography

Before you begin your course project, you must spend some time learning about the general subject area. You don't have to have a specific project in mind just yet, but you should select an area that is of interest to you, and begin to read about it.

Your goal is to produce an annotated bibliography on a medium-size topic. An annotated bibliography is a list of publications accompanied by a short explanation of the value and purpose of each item. A good bibliography should serve as a guidepost for further research. It should identify what sort of research topics have been covered in the past, indicate the relationship between related publications, and suggest ideas for further research.

If you have no idea what you want to study, then begin by reading up on a very broad topic such as:

  • Distributed Filesystems
  • Mobile Computing
  • Peer-to-peer Computing

A bibliography on any of the three topics given above would be far too large! As you proceed with your research, look for themes that run through what you are reading. Begin to narrow your topic down to something more specific, such as:

  • Consistency Management in Distributed Filesystems
  • Finding Resources from a Mobile Computer
  • Ensuring Fairness in Peer-to-peer Systems

Ideally, your bibliography will encompass your course project. Some ideas for bibliography titles are given in the list of project ideas. However, you aren't committed to a particular project at this point.

    Requirements

    Your annotated bibliography must be a collection of references all reasonably related to your chosen title. Each entry must be given a complete citation and must be accompanied by one solid paragraph summarizing the paper and its relevance to your topic area.

    The exact form of the citation is not crucial so long as you are consistent and complete. A citation should give the authors' names, the title of the article, the title of the book/journal/conference, and enough information so that someone else could find it in another library. This means including the publisher, volume and number, page numbers, web address, and other details as appropriate. Each entry should explicitly indicate the type of citation: conference article, journal article, and so forth.
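
    One way to stay consistent is to keep every entry in the same record and print all of them through a single formatter. The sketch below is only an illustration: the Citation class, its field names, and the output layout are assumptions made for this example, not a required format.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # All field names here are hypothetical, chosen for this sketch.
    kind: str      # type of citation, e.g. "Conference Article" or "Book"
    authors: list  # author names, in the order given on the publication
    title: str     # title of the article or book
    venue: str     # book/journal/conference title, or publisher for a book
    year: int
    details: str = ""  # volume, number, pages, web address, and so on


def format_citation(c):
    """Render one entry in a single consistent style, type first."""
    parts = [", ".join(c.authors), f'"{c.title}"', c.venue]
    if c.details:
        parts.append(c.details)
    parts.append(str(c.year))
    return f"({c.kind}) " + ", ".join(parts) + "."


example = Citation(
    kind="Journal Article",
    authors=["M. Stonebraker, et al."],
    title="Mariposa: A Wide-Area Distributed Database System",
    venue="VLDB Journal",
    year=1996,
    details="5:1, pages 48-63",
)
print(format_citation(example))
```

    Because every entry passes through the same function, a missing field or a change of style shows up in one place rather than in twenty hand-typed citations.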

    The descriptive paragraph should give enough information to help you or another reader recall its relevance to the scientific community. Describe what the paper is trying to communicate. Is it proposing a new algorithm or architecture, comparing several existing systems, relating experience with an existing system, or something else entirely? What is the main idea expressed in the abstract? How does it relate to previous work? Does it build upon or discredit previous ideas? Either way, chase down several references within the paper, and add them to your bibliography if warranted.

    All told, your annotated bibliography should have:

  • At least 20 items total.
  • 1 to 5 books, book chapters, monographs, or dissertations.
  • At least 10 refereed journal, conference, or workshop articles not on the course reading list.
  • Fewer than 5 unrefereed items such as:
      • Magazine articles
      • Technical reports
      • Web pages
  • 2 items of any type from each of these decades: 1970s, 1980s, 1990s, 2000s.
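
    The counts above can be checked mechanically before you hand anything in. The following is a minimal sketch, not an official grading script: the category names, the (kind, year, on_reading_list) tuple layout, and the reading of "2 items from each decade" as "at least 2" are all assumptions made for illustration.

```python
# Hypothetical category labels; adjust them to match your own entries.
BOOKLIKE = {"Book", "Book Chapter", "Monograph", "Dissertation"}
REFEREED = {"Journal Article", "Conference Article", "Workshop Article"}
UNREFEREED = {"Magazine Article", "Technical Report", "Web Page"}
DECADES = (1970, 1980, 1990, 2000)


def check_bibliography(entries):
    """Return a list of unmet requirements; an empty list means all pass.

    Each entry is a (kind, year, on_reading_list) tuple.
    """
    problems = []
    if len(entries) < 20:
        problems.append("fewer than 20 items")

    booklike = sum(1 for kind, _, _ in entries if kind in BOOKLIKE)
    if not 1 <= booklike <= 5:
        problems.append("need 1 to 5 books, chapters, monographs, or dissertations")

    # Refereed articles only count if they are not on the course reading list.
    refereed = sum(1 for kind, _, listed in entries
                   if kind in REFEREED and not listed)
    if refereed < 10:
        problems.append("fewer than 10 refereed articles off the reading list")

    unrefereed = sum(1 for kind, _, _ in entries if kind in UNREFEREED)
    if unrefereed >= 5:
        problems.append("5 or more unrefereed items")

    # Assumption: "2 items from each decade" is read as "at least 2".
    for decade in DECADES:
        count = sum(1 for _, year, _ in entries if decade <= year < decade + 10)
        if count < 2:
            problems.append(f"fewer than 2 items from the {decade}s")
    return problems


# A deliberately incomplete sample, to show what gets reported.
sample = [
    ("Technical Report", 1979, False),
    ("Conference Article", 1993, False),
    ("Journal Article", 1996, False),
    ("Book", 1993, False),
    ("Dissertation", 1992, False),
]
for problem in check_bibliography(sample):
    print(problem)
```

    Running a check like this as the bibliography grows shows which decades or publication types still need attention.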

    Publication Types

    Following are the types of publications that you should concentrate on:

    Technical Reports. A technical report is a very preliminary research report that is written internally and then archived at an institution. For example, Notre Dame has its own technical report series. Technical reports are generally written and deposited without being refereed, or sometimes even proofread. However, they are an important vehicle that allows researchers to publicly establish their activities or data without the delay of submitting to a conference or journal. Good technical reports are often revised and submitted to a conference or journal.

    Conference Articles. A conference is usually a yearly gathering of researchers in the same area of specialization. A conference committee solicits papers for the conference perhaps six months in advance. Papers are refereed by the committee, and those with the best reviews are accepted to the conference. The authors attend the conference and give a short lecture on the paper. After the conference, a book is published, usually called "Proceedings of the Conference on XYZ," containing the accepted papers. The best papers in a conference are often invited to be published in a journal.

    Journal Articles. Journals are typically published several times a year. Much like a conference, a journal has a primary editor and a committee of reviewers. Papers may be submitted to a journal at any time, but are generally longer and more polished than those submitted to a conference. If the paper is accepted, the referees may require the paper to be revised before publication. This whole process from submission to publication may take several years. In computer science, a journal paper is considered to be somewhat more valuable than a conference paper. (In other fields, a journal paper is far more important than a conference paper.)

    Books and Book Chapters. Academics often write books once they have gained a large amount of experience in a given field. Sometimes, an academic book will have each chapter written by a different author. Books are solely the work of the author(s) and are generally not peer reviewed. Thus, books can serve as an introduction to or overview of a given field, but are not likely to contain any hard research results.

    Dissertations. A dissertation is the final result of a master's or doctoral degree. In some sense, it is peer-reviewed because it must pass muster with the student's reviewing committee. Dissertations are usually a deposit of everything a student has learned in the last 2-7 years, and thus are long and quite detailed. A good dissertation should point you to other papers of more digestible length written by the same student.

    Hints on Research

    Tread lightly! You do not need to read each paper thoroughly. In fact, you do not have time to read all of the papers in your annotated bibliography! Begin by reading the abstract. If it is not relevant to your topic, toss it out right away. If it is relevant, then read the introduction and conclusions and skim over the middle parts. Summarize the main points, save a copy or a printout, and move on. If there is a detailed algorithm or idea, jot it down and return to it later if you deem it to be important. Of course, you will have to return and read some of these papers carefully at a later time.

    Start in a Known Place. Begin by skimming the papers on the class reading list related to your topic, and then follow the references that seem important. Likewise, skim appropriate sections in the recommended textbooks.

    Be wary. Journal and conference articles vary widely. The vast majority are mediocre, and only a small number are of great value. Distinguishing between the two may be difficult at first -- that's ok! -- but you will gain confidence with this in time. If you are unsure about the value of a paper, there is no harm in mentioning this in the bibliography.

    Search Effectively. Although all of you are familiar with Google, I would not recommend using it for initial paper searches. (It does have one good use, shown below.) Instead, the best places to search are the archives of the professional organizations related to computer science and engineering: ACM, IEEE, and USENIX. Here are their library pages:

  • ACM Digital Library
  • IEEE Computer Society Digital Library
  • USENIX Publications

    Going a little deeper, try looking through the tables of contents of well-known conferences and journals. The following are well-known publications that cover a variety of areas:

  • SOSP - Symposium on Operating Systems Principles
  • OSDI - Operating Systems Design and Implementation
  • USENIX - Annual Conference
  • TOCS - Transactions on Computer Systems

    The following are more specialized and may be appropriate, depending on your choice of bibliography:

  • ASPLOS - ACM Architectural Support for Programming Languages and Operating Systems
  • FAST - USENIX File and Storage Technologies
  • HPDC - IEEE High Performance Distributed Computing
  • MobiCom - ACM Mobile Computing
  • NSDI - USENIX Networked Systems Design and Implementation
  • P2P - IEEE Peer to Peer Computing
  • PODC - ACM Principles of Distributed Computing (theory and algorithms)

    Now, suppose that you come across a reference to an article that is either quite old or otherwise not online. For example, the following paper appeared in the conference HPDC but is not available online at the HPDC website:

    J. B. Weissman, A. S. Grimshaw, "Network Partitioning of Data Parallel Programs", Proceedings of the Third IEEE Symposium on High Performance Distributed Computing.

    Here is where Google comes in. Do a search for the entire title with quotes around it: "Network Partitioning of Data Parallel Programs" and you may find a copy placed online by the authors or other readers. Or, you may find nothing.
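
    If you have many titles to look up, the quoting step can be scripted. This is only a sketch: the helper names are made up for this example, and the search address and its "q" parameter are assumptions about how the search engine accepts queries.

```python
from urllib.parse import urlencode


def exact_title_query(title):
    """Wrap the full title in double quotes so it is searched as one phrase."""
    return f'"{title}"'


def search_url(title, base="https://www.google.com/search"):
    # The base address and the "q" parameter name are assumptions here.
    # urlencode escapes the quotes and spaces so the URL is safe to paste.
    return base + "?" + urlencode({"q": exact_title_query(title)})


print(search_url("Network Partitioning of Data Parallel Programs"))
```

    The quotes matter: without them, the search matches pages containing the individual words in any order, which for a title like this one returns mostly unrelated results.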

    Of course, before the web was invented, we all spent time in the library. Make a trip down to the first floor to get comfortable with an old friend. Conference proceedings are stored under the name of the conference, so find the ND library web site and search for "high performance distributed computing" in title keywords. The call number of the conference proceedings is QA 76.9 .D5 I593. Find the book and photocopy your article. Of course, while you are there, browse through other issues of the same conference to look for related work.

    Citeseer is a good tool for determining popular papers. Citeseer indexes research papers and records relationships between citations. For example, if you enter "mobile computing" and press "citations", you will be given a list of documents containing that phrase, sorted by how often they are cited. This can help to identify what ideas and publications are popular. Of course, this tool does not contain every paper in existence. Once you determine a valuable conference or publication, go back to the authoritative sources above to browse those conferences for other interesting papers.

    If you have difficulty finding what you are looking for, please see the instructor for some tips.

    Example Bibliography

    Transaction Support in File Systems
    Iam A. Student

    (Technical Report) Butler Lampson, Howard Sturgis, "Crash Recovery in a Distributed Data Storage System", Tech Report, Xerox Palo Alto Research Center, 1979.
    A complete transaction system is built from the ground up in four layers of abstraction, emphasizing the compositional nature of software. Everything is proven using simple exhaustive case analysis. I didn't totally understand the difference between errors and disasters, so I'll have to go back and read that again. Although it's just a technical report, there are many references to it, so it must be a classic.

    (Conference Article) Michael A. Olson, "The Design and Implementation of the Inversion File System", Proceedings of the USENIX Winter 1993 Technical Conference.
    A simple idea is proposed: Build a filesystem on top of a database by using tables for metadata and directory structure. Not surprisingly, there is a significant performance hit: only 30-80 percent of NFS throughput. On the other hand, you get vastly increased flexibility, including the possibility of using the server itself for computing. It seems like there should be a more efficient way of getting transactions into files. This paper relies heavily on Margo Seltzer's work below.

    (Journal Article) M. Stonebraker, et al. "Mariposa: A Wide-Area Distributed Database System", VLDB Journal 5:1 January 1996, pages 48-63.
    This paper proposes that databases distributed over the WAN are fundamentally different from databases distributed over the LAN because of the independence of individual nodes and the expense of moving data over the wide area. Although this is nominally about databases, I think it will apply to filesystems as well, because the same distinction between LAN and WAN is necessary. There is a long section on bidding that will require some careful reading. Stonebraker appears in many database papers.

    (Book) Jim Gray and Andreas Reuter, "Transaction Processing: Concepts and Techniques", Morgan Kaufmann, San Francisco, 1993.
    This book is an algorithmic bible for building transaction-based systems. Starting with the basics of storage devices, it builds up algorithms for logging, transactions, recovery and more. Two surprising elements: one, there is a big section on fault tolerance and the underlying sources of failures; two, although the focus is on databases, there is an entire section on filesystems. Note that Jim Gray was a major player in the System-R and Tandem systems.

    (Dissertation) Margo Seltzer, "File System Performance and Transaction Support", Ph.D. Dissertation, University of California at Berkeley, 1992.
    This dissertation explores adding transactions to file systems in excruciating detail. The first few chapters focus on simulation of varying system structures and workloads. Once a structure is chosen, a transaction-based filesystem is built and evaluated. I'll have to return to this to see exactly what designs were considered or discarded.