Project Ideas
The following are rough project ideas for your consideration.
Part of your job will be to crystallize the purpose,
methods, and scope of your specific project into a project proposal.
Note that not all projects involve writing a lot of code,
but all involve a thorough quantitative evaluation of a system.
Students may undertake projects not listed here, but
should consult with the instructor before submitting a proposal.
You may certainly take any of these ideas and add a "twist"
to make it more interesting or challenging.
Many of these project ideas make use of software and systems
that are already deployed at Notre Dame, such as the
Chirp distributed filesystem,
the Condor distributed batch system,
and the Hadoop data processing system.
I encourage you to use these systems, so that you will have a ready-made testbed to work with,
and will become familiar with tools useful later in your career.
Distributed RAID.
The Chirp filesystem
gives easy access to lots of different storage devices. Chirp allows the
user to read and write files on a single remote disk, but does not provide
any kind of replication or error checking. Remedy this by creating a library
for creating and accessing large files that are striped across
multiple Chirp servers. Your library should support several different RAID configurations,
selected when the file is created. The library should have a simple interface
like chirp_raid_open, chirp_raid_pread, chirp_raid_pwrite,
chirp_raid_stat, and so forth. Get started by using the Chirp API. Explore the performance of this library on a variety of workloads, varying
the RAID configuration and the number of servers in use. How does the performance
compare to using a single local disk or a single remote Chirp server?
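As a starting point for the striping logic, here is a minimal sketch of the address arithmetic for a RAID-0 style layout. It assumes fixed-size stripe units placed round-robin across the servers; the function names locate_stripe and split_request are hypothetical and are not part of the Chirp API. Parity and mirroring configurations would need additional bookkeeping.

```python
def locate_stripe(offset, stripe_size, num_servers):
    """Map a logical byte offset to (server index, offset in that server's file)."""
    stripe_index = offset // stripe_size        # which stripe unit overall
    server = stripe_index % num_servers         # round-robin placement
    local_stripe = stripe_index // num_servers  # units that precede it on that server
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size, num_servers):
    """Break a logical (offset, length) request into per-server pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local_offset = locate_stripe(offset, stripe_size, num_servers)
        # Bytes left in the current stripe unit, capped by the end of the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local_offset, chunk))
        offset += chunk
    return pieces
```

A pread or pwrite in your library could call split_request and then issue one remote operation per piece, possibly in parallel.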
A Data-Preserving Filesystem. A traditional filesystem allows the
end user to permanently delete data, either by writing over existing data in files,
or by deleting a file outright. This is unacceptable in many situations where
the user is not fully trusted, is prone to accidents, or simply likes to have a
record of everything they have done. Using FUSE or Parrot, build a filesystem wrapper
that allows the user to access their files normally, but saves an extra copy of
a file when it is overwritten or lost. (There are several ways to accomplish this.)
For safety, the backup copy should be sent to a remote system like a Chirp server.
Read about the old but influential idea of Lifestreams,
and design a method for users to browse, search, and select files from the past.
Deploy the system with you and your friends for a few days to measure how much
data a "typical" user really needs to save. Compare the performance to a conventional filesystem.
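To make the core idea concrete, here is a minimal sketch of "save a copy before overwriting", written as an ordinary function rather than a FUSE or Parrot handler. The name preserve_then_write and the timestamp-suffix naming scheme are assumptions for illustration; a real wrapper would hook the open, write, and unlink operations and would send the copy to a remote server instead of a local directory.

```python
import os
import shutil
import time


def preserve_then_write(path, data, backup_dir):
    """Overwrite path with data, first saving any existing contents.

    Returns the path of the saved copy, or None if there was nothing to save.
    """
    backup_path = None
    if os.path.exists(path):
        os.makedirs(backup_dir, exist_ok=True)
        # Hypothetical naming scheme: original name plus a nanosecond timestamp.
        stamp = time.time_ns()
        backup_path = os.path.join(backup_dir, f"{os.path.basename(path)}.{stamp}")
        shutil.copy2(path, backup_path)  # copy2 also preserves modification times
    with open(path, "wb") as f:
        f.write(data)
    return backup_path
```

Copying the whole file on every overwrite is the simplest policy and probably the most expensive one, which is part of what your evaluation should measure.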
A Robust Data Forwarding Network. In many scientific fields, it is now common
to collect large amounts of data automatically from digital instruments. Such data must be
reliably processed, reduced, analyzed, and transported to archive servers and other interested parties.
If the volume of data is high enough, multiple servers may be necessary at each stage.
For example, a continuously running surveillance camera might take one picture a second,
depositing each in a directly attached disk. Each image needs to be moved off the camera,
converted to a more compact format, and run through a face detector. If the face detector returns
true, then the image should be dropped into an "alert" queue for a human; otherwise, copied out.
To support this, create a simple data forwarding server that can be chained together to
build a complex system. Each server should accept incoming files, perform some locally-determined
processing, and then forward the results on to one or more servers. The challenge is to make
the system work correctly and robustly even in the face of network and server failures.
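One ingredient of robustness is retrying a failed forward instead of dropping the file. The sketch below shows retry with exponential backoff. The function forward_with_retry and its parameters are hypothetical, and send stands for whatever transfer routine you write. It assumes that network failures surface as OSError (which includes ConnectionError). A complete server would also need to keep each file on disk until the next hop acknowledges it.

```python
import time


def forward_with_retry(send, payload, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send(payload) until it succeeds; return the number of attempts used.

    Waits base_delay, then 2*base_delay, then 4*base_delay, ... between attempts.
    Re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing sleep as a parameter is only a convenience for testing, so that a test can record the delays rather than actually wait.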
Automatic Concurrency Control. Modern computers have physical concurrency
in many different places: a machine might have eight cores, four disks, two network cards,
and access to three software licenses. Programmers and end users can exploit concurrency
by running multiple programs at once, but running too many at once can actually
slow the system down. Unfortunately, it's not always obvious what resources a program
needs, so it's hard to tell exactly what level of concurrency to use. For example,
if you have eight cores and 1 GB of memory, you could run eight programs at once.
But, if each one is writing extensively to the disk, then perhaps you should only run
four at a time. To help the user in this situation, write a simple
parallel process manager that takes an arbitrary list of commands to run in batch.
The manager's job is simply to run all of them to completion in the fastest time possible,
adjusting the level of concurrency as needed. Try several different techniques, such
as a TCP-like feedback loop, or explicit measurement of applications and resources.
The challenge is to come up with an algorithm that works on a wide variety of machines
and applications. (We can provide you with access to several machines with 2, 8, and 32 cores.)
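To illustrate what a TCP-like feedback loop might look like, here is a minimal sketch of an additive-increase, multiplicative-decrease (AIMD) rule applied to the concurrency level. The function name, the use of measured throughput (for example, jobs completed per minute) as the feedback signal, and the halving factor are all assumptions; choosing the right signal and constants is the real research question.

```python
def next_concurrency(current, prev_throughput, throughput, max_level):
    """Pick the next concurrency level from two successive throughput samples.

    If throughput improved, probe upward by one (additive increase).
    Otherwise, assume some resource is saturated and halve the level
    (multiplicative decrease), never dropping below one running job.
    """
    if throughput > prev_throughput:
        return min(current + 1, max_level)
    return max(current // 2, 1)
```

A manager built on this rule would measure throughput over a fixed interval, call next_concurrency, and then start or hold back jobs to match the new level.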
User-Level Distributed Shared Memory. A distributed shared memory allows
processes running on different machines to effectively read and write the same memory
space, as if they were on the same machine. This allows a conventional multi-threaded
program to scale beyond the limits of one machine. Start with the user level page table from the undergrad OS class, and build a user-level library that allows
cooperating processes to share memory across multiple machines. (It's actually not as hard as it sounds!)
Use it to implement a few multithreaded applications, and compare the performance
to the same application running on a single machine.
Distributed Mach. Read ahead to learn about the Mach microkernel operating system.
Create a user-level library with the same basic concepts as Mach -- messages, ports, and tasks --
that allows for easy communication between processes, whether they run on the same machine,
or different machines. Use your library to build up some simple operating system services
or parallel applications. Explore the performance of this system, and compare it to using
multiple processes on the same machine.
Distributed Virtual Machines.
For better or worse, most software depends on a very specific operating system.
For example, most software written for Red Hat 5 does not run on Red Hat 4,
and vice versa. This makes it very difficult to use a system like our
Condor pool, where there are several different operating systems installed.
Devise a system that always provides the user's expected environment,
no matter where it runs. If a job lands on a machine with the desired OS,
then just run it. If a job lands on a machine with the wrong OS, then
start a virtual machine with User Mode Linux
and an appropriate disk image. One complication will be the size of the
OS disk images, which are measured in gigabytes. Measure the performance
overhead of using the virtual machine, and find a clever way to minimize
the costs of the disk image.
Improving Software Installation with Disk Images.
Traditional software installation is very inefficient.
Users must download (or copy) a ZIP, TAR, or RPM file to a local disk,
then unpack the software by writing lots of fiddly little files
to all sorts of directories all across the disk.
If you have ever installed Office, you know how long this can take!
A potentially more efficient way is to distribute a software package
as a single disk image that can be written once sequentially
and then mounted into the filesystem view.
(This is called "loopback" mounting on Linux.)
Come up with a system for managing and installing software this way.
Measure the performance of installing disk images versus unpacking archives
for a large set of open source software.
How do you deal with the problem of managing the user's PATH
and similar configuration variables? Can this system scale to
100s or 1000s of software packages?
User Filesystem Study.
Many assumptions about user behavior in operating systems are based
on studies that are decades old. (example one, example two.) Produce a new study of how users
behave in the ND CSE network. Examine tools such as strace,
tcpdump, and fstrace for the purpose of recording logs of
filesystem activity. Demonstrate that you can record and analyze
a few hours of activity. Then, get permission from Curt and a few
of your friends to trace activity on a few workstations for several weeks.
Write a comprehensive report on the file access behavior of those
people over the semester.
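As a first step toward analyzing a trace, here is a minimal sketch that counts system calls by name in strace text output. It assumes the common line layout of a call name followed by an opening parenthesis, optionally prefixed by "[pid N]" or a bare process ID (as produced when following child processes). Signal lines, exit lines, and "resumed" continuation lines are simply ignored. The function name count_syscalls is hypothetical.

```python
import re
from collections import Counter

# Optional "[pid 123] " or "123 " prefix, then the system call name and "(".
SYSCALL_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(')


def count_syscalls(lines):
    """Return a Counter mapping system call name to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For a log saved to a file, something like count_syscalls(open("trace.log")) would give a quick first picture of which operations dominate, where trace.log is a placeholder name.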
Visualizing Complex Applications. As you saw in the warm-up assignment, even a simple application like ls can have surprising complexity. Debugging the system behavior of applications can be challenging, because it is often difficult to even determine which files and resources an application accesses, and whether it does so efficiently. Simple mistakes like opening and closing the same file repeatedly can have a significant effect on performance. Build a system visualizer to address this problem. Start by using tools like strace to collect the behavior of complex multi-process applications like web browsers, editors, and similar tools. Build a visualization tool that reads these traces, and produces some sort of (interactive?) display that presents what is going on in the application. Include enough information so that a programmer can use this tool to find common inefficiencies. N.B. In this context, "visualization" does not mean creating pretty pictures; it means creating compact, expressive, informative methods of presenting dense information. Read books by Edward Tufte to get the idea.
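Before building any display, it may help to extract one inefficiency directly from a trace. The sketch below flags files that are opened repeatedly. It assumes strace-style lines for open and openat with the path in double quotes, counts failed opens along with successful ones, and does not handle paths containing escaped quotes. The name repeated_opens and the threshold parameter are hypothetical.

```python
import re
from collections import Counter

# Matches open("path", ...) and openat(AT_FDCWD, "path", ...), capturing the path.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]*)"')


def repeated_opens(lines, threshold=2):
    """Return {path: count} for every path opened at least threshold times."""
    counts = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return {path: n for path, n in counts.items() if n >= threshold}
```

The resulting table is exactly the kind of dense information a good display should make obvious at a glance.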