Project 3: Thread Scheduling

Update: Please read the FAQ for Project Three.

Project Goals

By undertaking this project, you will:
  • learn the basic elements of the World Wide Web and its protocols.
  • learn how real-world multi-threaded servers are built.
  • apply scheduling algorithms to a working system.
  • develop your skills in programming C and pthreads.
  • gain experience in reading and modifying existing code.

    Introduction

    Internet servers of various kinds are the most likely place where you will encounter threads in real life. Because an internet server must handle activities of multiple simultaneous users, it has many of the same structures and difficulties as an operating system. A good example of this is a web server, which must provide data access to many web browsers and other programs at once.

    Let's briefly review how the web works. The web consists of servers and clients. A web server is a process that runs on a machine and makes data available to any client that may call it up and ask for it. A client that you are certainly familiar with is a web browser, but there are many other kinds of clients that interact with servers in quieter ways. For example, the wget command line tool can be used to request files from a web server without any of the graphical hoopla.

    A file on a server is identified by a Uniform Resource Locator (URL), which looks like this:
        http://www.cse.nd.edu:80/index.html

    The parts of the URL mean the following:


    http             The protocol used to contact the server.
    www.cse.nd.edu   The server to be contacted.
    80               The port number on the server to call (80 if not given).
    /index.html      The file to retrieve from the server.
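
    To make the four parts concrete, here is a minimal sketch of splitting a URL in C. The names parse_url and struct url_parts are hypothetical (they are not part of the provided code), and the sketch only handles the simple proto://host[:port][/path] form described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type, not part of the provided project code. */
struct url_parts {
    char protocol[16];
    char host[256];
    int  port;
    char path[1024];
};

/* Splits proto://host[:port][/path] into its parts.
   Returns 1 on success and 0 if the URL does not fit that form. */
int parse_url(const char *url, struct url_parts *out)
{
    int n;

    assert(url != NULL && out != NULL);

    /* First try the form with an explicit port. */
    out->port = 80;
    strcpy(out->path, "/");
    n = sscanf(url, "%15[^:]://%255[^:/]:%d%1023s",
               out->protocol, out->host, &out->port, out->path);
    if (n < 3) {
        /* No port given: fall back to the default of 80. */
        out->port = 80;
        strcpy(out->path, "/");
        n = sscanf(url, "%15[^:]://%255[^:/]%1023s",
                   out->protocol, out->host, out->path);
        if (n < 2)
            return 0;
    }
    /* Whatever follows the host (and port) must be a path. */
    return out->path[0] == '/';
}
```

    For example, parse_url("http://www.cse.nd.edu:80/index.html", &u) fills in the protocol http, the host www.cse.nd.edu, the port 80, and the path /index.html.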

    Most web applications use HTTP, the HyperText Transfer Protocol. This began life as a simple protocol, but has become very complex over the years. The basic idea is this. First, the client connects to the server on the TCP port given by the URL. It then sends an HTTP request stating what file it wishes to retrieve, along with the version of the protocol that it understands.
        GET /index.html HTTP/1.0
    
    The server examines this request, and then sends a response header:
        HTTP/1.1 200 OK
        Date: Tue, 11 Jan 2005 21:31:45 GMT
        Server: Apache/1.3.27
        Connection: close
        Content-Type: text/html
    
    ...followed by the actual data of the file in question. If you are curious, you can speak to web servers directly without an intervening browser by using the telnet tool. Try this to see the raw output of a web server:
        % telnet www.cse.nd.edu 80
        GET /index.html HTTP/1.0
        (type return one more time)
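
    On the server side, the first line of the request has to be split into its three fields before anything else can happen. The following is only a sketch of that step: parse_request_line is a hypothetical name, and the provided server already contains its own request handling, which you will not need to modify.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: splits a request line such as
   "GET /index.html HTTP/1.0" into method, path, and version.
   Returns 1 if all three fields were found, 0 otherwise. */
int parse_request_line(const char *line,
                       char method[16], char path[1024], char version[16])
{
    assert(line != NULL);
    /* %s stops at whitespace, so a trailing CR/LF is ignored. */
    return sscanf(line, "%15s %1023s %15s", method, path, version) == 3;
}
```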
    
    Most HTTP requests to the CSE web server are for static (unchanging) content stored in plain files. However, a URL can just as easily refer to dynamic (changing) content. The "file" portion of a URL can refer to a program that must be run to generate a web page on the fly. This would be common in a web server found at an online auction site. The web server might run a program that queries the auction database to determine the state of a sale and produce the appropriate web page. Most real-world web servers have a mix of static and dynamic content.

    Now that you know the basic underpinnings of the web, let's consider how one might build a web server designed to handle many incoming users. A good place to start is a single-threaded web server, shown below.

    In a single-threaded web server, there is just one main loop. It sets up a listening socket, and then waits for incoming client connections. Once connected, it reads the HTTP request from the client and creates a "request" data structure that describes the caller and the nature of the request. Another routine services this request and sends back the HTTP response to the client. Once done, the main loop deletes the request and returns to waiting for incoming clients.

    A single-threaded web server is not likely to scale up to many simultaneous clients very well. The routine that sends the HTTP response could be delayed for any number of reasons: the file system could be slow, the client could be unprepared to receive data, the network could be interrupted, or perhaps the response is dynamic and will take some time to produce. If this response is delayed, no other client will be able to obtain service until the blockage is cleared.

    One response to this problem might be to create a new thread for every incoming connection. Such a multi-threaded web server is shown below:

    In a multi-threaded web server, the main loop listens for incoming client connections. When it discovers one, it reads the HTTP request and creates a request structure. It then creates a new thread and passes it the request structure. The new thread sends the HTTP response, deletes the request structure, and exits.
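
    The thread-per-connection pattern can be sketched with pthreads as follows. This is an illustration under assumptions, not the provided server: struct request, handle_request, and serve_connections are hypothetical names, the "connections" are simulated by a counter rather than sockets, and the threads are joined only so the sketch can report how many requests were handled (a real server would more likely detach them).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the server's request structure. */
struct request {
    int id;
};

static pthread_mutex_t served_lock = PTHREAD_MUTEX_INITIALIZER;
static int served = 0;          /* number of responses "sent" so far */

/* Thread body: send the response, delete the request, and exit. */
static void *handle_request(void *arg)
{
    struct request *r = arg;

    /* A real server would write the HTTP response here. */
    pthread_mutex_lock(&served_lock);
    served++;
    pthread_mutex_unlock(&served_lock);

    free(r);
    return NULL;
}

/* Simulates the main loop for n incoming connections.
   Returns the number of requests that were handed to a thread. */
int serve_connections(int n)
{
    pthread_t *tids;
    int i, started = 0;

    assert(n >= 0);
    tids = malloc(sizeof(*tids) * (n > 0 ? n : 1));
    if (tids == NULL)
        return 0;

    for (i = 0; i < n; i++) {
        struct request *r = malloc(sizeof(*r));
        if (r == NULL)
            break;
        r->id = i;
        /* One new thread per request; this call fails once the
           system's thread limit is reached. */
        if (pthread_create(&tids[started], NULL, handle_request, r) != 0) {
            free(r);
            break;
        }
        started++;
    }

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return started;
}
```

    Note that every request pays for one pthread_create call, and that the loop has no good answer when that call fails.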

    The multi-threaded web server will certainly scale better than a single-threaded web server. If any server thread is delayed for any of the reasons mentioned above, new threads will still be created and the users will be happy. However, this design still has some problems. For starters, most thread packages are limited to a fixed maximum number of threads (303 threads on our machines). If more clients than the maximum number of threads arrive, the server is in trouble. Second, thread creation and deletion can be relatively expensive operations that are unnecessarily repeated under high load. Finally, a given machine may achieve optimum performance for a certain number of threads, independent of the number of actual clients. If we could control the number of threads without regard to the number of clients, the server could be tuned to maximum performance.

    So, most real-world servers use a thread-pool approach, shown below:

    In the thread-pool approach, the main thread creates a fixed number of worker threads. The main thread is still responsible for accepting connections. As it does so, it creates request objects and places them into a linked list. Each worker thread pulls requests off the list according to some scheduling algorithm, and then produces the necessary response. Thus, the main thread can accept connections as long as memory is available, while a fixed number of threads churn away at maximum efficiency.
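
    One possible shape for such a pool is sketched below, using a mutex and a condition variable as the monitor around the list. Everything here is an assumption made for illustration: the names queue_put, queue_get, worker, and run_pool are hypothetical, requests are plain counters rather than HTTP requests, the list is serviced strictly first come, first served, and the shutdown flag exists only so that the sketch can terminate. It is not the structure of the provided wwwserver.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical request record for this sketch only. */
struct request {
    int id;
    struct request *next;
};

/* Monitor state: one lock, one condition, and the data they protect. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
static struct request *head = NULL, *tail = NULL;
static int shutting_down = 0;
static int completed = 0;

/* Called by the main thread: append a request and wake one worker. */
void queue_put(struct request *r)
{
    r->next = NULL;
    pthread_mutex_lock(&lock);
    if (tail != NULL)
        tail->next = r;
    else
        head = r;
    tail = r;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Called by workers: block until a request is available. Returns
   NULL only when the pool is shutting down and the list is empty. */
struct request *queue_get(void)
{
    struct request *r;

    pthread_mutex_lock(&lock);
    while (head == NULL && !shutting_down)
        pthread_cond_wait(&nonempty, &lock);
    r = head;                   /* always take the oldest entry */
    if (r != NULL) {
        head = r->next;
        if (head == NULL)
            tail = NULL;
    }
    pthread_mutex_unlock(&lock);
    return r;
}

static void *worker(void *arg)
{
    struct request *r;

    (void)arg;
    while ((r = queue_get()) != NULL) {
        /* A real worker would produce the HTTP response here. */
        free(r);
        pthread_mutex_lock(&lock);
        completed++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Single-use driver: starts nthreads workers, queues nrequests
   requests, waits for the list to drain, and returns the number
   of requests the workers completed. */
int run_pool(int nthreads, int nrequests)
{
    pthread_t tids[64];
    int i, started = 0;

    assert(nthreads > 0 && nthreads <= 64 && nrequests >= 0);
    for (i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    if (started == 0)
        return 0;

    for (i = 0; i < nrequests; i++) {
        struct request *r = malloc(sizeof(*r));
        if (r == NULL)
            break;
        r->id = i;
        queue_put(r);
    }

    pthread_mutex_lock(&lock);
    shutting_down = 1;          /* idle workers exit once drained */
    pthread_cond_broadcast(&nonempty);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return completed;
}
```

    The wait inside queue_get sits in a while loop rather than an if, because a worker that wakes up must re-check the list: another worker may have taken the request first.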

    For this project, you will take a single-threaded web server and convert it into a thread-pool web server. Of course, you must apply lessons from the previous project, such as using a monitor to protect the request list. You will also apply new knowledge by implementing several scheduling algorithms on the request list. Finally, you will get some more practice in developing your C skills.

    Note that you won't have to write or modify any code that deals with sockets or the HTTP protocol: this isn't a networking class. Your job is to deal with the threading aspects, leaving the WWW aspects to the existing code.

    Getting Started

    Begin by downloading the following code files:
    wwwserver.c, wwwdriver.c, Makefile

    And the following data files for testing your server:
    horse.jpg, car.jpg, schooner.jpg, hovercraft.jpg

    The provided code is a single-threaded web server. To build and run it:

        % make
        % ./wwwserver 1 fcfs
    
    The first argument is the number of threads to create in the thread pool. The second argument is the scheduling algorithm to apply to the request list, which may be fcfs, dfirst, or sfirst. As provided, the single-threaded web server will only run with the arguments 1 fcfs. The web server listens on port 7090 by default. So, you should be able to start up a web browser and connect to the server at the following URLs:
        http://HOSTNAME:7090/auction
        http://HOSTNAME:7090/schooner.jpg
    
    (Of course, replace HOSTNAME with the name of the host you are running it on.) The web server serves up two kinds of content. Static content is available in the form of JPEG images found in the web server's directory. A dynamic page is found under /auction. This simulates a program that connects to a database by printing a message, pausing for a few seconds, and then showing an image of the item that you "bought".

    Now, open up two web browser windows at once. In one, load up the auction URL. In the other window, load an image URL while the auction URL is still loading. (You may have to try this a few times to get the timing right.) Notice that the second web browser is stuck while the server is busy producing the auction page. This demonstrates the problem of a single-threaded web server.

    A web browser is a nice way to demonstrate that the server works, but it isn't very convenient for testing lots of clients at once. For that purpose, we have written wwwdriver, a simple program that creates an arbitrary number of threads and uses them to load the web server. The program is invoked as follows:

        % ./wwwdriver http://HOSTNAME:7090/schooner.jpg 10 20
    
    This will cause the given URL to be retrieved by 10 simultaneous threads, each requesting the file 20 times. That is, it will generate a total of 200 HTTP requests. So as not to create a mess, all output is placed in the file output.dat. You should use this tool to drive your web server and test how it reacts to multiple clients and repeated connections.

    Warning: You are responsible for what you do with wwwdriver. Only use it to apply load to your own web server. Do not use it to load web sites that you do not control.

    Technical Requirements

    Your job is to convert the given single-threaded web server into a thread-pool web server as described above. The total number of threads in the pool should be determined by the first argument to wwwserver. Because the linked list of requests will be accessed by many threads simultaneously, you must protect it with a monitor, much as in the previous assignment.

    In addition, the order in which queued requests are serviced must be controlled by the scheduling algorithm given as the second argument. You must implement the following three algorithms:
    fcfs - First come, first served. Requests should be serviced in the order in which they are received.
    dfirst - Dynamic first. All queued requests for dynamic content (/auction) should be serviced before any requests for static content (.jpg files).
    sfirst - Static first. All queued requests for static content (.jpg files) should be serviced before any requests for dynamic content.
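
    As a rough illustration of how the three policies differ, the sketch below selects the next request from a singly linked list kept in arrival order. The names enum policy, parse_policy, is_dynamic, and pick_next are hypothetical, as is the layout of struct request. It also assumes that /auction is the only dynamic path and that requests of the same kind are serviced in arrival order. In a real server the caller would hold the monitor lock while calling pick_next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum policy { POLICY_FCFS, POLICY_DFIRST, POLICY_SFIRST, POLICY_UNKNOWN };

/* Hypothetical request record; the list head is the oldest request. */
struct request {
    const char *path;
    struct request *next;
};

/* Maps the second command-line argument to a policy. */
enum policy parse_policy(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "fcfs") == 0)
        return POLICY_FCFS;
    if (strcmp(name, "dfirst") == 0)
        return POLICY_DFIRST;
    if (strcmp(name, "sfirst") == 0)
        return POLICY_SFIRST;
    return POLICY_UNKNOWN;
}

/* Assumption: /auction is the only dynamic content. */
int is_dynamic(const struct request *r)
{
    return strcmp(r->path, "/auction") == 0;
}

/* Removes and returns the next request to service, or NULL if the
   list is empty. For dfirst and sfirst, the oldest request of the
   preferred kind is chosen; if there is none, the oldest request
   of any kind is chosen. */
struct request *pick_next(struct request **head, enum policy p)
{
    struct request **link, **chosen = head;
    struct request *r;

    if (*head == NULL)
        return NULL;

    if (p == POLICY_DFIRST || p == POLICY_SFIRST) {
        int want_dynamic = (p == POLICY_DFIRST);
        for (link = head; *link != NULL; link = &(*link)->next) {
            if (is_dynamic(*link) == want_dynamic) {
                chosen = link;
                break;
            }
        }
    }

    r = *chosen;
    *chosen = r->next;          /* unlink the chosen request */
    r->next = NULL;
    return r;
}
```

    With the list horse.jpg, /auction, car.jpg, the dfirst policy would service /auction first and then fall back to horse.jpg, while fcfs and sfirst would both start with horse.jpg.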

    You must come up with a testing method that demonstrates that your scheduling algorithms work correctly. Think carefully about how the thread-pool server should work and use a web browser, wwwdriver, or some combination of the two to demonstrate that your server is scheduling requests properly.

    To accomplish this project, you may modify the main function and add any other functions or variables that you deem necessary. You may not modify any of the other existing functions, nor may you defeat the simulated pause in process_dynamic_request() in any way.

    If you need a refresher on how to create and manage threads, please read the YoLinux Threads Tutorial. You could also simply review the code provided for you in the last project, which includes an example of creating threads.

    Handing In

    This project is due at 5pm on Thursday, March 17th.

    Your entire program should be contained in a single file wwwserver.c. To hand in this file, simply copy the source file wwwserver.c and nothing else into /afs/nd.edu/coursesp.05/cse/cse341.01/dropbox/YOURNAME/p3. We will compile and test your code on the Fitzpatrick cluster, so be sure to build and test your code on those machines.

    The grade on this project will take into account the safe creation and synchronization of threads (50%), the correctness of the scheduling algorithms (40%), and good coding style (10%). Be sure to test your server carefully and thoroughly, using a wide variety of configurations and numbers of threads.


    Notre Dame - CSE Dept - CSE 341