Linux Kernel Basics

Some Linux Kernel Source Files
with Annotations by Felix Gärtner

Table of Contents

Prelude

This document contains a few source files from the Linux kernel, enhanced with some comments. My motivation for creating it was to show students of an operating systems course what ``real'' system code looks like.

All files are from the Suse Linux Kernel 2.0.36 [cite SUSE]. They are copyrighted under the GNU General Public License and are essentially the work of the wise and honorable Linus Torvalds. [Other authors include Jes Sorensen and Leonard N. Zubkoff.] Change logs are omitted for brevity. Unless stated otherwise, the files reside in the include subdirectory.

If you have the source of this document (linux-kernel-basics.nw) you can generate a couple of slides from this file using the noweb literate programming tool [cite NOWEB, NOWEBWWW]. For more information on Literate Programming see Dave Thompson's great FAQ [cite LPFAQ].

Here are some WWW sites with related information (but not with as much commentary as here):

File asm/system.h

This is the version for the 68000 architecture. You should have a machine instruction reference card handy.

<asm/system.h>= [D->]
#ifndef _M68K_SYSTEM_H
#define _M68K_SYSTEM_H

#include <linux/config.h> /* get configuration macros */
#include <linux/linkage.h>
#include <asm/segment.h>

Here are two functions dealing with the user stack pointer (USP). The first one (rdusp) reads the USP, the second (wrusp) writes to it.

<asm/system.h>+= [<-D->]
extern inline unsigned long rdusp(void) {
        unsigned long usp;

        __asm__ __volatile__("movec %/usp,%0"
                             : "=d" (usp));
        return usp;
}

extern inline void wrusp(unsigned long usp) {
        __asm__ __volatile__("movec %0,%/usp"
                             :
                             : "d" (usp));
}
Defines rdusp, wrusp (links are to index).

switch_to(prev,next) switches from task prev to task next, first checking that next isn't the current task, in which case it does nothing. It also clears the TS flag if the task we are switching to was the last one to use the math co-processor.

The tss is a thread_struct defined in asm/processor.h. It contains the kernel stack pointer, the user stack pointer, the status register and a lot of other characteristics of a process or thread. The expression

(int)&((struct task_struct *)0)->tss;

casts the null pointer to a struct task_struct pointer and takes the address of its tss member; since the structure notionally starts at address 0, this yields the byte offset of tss within task_struct (the classic offsetof trick).
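
A minimal user-space sketch (my own illustration, not kernel code) demonstrating that this cast-0 trick computes the same value as the standard offsetof macro:

/* The cast-0 idiom: pretend a struct starts at address 0, then the
 * address of a member is numerically equal to its byte offset. */
#include <stddef.h>
#include <stdio.h>

struct demo { long a; long b; char c; };

int main(void)
{
        int off1 = (int) &((struct demo *) 0)->c;   /* the kernel's trick  */
        int off2 = (int) offsetof(struct demo, c);  /* the standard macro  */
        printf("%d %d\n", off1, off2);              /* prints the same offset twice */
        return 0;
}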

switch_to() saves the extra registers that are not saved automatically by SAVE_SWITCH_STACK in resume(), i.e., d0--d5 and a0--a1. Some of these are used by schedule() and its predecessors, and so we might see unexpected behavior when a task returns with unexpected register values.

Syscall stores these registers itself and none of them are used by syscall after the function in the syscall has been called.

Beware that resume now expects *next to be in d1 and the offset of tss to be in a1. This saves a few instructions as we no longer have to push them onto the stack and read them back right after.

<the actual context switch>= (U-> U->)
asmlinkage void resume(void);
#define switch_to(prev,next) { \
  register int k __asm__ ("a1") = (int)&((struct task_struct *)0)->tss; \
  register int n __asm__ ("d1") = (int)next; \
  __asm__ __volatile__("jbsr " SYMBOL_NAME_STR(resume) "\n\t" \
                       : : "a" (k), "d" (n) \
                       : "d0", "d1", "d2", "d3", "d4", "d5", "a0", "a1"); \
}
Defines resume, switch_to, tss, %use (links are to index).

<asm/system.h>+= [<-D->]
<the actual context switch>

Here's a definition of the ``test and set'' operation.

<test and set and friends>= (U-> U->)
#define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
#define tas(ptr) (xchg((ptr),1))

struct __xchg_dummy { unsigned long a[100]; };
#define __xg(x) ((volatile struct __xchg_dummy *)(x))
Defines tas, %use, __xchg, xchg, __xchg_dummy, __xg (links are to index).

<asm/system.h>+= [<-D->]
<test and set and friends>
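
As an aside, here is how such a test-and-set operation could be used to build a simple busy-wait lock. This is a sketch of my own, not a kernel API; the kernel's get_buzz_lock in asm/semaphore.h (shown further below) follows the same pattern.

/* tas() atomically stores 1 and returns the previous value, so a return
 * value of 0 means nobody held the lock and we have just acquired it.   */
static int demo_lock = 0;

static inline void demo_acquire(void)
{
        while (tas(&demo_lock) != 0)
                ;                       /* spin until the old value was 0 */
}

static inline void demo_release(void)
{
        demo_lock = 0;                  /* simply mark the lock as free   */
}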

Functions sti and cli

The cli and sti macros disable and enable interrupts; they are used throughout the kernel to protect short critical sections.

An andiw is an ``and immediate word'' instruction, which here ``ands'' the value 0xf8ff onto the status register. Bits 8, 9 and 10 of the status register form the interrupt priority mask. Setting all three to 0 means that all interrupts are allowed, so sti enables (``restores'') interrupts. Analogously, cli ``ors'' the value 0x0700 onto the status register and thus sets all three bits, raising the mask to level 7. This masks all interrupts except the non-maskable level-7 ones; in other words, cli disables interrupts. The following figure shows these values and the bits of the status register. Remember that bits 8--10 are the ``interrupt mask'':

           1111 11
  SR bits  5432 1098 7654 3210        
  0xf8ff = 1111 1000 1111 1111
  0x0700 = 0000 0111 0000 0000
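
A small worked example (the concrete SR value is made up): suppose the status register currently holds 0x2300, i.e., supervisor mode with interrupt mask level 3.

  0x2300 & 0xf8ff = 0x2000   (sti: mask level 0, all interrupts enabled)
  0x2300 | 0x0700 = 0x2700   (cli: mask level 7, only the non-maskable level 7 gets through)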

<sti and cli>= (U-> U->)
#define sti() __asm__ __volatile__ ("andiw #0xf8ff,%/sr": : : "memory")
#define cli() __asm__ __volatile__ ("oriw  #0x0700,%/sr": : : "memory")
#define nop() __asm__ __volatile__ ("nop"::)
#define mb()  __asm__ __volatile__ (""   : : :"memory")
Defines cli, mb, nop, sti (links are to index).

<asm/system.h>+= [<-D->]
<sti and cli>

Here are two macros that save the status register to some memory location x and restore it again. This can be used to remember and later restore the previous interrupt level around a cli. Note that restore_flags writes back the entire status register, including the interrupt mask bits, so interrupts are re-enabled only if they were enabled when the flags were saved.

<save and restore flags>= (U-> U->)
#define save_flags(x) \
__asm__ __volatile__("movew %/sr,%0":"=d" (x) : /* no input */ :"memory")

#define restore_flags(x) \
__asm__ __volatile__("movew %0,%/sr": /* no outputs */ :"d" (x) : "memory")
Defines restore_flags, save_flags (links are to index).

<asm/system.h>+= [<-D->]
<save and restore flags>
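
A typical usage pattern (a minimal sketch of my own, mirroring what many kernel functions below do): protect a short critical section against interrupts and afterwards restore whatever interrupt level was in effect before.

static int shared_counter;

static void bump_counter(void)
{
        unsigned long flags;

        save_flags(flags);      /* remember the current status register  */
        cli();                  /* mask interrupts                       */
        shared_counter++;       /* the critical section                  */
        restore_flags(flags);   /* put the previous interrupt level back */
}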

This is the simple rte (return from exception) machine instruction. Recall that it pops the status register and the program counter from the system stack.

<asm/system.h>+= [<-D->]
#define iret() __asm__ __volatile__ ("rte": : :"memory", "sp", "cc")
Defines iret, rte (links are to index).

Here's a routine to atomically exchange a value of one, two, or four bytes in memory. Atomicity is achieved simply by disabling interrupts around the exchange.

<atomic value exchange>= (U-> U->)
static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int size)
{
  unsigned long tmp, flags;
  save_flags(flags);
  cli();
  switch (size) {
  case 1:    __asm__ __volatile__
    ("moveb %2,%0\n\t"
     "moveb %1,%2"    : "=&d" (tmp) : "d" (x), "m" (*__xg(ptr)) : "memory");
    break;
  case 2:    __asm__ __volatile__
    ("movew %2,%0\n\t"
     "movew %1,%2"    : "=&d" (tmp) : "d" (x), "m" (*__xg(ptr)) : "memory");
    break;
  case 4:    __asm__ __volatile__
    ("movel %2,%0\n\t"
     "movel %1,%2"    : "=&d" (tmp) : "d" (x), "m" (*__xg(ptr)) : "memory");
    break;
  }
  restore_flags(flags);
  return tmp;
}
Defines cli, restore_flags, %use, __xchg, __xg (links are to index).

<asm/system.h>+= [<-D]
<atomic value exchange>

#endif /* _M68K_SYSTEM_H */

File sched.h

<sched.h>= [D->]
#ifndef _LINUX_SCHED_H
#define _LINUX_SCHED_H

#include <asm/param.h>  /* for definition of HZ */

extern unsigned long event;

#include <linux/binfmts.h>
#include <linux/personality.h>
#include <linux/tasks.h>
#include <linux/kernel.h>

#include <asm/system.h>
#include <asm/semaphore.h>
#include <asm/page.h>

#include <linux/smp.h>
#include <linux/tty.h>
#include <linux/sem.h>

/*
 * cloning flags:
 */
#define CSIGNAL         0x000000ff      /* signal mask to be sent at exit */
#define CLONE_VM        0x00000100      /* set if VM shared between processes */
#define CLONE_FS        0x00000200      /* set if fs info shared between processes */
#define CLONE_FILES     0x00000400      /* set if open files shared between processes */
#define CLONE_SIGHAND   0x00000800      /* set if signal handlers shared */
#define CLONE_PID       0x00001000      /* set if pid shared */

These are the constants used for the fixed-point load-average calculation.

Fixed point means that fractional values are represented as scaled integers: with FSHIFT = 11 bits of precision, the value 1.0 is stored as FIXED_1 = 2048, so for example 0.5 is stored as 1024.

<sched.h>+= [<-D->]
extern unsigned long avenrun[];         /* Load averages */

#define FSHIFT          11              /* nr of bits of precision */
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ)          /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */
Defines avenrun, EXP_1, EXP_15, EXP_5, FIXED_1, FSHIFT, LOAD_FREQ (links are to index).

<sched.h>+= [<-D->]
#define CALC_LOAD(load,exp,n) \
        load *= exp; \
        load += n*(FIXED_1-exp); \
        load >>= FSHIFT;

#define CT_TO_SECS(x)   ((x) / HZ)
#define CT_TO_USECS(x)  (((x) % HZ) * 1000000/HZ)
Defines CALC_LOAD, CT_TO_SECS, CT_TO_USECS, FIXED_1, FSHIFT, %use (links are to index).
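
Here is a small user-space sketch of how the fixed-point average evolves and how it can be printed in the familiar decimal form. The function calc_load and all names in it are my own re-statement of the CALC_LOAD macro above; as far as I can tell, the kernel passes the number of runnable tasks scaled by FIXED_1 as n.

#include <stdio.h>

#define FSHIFT  11
#define FIXED_1 (1 << FSHIFT)
#define EXP_1   1884                    /* 2048 * exp(-5s/1min)            */

static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long n_fixed)
{
        load *= exp;                    /* old average, weighted by exp    */
        load += n_fixed * (FIXED_1 - exp); /* new sample, weighted by 1-exp */
        return load >> FSHIFT;          /* scale back down to fixed point  */
}

int main(void)
{
        unsigned long avg = 0;          /* load average, fixed point       */
        int interval;

        /* pretend 3 processes are runnable for 20 consecutive 5s intervals */
        for (interval = 0; interval < 20; interval++)
                avg = calc_load(avg, EXP_1, 3 * FIXED_1);

        /* print as "x.yy", the way /proc/loadavg does */
        printf("%lu.%02lu\n", avg >> FSHIFT,
               ((avg & (FIXED_1 - 1)) * 100) >> FSHIFT);
        return 0;
}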

nr_running is the current number of runnable processes (those on the run queue), nr_tasks is the total number of tasks currently in the system.

<sched.h>+= [<-D->]
extern int nr_running, nr_tasks;
extern int last_pid;

#define FIRST_TASK task[0]
#define LAST_TASK task[NR_TASKS-1]

#include <linux/head.h>
#include <linux/fs.h>
#include <linux/signal.h>
#include <linux/time.h>
#include <linux/param.h>
#include <linux/resource.h>
#include <linux/ptrace.h>
#include <linux/timer.h>

#include <asm/processor.h>

Process states

These seem to be the process states (in this order): running or ready to run, sleeping but wakeable by signals (interruptible), sleeping and not wakeable by signals (uninterruptible), zombie (terminated but not yet waited for by its parent), stopped (e.g., by a debugger), and being swapped.

<Process states>= (U-> U->)
#define TASK_RUNNING            0
#define TASK_INTERRUPTIBLE      1
#define TASK_UNINTERRUPTIBLE    2
#define TASK_ZOMBIE             3
#define TASK_STOPPED            4
#define TASK_SWAPPING           5
Defines TASK_INTERRUPTIBLE, TASK_RUNNING, TASK_STOPPED, TASK_SWAPPING, TASK_UNINTERRUPTIBLE, TASK_ZOMBIE (links are to index).

<sched.h>+= [<-D->]
<Process states>

Here are the scheduling policies. A process is scheduled under one of three policies: real-time FIFO, real-time round-robin, or the default time-sharing policy (``other'').

<Scheduling policies>= (U-> U->)
#define SCHED_OTHER             0
#define SCHED_FIFO              1
#define SCHED_RR                2

struct sched_param {
        int sched_priority;
};
Defines SCHED_FIFO, SCHED_OTHER, sched_param, sched_priority, SCHED_RR (links are to index).

<sched.h>+= [<-D->]
<Scheduling policies>

<sched.h>+= [<-D->]
#ifdef __KERNEL__

extern void sched_init(void);
extern void show_state(void);
extern void trap_init(void);

asmlinkage void schedule(void);

/* Open file table structure */
struct files_struct {
        int count;
        fd_set close_on_exec;
        fd_set open_fds;
        struct file * fd[NR_OPEN];
};

#define INIT_FILES { \
        1, \
        { { 0, } }, \
        { { 0, } }, \
        { NULL, } \
}

struct fs_struct {
        int count;
        unsigned short umask;
        struct inode * root, * pwd;
};

#define INIT_FS { \
        1, \
        0022, \
        NULL, NULL \
}

struct mm_struct {
        int count;
        pgd_t * pgd;
        unsigned long context;
        unsigned long start_code, end_code, start_data, end_data;
        unsigned long start_brk, brk, start_stack, start_mmap;
        unsigned long arg_start, arg_end, env_start, env_end;
        unsigned long rss, total_vm, locked_vm;
        unsigned long def_flags;
        struct vm_area_struct * mmap;
        struct vm_area_struct * mmap_avl;
        struct semaphore mmap_sem;
};

#define INIT_MM { \
                1, \
                swapper_pg_dir, \
                0, \
                0, 0, 0, 0, \
                0, 0, 0, 0, \
                0, 0, 0, 0, \
                0, 0, 0, \
                0, \
                &init_mmap, &init_mmap, MUTEX }

struct signal_struct {
        int count;
        struct sigaction action[32];
};

#define INIT_SIGNALS { \
                1, \
                { {0,}, } }

Here's the declaration of an entry in the process table.

<sched.h>+= [<-D->]
struct task_struct {
  <process table entries>
};
Defines task_struct (links are to index).

Let's scan through the individual entries. For these first ones, the documentation says here: ``these are hardcoded - don't touch''.

<process table entries>= (<-U) [D->]
volatile long state;    /* -1 unrunnable, 0 runnable, >0 stopped */
long counter;
long priority;
unsigned long signal;
unsigned long blocked;  /* bitmap of masked signals */
unsigned long flags;    /* per process flags, defined below */
int errno;
long debugreg[8];  /* Hardware debugging registers */
struct exec_domain *exec_domain;

Here it says: ``various fields''.

The linux_binfmt is a structure defined in binfmts.h which contains pointers to functions associated with a particular binary format that Linux understands. These are, for example, functions to load a binary or a shared library.

<Binary format pointer in process table>= (U-> U->)
struct linux_binfmt *binfmt;
Defines binfmt (links are to index).

<process table entries>+= (<-U) [<-D->]
<Binary format pointer in process table>

Here are some linked-list structures for the tasks. next_task and prev_task are simply a means to browse through all tasks, whereas next_run and prev_run cyclically link the ``ready'' tasks. This seems to be the implementation of the ready queue (run queue).

<Linked lists in process table>= (U-> U->)
struct task_struct *next_task, *prev_task;
struct task_struct *next_run,  *prev_run;
Defines next_run, next_task, prev_run, prev_task (links are to index).

<process table entries>+= (<-U) [<-D->]
<Linked lists in process table>
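
As an illustration of how these pointers are used (a sketch of my own, not a kernel function), one could count the runnable tasks by walking the circular run queue whose head is init_task (declared further below):

static int count_runnable(void)
{
        struct task_struct *p;
        int n = 0;

        /* the run queue is circular: it starts and ends at init_task */
        for (p = init_task.next_run; p != &init_task; p = p->next_run)
                n++;
        return n;
}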

These appear to be the page used as the process's kernel-mode stack (kernel_stack_page) and a place to save the kernel stack pointer across switches (saved_kernel_stack); exit_code and exit_signal record how the process terminated.

<process table entries>+= (<-U) [<-D->]
unsigned long saved_kernel_stack;
unsigned long kernel_stack_page;
int exit_code, exit_signal;

The term ``personality'' refers to the execution domain of a process, i.e., which flavour of Unix semantics the process expects; it is defined in personality.h. Possible personality types are for example LINUX, LINUX_32BIT, SVR4, BSD and others.

dumpable and did_exec are one-bit wide bitfields.

<process table entries>+= (<-U) [<-D->]
unsigned long personality;
int dumpable:1;
int did_exec:1;
Defines did_exec, dumpable, personality (links are to index).

Here's the pid. pid_t is the POSIX typedef for the process-id type (an int on Linux), which is what the comment in the code alludes to.

<PID declaration>= (U-> U->)
int pid;        /* shouldn't this be pid_t? */
Defines pid (links are to index).

<process table entries>+= (<-U) [<-D->]
<PID declaration>
int pgrp;
int tty_old_pgrp;
int session;
/* boolean value for session group leader */
int leader;
int     groups[NGROUPS];

Here are some pointers with which the process tree structure may be traversed. These are: p_opptr (original parent process), p_pptr (parent process), p_cptr (youngest child), p_ysptr (younger sibling), p_osptr (older sibling).

For example: ``p->father'' can be replaced with ``p->p_pptr->pid''.

<Tree structures in process table>= (U-> U->)
struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;

<process table entries>+= (<-U) [<-D->]
<Tree structures in process table>
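
For illustration (my own sketch, not a kernel function), the children of a task can be enumerated by starting at the youngest child and following the older-sibling pointers:

static void print_children(struct task_struct *p)
{
        struct task_struct *child;

        /* p_cptr is the youngest child, p_osptr leads to ever older siblings */
        for (child = p->p_cptr; child; child = child->p_osptr)
                printk("child pid %d\n", child->pid);
}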

This entry is obviously needed to implement the wait4 system call. The wait4 function suspends execution of the current process until a specified child process has exited, or until a signal is delivered whose action is to terminate the current process or to call a signal handling function. Any system resources used by the child are freed.

<process table entries>+= (<-U) [<-D->]
struct wait_queue *wait_chldexit;

Here are the entries for the user id and the effective user id. suid is the saved set-user-id, kept so that a process can temporarily drop and later regain privileges; fsuid is the user id used for file-system permission checks (it normally tracks euid and exists mainly for servers such as the NFS daemon). The group ids follow the same scheme.

<UID and friends in process table>= (U-> U->)
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
Defines egid, euid, fsgid, fsuid, gid, sgid, suid, uid (links are to index).

<process table entries>+= (<-U) [<-D->]
<UID and friends in process table>

policy describes the scheduling policy used for the current process. It takes on values of SCHED_RR, SCHED_FIFO or SCHED_OTHER.

<Scheduling policy in process table>= (U-> U->)
unsigned long timeout, policy, rt_priority;
Defines policy, rt_priority, timeout (links are to index).

<process table entries>+= (<-U) [<-D->]
<Scheduling policy in process table>
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
struct timer_list real_timer;

I suppose these are the entries that keep track of time. utime means ``user time'', stime is ``system time''. The leading c means ``cumulative'' (the times of terminated children are added in). start_time refers to the time when the process was started.

<Timing variables in process table>= (U-> U->)
long utime, stime, cutime, cstime, start_time;
Defines cstime, cutime, start_time, stime, utime (links are to index).

<process table entries>+= (<-U) [<-D->]
<Timing variables in process table>
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1;
unsigned long swap_address;
unsigned long old_maj_flt;      /* old value of maj_flt */
unsigned long dec_flt;          /* page fault count of the last time */
unsigned long swap_cnt;         /* number of pages to swap on next pass */

<process table entries>+= (<-U) [<-D]
/* limits */
        struct rlimit rlim[RLIM_NLIMITS];
        unsigned short used_math;
        char comm[16];
/* file system info */
        int link_count;
        struct tty_struct *tty; /* NULL if no tty */
/* ipc stuff */
        struct sem_undo *semundo;
        struct sem_queue *semsleeping;
/* ldt for this task - used by Wine.  If NULL, default_ldt is used */
        struct desc_struct *ldt;
/* tss for this task */
        struct thread_struct tss;
/* filesystem information */
        struct fs_struct *fs;
/* open file information */
        struct files_struct *files;
/* memory management info */
        struct mm_struct *mm;
/* signal handlers */
        struct signal_struct *sig;
#ifdef __SMP__
        int processor;
        int last_processor;
        int lock_depth;         /* Lock depth. We can context switch in and out of holding a syscall kernel lock... */  
#endif  

<sched.h>+= [<-D->]
/*
 * Per process flags
 */
#define PF_ALIGNWARN    0x00000001      /* Print alignment warning msgs */
                                        /* Not implemented yet, only for 486*/
#define PF_PTRACED      0x00000010      /* set if ptrace (0) has been called. */
#define PF_TRACESYS     0x00000020      /* tracing system calls */
#define PF_FORKNOEXEC   0x00000040      /* forked but didn't exec */
#define PF_SUPERPRIV    0x00000100      /* used super-user privileges */
#define PF_DUMPCORE     0x00000200      /* dumped core */
#define PF_SIGNALED     0x00000400      /* killed by a signal */

#define PF_STARTING     0x00000002      /* being created */
#define PF_EXITING      0x00000004      /* getting shut down */

#define PF_USEDFPU      0x00100000      /* Process used the FPU this quantum (SMP only) */
#define PF_DTRACE       0x00200000      /* delayed trace (used on m68k) */

/*
 * Limit the stack by to some sane default: root can always
 * increase this limit if needed..  8MB seems reasonable.
 */
#define _STK_LIM        (8*1024*1024)

#define DEF_PRIORITY    (20*HZ/100)     /* 200 ms time slices */

Here are the initial values for the very first process table entry (task 0, the ``swapper'' or idle task).

<sched.h>+= [<-D->]
/*
 *  INIT_TASK is used to set up the first task table, touch at
 * your own risk!. Base=0, limit=0x1fffff (=2MB)
 */
#define INIT_TASK \
/* state etc */ { 0,DEF_PRIORITY,DEF_PRIORITY,0,0,0,0, \
/* debugregs */ { 0, },            \
/* exec domain */&default_exec_domain, \
/* binfmt */    NULL, \
/* schedlink */ &init_task,&init_task, &init_task, &init_task, \
/* stack */     0,(unsigned long) &init_kernel_stack, \
/* ec,brk... */ 0,0,0,0,0, \
/* pid etc.. */ 0,0,0,0,0, \
/* suppl grps*/ {NOGROUP,}, \
/* proc links*/ &init_task,&init_task,NULL,NULL,NULL,NULL, \
/* uid etc */   0,0,0,0,0,0,0,0, \
/* timeout */   0,SCHED_OTHER,0,0,0,0,0,0,0, \
/* timer */     { NULL, NULL, 0, 0, it_real_fn }, \
/* utime */     0,0,0,0,0, \
/* flt */       0,0,0,0,0,0, \
/* swp */       0,0,0,0,0, \
/* rlimits */   INIT_RLIMITS, \
/* math */      0, \
/* comm */      "swapper", \
/* fs info */   0,NULL, \
/* ipc */       NULL, NULL, \
/* ldt */       NULL, \
/* tss */       INIT_TSS, \
/* fs */        &init_fs, \
/* files */     &init_files, \
/* mm */        &init_mm, \
/* signals */   &init_signals, \
}

extern struct   mm_struct init_mm;
extern struct task_struct init_task;
extern struct task_struct *task[NR_TASKS];
extern struct task_struct *last_task_used_math;
extern struct task_struct *current_set[NR_CPUS];
/*
 *      On a single processor system this comes out as current_set[0] when cpp
 *      has finished with it, which gcc will optimise away.
 */
#define current (0+current_set[smp_processor_id()])     /* Current on this processor */
extern unsigned long volatile jiffies;
extern unsigned long itimer_ticks;
extern unsigned long itimer_next;
extern struct timeval xtime;
extern int need_resched;
extern void do_timer(struct pt_regs *);

extern unsigned int * prof_buffer;
extern unsigned long prof_len;
extern unsigned long prof_shift;

extern int securelevel; /* system security level */

#define CURRENT_TIME (xtime.tv_sec)

extern void sleep_on(struct wait_queue ** p);
extern void interruptible_sleep_on(struct wait_queue ** p);
extern void wake_up(struct wait_queue ** p);
extern void wake_up_interruptible(struct wait_queue ** p);
extern void wake_up_process(struct task_struct * tsk);

extern void notify_parent(struct task_struct * tsk, int signal);
extern void force_sig(unsigned long sig,struct task_struct * p);
extern int send_sig(unsigned long sig,struct task_struct * p,int priv);
extern int in_group_p(gid_t grp);

extern int request_irq(unsigned int irq,
                       void (*handler)(int, void *, struct pt_regs *),
                       unsigned long flags, 
                       const char *device,
                       void *dev_id);
extern void free_irq(unsigned int irq, void *dev_id);

/*
 * This has now become a routine instead of a macro, it sets a flag if
 * it returns true (to do BSD-style accounting where the process is flagged
 * if it uses root privs). The implication of this is that you should do
 * normal permissions checks first, and check suser() last.
 */
extern inline int suser(void)
{
        if (current->euid == 0) {
                current->flags |= PF_SUPERPRIV;
                return 1;
        }
        return 0;
}

extern void copy_thread(int, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
extern void flush_thread(void);
extern void exit_thread(void);

extern void exit_mm(struct task_struct *);
extern void exit_fs(struct task_struct *);
extern void exit_files(struct task_struct *);
extern void exit_sighand(struct task_struct *);
extern void release_thread(struct task_struct *);

extern int do_execve(char *, char **, char **, struct pt_regs *);
extern int do_fork(unsigned long, unsigned long, struct pt_regs *);


/* See if we have a valid user level fd.
 * If it makes sense, return the file structure it references.
 * Otherwise return NULL.
 */
extern inline struct file *file_from_fd(const unsigned int fd)
{

        if (fd >= NR_OPEN)
                return NULL;
        /* either valid or null */
        return current->files->fd[fd];
}

The wait queues are circular lists implemented with the wait_queue data structure. You have to be very careful to keep them consistent, so use only the following functions to add and remove entries.
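
For orientation, here is the classic usage pattern in this kernel version (an illustration with made-up names, not kernel code): a NULL pointer denotes an empty wait queue, a sleeper blocks on it, and somebody else wakes it up later.

static struct wait_queue *demo_queue = NULL;
static volatile int data_ready = 0;

static void consumer(void)
{
        while (!data_ready)                     /* re-check after every wakeup */
                interruptible_sleep_on(&demo_queue);
        /* ... consume the data ... */
}

static void producer(void)
{
        data_ready = 1;
        wake_up_interruptible(&demo_queue);     /* wake everybody sleeping here */
}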

<sched.h>+= [<-D]
extern inline void __add_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
{
        struct wait_queue *head = *p;
        struct wait_queue *next = WAIT_QUEUE_HEAD(p);

        if (head)
                next = head;
        *p = wait;
        wait->next = next;
}

extern inline void add_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
{
        unsigned long flags;

        save_flags(flags);
        cli();
        __add_wait_queue(p, wait);
        restore_flags(flags);
}

extern inline void __remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
{
        struct wait_queue * next = wait->next;
        struct wait_queue * head = next;

        for (;;) {
                struct wait_queue * nextlist = head->next;
                if (nextlist == wait)
                        break;
                head = nextlist;
        }
        head->next = next;
}

extern inline void remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
{
        unsigned long flags;

        save_flags(flags);
        cli();
        __remove_wait_queue(p, wait);
        restore_flags(flags);
}

extern inline void select_wait(struct wait_queue ** wait_address, select_table * p)
{
        struct select_table_entry * entry;

        if (!p || !wait_address)
                return;
        if (p->nr >= __MAX_SELECT_TABLE_ENTRIES)
                return;
        entry = p->entry + p->nr;
        entry->wait_address = wait_address;
        entry->wait.task = current;
        entry->wait.next = NULL;
        add_wait_queue(wait_address,&entry->wait);
        p->nr++;
}

#define REMOVE_LINKS(p) do { unsigned long flags; \
        save_flags(flags) ; cli(); \
        (p)->next_task->prev_task = (p)->prev_task; \
        (p)->prev_task->next_task = (p)->next_task; \
        restore_flags(flags); \
        if ((p)->p_osptr) \
                (p)->p_osptr->p_ysptr = (p)->p_ysptr; \
        if ((p)->p_ysptr) \
                (p)->p_ysptr->p_osptr = (p)->p_osptr; \
        else \
                (p)->p_pptr->p_cptr = (p)->p_osptr; \
        } while (0)

#define SET_LINKS(p) do { unsigned long flags; \
        save_flags(flags); cli(); \
        (p)->next_task = &init_task; \
        (p)->prev_task = init_task.prev_task; \
        init_task.prev_task->next_task = (p); \
        init_task.prev_task = (p); \
        restore_flags(flags); \
        (p)->p_ysptr = NULL; \
        if (((p)->p_osptr = (p)->p_pptr->p_cptr) != NULL) \
                (p)->p_osptr->p_ysptr = p; \
        (p)->p_pptr->p_cptr = p; \
        } while (0)

#define for_each_task(p) \
        for (p = &init_task ; (p = p->next_task) != &init_task ; )

#endif /* __KERNEL__ */

#endif

File tasks.h

The file tasks.h simply defines some important values like NR_TASKS, which is the maximum number of processes that can exist concurrently in the system. If you need more tasks, this value may be changed, but be sure to recompile all files that depend on it.

<Definition of maximum number of tasks>= (U-> U->)
#define NR_TASKS        512
Defines NR_TASKS (links are to index).

The value of NR_CPUS is for symmetric multiprocessing (SMP) and is still experimental.

<tasks.h>=
#ifndef _LINUX_TASKS_H
#define _LINUX_TASKS_H

#ifdef __SMP__
#define NR_CPUS 32
#else
#define NR_CPUS 1
#endif

<Definition of maximum number of tasks>

#define MAX_TASKS_PER_USER (NR_TASKS/2)
#define MIN_TASKS_LEFT_FOR_ROOT 4

#endif

File sem.h

<sem.h>=
#ifndef _LINUX_SEM_H
#define _LINUX_SEM_H
#include <linux/ipc.h>

/* semop flags */
#define SEM_UNDO        0x1000  /* undo the operation on exit */

/* semctl Command Definitions. */
#define GETPID  11       /* get sempid */
#define GETVAL  12       /* get semval */
#define GETALL  13       /* get all semval's */
#define GETNCNT 14       /* get semncnt */
#define GETZCNT 15       /* get semzcnt */
#define SETVAL  16       /* set semval */
#define SETALL  17       /* set all semval's */

/* One semid data structure for each set of semaphores in the system. */
struct semid_ds {
  struct ipc_perm sem_perm;            /* permissions .. see ipc.h */
  time_t          sem_otime;           /* last semop time */
  time_t          sem_ctime;           /* last change time */
  struct sem      *sem_base;           /* ptr to first semaphore in array */
  struct sem_queue *sem_pending;       /* pending operations to be processed */
  struct sem_queue **sem_pending_last; /* last pending operation */
  struct sem_undo *undo;               /* undo requests on this array */
  ushort          sem_nsems;           /* no. of semaphores in array */
};

/* semop system calls takes an array of these. */
struct sembuf {
  ushort  sem_num;        /* semaphore index in array */
  short   sem_op;         /* semaphore operation */
  short   sem_flg;        /* operation flags */
};

/* arg for semctl system calls. */
union semun {
  int val;                      /* value for SETVAL */
  struct semid_ds *buf;         /* buffer for IPC_STAT & IPC_SET */
  ushort *array;                /* array for GETALL & SETALL */
  struct seminfo *__buf;        /* buffer for IPC_INFO */
  void *__pad;
};

struct  seminfo {
    int semmap;
    int semmni;
    int semmns;
    int semmnu;
    int semmsl;
    int semopm;
    int semume;
    int semusz;
    int semvmx;
    int semaem;
};

#define SEMMNI  128             /* ?  max # of semaphore identifiers */
#define SEMMSL  32              /* <= 512 max num of semaphores per id */
#define SEMMNS  (SEMMNI*SEMMSL) /* ? max # of semaphores in system */
#define SEMOPM  32              /* ~ 100 max num of ops per semop call */
#define SEMVMX  32767           /* semaphore maximum value */

/* unused */
#define SEMUME  SEMOPM          /* max num of undo entries per process */
#define SEMMNU  SEMMNS          /* num of undo structures system wide */
#define SEMAEM  (SEMVMX >> 1)   /* adjust on exit max value */
#define SEMMAP  SEMMNS          /* # of entries in semaphore map */
#define SEMUSZ  20              /* sizeof struct sem_undo */

#ifdef __KERNEL__

/* One semaphore structure for each semaphore in the system. */
struct sem {
  short   semval;         /* current value */
  short   sempid;         /* pid of last operation */
};

/* ipcs ctl cmds */
#define SEM_STAT 18
#define SEM_INFO 19

/* One queue for each semaphore set in the system. */
struct sem_queue {
    struct sem_queue *  next;    /* next entry in the queue */
    struct sem_queue ** prev;    /* previous entry in the queue, *(q->prev) == q */
    struct wait_queue * sleeper; /* sleeping process */
    struct sem_undo *   undo;    /* undo structure */
    int                 pid;     /* process id of requesting process */
    int                 status;  /* completion status of operation */
    struct semid_ds *   sma;     /* semaphore array for operations */
    struct sembuf *     sops;    /* array of pending operations */
    int                 nsops;   /* number of operations */
};

/* Each task has a list of undo requests. They are executed automatically
 * when the process exits.
 */
struct sem_undo {
    struct sem_undo *  proc_next; /* next entry on this process */
    struct sem_undo *  id_next;   /* next entry on this semaphore set */
    int                semid;     /* semaphore set identifier */
    short *            semadj;    /* array of adjustments, one per semaphore */
};

asmlinkage int sys_semget (key_t key, int nsems, int semflg);
asmlinkage int sys_semop (int semid, struct sembuf *sops, unsigned nsops);
asmlinkage int sys_semctl (int semid, int semnum, int cmd, union semun arg);

#endif /* __KERNEL__ */

#endif /* _LINUX_SEM_H */

File semaphore.h

This is the header file for semaphores, but it's the Intel architecture version. I don't know where the 68000 version is (if any).

<asm/semaphore.h>= [D->]
#ifndef _I386_SEMAPHORE_H
#define _I386_SEMAPHORE_H

#include <linux/linkage.h>
#include <asm/system.h>

/*
 * SMP- and interrupt-safe semaphores..
 *
 * (C) Copyright 1996 Linus Torvalds
 *
 * Modified 1996-12-23 by Dave Grothe <dave@gcom.com> to fix bugs in
 *                     the original code and to make semaphore waits
 *                     interruptible so that processes waiting on
 *                     semaphores can be killed.
 *
 * If you would like to see an analysis of this implementation, please
 * ftp to gcom.com and download the file
 * /pub/linux/src/semaphore/semaphore-2.0.24.tar.gz.
 *
 */

<Definition of the semaphore structure>= (U-> U->)
struct semaphore {
        int count;
        int waking;
        int lock ;                      /* to make waking testing atomic */
        struct wait_queue * wait;
};
Defines semaphore, %use, wait_queue (links are to index).

<Definition of MUTEX and friends>= (U-> U->)
#define MUTEX ((struct semaphore) { 1, 0, 0, NULL })
#define MUTEX_LOCKED ((struct semaphore) { 0, 0, 0, NULL })
Defines MUTEX, MUTEX_LOCKED, semaphore (links are to index).

<asm/semaphore.h>+= [<-D]
<Definition of the semaphore structure>
<Definition of MUTEX and friends>

asmlinkage void down_failed(void /* special register calling convention */);
asmlinkage void up_wakeup(void /* special register calling convention */);

extern void __down(struct semaphore * sem);
extern void __up(struct semaphore * sem);

/*
 * This is ugly, but we want the default case to fall through.
 * "down_failed" is a special asm handler that calls the C
 * routine that actually waits. See arch/i386/lib/semaphore.S
 */
extern inline void down(struct semaphore * sem)
{
        __asm__ __volatile__(
                "# atomic down operation\n\t"
                "movl $1f,%%eax\n\t"
#ifdef __SMP__
                "lock ; "
#endif
                "decl 0(%0)\n\t"
                "js " SYMBOL_NAME_STR(down_failed) "\n"
                "1:\n"
                :/* no outputs */
                :"c" (sem)
                :"ax","dx","memory");
}

/*
 * Primitives to spin on a lock.  Needed only for SMP version.
 */
extern inline void get_buzz_lock(int *lock_ptr)
{
#ifdef __SMP__
        while (xchg(lock_ptr,1) != 0) ;
#endif
} /* get_buzz_lock */

extern inline void give_buzz_lock(int *lock_ptr)
{
#ifdef __SMP__
        *lock_ptr = 0 ;
#endif
} /* give_buzz_lock */

asmlinkage int down_failed_interruptible(void);  /* params in registers */

/*
 * This version waits in interruptible state so that the waiting
 * process can be killed.  The down_failed_interruptible routine
 * returns negative for signalled and zero for semaphore acquired.
 */
extern inline int down_interruptible(struct semaphore * sem)
{
        int     ret ;

        __asm__ __volatile__(
                "# atomic interruptible down operation\n\t"
                "movl $2f,%%eax\n\t"
#ifdef __SMP__
                "lock ; "
#endif
                "decl 0(%1)\n\t"
                "js " SYMBOL_NAME_STR(down_failed_interruptible) "\n\t"
                "xorl %%eax,%%eax\n"
                "2:\n"
                :"=a" (ret)
                :"c" (sem)
                :"ax","dx","memory");

        return(ret) ;
}

/*
 * Note! This is subtle. We jump to wake people up only if
 * the semaphore was negative (== somebody was waiting on it).
 * The default case (no contention) will result in NO
 * jumps for both down() and up().
 */
extern inline void up(struct semaphore * sem)
{
        __asm__ __volatile__(
                "# atomic up operation\n\t"
                "movl $1f,%%eax\n\t"
#ifdef __SMP__
                "lock ; "
#endif
                "incl 0(%0)\n\t"
                "jle " SYMBOL_NAME_STR(up_wakeup)
                "\n1:"
                :/* no outputs */
                :"c" (sem)
                :"ax", "dx", "memory");
}

#endif

File sched.c

Here's sched.c; it is the main kernel file. It contains the scheduling primitives (sleep_on, wake_up, schedule, etc.) as well as a number of simple system calls (such as getpid()) which just extract a field from the process table entry of the currently running task.

<sched.c>= [D->]
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/timer.h>
#include <linux/kernel.h>
#include <linux/kernel_stat.h>
#include <linux/fdreg.h>
#include <linux/errno.h>
#include <linux/time.h>
#include <linux/ptrace.h>
#include <linux/delay.h>
#include <linux/interrupt.h>
#include <linux/tqueue.h>
#include <linux/resource.h>
#include <linux/mm.h>
#include <linux/smp.h>

#include <asm/system.h>
#include <asm/io.h>
#include <asm/segment.h>
#include <asm/pgtable.h>
#include <asm/mmu_context.h>

#include <linux/timex.h>

Here are a few kernel variables.

<sched.c>+= [<-D->]
int securelevel = 0;                    /* system security level */

long tick = (1000000 + HZ/2) / HZ;      /* timer interrupt period */
volatile struct timeval xtime;          /* The current time */
int tickadj = 500/HZ ? 500/HZ : 1;      /* microsecs */

DECLARE_TASK_QUEUE(tq_timer);
DECLARE_TASK_QUEUE(tq_immediate);
DECLARE_TASK_QUEUE(tq_scheduler);

/*
 * phase-lock loop variables
 */
/* TIME_ERROR prevents overwriting the CMOS clock */
int time_state = TIME_ERROR;    /* clock synchronization status */
int time_status = STA_UNSYNC;   /* clock status bits */
long time_offset = 0;           /* time adjustment (us) */
long time_constant = 2;         /* pll time constant */
long time_tolerance = MAXFREQ;  /* frequency tolerance (ppm) */
long time_precision = 1;        /* clock precision (us) */
long time_maxerror = NTP_PHASE_LIMIT;   /* maximum error (us) */
long time_esterror = NTP_PHASE_LIMIT;   /* estimated error (us) */
long time_phase = 0;            /* phase offset (scaled us) */
long time_freq = ((1000000 + HZ/2) % HZ - HZ/2) << SHIFT_USEC;  /* frequency offset (scaled ppm) */
long time_adj = 0;              /* tick adjust (scaled 1 / HZ) */
long time_reftime = 0;          /* time at last adjustment (s) */

long time_adjust = 0;
long time_adjust_step = 0;

int need_resched = 0;
unsigned long event = 0;

extern int _setitimer(int, struct itimerval *, struct itimerval *);
unsigned int * prof_buffer = NULL;
unsigned long prof_len = 0;
unsigned long prof_shift = 0;

#define _S(nr) (1<<((nr)-1))

extern void mem_use(void);
extern unsigned long get_wchan(struct task_struct *);

static unsigned long init_kernel_stack[1024] = { STACK_MAGIC, };
unsigned long init_user_stack[1024] = { STACK_MAGIC, };
static struct vm_area_struct init_mmap = INIT_MMAP;
static struct fs_struct init_fs = INIT_FS;
static struct files_struct init_files = INIT_FILES;
static struct signal_struct init_signals = INIT_SIGNALS;

struct mm_struct init_mm = INIT_MM;
struct task_struct init_task = INIT_TASK;

unsigned long volatile jiffies=0;

struct task_struct *current_set[NR_CPUS];
struct task_struct *last_task_used_math = NULL;

Here's the actual process table. The task_struct is defined in sched.h. Slot 0 is filled with an ``empty'' task, used to keep track of the run queue for example (e.g., init_task.prev_run is the tail of the run queue).

The kernel_stat is a data structure (defined in kernel_stat.h) used to collect some kernel statistics (e.g., cpu usage, context switches). This data is used for example by the rstatd kernel statistics daemon (try the commands rup or ruptime).

<Definition of the process table>= (U-> U->)
struct task_struct * task[NR_TASKS] = {&init_task, };
struct kernel_stat kstat = { 0 };
Defines init_task, kstat, NR_TASKS, task, %use (links are to index).

<sched.c>+= [<-D->]
<Definition of the process table>

Function add_to_runqueue

I've removed the SMP code from the following functions to make them more readable.

The function add_to_runqueue takes a process and tries to add it to the ``ready to run'' queue. Remember that the run queue is implemented with the two pointers next_run and prev_run in the process table. If either the forward or the backward pointer of the process under consideration is non-null, it is assumed to already be on the run queue (this is the sanity test).

need_resched means ``rescheduling is needed''. counter is the remaining time slice (in clock ticks) of a process.

After the sanity test, both the forward and the backward pointer should be NULL. The number of running processes is incremented and p is added to the back of the run queue (remember that init_task is process table entry 0). Consider the line

(p->prev_run = init_task.prev_run)->next_run = p;

First, init_task.prev_run (the tail of the run queue) is assigned to prev_run of p. The value of that assignment expression is still the old tail of the run queue, so its next_run is set to p. Now p has been appended at the tail of the run queue. The next two assignments fix up the remaining pointers between p and init_task.

init_task is basically INIT_TASK (the initial values of a task).
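
For clarity, the pointer juggling above can be rewritten step by step (an equivalent sketch of my own, not the kernel code):

struct task_struct *tail = init_task.prev_run;  /* old tail of the run queue    */
p->prev_run = tail;                             /* p's predecessor is that tail */
tail->next_run = p;                             /* old tail now points to p     */
p->next_run = &init_task;                       /* p becomes the new tail ...   */
init_task.prev_run = p;                         /* ... and the head records it  */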

<Function to add a task to the run queue>= (U-> U->)
static inline void add_to_runqueue(struct task_struct * p)
{
        <sanity test before adding: already on run queue?>

        if (p->policy != SCHED_OTHER || p->counter > current->counter + 3)
                need_resched = 1;
        nr_running++;
        (p->prev_run = init_task.prev_run)->next_run = p;
        p->next_run = &init_task;
        init_task.prev_run = p;
}
Defines add_to_runqueue, counter, current, init_task, need_resched, next_run, nr_running, policy, prev_run, SCHED_OTHER, %use (links are to index).

<sched.c>+= [<-D->]
<Function to add a task to the run queue>

printk is declared in kernel.h.

<sanity test before adding: already on run queue?>= (<-U)
#if 1   /* sanity tests */
        if (p->next_run || p->prev_run) {
                printk("task already on run-queue\n");
                return;
        }
#endif

Function del_from_runqueue

Care must be taken that the idle task (task 0) is not removed from the run queue. The number of runnable processes is decremented and the pointer structure of the run queue is fixed up.

<Function to remove a task from the runqueue>= (U-> U->)
static inline void del_from_runqueue(struct task_struct * p)
{
        struct task_struct *next = p->next_run;
        struct task_struct *prev = p->prev_run;
        <sanity test before deleting: task not on run queue?>
        if (p == &init_task) {
                static int nr = 0;
                if (nr < 5) {
                        nr++;  printk("idle task may not sleep\n");
                }
                return;
        }
        nr_running--;
        next->prev_run = prev;
        prev->next_run = next;
        p->next_run = NULL;
        p->prev_run = NULL;
}
Defines del_from_runqueue, init_task, next_run, nr_running, prev_run, printk, %use (links are to index).

<sched.c>+= [<-D->]
<Function to remove a task from the runqueue>

<sanity test before deleting: task not on run queue?>= (<-U)
#if 1   /* sanity tests */
        if (!next || !prev) {
                printk("task not on run-queue\n");
                return;
        }
#endif

Function move_last_runqueue

This function moves a given task to the end of the run queue. First it is removed from the list and then added to the back of the list in much the same way as in the previous two functions.

<Function to move a task to the end of the run queue>= (U-> U->)
static inline void move_last_runqueue(struct task_struct * p)
{
        struct task_struct *next = p->next_run;
        struct task_struct *prev = p->prev_run;

        next->prev_run = prev;
        prev->next_run = next;

        p->next_run = &init_task;
        prev = init_task.prev_run;
        init_task.prev_run = p;
        p->prev_run = prev;
        prev->next_run = p;
}

<sched.c>+= [<-D->]
<Function to move a task to the end of the run queue>

Function wake_up_process

This function puts a given process on the run queue if it's not already there. cli() is the interrupt-disabling inline-assembly macro defined in asm/system.h above.

Since the current process is always on the run-queue (except when the actual re-schedule is in progress), you're allowed to do the simpler current->state = TASK_RUNNING to mark yourself runnable without the overhead of this (only if you are the current process!).

<Function to wake up a process>= (U-> U->)
inline void wake_up_process(struct task_struct * p)
{
        unsigned long flags;

        save_flags(flags);
        cli();
        p->state = TASK_RUNNING;
        if (!p->next_run)
                add_to_runqueue(p);
        restore_flags(flags);
}

<sched.c>+= [<-D->]
<Function to wake up a process>

Function process_timeout

<sched.c>+= [<-D->]
static void process_timeout(unsigned long __data)
{
        struct task_struct * p = (struct task_struct *) __data;

        p->timeout = 0;
        wake_up_process(p);
}

Function goodness

This is the function that decides how desirable a process is. You can weigh different processes against each other depending on what CPU they've run on lately etc to try to handle cache and TLB miss penalties.

The return values are: -1000 (never select this), 0 (out of time, recalculate counters, but it might still be selected), +v (``goodness'' value, the larger, the better), > +1000 (realtime process, select this).

SMP bits have again been removed.

<Function to calculate goodness of a process>= (U-> U->)
static inline int goodness(struct task_struct * p, struct task_struct * prev, int this_cpu)
{
        int weight;
        <handle realtime processes>
        <calculate goodness according to remaining clock ticks>
        return weight;
}
Defines goodness, weight (links are to index).

<sched.c>+= [<-D->]
<Function to calculate goodness of a process>

``Realtime processes'' are those whose policy is not SCHED_OTHER. They have high priority: each of them gets a goodness of 1000 plus its real-time priority, so normally the first one found on the run queue is selected. Note that another real-time process may still overtake it if it has a higher rt_priority.

<handle realtime processes>= (<-U)
if (p->policy != SCHED_OTHER)
        return 1000 + p->rt_priority;

Give the process a first-approximation goodness value according to the number of clock ticks it has left; a slight advantage is given to the currently running process.

The original comment says: ``Don't do any other calculations if the time slice is over...'' This refers to the if (weight) test below: when the process has used up its time slice (counter == 0), the extra adjustments (here the +1 bonus for the previous process; on SMP also a bonus for staying on the same CPU) are simply skipped.

<calculate goodness according to remaining clock ticks>= (<-U)
weight = p->counter;
if (weight) {
        if (p == prev)
                weight += 1;
}

This looks funny but is the 68000 version of a deadlock prevention fix by Leonard N. Zubkoff (see original file for details).

<sched.c>+= [<-D->]
void allow_interrupts(void) {}

Function schedule

schedule() is the scheduler function. It's a very simple and nice scheduler: it's not perfect, but certainly works for most things. There's one goto in there which is ``interesting''. Note that task 0 is the `idle' task, which gets called when no other tasks can run. It cannot be killed, and it cannot sleep. The `state' information in task[0] is never used.

<sched.c>+= [<-D->]
asmlinkage void schedule(void)
{
        int c;
        struct task_struct * p;
        struct task_struct * prev, * next;
        unsigned long timeout = 0;
        int this_cpu=smp_processor_id();

        <check alarm, wake up any interruptible tasks that have got a signal>
        <move an exhausted RR process to be last>
        <SMP: definition of idle task>
        <the scheduler proper>
        return;

scheduling_in_interrupt:
        printk("Aiee: scheduling in interrupt %p\n",
                __builtin_return_address(0));
}

The function allow_interrupts is empty if we're not doing SMP on an Intel architecture.

run_task_queue is defined in tqueue.h.

<check alarm, wake up any interruptible tasks that have got a signal>= (<-U)
allow_interrupts();
if (intr_count)
        goto scheduling_in_interrupt;
if (bh_active & bh_mask) {
        intr_count = 1;
        do_bottom_half();
        intr_count = 0;
}
run_task_queue(&tq_scheduler);

<move an exhausted RR process to be last>= (<-U)
need_resched = 0;
prev = current;
cli();
if (!prev->counter && prev->policy == SCHED_RR) {
        prev->counter = prev->priority;
        move_last_runqueue(prev);
}
switch (prev->state) {
        case TASK_INTERRUPTIBLE:
            if (prev->signal & ~prev->blocked)
                    goto makerunnable;
            timeout = prev->timeout;
            if (timeout && (timeout <= jiffies)) {
                    prev->timeout = 0;
                    timeout = 0;
                makerunnable:
                    prev->state = TASK_RUNNING;
                    break;
            }
        default:
            del_from_runqueue(prev);
        case TASK_RUNNING:
}
p = init_task.next_run;
sti();

This is the actual scheduler. It scans through the run queue and selects the process with the highest ``goodness''. By default, the next process to run is the idle task.

If the best goodness found is 0, i.e., all runnable processes have used up their counters, the counters of all tasks are recalculated (this is the if (!c) bit).

A context switch is only performed if the next process is really different from that currently running (prev!=next).

Note that there may appear new tasks on the run-queue during this, as interrupts are enabled. However, they will be put on front of the list, so our list starting at p is essentially fixed.

<the scheduler proper>= (<-U U->)
c = -1000;     next = idle_task; /* task[0] */
while (p != &init_task) {
        int weight = goodness(p, prev, this_cpu);
        if (weight > c)  c = weight, next = p;
        p = p->next_run;
}
if (!c) {
        for_each_task(p)  p->counter = (p->counter >> 1) + p->priority;
}
if (prev != next) {
        struct timer_list timer;
        kstat.context_swtch++;
        if (timeout) {
                init_timer(&timer);
                timer.expires = timeout;
                timer.data = (unsigned long) prev;
                timer.function = process_timeout;
                add_timer(&timer);
        }
        get_mmu_context(next);
        switch_to(prev,next);
        if (timeout)  del_timer(&timer);
}

<SMP: definition of idle task>= (<-U)
#ifdef __SMP__
        /*
         *      This is safe as we do not permit re-entry of schedule()
         */
        prev->processor = NO_PROC_ID;
#define idle_task (task[cpu_number_map[this_cpu]])
#else
#define idle_task (&init_task)
#endif  

<sched.c>+= [<-D->]
#ifndef __alpha__

/*
 * For backwards compatibility?  This can be done in libc so Alpha
 * and all newer ports shouldn't need it.
 */
asmlinkage int sys_pause(void)
{
        current->state = TASK_INTERRUPTIBLE;
        schedule();
        return -ERESTARTNOHAND;
}

#endif

Function wake_up

wake_up doesn't wake up stopped processes; they have to be awakened with signals or similar.

Note that this doesn't need cli/sti pairs: interrupts may not change the wait-queue structures directly, but only call wake_up() to wake a process. The process itself must remove the queue once it has woken.

<sched.c>+= [<-D->]
void wake_up(struct wait_queue **q)
{
        struct wait_queue *next;
        struct wait_queue *head;

        if (!q || !(next = *q))
                return;
        head = WAIT_QUEUE_HEAD(q);
        while (next != head) {
                struct task_struct *p = next->task;
                next = next->next;
                if (p != NULL) {
                        if ((p->state == TASK_UNINTERRUPTIBLE) ||
                            (p->state == TASK_INTERRUPTIBLE))
                                wake_up_process(p);
                }
                if (!next)
                        goto bad;
        }
        return;
bad:
        printk("wait_queue is bad (eip = %p)\n",
                __builtin_return_address(0));
        printk("        q = %p\n",q);
        printk("       *q = %p\n",*q);
}

Function wake_up_interruptible

<sched.c>+= [<-D->]
void wake_up_interruptible(struct wait_queue **q)
{
        struct wait_queue *next;
        struct wait_queue *head;

        if (!q || !(next = *q))
                return;
        head = WAIT_QUEUE_HEAD(q);
        while (next != head) {
                struct task_struct *p = next->task;
                next = next->next;
                if (p != NULL) {
                        if (p->state == TASK_INTERRUPTIBLE)
                                wake_up_process(p);
                }
                if (!next)
                        goto bad;
        }
        return;
bad:
        printk("wait_queue is bad (eip = %p)\n",
                __builtin_return_address(0));
        printk("        q = %p\n",q);
        printk("       *q = %p\n",*q);
}

Implementation of Semaphores

Semaphores are implemented using a two-way counter: The count variable is decremented for each process that tries to sleep, while the waking variable is incremented when the up() code goes to wake up waiting processes.

Notably, the inline up() and down() functions can efficiently test whether they need to do any extra work (up() needs to do something only if count was negative before the increment operation).

The following routine must execute atomically.

<sched.c>+= [<-D->]
static inline int waking_non_zero(struct semaphore *sem)
{
        int     ret ;
        long    flags ;

        get_buzz_lock(&sem->lock) ;
        save_flags(flags) ;
        cli() ;

        if ((ret = (sem->waking > 0)))
                sem->waking-- ;

        restore_flags(flags) ;
        give_buzz_lock(&sem->lock) ;
        return(ret) ;
}
Defines get_buzz_lock, give_buzz_lock, restore_flags, save_flags, %use, waking_non_zero (links are to index).

When __up() is called, the count was negative before incrementing it, and we need to wake up somebody.

This routine adds one to the count of processes that need to wake up and exit. All waiting processes actually wake up but only the one that gets to the waking field first will gate through and acquire the semaphore. The others will go back to sleep.

Note that these functions are only called when there is contention on the lock, and as such all this is the ``non-critical'' part of the whole semaphore business. The critical part is the inline stuff in <asm/semaphore.h>, where we want to avoid any extra jumps and calls.

<sched.c>+= [<-D->]
void __up(struct semaphore *sem)
{
        atomic_inc(&sem->waking) ;
        wake_up(&sem->wait);
}

Perform the down function. Return zero for semaphore acquired, return negative for signalled out of the function.

If called from __down, the return is ignored and the wait loop is not interruptible. This means that a task waiting on a semaphore using down() cannot be killed until someone does an up() on the semaphore.

If called from __down_interruptible, the return value gets checked upon return. If the return value is negative then the task continues with the negative value in the return register (it can be tested by the caller).

Either form may be used in conjunction with up().
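
A short usage sketch (my own illustration with made-up names, not kernel code) of the interruptible form; the caller must check the return value and back out if a signal interrupted the wait:

static struct semaphore demo_sem = MUTEX;

static int demo_operation(void)
{
        if (down_interruptible(&demo_sem))
                return -EINTR;          /* interrupted by a signal, give up */
        /* ... critical section protected by the semaphore ... */
        up(&demo_sem);
        return 0;
}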

<sched.c>+= [<-D->]
int __do_down(struct semaphore * sem, int task_state)
{
        struct task_struct *tsk = current;
        struct wait_queue wait = { tsk, NULL };
        int               ret = 0 ;

        tsk->state = task_state;
        add_wait_queue(&sem->wait, &wait);

        /*
         * Ok, we're set up.  sem->count is known to be less than zero
         * so we must wait.
         *
         * We can let go the lock for purposes of waiting.
         * We re-acquire it after awaking so as to protect
         * all semaphore operations.
         *
         * If "up()" is called before we call waking_non_zero() then
         * we will catch it right away.  If it is called later then
         * we will have to go through a wakeup cycle to catch it.
         *
         * Multiple waiters contend for the semaphore lock to see
         * who gets to gate through and who has to wait some more.
         */
        for (;;)
        {
                if (waking_non_zero(sem))       /* are we waking up?  */
                    break ;                     /* yes, exit loop */

                if (   task_state == TASK_INTERRUPTIBLE
                    && (tsk->signal & ~tsk->blocked)    /* signalled */
                   )
                {
                    ret = -EINTR ;              /* interrupted */
                    atomic_inc(&sem->count) ;   /* give up on down operation */
                    break ;
                }

                schedule();
                tsk->state = task_state;
        }

        tsk->state = TASK_RUNNING;
        remove_wait_queue(&sem->wait, &wait);
        return(ret) ;

} /* __do_down */

<sched.c>+= [<-D->]
void __down(struct semaphore * sem)
{
        __do_down(sem,TASK_UNINTERRUPTIBLE) ; 
}

int __down_interruptible(struct semaphore * sem)
{
        return(__do_down(sem,TASK_INTERRUPTIBLE)) ; 
}

<Generic sleep routine for semaphores>= (U-> U->)
static inline void __sleep_on(struct wait_queue **p, int state)
{
        unsigned long flags;
        struct wait_queue wait = { current, NULL };

        if (!p)
                return;
        if (current == task[0])
                panic("task[0] trying to sleep");
        current->state = state;
        save_flags(flags);
        cli();
        __add_wait_queue(p, &wait);
        sti();
        schedule();
        cli();
        __remove_wait_queue(p, &wait);
        restore_flags(flags);
}
Defines __add_wait_queue, cli, __remove_wait_queue, restore_flags, save_flags, schedule, __sleep_on, state, sti, task[0], %use (links are to index).

<sched.c>+= [<-D->]
<Generic sleep routine for semaphores>
void interruptible_sleep_on(struct wait_queue **p)
{
        __sleep_on(p,TASK_INTERRUPTIBLE);
}

void sleep_on(struct wait_queue **p)
{
        __sleep_on(p,TASK_UNINTERRUPTIBLE);
}
Defines interruptible_sleep_on, __sleep_on, sleep_on, TASK_INTERRUPTIBLE, TASK_UNINTERRUPTIBLE, %use (links are to index).

Timers
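
The timers are kept in a hierarchy of vectors: tv1 has TVR_SIZE (256) slots and holds every timer that expires within the next 256 ticks; tv2 to tv5 each have TVN_SIZE (64) slots and cover progressively coarser ranges. Whenever tv1 wraps around, cascade_timers() pulls the due slot of the coarser vectors down one level. The following stand-alone sketch (not kernel code) only mirrors the bucket selection done by internal_add_timer() below; the expired-timer and 64-bit special cases are omitted:

#include <stdio.h>

#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_MASK ((1 << TVN_BITS) - 1)
#define TVR_MASK ((1 << TVR_BITS) - 1)

/* Print which vector and slot a timer lands in, given its expiry time
 * and the current timer_jiffies -- mirrors internal_add_timer(). */
static void pick_bucket(unsigned long expires, unsigned long timer_jiffies)
{
        unsigned long idx = expires - timer_jiffies;

        if (idx < (1UL << TVR_BITS))
                printf("tv1 slot %lu\n", expires & TVR_MASK);
        else if (idx < (1UL << (TVR_BITS + TVN_BITS)))
                printf("tv2 slot %lu\n", (expires >> TVR_BITS) & TVN_MASK);
        else if (idx < (1UL << (TVR_BITS + 2 * TVN_BITS)))
                printf("tv3 slot %lu\n", (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK);
        else if (idx < (1UL << (TVR_BITS + 3 * TVN_BITS)))
                printf("tv4 slot %lu\n", (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK);
        else
                printf("tv5 slot %lu\n", (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK);
}

int main(void)
{
        pick_bucket(100, 0);    /* tv1 slot 100  (due within 256 ticks) */
        pick_bucket(1000, 0);   /* tv2 slot 3    (1000 >> 8 == 3)       */
        pick_bucket(70000, 0);  /* tv3 slot 4    (70000 >> 14 == 4)     */
        return 0;
}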

<sched.c>+= [<-D->]
#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_SIZE (1 << TVN_BITS)
#define TVR_SIZE (1 << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)

#define SLOW_BUT_DEBUGGING_TIMERS 0

struct timer_vec {
        int index;
        struct timer_list *vec[TVN_SIZE];
};

struct timer_vec_root {
        int index;
        struct timer_list *vec[TVR_SIZE];
};

static struct timer_vec tv5 = { 0 };
static struct timer_vec tv4 = { 0 };
static struct timer_vec tv3 = { 0 };
static struct timer_vec tv2 = { 0 };
static struct timer_vec_root tv1 = { 0 };

static struct timer_vec * const tvecs[] = {
        (struct timer_vec *)&tv1, &tv2, &tv3, &tv4, &tv5
};

#define NOOF_TVECS (sizeof(tvecs) / sizeof(tvecs[0]))

static unsigned long timer_jiffies = 0;

static inline void insert_timer(struct timer_list *timer,
                                struct timer_list **vec, int idx)
{
        if ((timer->next = vec[idx]))
                vec[idx]->prev = timer;
        vec[idx] = timer;
        timer->prev = (struct timer_list *)&vec[idx];
}

static inline void internal_add_timer(struct timer_list *timer)
{
        /*
         * must be cli-ed when calling this
         */
        unsigned long expires = timer->expires;
        unsigned long idx = expires - timer_jiffies;

        if (idx < TVR_SIZE) {
                int i = expires & TVR_MASK;
                insert_timer(timer, tv1.vec, i);
        } else if (idx < 1 << (TVR_BITS + TVN_BITS)) {
                int i = (expires >> TVR_BITS) & TVN_MASK;
                insert_timer(timer, tv2.vec, i);
        } else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS)) {
                int i = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
                insert_timer(timer, tv3.vec, i);
        } else if (idx < 1 << (TVR_BITS + 3 * TVN_BITS)) {
                int i = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
                insert_timer(timer, tv4.vec, i);
        } else if (expires < timer_jiffies) {
                /* can happen if you add a timer with expires == jiffies,
                 * or you set a timer to go off in the past
                 */
                insert_timer(timer, tv1.vec, tv1.index);
        } else if (idx < 0xffffffffUL) {
                int i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
                insert_timer(timer, tv5.vec, i);
        } else {
                /* Can only get here on architectures with 64-bit jiffies */
                timer->next = timer->prev = timer;
        }
}

void add_timer(struct timer_list *timer)
{
        unsigned long flags;
        save_flags(flags);
        cli();
#if SLOW_BUT_DEBUGGING_TIMERS
        if (timer->next || timer->prev) {
                printk("add_timer() called with non-zero list from %p\n",
                       __builtin_return_address(0));
                goto out;
        }
#endif
        internal_add_timer(timer);
#if SLOW_BUT_DEBUGGING_TIMERS
out:
#endif
        restore_flags(flags);
}

static inline int detach_timer(struct timer_list *timer)
{
        int ret = 0;
        struct timer_list *next, *prev;
        next = timer->next;
        prev = timer->prev;
        if (next) {
                next->prev = prev;
        }
        if (prev) {
                ret = 1;
                prev->next = next;
        }
        return ret;
}


int del_timer(struct timer_list * timer)
{
        int ret;
        unsigned long flags;
        save_flags(flags);
        cli();
        ret = detach_timer(timer);
        timer->next = timer->prev = 0;
        restore_flags(flags);
        return ret;
}

static inline void cascade_timers(struct timer_vec *tv)
{
        /* cascade all the timers from tv up one level */
        struct timer_list *timer;
        timer = tv->vec[tv->index];
        /*
         * We are removing _all_ timers from the list, so we don't  have to
         * detach them individually, just clear the list afterwards.
         */
        while (timer) {
                struct timer_list *tmp = timer;
                timer = timer->next;
                internal_add_timer(tmp);
        }
        tv->vec[tv->index] = NULL;
        tv->index = (tv->index + 1) & TVN_MASK;
}

static inline void run_timer_list(void)
{
        cli();
        while ((long)(jiffies - timer_jiffies) >= 0) {
                struct timer_list *timer;
                if (!tv1.index) {
                        int n = 1;
                        do {
                                cascade_timers(tvecs[n]);
                        } while (tvecs[n]->index == 1 && ++n < NOOF_TVECS);
                }
                while ((timer = tv1.vec[tv1.index])) {
                        void (*fn)(unsigned long) = timer->function;
                        unsigned long data = timer->data;
                        detach_timer(timer);
                        timer->next = timer->prev = NULL;
                        sti();
                        fn(data);
                        cli();
                }
                ++timer_jiffies; 
                tv1.index = (tv1.index + 1) & TVR_MASK;
        }
        sti();
}

static inline void run_old_timers(void)
{
        struct timer_struct *tp;
        unsigned long mask;

        for (mask = 1, tp = timer_table+0 ; mask ; tp++,mask += mask) {
                if (mask > timer_active)
                        break;
                if (!(mask & timer_active))
                        continue;
                if (tp->expires > jiffies)
                        continue;
                timer_active &= ~mask;
                tp->fn();
                sti();
        }
}

void tqueue_bh(void)
{
        run_task_queue(&tq_timer);
}

void immediate_bh(void)
{
        run_task_queue(&tq_immediate);
}

unsigned long timer_active = 0;
struct timer_struct timer_table[32];

A location to store the last three load averages (over 1, 5 and 15 minutes), kept as fixed-point numbers.

<sched.c>+= [<-D->]
unsigned long avenrun[3] = { 0,0,0 };

Function count_active_tasks

Counts the number of active tasks; the result is a fixed-point number (a multiple of FIXED_1).

<sched.c>+= [<-D->]
static unsigned long count_active_tasks(void)
{
        struct task_struct **p;
        unsigned long nr = 0;

        for(p = &LAST_TASK; p > &FIRST_TASK; --p)
                if (*p && ((*p)->state == TASK_RUNNING ||
                           (*p)->state == TASK_UNINTERRUPTIBLE ||
                           (*p)->state == TASK_SWAPPING))
                        nr += FIXED_1;
        <SMP: adjust active count for multiple processors>
        return nr;
}

<SMP: adjust active count for multiple processors>= (<-U)
#ifdef __SMP__
        nr-=(smp_num_cpus-1)*FIXED_1;
#endif                  

Function calc_load

This function uses the number of active tasks to update the three load averages (over 1, 5 and 15 minutes) by exponential smoothing.

<sched.c>+= [<-D->]
static inline void calc_load(unsigned long ticks)
{
        unsigned long active_tasks; /* fixed-point */
        static int count = LOAD_FREQ;

        count -= ticks;
        if (count < 0) {
                count += LOAD_FREQ;
                active_tasks = count_active_tasks();
                CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                CALC_LOAD(avenrun[2], EXP_15, active_tasks);
        }
}
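
The CALC_LOAD macro and the EXP_* constants come from <linux/sched.h>; the macro implements an exponentially decaying average in fixed-point arithmetic. Here is a minimal user-space sketch of the same computation (constants as I recall them from the 2.0 headers: FSHIFT is 11, so FIXED_1 is 2048, and EXP_1 is 1884):

#include <stdio.h>

#define FSHIFT   11                     /* bits of fractional precision */
#define FIXED_1  (1 << FSHIFT)          /* 1.0 in fixed point */
#define EXP_1    1884                   /* exp(-5s/1min) * FIXED_1 */

/* new_load = old_load * e + active * (1 - e), all fixed point;
 * this is what CALC_LOAD(load, exp, n) does in the kernel. */
static unsigned long calc_load_step(unsigned long load, unsigned long exp,
                                    unsigned long active)
{
        load *= exp;
        load += active * (FIXED_1 - exp);
        return load >> FSHIFT;
}

int main(void)
{
        unsigned long load = 0;
        int i;

        /* two runnable tasks for one minute: the average creeps towards 2.00 */
        for (i = 0; i < 12; i++)        /* LOAD_FREQ is 5 s, so 12 steps */
                load = calc_load_step(load, EXP_1, 2 * FIXED_1);
        printf("avenrun[0] = %lu.%02lu\n",
               load >> FSHIFT, ((load & (FIXED_1 - 1)) * 100) >> FSHIFT);
        return 0;
}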

<sched.c>+= [<-D->]
/*
 * this routine handles the overflow of the microsecond field
 *
 * The tricky bits of code to handle the accurate clock support
 * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame.
 * They were originally developed for SUN and DEC kernels.
 * All the kudos should go to Dave for this stuff.
 *
 */
static void second_overflow(void)
{
    long ltemp;

    /* Bump the maxerror field */
    time_maxerror += time_tolerance >> SHIFT_USEC;
    if ( time_maxerror > NTP_PHASE_LIMIT ) {
        time_maxerror = NTP_PHASE_LIMIT;
        time_state = TIME_ERROR;        /* p. 17, sect. 4.3, (b) */
        time_status |= STA_UNSYNC;
    }

    /*
     * Leap second processing. If in leap-insert state at
     * the end of the day, the system clock is set back one
     * second; if in leap-delete state, the system clock is
     * set ahead one second. The microtime() routine or
     * external clock driver will insure that reported time
     * is always monotonic. The ugly divides should be
     * replaced.
     */
    switch (time_state) {

    case TIME_OK:
        if (time_status & STA_INS)
            time_state = TIME_INS;
        else if (time_status & STA_DEL)
            time_state = TIME_DEL;
        break;

    case TIME_INS:
        if (xtime.tv_sec % 86400 == 0) {
            xtime.tv_sec--;
            time_state = TIME_OOP;
            printk(KERN_NOTICE "Clock: inserting leap second 23:59:60 UTC\n");
        }
        break;

    case TIME_DEL:
        if ((xtime.tv_sec + 1) % 86400 == 0) {
            xtime.tv_sec++;
            time_state = TIME_WAIT;
            printk(KERN_NOTICE "Clock: deleting leap second 23:59:59 UTC\n");
        }
        break;

    case TIME_OOP:
        time_state = TIME_WAIT;
        break;

    case TIME_WAIT:
        if (!(time_status & (STA_INS | STA_DEL)))
            time_state = TIME_OK;
    }

    /*
     * Compute the phase adjustment for the next second. In
     * PLL mode, the offset is reduced by a fixed factor
     * times the time constant. In FLL mode the offset is
     * used directly. In either mode, the maximum phase
     * adjustment for each second is clamped so as to spread
     * the adjustment over not more than the number of
     * seconds between updates.
     */
    if (time_offset < 0) {
        ltemp = -time_offset;
        if (!(time_status & STA_FLL))
            ltemp >>= SHIFT_KG + time_constant;
        if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
            ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
        time_offset += ltemp;
        time_adj = -ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
    } else {
        ltemp = time_offset;
        if (!(time_status & STA_FLL))
            ltemp >>= SHIFT_KG + time_constant;
        if (ltemp > (MAXPHASE / MINSEC) << SHIFT_UPDATE)
            ltemp = (MAXPHASE / MINSEC) << SHIFT_UPDATE;
        time_offset -= ltemp;
        time_adj = ltemp << (SHIFT_SCALE - SHIFT_HZ - SHIFT_UPDATE);
    }

    /*
     * Compute the frequency estimate and additional phase
     * adjustment due to frequency error for the next
     * second. When the PPS signal is engaged, gnaw on the
     * watchdog counter and update the frequency computed by
     * the pll and the PPS signal.
     */
    pps_valid++;
    if (pps_valid == PPS_VALID) {       /* PPS signal lost */
        pps_jitter = MAXTIME;
        pps_stabil = MAXFREQ;
        time_status &= ~(STA_PPSSIGNAL | STA_PPSJITTER |
                         STA_PPSWANDER | STA_PPSERROR);
    }
    ltemp = time_freq + pps_freq;
    if (ltemp < 0)
        time_adj -= -ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);
    else
        time_adj +=  ltemp >> (SHIFT_USEC + SHIFT_HZ - SHIFT_SCALE);

#if HZ == 100
    /* Compensate for (HZ==100) != (1 << SHIFT_HZ).
     * Add 25% and 3.125% to get 128.125; => only 0.125% error (p. 14)
     */
    if (time_adj < 0)
        time_adj -= (-time_adj >> 2) + (-time_adj >> 5);
    else
        time_adj += (time_adj >> 2) + (time_adj >> 5);
#endif
}

/* in the NTP reference this is called "hardclock()" */
static void update_wall_time_one_tick(void)
{
        if ( (time_adjust_step = time_adjust) != 0 ) {
            /* We are doing an adjtime thing. 
             *
             * Prepare time_adjust_step to be within bounds.
             * Note that a positive time_adjust means we want the clock
             * to run faster.
             *
             * Limit the amount of the step to be in the range
             * -tickadj .. +tickadj
             */
             if (time_adjust > tickadj)
                time_adjust_step = tickadj;
             else if (time_adjust < -tickadj)
                time_adjust_step = -tickadj;
             
            /* Reduce by this step the amount of time left  */
            time_adjust -= time_adjust_step;
        }
        xtime.tv_usec += tick + time_adjust_step;
        /*
         * Advance the phase, once it gets to one microsecond, then
         * advance the tick more.
         */
        time_phase += time_adj;
        if (time_phase <= -FINEUSEC) {
                long ltemp = -time_phase >> SHIFT_SCALE;
                time_phase += ltemp << SHIFT_SCALE;
                xtime.tv_usec -= ltemp;
        }
        else if (time_phase >= FINEUSEC) {
                long ltemp = time_phase >> SHIFT_SCALE;
                time_phase -= ltemp << SHIFT_SCALE;
                xtime.tv_usec += ltemp;
        }
}

/*
 * Using a loop looks inefficient, but "ticks" is
 * usually just one (we shouldn't be losing ticks,
 * we're doing this this way mainly for interrupt
 * latency reasons, not because we think we'll
 * have lots of lost timer ticks
 */
static void update_wall_time(unsigned long ticks)
{
        do {
                ticks--;
                update_wall_time_one_tick();
        } while (ticks);

        if (xtime.tv_usec >= 1000000) {
            xtime.tv_usec -= 1000000;
            xtime.tv_sec++;
            second_overflow();
        }
}

static inline void do_process_times(struct task_struct *p,
        unsigned long user, unsigned long system)
{
        long psecs;

        p->utime += user;
        p->stime += system;

        psecs = (p->stime + p->utime) / HZ;
        if (psecs > p->rlim[RLIMIT_CPU].rlim_cur) {
                /* Send SIGXCPU every second.. */
                if (psecs * HZ == p->stime + p->utime)
                        send_sig(SIGXCPU, p, 1);
                /* and SIGKILL when we go over max.. */
                if (psecs > p->rlim[RLIMIT_CPU].rlim_max)
                        send_sig(SIGKILL, p, 1);
        }
}

static inline void do_it_virt(struct task_struct * p, unsigned long ticks)
{
        unsigned long it_virt = p->it_virt_value;

        if (it_virt) {
                if (it_virt <= ticks) {
                        it_virt = ticks + p->it_virt_incr;
                        send_sig(SIGVTALRM, p, 1);
                }
                p->it_virt_value = it_virt - ticks;
        }
}

static inline void do_it_prof(struct task_struct * p, unsigned long ticks)
{
        unsigned long it_prof = p->it_prof_value;

        if (it_prof) {
                if (it_prof <= ticks) {
                        it_prof = ticks + p->it_prof_incr;
                        send_sig(SIGPROF, p, 1);
                }
                p->it_prof_value = it_prof - ticks;
        }
}

static __inline__ void update_one_process(struct task_struct *p,
        unsigned long ticks, unsigned long user, unsigned long system)
{
        do_process_times(p, user, system);
        do_it_virt(p, user);
        do_it_prof(p, ticks);
}       

static void update_process_times(unsigned long ticks, unsigned long system)
{
#ifndef  __SMP__
        struct task_struct * p = current;
        unsigned long user = ticks - system;
        if (p->pid) {
                p->counter -= ticks;
                if (p->counter < 0) {
                        p->counter = 0;
                        need_resched = 1;
                }
                if (p->priority < DEF_PRIORITY)
                        kstat.cpu_nice += user;
                else
                        kstat.cpu_user += user;
                kstat.cpu_system += system;
        }
        update_one_process(p, ticks, user, system);
#else
        int cpu,j;
        cpu = smp_processor_id();
        for (j=0;j<smp_num_cpus;j++)
        {
                int i = cpu_logical_map[j];
                struct task_struct *p;
                
#ifdef __SMP_PROF__
                if (test_bit(i,&smp_idle_map)) 
                        smp_idle_count[i]++;
#endif
                p = current_set[i];
                /*
                 * Do we have a real process?
                 */
                if (p->pid) {
                        /* assume user-mode process */
                        unsigned long utime = ticks;
                        unsigned long stime = 0;
                        if (cpu == i) {
                                utime = ticks-system;
                                stime = system;
                        } else if (smp_proc_in_lock[j]) {
                                utime = 0;
                                stime = ticks;
                        }
                        update_one_process(p, ticks, utime, stime);

                        if (p->priority < DEF_PRIORITY)
                                kstat.cpu_nice += utime;
                        else
                                kstat.cpu_user += utime;
                        kstat.cpu_system += stime;

                        p->counter -= ticks;
                        if (p->counter >= 0)
                                continue;
                        p->counter = 0;
                } else {
                        /*
                         * Idle processor found, do we have anything
                         * we could run?
                         */
                        if (!(0x7fffffff & smp_process_available))
                                continue;
                }
                /* Ok, we should reschedule, do the magic */
                if (i==cpu)
                        need_resched = 1;
                else
                        smp_message_pass(i, MSG_RESCHEDULE, 0L, 0);
        }
#endif
}

static unsigned long lost_ticks = 0;
static unsigned long lost_ticks_system = 0;

static inline void update_times(void)
{
        unsigned long ticks;

        ticks = xchg(&lost_ticks, 0);

        if (ticks) {
                unsigned long system;

                system = xchg(&lost_ticks_system, 0);
                calc_load(ticks);
                update_wall_time(ticks);
                update_process_times(ticks, system);
        }
}

static void timer_bh(void)
{
        update_times();
        run_old_timers();
        run_timer_list();
}

void do_timer(struct pt_regs * regs)
{
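        /* jiffies is declared volatile; casting the qualifier away lets the
         * compiler emit a plain increment of the counter. */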
        (*(unsigned long *)&jiffies)++;
        lost_ticks++;
        mark_bh(TIMER_BH);
        if (!user_mode(regs)) {
                lost_ticks_system++;
                if (prof_buffer && current->pid) {
                        extern int _stext;
                        unsigned long ip = instruction_pointer(regs);
                        ip -= (unsigned long) &_stext;
                        ip >>= prof_shift;
                        if (ip < prof_len)
                                prof_buffer[ip]++;
                }
        }
        if (tq_timer)
                mark_bh(TQUEUE_BH);
}

#ifndef __alpha__

/*
 * For backwards compatibility?  This can be done in libc so Alpha
 * and all newer ports shouldn't need it.
 */
asmlinkage unsigned int sys_alarm(unsigned int seconds)
{
        struct itimerval it_new, it_old;
        unsigned int oldalarm;

        it_new.it_interval.tv_sec = it_new.it_interval.tv_usec = 0;
        it_new.it_value.tv_sec = seconds;
        it_new.it_value.tv_usec = 0;
        _setitimer(ITIMER_REAL, &it_new, &it_old);
        oldalarm = it_old.it_value.tv_sec;
        /* ehhh.. We can't return 0 if we have an alarm pending.. */
        /* And we'd better return too much than too little anyway */
        if (it_old.it_value.tv_usec)
                oldalarm++;
        return oldalarm;
}

Access functions to the current process

The Alpha uses getxpid, getxuid, and getxgid. Maybe these functions should be moved into arch/i386 instead?

<sched.c>+= [<-D->]
asmlinkage int sys_getpid(void)
{
        return current->pid;
}

asmlinkage int sys_getppid(void)
{
        return current->p_opptr->pid;
}

asmlinkage int sys_getuid(void)
{
        return current->uid;
}

asmlinkage int sys_geteuid(void)
{
        return current->euid;
}

asmlinkage int sys_getgid(void)
{
        return current->gid;
}

asmlinkage int sys_getegid(void)
{
        return current->egid;
}

<sched.c>+= [<-D->]
/*
 * This has been replaced by sys_setpriority.  Maybe it should be
 * moved into the arch dependent tree for those ports that require
 * it for backward compatibility?
 */
asmlinkage int sys_nice(int increment)
{
        unsigned long newprio;
        int increase = 0;

        newprio = increment;
        if (increment < 0) {
                if (!suser())
                        return -EPERM;
                newprio = -increment;
                increase = 1;
        }
        if (newprio > 40)
                newprio = 40;
        /*
         * do a "normalization" of the priority (traditionally
         * unix nice values are -20..20, linux doesn't really
         * use that kind of thing, but uses the length of the
         * timeslice instead (default 150 msec). The rounding is
         * why we want to avoid negative values.
         */
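        /*
         * Worked example (assuming DEF_PRIORITY is 20, i.e. HZ == 100):
         * nice(10) gives newprio = (10*20 + 10)/20 = 10, so the default
         * priority of 20 drops to 10 -- roughly half the timeslice.
         */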
        newprio = (newprio * DEF_PRIORITY + 10) / 20;
        increment = newprio;
        if (increase)
                increment = -increment;
        newprio = current->priority - increment;
        if ((signed) newprio < 1)
                newprio = 1;
        if (newprio > DEF_PRIORITY*2)
                newprio = DEF_PRIORITY*2;
        current->priority = newprio;
        return 0;
}

#endif

Function find_process_by_pid

<sched.c>+= [<-D->]
static struct task_struct *find_process_by_pid(pid_t pid) {
        struct task_struct *p, *q;

        if (pid == 0)
                p = current;
        else {
                p = 0;
                for_each_task(q) {
                        if (q && q->pid == pid) {
                                p = q;
                                break;
                        }
                }
        }
        return p;
}

Function setscheduler

This function can be used to adjust the scheduling policy of a process. The sched_param argument is essentially just a priority value.

<sched.c>+= [<-D->]
static int setscheduler(pid_t pid, int policy, 
                        struct sched_param *param)
{
        int error;
        struct sched_param lp;
        struct task_struct *p;

        if (!param || pid < 0)
                return -EINVAL;

        error = verify_area(VERIFY_READ, param, sizeof(struct sched_param));
        if (error)
                return error;
        memcpy_fromfs(&lp, param, sizeof(struct sched_param));

        p = find_process_by_pid(pid);
        if (!p)
                return -ESRCH;
                        
        if (policy < 0)
                policy = p->policy;
        else if (policy != SCHED_FIFO && policy != SCHED_RR &&
                 policy != SCHED_OTHER)
                return -EINVAL;
        
        /*
         * Valid priorities for SCHED_FIFO and SCHED_RR are 1..99, valid
         * priority for SCHED_OTHER is 0.
         */
        if (lp.sched_priority < 0 || lp.sched_priority > 99)
                return -EINVAL;
        if ((policy == SCHED_OTHER) != (lp.sched_priority == 0))
                return -EINVAL;

        if ((policy == SCHED_FIFO || policy == SCHED_RR) && !suser())
                return -EPERM;
        if ((current->euid != p->euid) && (current->euid != p->uid) &&
            !suser())
                return -EPERM;

        p->policy = policy;
        p->rt_priority = lp.sched_priority;
        cli();
        if (p->next_run)
                move_last_runqueue(p);
        sti();
        need_resched = 1;
        return 0;
}
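
For comparison, this is roughly how a user-space program reaches setscheduler() through the system call defined below (a minimal sketch using the POSIX library interface; selecting SCHED_FIFO requires root, and error handling is kept to a minimum):

#include <sched.h>
#include <stdio.h>

int main(void)
{
        struct sched_param sp;

        sp.sched_priority = 50;         /* valid range for SCHED_FIFO/RR: 1..99 */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {   /* 0 == this process */
                perror("sched_setscheduler");
                return 1;
        }
        printf("policy is now %d\n", sched_getscheduler(0));
        return 0;
}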

<sched.c>+= [<-D->]
asmlinkage int sys_sched_setscheduler(pid_t pid, int policy, 
                                      struct sched_param *param)
{
        return setscheduler(pid, policy, param);
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_setparam(pid_t pid, struct sched_param *param)
{
        return setscheduler(pid, -1, param);
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_getscheduler(pid_t pid)
{
        struct task_struct *p;

        if (pid < 0)
                return -EINVAL;

        p = find_process_by_pid(pid);
        if (!p)
                return -ESRCH;
                        
        return p->policy;
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_getparam(pid_t pid, struct sched_param *param)
{
        int error;
        struct task_struct *p;
        struct sched_param lp;

        if (!param || pid < 0)
                return -EINVAL;

        error = verify_area(VERIFY_WRITE, param, sizeof(struct sched_param));
        if (error)
                return error;

        p = find_process_by_pid(pid);
        if (!p)
                return -ESRCH;

        lp.sched_priority = p->rt_priority;
        memcpy_tofs(param, &lp, sizeof(struct sched_param));

        return 0;
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_yield(void)
{
        cli();
        move_last_runqueue(current);
        current->counter = 0;
        need_resched = 1;
        sti();
        return 0;
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_get_priority_max(int policy)
{
        switch (policy) {
              case SCHED_FIFO:
              case SCHED_RR:
                return 99;
              case SCHED_OTHER:
                return 0;
        }

        return -EINVAL;
}

asmlinkage int sys_sched_get_priority_min(int policy)
{
        switch (policy) {
              case SCHED_FIFO:
              case SCHED_RR:
                return 1;
              case SCHED_OTHER:
                return 0;
        }

        return -EINVAL;
}

<sched.c>+= [<-D->]
asmlinkage int sys_sched_rr_get_interval(pid_t pid, struct timespec *interval)
{
        int error;
        struct timespec t;

        error = verify_area(VERIFY_WRITE, interval, sizeof(struct timespec));
        if (error)
                return error;

        /* Values taken from 2.1.38 */  
        t.tv_sec = 0;
        t.tv_nsec = 150000;   /* is this right for non-intel architecture too?*/
        memcpy_tofs(interval, &t, sizeof(struct timespec));

        return 0;
}

/*
 * change timeval to jiffies, trying to avoid the 
 * most obvious overflows..
 */
static unsigned long timespectojiffies(struct timespec *value)
{
        unsigned long sec = (unsigned) value->tv_sec;
        long nsec = value->tv_nsec;

        if (sec > (LONG_MAX / HZ))
                return LONG_MAX;
        nsec += 1000000000L / HZ - 1;
        nsec /= 1000000000L / HZ;
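        /* Example (assuming HZ == 100): a request of 1.5 s gives
         * 100*1 + (500000000 + 9999999)/10000000 = 100 + 50 = 150 jiffies. */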
        return HZ * sec + nsec;
}

static void jiffiestotimespec(unsigned long jiffies, struct timespec *value)
{
        value->tv_nsec = (jiffies % HZ) * (1000000000L / HZ);
        value->tv_sec = jiffies / HZ;
        return;
}

asmlinkage int sys_nanosleep(struct timespec *rqtp, struct timespec *rmtp)
{
        int error;
        struct timespec t;
        unsigned long expire;

        error = verify_area(VERIFY_READ, rqtp, sizeof(struct timespec));
        if (error)
                return error;
        memcpy_fromfs(&t, rqtp, sizeof(struct timespec));
        if (rmtp) {
                error = verify_area(VERIFY_WRITE, rmtp,
                                    sizeof(struct timespec));
                if (error)
                        return error;
        }

        if (t.tv_nsec >= 1000000000L || t.tv_nsec < 0 || t.tv_sec < 0)
                return -EINVAL;

        if (t.tv_sec == 0 && t.tv_nsec <= 2000000L &&
            current->policy != SCHED_OTHER) {
                /*
                 * Short delay requests up to 2 ms will be handled with
                 * high precision by a busy wait for all real-time processes.
                 */
                udelay((t.tv_nsec + 999) / 1000);
                return 0;
        }

        expire = timespectojiffies(&t) + (t.tv_sec || t.tv_nsec) + jiffies;
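        /* The (t.tv_sec || t.tv_nsec) term above adds one jiffy for any
         * nonzero request, so rounding never makes us sleep too short. */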
        current->timeout = expire;
        current->state = TASK_INTERRUPTIBLE;
        schedule();

        if (expire > jiffies) {
                if (rmtp) {
                        jiffiestotimespec(expire - jiffies -
                                          (expire > jiffies + 1), &t);
                        memcpy_tofs(rmtp, &t, sizeof(struct timespec));
                }
                return -EINTR;
        }

        return 0;
}

Function show_task

<sched.c>+= [<-D->]
static void show_task(int nr,struct task_struct * p)
{
        unsigned long free;
        static const char * stat_nam[] = { "R", "S", "D", "Z", "T", "W" };

        printk("%-8s %3d ", p->comm, (p == current) ? -nr : nr);
        if (((unsigned) p->state) < sizeof(stat_nam)/sizeof(char *))
                printk(stat_nam[p->state]);
        else
                printk(" ");
#if ((~0UL) == 0xffffffff)
        if (p == current)
                printk(" current  ");
        else
                printk(" %08lX ", thread_saved_pc(&p->tss));
        printk("%08lX ", get_wchan(p));
#else
        if (p == current)
                printk("   current task   ");
        else
                printk(" %016lx ", thread_saved_pc(&p->tss));
        printk("%08lX ", get_wchan(p) & 0xffffffffL);
#endif
        for (free = 1; free < PAGE_SIZE/sizeof(long) ; free++) {
                if (((unsigned long *)p->kernel_stack_page)[free])
                        break;
        }
        printk("%5lu %5d %6d ", free*sizeof(long), p->pid, p->p_pptr->pid);
        if (p->p_cptr)
                printk("%5d ", p->p_cptr->pid);
        else
                printk("      ");
        if (p->p_ysptr)
                printk("%7d", p->p_ysptr->pid);
        else
                printk("       ");
        if (p->p_osptr)
                printk(" %5d\n", p->p_osptr->pid);
        else
                printk("\n");
}

<sched.c>+= [<-D->]
void show_state(void)
{
        int i;

#if ((~0UL) == 0xffffffff)
        printk("\n"
               "                                  free                        sibling\n");
        printk("  task             PC     wchan   stack   pid father child younger older\n");
#else
        printk("\n"
               "                                           free                        sibling\n");
        printk("  task                 PC         wchan    stack   pid father child younger older\n");
#endif
        for (i=0 ; i<NR_TASKS ; i++)
                if (task[i])
                        show_task(i,task[i]);
}

Function sched_init

<sched.c>+= [<-D]
void sched_init(void)
{
        /*
         *      We have to do a little magic to get the first
         *      process right in SMP mode.
         */
        int cpu=smp_processor_id();
#ifndef __SMP__ 
        current_set[cpu]=&init_task;
#else
        init_task.processor=cpu;
        for(cpu = 0; cpu < NR_CPUS; cpu++)
                current_set[cpu] = &init_task;
#endif
        init_bh(TIMER_BH, timer_bh);
        init_bh(TQUEUE_BH, tqueue_bh);
        init_bh(IMMEDIATE_BH, immediate_bh);
}

File kernel/fork.c

<fork.c>= [D->]
/*
 *  linux/kernel/fork.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */

/*
 *  'fork.c' contains the help-routines for the 'fork' system call
 * (see also system_call.s).
 * Fork is rather simple, once you get the hang of it, but the memory
 * management can be a bitch. See 'mm/mm.c': 'copy_page_tables()'
 */

#include <linux/errno.h>
#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/stddef.h>
#include <linux/unistd.h>
#include <linux/ptrace.h>
#include <linux/malloc.h>
#include <linux/ldt.h>
#include <linux/smp.h>

#include <asm/segment.h>
#include <asm/system.h>
#include <asm/pgtable.h>

<Definition of the number of tasks and friends>= (U-> U->)
int nr_tasks=1;
int nr_running=1;
unsigned long int total_forks=0;        /* Handle normal Linux uptimes. */
int last_pid=0;

<fork.c>+= [<-D]
<Definition of the number of tasks and friends>

static inline int find_empty_process(void)
{
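        /* Keep the last MIN_TASKS_LEFT_FOR_ROOT slots of task[] in reserve
         * for root and enforce the per-user RLIMIT_NPROC limit before
         * handing out a free slot. */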
        int i;

        if (nr_tasks >= NR_TASKS - MIN_TASKS_LEFT_FOR_ROOT) {
                if (current->uid)
                        return -EAGAIN;
        }
        if (current->uid) {
                long max_tasks = current->rlim[RLIMIT_NPROC].rlim_cur;

                max_tasks--;    /* count the new process.. */
                if (max_tasks < nr_tasks) {
                        struct task_struct *p;
                        for_each_task (p) {
                                if (p->uid == current->uid)
                                        if (--max_tasks < 0)
                                                return -EAGAIN;
                        }
                }
        }
        for (i = 0 ; i < NR_TASKS ; i++) {
                if (!task[i])
                        return i;
        }
        return -EAGAIN;
}

static int get_pid(unsigned long flags)
{
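        /* PIDs are limited to 15 bits: wrap last_pid back to 1 before it
         * reaches 0x8000 and skip values still in use as a pid, process
         * group or session id. */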
        struct task_struct *p;

        if (flags & CLONE_PID)
                return current->pid;
repeat:
        if ((++last_pid) & 0xffff8000)
                last_pid=1;
        for_each_task (p) {
                if (p->pid == last_pid ||
                    p->pgrp == last_pid ||
                    p->session == last_pid)
                        goto repeat;
        }
        return last_pid;
}

static inline int dup_mmap(struct mm_struct * mm)
{
        struct vm_area_struct * mpnt, **p, *tmp;

        mm->mmap = NULL;
        p = &mm->mmap;
        for (mpnt = current->mm->mmap ; mpnt ; mpnt = mpnt->vm_next) {
                tmp = (struct vm_area_struct *) kmalloc(sizeof(struct vm_area_struct), GFP_KERNEL);
                if (!tmp) {
                        /* exit_mmap is called by the caller */
                        return -ENOMEM;
                }
                *tmp = *mpnt;
                tmp->vm_flags &= ~VM_LOCKED;
                tmp->vm_mm = mm;
                tmp->vm_next = NULL;
                if (tmp->vm_inode) {
                        tmp->vm_inode->i_count++;
                        /* insert tmp into the share list, just after mpnt */
                        tmp->vm_next_share->vm_prev_share = tmp;
                        mpnt->vm_next_share = tmp;
                        tmp->vm_prev_share = mpnt;
                }
                if (copy_page_range(mm, current->mm, tmp)) {
                        /* link into the linked list for exit_mmap */
                        *p = tmp;
                        p = &tmp->vm_next;
                        /* exit_mmap is called by the caller */
                        return -ENOMEM;
                }
                if (tmp->vm_ops && tmp->vm_ops->open)
                        tmp->vm_ops->open(tmp);
                *p = tmp;
                p = &tmp->vm_next;
        }
        build_mmap_avl(mm);
        flush_tlb_mm(current->mm);
        return 0;
}

static inline int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
{
        if (!(clone_flags & CLONE_VM)) {
                struct mm_struct * mm = kmalloc(sizeof(*tsk->mm), GFP_KERNEL);
                if (!mm)
                        return -ENOMEM;
                *mm = *current->mm;
                mm->count = 1;
                mm->def_flags = 0;
                mm->mmap_sem = MUTEX;
                tsk->mm = mm;
                tsk->min_flt = tsk->maj_flt = 0;
                tsk->cmin_flt = tsk->cmaj_flt = 0;
                tsk->nswap = tsk->cnswap = 0;
                if (new_page_tables(tsk)) {
                        tsk->mm = NULL;
                        exit_mmap(mm);
                        goto free_mm;
                }
                down(&mm->mmap_sem);
                if (dup_mmap(mm)) {
                        up(&mm->mmap_sem);
                        tsk->mm = NULL;
                        exit_mmap(mm);
                        free_page_tables(mm);
free_mm:
                        kfree(mm);
                        return -ENOMEM;
                }
                up(&mm->mmap_sem);
                return 0;
        }
        SET_PAGE_DIR(tsk, current->mm->pgd);
        current->mm->count++;
        return 0;
}

static inline int copy_fs(unsigned long clone_flags, struct task_struct * tsk)
{
        if (clone_flags & CLONE_FS) {
                current->fs->count++;
                return 0;
        }
        tsk->fs = kmalloc(sizeof(*tsk->fs), GFP_KERNEL);
        if (!tsk->fs)
                return -1;
        tsk->fs->count = 1;
        tsk->fs->umask = current->fs->umask;
        if ((tsk->fs->root = current->fs->root))
                tsk->fs->root->i_count++;
        if ((tsk->fs->pwd = current->fs->pwd))
                tsk->fs->pwd->i_count++;
        return 0;
}

static inline int copy_files(unsigned long clone_flags, struct task_struct * tsk)
{
        int i;
        struct files_struct *oldf, *newf;
        struct file **old_fds, **new_fds;

        oldf = current->files;
        if (clone_flags & CLONE_FILES) {
                oldf->count++;
                return 0;
        }

        newf = kmalloc(sizeof(*newf), GFP_KERNEL);
        tsk->files = newf;
        if (!newf)
                return -1;
                        
        newf->count = 1;
        newf->close_on_exec = oldf->close_on_exec;
        newf->open_fds = oldf->open_fds;

        old_fds = oldf->fd;
        new_fds = newf->fd;
        for (i = NR_OPEN; i != 0; i--) {
                struct file * f = *old_fds;
                old_fds++;
                *new_fds = f;
                new_fds++;
                if (f)
                        f->f_count++;
        }
        return 0;
}

static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk)
{
        if (clone_flags & CLONE_SIGHAND) {
                current->sig->count++;
                return 0;
        }
        tsk->sig = kmalloc(sizeof(*tsk->sig), GFP_KERNEL);
        if (!tsk->sig)
                return -1;
        tsk->sig->count = 1;
        memcpy(tsk->sig->action, current->sig->action, sizeof(tsk->sig->action));
        return 0;
}

/*
 *  Ok, this is the main fork-routine. It copies the system process
 * information (task[nr]) and sets up the necessary registers. It
 * also copies the data segment in its entirety.
 */
int do_fork(unsigned long clone_flags, unsigned long usp, struct pt_regs *regs)
{
        int nr;
        int error = -ENOMEM;
        unsigned long new_stack;
        struct task_struct *p;

        p = (struct task_struct *) kmalloc(sizeof(*p), GFP_KERNEL);
        if (!p)
                goto bad_fork;
        new_stack = alloc_kernel_stack();
        if (!new_stack)
                goto bad_fork_free_p;
        error = -EAGAIN;
        nr = find_empty_process();
        if (nr < 0)
                goto bad_fork_free_stack;

        *p = *current;

        if (p->exec_domain && p->exec_domain->use_count)
                (*p->exec_domain->use_count)++;
        if (p->binfmt && p->binfmt->use_count)
                (*p->binfmt->use_count)++;

        p->did_exec = 0;
        p->swappable = 0;
        p->kernel_stack_page = new_stack;
        *(unsigned long *) p->kernel_stack_page = STACK_MAGIC;
        p->state = TASK_UNINTERRUPTIBLE;
        p->flags &= ~(PF_PTRACED|PF_TRACESYS|PF_SUPERPRIV);
        p->flags |= PF_FORKNOEXEC;
        p->pid = get_pid(clone_flags);
        p->next_run = NULL;
        p->prev_run = NULL;
        p->p_pptr = p->p_opptr = current;
        p->p_cptr = NULL;
        init_waitqueue(&p->wait_chldexit);
        p->signal = 0;
        p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
        p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;
        init_timer(&p->real_timer);
        p->real_timer.data = (unsigned long) p;
        p->leader = 0;          /* session leadership doesn't inherit */
        p->tty_old_pgrp = 0;
        p->utime = p->stime = 0;
        p->cutime = p->cstime = 0;
#ifdef __SMP__
        p->processor = NO_PROC_ID;
        p->lock_depth = 1;
#endif
        p->start_time = jiffies;
        task[nr] = p;
        SET_LINKS(p);
        nr_tasks++;

        error = -ENOMEM;
        /* copy all the process information */
        if (copy_files(clone_flags, p))
                goto bad_fork_cleanup;
        if (copy_fs(clone_flags, p))
                goto bad_fork_cleanup_files;
        if (copy_sighand(clone_flags, p))
                goto bad_fork_cleanup_fs;
        if (copy_mm(clone_flags, p))
                goto bad_fork_cleanup_sighand;
        copy_thread(nr, clone_flags, usp, p, regs);
        p->semundo = NULL;

        /* ok, now we should be set up.. */
        p->swappable = 1;
        p->exit_signal = clone_flags & CSIGNAL;
        p->counter = (current->counter >>= 1);
        wake_up_process(p);                     /* do this last, just in case */
        ++total_forks;
        return p->pid;

bad_fork_cleanup_sighand:
        exit_sighand(p);
bad_fork_cleanup_fs:
        exit_fs(p);
bad_fork_cleanup_files:
        exit_files(p);
bad_fork_cleanup:
        if (p->exec_domain && p->exec_domain->use_count)
                (*p->exec_domain->use_count)--;
        if (p->binfmt && p->binfmt->use_count)
                (*p->binfmt->use_count)--;
        task[nr] = NULL;
        REMOVE_LINKS(p);
        nr_tasks--;
bad_fork_free_stack:
        free_kernel_stack(new_stack);
bad_fork_free_p:
        kfree(p);
bad_fork:
        return error;
}

Slides

Here are the slides produced from the above code. They appear in the sequence I chose to present them in class.

<slides.tex>=
\documentclass{seminar}
% slides for BS 1 -- generated automatically
% DO NOT EDIT! See file linux-kernel-basics.nw first!
\slideframe{none}% no fancy frames on slides
\slidesmag{2}% make slides a little smaller
\usepackage{url}
\begin{document}
\begin{slide}
<slide1>
\end{slide}
\begin{slide}
<slide2>
\end{slide}
\begin{slide}
<slide3>
\end{slide}
\begin{slide}
<slide4>
\end{slide}
\begin{slide}
<slide5>
\end{slide}
\begin{slide}
<slide6>
\end{slide}
\begin{slide}
<slide7>
\end{slide}
\begin{slide}
<slide8>
\end{slide}
\begin{slide}
<slide9>
\end{slide}
\begin{slide}
<slide10>
\end{slide}
\begin{slide}
<slide11>
\end{slide}
\begin{slide}
<slide12>
\end{slide}
\begin{slide}
<slide13>
\end{slide}
\begin{slide}
<slide14>
\end{slide}
\end{document}

Mutual Exclusion

The first slide contains mainly the cli and sti routines for the 68000 architecture and some other mutual exclusion code.

<slide1>= (<-U)
\begin{verbatim}
<sti and cli>

<save and restore flags>

<test and set and friends>
\end{verbatim}

<slide2>= (<-U)
\begin{verbatim}
<atomic value exchange>
\end{verbatim}

Process States and Process Table

<slide3>= (<-U)
\begin{verbatim}
<Process states>

<Scheduling policies>
\end{verbatim}

<slide4>= (<-U)
\begin{verbatim}
struct task_struct {
  <Linked lists in process table>
  <PID declaration>
  <Binary format pointer in process table>
  <Tree structures in process table>
  <UID and friends in process table>
  <Scheduling policy in process table>
  <Timing variables in process table>
  ...
};

<Definition of maximum number of tasks>
<Definition of the process table>
<Definition of the number of tasks and friends>
\end{verbatim}

<slide5>= (<-U)
\begin{verbatim}
<Function to add a task to the run queue>
\end{verbatim}

<slide6>= (<-U)
\begin{verbatim}
<Function to remove a task from the runqueue>
\end{verbatim}

<slide7>= (<-U)
\begin{verbatim}
<Function to move a task to the end of the run queue>
\end{verbatim}

<slide8>= (<-U)
\begin{verbatim}
<Function to wake up a process>
\end{verbatim}

<slide9>= (<-U)
\begin{verbatim}
<Function to calculate goodness of a process>
\end{verbatim}

<slide10>= (<-U)
\begin{verbatim}
<the scheduler proper>
\end{verbatim}

<slide11>= (<-U)
\begin{verbatim}
<the actual context switch>
\end{verbatim}

Semaphores

<slide12>= (<-U)
\begin{verbatim}
<Definition of the semaphore structure>

<Definition of MUTEX and friends>
\end{verbatim}

<slide13>= (<-U)
\begin{verbatim}
<Generic sleep routine for semaphores>
\end{verbatim}

References

<slide14>= (<-U)
\begin{center}
\bfseries\Large Mehr Infos:
\end{center}
\begin{itemize}
\item Linux Kernel Hacker's Guide: \\
  \url{http://khg.redhat.com/HyperNews/get/khg.html}
\item Linux Documentation Project Homepage: \\
  \url{http://metalab.unc.edu/LDP/}
\item Ausf\"uhrliche Darstellung von heute: \\
\url{http://www.informatik.tu-darmstadt.de/BS/Lehre/BS-I-Material/linux-kernel-basics.html}
\item Maurice J. Bach: The Design of the UNIX Operating System, Prentice Hall, 1986.
\end{itemize}

Chunk Index

Identifier Index

References

[1] Norman Ramsey: Literate Programming Simplified. IEEE Software, 11(5):97--105, September 1994.

[2] Norman Ramsey: noweb Homepage, Internet: http://www.cs.virginia.edu/~nr/noweb/intro.html

[3] S.u.S.E. GmbH: S.u.S.E. --- Die Linux-Spezialisten (in German), Internet: http://www.suse.de/

[4] David B. Thompson: The Literate Programming FAQ , Internet: http://shelob.ce.ttu.edu/daves/faq.html