6.828 Lab 4 (Part 1)

Ctrl CV程序员

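The initial merge produced a few conflicts. One was in the linker script from the earlier lab, where I took the newest version. The other was a single statement in pmap.c, where I kept both versions for now and will probably revisit later.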

Exercise 1

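Here pa is guaranteed to be page-aligned, so the only thing to take care of is rounding size up to a multiple of PGSIZE.

When checking for overflow, also consider the case where base + size wraps around the 32-bit integer range, and add an extra check just in case.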

	// First, round up the size
	size = ROUNDUP(size, PGSIZE);
	// watch out for overflow
	if((base + size > MMIOLIM) || (base + size < base))
		panic("Overflow in mmio region");
	boot_map_region(kern_pgdir, base, size, pa, PTE_PCD | PTE_PWT | PTE_W);
	base += size;
	return (void*)(base - size);
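
The rounding and the two overflow checks can be tried outside the kernel. The following is a minimal sketch, not the JOS code: round_up_page and mmio_reserve are hypothetical names, and the region bounds are passed in as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PGSIZE 4096u

/* Round x up to the next multiple of PGSIZE (wraps to 0 near 2^32). */
static uint32_t round_up_page(uint32_t x)
{
	return (x + PGSIZE - 1) & ~(PGSIZE - 1);
}

/*
 * Hypothetical bump allocator over [*base, limit).
 * On success stores the start of the reserved range in *out and
 * advances *base; returns false if the request does not fit or if
 * any intermediate sum wraps around 32 bits.
 */
static bool mmio_reserve(uint32_t *base, uint32_t limit, uint32_t size,
			 uint32_t *out)
{
	uint32_t rounded = round_up_page(size);
	if (rounded < size)
		return false;		/* the rounding itself wrapped */
	uint32_t end = *base + rounded;
	if (end < *base || end > limit)
		return false;		/* wrapped, or past the limit */
	*out = *base;
	*base = end;
	return true;
}
```

Unlike the kernel version, this sketch reports failure to the caller instead of panicking, which makes the wraparound case easy to exercise.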

Exercise 2

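An AP starts in real mode and can only address low physical memory, so the AP boot code is placed at MPENTRY_PADDR, a page below 640KB that we keep reserved. page_init therefore has to be modified so that this page is never added to the free list: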

	struct PageInfo* mp_entry_page = pa2page(MPENTRY_PADDR);
	size_t i;
	for (i = 1; i < npages_basemem; i++) {
		if(pages + i == mp_entry_page)
			continue;
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}
	i = PADDR(boot_alloc(0)) / PGSIZE;
	for(; i < npages; i++)
	{
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}
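
To see the effect of the skip in isolation, here is a small user-space sketch. It assumes a tiny 16-page "base memory" and a simplified page struct (struct Page, build_free_list and the helpers are made-up names, not the kernel's).

```c
#include <assert.h>
#include <stddef.h>

#define PGSIZE        4096u
#define MPENTRY_PADDR 0x7000u	/* the value JOS uses */
#define NPAGES_DEMO   16	/* hypothetical tiny base memory */

struct Page {
	struct Page *link;	/* next page on the free list */
	int ref;
};

static struct Page demo_pages[NPAGES_DEMO];

/* Put pages 1..n-1 on a free list, skipping the page that holds
 * reserved_pa. Page 0 is left out, as in page_init. */
static struct Page *build_free_list(struct Page *pages, size_t n,
				    unsigned reserved_pa)
{
	struct Page *head = NULL;
	size_t reserved = reserved_pa / PGSIZE;

	for (size_t i = 1; i < n; i++) {
		if (i == reserved)
			continue;
		pages[i].ref = 0;
		pages[i].link = head;
		head = &pages[i];
	}
	return head;
}

/* Return 1 if p is reachable from head. */
static int on_list(const struct Page *head, const struct Page *p)
{
	for (; head != NULL; head = head->link)
		if (head == p)
			return 1;
	return 0;
}

/* Count the pages on the list. */
static size_t list_len(const struct Page *head)
{
	size_t n = 0;
	for (; head != NULL; head = head->link)
		n++;
	return n;
}
```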

Question 1

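The key to this question is: why does mpentry.S need to translate its symbol addresses into physical addresses, while boot.S does not?

Dump the boot loader with objdump, or simply open boot.asm, and you will see that the boot loader's VMA and LMA are both 0x7c00. Its symbols therefore resolve to low (physical) addresses and can be used directly. The code in mpentry.S, however, is linked as part of the kernel, so its VMA lies above KERNBASE (0xf0000000) and every symbol in it is a high address. The AP runs this code from low memory before paging is enabled, so the symbols have to be converted before use.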
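
The conversion is plain offset arithmetic: take the symbol's offset from the start of the AP entry code and add it to the physical address the code was copied to. A sketch of that calculation in C follows; the link addresses used in the usage note are hypothetical, chosen only to make the numbers concrete.

```c
#include <assert.h>
#include <stdint.h>

#define MPENTRY_PADDR 0x7000u	/* where the AP entry code is copied */

/*
 * Same arithmetic as the MPBOOTPHYS macro in kern/mpentry.S:
 * (symbol - start of entry code) + physical load address.
 */
static uint32_t mpbootphys(uint32_t sym, uint32_t mpentry_start)
{
	return sym - mpentry_start + MPENTRY_PADDR;
}
```

For example, if mpentry_start were linked at 0xf0105a00 and a symbol at 0xf0105a74, the AP would have to use 0x7074 for that symbol.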

Exercise 3

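As the comments say, map each CPU's kernel stack at its corresponding virtual address.

percpu_kstacks lives in the kernel's data segment. Physically it sits at a low address, while its virtual address is above KERNBASE, so PADDR can be used for the conversion.

After this step, bootstack is no longer needed.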

	// LAB 4: Your code here:
	uintptr_t kstacktop_i;
	for(int i = 0; i < NCPU; i++)
	{
		kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP);
		boot_map_region(kern_pgdir, kstacktop_i - KSTKSIZE, KSTKSIZE, 
		PADDR(percpu_kstacks[i]), PTE_W);
	}
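
The layout that this loop produces is easy to check by hand. The sketch below recomputes the top and bottom of each stack; the constants are the ones from JOS's inc/memlayout.h as I remember them (8-page stacks with 8-page gaps), and kstack_top / kstack_bottom are made-up helper names.

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE    4096u
#define KSTACKTOP 0xf0000000u
#define KSTKSIZE  (8 * PGSIZE)	/* size of one kernel stack */
#define KSTKGAP   (8 * PGSIZE)	/* unmapped guard below each stack */

/* Highest address (exclusive) of CPU cpu's kernel stack. */
static uint32_t kstack_top(unsigned cpu)
{
	return KSTACKTOP - cpu * (KSTKSIZE + KSTKGAP);
}

/* Lowest mapped address of CPU cpu's kernel stack. */
static uint32_t kstack_bottom(unsigned cpu)
{
	return kstack_top(cpu) - KSTKSIZE;
}
```

The range between kstack_bottom(i) and kstack_top(i + 1) is left unmapped, so a stack overflow on one CPU faults instead of silently running into the next CPU's stack.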

Exercise 4

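Replace the code that set up the single global TSS with code that sets up the TSS of the current CPU. Pay attention to the statement that loads the TSS selector: the GDT entry used above now depends on the CPU, so the selector has to change along with it.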

	// LAB 4: Your code here:

	// Setup a TSS so that we get the right stack
	// when we trap to the kernel.
	thiscpu->cpu_ts.ts_esp0 = (uintptr_t)(percpu_kstacks[cpunum()] + KSTKSIZE);
	thiscpu->cpu_ts.ts_ss0 = GD_KD;
	thiscpu->cpu_ts.ts_iomb = sizeof(struct Taskstate);

	// Initialize the TSS slot of the gdt.
	gdt[(GD_TSS0 >> 3) + cpunum()] = SEG16(STS_T32A, (uint32_t)(&(thiscpu->cpu_ts)),
					sizeof(struct Taskstate) - 1, 0);
	gdt[(GD_TSS0 >> 3) + cpunum()].sd_s = 0;

	// Load the TSS selector (like other segment selectors, the
	// bottom three bits are special; we leave them 0)
	ltr(GD_TSS0 + (cpunum() << 3));

	// Load the IDT
	lidt(&idt_pd);
}
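
The relation between the GDT index and the selector passed to ltr is just a shift by three bits. A small sketch, assuming GD_TSS0 is 0x28 as in JOS's inc/memlayout.h (tss_selector and tss_gdt_index are hypothetical helpers):

```c
#include <assert.h>
#include <stdint.h>

#define GD_TSS0 0x28u	/* selector of CPU 0's TSS descriptor */

/* Selector for CPU cpu's TSS: descriptors are 8 bytes apart. */
static uint16_t tss_selector(unsigned cpu)
{
	return (uint16_t)(GD_TSS0 + (cpu << 3));
}

/* Index of that descriptor in the gdt[] array. */
static unsigned tss_gdt_index(unsigned cpu)
{
	return (GD_TSS0 >> 3) + cpu;
}
```

The low three bits of every selector produced this way are zero, which matches the comment about leaving the special bits at 0.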

Exercise 5

Here we add the locking exactly as the lab asks. How these locks work together only becomes clear after doing the later exercises.

	// Acquire the big kernel lock before waking up APs
	// Your code here:
	lock_kernel();



	// Now that we have finished some basic setup, call sched_yield()
	// to start running processes on this CPU.  But make sure that
	// only one CPU can enter the scheduler at a time!
	//
	// Your code here:
	lock_kernel();
	sched_yield();


	if ((tf->tf_cs & 3) == 3) {
		// Trapped from user mode.
		// Acquire the big kernel lock before doing any
		// serious kernel work.
		// LAB 4: Your code here.
		lock_kernel();
		assert(curenv);


	lcr3(PADDR(curenv->env_pgdir));
	unlock_kernel();
	env_pop_tf(&curenv->env_tf);
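
For intuition about what lock_kernel and unlock_kernel do, here is a minimal test-and-set spinlock written with C11 atomics. It is only a sketch: JOS's real lock lives in kern/spinlock.c and is built on the xchg instruction, and the struct biglock type and function names here are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical big lock: one flag plus the id of the holder. */
struct biglock {
	atomic_flag locked;
	int holder;		/* CPU id of the holder, -1 if free */
};

/* Try once; returns true if the lock was taken. */
static bool biglock_try_acquire(struct biglock *l, int cpu)
{
	if (atomic_flag_test_and_set_explicit(&l->locked,
					      memory_order_acquire))
		return false;	/* somebody else holds it */
	l->holder = cpu;
	return true;
}

/* Spin until the lock is taken. */
static void biglock_acquire(struct biglock *l, int cpu)
{
	while (!biglock_try_acquire(l, cpu))
		;		/* busy-wait */
}

/* Release the lock; only the holder should call this. */
static void biglock_release(struct biglock *l)
{
	l->holder = -1;
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```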

Question 2

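Consider this scenario: CPU0 is handling an interrupt that came from user mode, and its trap frame is stored on the kernel stack. At that moment a user program on CPU1 also traps. The lock is only taken inside trap(), which runs after the trap context has already been pushed, so with a shared kernel stack CPU1's context would still be pushed and would corrupt CPU0's trap frame.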

Exercise 6

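In the previous exercise we locked every entry point into the kernel and release the lock only on the way out, right before env_pop_tf. What is the consequence of writing it this way?

From the call graph, env_pop_tf has exactly one caller, env_run. The other cores are blocked in mp_main waiting for the lock, and a core can only start running user code through env_run. So a blocked core gets its chance to execute kernel code only once the lock holder has gone back to user mode through env_run, for example after a user env invokes the yield system call.

The function we complete in this exercise picks an env to run, using the round-robin algorithm described in the lab. Running an env goes through env_run, which releases the lock and lets the rest of the system make progress.

After finishing this exercise, setting a breakpoint in sched_yield shows that the lock really does take effect.

At that moment CPU1 and CPU2 are both executing in user mode and only CPU0 is in the kernel. Printing the kernel lock confirms that CPU0 is the one holding it.

The thing to get right here is the round-robin order: the search must start from the env after the one that ran last, so the env that ran last has the lowest priority.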

	// LAB 4: Your code here.
	// idle is where we start searching
	idle = (curenv == NULL) ? envs : (curenv + 1);
	// A flag indicating whether a runnable env has been found
	bool flag = false;
	for(struct Env* e = idle; e != envs + NENV; e++)
	{
		if(e->env_status == ENV_RUNNABLE)
		{
			flag = true;
			env_run(e);
			break;
		}
	}
	// do the circular searching
	if(!flag)
		for(struct Env* e = envs; e != idle; e++)
		{
			if(e->env_status == ENV_RUNNABLE)
			{
				flag = true;
				env_run(e);
				break;
			}		
		}
	// nothing else is runnable: keep running the current env if it is
	// still in the ENV_RUNNING state
	if(!flag && curenv != NULL && curenv->env_status == ENV_RUNNING)
		env_run(curenv);
	// sched_halt never returns
	if(!flag)
		sched_halt();
}
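
The same search order can be written as a single circular loop over indices. The sketch below works on a plain array of statuses rather than on struct Env, and pick_next is a hypothetical helper; it returns the index to run, or -1 where the kernel would call sched_halt.

```c
#include <assert.h>

enum { ENV_FREE, ENV_RUNNABLE, ENV_RUNNING, ENV_NOT_RUNNABLE };

/*
 * Round-robin choice over status[0..n-1].
 * cur is the index of the env that ran last, or -1 if there is none.
 * The search starts just after cur and wraps around, so cur itself is
 * considered last, and only if it is still ENV_RUNNING.
 */
static int pick_next(const int *status, int n, int cur)
{
	int start = (cur < 0) ? 0 : (cur + 1) % n;

	for (int k = 0; k < n; k++) {
		int i = (start + k) % n;
		if (status[i] == ENV_RUNNABLE)
			return i;
	}
	if (cur >= 0 && status[cur] == ENV_RUNNING)
		return cur;
	return -1;		/* nothing to run */
}
```

Using the modulo keeps the wraparound in one loop, at the cost of an extra division per step compared with the two-loop version above.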

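Question 3

Because e is a kernel address: the variable lives on the kernel stack and points into kernel memory, and the kernel part of the address space was copied from kern_pgdir into the environment's page directory. The mapping is therefore the same before and after the switch.

Question 4

Why must the context be saved and restored? Because there is only one set of registers, and it can hold the context of only one env at a time.

Where is it saved? In the trap handler, where the trap frame pushed on the kernel stack is copied into the env's env_tf field. It is restored by env_pop_tf when the env runs again.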

Exercise 7

This exercise asks us to implement a set of system calls that are enough to build a dummy fork.

// Allocate a new environment.
// Returns envid of new environment, or < 0 on error.  Errors are:
//	-E_NO_FREE_ENV if no free environment is available.
//	-E_NO_MEM on memory exhaustion.
static envid_t
sys_exofork(void)
{
	// Create the new environment with env_alloc(), from kern/env.c.
	// It should be left as env_alloc created it, except that
	// status is set to ENV_NOT_RUNNABLE, and the register set is copied
	// from the current environment -- but tweaked so sys_exofork
	// will appear to return 0.

	// LAB 4: Your code here.
	struct Env* store = NULL;
	// val holds the return value; both error codes
	// (-E_NO_FREE_ENV and -E_NO_MEM) are passed through from env_alloc
	int val;
	if((val = env_alloc(&store, curenv->env_id)) < 0)
		return val;
	store->env_status = ENV_NOT_RUNNABLE;
	store->env_tf = curenv->env_tf;
	// Child environment will return zero instead; change the eax register
	store->env_tf.tf_regs.reg_eax = 0;
	// curenv will return the envid of child environment
	return store->env_id;
}

// Set envid's env_status to status, which must be ENV_RUNNABLE
// or ENV_NOT_RUNNABLE.
//
// Returns 0 on success, < 0 on error.  Errors are:
//	-E_BAD_ENV if environment envid doesn't currently exist,
//		or the caller doesn't have permission to change envid.
//	-E_INVAL if status is not a valid status for an environment.
static int
sys_env_set_status(envid_t envid, int status)
{
	// Hint: Use the 'envid2env' function from kern/env.c to translate an
	// envid to a struct Env.
	// You should set envid2env's third argument to 1, which will
	// check whether the current environment has permission to set
	// envid's status.

	// LAB 4: Your code here.
	// First, check whether status argument is valid
	if(status != ENV_RUNNABLE && status != ENV_NOT_RUNNABLE)
		return -E_INVAL;
	struct Env* store = NULL;
	if(envid2env(envid, &store, 1) < 0)
		return -E_BAD_ENV;
	// All Preconditions met, now do the assignment part
	store->env_status = status;
	return 0;
}

// Allocate a page of memory and map it at 'va' with permission
// 'perm' in the address space of 'envid'.
// The page's contents are set to 0.
// If a page is already mapped at 'va', that page is unmapped as a
// side effect.
//
// perm -- PTE_U | PTE_P must be set, PTE_AVAIL | PTE_W may or may not be set,
//         but no other bits may be set.  See PTE_SYSCALL in inc/mmu.h.
//
// Return 0 on success, < 0 on error.  Errors are:
//	-E_BAD_ENV if environment envid doesn't currently exist,
//		or the caller doesn't have permission to change envid.
//	-E_INVAL if va >= UTOP, or va is not page-aligned.
//	-E_INVAL if perm is inappropriate (see above).
//	-E_NO_MEM if there's no memory to allocate the new page,
//		or to allocate any necessary page tables.
static int
sys_page_alloc(envid_t envid, void *va, int perm)
{
	// Hint: This function is a wrapper around page_alloc() and
	//   page_insert() from kern/pmap.c.
	//   Most of the new code you write should be to check the
	//   parameters for correctness.
	//   If page_insert() fails, remember to free the page you
	//   allocated!

	// LAB 4: Your code here.
	struct Env* store = NULL;
	struct PageInfo* page = NULL;
	// First, do the parameter check
	// check envid
	if(envid2env(envid, &store, 1) < 0)
		return -E_BAD_ENV;
	// check va
	if(((uintptr_t)va >= (uintptr_t)UTOP) || (ROUNDDOWN((uintptr_t)va, PGSIZE) != (uintptr_t)va))
		return -E_INVAL;
	// check perm, is PTE_U and PTE_P already set?
	if(((perm & PTE_U) == 0) || ((perm & PTE_P) == 0) )
		return -E_INVAL;
	// is perm set with other perms that should never be set?
	// bit-and ~PTE_SYSCALL clear the four bits
	if((perm & ~PTE_SYSCALL) != 0)
		return -E_INVAL;
	// Now start to do the real things, first alloc a physical page
	if((page = page_alloc(ALLOC_ZERO)) == NULL)
		return -E_NO_MEM;
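	// map it at given va, with given perm; if that fails, free the page
	// we just allocated (its pp_ref is still 0 at this point)
	if(page_insert(store->env_pgdir, page, va, perm) < 0)
	{
		page_free(page);
		return -E_NO_MEM;
	}
	// If the control flow executes through here, then we are done
	return 0;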
}

// Map the page of memory at 'srcva' in srcenvid's address space
// at 'dstva' in dstenvid's address space with permission 'perm'.
// Perm has the same restrictions as in sys_page_alloc, except
// that it also must not grant write access to a read-only
// page.
//
// Return 0 on success, < 0 on error.  Errors are:
//	-E_BAD_ENV if srcenvid and/or dstenvid doesn't currently exist,
//		or the caller doesn't have permission to change one of them.
//	-E_INVAL if srcva >= UTOP or srcva is not page-aligned,
//		or dstva >= UTOP or dstva is not page-aligned.
//	-E_INVAL is srcva is not mapped in srcenvid's address space.
//	-E_INVAL if perm is inappropriate (see sys_page_alloc).
//	-E_INVAL if (perm & PTE_W), but srcva is read-only in srcenvid's
//		address space.
//	-E_NO_MEM if there's no memory to allocate any necessary page tables.
static int
sys_page_map(envid_t srcenvid, void *srcva,
	     envid_t dstenvid, void *dstva, int perm)
{
	// Hint: This function is a wrapper around page_lookup() and
	//   page_insert() from kern/pmap.c.
	//   Again, most of the new code you write should be to check the
	//   parameters for correctness.
	//   Use the third argument to page_lookup() to
	//   check the current permissions on the page.

	// LAB 4: Your code here.
	struct Env* src = NULL;
	struct Env* dst = NULL;
	// First do the parameter check
	// check two envids
	if(envid2env(srcenvid, &src, 1) < 0 || envid2env(dstenvid, &dst, 1) < 0)
		return -E_BAD_ENV;
	// check two vas
	if((uintptr_t)srcva >= UTOP || ROUNDDOWN((uintptr_t)srcva, PGSIZE) != (uintptr_t)srcva)
		return -E_INVAL;
	if((uintptr_t)dstva >= UTOP || ROUNDDOWN((uintptr_t)dstva, PGSIZE) != (uintptr_t)dstva)
		return -E_INVAL;
	// check whether srcva is mapped in src
	pte_t* pte_addr = NULL;
	struct PageInfo* page = NULL;
	if((page = page_lookup(src->env_pgdir, srcva, &pte_addr)) == NULL)
		return -E_INVAL;
	// check whether perm is inappropriate
	if(((perm & PTE_U) == 0) || ((perm & PTE_P) == 0) || ((perm & ~PTE_SYSCALL) != 0))
		return -E_INVAL;
	// check whether srcva is read-only but perm contains PTE_W
	if(!(*pte_addr & PTE_W) && (perm & PTE_W))
		return -E_INVAL;
	// Now, do the real things, since we have got the physical page
	// just use page_insert() to map it at given addr
	if(page_insert(dst->env_pgdir, page, dstva, perm) < 0)
		return -E_NO_MEM;
	// If we could execute through here, then we are done
	return 0;
}

// Unmap the page of memory at 'va' in the address space of 'envid'.
// If no page is mapped, the function silently succeeds.
//
// Return 0 on success, < 0 on error.  Errors are:
//	-E_BAD_ENV if environment envid doesn't currently exist,
//		or the caller doesn't have permission to change envid.
//	-E_INVAL if va >= UTOP, or va is not page-aligned.
static int
sys_page_unmap(envid_t envid, void *va)
{
	// Hint: This function is a wrapper around page_remove().

	// LAB 4: Your code here.
	// First check envid
	struct Env* store = NULL;
	if(envid2env(envid, &store, 1) < 0)
		return -E_BAD_ENV;
	// Then check va
	if((uintptr_t)va >= UTOP || ROUNDDOWN((uintptr_t)va, PGSIZE) != (uintptr_t)va)
		return -E_INVAL;
	// Then do the real stuff
	page_remove(store->env_pgdir, va);
	// If we could execute through here, then everything is done
	return 0;
}
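
The address and permission checks are repeated in several of the system calls above, so they could be factored into two small predicates. This is a sketch of that idea, not part of the lab skeleton: user_va_ok and perm_ok are hypothetical names, and the constants are the JOS values from inc/mmu.h and inc/memlayout.h as I remember them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PGSIZE      4096u
#define UTOP        0xeec00000u
#define PTE_P       0x001
#define PTE_W       0x002
#define PTE_U       0x004
#define PTE_AVAIL   0xE00
#define PTE_SYSCALL (PTE_AVAIL | PTE_P | PTE_W | PTE_U)

/* A user address is acceptable if it is below UTOP and page-aligned. */
static bool user_va_ok(uint32_t va)
{
	return va < UTOP && (va % PGSIZE) == 0;
}

/* PTE_U and PTE_P are mandatory; nothing outside PTE_SYSCALL is allowed. */
static bool perm_ok(int perm)
{
	int required = PTE_U | PTE_P;

	return (perm & required) == required && (perm & ~PTE_SYSCALL) == 0;
}
```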

Exercise 8

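This exercise prepares for the user-level page fault handler that comes next. User-level page fault handling is similar to the xv6 alarm homework from earlier: the user passes in a handler, the kernel stores one such handler per env, and when a page fault is raised and confirmed to come from user mode, the kernel pushes the relevant information onto a stack and transfers control to the user-level handler.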

static int
sys_env_set_pgfault_upcall(envid_t envid, void *func)
{
	// LAB 4: Your code here.
	struct Env* store = NULL;
	if(envid2env(envid, &store, 1) < 0)
		return -E_BAD_ENV;
	store->env_pgfault_upcall = func;
	return 0;
}

Exercise 9

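This exercise asks us to implement the kernel function that pushes the information about a user-mode page fault onto the user exception stack. One thing to note: the user_mem_assert call should check only the UTrapframe being written, not the whole exception stack; otherwise the grading script does not work correctly.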

// LAB 4: Your code here.
// First check preconditions, shall we pass the control to user-level page
// fault handler? flag means whether a user-level page fault handler is needed
bool flag = true;
// check whether there exists a page fault upcall
if(curenv->env_pgfault_upcall == NULL)
	flag = false;
// check whether the exception stack has overflowed
if((USTACKTOP <= fault_va) && (fault_va < UXSTACKTOP - PGSIZE))
	flag = false;
// Now start to do the real stuff
if(flag)
{
	struct UTrapframe* trapframe = NULL;
	// where do we put the struct UTrapFrame? It depends on whether the page
	// fault happens on user exception stack
	if((curenv->env_tf.tf_esp < UXSTACKTOP) && (curenv->env_tf.tf_esp >= UXSTACKTOP - PGSIZE))
		trapframe = (struct UTrapframe*)(curenv->env_tf.tf_esp - sizeof(struct UTrapframe) - 4);
	else
		trapframe = (struct UTrapframe*)(UXSTACKTOP - sizeof(struct UTrapframe));
	// check whether the env has a page mapped at the exception stack, and has
	// write permission to it. user_mem_assert will automatically destroy the
	// env and will not return if the check fails.
	user_mem_assert(curenv, trapframe, sizeof(struct UTrapframe), PTE_W);
	// construct the user trap frame for user-level page fault handler
	trapframe->utf_eflags = curenv->env_tf.tf_eflags;
	trapframe->utf_eip = curenv->env_tf.tf_eip;
	trapframe->utf_err = curenv->env_tf.tf_err;
	trapframe->utf_esp = curenv->env_tf.tf_esp;
	trapframe->utf_fault_va = fault_va;
	trapframe->utf_regs = curenv->env_tf.tf_regs;
	// modify the curenv->env_tf to make it return to user-level page fault 
	// handler, and set the new esp
	curenv->env_tf.tf_eip = (uintptr_t)curenv->env_pgfault_upcall;
	curenv->env_tf.tf_esp = (uintptr_t)trapframe;
	// Finally, return to user space, run the user-level page fault handler
	// This function will never return
	env_run(curenv);
}
// Destroy the environment that caused the fault.
cprintf("[%08x] user fault va %08x ip %08x\n",
	curenv->env_id, fault_va, tf->tf_eip);
print_trapframe(tf);
env_destroy(curenv);
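
The placement rule for the UTrapframe can be isolated into one function. In this sketch utf_addr is a hypothetical helper, UXSTACKTOP is the JOS value, and UTF_SIZE is sizeof(struct UTrapframe) on 32-bit JOS (4 + 4 + 32 + 4 + 4 + 4 bytes).

```c
#include <assert.h>
#include <stdint.h>

#define PGSIZE     4096u
#define UXSTACKTOP 0xeec00000u
#define UTF_SIZE   52u		/* sizeof(struct UTrapframe) */

/*
 * Where the kernel should build the UTrapframe, given the trap-time esp.
 * If the env was already running on the exception stack (a nested
 * fault), leave one empty 32-bit word below the old esp and place the
 * frame under it; otherwise start at the top of the exception stack.
 */
static uint32_t utf_addr(uint32_t tf_esp)
{
	if (tf_esp >= UXSTACKTOP - PGSIZE && tf_esp < UXSTACKTOP)
		return tf_esp - 4 - UTF_SIZE;
	return UXSTACKTOP - UTF_SIZE;
}
```

The empty word is the scratch space that the assembly in the next exercise uses to store the return eip.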

Exercise 10

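This exercise asks us to write an assembly routine in user space. It is a wrapper around the user-level handler and is responsible for the cleanup after the handler returns: restoring eip, esp and the other registers of the program that was running.

The hard part is restoring esp and eip; the comments explain how. In short, neither call nor jmp will do. We have to carefully patch the contents of the stack and then use the ret instruction to return to the original control flow.

Also, once the registers have been restored, they must not be modified again.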

	// LAB 4: Your code here.
	// fault_va and err are no longer needed. Skip them so that %esp
	// points at the saved registers, which popal will restore below

	addl $8, %esp

	// we need to put the "eip" to the right place
	// if we use push command after switching the stack, then we need a 
	// general-purpose register to save this value, which is unacceptable
	// so we use mov command to do this, and do this before "popal" command
	// %eax stores the eip value, %ebx stores where to put it
	// Important: We need to modify the %esp in UTrapframe here, because we need
	// to return to the (%esp - 4) place

	movl 0x20(%esp), %eax
	movl 0x28(%esp), %ebx
	subl $4, %ebx
	movl %ebx, 0x28(%esp)
	movl %eax, (%ebx)

	// Restore the trap-time registers.  After you do this, you
	// can no longer modify any general-purpose registers.
	// LAB 4: Your code here.

	popal

	// Restore eflags from the stack.  After you do this, you can
	// no longer use arithmetic operations or anything else that
	// modifies eflags.
	// LAB 4: Your code here.
	// eip on this stack is useless now, ignore it

	addl $4, %esp
	popf

	// Switch back to the adjusted trap-time stack.
	// LAB 4: Your code here.

	movl (%esp), %esp

	// Return to re-execute the instruction that faulted.
	// LAB 4: Your code here.

	ret
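
The offsets 0x20 and 0x28 used above follow from the layout of struct UTrapframe once the first two words have been skipped. The sketch below redeclares the structs with fixed-width fields (the real definitions are in inc/trap.h) and also models the "plant the return address" step on a fake memory array; AFTER_SKIP and plant_return_address are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct PushRegs {
	uint32_t reg_edi, reg_esi, reg_ebp, reg_oesp;
	uint32_t reg_ebx, reg_edx, reg_ecx, reg_eax;
};

struct UTrapframe {
	uint32_t utf_fault_va;
	uint32_t utf_err;
	struct PushRegs utf_regs;
	uint32_t utf_eip;
	uint32_t utf_eflags;
	uint32_t utf_esp;
};

/* Offset of a field relative to %esp after "addl $8, %esp". */
#define AFTER_SKIP(field) (offsetof(struct UTrapframe, field) - 8)

/*
 * Model of the trick: lower the saved esp by one word and store the
 * saved eip there, so that a later "ret" on that stack jumps to eip.
 * mem_words stands in for memory, indexed by address / 4.
 */
static void plant_return_address(struct UTrapframe *utf, uint32_t *mem_words)
{
	utf->utf_esp -= 4;
	mem_words[utf->utf_esp / 4] = utf->utf_eip;
}
```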

Exercise 11

This function does the preparation for registering a handler: it allocates the user exception stack and registers the upcall.

void
set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
{
	int r;

	if (_pgfault_handler == 0) {
		// First time through!
		// LAB 4: Your code here.
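		// allocate the exception stack, then register the assembly upcall;
		// either call can fail, so check both
		if ((r = sys_page_alloc(0, (void*)(UXSTACKTOP - PGSIZE), PTE_W | PTE_U | PTE_P)) < 0)
			panic("set_pgfault_handler: sys_page_alloc: %e", r);
		if ((r = sys_env_set_pgfault_upcall(0, _pgfault_upcall)) < 0)
			panic("set_pgfault_handler: sys_env_set_pgfault_upcall: %e", r);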
	}

	// Save handler pointer for assembly to call.
	_pgfault_handler = handler;
}

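Once the tests for this part pass, this part is done.

One more question: why do faultalloc and faultallocbad behave differently? faultalloc prints with cprintf, which touches the bad address in user mode first, so the fault is delivered to the user-level handler and the page gets allocated. faultallocbad passes the address straight to the sys_cputs system call, so it is the kernel that looks at it: the pointer fails the kernel's user memory check and the environment is destroyed without the handler ever running.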

Exercise 12

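This exercise has us use all the system calls implemented so far to build a user-level copy-on-write (COW) fork. I found this one hard.

First, read the whole fork procedure on the lab page: call the sys_exofork system call, copy all the mappings, allocate the exception stack, and finally mark the child runnable.

There are many points to watch in this exercise; I go through them one by one.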

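1. For a writable virtual page of the parent env, duppage must not only map the child's page to the same physical page and mark it COW, it must also remark the parent's own mapping as COW. Why? If the parent's mapping were not marked COW, the parent could modify the page and corrupt the program data the child sees.
2. The question from the lab page: why mark the child's page first and the parent's page afterwards? If the parent's page were marked first, then between the two statements the parent might write to this page (think of the stack), which triggers the user-level page fault handler and gives the parent a separate page. The child should have pointed to the original page, but now it points to the new one. That in itself is not a problem. The problem, as in item 1, is that if the parent later writes to this new page, the page is polluted while the child knows nothing about it, so the child's address space is polluted as well.
3. A question from the code: if a page is already COW in the parent, why set it to COW again at the end? This is the same as item 2: the page may have been written in the meantime and may no longer be COW. To protect the child's contents, the parent's page must be set to COW again.
4. Since the stack gets modified, and we certainly write to the stack after exofork and before the stack page is passed to duppage, can that make the result wrong? No. exofork copies esp, and later writes go below it because the stack grows downward, so the contents the child relies on are not damaged.
5. The code has to read uvpt, that is, this env's own page table. There is one case to watch: if the page table that would hold the page table entry does not exist at all, accessing uvpt raises a page fault. Worse, if this happens inside the page fault handler, the program simply blows up. My solution is a getpte function that first checks through uvpd whether the page table exists and returns 0 if it does not. (Returning 0 is logical, because the PTE_P bit is then clear.)
6. The comments in pgfault mention three system calls, but my code uses only two and omits the unmap call. I did this because I know that sys_page_map removes whatever page was previously mapped at the destination. That only works because I implemented these system calls myself and know the details. From a software engineering point of view this is poor practice, since it ties the code to an implementation detail.
7. If fork.c has global variables, use them with great care, especially inside the page fault handler. Watch out for this chain: the page holding the global is COW, the page fault handler touches the global while allocating a page, and the resulting nested page faults never stop until the stack overflows.
8. Where should read-only pages be handled? The comments ask us to handle them in fork.c; I handle them inside duppage. This is a design question: as long as you know what you are doing and comment it well, either is fine.
9. Allocating the child's exception stack and registering the child's user-level page fault handler: should the parent do it, or the child? If it were left to the child, the child would take a page fault on its very first statement with no way to handle it, so the parent has to do it.
10. Because of how pgfault.c is written, registering the user-level page fault handler for the child must be done directly with the system call rather than through the wrapper function. The upcall function needs to be declared separately.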
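
Two of the points in the list, the guarded page table lookup and the choice of permissions for the child's mapping, can be sketched as small pure functions. This is not the author's code (that is in the second part): getpte here takes the page directory and page table arrays as parameters instead of using the global uvpd and uvpt, dup_perm is a hypothetical helper, and the constants are the JOS values as I remember them.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P   0x001
#define PTE_W   0x002
#define PTE_U   0x004
#define PTE_COW 0x800	/* one of the PTE_AVAIL bits, used by lib/fork.c */

#define PDX(va)   (((uint32_t)(va) >> 22) & 0x3FF)	/* page directory index */
#define PGNUM(va) ((uint32_t)(va) >> 12)		/* global page number */

typedef uint32_t pte_t;
typedef uint32_t pde_t;

/*
 * Look up the PTE for va, but only touch the page table array if the
 * page directory entry says the page table exists. Returns 0 (PTE_P
 * clear) when there is no page table.
 */
static pte_t getpte(const pde_t *pd, const pte_t *pt, uint32_t va)
{
	if (!(pd[PDX(va)] & PTE_P))
		return 0;
	return pt[PGNUM(va)];
}

/*
 * Permissions to use when duplicating a mapping: writable or already
 * COW pages become COW (and lose PTE_W); read-only pages stay read-only.
 */
static int dup_perm(pte_t pte)
{
	int perm = PTE_P | PTE_U;

	if (pte & (PTE_W | PTE_COW))
		perm |= PTE_COW;
	return perm;
}
```

Under these assumptions duppage would map the child's page with dup_perm(pte) first and then remap the parent's page with the same permissions, following the order discussed in item 2.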

Because of length limits, the code and Part C are in 6.828 Lab 4 (Part 2).

Edited on 2018-11-29 15:34
