[Apparmor-dev] rlimit resource limit policies

Sarah Smith sez at storybridge.org
Tue Apr 10 19:57:55 MDT 2007


On Monday 09 April 2007 19:00, Crispin Cowan wrote:
> For about a year, we have been wanting to add resource limit policy
> features to AppArmor, so that a policy can limit the resources a
> confined process can use, and thus limit the ability of an attacker to
> DoS the host machine.

Great!  Some input on this project below.

> Naturally, Linux already has resource limiting features in the form of
> rlimit and ulimit. Unfortunately, similar to POSIX.1e capabilities, the
> management interface sucks, to say the least. Also similar to
> Capabilities, we can ease this situation by providing easy-to-manage
> resource limit policies in AppArmor profiles.
>
> See man 2 getrlimit and setrlimit, and man 1 ulimit. Using the system
> calls getrlimit() and setrlimit(), a process can voluntarily set its
> resource limits for a bunch of attributes, and the kernel will enforce
> these limits if the process is not privileged. The proposed design is
> that you be able to write such resource limits into a profile, e.g.

A process can lower its hard limit, raise it if it has CAP_SYS_RESOURCE |
CAP_SYS_ADMIN, and raise or lower its soft limit up to the hard limit.  I guess
the policy values would be setting the hard limit:

> /usr/bin/foo {
>     capability chown, # a classic POSIX.1e Capability limit
>     rlimit_as 1048576, # set address space limit to 1MB
>     rlimit_nofile 6, # can have 6 file descriptors
>     rlimit_nproc 10, # can have 10 instance processes in this profile
> .
> }
>
> The man page has the full set of attributes. Most of these attributes
> are naturally applied per process, e.g. rlimit_as is the size of the
> address space, and rlimit_nofile is the number of open file descriptors
> a process may have. Limiting these resources is performed by the kernel:
> all AppArmor has to do is set the limits when the process is instantiated.

This looks good and seems straightforward. 

Interestingly, it makes AppArmor both a security system and a resource control
system - a subtle point which perhaps we don't have to worry about.

But for example with open(2) the return value would be -EMFILE rather
than -EPERM, so does it get logged as a breach of policy?  Perhaps it's a
breach of policy if the current value is set to the hard limit at the time of
the denied call.

This keeps the current semantics for processes: a process can consume resources
up to the soft limit, be notified of that by the system call return value or a
signal, and then act to curtail its resource use so it doesn't hit the hard
limit.
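
To make the soft-limit behaviour concrete, here is a rough userspace sketch.
It lowers only the soft RLIMIT_NOFILE value, leaves the hard limit alone, and
reports the errno of the first open(2) that fails.  The function name and the
MAX_PROBE constant are made up for illustration; this is not AppArmor code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

#define MAX_PROBE 256   /* hypothetical cap on how many fds we try to open */

/*
 * Lower the soft RLIMIT_NOFILE limit to soft_max (hard limit unchanged),
 * then open /dev/null repeatedly until open(2) fails.
 * Returns the errno of the failing open, or 0 if setup failed.
 * The original limits are restored before returning.
 */
int errno_at_soft_nofile_limit(rlim_t soft_max)
{
    struct rlimit old, lowered;
    int fds[MAX_PROBE];
    int n = 0, err = 0;

    /* The probe loop must be long enough to reach the limit. */
    assert(soft_max <= MAX_PROBE);

    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return 0;
    if (soft_max > old.rlim_max)        /* soft may not exceed hard */
        return 0;

    lowered = old;
    lowered.rlim_cur = soft_max;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return 0;

    while (n < MAX_PROBE) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            err = errno;                /* save before close() can change it */
            break;
        }
        fds[n++] = fd;
    }

    while (n > 0)
        close(fds[--n]);
    setrlimit(RLIMIT_NOFILE, &old);     /* raising soft back up needs no privilege */
    return err;
}
```

Calling errno_at_soft_nofile_limit(16) from an ordinary process should come
back with EMFILE, which is the value a policy logger would have to tell apart
from -EPERM.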

> rlimit_nproc is special: this is the number of processes possible. In
> the classic parlance of rlimit, this is the number of processes in the
> current real user_ID. I propose that for AppArmor policy purposes, we
> change it to be the number of processes that can be instantiated under
> this profile. If the limit is exceeded, then fork or exec will fail on
> attempts to create another process via ix permissions. However, the
> process is still free to launch Px, px, and Ux children, subject to the
> rlimit_nproc policy of the corresponding profiles for Px and px.

Here the aim would be to prevent either a malicious or malfunctioning piece of 
code running in a confined process forking so much that the kernel can't 
schedule the children/handle the task structs or otherwise grinds to a halt.

The above looks good for a process which forks new copies of itself, or maybe 
creates lots of threads.

Just wondering about the inheritance for such a policy.  

The 90% case is a fork then an exec.

If the inheritance policy is that the process limit is that of the parent 
process, all is good - a badly written server /sbin/malsrv which fork-bombs 
itself by calling binary /bin/foo a lot via system (3) will be controlled. 

But if /bin/foo is always unlimited, and /sbin/malsrv is not prevented from
exec'ing it, then it can DoS by forking to somewhere under its limit and in
each copy doing
   char *fork_lots_arg[] = { "/bin/foo", "--fork", "10000", NULL };
   execve( "/bin/foo", fork_lots_arg, environ );
Or if the shell is unlimited 'system( "bomb() { bomb | bomb; }; bomb" )'

Anyway, maybe that can't be reasonably prevented.  Maybe limits should be 
applied to anything the confined process can exec.
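
For comparison, the ordinary per-process rlimits already follow the parent: a
child created by fork(2) inherits them, and they are preserved across
execve(2).  A small sketch that checks the fork half of that, using a made-up
helper name:

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Set the soft RLIMIT_NOFILE limit to soft_max, fork, and let the child
 * report through its exit status whether it sees the same soft limit.
 * Returns 1 if the child inherited the limit, 0 if not, -1 on error.
 * The parent's original limits are restored before returning.
 */
int child_inherits_nofile_limit(rlim_t soft_max)
{
    struct rlimit old, lowered, seen;
    int status = 0, result = -1;
    pid_t pid;

    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -1;
    if (soft_max > old.rlim_max)        /* soft may not exceed hard */
        return -1;

    lowered = old;
    lowered.rlim_cur = soft_max;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Child: compare what we see with what the parent set. */
        int same = getrlimit(RLIMIT_NOFILE, &seen) == 0 &&
                   seen.rlim_cur == soft_max;
        _exit(same ? 0 : 1);
    }

    assert(pid != 0);                   /* only the parent gets here */
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    setrlimit(RLIMIT_NOFILE, &old);
    return result;
}
```

The open question above is whether a per-profile rlimit_nproc should follow
the task in the same way, or be replaced by the target profile's value on a
Px/px transition.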

Initial implementation thoughts, without having looked into it too deeply - 
maybe when the security structure is allocated from the hook in do_execve the 
rlimit value in current->signal is updated with the value from the policy, so 
the kernel can do the right/default thing when the value hits the soft limit.

Might need to think about the case where the signal_struct is shared.

Then just track the number of processes per profile in a static lookup
table/list per profile in the module, and simply check it in AA's task_create
hook, returning -EPERM if it's at the hard limit.

Immediate thoughts are making sure threads work properly, and how to handle
zombies - do we hold off reducing our count until the task (and the security
structure) is freed, or do we do it when the process is reparented onto init?
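
The bookkeeping could look roughly like the following userspace sketch.  The
struct and function names are hypothetical, not existing AppArmor symbols, and
a kernel version would need atomic counters or a lock around the update.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-profile accounting record; not an AppArmor structure. */
struct profile_nproc {
    unsigned int count;   /* tasks currently charged to this profile */
    unsigned int limit;   /* rlimit_nproc value taken from the policy */
};

/*
 * Would be called from the task_create hook: charge one more task to the
 * profile, or refuse with -EPERM once the limit has been reached.
 */
int profile_charge_task(struct profile_nproc *p)
{
    if (p->count >= p->limit)
        return -EPERM;
    p->count++;
    return 0;
}

/*
 * Would be called when the count is released, e.g. when the task's
 * security structure is freed.
 */
void profile_uncharge_task(struct profile_nproc *p)
{
    assert(p->count > 0);   /* never release more than was charged */
    p->count--;
}
```

With a limit of 2, two charges succeed, the third returns -EPERM, and a charge
succeeds again after one uncharge.  Where the uncharge is called from is
exactly the zombie question above.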

> An important implementation issue is restricting root's resource usage.
> I did some experimenting with setrlimit() and was disappointed to
> discover that it doesn't actually enforce against root. As with the rest
> of AppArmor, we will need to enforce these limits regardless of whether
> the process is privileged.

Agree totally.  I think the kernel guys agree too - 

http://lkml.org/lkml/2000/11/28/110

There's this code in fork.c:

	if (atomic_read(&p->user->processes) >=
			p->signal->rlim[RLIMIT_NPROC].rlim_cur) {
		if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_RESOURCE) &&
				p->user != &root_user)
			goto bad_fork_free;
	}

Hmmmmm....
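
Pulled out into a standalone userspace model, the condition reads as below.
The function and parameter names are invented for the sketch; the booleans
stand in for capable(CAP_SYS_ADMIN), capable(CAP_SYS_RESOURCE) and
p->user == &root_user.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the RLIMIT_NPROC test in fork.c quoted above.
 * Returns true when the fork would be refused (goto bad_fork_free).
 */
bool nproc_fork_refused(unsigned long user_processes,
                        unsigned long rlim_cur,
                        bool cap_sys_admin,
                        bool cap_sys_resource,
                        bool is_root_user)
{
    if (user_processes >= rlim_cur) {
        /* Either capability, or running as the root user, skips the limit. */
        if (!cap_sys_admin && !cap_sys_resource && !is_root_user)
            return true;
    }
    return false;
}
```

So at the limit an unprivileged, non-root caller is refused, while any one of
the three escape hatches lets the fork through.  An AppArmor check would
presumably drop all three.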

> Automatically creating these policy limits in learning mode will be
> difficult. A natural semantic would be to set high water marks for these
> values in learning mode, but the kernel lacks a high water mark
> mechanism for these limits, so we would have to fake it by setting the
> limits to minimal values, catching the exceptions, and changing the
> limits, without failing the system calls from the requesting processes.
> It is also problematic that I suspect that a vast majority of confined
> processes won't want to bother with specifying these limits, so learning
> mode should only learn limits for explicitly declared attributes.
> Enforce mode similarly should only enforce limits for explicitly
> declared attributes.
>
> So no matter how it is done, as far as I can tell manipulating rlimit
> attributes involves hand-editing of profiles.

Yes.  Maybe the GUI folks can make a nice slider in the YaST plugin.

> Does this design sound useful?
>
> So why am I posting this now? Two reasons:
>
>    1. A member of the AppArmor community privately expressed interest in
>       building this feature, because the SUSE AppArmor team is doing
>       other stuff right now.
>    2. John Johansen (part of the SUSE team) has actually implemented a
>       partial prototype of rlimit code. He is critical path on some
>       other fun features we are planning for 10.3, so he can't work on
>       it in time for 10.3.
>
> I've asked JJ to post his prototype work in response to this thread.

Interested to see that code.

> I'll let the other party out themselves at their leisure.

:-)

This is a feature that we are looking to take advantage of in our research into
the use of AppArmor as a "MAC" implementation for the security architecture of
our embedded software stack, Qtopia.

Not a guru kernel hacker, but if I can get some review from folks on this list 
I can probably come up with something.

I'm interested to start work on this fairly soon - currently on leave so 
prolly not 'til early May.

Rgds,

Sarah Smith
Senior Engineer
Trolltech Mobile & Embedded Systems


