Thursday, April 19, 2012

What is a Class Loader and its responsibilities?

The class loader is a subsystem of the JVM that is primarily responsible for loading classes and interfaces into the system. Apart from this, a class loader is responsible for the following activities:

-Verification of imported types (classes and interfaces)

-Allocating memory for class variables and initializing them to default values. Static fields for a class are created and set to standard default values, but they are not yet explicitly initialized. The method tables are constructed for the class.

-Resolving symbolic references from the type to direct references.

Class loaders can be of two types: a bootstrap (or primordial) class loader and user-defined class loaders. Each JVM has a bootstrap class loader, which loads trusted classes, including classes from the Java API. The JVM specification does not say how to locate these classes; that is left to implementation designers.

A Java application with user-defined class loader objects can customize class loading. These loaders load untrusted classes and are not an intrinsic part of the JVM. They are written in Java, compiled to class files, loaded into the JVM, and instantiated like any other objects.
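The preparation step in the list above (static fields get default values before any explicit initializer runs) can be observed from plain Java. The following is a minimal sketch; the class name PreparationDemo and its fields are made up for illustration and do not come from any of the quoted sources.

```java
// Sketch: static fields hold their default values after preparation and
// receive their explicit values only when static initializers run, in textual order.
class PreparationDemo {
    // Initialized first: peek() reads 'late' before the initializer of 'late' has run.
    static int early = peek();
    static int late = 42;

    static int peek() {
        return late; // still the default value 0 at this point
    }

    public static void main(String[] args) {
        System.out.println("early = " + early); // prints "early = 0"
        System.out.println("late = " + late);   // prints "late = 42"
    }
}
```

The field early captures the default value of late because the initializers run top to bottom, after memory for both fields has already been set to zero.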



ORACLE.COM


java.lang 
Class ClassLoader

java.lang.Object
  extended by java.lang.ClassLoader
Direct Known Subclasses:
SecureClassLoader

public abstract class ClassLoader
extends Object
A class loader is an object that is responsible for loading classes. The class ClassLoader is an abstract class. Given the name of a class, a class loader should attempt to locate or generate data that constitutes a definition for the class. A typical strategy is to transform the name into a file name and then read a "class file" of that name from a file system.
Every Class object contains a reference to the ClassLoader that defined it.
Class objects for array classes are not created by class loaders, but are created automatically as required by the Java runtime. The class loader for an array class, as returned by Class.getClassLoader() is the same as the class loader for its element type; if the element type is a primitive type, then the array class has no class loader.
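A short sketch of the array rule just described. The class name ArrayLoaderDemo is hypothetical; the only API used is Class.getClassLoader(), which returns null both for "no class loader" and for the bootstrap class loader.

```java
// Sketch: the class loader reported for an array class follows its element type.
class ArrayLoaderDemo {
    public static void main(String[] args) {
        // Primitive element type: the array class has no class loader.
        System.out.println(int[].class.getClassLoader());    // prints "null"

        // String is loaded by the bootstrap class loader, which is also reported as null.
        System.out.println(String[].class.getClassLoader()); // prints "null"

        // For an application class, the array class reports the element type's loader.
        System.out.println(ArrayLoaderDemo[].class.getClassLoader()
                == ArrayLoaderDemo.class.getClassLoader());  // prints "true"
    }
}
```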
Applications implement subclasses of ClassLoader in order to extend the manner in which the Java virtual machine dynamically loads classes.
Class loaders may typically be used by security managers to indicate security domains.
The ClassLoader class uses a delegation model to search for classes and resources. Each instance of ClassLoader has an associated parent class loader. When requested to find a class or resource, a ClassLoader instance will delegate the search for the class or resource to its parent class loader before attempting to find the class or resource itself. The virtual machine's built-in class loader, called the "bootstrap class loader", does not itself have a parent but may serve as the parent of a ClassLoader instance.
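The parent relationship can be inspected at run time with ClassLoader.getParent(). The sketch below walks the chain upward from a given loader; the class and method names (ParentChainDemo, chain) are assumptions made for this example. A null parent stands for the bootstrap class loader.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: list a class loader and all of its ancestors in delegation order.
class ParentChainDemo {
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> result = new ArrayList<>();
        // getParent() returns null once the bootstrap class loader is reached.
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            result.add(cl);
        }
        return result;
    }

    public static void main(String[] args) {
        // An anonymous loader whose parent is the system (application) class loader.
        ClassLoader child = new ClassLoader(ClassLoader.getSystemClassLoader()) { };
        for (ClassLoader cl : chain(child)) {
            System.out.println(cl);
        }
        System.out.println("(bootstrap class loader)");
    }
}
```

The exact loaders printed depend on the JVM version and on how the program was launched, so no output is shown here.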
Normally, the Java virtual machine loads classes from the local file system in a platform-dependent manner. For example, on UNIX systems, the virtual machine loads classes from the directory defined by the CLASSPATH environment variable.
However, some classes may not originate from a file; they may originate from other sources, such as the network, or they could be constructed by an application. The method defineClass converts an array of bytes into an instance of class Class. Instances of this newly defined class can be created using Class.newInstance.
The methods and constructors of objects created by a class loader may reference other classes. To determine the class(es) referred to, the Java virtual machine invokes the loadClass method of the class loader that originally created the class.
For example, an application could create a network class loader to download class files from a server. Sample code might look like:
   ClassLoader loader = new NetworkClassLoader(host, port);
   // loadClass(String, boolean) is protected, so callers use the public one-argument form
   Object main = loader.loadClass("Main").newInstance();
   . . .
 
The network class loader subclass must define the methods findClass and loadClassData to load a class from the network. Once it has downloaded the bytes that make up the class, it should use the method defineClass to create a class instance. A sample implementation is:
     class NetworkClassLoader extends ClassLoader {
         String host;
         int port;

         public Class findClass(String name) {
             byte[] b = loadClassData(name);
             return defineClass(name, b, 0, b.length);
         }

         private byte[] loadClassData(String name) {
             // load the class data from the connection
              . . .
         }
     }
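Since the sample above leaves loadClassData elided, here is a self-contained variant that can actually be run. It is only a sketch under stated assumptions: the loader name InMemoryClassLoader, the register and emptyClass helpers, and the class name demo.generated.Hello are all invented for illustration, and the class bytes come from a map rather than from a network connection. The emptyClass helper hand-writes the bytes of an empty public class so the example needs no compiler or server.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NetworkClassLoader: bytes come from a map, not a socket.
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes = new HashMap<>();

    InMemoryClassLoader(ClassLoader parent) {
        super(parent);
    }

    void register(String name, byte[] bytes) {
        classBytes.put(name, bytes);
    }

    // Called by the inherited loadClass only after the parent failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = classBytes.get(name);
        if (b == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, b, 0, b.length);
    }

    // Builds a minimal class file: a public class with no fields or methods, extending Object.
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                 // magic number
            out.writeShort(0);                        // minor version
            out.writeShort(52);                       // major version (Java 8)
            out.writeShort(5);                        // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name.replace('.', '/')); // #2 Utf8
            out.writeByte(7); out.writeShort(4);      // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");     // #4 Utf8
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class
            out.writeShort(3);                        // super_class
            out.writeShort(0);                        // interfaces
            out.writeShort(0);                        // fields
            out.writeShort(0);                        // methods
            out.writeShort(0);                        // attributes
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        InMemoryClassLoader loader = new InMemoryClassLoader(ClassLoader.getSystemClassLoader());
        loader.register("demo.generated.Hello", emptyClass("demo.generated.Hello"));

        Class<?> hello = loader.loadClass("demo.generated.Hello");
        System.out.println(hello.getName());                    // prints "demo.generated.Hello"
        System.out.println(hello.getClassLoader() == loader);   // prints "true"
        // Core classes still come from the parent chain, not from this loader.
        System.out.println(loader.loadClass("java.lang.String") == String.class); // prints "true"
    }
}
```

Only findClass is overridden, so the inherited delegation to the parent stays in place. The generated class has no constructor and cannot be instantiated; it exists only to show that defineClass produced a Class object.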


Securing Java

One of the central tenets of Java is making code truly mobile. Every mobile code system requires the ability to load code from outside a system into the system dynamically. In Java, code is loaded (either from the disk or over the network) by a Class Loader. Java's class loader architecture is complex, but it is a central security issue, so please bear with us as we explain it.
Recall that all Java objects belong to classes. Class loaders determine when and how classes can be added to a running Java environment. Part of their job is to make sure that important parts of the Java runtime environment are not replaced by impostor code. The fake Security Manager shown in Figure 2.4 must be disallowed from loading into the Java environment and replacing the real Security Manager. This is known as class spoofing.
Figure 2.4 Spoofing occurs when someone or something pretends to be something it is not.

In this figure, an external class has arrived from the Internet and declares itself to be the Security Manager (in order to replace the real Security Manager). If external code were allowed to do this, Java's security system would be trivial to break.
Class loaders perform two functions. First, when the VM needs to load the byte code for a particular class, it asks a class loader to find the byte code. Each class loader can use its own method for finding requested byte code files: It can load them from the local disk, fetch them across the Net using any protocol, or it can just create the byte code on the spot. This flexibility is not a security problem as long as the class loader is trusted by the party who wrote the code that is being loaded. Second, class loaders define the namespaces seen by different classes and how those namespaces relate to each other. Namespaces are a subtle and security-critical issue that we'll have a lot more to say about later. Problems with namespace management have led to a number of serious security holes.
It probably would have been better if Java's design had initially separated the two functions of class loaders and provided lots of flexibility in finding byte code but not much flexibility in defining namespaces. In a sense, this is what has come about as successive versions of Java have had increasingly restrictive rules about how namespaces may be managed. Java's class loader architecture was originally meant to be extensible, in the sense that new class loaders could be added to a running system. It became clear early on, however, that malicious class loaders could break Java's type system, and hence breach security. As a result, current Java implementations prohibit untrusted code from making class loaders. This restriction may be relaxed in the future, since there is some possibility that the Java 2 class loader specification is at last safe in the presence of untrusted class loaders.

Varieties of Class Loaders
There are two basic varieties of class loaders: Primordial Class Loaders and Class Loader objects. There is only one Primordial Class Loader, which is an essential part of each Java VM. It cannot be overridden. The Primordial Class Loader is involved in bootstrapping the Java environment. Since most VMs are written in C, it follows that the Primordial Class Loader is typically written in C. This special class loader loads trusted classes, usually from the local disk. Figure 2.5 shows the inheritance hierarchy of Class Loaders available in Java 2.
Figure 2.5 Class Loaders provide Java's dynamic loading capability, which allows classes to arrive and depart from the runtime environment.

Java 2 implements a hierarchy of Class Loaders. This figure, after Gong [Gong, 1998], shows the inheritance hierarchy of Class Loaders.

The Primordial Class Loader
The Primordial Class Loader uses the native operating system's file access capabilities to open and read Java class files from the disk into byte arrays.
This provides Java with the ability to bootstrap itself and provide essential functions. The Java API class files (stored by default in the classes.zip file) are usually the first files loaded by the VM. The Primordial Class Loader also typically loads any classes a user has located in the CLASSPATH. Classes loaded by the Primordial Class Loader are not subjected to the Verifier prior to execution.
Sometimes the Primordial Class Loader is referred to as the "internal" class loader or the "default" class loader. Just to make things overly complicated, some people refer to classes loaded by the Primordial Class Loader as having no class loader at all.

Class Loader Objects
The second basic variety of class loader is made up of Class Loader objects. Class Loader objects load classes that are not needed to bootstrap the VM into a running Java environment. The VM treats classes loaded through Class Loader objects as untrusted by default. Class Loaders are objects just like any other Java object: they are written in Java, compiled into byte code, and loaded by the VM (with the help of some other class loader). These Class Loaders give Java its dynamic loading capabilities.
There are three distinct types of Class Loader objects defined by the JDK itself: Applet Class Loaders, RMI Class Loaders, and Secure Class Loaders. From the standpoint of a Java user or a system administrator, Applet Class Loaders are the most important variety. Java developers who are interested in rolling their own Class Loaders will likely subclass or otherwise use the RMI Class Loader and Secure Class Loader classes.
Applet Class Loaders are responsible for loading classes into a browser and are defined by the vendor of each Java-enabled browser. Vendors generally implement similar Applet Class Loaders, but they do not have to. Sometimes seemingly subtle differences can have important security ramifications. For example, Netscape now tracks a class not by its name, but by a pointer to actual code, making attacks that leverage Class Loading complications harder to carry out.
Applet Class Loaders help to prevent external code from spoofing important pieces of the Java API. They do this by attempting to load a class using the Primordial Class Loader before fetching a class across the network. If the class is not found by the Primordial Class Loader, the Applet Class Loader typically loads it via HTTP using methods of the URL class. Code is fetched from the CODEBASE specified in the <APPLET> tag. If a fetch across the Web fails, a ClassNotFoundException is thrown.
It should be clear why external code must be prevented from spoofing the trusted classes of the Java API. Consider that the essential parts of the Java security model (including the Applet Class Loader class itself) are simply Java classes. If an untrusted class from afar were able to set up shop as a replacement for a trusted class, the entire security model would be toast!
The RMI Class Loader and Secure Class Loader classes were introduced with JDK 1.1 and Java 2, respectively. RMI Class Loaders are very similar to Applet Class Loaders in that they load classes from a remote machine. They also give the Primordial Class Loader a chance to load a class before fetching it across the Net. The main difference is that RMI Class Loaders can only load classes from the URL specified by Java's java.rmi.server.codebase property. Similar in nature to RMI Class Loaders, Secure Class Loaders allow classes to be loaded only from those directories specified in Java's java.app.class.path property. Secure Class Loaders can only be used by classes found in the java.security package and are extensively used by the Java 2 access control mechanisms.

Roll-Your-Own Class Loaders
Developers are often called upon to write their own class loaders. This is an inherently dangerous undertaking since class loading is an essential part of the Java security model. Homegrown class loaders can cause no end of security trouble. The right approach to take in writing a class loader is to avoid changing the structure of namespaces, and to change only the methods that find the byte code for a not-yet-loaded class. This will allow you to fetch classes in new ways, such as through a firewall or proxy, or from a special local code library, without taking the risk inherent in namespace management. In Java 2 and later, you can do this by overriding only the findClass method and leaving the inherited loadClass delegation logic untouched.

Namespaces
In general, a running Java environment can have many Class Loaders active, each defining its own namespace. Namespaces allow Java classes to see different views of the world depending on where they originate (see Figure 2.6). Simply put, a namespace is a set of unique names of classes loaded by a particular Class Loader and a binding of each name to a specific class object. Though some people say that namespaces are disjoint and do not overlap, this is not true in general. There is nothing to stop namespaces from overlapping.
Figure 2.6 Class Loaders have two distinct jobs (which we believe would have been better off separated): (1) fetching and instantiating byte code as classes, and (2) managing name spaces.

This figure shows how Class Loaders typically divide classes into distinct name spaces according to origin. It is especially important to keep local classes distinct from external classes. This figure implies that name spaces do not overlap, which is not entirely accurate.
Most VM implementations have used different class loaders to load code from different origins. This allowed these implementations to assign a single security policy to all code loaded by the same class loader, and to make security decisions based on which class loader loaded the class that is asking to perform a dangerous operation. With the addition of code signing in JDK 1.1, there are now two characteristics for categorization of code: origin (usually represented as a URL) and signer (the identity associated with the private key used to sign the file). Only the Class Loader that loaded a piece of code knows for sure where the code was loaded from.
Applet Class Loaders, which are typically supplied by the browser vendor, load all applets and the classes they reference, usually getting the classes from HTTP servers. When an applet loads across the network, its Applet Class Loader receives the binary data and instantiates it as a new class. Under normal operation, applets are forbidden to install a new Class Loader, so Applet Class Loaders are the only game in town.
A trusted Java application (such as the Java interpreter built in to Netscape Navigator or Internet Explorer) can, however, define its own class loaders. Sun Microsystems provides three template class loader modules as part of the JDK (discussed earlier). If an untrusted applet could somehow install a Class Loader, the applet would be free to define its own namespace. Prior to Java 2, this would allow an attack applet to breach security (see Chapter 5).
If you are writing an application or built-in extension that defines its own Class Loader, you should be very careful to follow the rules; otherwise, your Class Loader will almost certainly introduce a security hole. It is unfortunate that in order to get the ability to use your own code-finding mechanism, you must also take on responsibility for managing namespaces. One criticism often raised against the Java security model is that because of the presence of objects like application-definable class loaders, the security model is too distributed and lacks central control. Applet Class Loaders install each applet in a separate namespace. This means that each applet sees its own classes and all of the classes in the standard Java library API, but it doesn't see classes belonging to other applets. Hiding applets from each other in this way has two advantages: It allows multiple applets to define classes with the same name without ill effect, so applet writers don't have to worry about name collisions. It also makes it harder, though not impossible, for applets to team up.
As an example, consider a class called laptop with no explicit package name (that is, laptop belongs to the default package). Imagine that the laptop class is loaded by an Applet Class Loader from www.rstcorp.com as you surf the Java Security Web Site. Then you surf over to java.sun.com and load a different class named laptop (also in the default package). What we have here is two different classes with the same name. How can the VM distinguish between them? The tagging of classes according to which Class Loader loaded them provides the answer. Applets from different CodeBases are loaded by different instances of the browser's Applet Class Loader class. (By the way, distinct namespaces will be created even if the two sites use explicit package names that happen to be the same.) Although the same class is involved in loading the two different classes (i.e., the Applet Class Loader), two different instances of the Applet Class Loader class are involved, one for each CodeBase.
Recall that the default object protection and encapsulation scheme covered earlier in this chapter allows classes that are members of a package to access all other classes in the same package. That means it is important for the VM to keep package membership straight. As a result, Class Loaders have to keep track of packages as well as classes.
When a class is imported from the network, the Applet Class Loader places it into a namespace labeled with information about its origin. Whenever one class tries to reference another, the Applet Class Loader follows a particular order of search. The first place it looks for a class is in the set of classes loaded by the Primordial Class Loader. If the Primordial Class Loader doesn't have a class with the indicated name, the Applet Class Loader widens the search to include the namespace of the class making the reference.
Because the Applet Class Loader searches for built-in classes first, it prevents imported classes from pretending to be built-in classes (something known as "class name spoofing"). This policy prevents such things as applets redefining file I/O classes to gain unrestricted access to the file system. Clearly, the point is to protect fundamental primitives from outside corruption.
Since all applets from a particular source are put in the same namespace, they can reference each other's methods. A source is defined as a particular directory on a particular Web server.
According to the Java specification, every Class Loader must keep an inventory of all the classes it has previously loaded. When a class that has already been loaded is requested again, the class loader must return the already loaded class.
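The two namespace rules in this section (same name under different loaders gives different classes; the same loader asked twice gives the same class) can be sketched as follows. Everything named here is hypothetical: OriginLoader plays the role of one Applet Class Loader instance per origin, demo.generated.Laptop echoes the laptop example, and emptyClass hand-writes the bytes of an empty class so the sketch runs without a Web server.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: one loader instance per origin gives each origin its own namespace.
class NamespaceDemo {
    static final String NAME = "demo.generated.Laptop";

    // Hypothetical loader that defines exactly one class from the bytes it was given.
    static final class OriginLoader extends ClassLoader {
        private final byte[] bytes;

        OriginLoader(byte[] bytes) {
            super(ClassLoader.getSystemClassLoader());
            this.bytes = bytes;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!NAME.equals(name)) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Minimal class file: a public class with no members, extending java.lang.Object.
    static byte[] emptyClass(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                 // magic number
            out.writeShort(0);                        // minor version
            out.writeShort(52);                       // major version (Java 8)
            out.writeShort(5);                        // constant pool count = entries + 1
            out.writeByte(7); out.writeShort(2);      // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name.replace('.', '/')); // #2 Utf8
            out.writeByte(7); out.writeShort(4);      // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");     // #4 Utf8
            out.writeShort(0x0021);                   // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                        // this_class
            out.writeShort(3);                        // super_class
            for (int i = 0; i < 4; i++) {
                out.writeShort(0);                    // no interfaces, fields, methods, attributes
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = emptyClass(NAME);
        ClassLoader siteA = new OriginLoader(bytes);
        ClassLoader siteB = new OriginLoader(bytes);

        Class<?> a = siteA.loadClass(NAME);
        Class<?> b = siteB.loadClass(NAME);

        System.out.println(a.getName().equals(b.getName())); // prints "true"
        System.out.println(a == b);                          // prints "false"
        System.out.println(siteA.loadClass(NAME) == a);      // prints "true"
    }
}
```

Even though both loaders receive identical bytes, the VM treats the two results as unrelated types because a class is identified by its name together with its defining loader.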

Loading a Class
Class loading proceeds according to the following general algorithm:
  • Determine whether the class has been loaded before. If so, return the previously loaded class.
  • Consult the Primordial Class Loader to attempt to load the class from the CLASSPATH. This prevents external classes from spoofing trusted Java classes.
  • See whether the Class Loader is allowed to create the class being loaded. The Security Manager makes this decision. If not, throw a security exception.
  • Read the class file into an array of bytes. The way this happens differs according to particular class loaders. Some class loaders may load classes from a local database. Others may load classes across the network.
  • Construct a Class object and its methods from the class file.
  • Resolve classes immediately referenced by the class before it is used. These classes include classes used by static initializers of the class and any classes that the class extends.
  • Check the class file with the Verifier.
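The part of this algorithm that a Class Loader object controls can be sketched as an override of loadClass. This is an illustration only: TracingLoader and its trace list are made-up names, the sketch mirrors what the inherited java.lang.ClassLoader.loadClass already does in Java 2 and later, and it assumes a non-null parent. The Security Manager check and the Verifier run inside the JVM and are not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the loading order: cache first, then the parent, then this loader itself.
class TracingLoader extends ClassLoader {
    final List<String> trace = new ArrayList<>();

    TracingLoader(ClassLoader parent) {
        super(parent); // assumption: parent is not null
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Step 1: return the class if this loader has loaded it before.
            Class<?> c = findLoadedClass(name);
            if (c != null) {
                trace.add("cached: " + name);
                return c;
            }
            try {
                // Step 2: give the trusted parent chain the first chance (prevents spoofing).
                c = getParent().loadClass(name);
                trace.add("parent: " + name);
            } catch (ClassNotFoundException e) {
                // Steps 4 and 5: fetch the bytes and define the class in this loader.
                trace.add("self: " + name);
                c = findClass(name); // the default findClass throws ClassNotFoundException
            }
            // Step 6: optionally link the class right away.
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        TracingLoader loader = new TracingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(loader.loadClass("java.lang.String") == String.class); // prints "true"
        System.out.println(loader.trace); // prints "[parent: java.lang.String]"
    }
}
```

In practice it is safer to leave loadClass alone and override only findClass; the override here exists purely to make the order of the steps visible.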

Summary
Each Java class begins as source code. This is then compiled into byte code and distributed to machines anywhere on the Net. A Java-enabled browser automatically downloads a class when it encounters the <APPLET> tag in an HTML document. The Verifier examines the byte code of a class file to ensure that it follows Java's strict safety rules. The Java VM interprets byte code declared safe by the Verifier. The Java specification allows classes to be unloaded when they are no longer needed, but few current Java implementations unload classes.
Java's ability to dynamically load classes into a running Java environment is fraught with security risks. The class-loading mechanisms mitigate these risks by providing separate namespaces set up according to where mobile code originates. This capability ensures that essential Java classes cannot be spoofed (replaced) by external, untrusted code. The Applet Class Loader in particular is a key piece of the Java security model.


http://medialab.di.unipi.it



The Class Loader and Class File Verifier

In this chapter we explore a number of topics:
  • How the components of the Java virtual machine work together to implement the Java security model
  • How the class loader locates and loads class files
  • How the class file verifier ensures that class files are legal prior to execution
In addition, we discuss issues to keep in mind when designing your own ClassLoader.
Overview of the Java Security Model
Before examining the components of the security model in detail, we'll take a high-level look at the whole process involved in loading and running a class.
The figure "Steps in Loading a Class" illustrates the steps involved in loading a class into the JVM.
Steps in Loading a Class
  1. When an applet or application requests a class file, the execution environment, whether it be a browser or the Java VM running from a command line, invokes a class loader to locate and load the class.
  2. The class loader receives the class as an array of bytes and converts it into a Class object in the class area of the JVM. The class area may be a part of the JVM heap (where all other objects are created and stored) or a separate region of memory.
  3. Depending on the class loader which loaded the class file, the JVM may also run the class file verifier. The verifier is responsible for making sure that class files contain only legal Java bytecodes and that they behave in a consistent way (for example, they do not attempt to underflow or overflow the stack, forge illegal pointers to memory or in any other way subvert the JVM). More details are in the section "The Class File Verifier".
  4. Assuming that the class passes verification, the JVM is handed a loaded class. It then links the class by resolving any references to other classes within it. This may result in additional calls to the class loader to locate and load other classes.
  5. Next, static initialization of the class is performed; that is, static variables and static initializers are run. Finally, the class is available to be executed.
  6. In the context of an applet executing within a Web browser, there will always be an instance of the SecurityManager constructed. This may also be true in a Java application. When a SecurityManager is present, calls which could result in the system's integrity being violated (such as file read and write requests, network access requests, or requests to access the environmental variables) are presented to the SecurityManager for validation. If the SecurityManager refuses access, it does so by throwing a SecurityException. Since access to these key system functions is controlled by API calls within the trusted classes, there is no way to avoid the SecurityManager other than by replacing these classes.
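Steps 4 and 5 above distinguish loading and linking from static initialization. That separation is visible through the three-argument Class.forName, whose second parameter controls whether initialization runs. The following sketch uses invented names (InitializationDemo, Target, events).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a class can be loaded without its static initializers having run.
class InitializationDemo {
    static final List<String> events = new ArrayList<>();

    static class Target {
        static {
            events.add("Target initialized");
        }
    }

    public static void main(String[] args) throws Exception {
        String name = InitializationDemo.class.getName() + "$Target";
        ClassLoader loader = InitializationDemo.class.getClassLoader();

        Class.forName(name, false, loader); // load only, do not initialize
        System.out.println(events);         // prints "[]"

        Class.forName(name, true, loader);  // request initialization
        System.out.println(events);         // prints "[Target initialized]"

        Class.forName(name, true, loader);  // initialization happens only once
        System.out.println(events.size());  // prints "1"
    }
}
```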
Class Loaders
A class loader has a number of duties. Class loaders are the gatekeepers of the JVM, controlling what bytecode may be loaded and what should be rejected. As such they have two primary responsibilities:
  1. To separate Java code from different sources, thus preventing malicious code from corrupting known good code
  2. To protect the boundaries of the core Java class packages (trusted classes) by refusing to load classes into these restricted packages
The class loader has another, useful, side effect. By controlling how the JVM loads code, all platform-specific file I/O is channelled through one part of the JVM, thus making porting the JVM to different platforms a much simpler task.
Let's look a little more closely at these two aims and why they are necessary. First, Java code can be loaded from a number of different sources. These include but are not limited to:
  • The trusted core classes which ship with the JVM (java.lang.*, java.applet.* etc.)
  • Classes stored in the local file store and locatable via the CLASSPATH environmental variable
  • Classes retrieved from Web servers (as parts of applets)
Clearly, we would not want to overwrite a trusted JVM class with an identically named class from a Web server since this would undermine the entire Java security model (the SecurityManager class is responsible for a large part of the JVM runtime security and is a trusted local class; consider what would happen to security if the SecurityManager could be replaced by an applet loaded from a remote site). The class loader must therefore ensure that trusted local classes are loaded in preference to remote classes where a name clash occurs.
Secondly, where classes are loaded from Web servers, it is possible that there could be a deliberate or unintentional collision of names (although the Sun Java naming conventions exist to prevent unintentional name collisions). If two versions of a class exist and are used by different applets from different Web sites then the JVM, through the auspices of the class loader, must ensure that the two classes can coexist without any possibility of confusion occurring. Class type confusion is a key way of attacking the JVM and is discussed later in this chapter.
The last point, that the class loader must protect the boundaries of the trusted class packages merits further explanation. The core Java class libraries that ship with the JVM reside in a series of packages which begin "java.", for example, java.lang and java.applet. Within the Java programming language, it is possible to give special access privileges to classes which reside in the same package; thus, a class which is part of the java.lang package has access to methods and fields within other classes in the java.lang package which are not accessible to classes outside of this package.
If it were possible for a programmer to add his or her own classes to the java.lang package, then those classes would also have privileged access to the core classes. This would be an exposure of the JVM and consequently must not be allowed.
The class loader must therefore ensure that classes cannot be dynamically added to the various core language packages. It achieves this by examining the name of the class which it is being asked to load and refusing to load those which start with "java."
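The name check can be observed directly: java.lang.ClassLoader.defineClass is documented to throw a SecurityException when the class name begins with "java.". The sketch below uses hypothetical names (ProhibitedPackageDemo, OpenLoader, tryDefine) and passes placeholder bytes; it assumes, as in the OpenJDK implementation, that the name is checked before the bytes are parsed.

```java
// Sketch: a user-written loader cannot define classes inside the core "java." packages.
class ProhibitedPackageDemo {
    // Hypothetical loader that exposes the protected defineClass for the experiment.
    static final class OpenLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static String tryDefine(String name) {
        try {
            // Placeholder bytes: assumption is that the name check happens before parsing.
            new OpenLoader().define(name, new byte[0]);
            return "defined";
        } catch (SecurityException e) {
            return "rejected: " + e.getMessage();
        } catch (LinkageError e) {
            // The empty byte array is not a valid class file.
            return "malformed class data";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDefine("java.lang.Fake"));
        System.out.println(tryDefine("demo.generated.Fake"));
    }
}
```

The first call is refused because of its package name alone; the second gets past the name check and fails only because the placeholder bytes are not a real class file.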
How Class Loaders Are Implemented
The JVM architecture diagram (see "Steps in Loading a Class") shows two class loaders. In fact, the JVM may have many class loaders operating at any point in time, each of which is responsible for locating and loading classes from different sources.
One of the class loaders, the primordial class loader, is a built-in part of the JVM; that is, it is written in C or whatever language the JVM is written in and is an integral part of the JVM. It is the root class loader and is responsible for loading trusted classes; these are classes from the core Java classes and those classes which can be found in the CLASSPATH and usually in the local filestore.
Classes loaded by the primordial class loader are regarded as special insofar as they are not subject to verification prior to execution; that is, they are assumed to be well formed, safe Java classes. Obviously if would-be attackers could somehow inveigle a malicious class into the CLASSPATH of a JVM they could cause serious damage.
In addition to this primordial class loader, application writers (including JVM implementors) are at liberty to build more class loaders to handle the loading of classes from different sources such as the Internet, an intranet, local storage or perhaps even from ROM in an embedded system. These class loaders are not a part of the JVM; rather, they are part of an application running on top of the JVM, written in Java and extending the java.lang.ClassLoader class.
The most obvious example of this is in the context of a Web browser which knows how to load classes from an HTTP (Web) server. The class loader which does this is generally known as the applet class loader and is itself a Java class which knows how to request and load other Java class files from a Web server across a TCP/IP network.
In addition, application writers can implement their own class loaders by subclassing the ClassLoader class (note that such behavior may be disallowed by the SecurityManager in an applet; we discuss more of this in the next chapter).
It is clear then that there can be many types of class loader within a Java environment at any one time. In addition, there may be many instances of a particular type of class loader operating at once.
To summarize the above:
  • There will always be one and only one primordial class loader. It is part of the JVM, like the execution engine.
  • There will be zero or more additional ClassLoader derivatives, written in Java and extending the ClassLoader abstract class. In a Web browser environment there will be at least one additional class loader: the applet class loader.
  • For each additional ClassLoader type, there will be zero or more instances of that type created as Java objects.
Let's look at this last point more closely. Why would we want to have multiple instances of the same class loader running at any one time?
To answer this question we need to examine what class loaders do with a class once it has been loaded.
Every class present in the JVM has been loaded by one and only one class loader. For any given class, the JVM "remembers" which class loader was responsible for loading it. If that class subsequently requires other classes to be loaded, the JVM uses the same class loader to load those classes.
This gives rise to the concept of a name space: the set of all classes which have been loaded by a particular instance of a class loader. Within this name space, duplicate class names are prohibited. More importantly, there is no cross name space visibility of classes; a class in one name space (loaded by a particular class loader) cannot access a class in another name space (loaded by a different class loader).
Returning to the question "Why would we want to have multiple instances of a given ClassLoader derivative?", consider the case of the applet class loader. It is responsible for loading classes from a Web server across the Internet or intranets. On most networks (and certainly the Internet) there are many Web servers from which classes could be loaded and there is nothing to prevent two Webmasters from having different classes on their sites with the same name.
Since a given instance of a class loader cannot load multiple classes with the same name, if we didn't have multiple instances of the applet class loader we would very quickly run into problems when loading classes from multiple sites. Moreover, it is essential for the security of the JVM to separate classes from different sites so that they cannot inadvertently or deliberately cross reference each other. This is achieved by having classes from separate Web sites loaded into separate name spaces which in turn is managed by having different instances of the applet class loader for each site from which applets are loaded.
The Class Loading Process
The ability to create additional class loaders is a very powerful feature of Java. This becomes particularly apparent when you realize that user-written class loaders have first refusal when it comes to loading classes; that is, they take priority over the primordial class loader. This enables a user-written class loader to replace any of the system classes, including the SecurityManager. In other words, since the class loader is Cerberus to the JVM's Hades, you had better be sure that when you replace it, you don't inadvertently install a lapdog in its place.
We have already stated that a class loader which has loaded a particular class is invoked to load any dependent classes. We also know that a class loader generally has responsibility for loading classes from one particular source such as Web servers.
What if the class first loaded requires access to a class from the trusted core classes such as java.lang.String? This class needs to be loaded from the local core class package, not from across a network. It would be possible to write code to handle this within the applet class loader but it is unnecessary. We already have a class loader in the shape of the primordial class loader which knows how to load classes from the trusted packages.
This leads us to our second observation about class loaders: they frequently interoperate, one class loader asking another to load a class for it.
To illustrate how this works, consider the PointlessButton applet. As a reminder, PointlessButton uses a second class, JamJar.examples.Button which represents a push button on the browser display. Pushing the button results in nothing happening and a display being updated to inform you how many times nothing has happened to date.
When a Web browser encounters the PointlessButton applet in a Web page, the following sequence of events occurs:
  1. The browser finds the <APPLET> tag in the Web page and determines that it needs to load PointlessButton.class from the Web server. It creates an instance of the applet class loader (specific to this Web site) to fetch the class.
  2. The applet class loader first asks the primordial class loader to load PointlessButton.class. The primordial class loader which only knows about the trusted classes fails to locate the class and returns control to the applet class loader.
  3. The applet class loader connects to the Web site using HTTP and downloads the class.
  4. The JVM begins executing the PointlessButton applet.
  5. PointlessButton needs to create an instance of JamJar.examples.Button, a class which currently has not been loaded. It requests the JVM to load the class.
  6. The JVM locates the applet class loader which loaded PointlessButton and invokes it to load JamJar.examples.Button.
  7. The applet class loader again first asks the primordial class loader to load the JamJar.examples.Button class and again the primordial class loader fails to find it and returns control to the applet class loader which is able to load the class from the Web server.
  8. JamJar.examples.Button creates a java.lang.String object as the title of the button. The String class has not yet been loaded so again the JVM is requested to load the class.
  9. The applet class loader which loaded both PointlessButton and JamJar.examples.Button is now invoked to load the java.lang.String class.
  10. The applet class loader requests the primordial class loader to load the String class. This time, the primordial class loader is able to locate and load the class since it is part of the trusted classes package. Since the primordial class loader was successful, the applet class loader needs look no further and returns.
There are a couple of interesting points to note here.
First, at step 7, if we were using a regular java.awt.Button class then the primordial class loader would have been able to find the class in the trusted packages and the search would have stopped.
Secondly, there are actually many references to the java.lang.String class in the code. However, only the first reference results in the class being loaded from disk. Subsequent requests to the class loader will result in it returning the class already loaded. Since it is the primordial class loader which loads the String class, if there are multiple applets on a single page, only the first one to request a String class will result in the primordial class loader loading the class from disk.
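The second point can be checked with a few lines of code. This is a small sketch using the system class loader as a stand-in for the applet class loader. The class name LoadOnceDemo is ours. Note that reporting the primordial (bootstrap) loader as null is what common JVMs do, but the API documentation leaves it to the implementation.

```java
public class LoadOnceDemo {

    /** Repeated requests for String return the one Class object already loaded. */
    static boolean sameClassObjectEachTime() {
        try {
            ClassLoader loader = ClassLoader.getSystemClassLoader();
            Class<?> first = loader.loadClass("java.lang.String");
            Class<?> second = loader.loadClass("java.lang.String");
            return first == second && first == String.class;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** On common JVMs the bootstrap loader is represented by null. */
    static boolean stringComesFromBootstrapLoader() {
        return String.class.getClassLoader() == null;
    }

    public static void main(String[] args) {
        System.out.println("same Class object each time: " + sameClassObjectEachTime());
        System.out.println("String loaded by bootstrap loader: " + stringComesFromBootstrapLoader());
    }
}
```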
Note also the order in which the applet class loader searches for classes. An applet class loader could always search the Web server from which it loaded the applet first for any subsequent classes and this would cut out some calls to the primordial class loader. This would have been incredibly bad practice for two reasons:
  • Most of the class load requests for an applet will be for trusted classes from the java.* packages.
  • More importantly, if classes were sought on the Web server before being sought in the trusted package, it would allow subversion of built-in types, enabling malicious programmers to substitute their own implementations of core, trusted classes such as the SecurityManager or even the applet class loader itself.
For this reason all commercially available browsers have applet class loaders which implement the following search strategy:
  1. Ask the primordial class loader to load the class from the trusted packages.
  2. If this fails, request the class from the Web server from which the original class was loaded.
  3. If this fails, report the class as not locatable by throwing a ClassNotFoundException.
This search strategy ensures that classes are loaded from the most trusted source in which they are available.
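The three-step search strategy can be sketched as follows. This is an illustration under stated assumptions, not a browser's applet class loader: ServerLoader is a hypothetical name, the "Web server" is simply a map from class names to bytes, and the sketch relies on the default loadClass of java.lang.ClassLoader (Java 2 and later), which consults the parent before calling findClass.

```java
import java.util.HashMap;
import java.util.Map;

public class ParentFirstDemo {

    /** Hypothetical stand-in for an applet class loader. */
    static final class ServerLoader extends ClassLoader {
        private final Map<String, byte[]> server; // pretend Web server
        int serverRequests = 0;                   // how often step 2 was reached

        ServerLoader(Map<String, byte[]> server) {
            super(null); // step 1: the inherited loadClass asks the bootstrap loader first
            this.server = server;
        }

        // Called by loadClass only after the trusted lookup has failed.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            serverRequests++;
            byte[] bytes = server.get(name);            // step 2: ask the originating server
            if (bytes == null) {
                throw new ClassNotFoundException(name); // step 3: give up
            }
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Loads a class by name and reports the outcome as text. */
    static String describe(ServerLoader loader, String name) {
        try {
            return "loaded " + loader.loadClass(name).getName();
        } catch (ClassNotFoundException e) {
            return "not found " + name;
        }
    }

    public static void main(String[] args) {
        ServerLoader loader = new ServerLoader(new HashMap<String, byte[]>());
        System.out.println(describe(loader, "java.lang.String"));
        System.out.println("server requests so far: " + loader.serverRequests);
        System.out.println(describe(loader, "JamJar.examples.Button"));
        System.out.println("server requests so far: " + loader.serverRequests);
    }
}
```

With an empty map, the request for java.lang.String never reaches the server, while the request for JamJar.examples.Button falls through all three steps.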
Why You Might Want to Build Your Own Class Loader
If it is done correctly, a user-built class loader can significantly enhance the security of an application deployed on an intranet, particularly if it is used in conjunction with a firewall and other local security measures.
Note that at the time of writing, Web browsers use the security manager to prohibit the creation of new derivatives of ClassLoader, although this may change with the new Java security model and the various permissions APIs which are being implemented. Playing in the Sandbox examines the security manager in more detail.
Some of the situations in which a user-written class loader could be used are:
  • To restrict searches for trusted classes to a particular directory or path other than the CLASSPATH
  • To allow the JVM to load classes from a particular source such as from EPROM or a non-TCP/IP network
  • To specify paths which should be searched in advance of the CLASSPATH
  • To provide auditing information about access to classes
In each of these cases you will need to build your own class loader and implement your own search strategy for locating classes.
It is beyond the scope of this book to show you how to write your own extension to ClassLoader and there are other resources, both books and on-line, which will teach you the specifics. For the serious codeheads out there, there is a sample ClassLoader included on the CD accompanying this book which implements a simple audit trail for class libraries.
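Purely to give a flavor of the auditing case, here is a minimal sketch. It is not the sample from the CD, and the names AuditingLoader and audit are hypothetical. It records every class name requested through it and then delegates in the usual way.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AuditingLoaderDemo {

    /** Hypothetical loader that keeps a trail of requested class names. */
    static final class AuditingLoader extends ClassLoader {
        private final List<String> requested =
                Collections.synchronizedList(new ArrayList<String>());

        AuditingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            requested.add(name);                   // record first, even if loading fails
            return super.loadClass(name, resolve); // then use the normal search order
        }

        List<String> auditTrail() {
            return new ArrayList<String>(requested);
        }
    }

    /** Requests each name through a fresh AuditingLoader and returns the trail. */
    static List<String> audit(String... names) {
        AuditingLoader loader = new AuditingLoader(ClassLoader.getSystemClassLoader());
        for (String name : names) {
            try {
                loader.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // a failed request is still part of the audit trail
            }
        }
        return loader.auditTrail();
    }

    public static void main(String[] args) {
        System.out.println(audit("java.lang.String", "no.such.Type"));
    }
}
```

Only requests routed through this loader are seen. Classes that a parent-defined class goes on to load are requested from the parent, so a real audit loader would need to define the classes of interest itself.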
The Class File Verifier
Once a class has been located and loaded by a class loader (other than the primordial class loader), it still has another hurdle to cross before being available for execution within the JVM. At this point we can be reasonably sure that the class file in question cannot supplant any of the core classes, cannot inveigle its way into the trusted packages and cannot interfere with other safe classes already loaded.
We cannot, however, be sure that the class itself is safe. There is still the safety net of the SecurityManager which will prevent the class from accessing protected resources such as network and local hard disk, but that in itself is not enough. The class might contain illegal bytecode, forge pointers to protected memory, overflow or underflow the program stack, or in some other way corrupt the integrity of the JVM.
As we have said in earlier chapters, a well behaved Java compiler produces well behaved Java classes and we would be quite happy to run these within the JVM since the Java language itself and the compiler enforce a high degree of safety. Unfortunately we cannot guarantee that everyone is using a well behaved Java compiler. Nasty devious hacker types may be using home made compilers to produce code designed to crash the JVM or worse, subvert the security thereof. In fact, as we saw in Chapter 4, we can't even be sure that the source language was Java in the first place!
In addition to this there is the problem of release-to-release binary compatibility. Let's say that you have built an applet which uses a class called TaxCalculator from a third party. You have constructed your applet with great care and have purchased and installed the TaxCalculator class on the server with your applet code.
At this point you are certain that the methods you call in TaxCalculator are present and valid but what happens if/when you upgrade TaxCalculator? Of course you should make sure that the API exposed by TaxCalculator hasn't changed and that your class will still work, but what if you forget? In practice it is quite possible that TaxCalculator has changed between versions and methods or fields which were previously accessible have become inaccessible, been removed or changed type from dynamic to static fields. In this case, when your applet is downloaded to a browser and it tries to make method calls or access fields within TaxCalculator those calls may fail.
This is because the binary (code) compatibility between the classes has been broken between releases. These problems exist with all forms of binary distributable libraries. On most systems this results in at best a system message and the application refusing to run; at worst the entire operating system could crash. The JVM has to perform at least as well as other systems in these circumstances and preferably better.
For all of the above reasons, an extra stage of checking is required before executing Java code and this is where the class file verifier comes in.
After loading an untrusted class via a ClassLoader instance, the class file is handed over to the class file verifier which attempts to ensure that the class is fit to be run. The class file verifier is itself a part of the Java Virtual Machine and as such cannot be removed or overridden without replacing the JVM itself.
The Duties of the Class File Verifier
Before we discuss what the class file verifier actually does we look at the possible ways in which a class file could be "unsafe." By understanding the threat, we can see better how the Java architecture goes about countering it and expose any holes in the security provided by the class file verifier.
The following are some of the things that a class file could do which could compromise the integrity of the JVM:
  • Forge illegal pointers. If a Java class can obtain a reference to an object of one type and treat it as an object of a different type then it effectively circumvents the access modifiers (private, protected or whatever) on the fields of that object. This type of attack is known as a class confusion attack since it relies on confusing the JVM about the class of an object.
  • Contain illegal bytecode instructions. The JVM's execution engine is responsible for running the bytecode of a program in the same way as a conventional processor runs machine code.
When a conventional processor encounters an illegal instruction in a program, there is nothing that it can do other than stop execution. You may have seen this in Windows programs where the operating system can at least identify that an illegal instruction has been found and display a message.
Similarly, if the execution engine finds a bytecode instruction that it cannot execute, it is forced to stop executing. Even in a well written execution engine this is undesirable, and in a poorly written one it is possible that the entire JVM, or the Web browser in which it is embedded, or even the underlying operating system might be halted. This is obviously unacceptable.
  • Contain illegal parameters for bytecode instructions. Passing too many or too few parameters to a bytecode instruction, or passing parameters of the wrong type, can lead to class confusion or errors in executing the instruction.
  • Overflow or underflow the program stack. If a class file could underflow the stack (by attempting to pop more values from it than it had placed on it) or overflow the stack (by placing values on it that it did not remove) then it could at best cause the JVM to execute an instruction with illegal parameters or at worst crash the JVM by exhausting its memory.
  • Perform illegal casting operations. Attempting to convert from one data type to another - for example, from an integer to a floating point or from a String to an Object - is known as casting. Some types of casting can result in a loss of precision (such as converting a floating point number to an integer) or are simply illegal (such as converting a String to a DataInputStream).
The legality of other types of casts is less clear, for example, all Strings are Objects (since the String class is derived from the Object class) but not all Objects are Strings. Trying to cast from an Object to a String is legal only if the Object is originally a String or a String derivative. Allowing illegal casts to be performed will result in class confusion and thus must be prevented.
  • Attempt to access classes, fields or methods illegally. As discussed above, a class file may attempt to access a nonexistent class. Even if the class does exist, it may attempt to make reference to methods or fields within the class which either do not exist or to which it has no access rights. This may be part of a deliberate hacking attempt or the result of a break in release-to-release binary compatibility.
By tagging each object with its type, the JVM could check for illegal casts. By checking the size of the stack before and after each method call, stack overflows and underflows can be caught. The JVM could also test the stack before each bytecode was executed and thus avoid illegal or wrongly numbered parameters.
In fact, all of these tests could be made at runtime but the performance impact would be significant. Any work that the class file verifier can do in advance of runtime to reduce the performance burden is welcome. With some idea of the magnitude of the task before the class file verifier, we now look at how it meets this challenge.
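One of these runtime tests is easy to see from ordinary Java code: a downcast from Object to String is checked against the object's actual type when the cast executes. The sketch below uses hypothetical names (CastCheckDemo, tryCast) and simply reports what the JVM decides.

```java
public class CastCheckDemo {

    /** Attempts the Object-to-String cast discussed in the text. */
    static String tryCast(Object value) {
        try {
            String s = (String) value; // checked at runtime against the object's real type
            return "ok: " + s;
        } catch (ClassCastException e) {
            return "rejected: " + value.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCast("hello"));            // the Object really is a String
        System.out.println(tryCast(Integer.valueOf(7))); // the Object is not a String
    }
}
```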
The Four Passes of the Class File Verifier
Before we go into any detail on how the class file verifier works it is important to note that the Java specification requires the JVM to behave in a particular way when it encounters certain problems with class files, which is usually to throw an error and refuse to use the class.
The precise implementation varies from one vendor to the next and is not specified. Thus some vendors may make all checks prior to making a class file available; others may defer some or all checks until runtime. The process described below is the way in which Sun's HotJava Web browser works; it has been adopted by most JVM writers, not least because it saves the effort of reinventing a complex process.
The class file verifier makes four passes over the newly loaded class file, each pass examining it in closer detail. Should any of the passes find fault with the code then the class file is rejected. For reasons which we explain below, not all of these tests are performed prior to executing the code. The first three passes are performed prior to execution and only if the code passes the tests here will it be made available for use.
The fourth pass, really a series of ad hoc tests, is performed at execution time, once the code has already started to run.
Pass 1 - File Integrity Check
The first and simplest pass checks the structure of the class file. It ensures that the file has the appropriate signature (first four bytes are 0xCAFEBABE) and that each of the structures within the file is of the appropriate length. It checks that the class file itself is neither too long nor too short and that the constant pool contains only valid entries. Of course class files may have varying lengths but each of the structures (such as the constant pool) has its length included as part of the file specification.
If a file is too long or too short, the class file verifier throws an error and refuses to make the class available for use.
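The very first of these checks is simple enough to sketch. The code below is an illustration only and covers just the opening eight bytes (signature and version numbers), not the rest of pass 1. The class and method names are ours.

```java
import java.nio.ByteBuffer;

public class MagicNumberCheck {

    static final int MAGIC = 0xCAFEBABE;

    /** Returns the major version, or -1 if the header is not that of a class file. */
    static int majorVersion(byte[] classFile) {
        if (classFile.length < 8) {
            return -1;                                  // too short for signature + version
        }
        ByteBuffer header = ByteBuffer.wrap(classFile); // big-endian by default
        if (header.getInt() != MAGIC) {
            return -1;                                  // wrong signature
        }
        header.getShort();                              // minor version, not needed here
        return header.getShort() & 0xFFFF;              // major version as unsigned value
    }

    public static void main(String[] args) {
        byte[] good = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        byte[] bad = "not a class file".getBytes();
        System.out.println(majorVersion(good));
        System.out.println(majorVersion(bad));
    }
}
```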
Pass 2 - Class Integrity Check
The second pass performs all other checking which is possible without examining the actual bytecode instructions themselves. Specifically, it ensures that:
  • The class has a superclass (unless this class is Object).
  • The superclass is not a final class, and this class does not attempt to override a final method in its superclass.
  • Constant pool entries are well formed, and all method and field references have legal names and signatures.
Note that in this pass, no check is made as to whether fields, methods or classes actually exist, merely that their names and signatures are legal according to the language specification.
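The final-superclass rule can be exercised by handing the JVM a class file that no well behaved compiler would produce. In this sketch the names BytesLoader, classExtending and tryDefine are hypothetical, the class bytes are written by hand in memory, and a Java 8 or later JVM is assumed. The exact error type thrown for a final superclass differs between JVM versions, so the code only relies on it being a LinkageError.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SuperclassCheckDemo {

    /** Hypothetical loader that defines a class from raw bytes. */
    static final class BytesLoader extends ClassLoader {
        BytesLoader() {
            super(null); // superclasses are looked up via the bootstrap loader
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Writes an empty public class whose superclass is given in internal form. */
    static byte[] classExtending(String name, String superInternalName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(0xCAFEBABE);                                // magic number
            out.writeShort(0);                                       // minor version
            out.writeShort(52);                                      // major version (Java 8)
            out.writeShort(5);                                       // constant pool count
            out.writeByte(7); out.writeShort(2);                     // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                    // #2 this class name
            out.writeByte(7); out.writeShort(4);                     // #3 Class, name at #4
            out.writeByte(1); out.writeUTF(superInternalName);       // #4 superclass name
            out.writeShort(0x0021);                                  // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                       // this_class
            out.writeShort(3);                                       // super_class
            for (int i = 0; i < 4; i++) {
                out.writeShort(0);       // no interfaces, fields, methods or attributes
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports whether the JVM accepts a class with the given superclass. */
    static String tryDefine(String superInternalName) {
        try {
            new BytesLoader().define("Probe", classExtending("Probe", superInternalName));
            return "accepted";
        } catch (LinkageError e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println("extends Object: " + tryDefine("java/lang/Object"));
        System.out.println("extends String: " + tryDefine("java/lang/String"));
    }
}
```

java.lang.String is a final class, so the second definition is refused before any of its code could run.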
Pass 3 - Bytecode Integrity Check
This is the pass in which the bytecode verifier runs and is the most complex pass of the class file verifier. The individual bytecodes are examined to determine how the code will actually behave at runtime. This includes data-flow analysis, stack checking and static type checking for method arguments and bytecode operands.
It is the bytecode verifier which is responsible for checking that the bytecodes have the correct number and type of operands, that datatypes are not accessed illegally, that the stack is not over or underflowed and that methods are called with the appropriate parameter types.
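To give a feel for what static stack checking means, here is a toy sketch. It does not use real JVM bytecode: the three-instruction set (PUSH, POP, ADD) and all names are invented for illustration, and only straight-line code without branches is handled. The point is that the stack depth is tracked by inspecting the instructions, without executing them.

```java
public class StackDepthCheck {

    /** Invented instruction set: how many values each instruction pops and pushes. */
    enum Op {
        PUSH(0, 1), POP(1, 0), ADD(2, 1);

        final int pops;
        final int pushes;

        Op(int pops, int pushes) {
            this.pops = pops;
            this.pushes = pushes;
        }
    }

    /** Returns null if the program is safe, otherwise a description of the fault. */
    static String verify(int maxStack, Op... program) {
        int depth = 0;
        for (int pc = 0; pc < program.length; pc++) {
            Op op = program[pc];
            if (depth < op.pops) {
                return "underflow at " + pc;  // would pop more than is on the stack
            }
            depth = depth - op.pops + op.pushes;
            if (depth > maxStack) {
                return "overflow at " + pc;   // would exceed the declared stack size
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(verify(2, Op.PUSH, Op.PUSH, Op.ADD, Op.POP)); // null: safe
        System.out.println(verify(2, Op.PUSH, Op.ADD));                  // underflow at 1
        System.out.println(verify(2, Op.PUSH, Op.PUSH, Op.PUSH));        // overflow at 2
    }
}
```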
The precise details of how the bytecode verifier operates may be found in The Bytecode Verifier in Detail. For now, it is important to state two points:
First, the bytecode verifier analyzes the code in a class file statically. It attempts to reconstruct the behavior of the code at runtime, but does not actually run the code.
Secondly, some very important work, both in the past and more recently by one of the authors of this book, demonstrates that it is impossible for static analysis of code to identify all of the problems which may occur at runtime. We include this proof in An Incompleteness Theorem for Bytecode Verifiers.
To restate this in simple terms, any class file falls into one of three categories:
  • Runtime behavior is demonstrably safe.
  • Runtime behavior is demonstrably unsafe.
  • Runtime behavior is neither demonstrably safe nor demonstrably unsafe.
Clearly the bytecode verifier should accept those class files in the first category and reject those in the second category. The problem arises with class files in the third category.
These class files may or may not contain code which will cause a problem at runtime, but it is impossible from static analysis of the code alone to determine which is the case.
The more complex the bytecode verifier becomes, the more it can reduce the number of cases which fall into the third category but no matter how complex the verifier, it can never completely eliminate the third category and for this reason there will always be bytecode programs which pass verification, but which may contain illegal code.
This means that simply having the bytecode verifier is not enough to prevent runtime errors in the JVM and that the JVM must perform some runtime checking of the executable code.
Lest you be panicking at this stage you should comfort yourself with the thought that the level of verification performed by the JVM prior to executing bytecode is significantly higher than that performed by traditional runtime environments for native code (that is, none at all).
Pass 4 - Runtime Integrity Check
As we have hinted, the JVM must make a tradeoff between security and efficiency. For that reason, the bytecode verifier does not exhaustively check for the existence of fields and classes in pass 3. If it did, then the JVM would need to load all classes required by an applet or application prior to running it. This would result in a very heavy overhead which is not strictly required.
We'll examine the following case with three classes: MyClass, MyOtherClass and SubclassOfMyClass, which is derived from MyClass. MyOtherClass has two public methods:
  • methodReturningMyClass() which returns an instance of MyClass (huzzah! for meaningful method names!) and
  • methodReturningSubclassOfMyClass( ) which returns an instance of SubclassOfMyClass.
Against this background, consider the following code snippet.
MyOtherClass x = new MyOtherClass( );
MyClass y = x.methodReturningMyClass( );
In pass 3, the class file verifier has ascertained that the method methodReturningMyClass( ) is listed in the constant pool as a method of MyOtherClass which is public (and therefore reachable from this code).
It also checks that the return type of methodReturningMyClass( ) is MyClass. Having made this check and assuming that the classes and methods in question do exist, the assignment statement in the second line of code is perfectly legal. The bytecode verifier does not in fact need to load and check class MyOtherClass at this point.
Now consider this similar code:
MyOtherClass x = new MyOtherClass( );
MyClass y = x.methodReturningSubclassOfMyClass( );
In this case, the method call does not return an object of the same class as y, but the assignment is still legal since the method returns a subclass of MyClass. This is not, however, obvious from the code alone: the verifier would need to load the class file for the return type SubclassOfMyClass and check that it is indeed a subclass of MyClass.
Loading this class involves a possible network access and running the class file verifier for the class and it may well be that these lines of code are never executed in the normal course of the program's execution in which case loading and checking the subclass would be a waste of time.
For that reason, class files are only loaded when they are required, that is when a method call is executed or a field in an object of that class is modified. This is determined at runtime and so that is when the fourth pass of the verifier is executed.
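The deferred nature of this work can be observed with the three classes from the example. The sketch below fills in bodies that the text does not give (the static initializers and the events log are our additions). What it makes visible is class initialization, which the Java language guarantees happens only on first use. Exactly when a particular JVM loads and verifies the class file beforehand is left to the implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyUseDemo {

    // Cumulative log of what happened, in order.
    static final List<String> events = new ArrayList<String>();

    static class MyClass {
        static { events.add("MyClass initialized"); }
    }

    static class SubclassOfMyClass extends MyClass {
        static { events.add("SubclassOfMyClass initialized"); }
    }

    static class MyOtherClass {
        public MyClass methodReturningMyClass() {
            return new MyClass();
        }

        public SubclassOfMyClass methodReturningSubclassOfMyClass() {
            return new SubclassOfMyClass();
        }
    }

    /** Runs both snippets from the text and returns the event log. */
    static List<String> run() {
        MyOtherClass x = new MyOtherClass();
        events.add("before first call");
        MyClass y = x.methodReturningMyClass();
        events.add("before second call");
        MyClass z = x.methodReturningSubclassOfMyClass();
        events.add("both calls returned objects: " + (y != null && z != null));
        return events;
    }

    public static void main(String[] args) {
        for (String event : run()) {
            System.out.println(event);
        }
    }
}
```

On the first run, the entry for SubclassOfMyClass appears only after "before second call": nothing is done for that class until the code that needs it actually executes.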
Summary
You have now seen the types of checking which take place before a class file from an untrusted source can be loaded and run inside the JVM. While not perfect, this is significantly more checking than is performed on any conventional operating system (that is, none at all).
Once it is running, code from untrusted sources is subject to further checking at the hands of the security manager, which we have mentioned briefly here. Playing in the Sandbox describes how the security manager works and looks at ways in which it is possible to reduce the burden placed on the class loader and class file verifier by extending the range of classes which the JVM regards as trusted.
