Chapter 6: Garbage Collection (McGraw-Hill)

Overview

This chapter introduces you to garbage collection, the process that Java uses for managing unused memory. Every object created in Java uses memory. Garbage collection ensures that when your program is finished using an object, the memory is freed. This decreases the chance of creating an unwanted memory leak in Java code, although memory leaks can still occur in some situations. It also greatly simplifies the design and implementation of code, as opposed to C and C++, in which programmers spend precious time manually programming memory management.

Although the Java specification doesn't require it, most implementations of the Java Virtual Machine (JVM) use a mark-sweep garbage collection system. In this system, objects become eligible for deletion as soon as the last reference to them drops, but they aren't actually deleted until free memory is exhausted. When the system determines that it needs memory, it deletes any object that it determines is no longer in use. This takes the deletion control out of the hands of the programmer. The object's finalizer is called to alert the programmer that an object is about to be deleted. The finalizer of the object is simply a method of an object that is called just before the object is deleted; the finalizer for an object isn't required and is often omitted.

The Java language provides some built-in routines for controlling garbage collection: the methods System.gc() and System.runFinalization(). System.gc() requests that garbage collection be run. System.runFinalization() asks that pending finalizers be executed, without necessarily freeing any memory. We'll discuss the difference later in the chapter. We'll also talk about the new classes in java.lang.ref that you can use for more advanced memory management.
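As a minimal sketch of what such a request looks like, the following program creates some short-lived arrays and then calls System.gc(). The class name GcRequestDemo and the helper usedMemory() are hypothetical names chosen for illustration. Because the call is only a request, the two numbers printed may or may not differ on a given JVM.

```java
public class GcRequestDemo {

    // Approximate memory in use: the heap the JVM has claimed minus the free part.
    static long usedMemory(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Create many arrays; each one becomes unreachable as soon as
        // the variable is overwritten by the next allocation.
        byte[] scratch = null;
        for (int i = 0; i < 100_000; i++) {
            scratch = new byte[1024];
        }
        scratch = null; // drop the last reference as well

        long before = usedMemory(rt);
        System.gc(); // a request only; the JVM decides whether and when to collect
        long after = usedMemory(rt);

        System.out.println("Used before request: " + before);
        System.out.println("Used after request:  " + after);
    }
}
```

The sketch deliberately exercises only System.gc(); nothing in it depends on a collection actually having taken place.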

Although garbage collection simplifies the writing of Java code, it is not an excuse to become lazy. Garbage collection imposes some trade-offs, and you still have to make decisions so the system works efficiently. After you read this chapter, you'll be able to make those decisions with certainty.

Garbage Collection

Garbage collection is nothing new; it has been used in languages such as Lisp and Smalltalk for many years. When an application runs, it uses memory. Memory is one of the most basic computer resources and tends to be one of the most limited. The action of creating a new object is the biggest use of memory in a Java application. Because it is difficult to tell how many objects are going to be created when a program runs, it is difficult to tell how much memory a program will need.

Java manages memory in a structure called a heap. Every object that Java creates is allocated in the heap, which is created when the application begins and is managed automatically by the JVM. Java attempts to ensure that there is always enough memory in the heap to create a new object through a process called garbage collection.

The basic idea behind a garbage collection system is simple: if memory is allocated, it eventually has to be freed. There is only so much memory on a computer—even today's modern machines with huge amounts of memory have an upper limit. If a program repeatedly allocates memory without freeing it, the system will eventually run out of memory and the program will fail.

The problem with freeing memory is that it can be very difficult to determine when memory should be freed. It is always clear when memory is allocated; each object can be tracked down to the single new statement that created it. It is not as clear when an object is no longer being used. An object may be in use by many different parts of a program at once; determining which of these parts is the last one to use the object can be impossible to figure out before the program runs.

Consider a class that represents a company with employees. Each employee has a reference to the company. There are an unknown number of employees in the system. If all the employees for the company are freed, the company is no longer being used and should be freed. However, the company should not be freed until all the employees have been freed.

Now you have a problem: How do you know when to free the Company object? If you free the Company object too soon, while there are still employees referring to the company, then those employee objects may fail when they attempt to reference the nonexistent company. If you never free the Company object, even though no employees refer to it any longer, then you are unnecessarily using up valuable memory.

On the Job / Can a Java application run out of memory? Yes, if there are too many strong references. The garbage collection system attempts to remove objects from memory when they are not used. However, if you maintain too many live objects (objects that are strongly referenced from other live objects), the system can run out of memory. Garbage collection cannot ensure that there is enough memory, only that the memory that is available will be managed as efficiently as possible.

It takes a great deal of work and diligence on behalf of a programmer to do this manually. Java relieves the programmer of having to do this by moving the determination of which objects are in use into the Java runtime system. The runtime system can look behind the scenes to determine which objects are in use and automatically free those that are not.

This may sound like a trivial advance in programming languages, but studies have shown that up to 90 percent of all programming errors are related to poorly written memory management. By managing memory automatically, Java eliminates much of this class of bugs from your programs.

Although garbage collection does a nice job of making sure that objects are freed once they are no longer used, it cannot give you an infinite amount of memory, and it cannot make your program use less memory. If you continuously add objects in a Java program without dropping references to them, the program can still run out of memory. Garbage collection makes sure that well-behaved programs have enough memory; it doesn't ensure that poorly programmed ones do.

Certification Objective 6.01: The Behavior of the Garbage Collection System

The first part of this chapter deals with classic garbage collection: garbage collection as it was introduced in the first Java releases. This is what will be tested on the Java 2 exam. With the release of Java 2, Sun added functionality on top of the existing system that allows for more programmer control of how and when objects are collected. This is only briefly covered in this chapter, because it is not tested on the exam.

In classic garbage collection, the JVM has the responsibility for making sure that unused objects are deleted from memory, or collected. It does this by checking to see if other objects refer to an object; an object that is not referenced is considered unused and can be collected.

Java manages these objects through references. A reference is what is returned from the new statement; it allows one object to refer to another. An object can have any number of references pointing to it; once the last of these references is dropped, the object can be collected. Java manages this for you, so you never have to delete an object explicitly.
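The behavior of references can be sketched with the company and employee scenario from earlier in the chapter. The classes Company and Employee and the wrapper class ReferenceDemo are hypothetical stand-ins written for this illustration. The comments describe when each object becomes eligible for collection; the program cannot observe the collection itself.

```java
public class ReferenceDemo {

    static class Company {
        final String name;
        Company(String name) { this.name = name; }
    }

    static class Employee {
        final String name;
        final Company company; // each employee holds a reference to its company
        Employee(String name, Company company) {
            this.name = name;
            this.company = company;
        }
    }

    public static void main(String[] args) {
        Company company = new Company("Acme");
        Employee a = new Employee("A", company);
        Employee b = new Employee("B", company);

        // Three references now point at the same Company object:
        // the local variable and one field in each Employee.
        company = null; // Company is still referenced by both employees

        System.out.println(a.company == b.company); // prints "true"

        a = null; // Employee A is now eligible; Company is still referenced by B
        b = null; // Employee B is now eligible, and so is Company
    }
}
```

At no point does the code delete anything. Dropping the last reference is all a program does; the JVM decides when the memory is actually reclaimed.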

Mark-Sweep

The JVM that Sun distributes is not the only one around; other companies such as Symantec, Borland, Netscape, IBM, and Microsoft have created them as well. The Java language specification gives these implementers a great deal of flexibility in how garbage collection is implemented. The specification describes how the system works from the programmer's point of view and not from an internal point of view. As long as the programmer experience is the same, the internal implementations of a JVM can be very different.

Exam Watch / You will need to know what type of behavior is guaranteed by any JVM that follows the Java language specification. This means that you must acquire in-depth knowledge of the garbage collection model if you want to answer these questions accurately. The exam will not ask for terms such as mark-sweep, but it will make sure that you understand what is happening during runtime.

Having said that, most JVMs implement garbage collection using a variant of the mark-sweep algorithm. This algorithm gets its name from the two passes that it makes over the memory. The first pass marks those objects that are still in use, and the second pass removes (sweeps) the unmarked objects from memory.

When discussing a garbage collection algorithm, the term reachable is used to describe objects that are in use. The idea is that active objects in memory form a huge interconnected web; any object that is on that web can be reached by traversing from another object that is in use. If no reachable object refers to an object, then that object is unreachable and can therefore be removed from memory.

Mark-sweep starts with the assumption that some objects are always reachable. The main application object, for example, will be in use as long as the program is executing. The run() methods of threads, which are discussed in Chapter 9, are always reachable for similar reasons. Objects that are always reachable by definition are considered root objects, and are used to start traversing the web of objects. If an object can be reached from a root object, then it is in use. If an object can be reached from an object that can be reached from a root object, then it is in use. If an object can be reached from an object that can be…well, you get the idea. In general, if an object can be traced back to a root object, then it is in use.

Figure 6-1 shows a typical program containing several objects. All Java programs start from a main() method, which is the main thread of the program. The main() method can create several objects and hold references to them. In this example, it has created three objects: two Employee objects and a Timer thread. In our example, let's imagine that it only holds reference variables to the Employee objects. The thread was created with the new keyword, but no reference variable name was stored.


Figure 6-1: Viewing some objects in a typical program

In this example, the Employee objects each have a reference to the same Company object. If the main() thread drops its variable reference to the Employee A object, then Employee A is eligible for collection, as is the String object and any other objects that Employee A contains. The Company object will still have a live variable reference from the Employee B object, so it is not eligible for collection. If the main() thread loses its variable references to both Employee objects, then Company would be eligible.

The mark phase of the mark-sweep garbage collection starts with the root objects and looks at all of the objects to which each root object refers. If the object is already marked as being in use, nothing more happens. If not, the object is marked as in use and all the objects to which it refers are considered. The algorithm repeats until all the objects that can be reached from the root objects have been marked as in use.

When the mark phase completes, the sweep phase begins. The sweep phase looks at each object in memory and sees if it was marked as in use by the mark phase. If it was, the sweep phase clears the in-use flag and does nothing more to the object. If, however, the object was not marked as in use, then the sweep phase knows that that object can be safely freed. The sweep phase then removes that object from memory.
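The two phases can be illustrated with a toy model. This is only a sketch of the idea, not how any real JVM is implemented: the Node class, the heap list, and the method names are all assumptions made for this example. Each Node stands in for an object, and the scenario mirrors the employees and company discussed above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class MarkSweepSketch {

    // A toy "object": a name, outgoing references, and an in-use flag.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from the given node.
    static void mark(Node node) {
        if (node.marked) {
            return; // already marked, nothing more happens
        }
        node.marked = true;
        for (Node ref : node.refs) {
            mark(ref);
        }
    }

    // Sweep phase: clear the flag on marked nodes, remove unmarked ones.
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node node = it.next();
            if (node.marked) {
                node.marked = false;
            } else {
                freed.add(node.name);
                it.remove();
            }
        }
        return freed;
    }

    // One full pass; returns the names of the nodes that were freed.
    static List<String> collect(List<Node> roots, List<Node> heap) {
        for (Node root : roots) {
            mark(root);
        }
        return sweep(heap);
    }

    public static void main(String[] args) {
        Node main = new Node("main");
        Node a = new Node("EmployeeA");
        Node b = new Node("EmployeeB");
        Node company = new Node("Company");
        a.refs.add(company);
        b.refs.add(company);
        main.refs.add(a);
        main.refs.add(b);

        List<Node> heap = new ArrayList<>(List.of(main, a, b, company));
        List<Node> roots = List.of(main);

        System.out.println(collect(roots, heap)); // prints "[]"
        main.refs.remove(a);                      // main drops Employee A
        System.out.println(collect(roots, heap)); // prints "[EmployeeA]"
        main.refs.remove(b);                      // main drops Employee B
        System.out.println(collect(roots, heap)); // prints "[EmployeeB, Company]"
    }
}
```

Note that Company survives the second pass because Employee B still refers to it, and is freed only once neither employee is reachable from the root.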

Garbage Collection and Performance

As you can imagine, it can take a considerable amount of time to walk through all of the associations in memory. While the garbage collector is walking through these associations, it has to make sure that none of the associations changes. As a practical matter, this means that all other processing in the virtual machine stops while the garbage collector runs. This is one of the big disadvantages of garbage collection: the pause while the garbage collector runs.

One way that garbage collectors attempt to alleviate this overhead is by running the garbage collection only when needed. If there is plenty of free memory still available in the system, then there is no need to run a garbage collection. Unused objects are allowed to remain in memory because they do not affect the performance of the system; they are merely objects that sit in unused memory space.

The garbage collector is activated when Java attempts to allocate more memory than it has available. The garbage collector suspends the normal functioning of the virtual machine and executes a mark-sweep pass. After the garbage collector executes, the JVM attempts again to allocate memory. Ideally, the mark-sweep pass frees enough memory to satisfy the request. If the mark-sweep pass did not free enough memory, the request fails, and java.lang.OutOfMemoryError is thrown.
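A failed request can be observed by asking for a single allocation that the JVM cannot satisfy. The following is a sketch under assumptions: the class OutOfMemoryDemo and the method allocationFails() are hypothetical names, and the array length is chosen to be far larger than a typical heap. Some JVMs may reject a request this large outright rather than running a collection first; either way, the failure surfaces as java.lang.OutOfMemoryError.

```java
public class OutOfMemoryDemo {

    // Tries to allocate a long[] of the given length.
    // Returns true if the JVM refused the request with OutOfMemoryError.
    static boolean allocationFails(int length) {
        try {
            long[] block = new long[length];
            return block.length != length; // false: the allocation succeeded
        } catch (OutOfMemoryError e) {
            System.out.println("Allocation failed: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        // A small request that any normally configured JVM can satisfy.
        System.out.println("Small request failed? " + allocationFails(1024));

        // Roughly 16 GB for a single array; assumed to be more than
        // the JVM is able or willing to provide.
        System.out.println("Huge request failed?  " + allocationFails(Integer.MAX_VALUE));
    }
}
```

Catching OutOfMemoryError is done here only to make the failure visible. In most programs there is little that can safely be done once it has been thrown.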

Let's look at the garbage collection system with some real Java code. We can't directly detect when it is called, but by monitoring the free memory, we can see when the JVM called for garbage collection. To monitor this, we will use the Runtime.freeMemory() method (Runtime is discussed later in this chapter). We'll make an endless loop that creates objects and then removes any reference to them:

1.  import java.util.Date;
2.  class GarbageFactory {
3.    public static void main(String [] args) {
4.      Runtime rt = Runtime.getRuntime();
5.      System.out.println("Total JVM memory: " + rt.totalMemory());
6.      Date d = null;
7.      int total = 0;
8.      while(true) {
9.        d = new Date();
10.       ++total;
11.       if(total % 500 == 0) {
12.         System.out.print("Objects: " + total);
13.         System.out.println(" Memory: " + rt.freeMemory());
14.       }
15.     }
16.   }
17. }

The code is pretty easy to follow. Line 8 creates an endless loop that keeps creating more objects. Line 11 uses modulus to print the status of our experiment every 500 objects. When I ran this on my computer, the memory started at about 800,000 bytes and slowly reduced to about 200,000 to 300,000 bytes before the free memory suddenly jumped back up to 800,000 bytes. From this, we can conclude that the garbage collection is being activated when there are about 200,000 bytes free to the JVM.

Therefore, the good news is that the JVM delays the running of the garbage collector as long as possible. The bad news is that garbage collection can run at any time. You don't receive prior notice that the garbage collector is about to run. Consequently, any request for more memory could result in the garbage collector running and making your program wait. This is one of the big problems with using Java for a real-time system—in a real-time system, you have to know when the pauses will be.

This is also the reason that garbage collection systems have the (undeserved) reputation for being slow. The total time that a garbage collection system takes to manage memory is only slightly more than the total time it takes to manage memory by hand. The difference is that a hand-written memory management scheme will rarely wait until memory is exhausted and then delete many objects at once. The action of deleting many objects at once causes the system to pause. Therefore, a garbage collected system might be only slightly slower overall, but may appear to be jerkier—performing a lot of work and then pausing—than a hand-written memory management scheme.

Certification Objective 6.02: Writing Code That Explicitly Makes Objects Eligible for Collection

In the previous section, we learned the theories behind Java garbage collection. This included a look behind the scenes on how the JVM deals with removing unused objects from memory. In this section, we show how to make objects available for garbage collection using actual code. We also discuss how to attempt to force garbage collection if it is necessary, and how you can perform additional cleanup on objects before they are removed from memory.

Exam Watch / You will need to be able to state, for any given situation, the guaranteed behavior of the garbage collection system. This will mean looking at a body of code and stating whether hypothetical objects have been cleared from memory.

Making Objects Available for Garbage Collection

As we discussed earlier, an object becomes eligible for garbage collection when there are no more references to it. Obviously, if there are no references, it doesn't matter what happens to the object. For our purposes it is just floating in space, unused and no longer needed.

We can easily remove an object reference from a variable by setting the variable to null. Examine the following code: