Java 3D™ API Collateral — 1.2.1 Performance Guide

I - Introduction

The Java 3D™ API was designed with high-performance 3D graphics as a primary goal. Since this is a new API, many of its performance features are not well known. This document presents the performance features of Java 3D in three ways. It describes the specific APIs that were included for performance. It describes which optimizations are currently implemented in Java 3D 1.2.1. Finally, it describes a number of tips and tricks that application writers can use to improve the performance of their applications.

II - Performance in the API

There are a number of things in the API that were included specifically to increase performance. This section examines a few of them.

— Capability bits
Capability bits are the application's way of describing its intentions to the Java 3D implementation.
The implementation examines the capability bits to determine which objects may change at run time. Many optimizations are possible with this feature.

— Compile
There are two compile methods in Java 3D 1.2.1. They are in the BranchGroup and SharedGroup classes. Once an application calls compile(), only those attributes of objects that have their capability bits set may be modified. The implementation may then use this information to "compile" the data into a more efficient rendering format.

— Bounds
Many Java 3D objects require bounds to be associated with them. These objects include Lights, Behaviors, Fogs, Clips, Backgrounds, BoundingLeafs, Sounds, and Soundscapes. The purpose of these bounds is to limit the spatial scope of the specific object. The implementation may then quickly skip the processing of any objects that are outside the spatial scope of a target object.

— Unordered Rendering
All state required to render a specific object in Java 3D is completely defined by the direct path from the root node to the given leaf. That means that leaf nodes have no effect on other leaf nodes, and therefore may be rendered in any order. There are a few ordering requirements for direct descendants of OrderedGroup nodes and for transparent objects. Most leaf nodes, however, may be reordered to facilitate more efficient rendering.

— Appearance Bundles
A Shape3D node has a reference to a Geometry and an Appearance. An Appearance NodeComponent is simply a collection of other NodeComponent references that describe the rendering characteristics of the geometry. Because the Appearance is nothing but a collection of references, it is much simpler and more efficient for the implementation to check for rendering characteristic changes when rendering. This allows the implementation to minimize state changes in the low level rendering API.

III - Current Optimizations in Java 3D 1.2.1

This section describes a number of optimizations that are currently implemented in Java 3D 1.2.1. Other optimizations will be implemented as the API matures. The purpose of this section is to help application programmers focus their optimizations on things that will complement the current optimizations in Java 3D.

— Hardware
Java 3D uses OpenGL and Direct3D as its low level rendering APIs. It relies on the underlying OpenGL and Direct3D drivers for its low level rendering acceleration. Using a graphics display adapter that offers OpenGL or Direct3D acceleration is the best way to increase overall rendering performance in Java 3D.

— Compile
In the Java 3D 1.2 release, no compile optimizations were implemented. The following compile optimizations are implemented in the Java 3D 1.2.1 release:

— State Sorted Rendering
Since Java 3D allows for unordered rendering for most leaf nodes, the implementation sorts all objects to be rendered on a number of rendering characteristics. The characteristics that are sorted on are, in order, Lights, Texture, Geometry Type, Material, and finally localToVworld transform. The only exception to this is any child of an OrderedGroup node. There is no state sorting for those objects.
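The sort order described above can be pictured as a multi-key comparison. The following is a minimal sketch, not the Java 3D implementation: RenderItem, StateSortSketch, and the integer state ids are hypothetical stand-ins invented for illustration. It only shows why grouping objects by shared state reduces the number of state changes issued to the low level rendering API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one renderable object (not a Java 3D class).
// Each int is an opaque id for one piece of rendering state.
final class RenderItem {
    final int lightsId, textureId, geometryType, materialId, transformId;

    RenderItem(int lightsId, int textureId, int geometryType,
               int materialId, int transformId) {
        this.lightsId = lightsId;
        this.textureId = textureId;
        this.geometryType = geometryType;
        this.materialId = materialId;
        this.transformId = transformId;
    }
}

final class StateSortSketch {
    // Sort keys in the order the guide lists them: Lights, Texture,
    // Geometry Type, Material, and finally the localToVworld transform.
    static final Comparator<RenderItem> STATE_ORDER =
        Comparator.comparingInt((RenderItem r) -> r.lightsId)
                  .thenComparingInt(r -> r.textureId)
                  .thenComparingInt(r -> r.geometryType)
                  .thenComparingInt(r -> r.materialId)
                  .thenComparingInt(r -> r.transformId);

    // Returns a sorted copy; the input list is left untouched.
    static List<RenderItem> sorted(List<RenderItem> items) {
        List<RenderItem> copy = new ArrayList<>(items);
        copy.sort(STATE_ORDER);
        return copy;
    }

    // Counts how often the texture changes between consecutive items,
    // as a rough proxy for state changes in the rendering API.
    static int textureSwitches(List<RenderItem> items) {
        int switches = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i).textureId != items.get(i - 1).textureId) {
                switches++;
            }
        }
        return switches;
    }
}
```

With four items whose textures alternate between two ids, the unsorted list changes texture three times while the sorted copy changes it once.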

— View Frustum Culling
Java 3D performs view frustum culling. The view frustum cull is done when an object is processed for a specific Canvas3D. This cuts down on the number of objects that must be processed by the low level graphics API.
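As an illustration of the general technique (not of Java 3D's internal code), a bounding sphere can be rejected as soon as it lies entirely behind any one plane of the view frustum. The class name, the plane representation, and the convention that plane normals point into the frustum and have unit length are all assumptions of this sketch.

```java
final class FrustumCullSketch {
    // Each plane is {nx, ny, nz, d} with a unit-length normal pointing
    // into the frustum, so a point p is inside when n.p + d >= 0.
    // Returns true only when the sphere is completely outside the frustum.
    static boolean sphereOutside(double[][] planes,
                                 double cx, double cy, double cz,
                                 double radius) {
        for (double[] p : planes) {
            double signedDistance = p[0] * cx + p[1] * cy + p[2] * cz + p[3];
            if (signedDistance < -radius) {
                return true; // entirely behind this plane: safe to cull
            }
        }
        // Inside or straddling a plane: keep it (a conservative answer).
        return false;
    }
}
```

The test is conservative: a sphere near a frustum corner may be kept even though it is not visible, which costs some rendering time but never drops a visible object.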

— Multithreading
The Java 3D API was designed with multithreaded environments in mind. The current implementation is a fully multithreaded system. At any point in time, several threads may be running in parallel, performing tasks such as visibility detection, rendering, behavior scheduling, sound scheduling, input processing, and collision detection. Java 3D is careful to limit the number of threads that can run in parallel based on the number of CPUs available.

IV - Tips and Tricks

This section presents a number of tips and tricks for an application programmer to try when optimizing an application. These tips focus on improving rendering frame rates, but some may also help overall application performance. A number of these optimizations will eventually be handled directly by the Java 3D implementation.

— Move Object vs. Move ViewPlatform
If the application simply needs to transform the entire scene, transform the ViewPlatform instead. This changes the problem from transforming every object in the scene into only transforming the ViewPlatform.

— Capability bits
Only set them when needed. Many optimizations can be done when they are not set. So, plan out application requirements and only set the capability bits that are needed.

— Bounds and Activation Radius
Consider the spatial extent of various leaf nodes in the scene and assign bounds accordingly. This allows the implementation to prune processing on objects that are not in close proximity. Note that this does not apply to geometric bounds. Automatic bounds calculations for geometric objects are fine.

— Change Number of Shape3D Nodes
In the current implementation there is a certain amount of fixed overhead associated with the use of the Shape3D node. In general, the fewer Shape3D nodes that an application uses, the better. However, combining Shape3D nodes without factoring in the spatial locality of the nodes to be combined can adversely affect performance by effectively disabling view frustum culling. An application programmer will need to experiment to find the right balance between combining Shape3D nodes and leveraging view frustum culling. The compile optimization that combines shape nodes will do this automatically, when possible.

— Geometry Type and Format
Most rendering hardware reaches peak performance when rendering long triangle strips. Unfortunately, most geometry data stored in files is organized as independent triangles or small triangle fans (polygons). The Java 3D utility package includes a stripifier utility that will try to convert a given geometry type into long triangle strips. Application programmers should experiment with the stripifier to see if it helps with their specific data. If not, any stripification that the application can do will help. Failing that, note that most rendering hardware can process a long list of independent triangles faster than a long list of single-triangle triangle fans. The stripifier in the Java 3D utility package will be continually updated to provide better stripification.
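The benefit of long strips comes down to vertex counts: n independent triangles need 3n vertices, while a single strip covering the same n triangles needs only n + 2. The sketch below works through that arithmetic and shows how a strip expands back into independent triangles. StripSketch and its method names are hypothetical, and the winding rule used (swap the first two vertices of every odd triangle) is the usual strip convention rather than anything specific to the Java 3D stripifier.

```java
final class StripSketch {
    // Vertices sent to the hardware for n independent triangles.
    static int independentVertexCount(int triangles) {
        return 3 * triangles;
    }

    // Vertices sent for one strip containing n triangles.
    static int stripVertexCount(int triangles) {
        return triangles == 0 ? 0 : triangles + 2;
    }

    // Expands a strip with the given number of vertices into a list of
    // vertex indices for independent triangles. Every odd triangle has
    // its first two indices swapped so that all triangles keep the same
    // winding direction.
    static int[] stripToTriangles(int stripVertices) {
        int triangles = Math.max(0, stripVertices - 2);
        int[] indices = new int[3 * triangles];
        for (int i = 0; i < triangles; i++) {
            boolean odd = (i % 2) == 1;
            indices[3 * i] = odd ? i + 1 : i;
            indices[3 * i + 1] = odd ? i : i + 1;
            indices[3 * i + 2] = i + 2;
        }
        return indices;
    }
}
```

For 100 triangles this is 300 vertices as independent triangles against 102 as one strip, which is why even partial stripification tends to pay off.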

— Sharing Appearance/Texture/Material NodeComponents
To assist the implementation in efficient state sorting, and allow more shape nodes to be combined during compilation, applications can help by sharing Appearance/Texture/Material NodeComponent objects when possible.

— Geometry by reference
Using geometry by reference reduces the memory needed to store a scene graph, since Java 3D avoids creating a copy in some cases. However, using this feature prevents Java 3D from creating display lists (unless the scene graph is compiled), so rendering performance can suffer in some cases. It is appropriate if memory is a concern or if the geometry is writable and may change frequently. The interleaved format will perform better than the non-interleaved formats, and should be used where possible. In by-reference mode, an application should use arrays of native data types; referring to TupleXX[] arrays should be avoided.
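To make the interleaved, native-array layout concrete, the sketch below packs separate color, normal, and coordinate arrays into a single float[]. InterleavedSketch is a hypothetical helper, and the per-vertex order used here (color, then normal, then coordinate) is an assumption based on the component order that GeometryArray documents for interleaved data; check it against the GeometryArray documentation for the vertex format in use before handing the array to Java 3D.

```java
final class InterleavedSketch {
    // Assumed layout per vertex: 3 color floats, 3 normal floats,
    // 3 coordinate floats (no texture coordinates in this sketch).
    static final int FLOATS_PER_VERTEX = 9;

    // Packs three parallel arrays (3 floats per vertex each) into one
    // interleaved array of native floats.
    static float[] interleave(float[] colors, float[] normals, float[] coords) {
        if (coords.length % 3 != 0
                || colors.length != coords.length
                || normals.length != coords.length) {
            throw new IllegalArgumentException("arrays must hold 3 floats per vertex");
        }
        int vertexCount = coords.length / 3;
        float[] out = new float[vertexCount * FLOATS_PER_VERTEX];
        for (int v = 0; v < vertexCount; v++) {
            int base = v * FLOATS_PER_VERTEX;
            System.arraycopy(colors, v * 3, out, base, 3);
            System.arraycopy(normals, v * 3, out, base + 3, 3);
            System.arraycopy(coords, v * 3, out, base + 6, 3);
        }
        return out;
    }
}
```

Because the result is a plain float[], the application can update it in place without allocating Tuple objects for each vertex.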

— Texture by reference and Y-up
Using texture by reference and Y-up format may reduce the memory needed to store a texture object, since Java 3D avoids creating a copy in some cases. Currently, Java 3D will not make a copy of the texture image for the following combinations of BufferedImage format and ImageComponent format (byReference and yUp should both be set to true):

On both Solaris and Win32 OpenGL:

BufferedImage format                          ImageComponent format
----------------------                        ----------------------
BufferedImage.TYPE_CUSTOM of form 3BYTE_RGB   ImageComponent.FORMAT_RGB8 or ImageComponent.FORMAT_RGB
BufferedImage.TYPE_CUSTOM of form 4BYTE_RGBA  ImageComponent.FORMAT_RGBA8 or ImageComponent.FORMAT_RGBA
BufferedImage.TYPE_BYTE_GRAY                  ImageComponent.FORMAT_CHANNEL8

On Win32/OpenGL:

BufferedImage format                          ImageComponent format
----------------------                        ----------------------
BufferedImage.TYPE_3BYTE_BGR                  ImageComponent.FORMAT_RGB8 or ImageComponent.FORMAT_RGB

On Solaris/OpenGL:

BufferedImage format                          ImageComponent format
----------------------                        ----------------------
BufferedImage.TYPE_4BYTE_ABGR                 ImageComponent.FORMAT_RGBA8 or ImageComponent.FORMAT_RGBA
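The "TYPE_CUSTOM of form 3BYTE_RGB" entry has no predefined BufferedImage constant, so the image has to be assembled from a color model and a raster. The following sketch uses only java.awt.image classes; RgbImageSketch and create3ByteRgb are hypothetical names. The resulting image would then be handed to an ImageComponent2D created with byReference and yUp both true, a step that is only described in a comment here because it needs the Java 3D library.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class RgbImageSketch {
    // Builds an 8-bit-per-channel image whose bytes are stored in
    // R, G, B order. BufferedImage has no named type for this layout,
    // so getType() reports TYPE_CUSTOM.
    static BufferedImage create3ByteRgb(int width, int height) {
        ColorSpace srgb = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        int[] bitsPerChannel = {8, 8, 8};
        ColorModel colorModel = new ComponentColorModel(
            srgb, bitsPerChannel, false, false,
            Transparency.OPAQUE, DataBuffer.TYPE_BYTE);

        // Band offsets {0, 1, 2} put red first; {2, 1, 0} would instead
        // give the predefined TYPE_3BYTE_BGR layout.
        int[] bandOffsets = {0, 1, 2};
        WritableRaster raster = Raster.createInterleavedRaster(
            DataBuffer.TYPE_BYTE, width, height,
            width * 3, 3, bandOffsets, null);

        // With yUp set on the ImageComponent, the application is expected
        // to store the bottom row of the picture first in this raster.
        return new BufferedImage(colorModel, raster, false, null);
    }
}
```

Texture dimensions are left to the caller; any size restrictions imposed by the rendering hardware still apply.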

— Application Threads
The built-in thread support in the Java language is very powerful, but it can be deadly to performance if it is not controlled. Applications need to be very careful in their use of threads. First, try to use threads in a demand-driven fashion: only let a thread run when it has a task to do. Free-running threads can take a lot of CPU cycles from the rest of the threads in the system, including the Java 3D threads. Next, be sure that thread priorities are appropriate. Most Java Virtual Machines will enforce priorities aggressively. Too low a priority will starve the thread, and too high a priority will starve the rest of the system. If in doubt, use the default thread priority. Finally, consider whether the application thread really needs to be a thread. Would the task that the thread performs be all right if it only ran once per frame? If so, consider changing the task to a Behavior that wakes up each frame.
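One way to make an application thread demand driven is to have it block on a queue, so that it consumes no CPU time while idle. The sketch below is one possible arrangement using java.util.concurrent, not a pattern prescribed by Java 3D; DemandDrivenWorker and its methods are hypothetical names, and the thread priority is deliberately left at the default.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class DemandDrivenWorker {
    // Sentinel task that tells the worker loop to exit.
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final AtomicInteger completed = new AtomicInteger();
    private final Thread thread;

    DemandDrivenWorker() {
        thread = new Thread(this::runLoop, "demand-driven-worker");
        thread.setDaemon(true);
        // Priority is not changed: the default is the safe choice.
        thread.start();
    }

    private void runLoop() {
        try {
            while (true) {
                Runnable task = tasks.take(); // blocks while there is no work
                if (task == STOP) {
                    return;
                }
                task.run();
                completed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    // Hands a task to the worker; the worker wakes up only for this.
    void submit(Runnable task) {
        tasks.add(task);
    }

    // Lets queued tasks finish, then stops the worker and waits for it.
    void shutdownAndWait() {
        tasks.add(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int completedCount() {
        return completed.get();
    }
}
```

The contrast is with a free-running loop that polls for work: the polling loop competes with the Java 3D threads on every iteration, while this worker is scheduled only when a task arrives.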

— Java 3D Threads
Java 3D uses many threads in its implementation, so it also needs to implement the precautions listed above. In almost all cases, Java 3D manages its threads efficiently. They are demand driven with default priorities. There are a few cases that don't follow these guidelines completely.

— Behaviors
One of these cases is the Behavior scheduler when there are pending WakeupOnElapsedTime criteria. In this case, it needs to wake up when the earliest WakeupOnElapsedTime criterion is about to expire. So, application use of WakeupOnElapsedTime can cause the Behavior scheduler to run more often than might be necessary.

— Sounds
The final special case for Java 3D threads is the Sound subsystem. Due to some limitations in the current sound rendering engine, enabling sounds causes the sound engine to potentially run at a higher priority than other threads. This may adversely affect performance.

— Threads in General
There is one last comment to make on threads in general. Since Java 3D is a fully multithreaded system, applications may see significant performance improvements by increasing the number of CPUs in the system. For an application that does strictly animation, two CPUs should be sufficient. As more features are added to the application (Sound, Collision, etc.), more CPUs could be utilized. Note: When running in the Solaris environment, be sure that native threads are enabled. Green threads will not take advantage of multiple CPUs.

— Switch Nodes for Occlusion Culling
If the application is a first person point of view application, and the environment is well known, Switch nodes may be used to implement simple occlusion culling. The children of the switch node that are not currently visible may be turned off. If the application has this kind of knowledge, this can be a very useful technique.

— Switch Nodes for Animation
Most animation is accomplished by changing the transformations that affect an object. If the animation is fairly simple and repeatable, the flip-book trick can be used to display the animation. Simply put all the animation frames under one Switch node and use a SwitchValueInterpolator on the Switch node. This trades increased memory consumption for smooth animation.

— Switch nodes under Writable Transforms
Switch nodes that are descendants of writable TransformGroup nodes can incur extra cost associated with updating the vworld bounds and localToVworld transforms of all children (not just those that are switched on). This is one more reason why it is better to rotate the viewer than the entire scene graph (see "Move Object vs. Move ViewPlatform").

— Link/SharedGroup versus cloneTree
Using multiple Link nodes pointing to a shared subgraph (SharedGroup) can have a performance penalty over a shallow clone of the scene graph. To create a shallow clone of the scene graph, use cloneTree without duplicating the node components. Restrict the use of Link/SharedGroup to those cases where you really need the kind of sharing that it provides.

— OrderedGroup Nodes
OrderedGroup and its subclasses are not as high performing as the unordered group nodes. They disable any state sorting optimizations that are possible. If the application can find alternative solutions, performance will improve.

— LOD Behaviors
For complex scenes, using LOD Behaviors can improve performance by reducing the geometry needed to render objects that don't need a high level of detail. This is another option that trades increased memory consumption for faster render rates.
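The core of a distance-based LOD decision is a lookup against a list of switching distances. The sketch below is a hypothetical stand-alone helper, not the code of any Java 3D LOD class; it assumes n ascending thresholds selecting among n + 1 representations, with index 0 being the most detailed.

```java
final class LodSketch {
    // thresholds must be in ascending order. Returns 0 for the nearest
    // (most detailed) representation and thresholds.length for the
    // farthest (coarsest) one.
    static int selectLevel(double distance, double[] thresholds) {
        for (int i = 0; i < thresholds.length; i++) {
            if (distance <= thresholds[i]) {
                return i;
            }
        }
        return thresholds.length;
    }
}
```

The returned index would typically drive a Switch node holding one child per detail level, which is where the extra memory cost comes from.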

— Picking
If the application doesn't need the accuracy of geometry-based picking, use bounds-based picking. For more accurate picking and better picking performance, use PickRay instead of PickCone or PickCylinder unless you need to pick lines or points. PickCanvas with a tolerance of 0 will use PickRay for picking.