One quick enhancement to this method would be to split each sector (square in 2-dimensional space, cube in 3-dimensional space) into even quarters/eights. This produces quad-trees in 2-D, and oct-trees in 3D. The Barnes-Hut (BH) algorithm uses this quad-tree/oct-tree subdivision of space. There are two methods of constructing this: one can either place particles in the tree one at a time, or used an adaptive quadtree (recursive subdivision.)
For particle-by-particle, you start at the top node. Whenever a node already has a particle, you simply make both the new and present occupant become children of that node, and progress recursively. Since the area of each parent is divided uniformly into quadrants/octants, you end up with a large number of empty nodes once you are done, and simply set these to Null. The second method, adaptive quadtrees, is the approach we used herein.
For adaptive quadtrees, you have a node and a list of particles in that node. First, you find the node center of mass and its extent and limits. Then, you divide the node space into quadrants/octants, around the center of mass. You subdivide your particle list into a separate list for each sector, and then descend into each sector to reiterate this process until each bottom leaf, with one particle per node, is made. The chief difference in this method (besides using different code to construct it), is that the subdivision of a node into quadrants/octants is based on the physical nature of that space (its actual extrema and using its center of mass) rather than a purely geometric division into equal quarters. Therefore, you are only defining regions that have something interesting in them, and each child may be of much smaller extent than a purely geometric slicing.
We attempted a more dynamic scheme, were each parent was not forced to have 4 (or 8) children, but instead had one primary Child. This child then served as the head for a linked list of later siblings (using a pointer to ->younger). Traversing the tree was simple: one visited a node, descending to its child if it had one, then went along the list of younger siblings until there were no more. Each sibling contained a link back to its parent, so travel up the tree was rapid. Indeed, this scheme was both elegant and consumed little resources.
However, implementation took much longer than the fixed number of children used in this setup. And, it was not clear that this would save in memory, since it required extra calls to the system memory allocator; its performance times with earlier versions of this tree code did not show a significant speed-up. Therefore, we filed it away as an interesting experiment.
The final code is, essentially, our particle-particle code from Project 1 with a quick call to "build_tree", a quick replacement of the PP force step with a Tree force step that has nearly identical calling parameters, and a similar change for the potential energy calculations. As a bonus, we kept in the PP potential energy calculator as a check on our initial tree's accuracy (running the PP energy calculation against the tree at the first step, to compare results.) This code reuse greatly speeded development. Some other small customizations had to be made; the equations for force, distance, and potential were made more generic so they could handle nodes or particles, and the structures were slightly redefined.
One large change was to change our initial naive point structure, which had "x, y, z, vx, vy, vz" to simply have an array for coordinates and an array for velocities. By making this change, and using loops for dimensioned qualities instead of directly referencing X/Y/Z elements, our code is functional for both 2-dimensional and 3-dimensional simulations by simply changing one defined constant. Our results presented here are for the 2-dimensional case, simply because the graphics are much more clear than for a 3-dimensional model.
Our code is available as a tar file, including perl routines for making the initial points files. The "readme" file contains full details on all files in this distribution.
The energy calculation and simulation results using Theta=0 and using a particle-particle code should return identical results. And, for higher N runs, energy conservation should be maintained as with all our N-body codes. While it isn't the definitive assessment of quality, it does provide an instant indication of major faults when energy is not conserved.
We can do an initial parameter "back of the envelope" calculation to determine a proper time step. Given F = m a ~= 1/r^2 and v = v_o + a t, we don't want a velocity increase in one time step to cover more than, say, 1/100th of the area for the average case. So, setting: a t < 0.01 -> F/m * t < 0.01 -> t/(m r^2) < 0.01. The average distance is related by density, and we can assume at uniform density that particles are R/n^2 apart (for the 2-dimensional case). So, for a system between 0 and 1 with unit masses, for 100 particles, they are typically 0.1 apart, and we want t < 0.005 or so.
Tree rebuilding is another issue. Rebuilding the tree each time step seems wasteful; if the time steps are defined so particles do not significantly stray, the tree should be viable for at least a few time steps. Table 1 shows the results for different rebuild intervals. When the tree was not rebuilt, a recalculation of each node's center of mass (CM) and extent is required. Our nodes are defined loosely as circles (or spheres), with a CM position and a radius (the extent). We do not have the individual particle positions saved within each node, so we must make an estimate of the new extent when recalibrating. After the new CM is calculated, the new extent can be assumed as "distance from the new CM to the old CM, plus the extent." This errs on the side of caution, and also means that nodes tend to spread, or grow in size, when we do not rebuild. Spreading nodes implies our theta criteria gets met less often, and that the tree eventually must be rebuilt. We ran into one interesting trade-off: our time to recalculated the CM and extent was on part with our time to build a tree from scratch. This disadvantageous situation suggests we should work on a less intensive rebuilding routine, and for now just rebuild the tree each step.
| Rebuild timesteps |
time | NCalc | time on treebuild |
time on treetrim |
time in force tree |
|---|---|---|---|---|---|
| 1 | 26s | 3452 | 0.5 | 1.8 | 19.7 |
| 5 | 25s | 3452 | 0.5 | 2.2 | 21.3 |
| 10 | 26s | 3453 | 0.1 | 2.3 | 20.5 |
| 20 | 23s | 3453 | 0.1 | 2.2 | 20.6 |
| 50 | 24s | 3453 | 0.05 | 2.5 | 19.8 |
The first obvious conclusion is that, though Theta=0 approximates a PP code method, it is less optimized and thus a poor idea. PP codes typically take advantage of the symmetry of forces and potentials, and thus reduce the number of calculations down from N^2 to approximately half that. However, the tree method requires (at Theta=0) doing each calculation, and thus has a higher number of force calculations than the PP code. However, by the time you have set Theta to the range of 0.20, the number of calculations for the tree has finally proven less than for the PP code. We chose the value of Theta=0.25 for our simulations; this provided good accuracy while also providing a speed enhancement over PP codes.
The other clear factor is that, by large N, the advantages of the tree code begin to show. The increase in time for the Theta=0.25 case basically doubles with the doubling of the system size, while the PP code goes up by a factor of four. Time lost in the tree code due to tree building and overhead is suddenly dwarfed by the reduced number of force calculations the tree provides, once N gets large enough.
| # of particles | 20 | 100 | 100* | 256 | 512 | 512* |
|---|---|---|---|---|---|---|
| PP code | 0:01 | 0:22 | 0:20 | 2:05 | 9:41 | 9:21 |
| Tree (theta=0) | 0:05 | 1:24 | -- | -- | -- | -- |
| Tree (theta=0.25) | 0:04 | 0:57 | 0:20 | 3:30 | 7:40 | 6:03 |
| (Mesh, different scale) | -- | -- | -- | 1:40 | 1:45 | 1:43 |
| (Tree, mesh scale) | -- | -- | -- | 3:14 | -- | 10:00 |
The mesh routine spends 80% of its time on the FFT routine, and that's fairly independent of particle numbers or particle distribution, only on grid size. Grid size, though, is determined by the granularity of interaction that you are interested in. This places meshes in a separate category: for fine resolution, the single mesh grid square essentially has to shrink down to the same size as an individual tree node, and you're effectively using an inefficient tree.
The Tree is losing time in memory allocation, which is definitely an area that code tuning can improve. It spends 30% of its time simply doing system memory management. The force equation, on the other hand, takes 25% of the time, and another 21% is taken up by hunting through the tree to see whether to apply the force. Related to this, the calculation of theta takes another 23%. Clearly, a clean-up of the code can drastically improve performance; issues such as memory and tree lookups are an area that can easily be tuned. This further accentuates the edge in speed that the tree has over PP codes; trees have more room to optimize.
One potential area for future development would be to actively monitor the average depth that the force calculation requires, and to only rebuild the tree when too many PP encounters occur, indicating that the tree has "spread" too far and needs to be recomputed. Since the overhead for tree building and the overhead for simply recalculating nodes centers and extents are nearly equal, this may not provide a significant savings.
By far the best improvement, based on current literature, is to use a multipole expansion for our force (and potential) calculation. By using a more accurate expansion, one can use a more liberal value for Theta (descending less deep into the tree) yet still gain in accuracy. The "Fast Multipole Method" is indeed the most significant improvement for tree codes today.
As Figure 2 (and movie 1) shows, the collapse of random particles begins to form small groups, which further unit until there is a central mass containing nearly all the particles. A few particles are ejected from this interaction at sufficient velocity to escape the system. The majority, however, expand out and slow, and begin to collapse in again. However, this expansion and re-contraction shows even stronger examples of structure. For the case shown here, for example, two relatively cohesive groups have formed.
Essentially, we have a system where small amounts of structure magnify during the interaction, gain a stronger gravitational influence over the group, and thus increase their structure. The initial inhomogeneity of the original "random" mix thus produces structure. This is not visible with small-N PP calculations (see our earlier paper, for example a 20-particle simulation, for comparison.)
The third simulation imparts a hefty initial velocity to our second cluster, in this case a scale unit of 0.5 per full time interval. Figure 4 (and movie 3) shows four time steps of this, covering initial motion+collapse, interaction, and scatter. This high-velocity interaction shows strong forward and back scattering, as well as core interactions. Working with an increased N provides finer detail of the scatter and the core microstructures. The timing is very important, since the compactness of the galaxies at the time of interaction determines to a large degree whether they preferentially scatter, or merge. This simulation uses essentially parallel evolution; the two galaxies start in similar states and are collapsing and expanding on parallel time scales. By starting one at a larger extent then the other, we can explore a variety of morphological evolutions. Again, trees are useful for any work that looks beyond simple bulk motion.
By the same token, systems with disparate groups of particle that interact are best handled with tree codes. Colliding galaxies come into this category; for the pre-collision and post-collision stages, the majority of interactions can be done in the higher levels of the tree code, and the earlier steps computer much faster. For the actual close-in collision stage, the tree code is sufficiently robust to provide PP-like accuracy at not too much a cost in speed. And, relative to mesh codes, tree codes have a much finer granularity. Thus, in comparison with other N-body schemes, the use of tree codes for large N simulations is highly recommended.