Specification #161

iMesh: remove iMesh_areEHValid and add iMesh_optimize()

Added by Mark Miller over 6 years ago. Updated over 6 years ago.

Status:ResolvedStart date:02/02/2011
Priority:NormalDue date:
Assignee:Mark Miller% Done:

100%

Category:-Estimated time:0.50 hour
Target version:1.4Spent time:0.25 hour

Description

This is related to #159. Submitted on behalf of Carl. If I didn't get the new API spec right, then please fix.

The proposal is two parts.

  1. Remove the function iMesh_areEHValid.
  2. Add a new function
        /**\brief Tell an implementation to do whatever internal optimizations
         *        are possible.
         *
         * For performance reasons, in the presence of _modification_ where
         * entities and/or entity sets are added, moved, deleted, etc, implmentations
         * may _defer_ adjustments to their internal data structures until a _batch_
         * of such modifications are completed. This call allows the application to
         * to tell the implementation to go ahead and do whatever work might be
         * necessary to re-adjust their internal data structures. Depending on the
         * implementation as well as the amount and kind of changes to the mesh
         * that have been made prior to calling it, this call may be expensive.
         *
         * Any handles obtained by a caller previous to calling this method shall
         * be assumed to be invalidated.
         *
         * \param instance iMesh instance handle
         * \param *err Pointer to error type returned from function
         */
    
    void iMesh_optimize(iMesh_Instance instance,
                            /*out*/ int *err);
    

{{html(
<iframe width="300" height="250" frameborder="0" src="http://doodle.com/summary.html?pollId=dzt58b8gky6ns4hs"> </iframe>
)}}


Related issues

Related to ITAPS - Specification #159: iMesh: do we need iMesh_areESHValid sister to iMesh_areEH... Rejected 01/31/2011

History

#1 Updated by Tim Tautges over 6 years ago

This seems way too implementation-specific as currently worded. What is the defined behavior when an impl doesn't do any garbage collection? Or, when none has to be done?

#2 Updated by Mark Miller over 6 years ago

Tim Tautges wrote:

This seems way too implementation-specific as currently worded. What is the defined behavior when an impl doesn't do any garbage collection? Or, when none has to be done?

This statement presumes its possible to have a quality database system that supports deletion without having to do any garbage collection. I don't think thats possible. IMHO, whether garbage collection needs to be done or not is an answered question. Yes, it does. Hence the numerous contributions to this issue in the literature over the past 50 years of database system design.

Now, whether garbage collection needs to be explicitly triggered/controlled by the application or can be handled automagically by the database is perhaps an open question. Automatic garbage collection is always nice. However, I think performance concerns (both in time and space) will dictate a need for application to be able to inform the implementation when would be a good time to do it. In fact, having such a function missing from the API specification means the only possible route for an implementation is to use automatic garbage collection where it could defer work until iMesh_save is called or, when everything is getting freed anyways during iMesh_dtor or on demand as iMesh functions are called. The problem with defered approach is excessive growth in garbage before reaching a point where caller is ready to call iMesh_save or iMesh_dtor. The problem with the on demand approach is excessive work upon each manipulation of the iMesh to keep everything clean and free of garbage before returning to caller for next manipulation. In addition, I think the on demand approach is all too likely to invalidate handles caller has already acquired causing excessive re-acquisition of handles.

Having such a function in the API spec. doesn't mean an implementation still cannot use any of these automatic approaches to garbage collection. For these implementations, they can make iMesh_garbageCollect() a no-op if they so choose. But, this also gives them an opportunity to avoid both excessive growth in garbage and excessive work on keeping internal data structures clean by taking a hint from the application about when it thinks its ok to do garbage collection.

#3 Updated by Tim Tautges over 6 years ago

Well, but that's the rub. How many scientific computing codes do you know of that use garbage collection-supporting database systems for their fine-grained data? Maybe I'm misunderstanding what you mean by database systems (this from the person who maintains a library with that word as part of it's name...).

Also, I wonder whether we're thinking of memory recovery-like garbage collection with handle compression-like garbage collection. There are certainly cases where frequent creation/deletion will mess up performance with MOAB, esp for those algorithms using MOAB ranges. That's a case where being able to compress the handle space would be useful (compression always happens through a save/restore though).

#4 Updated by Mark Miller over 6 years ago

Tim Tautges wrote:

Well, but that's the rub. How many scientific computing codes do you know of that use garbage collection-supporting database systems for their fine-grained data?

I hear more and more code teams talk about having to modify the mesh on fine grained scale all the time; errosion and fracture models, adaptive mesh and dynamic load balancing come to mind as drivers. On a scale of a 1 million cores, being out of balance by the tiniest of percentages reaks total havoc. This implies ability to modify mesh on fine grained scale. These code teams are currently having to adapt their 20+ year old infrastructure to deal with this issue. So, they are facing the garbage collection problem already on their own, internal datastructures. I am sure all of them would prefer not to have to write code like that and use someone elses implmentation.

Also, I wonder whether we're thinking of memory recovery-like garbage collection with handle compression-like garbage collection.

I would be fine if you would rather call this function iMesh_compressHandleRanges ;) Seriously though, I find compressing out wasted space in handle ranges to be way, way, way more implementation specific than the general concept of garbage collection. The latter is used in many application domains for roughly the same basic meaning.

#5 Updated by Tim Tautges over 6 years ago

Ok, so I think either we're using the wrong term, or in practice this won't have much impact (I suspect the former). The wikipedia entry for "garbage collection" defines this as:

The basic principles of garbage collection are:
1. Find data objects in a program that cannot be accessed in the future
2. Reclaim the resources used by those objects

That's consistent with what I think of as garbage collection. If we use that definition, then I assert that any large scientific code worth their salt isn't doing this. Specifically, for their fine-grained data objects, they aren't allocating single instances off the heap; rather, they're using some form of pool-based allocation. Also, they don't need to go through step #1 above, i.e. go through data objects in a program, to figure out which ones aren't being used anymore. They're keeping track of that in their pool-based allocation code.

The examples you cite are certainly valid, and widespread; I doubt any good-performing of those use garbage collection as defined above as a means for recovering memory. That brings me back to the assertion that what we're really talking about is compression of handles/storage (in this thread I view those as synonymous). Maybe compression of handles is very implementation-specific, but compression of storage definitely isn't.

#6 Updated by Mark Miller over 6 years ago

Tim Tautges wrote:

Specifically, for their fine-grained data objects, they aren't allocating single instances off the heap; rather, they're using some form of pool-based allocation.

Yeah, I'd bet they are using some kind of pool-based allocation too. And, ones I know that are also doing mesh modification wind up with garbage in their pool they have to occasional free up. Whether they are using a heap or pool-based allocation is irrelvant; fine grained and maybe even coarse grained mesh modification algorithms can and often do lead to garbage in internal data structures that needs to be occasionally cleaned up. The garbage can take the form of stuff they want to delete and have marked as deleted but have not compressed out if their internal data structures such that their internal data structures aren't using memory for these marked as deleted but not yet eleminated from memory thingies.

#7 Updated by Mark Miller over 6 years ago

Indicate your vote here...

http://doodle.com/w88ms7ga2mw5c7e5

#8 Updated by Mark Miller over 6 years ago

Ok, thanks to Tim Wickberg, here is a better version of embedded link to Doodle vote-caster. To cast your vote, just click on the 'Participate' link in the embedded Doodle thingy below.

{{html(
<iframe width="300" height="250" frameborder="0" src="http://doodle.com/summary.html?pollId=w88ms7ga2mw5c7e5"> </iframe>
)}}

#9 Updated by Mark Miller over 6 years ago

Possible other names for this function...

iMesh_garbageCompact()
iMesh_trashCompact()
iMesh_compact()
iMesh_cleanup()
iMesh_readjust()
iMesh_resettle()
iMesh_anneal()

#10 Updated by Tim Tautges over 6 years ago

A better name would be: iMesh_compressHandles.

#11 Updated by E. Seegyoung Seol over 6 years ago

Tim Tautges wrote:

A better name would be: iMesh_compressHandles.

or, iMesh_areHandlesValid.

I agree with the idea of asking whether "(whatever) handles are invariant" to implementation and prefer having way more relevant name rather than "garbageCollect"; I couldn't vote for the poll with "iMesh_garbageCollect"

#12 Updated by Mark Miller over 6 years ago

Tim Tautges wrote:

A better name would be: iMesh_compressHandles.

I think that is a very MOAB-centric interpretation of what this function is/needs to do. Although the outward impact on the caller (frankly the only possible effect on the caller) is its effect of any previously obtained handles, that frankly is really only a likely and possibly unfortunate side effect of calling this function. So, including the word handles in it off the mark.

The real value in this function has to do with cleaning up the implementation's internal cruft that can accumulate in the presence of modification with efficient deletion. IMHO, if an implementation does not support BOTH modification and efficient deletion, this call is likely to be a no-op.

An INefficient deletion appraoch would be one where the implementation attempts to always maintain with complete compactness all of its internal data structures upon each and every delete; in which case I can almost guarantee two things...
  1. The caller will have to be carefully engineered to deal with impact of not knowing when iMesh calls it makes will cause invalidation of previously obtained handles.
  2. There exists an ordering of deletion operations what will cause such an implementation to blow up in time and/or space or both.

#13 Updated by Mark Miller over 6 years ago

Seegyoung Seol wrote:

I couldn't vote for the poll with "iMesh_garbageCollect"

Are you saying there is some techincal reason you were unable to cast a vote? The poll involves voting yes or no. So, you should be able to cast a vote ;)

#14 Updated by E. Seegyoung Seol over 6 years ago

Mark Miller wrote:

Seegyoung Seol wrote:

I couldn't vote for the poll with "iMesh_garbageCollect"

Are you saying there is some techincal reason you were unable to cast a vote? The poll involves voting yes or no. So, you should be able to cast a vote ;)

No tech reason not to vote. I vote for it (yes) under the assumption that we will choose more relevant name such as iMesh_areHandlesValid..

#15 Updated by Carl Ollivier-Gooch over 6 years ago

areHandlesValid gets us back to, essentially, the name we started with.

The spec already says something very much like "A particular handle always refers to the same entity for the lifetime of that entity". With a function name like "areHandlesValid", I'd expect the function to check whether handles have changed, but the spec says they can't. So what should I expect the function to do? (Answer: always return true.)

The current incarnation of areEHValid includes a magic argument that gives the implementation permission to change handles anyhow it sees fit. This is really the only useful behavior of the current function --- to allow the implementation to move things around behind the scenes to close up holes in the mesh database. (I'd argue that a correct implementation is going to have to update sets, tags, and relations, too, which makes this non-trivial.) As a side effect, all handles and iterators the application may be holding are invalidated, of course.

All this is to say that, while I don't have much of a positive opinion about what the function should be called, there are some names that don't match (what I perceive to be) the purpose of the function, and areHandlesValid is squarely in that range.

As a suggestion, how about something like:

purgeDeletedEntitiesFromDatabase

That's a little long-winded, and probably not exactly what we're after, but at least seems to me to capture what we intend the function to mean.

#16 Updated by Jason Kraftcheck over 6 years ago

Carl Ollivier-Gooch wrote:

All this is to say that, while I don't have much of a positive opinion about what the function should be called, there are some names that don't match (what I perceive to be) the purpose of the function, and areHandlesValid is squarely in that range.

I agree.

As a suggestion, how about something like:

purgeDeletedEntitiesFromDatabase

I still think that is too implementation-specific. The types of optimizations that a database might want to make after extensive mesh modification (compacting handle space, making entities contiguous in memory, etc.) may not really correspond to purging deleted entities. I'd prefer something much more general. Why not allow implementations to do whatever they see fit to optimize both runtime performance and memory use? For example:

iMesh_optimizeDatabase()

The application should understand that a) it will probably take a while to execute and b) handles will probably be invalidated. And, of course, it should be understood that one would only call this after doing a lot of mesh modification because there really shouldn't be any other time that it would be necessary.

I'm also all for dropping the ability of the current areEHValid to return whether or not handles changed. For portability the application should just assume that they always change.

#17 Updated by Mark Beall over 6 years ago

I vote for Jason's iMesh_optimizeDatabase

#18 Updated by Mark Miller over 6 years ago

Mark Beall wrote:

I vote for Jason's iMesh_optimizeDatabase

I wonder if iMesh_optimizeInstance() might be a little better or just iMesh_optimize(). But, I am honestly happy any of these optimize variants.

#19 Updated by E. Seegyoung Seol over 6 years ago

Mark Miller wrote:

Mark Beall wrote:

I vote for Jason's iMesh_optimizeDatabase

I wonder if iMesh_optimizeInstance() might be a little better or just iMesh_optimize(). But, I am honestly happy any of these optimize variants.

OK, I vote for iMesh_optimize. Furthermore, I don't think we need "/*out*/ int *areHandlesInvariant" for it.

#20 Updated by Mark Miller over 6 years ago

  • Subject changed from iMesh: remove iMesh_areEHValid and add iMesh_garbageCollect() to iMesh: remove iMesh_areEHValid and add iMesh_optimize()

#21 Updated by Jason Kraftcheck over 6 years ago

Mark Miller wrote:

I wonder if iMesh_optimizeInstance() might be a little better or just iMesh_optimize(). But, I am honestly happy any of these optimize variants.

I like iMesh_optimizeDatabase because I think it better conveys the severity of the requested operation. The fact that it acts only on one iMesh instance will be conveyed by the fact that it is passed an instance.

#22 Updated by Jason Kraftcheck over 6 years ago

I think it might also be useful to allow the application to pass arrays of handles which the implementation will update to the corresponding new handles if handles change during the optimization.

My proposed function signature is below. I don't really like using "track" in the names of the array arguments, but I couldn't think if anything better that conveyed that the arrays are not the list of things to optimize. Any suggestions?


    /**\brief Tell an implementation to do whatever internal optimizations
     *        are possible.
     *
     * For performance reasons, in the presence of _modification_ where
     * entities and/or entity sets are added, moved, deleted, etc, implmentations
     * may _defer_ adjustments to their internal data structures until a _batch_
     * of such modifications are completed. This call allows the application to
     * to tell the implementation to go ahead and do whatever work might be
     * necessary to re-adjust their internal data structures. Depending on the
     * implementation as well as the amount and kind of changes to the mesh
     * that have been made prior to calling it, this call may be expensive.
     *
     * Any handles obtained by a caller previous to calling this method shall
     * be assumed to be invalidated.
     *
     * Implementations may choose not to do anything for this.
     *
     * \param instance iMesh instance handle
     * \param set_track_array An array of entity set handles for which, if the 
     *                  implementation changes handles, it will change passed
     *                  input handle to the corresponding new handle.
     * \param set_track_array_length  The length of set_track_array
     * \param ent_track_array An array of entity handles for which, if the 
     *                  implementation changes handles, it will change passed
     *                  input handle to the corresponding new handle.
     * \param ent_track_array_length  The length of ent_track_array
     * \param *err Pointer to error type returned from function
     */
void iMesh_optimizeDatabsase( /* in    */ iMesh_Instance instance,
                              /* inout */ iBase_EntitySetHandle* set_track_array,
                              /* in    */ int set_track_array_length,
                              /* inout */ iBase_EntityHandle* ent_track_array,
                              /* in    */ int ent_track_array_length,
                              /* out   */ int* err );

#23 Updated by Carl Ollivier-Gooch over 6 years ago

I'm on board with this change (not surprisingly, considering that I suggested it).

While I see Jason's point on updating a given collection of set and entity handles, for this to be really useful, the app is going to have to have all of the handles it might want preserved in one place. Is this a practical and useful expectation? I ask because one (presumably) common usage of this function will be to get things cleaned up after some service has done mesh adaptation. The parent app, if it has some set or entity it wants to keep track of, would have to pass those handles in to the adaptation service. This quickly starts to get painful, especially if the app is also using some other service that wants to keep track of some handles for its own reasons.

On an unrelated note, I'd amend the docs that Jason has proposed to specifically note that iterators are also going to be invalidated, and that implementations are responsible for updating set membership and entity/set tag values appropriately. Finally, what should we say about iRel kinds of things? And how would an implementation do anything about that?

#24 Updated by Mark Miller over 6 years ago

Carl Ollivier-Gooch wrote:

I'm on board with this change (not surprisingly, considering that I suggested it).

While I see Jason's point on updating a given collection of set and entity handles, for this to be really useful, the app is going to have to have all of the handles it might want preserved in one place.

I may have mis-understood Jason's proposal but I didn't think the array(s) of handles were supposed to be preserved by the implementation. I thought these arrays of handles were ones the caller wanted fixed (e.g. updated to whatever their possible new value is). This then saves caller having to re-acquire them later. My thought was that as the implementation optimizes its internals, whenever it changes one of the handles being tracked, it then updates it in the array so that when its returned to the caller, he caller gets the new, updated handle.

On an unrelated note, I'd amend the docs that Jason has proposed to specifically note that iterators are also going to be invalidated, and that implementations are responsible for updating set membership and entity/set tag values appropriately.

Sounds reasonable. Though I think we have a fundamental problem with documentation content in iMesh.h. We don't distinguish between user level documentation (e.g. documentation essential to use the iMesh API) and implementation level documentation (e.g. documentation essential to someone implementing iMesh). We could adopt some kind of a tagging mechanism and use the MPI-like phrase notes to implementors to signify implementation level documentation.

#25 Updated by Carl Ollivier-Gooch over 6 years ago

Mark Miller wrote:

Carl Ollivier-Gooch wrote:

I'm on board with this change (not surprisingly, considering that I suggested it).

While I see Jason's point on updating a given collection of set and entity handles, for this to be really useful, the app is going to have to have all of the handles it might want preserved in one place.

I may have mis-understood Jason's proposal but I didn't think the array(s) of handles were supposed to be preserved by the implementation. I thought these arrays of handles were ones the caller wanted fixed (e.g. updated to whatever their possible new value is). This then saves caller having to re-acquire them later. My thought was that as the implementation optimizes its internals, whenever it changes one of the handles being tracked, it then updates it in the array so that when its returned to the caller, he caller gets the new, updated handle.

Yes, that was my understanding, too, and I mistyped. But that doesn't make the fundamental problem go away.

#26 Updated by Jason Kraftcheck over 6 years ago

Carl Ollivier-Gooch wrote:

While I see Jason's point on updating a given collection of set and entity handles, for this to be really useful, the app is going to have to have all of the handles it might want preserved in one place. Is this a practical and useful expectation? I ask because one (presumably) common usage of this function will be to get things cleaned up after some service has done mesh adaptation. The parent app, if it has some set or entity it wants to keep track of, would have to pass those handles in to the adaptation service. This quickly starts to get painful, especially if the app is also using some other service that wants to keep track of some handles for its own reasons.

Not having those arguments doesn't help with that problem either. Do you have a suggestion that better addresses the above issue?

A service like mesh adaptation should never call this function anyway. Presumably the top-level code has some idea of whether or not the services it is calling are modifying the mesh and can decide that that level if it wants to call this function.

#27 Updated by Carl Ollivier-Gooch over 6 years ago

Jason Kraftcheck wrote:

Not having those arguments doesn't help with that problem either. Do you have a suggestion that better addresses the above issue?

A service like mesh adaptation should never call this function anyway. Presumably the top-level code has some idea of whether or not the services it is calling are modifying the mesh and can decide that that level if it wants to call this function.

I'm willing to accept the notion that a service shouldn't call this function (though it'll inevitably happen, no matter how things are documented...).

And I certainly can't come up with a better mechanism for dealing with this problem (and I doubt there is one). The application is probably best off minimizing the data it carries across this call. The challenges in having all the handles in the right place will probably keep the app from getting carried away with passing things in here. (And I'm also guessing that it'll be mostly sets that the app will care about.)

#28 Updated by Carl Ollivier-Gooch over 6 years ago

  • Assignee set to Mark Miller
  • Target version set to 1.4

We're going to go with the original (short) function signature proposed for this, because we can't identify an obvious use case for including the before to after mapping.

Once Mark updates the docs, then the rest of us will have to get to work.

#29 Updated by Mark Miller over 6 years ago

  • Status changed from New to Feedback
  • Estimated time set to 0.50

#30 Updated by Mark Miller over 6 years ago

  • Status changed from Feedback to In Progress
  • % Done changed from 0 to 80

#31 Updated by Mark Miller over 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

I removed iMesh_areEHValid() and added

void iMesh_optimize(
    iMesh_Instance instance,
    int *err);

Also available in: Atom PDF