Bug #339

migration seg. fault after multiple part distribution

Added by Seegyoung Seol over 5 years ago. Updated over 5 years ago.

Status: Resolved
Start date: 01/09/2012
Priority: Normal
Due date:
Assignee: Seegyoung Seol
% Done: 0%
Category: -
Spent time: -
Target version: v1.3.8

Description

migration procedure doesn't run after multiple part distribution.

To replicate:

*********************************************
[INPUT]
distributed mesh? NO
# parts per proc: 2
# migration steps: 10
do ghosting? NO
write mesh files? NO

MPIHOME/bin/mpirun -np 8 singlePart_parallel ../meshes/cube_16K.sms out.sms 0 10 2 0 0

16_512_slacTDR_RGN - parma output from run on Avatar: mpirun -np 16 <mesh> 32 1 0 0 0 10 0 0 0 (837 KB) Cameron Smith, 02/02/2012 04:16 pm


Related issues

Related to PUMI_MESH (FMDB) - Support #328: Partition model information Resolved 12/21/2011
Related to ParMA - Bug #348: mesh queries fail after migration New 01/18/2012

History

#1 Updated by Seegyoung Seol over 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Seegyoung Seol
  • Target version set to v1.3.8

#2 Updated by Seegyoung Seol over 5 years ago

A quick fix for this error was committed in r3423.

However, this issue remains open because we still see seg. faults in some cases, e.g. test/parallel/main.cc, which tests aggressively by migrating randomly picked partition objects while gradually increasing #parts from 1 to 19.

Please note that the API for FMDB_Mesh_Migr has changed: it now takes both the "global" source part id and the "global" dest. part id for each partition object (entity and p-set).
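To illustrate what a "global" part id could look like with several parts per process, here is a minimal sketch. It assumes a block numbering in which each process owns a contiguous range of ids (global id = rank * parts per process + local index). That numbering and the names PartMove, globalPartId, ownerRank and isValidMove are assumptions made for illustration only; they are not part of the FMDB API, and FMDB may number parts differently.

```cpp
#include <cassert>

// Hypothetical record for one partition object to migrate.
// Not an FMDB type; it only mirrors the (source, destination) pairing
// described in the comment above.
struct PartMove {
    int objectId;       // illustrative handle for an entity or p-set
    int srcGlobalPart;  // "global" source part id
    int dstGlobalPart;  // "global" destination part id
};

// Assumed block numbering: process `rank` owns the global part ids
// [rank * partsPerProc, (rank + 1) * partsPerProc).
inline int globalPartId(int rank, int localPart, int partsPerProc) {
    assert(localPart >= 0 && localPart < partsPerProc);
    return rank * partsPerProc + localPart;
}

// Inverse of the assumed numbering: which process owns a global part.
inline int ownerRank(int globalPart, int partsPerProc) {
    return globalPart / partsPerProc;
}

// Both ids must fall inside the global range [0, numProcs * partsPerProc).
inline bool isValidMove(const PartMove& m, int numProcs, int partsPerProc) {
    const int total = numProcs * partsPerProc;
    return m.srcGlobalPart >= 0 && m.srcGlobalPart < total &&
           m.dstGlobalPart >= 0 && m.dstGlobalPart < total;
}
```

Under this assumed numbering, the replication case above (8 processes, 2 parts per process) would have 16 global parts, with ids 6 and 7 living on rank 3.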

#3 Updated by Cameron Smith over 5 years ago

With ParMA r133 run on a 512-part mesh (32 parts per process on 16 processes), a seg. fault occurs at line 788 of mPart.cc in the call to FMDB_Mesh_DspNumEnt(...) on line 179 after migration. A log file from the run is attached. FMDB r3428 was compiled in debug mode.

The input mesh is here:
/bigtmp/qlue/meshes/tdr/16parts/PM_partition_mesh_out_[0-15].sms

Thank you,
Cameron

#4 Updated by Seegyoung Seol over 5 years ago

  • Status changed from In Progress to Resolved

Multi-part migration is now in service. To use multi-part, call

int FMDB_Mesh_SetNumPart (pMeshMdl mesh, int numPart);

before migration or global partitioning. For more information, please refer to the User's Guide or the sample code located in $FMDB_SRC/test/parallel.
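The ordering described above (set the number of parts first, then migrate) can be sketched as follows. This is a sketch only and does not link against FMDB: MockMesh, mockSetNumPart, mockMigrate and setupThenMigrate are hypothetical stand-ins, and the return convention (0 for success) and the default of one part per process are assumptions. In real code the call would be FMDB_Mesh_SetNumPart(mesh, numPart) on a pMeshMdl handle, as documented in the User's Guide.

```cpp
#include <cassert>

// Stand-in for the mesh handle; the real pMeshMdl type comes from FMDB.
struct MockMesh {
    int numPart = 1;        // parts per process (default of 1 is assumed)
    bool migrated = false;  // set once the mock migration has run
};

// Stand-in shaped like FMDB_Mesh_SetNumPart(mesh, numPart).
// Returning 0 on success is an assumption of this sketch. The mock
// rejects calls made after migration to make the ordering explicit.
inline int mockSetNumPart(MockMesh* mesh, int numPart) {
    if (mesh == nullptr || numPart < 1) return 1;
    if (mesh->migrated) return 1;
    mesh->numPart = numPart;
    return 0;
}

// Hypothetical migration step; it only records that it ran.
inline void mockMigrate(MockMesh* mesh) {
    mesh->migrated = true;
}

// Caller-side helper: set the part count first, then migrate.
inline bool setupThenMigrate(MockMesh* mesh, int partsPerProc) {
    if (mockSetNumPart(mesh, partsPerProc) != 0) return false;
    mockMigrate(mesh);
    return true;
}
```

Wrapping the two steps in one helper is simply one way for calling code to avoid migrating before the part count has been set.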

Also, please let us know of any further errors or issues you encounter. Thanks for your patience.
