Channel: Clusters and HPC Technology

MPI myrank debugging


Hi All,

I am writing to ask some questions about a CFD model whose results differ depending on the order of hosts in mpi_hosts.
I would like to hear your opinions on the theory, because the code is long and complex and it would be difficult to reproduce the problem with sample code.

The current situation is that when nodes on different InfiniBand switches perform parallel computations, case #1 works well but case #2 does not.
"Doesn't work well" means that the resulting values differ.

Background: host01-host04 are on IB switch #1 and host99 is on IB switch #2.
Case #1: host01, host02, host03, host04, host99 (i.e., the head node is host01)
Case #2: host99, host01, host02, host03, host04 (i.e., the head node is host99)

As far as I can guess (these are hypothetical scenarios with no theoretical basis):
1) There are communication problems while the head node is on another switch.
2) The ranks get reordered when MPI_COMM_RANK is called several times.
3) There is some problem (corruption or mismatch) in MPI_COMM_WORLD.
4) Synchronization excludes the head node.

First of all, for debugging, I am putting print statements in several places to see which subroutine or function changes the values.
(I'll post more when the situation is updated.)
However, whatever function I finally find, I am not sure it will lead to a code-level resolution, so I am posting to the forum to hear about similar experiences.
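
For reference, a low-effort first check I am considering is to let the library print its own rank-to-node map in both host orderings, so I can confirm where each rank actually lands; a minimal sketch, assuming Intel MPI and a hostfile named mpi_hosts (the process count and binary name are placeholders):

export I_MPI_DEBUG=5                               # prints the rank / pid / node / pinning table at startup
mpirun -machinefile mpi_hosts -np 40 ./cfd_model   # placeholder process count and executable
# Each rank can also tag its own log lines with the hostname returned by MPI_GET_PROCESSOR_NAME.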

Thank you.

 


Correct sourcing of environment variables and creation of modulefiles


Hi everyone,

I have installed Intel Parallel Studio XE 2020 on our HPC system and am planning to enable it for cluster-wide use via modulefiles. I have read the relevant article on the topic, but it seems a bit outdated at this point. I have found some examples of modulefiles on GitHub and various HPC sites, but there is not much consistency between them either. The components that will be accessed most are the compilers, MKL, and the MPI library and its compiler wrappers.

I have a few questions that might clear this up for me:

  1. What is the proper way of sourcing PSXE? Both of these seem to work:
    • source $PSXE_INSTALL/bin/compilervars.sh -arch intel64 -platform linux
    • source $PSXE_INSTALL/parallel_studio_xe_2020.0.088/bin/psxevars.sh
  2. I would like to have modulefiles for the different components: compilers, MKL, MPI, DAAL, etc. What are the specific scripts recommended for each of these components? (A hedged sketch of what I have found so far follows after this list.)
  3. Is there still no official source for PSXE modulefiles? The best way I have found to generate my own is the 'createmodule.sh' tool that comes with the environment-modules package, but I would like to clarify points 1 and 2 before doing this work.
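
For reference, these are the per-component environment scripts I have found in a default PSXE 2020 layout; the paths below are assumptions from my install and should be checked against yours:

# Hedged sketch of per-component sourcing (assumes a default /opt/intel install; adjust to your $PSXE_INSTALL):
source /opt/intel/compilers_and_libraries_2020.0.166/linux/bin/compilervars.sh intel64   # compilers
source /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/bin/mklvars.sh intel64    # MKL only
source /opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/bin/mpivars.sh    # MPI only
source /opt/intel/compilers_and_libraries_2020.0.166/linux/daal/bin/daalvars.sh intel64  # DAAL only
# psxevars.sh (or the top-level compilervars.sh) sources everything at once, which is simpler
# but leads to one monolithic modulefile rather than per-component ones.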

I appreciate any assistance you can provide.

Thanks,

Sean

Interrupted system call from gprof


Hi, 

When we compile with the '-pg' option, the following messages were received during execution:

hfi_userinit: assign_context command failed: Interrupted system call
hfp_gen1_context_open: hfi_userinit: failed, trying again (1/3)
 rank           0 : Hello, World!
 rank           1 : Hello, World!

This causes code performing heavy numerical computations to hang. 

The only related information we can find on this issue is from Intel OPA repo: https://github.com/intel/opa-psm2/issues/28

Here is our system information:

- Linux 3.10.0-1062.el7.x86_64

- Intel 2019 Update 5 

- hfi1-firmware-0.9-84

We appreciate your insight on how to minimize the interrupted system calls. 

Regards.   

Intel® Parallel Studio for Fortran and C++ Linux* 2020


I would like a clarification on the versions of Intel® Parallel Studio for Fortran and C++ Linux* 2020. I would like to be able to compile applications that will run on a cluster of multicore nodes connected via InfiniBand. Would the Intel® Parallel Studio XE Professional Edition for Fortran and C++ Linux* (all tools) 2020 work, or do I need the Cluster Edition?

Thanks

Achilles

dapl async_event: DEV ERR 11


Dear Intel HPC expert

I encountered the following error while running a Fluent calculation; please help me.

d3702:UCM:41a6a:3bd19700: 2114318887 us(1550650213 us!!!):  WARNING: IBV_CLIENT_REREGISTER
d3505:UCM:5544a:b18ce700: 2114475755 us(1550629376 us!!!): dapl async_event: DEV ERR 11
 

Intel MPI issue


I was running an MPI program on Skylake nodes. The program finishes with no error when run on a single node (np=2, ph=2). However, if I run it on two Skylake nodes (np=2, ph=1), I get the following error:

rank = 1, revents = 8, state = 8
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 2988: (it_plfd->revents & POLLERR) == 0
internal ABORT - process 0

The weird thing is that all my colleagues using csh can finish the run without hitting this assertion, while everyone using bash (including me) always gets the failure at the same line (2988). What could be the potential causes of this type of error?
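
In case it helps narrow things down, a hedged first check I am doing is to diff the environment and limits seen from a bash session against those from a colleague's csh session on the same node (the variable list below is only a guess at what matters):

env | sort > env_bash.txt                      # run from a bash login on the compute node
# a csh colleague runs:  env | sort > env_csh.txt   on the same node
diff <(grep -E 'I_MPI|FI_|LD_LIBRARY_PATH|PATH=' env_bash.txt) \
     <(grep -E 'I_MPI|FI_|LD_LIBRARY_PATH|PATH=' env_csh.txt)
ulimit -a                                      # stack / memlock limits set by per-shell startup files can also differ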

 

MPI issue


I was running a program on Skylake nodes. If I run it using one node (np=2, ph=2), the program completes successfully. However, if I run it using two nodes (np=2, ph=1), I get the following assertion failure:

rank = 1, revents = 8, state = 8
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 2988: (it_plfd->revents & POLLERR) == 0
internal ABORT - process 0

Does anyone know what the possible causes are for this type of assertion failure? The weird thing is that all my colleagues who use csh can run the program with no error, but all colleagues who use bash (including me) always see the same failure at the same line (2988).

unable to run linpack with intel2020u0 binary (xhpl_intel64_*)


Hi,
I have installed intel2020u0 on a RHEL 7.6-based system with an Intel 8280M processor.
While running a quick test with the Linpack binary provided under compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack, I end up with issues. Here is how I set up and run the Linpack binary (on a single node):

[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/runme_intel64_static /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/runme_intel64_prv /home/user/COMPILER/MPI/INTELMPI/2020u0/compilers_and_libraries_2020.0.166/linux/mkl/benchmarks/mp_linpack/HPL.dat .
[puneet@node61 BASELINE]$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)
Copyright 2003-2019, Intel Corporation.
[user@node1 BASELINE]$ ls
HPL.dat  runme_intel64_prv  runme_intel64_static  xhpl_intel64_static
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script.  Change it to reflect the correct number
of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr  8 22:36:04 IST 2020
RANK=1, NODE=1
RANK=0, NODE=0
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Abort(1094543) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......:
MPID_Init(861).............:
MPIDI_NM_mpi_init_hook(953): OFI fi_open domain failed (ofi_init.h:953:MPIDI_NM_mpi_init_hook:No data available)
Done: Wed Apr  8 22:36:05 IST 2020
[user@node1 BASELINE]$

Now, in the 2020u0 environment, if I remove the xhpl_intel64_static binary and use the one supplied with 2019u5 (HPL 2.3), HPL works fine:

 

[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2019_U5/compilers_and_libraries_2019.5.281/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static .
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script.  Change it to reflect the correct number
of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr  8 22:36:40 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N        :    1000
NB       :     192
PMAP     : Column-major process mapping
P        :       1
Q        :       1
PFACT    :   Right
NBMIN    :       2
NDIV     :       2
RFACT    :   Crout
BCAST    :   1ring
DEPTH    :       0
SWAP     : Binary-exchange
L1       : no-transposed form
U        : no-transposed form
EQUIL    : no
ALIGN    :    8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

node1          : Column=000192 Fraction=0.005 Kernel=    0.00 Mflops=100316.35
node1          : Column=000384 Fraction=0.195 Kernel=65085.04 Mflops=83075.67
node1          : Column=000576 Fraction=0.385 Kernel=39885.67 Mflops=70127.11
node1          : Column=000768 Fraction=0.595 Kernel=17659.92 Mflops=58843.41
node1          : Column=000960 Fraction=0.795 Kernel= 4894.70 Mflops=51756.17
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WC00C2R2        1000   192     1     1               0.01            4.64944e+01
HPL_pdgesv() start time Wed Apr  8 22:36:41 2020

HPL_pdgesv() end time   Wed Apr  8 22:36:41 2020

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0059446 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Done: Wed Apr  8 22:36:41 IST 2020

 

The same is the case with the xhpl binary supplied with Intel 2018u4 (HPL v2.1):

[user@node1 BASELINE]$ cp /home/user/COMPILER/MPI/INTELMPI/2018_U4/compilers_and_libraries_2018.5.274/linux/mkl/benchmarks/mp_linpack/xhpl_intel64_static .
[user@node1 BASELINE]$ ./runme_intel64_static
This is a SAMPLE run script.  Change it to reflect the correct number
of CPUs/threads, number of nodes, MPI processes per node, etc..
This run was done on: Wed Apr  8 22:37:48 IST 2020
RANK=0, NODE=0
RANK=1, NODE=1
================================================================================
HPLinpack 2.1  --  High-Performance Linpack benchmark  --   October 26, 2012
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N        :    1000
NB       :     192
PMAP     : Column-major process mapping
P        :       1
Q        :       1
PFACT    :   Right
NBMIN    :       2
NDIV     :       2
RFACT    :   Crout
BCAST    :   1ring
DEPTH    :       0
SWAP     : Binary-exchange
L1       : no-transposed form
U        : no-transposed form
EQUIL    : no
ALIGN    :    8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

node1          : Column=000192 Fraction=0.005 Kernel=    0.00 Mflops=99748.31
node1          : Column=000384 Fraction=0.195 Kernel=67904.30 Mflops=84547.57
node1          : Column=000576 Fraction=0.385 Kernel=39287.97 Mflops=70666.21
node1          : Column=000768 Fraction=0.595 Kernel=18197.26 Mflops=59578.53
node1          : Column=000960 Fraction=0.795 Kernel= 4634.78 Mflops=51930.16
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WC00C2R2        1000   192     1     1               0.01            4.96887e+01
HPL_pdgesv() start time Wed Apr  8 22:37:49 2020

HPL_pdgesv() end time   Wed Apr  8 22:37:49 2020

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0059446 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
Done: Wed Apr  8 22:37:49 IST 2020

 

Here is the fi_info output:

[user@node1 BASELINE]$ fi_info
provider: mlx
    fabric: mlx
    domain: mlx
    version: 1.5
    type: FI_EP_UNSPEC
    protocol: FI_PROTO_MLX
provider: mlx;ofi_rxm
    fabric: mlx
    domain: mlx
    version: 1.0
    type: FI_EP_RDM
    protocol: FI_PROTO_RXM

I also tested an MPI hello world:

[user@node1 BASELINE]$ mpiicc hello.c
[user@node1 BASELINE]$ mpirun -np 2 ./a.out
Hello world from processor node61, rank 0 out of 2 processors
Hello world from processor node61, rank 1 out of 2 processors

Please advise.
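
Not as a confirmed fix, but since fi_info only reports the mlx provider, one experiment I am planning is to make the provider selection explicit and turn on Intel MPI's startup debug output before rerunning the 2020u0 binary (the values below are assumptions to try, not a verified solution):

export I_MPI_DEBUG=6            # print which libfabric and provider the runtime actually picks
export FI_PROVIDER=mlx          # restrict libfabric to the provider fi_info reports
./runme_intel64_static
# If the libfabric bundled with 2020u0 is suspected, an external one can be tried:
# export I_MPI_OFI_LIBRARY_INTERNAL=0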

MPI program hangs in "MPI_Finalize"


Hi All,

I will explain the current situation and the attached file.

An MPI application launched with LSF is currently being debugged because of a problem where the job does not terminate. At the code level we currently suspect mpi_finalize; the hang occurs randomly, not every time, so we need to learn more about the conditions under which it occurs. I asked about similar symptoms on the MPI forum, but the outcome is unknown because the post was converted to a support ticket partway through.
Please check if it is a similar symptom.
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...

- strace results from the hosts running MPI (I suspect this line: "│ + 01:17:43 read(7 ")

----------------------------------------------------------------------------------------
duru0403 has 24 procs as below:

* Name/State       : pmi_proxy / State:    S (sleeping)[m
  PID/PPID         : 141955 / 141954
  Commandline      : **************/apps/intel/18.4/impi/2018.4.274/intel64/bin/pmi_proxy --control-port duru0374:37775 --pmi-connect alltoall --pmi-aggregate -s 0 --rmk lsf --launcher lsf --demux poll --pgid 0 --enable-stdin 1 --retries 10 --control-code 1000390395 --usize -2 --proxy-id -1
  CPU/MEMs_allowed : 0-95 / 0-3
  [<ffffffff96e56e55>] poll_schedule_timeout+0x55/0xb0
  [<ffffffff96e585dd>] do_sys_poll+0x48d/0x590
  [<ffffffff96e587e4>] SyS_poll+0x74/0x110
  [<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
  [<ffffffffffffffff>] 0xffffffffffffffff
  Files            :
     Num of pipes: 26
     Num of sockets: 16
     Num of anon_inodes: 0
  Strace           :
     + /xshared/support/systrace/strace: Process 141955 attached
     + 01:17:43 restart_syscall(<... resuming interrupted poll ...>/xshared/support/systrace/strace: Process 141955 detached
     +  <detached ...>
  Num of subprocs  : 23
  │
  ├─Name/State       : ensda / State:    S (sleeping)[m
  │ PID/PPID         : 141959 / 141955
  │ Commandline      : **************
  │ CPU/MEMs_allowed : 0 / 0-3
  │ [<ffffffff972f5139>] unix_stream_read_generic+0x309/0x8e0
  │ [<ffffffff972f5804>] unix_stream_recvmsg+0x54/0x70
  │ [<ffffffff972186ec>] sock_aio_read.part.9+0x14c/0x170
  │ [<ffffffff97218731>] sock_aio_read+0x21/0x30
  │ [<ffffffff96e404d3>] do_sync_read+0x93/0xe0
  │ [<ffffffff96e40fb5>] vfs_read+0x145/0x170
  │ [<ffffffff96e41dcf>] SyS_read+0x7f/0xf0
  │ [<ffffffff97374ddb>] system_call_fastpath+0x22/0x27
  │ [<ffffffffffffffff>] 0xffffffffffffffff
  │ Files            :
  │    -  > /dev/infiniband/uverbs0
  │    -  > **************/log_proc00324.log
  │    -   /dev/infiniband/uverbs0
  │    Num of pipes: 6
  │    Num of sockets: 5
  │    Num of anon_inodes: 6
  │ Strace           :
  │    + /xshared/support/systrace/strace: Process 141959 attached
  │    + 01:17:43 read(7, /xshared/support/systrace/strace: Process 141959 detached
  │    +  <detached ...>
  │ Num of subprocs  : 0
----------------------------------------------------------------------------------------

- Version Information
   Intel Compiler: 18.5.234
   Intel MPI: 18.4.234
   DAPL: ofa-v2-mlx5_0-1u

- MPI options I used

declare -x I_MPI_DAPL_UD="1"
declare -x I_MPI_FABRICS="dapl"
declare -x I_MPI_HYDRA_BOOTSTRAP="lsf"
declare -x I_MPI_PIN="1"
declare -x I_MPI_PIN_PROCESSOR_LIST="0-5,24-29"
declare -x I_MPI_ROOT="**************/apps/intel/18.4/compilers_and_libraries/linux/mpi"

- And the code I used

After MPI_FINALIZE, there are 5 lines of code consisting of if, close, and deallocate statements.
Can these cause the hang problem?

! last part of main_program

call fin_common_par

(there is nothing)

endprogram


!!!!!!!!!!!!!!!!!

subroutine fin_common_par
implicit none
integer :: ierr

call mpi_finalize(ierr)
call fin_log

if(allocated(ranks_per_node)) deallocate(ranks_per_node)
if(allocated(stride_ranks))         deallocate(stride_ranks)

return
end subroutine fin_common_par

!!!!!!!!!!!!!!!!!

subroutine fin_log
implicit none

if(logf_unit == closed_unit) return
close(logf_unit)
logf_unit = closed_unit

return
endsubroutine fin_log

!!!!!!!!!!!!!!!!!

Additionally, how can I get the call stack of a process like in this post:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...
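
From what I have read, something like the following should capture the stacks of the hung ranks without stopping them (a sketch only; the PID is the one from the strace output above):

gstack 141959                    # gstack/pstack ship with gdb and print the user-space stack of a live PID
cat /proc/141959/stack           # kernel-side stack, similar to what is quoted above
# Attaching gdb also works:  gdb -p 141959   then "thread apply all bt"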

Thank you in advance.

How to get intel compiler support on HPC server


Dear Intel Team

I am a Ph.D. research scholar and qualify for the student license (Intel Parallel Studio XE). I am presently using the High Performance Computing (HPC) facility at IITD, http://supercomputing.iitd.ac.in/

I need help installing the Intel compiler with OpenMP support in my account (non-root installation) on the HPC server.

I do not have admin/root/sudo access, and there is no internet connectivity on the machine (the HPC system cannot connect to the internet).

Please guide me on the steps I need to follow.

MS VS 2019 MPI integration issue - include paths


Playing around with Intel Parallel Studio XE 2020 Cluster Edition on Windows with MS VS 2019.

The solution has a Fortran PROGRAM project and an Intel C++ static library project.

Selecting Intel Performance Libraries, there is no single entry for "Use MPI Library" (without MKL), but one can select, under Intel Math Kernel Library, Use Intel MKL=No and Use MPI Library=Intel(R) MPI.

One would expect that #include "mpi.h" would then have the include path configured properly, but it does not. I have to explicitly add an include path:

       C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.0.166\windows\mpi\intel64\include

By doing so, my C++ static library builds (it finds #include "mpi.h"). This is problematic in that the property page for include paths must be updated each time I update the compiler.

Jim Dempsey

Intel Fortran 2019 + MPI cause an unexpected Segmentation Fault [Linux]


Hello,

The following code example, compiled with `mpiifort`, produces a segfault:
 

module test_intel_mpi_mod
   implicit none
   integer, parameter :: dp = kind(1.0d0)

   type :: Container
      complex(kind=dp), allocatable :: arr(:, :, :)
   end type

contains
   subroutine test_intel_mpi()
      use mpi_f08, only: &
         MPI_Init_thread, &
         MPI_THREAD_SINGLE, &
         MPI_Finalize, &
         MPI_Comm_rank, &
         MPI_COMM_WORLD, &
         MPI_COMPLEX16, &
         MPI_Bcast

      integer :: provided
      integer :: rank
      type(Container) :: cont

      call MPI_Init_thread(MPI_THREAD_SINGLE, provided)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank)

      allocate(cont % arr(1, 1, 1))

      if (rank == 0) then
         cont % arr(1, 1, 1) = (1.0_dp, 2.0_dp)
      endif

! This works fine --->  call MPI_Bcast(cont % arr(1, 1, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD)
      call MPI_Bcast(cont % arr(:, :, 1), 1, MPI_COMPLEX16, 0, MPI_COMM_WORLD)

      print *, rank, " after Bcast: ", cont % arr(1, 1, 1)
      call MPI_Finalize()
   end subroutine test_intel_mpi
end module test_intel_mpi_mod

program test_mpi
   use test_intel_mpi_mod

   call test_intel_mpi()
end program test_mpi

 

The code is compiled simply as follows: `mpiifort -o test_mpi test_mpi.f90`  and executed as `mpirun -np N ./test_mpi` (N = 1, 2, ...).

The output for N=2 is the following (`-g -traceback` was also added in this case):

           0  after Bcast:  (1.00000000000000,2.00000000000000)
           1  after Bcast:  (1.00000000000000,2.00000000000000)
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
test_mpi           000000000041475A  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AE5C8C0A5D0  Unknown               Unknown  Unknown
test_mpi           000000000040941D  Unknown               Unknown  Unknown
test_mpi           0000000000409D79  Unknown               Unknown  Unknown
test_mpi           00000000004044C0  test_intel_mpi_mo          44  test_mpi.f90
test_mpi           00000000004044E0  MAIN__                     50  test_mpi.f90
test_mpi           0000000000403BA2  Unknown               Unknown  Unknown
libc-2.17.so       00002AE5C913B3D5  __libc_start_main     Unknown  Unknown
test_mpi           0000000000403AA9  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
test_mpi           000000000041475A  Unknown               Unknown  Unknown
libpthread-2.17.s  00002AB3DEFB75D0  Unknown               Unknown  Unknown
test_mpi           000000000040941D  Unknown               Unknown  Unknown
test_mpi           0000000000409D79  Unknown               Unknown  Unknown
test_mpi           00000000004044C0  test_intel_mpi_mo          44  test_mpi.f90
test_mpi           00000000004044E0  MAIN__                     50  test_mpi.f90
test_mpi           0000000000403BA2  Unknown               Unknown  Unknown
libc-2.17.so       00002AB3DF4E83D5  __libc_start_main     Unknown  Unknown
test_mpi           0000000000403AA9  Unknown               Unknown  Unknown

 

The program crashes when it tries to exit the subroutine. The problem seems to be related to passing the array section, `cont % arr(:, :, 1)`, to MPI_Bcast, as opposed to a reference to the first element, `cont % arr(1, 1, 1)` (that version of the call is left commented out in the source provided). At the same time, my understanding of the standard is that array sections, contiguous or not, are explicitly allowed in MPI 3.x (e.g., see https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node409.htm).
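
As a side note for anyone reproducing this, one way to see whether the compiler creates an argument temporary (rather than a descriptor) for the section is the run-time argument-temporary check; a small sketch, assuming mpiifort forwards the flag to ifort:

mpiifort -g -traceback -check arg_temp_created -o test_mpi test_mpi.f90   # warns at run time when a temporary is made for an actual argument
mpirun -np 2 ./test_mpi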

Details in the source are important to reproduce the segfault:

  • The crash happens only if MPI_Bcast is called -- commenting it out prevents the error
  • The subroutine must be in a module
  • The array must be at least 3-dimensional, allocatable, and be contained in a derived type object
  • Non-blocking MPI_Ibcast, as well as other collectives implying broadcast (e.g., Allreduce) give the same result

Compiler/library versions:

Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.1.217 Build 20200306

IntelMPI is from the same build: 2019.7.pre-intel-19.1.0.166-7

Output with I_MPI_DEBUG=6:

[0] MPI startup(): libfabric version: 1.9.0a1-impi
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       278604   l49        {0,1,2,3,4,5,6,7,16,17,18,19,20,21,22,23}
[0] MPI startup(): 1       278605   l49        {8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31}
[0] MPI startup(): I_MPI_ROOT=....
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=6

OS: CentOS Linux release 7.6.1810 (Core)

Kernel: 3.10.0-957.10.1.el7.x86_64

Intel MPI Library install on one node or all?


Hi,

In the "Intel MPI Library for Linux" installation guide (found here), it discusses installing the libraries on one node or multiple nodes of a cluster. How do I know which method to choose? What are the advantages / disadvantages of either kind of installation?

Thanks,

 

SIGFPE with mpiexec.hydra for Intel MPI 2019 update 7


If I use Intel MPI 2019 Update 7 in a Slurm configuration on two cores on two separate nodes, I get a SIGFPE here (according to gdb on the generated core file):

#0 0x00000000004436ed in ipl_create_domains (pi=0x0, scale=4786482) at ../../../../../src/pm/i_hydra/../../intel/ipl/include/../src/ipl_service.c:2240

This happens only with mpirun / mpiexec.hydra, using e.g. "mpirun -n 2 ./test".

I know of 3 workarounds, any of which lets me run this successfully (they are spelled out as commands after the list), but I thought you or others should know about this crash:

1. Set I_MPI_PMI_LIBRARY=libpmi2.so and use "srun -n 2 ./test" (with Slurm configured to use pmi2).

2. Use I_MPI_HYDRA_TOPOLIB=ipl

3. Use the "legacy" mpiexec.hydra.

IMPI numa_num Assertion Failed


Hi All - 

I'm getting an error and am not quite sure where to begin tracking it down. I'm running a model known to run on our system, using:

Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)

The code runs for certain core counts (generally smaller processor counts) but errors out for some counts with:

Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2101: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)
  8 Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2101: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)
  9 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7f01f66321d4]
 10 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f01f5dba031]
 11 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c5d) [0x7f01f5f34c5d]
 12 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x1975e4) [0x7f01f5e465e4]
 13 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7f01f5e1ad8e]
 14 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x2886a7) [0x7f01f5f376a7]
 15 /apps/applications/development/compilers/intel/1
 16 Abort(1) on node 2: Internal error
 17 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7f983aed31d4]
 18 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f983a65b031]
 19 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c5d) [0x7f983a7d5c5d]
 20 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x1975e4) [0x7f983a6e75e4]
 21 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7f983a6bbd8e]
 22 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x2886a7) [0x7f983a7d86a7]
 23 /apps/applications/development/compilers/intel/1
 24 Abort(1) on node 3: Internal error
 25 Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2101: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)
 26 Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2101: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)
 27 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7f9ede95a1d4]
 28 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7f9ede0e2031]
 29 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c5d) [0x7f9ede25cc5d]
 30 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x1975e4) [0x7f9ede16e5e4]
 31 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7f9ede142d8e]
 32 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x2886a7) [0x7f9ede25f6a7]
 33 /apps/applications/development/compilers/intel/1
 34 Abort(1) on node 1: Internal error
 35 Assertion failed in file ../../src/mpid/ch4/src/intel/ch4_shm_coll.c at line 2101: node_info->numa_num <= ((MPIDI_SHMGR_SYNCPAGE_SIZE / MPIDI_SHMGR_FLAG_SPACE) - 1)
 36 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPL_backtrace_show+0x34) [0x7ff6ff7ae1d4]
 37 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(MPIR_Assert_fail+0x21) [0x7ff6fef36031]
 38 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x285c5d) [0x7ff6ff0b0c5d]
 39 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x1975e4) [0x7ff6fefc25e4]
 40 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x16bd8e) [0x7ff6fef96d8e]
 41 /apps/applications/development/compilers/intel/19.1/compilers_and_libraries_2020.0.166/linux/mpi/intel64/lib/release/libmpi.so.12(+0x2886a7) [0x7ff6ff0b36a7]
 42 /apps/applications/development/compilers/intel/1
 43 Abort(1) on node 84: Internal error

The code does run correctly with this core configuration when started through SLURM using "srun --mpi=pmi2". Can you provide any guidance? 

The machine this is running on is a dual-socket AMD Epyc 7702 with hyperthreading disabled, running Ubuntu 18.04 server.
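
One hedged data point that may help: the assertion concerns the per-node NUMA count, and a dual-socket Epyc 7702 can expose several NUMA nodes per socket depending on the BIOS NPS setting, so I am capturing the topology alongside a debug run (the process count and binary below are placeholders):

lscpu | grep -i numa
numactl --hardware | head -n 5
I_MPI_DEBUG=6 mpirun -n 256 ./model 2>&1 | tee impi_numa_debug.log   # placeholder count and executable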

Thanks


HPC Orchestrator integration tests


Hello Marius,

My name is Soman, and I am the technical support manager for the Intel(R) HPC Orchestrator product. I see that you have filed a ticket via Salesforce (03048154); let's use that channel to communicate further.

Thanks,

MPI_WIN_ALLOCATE_SHARED direct/RMA access


From the MPI 3.1 specification:

This is a collective call executed by all processes in the group of comm. On each
process, it allocates memory of at least size bytes that is shared among all processes in
comm, and returns a pointer to the locally allocated segment in baseptr that can be used
for load/store accesses on the calling process. The locally allocated memory can be the
target of load/store accesses by remote processes; the base pointers for other processes
can be queried using the function MPI_WIN_SHARED_QUERY. The call also returns a
window object that can be used by all processes in comm to perform RMA operations.
The size argument may be different at each process and size = 0 is valid. It is the user's
responsibility to ensure that the communicator comm represents a group of processes that
can create a shared memory segment that can be accessed by all processes in the group.

On a single SMP host with multiple ranks it is clear that you can use this to construct a window to a multi-process shared memory buffer that can be accessed (with care) either with direct load/store instructions or by way of RMA operations. Note, each rank/process may have a different virtual address base for the baseptr.

From the MPI 3.1 specification it is stated (implied) that the group of comm must have the capability to access the same physical memory (which may be mapped at different virtual addresses in different processes).

Now as a simplification of my query, consider the situation of say 8 processes running on 2 hosts, 4 processes per host (and the hosts do not have sharable memory between them).

Can all 8 processes issue MPI_WIN_ALLOCATE_SHARED using MPI_COMM_WORLD, returning 8 win objects (4 per host), with:

4 processes on host 0 having shared memory (and direct access by those processes)
4 processes on host 1 having shared memory (different from host 0, and direct access by those processes)
All 8 processes having RMA access to all processes win window.

What I wish to do is improve the performance of intra-host access without excluding inter-host access (and without having each process use 2 windows to do this).

Note, I am not currently setup to make this test.

Jim Dempsey


Disconnect VPN causes parallel computing with Intel MPI to stop


Hi:

Our software product is based on Intel MPI for parallel computing on Windows.

Recently many of our customers have encountered this error. Due to COVID-19, they all work at home with a VPN connection to the office.

They run our software for parallel computing at home, but when they disconnect the VPN, the parallel computation stops at the same time.

 

I can reproduce the error with the following steps:

1. Run IMB-MPI1.exe with the command:   mpiexec.exe -localonly -n 4 C:\test\IMB-MPI1.exe

2. While IMB-MPI1.exe is still running, I disable any one of the network interfaces (I have 3 NICs: 2 created by VMware, 1 physical NIC) and get the following errors:

       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.05         0.06         0.06
            1         1000         0.77         0.79         0.78
            2         1000         0.86         0.87         0.86
            4         1000         0.86         0.90         0.88
            8         1000         0.78         0.80         0.79
           16         1000         1.01         1.15         1.08
           32         1000         1.18         1.22         1.20
           64         1000         0.87         0.90         0.88
          128         1000         1.31         1.36         1.34
          256         1000         1.29         1.35         1.31
          512         1000         1.52         1.57         1.55
         1024         1000         1.45         1.47         1.46
         2048         1000         2.57         2.77         2.67
         4096         1000         3.60         4.05         3.88
         8192         1000         4.99         5.31         5.13
        16384         1000         8.44         8.74         8.52
        32768         1000        14.06        14.34        14.14
        65536          640        35.06        35.86        35.47
       131072          320        59.00        67.31        63.18
       262144          160       156.66       167.80       161.57
       524288           80       869.78       896.27       880.66
      1048576           40      2402.85      2564.92      2484.02
      2097152           20      4692.22      4907.47      4789.06
[mpiexec@PCAcer144006] ..\hydra\pm\pmiserv\pmiserv_cb.c (863): connection to proxy 0 at host PCAcer144006 failed
[mpiexec@PCAcer144006] ..\hydra\tools\demux\demux_select.c (103): callback returned error status
[mpiexec@PCAcer144006] ..\hydra\pm\pmiserv\pmiserv_pmci.c (520): error waiting for event
[mpiexec@PCAcer144006] ..\hydra\ui\mpich\mpiexec.c (1157): process manager error waiting for completion

C:\Program Files\Intel MPI 2018\x64>

 

Is there any workaround? Thank you.

 

boost::mpi linker error by using intel MPI


Dear experts

I am running into a linker error when using boost::mpi: my executable is linked against a static library I built, which contains boost::mpi.
The errors are almost all undefined references to symbols such as ompi_mpi_comm_null, which look like Open MPI types.

I know Intel MPI is based on MPICH, so referencing such symbols is strange.

 

To compile the library, I used the Intel compilers (mpiicc & mpiicpc).

If you know how to resolve this, I would appreciate your advice.
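
In case it is useful context: my understanding is that undefined ompi_* symbols usually mean the installed libboost_mpi was built against Open MPI, so what I plan to try is rebuilding Boost.MPI against the Intel toolchain; a hedged sketch (Boost version and component list are illustrative):

cd boost_1_72_0                                      # Boost source tree; version is illustrative
./bootstrap.sh --with-toolset=intel-linux
echo "using mpi : mpiicpc ;" >> project-config.jam   # point Boost.Build at the Intel MPI wrapper
./b2 --with-mpi --with-serialization stage
# Then link the static library and executable against the libboost_mpi in ./stage/lib,
# making sure no Open MPI-built Boost libraries appear earlier on the link line.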
