Channel: Clusters and HPC Technology

IntelMPI Intel19 update 4 error


Dear All,
Good afternoon.
I successfully installed Intel Parallel Studio 19 Update 4 on my cluster, which runs Ubuntu 18.04 LTS.
The cluster is composed of 4 nodes: a master and 3 compute nodes where I run my calculations.
I am able to run calculations on the master only, on the compute nodes only, or on the compute nodes together.
But when I ask for the master plus one of the compute nodes, I receive this error message:

Abort(543240207) on node 7 (rank 7 in comm 0): Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(452)...................: MPI_Bcast(buf=0x5b7cee0, count=10, MPI_INTEGER, root=7, comm=MPI_COMM_WORLD) failed
PMPI_Bcast(438)...................:
MPIDI_SHMGR_Gather_generic(391)...:
MPIDI_NM_mpi_bcast(161)...........:
MPIR_Bcast_intra_tree(227)........: Failure during collective
MPIR_Bcast_intra_tree(219)........:
MPIR_Bcast_intra_tree_generic(180): Failure during collective

I also get an error when I run the Intel MPI Benchmarks as:
mpirun -hosts master,node1 -n 2 -ppn 1 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1

I receive this error message:

Abort(609312527) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)........................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=1, new_comm=0x6de6e4) failed
PMPI_Comm_split(489)........................:
MPIR_Comm_split_impl(167)...................:
MPIR_Allgather_intra_auto(145)..............: Failure during collective
MPIR_Allgather_intra_auto(141)..............:
MPIR_Allgather_intra_recursive_doubling(126):
MPIC_Sendrecv(344)..........................:
MPID_Isend(662).............................:
MPID_isend_unsafe(282)......................:
MPIDI_OFI_send_lightweight_request(106).....:
(unknown)(): Other MPI error

I also tried installing Parallel Studio 19 Update 5, but the problem is still the same.
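
For reference, below is a minimal MPI_Bcast test (a sketch added for illustration only, not part of the original report) that can be compiled with mpiifort and launched the same way as the IMB-MPI1 command above, to check whether the failure also occurs outside the application:

    program bcast_test
        include 'mpif.h'
        integer :: ierr, rank, nprocs
        integer :: buf(10)

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

        buf = rank
        ! broadcast from the last rank, mirroring the failing MPI_Bcast (root=7) above
        call MPI_BCAST(buf, 10, MPI_INTEGER, nprocs - 1, MPI_COMM_WORLD, ierr)
        print *, 'rank', rank, 'received', buf(1)

        call MPI_FINALIZE(ierr)
    end program bcast_test

For example: mpiifort bcast_test.f90 -o bcast_test, then mpirun -hosts master,node1 -n 2 -ppn 1 ./bcast_test.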

All the best
Lorenzo

TCE Open Date: Tuesday, January 21, 2020 - 21:47

Process tracing


Hello,

I am going to test the difference between mpirun and mpiexec on LSF.
The following manual page says that mpirun is integrated with LSF.
(https://software.intel.com/en-us/mpi-developer-reference-linux-mpirun)

1. I would like to see which physical processes mpirun and mpiexec use when working through LSF.
   Can the "-trace" option confirm which physical processes are being used?

2. Since there are cases where mpiexec is faster, I want to use mpiexec instead of mpirun.
   Is there any disadvantage, such as the job not being recognized by the scheduler?

Thanks

intel mpi 2019.5 - is libfabric extensible, or can another libfabric library be used


I'm working on a proof-of-concept for a libfabric provider for a piece of hardware that is not currently supported. I find that Intel MPI does not appear to accept a new provider library for a "fred" provider named libfred-fi.so. The library seems to be recognized, and FI_PROVIDER=fred looks to work, but the actual implementation complains "set the FI_PROVIDER=fred", which is already done. I have been examining the "sockets" and "verbs" providers, and for them these environment variables do the job (I can see the expected performance differences between a 1GigE Ethernet interface with sockets, and 40GigE and 100GigE Ethernet interfaces with sockets and verbs).

Assuming I can't actually use the libfabric shipped with Intel MPI, I have also not been able to get Intel MPI to use a libfabric built outside of the Intel implementation. The application appears to accept the new library but fails to make any connections. I have attempted to use libfabric-1.7.2 with my 2019.5 installation, to no avail.

The project preference is to use Intel MPI, but this new provider is essentially the reason for doing this work. Our reference MPI application has issues with MPICH and OpenMPI, so we are motivated to stick with Intel MPI.

Invalid communicator issue with PMPI_Allreduce


Operating system and version: CentOS Linux release 7.5.1804
Intel MPI version: 2019.5.281
Compiler and version: 19.0.5.281
Fabric: Mellanox Technologies MT27500
Libfabric version: 1.7.2

Would anyone be able to help me with an "invalid communicator" error I've been getting with Intel MPI plus Intel compilers (not present with OpenMPI plus GNU or Intel compilers) in one subroutine in a large code?

I receive the error when I use MPI_ALLREDUCE in this subroutine, but if I replace it with an MPI_REDUCE followed by an MPI_BCAST the code works fine. There are many other instances of MPI_ALLREDUCE in other subroutines that seem to work fine. The snippet that works:

     CALL MPI_REDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_REDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_REDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEX,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEY,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)
     CALL MPI_BCAST(FEZ,NATOMX,MPI_REAL8,0,COMM_CHARMM,IERROR)

The snippet that causes the error:

     CALL MPI_ALLREDUCE(EX,FEX,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EY,FEY,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)
     CALL MPI_ALLREDUCE(EZ,FEZ,NATOMX,MPI_REAL8,MPI_SUM,0, &
       COMM_CHARMM,IERROR)

After setting I_MPI_DEBUG=6, I_MPI_HYDRA_DEBUG=on, the error message is:

Abort(1007228933) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Allreduce: Invalid communicator, error stack:
PMPI_Allreduce(434): MPI_Allreduce(sbuf=0x2b5015d1b6c0, rbuf=0x2b5004ff8740, count=1536, datatype=dtype=0x4c000829, op=MPI_SUM, comm=comm=0x0) failed
PMPI_Allreduce(355): Invalid communicator

The problem persists while using only a single core with mpirun. The initial MPI debug output then is:

$ mpirun -ppn 1 -n 1 ../build/cmake/charmm-bug -i c45test/dcm-ti.inp

[mpiexec@pc-beethoven.cluster] Launch arguments: /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host pc-beethoven.cluster --upstream-port 36326 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=init pmi_version=1 pmi_subversion=1
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_maxes
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=4096
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_appnum
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=appnum appnum=0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[0] MPI startup(): libfabric version: 1.7.2a-impi
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get_my_kvsname
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=my_kvsname kvsname=kvs_24913_0
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=put kvsname=kvs_24913_0 key=bc-0 value=mpi#0200ADFEC0A864030000000000000000$
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=barrier_in
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=barrier_out
[proxy:0:0@pc-beethoven.cluster] pmi cmd from fd 6: cmd=get kvsname=kvs_24913_0 key=bc-0
[proxy:0:0@pc-beethoven.cluster] PMI response: cmd=get_result rc=0 msg=success value=mpi#0200ADFEC0A864030000000000000000$
[0] MPI startup(): Rank    Pid      Node name             Pin cpu
[0] MPI startup(): 0       24917    pc-beethoven.cluster  {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_CC=icc
[0] MPI startup(): I_MPI_CXX=icpc
[0] MPI startup(): I_MPI_F90=ifort
[0] MPI startup(): I_MPI_F77=ifort
[0] MPI startup(): I_MPI_ROOT=/opt/intel-2019/compilers_and_libraries_2019.5.281/linux/mpi
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_DEBUG=on
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=6

Note that I also tried using FI_PROVIDER=sockets, with the same result. Any ideas?

It is not using two IBs and the stats.ipm file is empty


Hi,

I tried a CFX job, but it is not using both IB adapters. I used the following settings:

export I_MPI_TMPDIR=/scratch/pwc/asd0392/tmp
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:ofa
export I_MPI_FALLBACK=0
export I_MPI_OFA_NUM_ADAPTERS=2
export I_MPI_OFA_NUM_PORTS=1
export I_MPI_STATS=ipm
export I_MPI_STATS_FILE=stats.ipm
export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BRANCH_COUNT=${numhost}
export I_MPI_LSF_USE_COLLECTIVE_LAUNCH=1
export I_MPI_MPIRUN_CLEANUP=yes
export I_MPI_HYDRA_CLEANUP=yes

 

How do I use InfiniBand dual-rail?

 

Regards,
Deepak

IMPI Creates Large Files on Startup


Operating system and version: CentOS Linux release 7.5.1804
Intel MPI version: 2019.5.281
Compiler and version: 19.0.5.281
Fabric: Mellanox Technologies MT27500
Libfabric version: 1.7.2

I have a quick question regarding Intel MPI and large (> 1 GB) files created by MPI at runtime. We maintain part of a large code, and the standard test suite for this code imposes a limit on file size to keep the tests small. The file size is restricted using "limit filesize 1024m" (csh), but I also tested with "ulimit -f 1024000" (bash) with the same results. When I start mpirun for any run requesting more than a single core, I apparently exceed this file size limit and the code crashes. A minimal example:

   program hello
   include 'mpif.h'
   integer rank, size, ierror, tag, status(MPI_STATUS_SIZE)

   call MPI_INIT(ierror)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
   call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
   print*, 'node', rank, ': Hello world'
   call MPI_FINALIZE(ierror)
   end

This runs fine when called as follows:

#!/bin/bash
ulimit -f 1024000
nproc=1
mpirun -n $nproc -ppn $nproc ./mpi-test

It also runs fine without the ulimit and with nproc increased, but crashes with the ulimit and nproc >= 2 with the following error:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 78358 RUNNING AT pc-beethoven.cluster
=   KILLED BY SIGNAL: 25 (File size limit exceeded)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 78359 RUNNING AT pc-beethoven.cluster
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

As this minimal example doesn't read or write any files, I guess the files must belong to mpirun. Is this "normal" behavior with IMPI (something I need to contact the main developers about), or does it indicate a problem with our IMPI installation?

Thanks in advance for any help!

one-sided communication and shared memory (single process)


Hello all,

I have come across the following problem with one-sided communication using MPI_ACCUMULATE. The versions are:
ifort (IFORT) 19.0.3.199 20190206
Intel(R) MPI Library for Linux* OS, Version 2019 Update 3 Build 20190214 (id: b645a4a54)

The attached program does a very basic calculation using one-sided communication with MPI_ACCUMULATE (and MPI_WIN_FENCE to synchronize). Compile it with

mpif90 test.f donothing.f

The program accepts a command line argument. For example,

mpiexec -np 1 ./a.out 10

simply runs the calculation ten times (on a single process).

When I run the program, it crashes with a segmentation fault in MPI_WIN_FENCE if the argument is larger than 8615. (Or around that number.) But only if one (!) process is used. For any other number of processes, the program run is successful!

When I set FI_PROVIDER to tcp (unset before), the behavior is different: Then, the program run gets stuck for an argument larger than 12, and for very large arguments, the program crashes with "Fatal error in PMPI_Win_fence: Other MPI error".

(The dummy routine "donothing" is a substitution for "mpi_f_sync_reg", which does not exist in this version of IntelMPI.)
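
The attached test.f is not reproduced here. For readers without the attachment, the following is a minimal sketch (an illustration of the accumulate-plus-fence pattern described above, not the attached program) of the kind of calls involved:

    program accumulate_demo
        include 'mpif.h'
        integer, parameter :: n = 4
        double precision :: win_buf(n), contrib(n)
        integer :: win, ierr, rank
        integer(kind=MPI_ADDRESS_KIND) :: winsize, disp

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

        win_buf = 0.0d0
        contrib = 1.0d0
        winsize = n * 8                ! n doubles, 8 bytes each
        call MPI_WIN_CREATE(win_buf, winsize, 8, MPI_INFO_NULL, &
                            MPI_COMM_WORLD, win, ierr)

        call MPI_WIN_FENCE(0, win, ierr)
        disp = 0
        ! every rank adds its contribution into rank 0's window
        call MPI_ACCUMULATE(contrib, n, MPI_DOUBLE_PRECISION, 0, disp, &
                            n, MPI_DOUBLE_PRECISION, MPI_SUM, win, ierr)
        call MPI_WIN_FENCE(0, win, ierr)

        if (rank == 0) print *, 'accumulated values on rank 0:', win_buf
        call MPI_WIN_FREE(win, ierr)
        call MPI_FINALIZE(ierr)
    end program accumulate_demo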

Thank you.

Best wishes
Christoph

Attachments: test.f (2.13 KB), donothing.f (43 bytes)

Error importing shell functions


We use Intel MPI by default on our clusters, and we've recently run into a problem which, I think, is due to some problem with the way it passes environment variables through to the tasks.

If you export a shell function in bash:

function hw() { echo "Hello world"; }
export -f hw

That is available to child processes, and can be seen with env:

$ env | grep -A1 hw
BASH_FUNC_hw()=() {  echo 'Hello world!'
}

Using `mpirun` with this set, without any modifications to the environment, gives the following error:

bash: hw: line 1: syntax error: unexpected end of file
bash: error importing function definition for `BASH_FUNC_hw'

We see this in 2018.3 and 2019.4, the most recent versions we have installed on our clusters. I tried to get a better handle on what it was doing with strace but I couldn't manage to find how it was going wrong -- like I said above, I'm guessing the way the environment variables are passed through wasn't written with these functions in mind and can't handle them.

Has anyone else seen this problem? I'd guess the ability to export shell functions doesn't get much use, but the module command in Environment Modules 4.4.x does it, so I would have thought it would come up elsewhere.

TCE Open Date: Wednesday, February 12, 2020 - 02:18

Intel_Parallel_Studio_XE_Cluster_Edition_2019_Update3_Linux


CentOS 7 (1810)

To uninstall the Intel® Parallel Studio XE on Linux* OS, use the shell scripts uninstall.sh or
uninstall_GUI.sh located at:
<install-dir>/parallel_studio_xe_2019.x.xxx

But there are no uninstall.sh or uninstall_GUI.sh scripts in this directory.

So how do I uninstall a Parallel Studio XE?

And a further issue:

5.1 Getting Started
The document is located at:
<install_dir>/documentation_2019/en/ps2019/getstart_*.htm
 

However, getstart_*.htm is not there.

TCE Open Date: Wednesday, February 12, 2020 - 05:36

I_MPI_HYDRA_BOOTSTRAP issue


Hello,

I am submitting a job through the LSF scheduler, and ssh is blocked by a nologin setting.
I am trying to use the LSF blaunch command instead, and I found the options associated with LSF.
Changing the I_MPI_HYDRA_BOOTSTRAP option from ssh to lsf seems to solve this.
But I tested the following four Intel MPI versions, and the setting does not seem to apply properly to the two X-marked libraries.

2018.4.274: O
2019.2.187: X
2019.4.243: X
2019.5.281: O

Let me know if I'm missing something.

The options that I tried are below.

export I_MPI_HYDRA_BOOTSTRAP=lsf
export I_MPI_HYDRA_BOOTSTRAP_EXEC=lsf
export I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS=lsf
export I_MPI_HYDRA_RMK=lsf

And the error messages are below.

check_exit_codes (../hydra_demux_poll.c): unable to run proxy on hostname
poll_for_event (): check exit codes error
HYD_dmx_poll_wait_for_proxy_event (): poll for event error
HYD_bstrap_setup (): error waiting for event
main (): error setting up the boostrap proxies

Thanks

TCE Open Date: Thursday, February 20, 2020 - 02:22

Intel_Parallel_Studio_XE_Cluster_Edition_2019_Update4_Linux_ubuntu18.04


Sir,

The installed version of ifort is 19.0.4.243 on Ubuntu 18.04, from Intel Parallel Studio XE Cluster Edition 2019.

I am having difficulty compiling ARPACK-NG. It was downloaded from https://github.com/opencollab/arpack-ng/releases

I am following the commands mentioned in the README.md file to compile ARPACK-NG using CMake:
   
    $ mkdir build
    $ cd build
    $ cmake -D EXAMPLES=ON -D MPI=ON -D BUILD_SHARED_LIBS=ON ..
    $ make
    $ make install

This is expected to build everything, including the examples and parallel support (with MPI). Instead, configuration fails with:

-- Detecting Fortran/C Interface - Failed to compile
CMake Warning (dev) at /usr/share/cmake-3.10/Modules/FortranCInterface.cmake:309 (message):
  No FortranCInterface mangling known for sgemm

  The Fortran compiler:

    /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpiifort

  and the C compiler:

    /opt/intel/compilers_and_libraries_2019.4.243/linux/bin/intel64/icc

  failed to compile a simple test project using both languages.

My bash file has:

    source /opt/intel/compilers_and_libraries_2019/linux/bin/compilervars.sh intel64
    source /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/bin/mklvars.sh intel64 ilp64 mod

    export CMAKE_INCLUDE_PATH=/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/include
    export CMAKE_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64:/opt/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64
    export LD_LIBRARY_PATH=$CMAKE_LIBRARY_PATH:$LD_LIBRARY_PATH

    export MKLROOT=/opt/intel/mkl
    export FCFLAGS=-i8
    export FFLAGS="-i8"

    export CC=$(which icc)
    export CXX=$(which icpc)
    export FC=$(which mpiifort)

Kindly let me know where I am making a mistake.

thanks

ab

TCE Open Date: Sunday, February 23, 2020 - 02:46

Visual Studio 2017 and Intel MPI Library, Compilation error, Fortran


After updating Intel® Parallel Studio XE Cluster Edition for Windows to version 2020.0.166, the VS project no longer compiles:

1>C:\Users\koyno\source\repos\wave1\wave1\wave1.f90(3): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [MPI]
1>C:\Users\koyno\source\repos\wave1\wave1\wave1.f90(30): error #6683: A kind type parameter must be a compile-time constant.   [MPI_OFFSET_KIND]
1>C:\Users\koyno\source\repos\wave1\wave1\wave1.f90(35): error #6404: This name does not have a type, and must have an explicit type.   [MPI_COMM_WORLD]

... etc.

I configured the project according to the following instructions:

https://software.intel.com/ru-ru/mpi-developer-guide-windows-configuring...

The build log is included in the attached materials.

 

Thanks, 

Vitaly

 

Attachment: build.png (89.81 KB)

TCE Open Date: Tuesday, February 25, 2020 - 01:06

error #7002: Error in opening the compiled module file. Check INCLUDE paths


Hi,

I am compiling source code that was not developed by me but is freely available, so I am not an expert in its internal functioning and intricacies. As an end user, my purpose is to compile and run it.

My Ubuntu is 18.04, ifort is 19.0.4.243, and the code uses parallel support / MPI with the 64-bit (ilp64) integer interface.

My compilation using mpiifort leads to the following errors:

/source/mpi-ci-diag/Modules/Dispatcher/Dispatcher_module.F90(57): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SLEPCMATRIX_MODULE]
    use SLEPCMatrix_module,               only: SLEPCMatrix, initialize_slepc
--------^
/source/mpi-ci-diag/Modules/Dispatcher/Dispatcher_module.F90(58): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SLEPCDIAGONALIZER_MODULE]
    use SLEPCDiagonalizer_module,         only: SLEPCDiagonalizer
--------^
/source/mpi-ci-diag/Modules/Dispatcher/Dispatcher_module.F90(62): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SCALAPACKMATRIX_MODULE]
    use SCALAPACKMatrix_module,           only: SCALAPACKMatrix
--------^

source/mpi-ci-diag/Modules/Dispatcher/Dispatcher_module.F90(57): error #6580: Name in only-list does not exist or is not accessible.   [SLEPCMATRIX]
    use SLEPCMatrix_module,               only: SLEPCMatrix, initialize_slepc

..

...

..

compilation aborted for /source/mpi-ci-diag/Modules/Dispatcher/Dispatcher_module.F90 (code 1)
source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/build.make:158: recipe for target 'source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/Modules/Dispatcher/Dispatcher_module.F90.o' failed
make[3]: *** [source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/Modules/Dispatcher/Dispatcher_module.F90.o] Error 1
source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/build.make:174: recipe for target 'source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/Modules/Dispatcher/Dispatcher_module.F90.o.provides' failed
make[2]: *** [source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/Modules/Dispatcher/Dispatcher_module.F90.o.provides] Error 2
CMakeFiles/Makefile2:1119: recipe for target 'source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/all' failed
make[1]: *** [source/mpi-ci-diag/CMakeFiles/mpi-scatci.dir/all] Error 2
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2

 

I found that Dispatcher_module.F90.o is not created; however, many other .o files are created in other folders. What is the source of this error? Am I missing some paths, include statements, or libraries in my environment settings? In my .bashrc file I have the following settings:

    source /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh
    source /opt/intel/compilers_and_libraries_2019/linux/bin/compilervars.sh intel64

    source /opt/intel/parallel_studio_xe_2019.4.070/compilers_and_libraries_2019/linux/bin/ifortvars.sh intel64
    source /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/bin/mklvars.sh intel64 ilp64 mod

I have not explicitly linked any MPI libraries or added include flags, since I am using mpiifort for the compilation.

Any help will be highly appreciated. Please bear with me, as I am not an expert in this field.

thank you

ab

 

TCE Open Date: Saturday, March 7, 2020 - 23:18

How to manage disk access after splitting the communicator

$
0
0

Hi,

I have been using MPI successfully without splitting the group of threads into subgroups. Reading and writing of files have been handled by a subroutine that allows serial access to the hard disk by the threads (see attached subroutine).

However, as I have an excess of threads for my application, I want to split them into subgroups (with MPI_COMM_SPLIT) in order to get an additional speed increase. Can my treatment in the attached subroutine be generalized in some way to handle disk access?
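
The attached subroutine is not reproduced here. For context, the following is a minimal sketch (an illustration only, not the attached code) of splitting MPI_COMM_WORLD into subgroups with MPI_COMM_SPLIT; any serialized disk-access scheme would then have to be applied within each sub-communicator:

    program split_demo
        include 'mpif.h'
        integer :: ierr, world_rank, world_size
        integer :: color, subcomm, sub_rank, sub_size

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, world_rank, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, world_size, ierr)

        ! example: two subgroups, chosen by even/odd world rank
        color = mod(world_rank, 2)
        call MPI_COMM_SPLIT(MPI_COMM_WORLD, color, world_rank, subcomm, ierr)

        call MPI_COMM_RANK(subcomm, sub_rank, ierr)
        call MPI_COMM_SIZE(subcomm, sub_size, ierr)
        print *, 'world rank', world_rank, ' -> subgroup', color, &
                 ' rank', sub_rank, ' of', sub_size

        ! serialized I/O would now be coordinated within each subcomm
        call MPI_COMM_FREE(subcomm, ierr)
        call MPI_FINALIZE(ierr)
    end program split_demo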

Best regards

Anders S

 

TCE Open Date: Sunday, March 8, 2020 - 09:20

MPI_Get failed in fortran [Please help me]


I used MPI one-sided communication to write a simple demo in which process 1 just fetches an array of integers from process 0.

Here is the code:

program main
    include "mpif.h"
    integer a(10), b(10)
    integer :: myid, numprocs, ierr
    integer :: win, data_win
    integer :: k, p
    integer size_of_window

    call MPI_INIT(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)

    !print*, win
    if (myid .eq. 0) then
        call mpi_win_Create(a, 4*10, 4, mpi_info_null, MPI_COMM_WORLD,win,ierr)
    else
        call mpi_win_Create(null, 0, 1, mpi_info_null, MPI_COMM_WORLD,win,ierr)
    endif
    print*, win

    if (myid .eq. 0) then
        do i=1,10
            a(i) = 99
        enddo
    endif

    if(myid .ne. 0) then
        !print*, win
        call MPI_Win_lock(MPI_LOCK_SHARED,0,0,win,ierr)
        call MPI_Get(b,10,mpi_integer,0,0,10,mpi_integer,win,ierr)
        call MPI_Win_unlock(0,win,ierr)
        print *,"err", ierr

        call MPI_Win_free(win, ierr)
        print *,"myid ", myid, "get message", b
    else
        !print*, win
        call MPI_Win_free(win, ierr)
    endif
    call MPI_Barrier(MPI_COMM_WORLD, ierr)
end program

But it doesn't work at all. I have tried many things, and it just doesn't reach the print line. Can someone please tell me what is going on?
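
For comparison, here is a minimal sketch of the same lock/get/unlock pattern (an illustration only, not a verified fix for the code above); note that it passes the window size and target displacement as MPI_ADDRESS_KIND integers and frees the window collectively on all ranks:

    program get_demo
        include 'mpif.h'
        integer :: a(10), b(10)
        integer :: myid, numprocs, ierr, win
        integer(kind=MPI_ADDRESS_KIND) :: winsize, disp

        call MPI_INIT(ierr)
        call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
        call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

        a = 99
        b = 0
        if (myid == 0) then
            winsize = 10 * 4           ! ten default integers, 4 bytes each
            call MPI_WIN_CREATE(a, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)
        else
            winsize = 0
            call MPI_WIN_CREATE(b, winsize, 4, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)
        endif

        if (myid /= 0) then
            call MPI_WIN_LOCK(MPI_LOCK_SHARED, 0, 0, win, ierr)
            disp = 0
            call MPI_GET(b, 10, MPI_INTEGER, 0, disp, 10, MPI_INTEGER, win, ierr)
            call MPI_WIN_UNLOCK(0, win, ierr)
            print *, 'myid', myid, 'got', b
        endif

        ! MPI_WIN_FREE is collective: every rank must call it
        call MPI_WIN_FREE(win, ierr)
        call MPI_FINALIZE(ierr)
    end program get_demo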

 

TCE Open Date: Friday, March 6, 2020 - 19:00

Intel® Parallel Studio XE installation on an HPC cluster environment


Dear Intel team and users, I was trying to install Intel Parallel Studio XE Cluster Edition 2019 on my university HPC cluster. The head node already had this Intel product, and later we added 6 nodes to it, so we now have a cluster of 6 compute nodes plus a head node. I proceeded according to the Intel installation guide, but a "setup failed" error appeared and a log file was created (attached here). The setup was successfully picking up the nodes from the machines.LINUX file, and we were doing this as the root user. The only thing that makes me doubtful is that the setup was looking for /opt/intel on the nodes, while on the head node it was installed in a different directory (/export/installs/), although that is a common, shared directory. Any help in this regard will be highly appreciated.

I did a test to check the presence of ifort on the nodes; interestingly, sometimes it correctly prints the path and loads the Intel environment on the nodes, and sometimes it produces an error! Can someone please guide me on the procedure and tell me what I am doing wrong here?

 

Regards,

Haseeb Ahmad

TCE Open Date: Monday, March 9, 2020 - 02:27

Intel MKL performance degrades a lot when I combine it with OpenMPI


I am using the Intel Math Kernel Library to write my algorithm, and I set the number of threads to 16. My program works well on its own. However, I then tried to combine MKL with MPI and run my program with:

mpirun -n 1 ./MMNET_MPI

I expect this to give the same result as running my program directly, as follows:

./MMNET_MPI

However, the performance of my program degrades a lot: I request 16 threads, but only 2 or 3 threads are active. I am not sure what the problem is. The relevant part of my MKL program is as follows.

void LMMCPU::multXXTTrace(double *out, const double *vec) const {

  double *snpBlock = ALIGN_ALLOCATE_DOUBLES(Npad * snpsPerBlock);
  double (*workTable)[4] = (double (*)[4]) ALIGN_ALLOCATE_DOUBLES(omp_get_max_threads() * 256 * sizeof(*workTable));

  // store the temp result
  double *temp1 = ALIGN_ALLOCATE_DOUBLES(snpsPerBlock);
  for (uint64 m0 = 0; m0 < M; m0 += snpsPerBlock) {
    uint64 snpsPerBLockCrop = std::min(M, m0 + snpsPerBlock) - m0;
#pragma omp parallel for
    for (uint64 mPlus = 0; mPlus < snpsPerBLockCrop; mPlus++) {
      uint64 m = m0 + mPlus;
      if (projMaskSnps[m])
        buildMaskedSnpCovCompVec(snpBlock + mPlus * Npad, m,
                                 workTable + (omp_get_thread_num() << 8));
      else
        memset(snpBlock + mPlus * Npad, 0, Npad * sizeof(snpBlock[0]));
    }

    for (uint64 iter = 0; iter < estIteration; iter++) {
      // compute A=X^TV
      MKL_INT row = Npad;
      MKL_INT col = snpsPerBLockCrop;
      double alpha = 1.0;
      MKL_INT lda = Npad;
      MKL_INT incx = 1;
      double beta = 0.0;
      MKL_INT incy = 1;
      cblas_dgemv(CblasColMajor,
                  CblasTrans,
                  row,
                  col,
                  alpha,
                  snpBlock,
                  lda,
                  vec + iter * Npad,
                  incx,
                  beta,
                  temp1,
                  incy);

      // compute XA
      double beta1 = 1.0;
      cblas_dgemv(CblasColMajor, CblasNoTrans, row, col, alpha, snpBlock, lda, temp1, incx, beta1, out + iter * Npad,
                  incy);

    }

  }
  ALIGN_FREE(snpBlock);
  ALIGN_FREE(workTable);
  ALIGN_FREE(temp1);
}

 

TCE Open Date: Friday, March 13, 2020 - 03:29

Can't run job with impi


Dear all:

I am currently using the Intel compiler and Intel MPI to run the CICE numerical model, but it fails every single time. According to the run log, the model returns the error message "rank 0 in job 2  node01_44414   caused collective abort of all ranks exit status of rank 0: killed by signal 11". I tried searching for this error message, but unfortunately I do not know much about MPI and have no idea how to debug it. One thing I am sure of is that CICE is a widely used numerical model, and I don't think there is any major bug in its code that would cause this error.

So could anyone provide some insight into this error, or tell me what information I should provide to help locate it?

Thanks

TCE Open Date: Thursday, March 12, 2020 - 03:59

PS 2020 + IVF + MS VS 2019 + MPI

$
0
0

Could you please add to the VS IDE project integration, in a Language property pull-down (or wherever is appropriate), an option to configure a project for MPI?
Also, please add a command prompt launcher that configures the environment for MPI.

An additional feature to consider (which may be too much to ask) is for the MS VS IDE to be able to debug-launch an MPI application and automatically open a debug window for each process (even if this must be restricted to the local machine). While this can be done manually via Debug | Attach to Process (then selecting all of the launched processes), it is a bit cumbersome. This works, but I also need to add helper code to the application:

    logical, volatile :: DebugWait = .true.
    
    call MPI_INIT(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)

    do while(DebugWait)
        CALL sleepqq(100) ! for all processes set DebugWait = .false.
    END DO

This too is a bit cumbersome.

Following this, it would also be nice if the developer could apply debugging actions to all processes (as well as the one in focus). Currently one must Debug | Break All, then walk through each image setting its local DebugWait = .false., then walk through each image adding or removing breakpoints. It would be handy to have a means to apply an action to all images (this would require all images to be the same .exe).

 

TCE Open Date: Thursday, March 19, 2020 - 09:09

MPI Isend/Irecv Bottleneck


Dear all,

I have some questions related to an MPI Isend/Irecv bottleneck.
Below is my subroutine named "Broadcast_boundary":

      DO NP=1,NPROCS-1
        IF(MYRANK==NP-1)THEN
         CALL MPI_ISEND( ARRAY_1D(L/2-NUMBER),NUMBER,MPI_REAL,NP  ,101,MPI_COMM_WORLD,IREQ1,IERR)
         CALL MPI_IRECV( ARRAY_1D(L/2)       ,NUMBER,MPI_REAL,NP  ,102,MPI_COMM_WORLD,IREQ2,IERR)
         CALL MPI_WAIT(IREQ1,STATUS1,IERR)
         CALL MPI_WAIT(IREQ2,STATUS2,IERR)
        ELSEIF(MYRANK==NP)THEN
         CALL MPI_ISEND( ARRAY_1D(L/2)       ,NUMBER,MPI_REAL,NP-1,102,MPI_COMM_WORLD,IREQ1,IERR)
         CALL MPI_IRECV( ARRAY_1D(L/2-NUMBER),NUMBER,MPI_REAL,NP-1,101,MPI_COMM_WORLD,IREQ2,IERR)
         CALL MPI_WAIT(IREQ1,STATUS1,IERR)
         CALL MPI_WAIT(IREQ2,STATUS2,IERR)
        ENDIF
      ENDDO

The code is designed to exchange the boundary data between neighboring ranks NP-1 and NP, across all ranks from 0 to NPROCS-1.
And here is my sample program:

      L=20000; NUM=500
      ALLOCATE(A(L),B(L))
      CALL RANDOM(A)
      CALL RANDOM(B)

      CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
      TIC=MPI_WTIME() 
      CALL BROADCAST_BOUNDARY(A,NUM)
      TOC=MPI_WTIME()
      MPI_WTIMES(1)=TOC-TIC
   
      CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
      TIC=MPI_WTIME() 
      CALL BROADCAST_BOUNDARY(B,NUM)
      TOC=MPI_WTIME()
      MPI_WTIMES(2)=TOC-TIC

I expected mpi_wtimes(1) (the elapsed time to communicate array A) and mpi_wtimes(2) (array B) to be nearly the same, because the sizes of A and B are equal.
But after several experiments, I found that mpi_wtimes(1) is about eight times larger than mpi_wtimes(2).

Please let me know whether there is some initialization cost in MPI communication, whether something on the first communication accelerates the subsequent ones, or whether my code needs to be improved.
I'll attach the entire code I tested. If you can test it and let me know the results, I think many questions will be answered.
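
One way to separate a possible one-time connection-setup cost from the steady-state communication time (shown here only as a sketch of the measurement idea, not as a verified explanation) is to add an untimed warm-up exchange before the timed calls:

      ! untimed warm-up call: absorbs any one-time connection setup cost,
      ! so that the timed calls below measure only steady-state communication
      CALL BROADCAST_BOUNDARY(A,NUM)

      CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
      TIC=MPI_WTIME()
      CALL BROADCAST_BOUNDARY(A,NUM)
      TOC=MPI_WTIME()
      MPI_WTIMES(1)=TOC-TIC

      CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
      TIC=MPI_WTIME()
      CALL BROADCAST_BOUNDARY(B,NUM)
      TOC=MPI_WTIME()
      MPI_WTIMES(2)=TOC-TIC

If mpi_wtimes(1) and mpi_wtimes(2) become comparable after the warm-up call, that would suggest the difference seen originally comes from first-use setup rather than from the subroutine itself.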

 

Attachment: PROGRAM.f90 (1.93 KB)

TCE Open Date: Sunday, March 22, 2020 - 15:03