SGI Techpubs Library


       intro_shmem - Introduction to the SHMEM programming model

       The  SHMEM(TM)  programming  model  consists  of library functions that
       provide low-latency, high-bandwidth communication  for  use  in  highly
       parallelized scalable programs. The functions in the SHMEM  application
       programming interface (API) provide a programming model for  exchanging
       data between cooperating parallel processes. The resulting programs are
       similar in style to Message Passing Interface (MPI) programs. The SHMEM
       API  can  be  used either alone or in combination with MPI functions in
       the same parallel program. SGI's SHMEM implementation is compliant with
       the OpenSHMEM(TM) 1.0 standard specification.

       A  SHMEM program is SPMD (single program, multiple data) in style.  The
       SHMEM processes, called processing elements or PEs, all  start  at  the
       same  time, and they all run the same program.  Usually the PEs perform
       computation  on  their  own  subdomains  of  the  larger  problem,  and
       periodically  communicate  with  other  PEs  to exchange information on
       which the next computation phase depends.

       The SHMEM functions minimize the overhead associated with data transfer
       requests,  maximize bandwidth, and minimize data latency.  Data latency
       is the period of time that starts when a PE  initiates  a  transfer  of
       data and ends when a PE can use the data.

       SHMEM  functions  support  remote data transfer through put operations,
       which transfer data to a different PE, get operations,  which  transfer
       data  from  a  different  PE,  and  remote pointers, which allow direct
       references to data objects  owned  by  another  PE.   Other  operations
       supported    are    collective   broadcast   and   reduction,   barrier
       synchronization,  and  atomic  memory  operations.   An  atomic  memory
       operation    is   an   atomic  read-and-update  operation,  such  as  a
       fetch-and-increment, on a remote or local data object.

   SHMEM Routines
        This section lists the significant SHMEM message-passing functions.

       *  PE queries:

            C/C++ only:
                      _num_pes(3I), _my_pe(3I)

            Fortran only:
                      NUM_PES(3I), MY_PE(3I)

       *  Elemental data put functions:

            C/C++ only:
                      shmem_double_p,       shmem_float_p,        shmem_int_p,
                      shmem_long_p, shmem_short_p

       *  Block data put functions:

            C/C++ and Fortran:
                      shmem_put32, shmem_put64, shmem_put128

            C/C++ only:
                      shmem_double_put,     shmem_float_put,    shmem_int_put,
                      shmem_long_put, shmem_short_put

            Fortran only:
                      shmem_complex_put, shmem_integer_put, shmem_logical_put,
                      shmem_real_put

       *  Elemental data get functions:

            C/C++ only:
                      shmem_double_g,        shmem_float_g,       shmem_int_g,
                      shmem_long_g, shmem_short_g

       *  Block data get functions:

            C/C++ and Fortran:
                      shmem_get32, shmem_get64, shmem_get128

            C/C++ only:
                      shmem_double_get,    shmem_float_get,     shmem_int_get,
                      shmem_long_get, shmem_short_get

            Fortran only:
                      shmem_complex_get, shmem_integer_get, shmem_logical_get,
                      shmem_real_get

       *  Strided put functions:

            C/C++ and Fortran:
                      shmem_iput32, shmem_iput64, shmem_iput128

            C/C++ only:
                      shmem_double_iput,   shmem_float_iput,   shmem_int_iput,
                      shmem_long_iput, shmem_short_iput

            Fortran only:
                      shmem_complex_iput,                  shmem_integer_iput,
                      shmem_logical_iput, shmem_real_iput

       *  Strided get functions:

            C/C++ and Fortran:
                      shmem_iget32, shmem_iget64, shmem_iget128

            C/C++ only:
                      shmem_double_iget,   shmem_float_iget,   shmem_int_iget,
                      shmem_long_iget, shmem_short_iget

            Fortran only:
                      shmem_complex_iget,                  shmem_integer_iget,
                      shmem_logical_iget, shmem_real_iget

       *  Point-to-point synchronization functions:

            C/C++ only:
                      shmem_int_wait,  shmem_int_wait_until,  shmem_long_wait,
                      shmem_long_wait_until,              shmem_longlong_wait,
                      shmem_longlong_wait_until,             shmem_short_wait,
                      shmem_short_wait_until

            Fortran:  shmem_int4_wait, shmem_int4_wait_until, shmem_int8_wait,
                      shmem_int8_wait_until

       *  Barrier synchronization functions:

            C/C++ and Fortran:
                      shmem_barrier_all, shmem_barrier

       *  Reduction functions:

            C/C++ only:
                      shmem_int_and_to_all,             shmem_long_and_to_all,
                      shmem_longlong_and_to_all,       shmem_short_and_to_all,
                      shmem_double_max_to_all,         shmem_float_max_to_all,
                      shmem_int_max_to_all,             shmem_long_max_to_all,
                      shmem_longdouble_max_to_all,  shmem_longlong_max_to_all,
                      shmem_short_max_to_all,         shmem_double_min_to_all,
                      shmem_float_min_to_all,            shmem_int_min_to_all,
                      shmem_long_min_to_all,      shmem_longdouble_min_to_all,
                      shmem_longlong_min_to_all,       shmem_short_min_to_all,
                      shmem_double_sum_to_all,         shmem_float_sum_to_all,
                      shmem_int_sum_to_all,             shmem_long_sum_to_all,
                      shmem_longdouble_sum_to_all,  shmem_longlong_sum_to_all,
                      shmem_short_sum_to_all,        shmem_double_prod_to_all,
                      shmem_float_prod_to_all,          shmem_int_prod_to_all,
                      shmem_long_prod_to_all,    shmem_longdouble_prod_to_all,
                      shmem_longlong_prod_to_all,     shmem_short_prod_to_all,
                      shmem_int_or_to_all,               shmem_long_or_to_all,
                      shmem_longlong_or_to_all,         shmem_short_or_to_all,
                      shmem_int_xor_to_all,             shmem_long_xor_to_all,
                      shmem_longlong_xor_to_all, shmem_short_xor_to_all

            Fortran only:
                      shmem_int4_and_to_all,            shmem_int8_and_to_all,
                      shmem_real4_max_to_all,          shmem_real8_max_to_all,
                      shmem_real16_max_to_all,          shmem_int4_max_to_all,
                      shmem_int8_max_to_all,           shmem_real4_min_to_all,
                      shmem_real8_min_to_all,         shmem_real16_min_to_all,
                      shmem_int4_min_to_all,            shmem_int8_min_to_all,
                      shmem_real4_sum_to_all,          shmem_real8_sum_to_all,
                      shmem_real16_sum_to_all,          shmem_int4_sum_to_all,
                      shmem_int8_sum_to_all,          shmem_real4_prod_to_all,
                      shmem_real8_prod_to_all,       shmem_real16_prod_to_all,
                      shmem_int4_prod_to_all,          shmem_int8_prod_to_all,
                      shmem_int4_or_to_all,              shmem_int8_or_to_all,
                      shmem_int4_xor_to_all, shmem_int8_xor_to_all

       *  Broadcast functions:

            C/C++ and Fortran:
                      shmem_broadcast32, shmem_broadcast64

       *  Generalized barrier synchronization function:

             C/C++ and Fortran:
                       shmem_barrier

       *  Cache management functions:

            C/C++ and Fortran:
                      shmem_udcflush, shmem_udcflush_line

        *  Byte-granularity block put and get functions:

            C/C++ and Fortran:
                      shmem_putmem and shmem_getmem

            Fortran only:
                      shmem_character_put and shmem_character_get

       *  Collect functions:

            C/C++ and Fortran:
                      shmem_collect32,    shmem_collect64,   shmem_fcollect32,
                      shmem_fcollect64

       *  Atomic memory fetch-and-operate (fetch-op) functions:

            C/C++ only:
                      shmem_double_swap,  shmem_float_swap,   shmem_int_cswap,
                      shmem_int_fadd,      shmem_int_finc,     shmem_int_swap,
                      shmem_long_cswap,   shmem_long_fadd,    shmem_long_finc,
                      shmem_long_swap,                   shmem_longlong_cswap,
                      shmem_longlong_fadd,                shmem_longlong_finc,
                      shmem_longlong_swap

            Fortran only:
                      shmem_int4_cswap,    shmem_int4_fadd,   shmem_int4_finc,
                      shmem_int4_swap,   shmem_int8_swap,    shmem_real4_swap,
                      shmem_real8_swap, shmem_int8_cswap

       *  Atomic memory operation functions:

            Fortran only:
                      shmem_int4_add, shmem_int4_inc

       *  Remote memory pointer function:

             C/C++ and Fortran:
                       shmem_ptr

       *  Accessibility query functions:

            C/C++ and Fortran:
                      shmem_pe_accessible, shmem_addr_accessible

   Symmetric Data Objects
       Consistent  with  the SPMD nature of the SHMEM programming model is the
       concept of symmetric data objects.  These are arrays or variables  that
       exist  with  the  same  size,  type,  and  relative address on all PEs.
       Another term for symmetric data objects is  "remotely  accessible  data
       objects."   In  the  interface  definitions  for  SHMEM  data  transfer
       functions, one or more of the parameters are typically required  to  be
       symmetric or remotely accessible.

       The following kinds of data objects are symmetric:

        *  Fortran  data  objects  in common blocks or with the SAVE attribute.
           These data objects must not be defined in a  dynamic  shared  object
           (DSO).

       *  Non-stack  C  and  C++  variables.   These  data objects must not be
          defined in a DSO.

       *  Fortran arrays allocated with shpalloc(3F)

       *  C and C++ data allocated by shmalloc(3C)

   Collective Routines
       Some   SHMEM   functions,   for   example,    shmem_broadcast(3)    and
       shmem_float_sum_to_all(3),   are  classified  as  collective  functions
       because they distribute work across a set of PEs.  They must be  called
       concurrently  by  all  PEs  in  the active set defined by the PE_start,
       logPE_stride,  PE_size  argument  triplet.   The  following  man  pages
       describe the SHMEM collective functions:

       *  shmem_and(3)

       *  shmem_barrier(3)

       *  shmem_broadcast(3)

       *  shmem_collect(3)

       *  shmem_max(3)

       *  shmem_min(3)

       *  shmem_or(3)

       *  shmem_prod(3)

       *  shmem_sum(3)

       *  shmem_xor(3)

   Using the Symmetric Work Array, pSync
       Multiple pSync arrays are often needed if a particular PE calls a SHMEM
       collective function twice without intervening barrier  synchronization.
       Problems would occur if some PEs in the active set for call 2 arrive at
       call 2 before processing of call 1 is complete by all PEs in the call 1
       active  set.   You  can  use shmem_barrier() or shmem_barrier_all(3) to
       perform a barrier synchronization between consecutive  calls  to  SHMEM
       collective functions.

       There are two special cases:

       *  The shmem_barrier(3) function allows the same pSync array to be used
          on consecutive calls as long as the active PE set does not change.

       *  If the same collective function is called multiple  times  with  the
          same  active  set, the calls may alternate between two pSync arrays.
          The SHMEM functions  guarantee  that  a  first  call  is  completely
          finished by all PEs by the time processing of a third call begins on
          any PE.

       Because the SHMEM functions restore pSync  to  its  original  contents,
       multiple  calls that use the same pSync array do not require that pSync
       be reinitialized after the first call.

   Environment Variables
        This section describes environment variables  that  control  the  SHMEM
        programming  environment.   Environment variables identified as toggles
       may be set on or off.  A  setting  of  "ON"  is  indicated  when  these
       environment  variables are set to any of the following case-insensitive
       values: "ON", "YES", "Y", or "1".  A setting of "OFF" is  indicated  by
       any of the following case-insensitive values: "OFF", "NO", "N", or "0".

   Symmetric Heap Related Environment Variables
       The default behavior of the symmetric heap  can be modified  using  the
       following environment variable:

        SMA_SYMMETRIC_SIZE
               Specifies  the  size, in bytes, of the symmetric heap memory per
               PE.

              Default: the total machine memory in bytes divided by the number
              of processors on the system.

   Debugging Related Environment Variables
       Several  environment  variables  are  available  to assist in debugging
       SHMEM applications:

       SMA_DEBUG (toggle)
              Prints out copious data at job startup and during job  execution
              about SHMEM internal operations.

              Default:  Not enabled

       SMA_MALLOC_DEBUG (toggle)
              Activates  debug  checking  of  the  symmetric  heap.  With this
              variable set, the symmetric heap is checked for consistency upon
              each  invocation  of a symmetric heap related function.  Setting
              this variable significantly increases  the  overhead  associated
              with symmetric heap management operations.

              Default:  Not enabled

              When  enabled,  the  symmetric  memory allocation functions will
              check that equal addresses are returned on all PEs.  The program
              terminates   with  error  message  if  different  addresses  are
              returned.   See  shmalloc(3)  for  more  details  about  how  to
               configure   your   system   for  equal  symmetric  heap  address
               allocation.

              Default:  Not enabled

       SMA_VERSION (toggle)
              Prints the libsma library release version.

              Default:  Not enabled

   Memory Placement Related Environment Variables
       The available MPI memory  placement  environment  variables  should  be
       used.  See MPI(1) and omplace(1) for more details.

   Installing SHMEM software
       The  SHMEM  software is packaged with the Message Passing Toolkit (MPT)
       software product. You can find installation  instructions  in  the  MPT
       release notes.  See the README.relnotes file, the pathname of which can
       be found by typing rpm -ql sgi-mpt | grep README.relnotes.

   Compiling SHMEM Programs
        The SHMEM library functions reside in the libsma  library.   SHMEM
        programs also require the MPI runtime library, libmpi or  libmpi_mt.
        In the following examples, the -lmpi option is specified,  but
        -lmpi_mt  could  be  used  in  its place if the MPI/SHMEM application
        uses the MPI_THREAD_MULTIPLE multithreading level.

        The following sample command lines compile programs that include  SHMEM
        functions:

            cc c_program.c -lsma -lmpi

            gfortran fortran_program.f -lsma -lmpi

            ifort fortran_program.f -lsma -lmpi

   Running SHMEM Programs
       The  SHMEM programming model is layered on MPI infrastructure. Programs
        are started with an mpirun or mpiexec_mpt command, as in the  following
        examples:

            mpirun -np 32 ./a.out

            mpiexec_mpt -np 32 ./a.out

            mpirun hostA, hostB -np 16 ./a.out

   Supported Systems
        The  SHMEM  API  is  supported  for  clusters connected via NUMAlink or
        InfiniBand interconnects.

       SHMEM support is layered on MPI infrastructure.  The MPI memory mapping
       feature,  which  is  enabled  by default, is required for SHMEM support
       within a host or between NUMAlink-connected hosts.   In  addition,  the
       xpmem kernel module must be installed.

       SHMEM programs are started with an mpirun command, which determines the
       number of processing elements (PEs) to launch.

   MPI Interoperability
       SHMEM functions can be used in conjunction  with  MPI  message  passing
       functions  in  the  same  application.   Programs that use both MPI and
       SHMEM functions should call MPI_Init and MPI_Finalize but omit the call
        to  the start_pes function.  SHMEM PE numbers are equal to the MPI rank
        within  the  MPI_COMM_WORLD  communicator.   Note   that   this
       precludes  use  of  SHMEM  functions  between  processes  in  different
       MPI_COMM_WORLDs.   MPI  processes  started  using  the   MPI_Comm_spawn
       function,  for  example, cannot use SHMEM functions to communicate with
       their parent MPI processes.

       In MPI jobs that use TCP/sockets for interhost communication,  you  can
       use  SHMEM  functions to communicate with processes running on the same
       host.  Use the shmem_pe_accessible function to determine if a remote PE
       is accessible via SHMEM communication from the local PE.

       When  running  an  MPI application involving multiple executable files,
       one can use SHMEM functions to communicate with processes running  from
       the same or different executable files, provided that the communication
       is limited to symmetric data objects.  It is  important  to  note  that
       static  memory, such as a Fortran common block or C global variable, is
       symmetric between processes running from the same executable file,  but
       is  not  symmetric  between processes running from different executable
       files.  Data allocated from the symmetric heap (shmalloc  or  shpalloc)
       is  symmetric  across  the same or different executable files.  Use the
       shmem_addr_accessible function to  determine  if  a  local  address  is
       accessible via SHMEM communication from a remote PE.

       Note  that  the  shmem_pe_accessible  function returns TRUE only if the
       remote PE is a process running from the same  executable  file  as  the
       local  PE,  indicating  that  full  SHMEM  support  (static  memory and
       symmetric heap) is available.

       When using SHMEM functions within an MPI program, one  should  use  the
       MPI  memory  placement  environment  variables  when  using non-default
       memory placement options.

   Thread Safety
        None of the SHMEM communication functions, including shmem_ptr, should
        be  considered  thread  safe.   When  used  in  a  multithreaded
        environment, the programmer should take steps to ensure  that  multiple
        threads  in  a  PE  cannot  simultaneously  invoke  SHMEM communication
        functions.

   Cache Coherency
       The SHMEM library was originally developed for systems that had limited
       cache coherency memory architectures.  On those architectures, it is at
       times necessary to handle cache coherency within the application.  This
       is  not  required  on SGI systems because cache coherency is handled by
       the hardware.

       The SHMEM cache management functions were retained for ease in  porting
       from these legacy platforms.  However, their use is no longer required.

       Note that cache coherency does not imply memory ordering,  particularly
       with  respect to put operations.  In cases in which the ordering of put
       operations is important,  one  must  use  either  the  memory  ordering
       functions  shmem_fence  or  shmem_quiet,  or one of the various barrier


   Examples
        Example 1.  The following Fortran SHMEM program directs all PEs to  sum
        simultaneously the numbers in the VALUES variable across all PEs:

              REAL VALUES, SUM
              COMMON /C/ VALUES
              REAL WORK
              CALL START_PES(0)
              VALUES = MY_PE()
              CALL SHMEM_BARRIER_ALL       ! Synchronize all PEs
              SUM = 0.0
              DO I = 0,NUM_PES()-1
                 CALL SHMEM_REAL_GET(WORK, VALUES, 1, I)   ! Get next value
                 SUM = SUM + WORK                          ! Sum it
              ENDDO
              PRINT*,'PE ',MY_PE(),' COMPUTED SUM=',SUM
              END

       Example  2.   The  following  C  SHMEM program transfers an array of 10
       longs from PE 0 to PE 1:

              #include <stdio.h>
              #include <mpp/shmem.h>

              int main(void)
              {
                  long source[10] = { 1, 2, 3, 4, 5,
                                      6, 7, 8, 9, 10 };
                  static long target[10];

                  start_pes(0);
                  if (_my_pe() == 0) {
                      /* put 10 elements into target on PE 1 */
                      shmem_long_put(target, source, 10, 1);
                  }
                  shmem_barrier_all();  /* sync sender and receiver */
                  if (_my_pe() == 1) {
                      printf("target[0] on PE %d is %ld\n", _my_pe(),
                             target[0]);
                  }
                  return 0;
              }

   See Also
        dplace(1), omplace(1)

       The following man pages also contain information  on  SHMEM  functions.
       See the specific man pages for implementation information.

       shmem_add(3),   shmem_and(3),  shmem_barrier(3),  shmem_barrier_all(3),
       shmem_broadcast(3), shmem_cache(3),  shmem_collect(3),  shmem_cswap(3),
       shmem_fadd(3),     shmem_fence(3),     shmem_finc(3),     shmem_get(3),
       shmem_iget(3),     shmem_inc(3),     shmem_iput(3),      shmem_lock(3),
       shmem_max(3), shmem_min(3), shmem_my_pe(3), shmem_or(3), shmem_prod(3),
        shmem_put(3),   shmem_quiet(3),   shmem_short_g(3),   shmem_short_p(3),
       shmem_sum(3),      shmem_swap(3),      shmem_wait(3),     shmem_xor(3),
       shmem_pe_accessible(3), shmem_addr_accessible(3), start_pes(3)



       MY_PE(3I), NUM_PES(3I)

       For  information  on  using  SHMEM  functions  with   message   passing
       functions, see the Message Passing Toolkit (MPT) User's Guide.

