SGI Techpubs Library


       intro_shmem - Introduction to the SHMEM programming model

       The  OpenSHMEM(TM) programming model consists of library functions that
       provide low-latency, high-bandwidth communication  for  use  in  highly
       parallelized   scalable   programs.  The  functions  in  the  OpenSHMEM
       application programming interface (API) provide a programming model for
       exchanging  data  between cooperating parallel processes. The resulting
       programs are similar  in  style  to  Message  Passing  Interface  (MPI)
       programs.  The OpenSHMEM API can be used either alone or in combination
       with MPI functions in the same parallel program.  SGI's SHMEM
       implementation is compliant with the OpenSHMEM(TM) 1.2 standard.

       A SHMEM program is SPMD (single program, multiple data) in style.   The
       SHMEM  processes,  called  processing elements or PEs, all start at the
       same time, and they all run the same program.  Usually the PEs  perform
       computation  on  their  own  subdomains  of  the  larger  problem,  and
       periodically communicate with other  PEs  to  exchange  information  on
       which the next computation phase depends.
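       The subdomain split described above can be sketched in C.  This is an
       illustrative sketch, not part of the SHMEM API.  The helper name
       block_range and the one-dimensional block decomposition are
       assumptions.  In a real SHMEM program, my_pe and n_pes would come from
       shmem_my_pe() and shmem_n_pes().  Here they are ordinary parameters, so
       the partitioning arithmetic can be checked without the SHMEM library.

```c
#include <stddef.h>

/* Hypothetical helper (not a SHMEM function): compute the half-open index
 * range [*lo, *hi) of a 1-D problem of n elements that PE my_pe owns when
 * the problem is split into contiguous blocks across n_pes PEs.  The first
 * (n % n_pes) PEs each receive one extra element, so block sizes differ by
 * at most one. */
void block_range(size_t n, int my_pe, int n_pes, size_t *lo, size_t *hi)
{
    size_t base  = n / (size_t)n_pes;   /* minimum elements per PE        */
    size_t extra = n % (size_t)n_pes;   /* PEs that get one extra element */
    size_t pe    = (size_t)my_pe;

    *lo = pe * base + (pe < extra ? pe : extra);
    *hi = *lo + base + (pe < extra ? 1 : 0);
}
```

       For example, with 10 elements and 3 PEs, PE 0 owns indices 0 through 3,
       PE 1 owns 4 through 6, and PE 2 owns 7 through 9.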

       The SHMEM functions minimize the overhead associated with data transfer
       requests, maximize bandwidth, and minimize data latency.  Data  latency
       is  the  period  of  time that starts when a PE initiates a transfer of
       data and ends when a PE can use the data.

       SHMEM functions support remote data transfer  through  put  operations,
       which  transfer  data to a different PE, get operations, which transfer
       data from a different PE,  and  remote  pointers,  which  allow  direct
       references  to  data  objects  owned  by  another PE.  Other operations
       supported   are   collective   broadcast   and    reduction,    barrier
       synchronization,  and  atomic  memory  operations.   An  atomic  memory
       operation   is  an  atomic  read-and-update  operation,   such   as   a
       fetch-and-increment, on a remote or local data object.

   SHMEM Routines
        This section lists the significant SHMEM message-passing functions.

       *  PE queries:

            C/C++ only:
                      shmem_n_pes(3I), shmem_my_pe(3I)

            Fortran only:
                      SHMEM_N_PES(3I), SHMEM_MY_PE(3I)

       *  Elemental data put functions:

            C/C++ only:
                      shmem_double_p,        shmem_float_p,       shmem_int_p,
                      shmem_long_p, shmem_short_p

       *  Block data put functions:

            C/C++ and Fortran:
                      shmem_put32, shmem_put64, shmem_put128

            C/C++ only:
                      shmem_double_put,    shmem_float_put,     shmem_int_put,
                      shmem_long_put, shmem_short_put

            Fortran only:
                      shmem_complex_put, shmem_integer_put, shmem_logical_put,

       *  Elemental data get functions:

            C/C++ only:
                      shmem_double_g,       shmem_float_g,        shmem_int_g,
                      shmem_long_g, shmem_short_g

       *  Block data get functions:

            C/C++ and Fortran:
                      shmem_get32, shmem_get64, shmem_get128

            C/C++ only:
                      shmem_double_get,     shmem_float_get,    shmem_int_get,
                      shmem_long_get, shmem_short_get

            Fortran only:
                      shmem_complex_get, shmem_integer_get, shmem_logical_get,

       *  Strided put functions:

            C/C++ and Fortran:
                      shmem_iput32, shmem_iput64, shmem_iput128

            C/C++ only:
                      shmem_double_iput,   shmem_float_iput,   shmem_int_iput,
                      shmem_long_iput, shmem_short_iput

            Fortran only:
                      shmem_complex_iput,                  shmem_integer_iput,
                      shmem_logical_iput, shmem_real_iput

       *  Strided get functions:

            C/C++ and Fortran:
                      shmem_iget32, shmem_iget64, shmem_iget128

            C/C++ only:
                      shmem_double_iget,   shmem_float_iget,   shmem_int_iget,
                      shmem_long_iget, shmem_short_iget

            Fortran only:
                      shmem_complex_iget,                  shmem_integer_iget,
                      shmem_logical_iget, shmem_real_iget

       *  Point-to-point synchronization functions:

            C/C++ only:
                      shmem_int_wait,  shmem_int_wait_until,  shmem_long_wait,
                      shmem_long_wait_until,              shmem_longlong_wait,
                      shmem_longlong_wait_until,             shmem_short_wait,

            Fortran:  shmem_int4_wait, shmem_int4_wait_until, shmem_int8_wait,

       *  Barrier synchronization functions:

            C/C++ and Fortran:
                      shmem_barrier_all, shmem_barrier

       *  Atomic memory fetch-and-operate (fetch-op) functions:

            C/C++ and Fortran:

       *  Reduction functions:

            C/C++ only:
                      shmem_int_and_to_all,             shmem_long_and_to_all,
                      shmem_longlong_and_to_all,       shmem_short_and_to_all,
                      shmem_double_max_to_all,         shmem_float_max_to_all,
                      shmem_int_max_to_all,             shmem_long_max_to_all,
                      shmem_longdouble_max_to_all,  shmem_longlong_max_to_all,
                      shmem_short_max_to_all,         shmem_double_min_to_all,
                      shmem_float_min_to_all,            shmem_int_min_to_all,
                      shmem_long_min_to_all,      shmem_longdouble_min_to_all,
                      shmem_longlong_min_to_all,       shmem_short_min_to_all,
                      shmem_double_sum_to_all,         shmem_float_sum_to_all,
                      shmem_int_sum_to_all,             shmem_long_sum_to_all,
                      shmem_longdouble_sum_to_all   shmem_longlong_sum_to_all,
                      shmem_short_sum_to_all,        shmem_double_prod_to_all,
                      shmem_float_prod_to_all,          shmem_int_prod_to_all,
                      shmem_long_prod_to_all,    shmem_longdouble_prod_to_all,
                      shmem_longlong_prod_to_all,     shmem_short_prod_to_all,
                      shmem_int_or_to_all,               shmem_long_or_to_all,
                      shmem_longlong_or_to_all,         shmem_short_or_to_all,
                      shmem_int_xor_to_all,             shmem_long_xor_to_all,
                      shmem_longlong_xor_to_all, shmem_short_xor_to_all

            Fortran only:
                      shmem_int4_and_to_all,            shmem_int8_and_to_all,
                      shmem_real4_max_to_all,          shmem_real8_max_to_all,
                      shmem_real16_max_to_all,          shmem_int4_max_to_all,
                      shmem_int8_max_to_all,           shmem_real4_min_to_all,
                      shmem_real8_min_to_all,         shmem_real16_min_to_all,
                      shmem_int4_min_to_all,            shmem_int8_min_to_all,
                      shmem_real4_sum_to_all,          shmem_real8_sum_to_all,
                      shmem_real16_sum_to_all,          shmem_int4_sum_to_all,
                      shmem_int8_sum_to_all,          shmem_real4_prod_to_all,
                      shmem_real8_prod_to_all,       shmem_real16_prod_to_all,
                      shmem_int4_prod_to_all,          shmem_int8_prod_to_all,
                      shmem_int4_or_to_all,              shmem_int8_or_to_all,
                      shmem_int4_xor_to_all, shmem_int8_xor_to_all

       *  Broadcast functions:

            C/C++ and Fortran:
                      shmem_broadcast32, shmem_broadcast64

       *  Generalized barrier synchronization function:

            C/C++ and Fortran:

       *  Byte-granularity block put functions:

            C/C++ and Fortran:
                      shmem_putmem and shmem_getmem

            Fortran only:
                      shmem_character_put and shmem_character_get

       *  Collect functions:

            C/C++ and Fortran:
                      shmem_collect32,   shmem_collect64,    shmem_fcollect32,

       *  Atomic memory fetch-and-operate (fetch-op) functions:

            C/C++ only:
                      shmem_double_swap,   shmem_float_swap,  shmem_int_cswap,
                      shmem_int_fadd,     shmem_int_finc,      shmem_int_swap,
                      shmem_long_cswap,    shmem_long_fadd,   shmem_long_finc,
                      shmem_long_swap,                   shmem_longlong_cswap,
                      shmem_longlong_fadd,                shmem_longlong_finc,

            Fortran only:
                      shmem_int4_cswap,   shmem_int4_fadd,    shmem_int4_finc,
                      shmem_int4_swap,    shmem_int8_swap,   shmem_real4_swap,
                      shmem_real8_swap, shmem_int8_cswap

       *  Atomic memory operation functions:

            Fortran only:
                      shmem_int4_add, shmem_int4_inc

       *  Remote memory pointer function:

            C/C++ and Fortran:

       *  Accessibility query functions:

            C/C++ and Fortran:
                      shmem_pe_accessible, shmem_addr_accessible

   Symmetric Data Objects
       Consistent with the SPMD nature of the SHMEM programming model  is  the
       concept  of symmetric data objects.  These are arrays or variables that
       exist with the same size and  types  on  all  PEs.   Another  term  for
       symmetric  data  objects is "remotely accessible data objects."  In the
       interface definitions for SHMEM data transfer functions, one or more of
       the parameters are typically required to be symmetric or remotely
       accessible.

       The following kinds of data objects are symmetric:

       *  Fortran data objects in common blocks or with  the  SAVE  attribute.
          These data objects must not be defined in a dynamic shared object
          (DSO).

       *  Non-stack C and C++ variables.   These  data  objects  must  not  be
          defined in a DSO.

       *  Fortran arrays allocated with shpalloc(3F).

       *  C and C++ data allocated by shmalloc(3C).

   Collective Routines
       Some    SHMEM    functions,   for   example,   shmem_broadcast(3)   and
       shmem_float_sum_to_all(3),  are  classified  as  collective   functions
       because  they distribute work across a set of PEs.  They must be called
       concurrently by all PEs in the active  set  defined  by  the  PE_start,
       logPE_stride,  PE_size  argument  triplet.   The  following  man  pages
       describe the SHMEM collective functions:

       *  shmem_and(3)

       *  shmem_barrier(3)

       *  shmem_broadcast(3)

       *  shmem_collect(3)

       *  shmem_max(3)

       *  shmem_min(3)

       *  shmem_or(3)

       *  shmem_prod(3)

       *  shmem_sum(3)

       *  shmem_xor(3)
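       The active-set triplet can be made concrete with a small C sketch.  In
       OpenSHMEM, the active set consists of PE_size PEs numbered PE_start,
       PE_start + 2^logPE_stride, PE_start + 2*2^logPE_stride, and so on.  The
       helpers active_set_pe and in_active_set are hypothetical, not SHMEM
       functions, and they do not call the SHMEM library.  They only compute
       which PEs a given triplet selects.

```c
#include <stdbool.h>

/* Hypothetical helper: PE number of the i-th member (0-based) of the active
 * set described by (PE_start, logPE_stride, PE_size).  Consecutive members
 * are 2^logPE_stride apart. */
int active_set_pe(int PE_start, int logPE_stride, int i)
{
    return PE_start + i * (1 << logPE_stride);
}

/* Hypothetical helper: true if PE `pe` belongs to the active set. */
bool in_active_set(int pe, int PE_start, int logPE_stride, int PE_size)
{
    int stride = 1 << logPE_stride;

    if (pe < PE_start || (pe - PE_start) % stride != 0)
        return false;                       /* not on the stride grid */
    return (pe - PE_start) / stride < PE_size;
}
```

       For example, PE_start = 2, logPE_stride = 1, and PE_size = 4 select
       PEs 2, 4, 6, and 8.  Only those PEs make the collective call.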

   Using the Symmetric Work Array, pSync
       Collective functions other than shmem_barrier_all take a symmetric work
       array, pSync, that the library uses for internal synchronization.
       Multiple pSync arrays are often needed if a particular PE calls a SHMEM
       collective function twice without intervening barrier synchronization.
       Problems can occur if some PEs in the active set for call 2 arrive at
       call 2 before all PEs in the call 1 active set have finished processing
       call 1.  You can use shmem_barrier(3) or shmem_barrier_all(3) to
       perform a barrier synchronization between consecutive calls to SHMEM
       collective functions.

       There are two special cases:

       *  The shmem_barrier(3) function allows the same pSync array to be used
          on consecutive calls as long as the active PE set does not change.

       *  If  the  same  collective function is called multiple times with the
          same active set, the calls may alternate between two  pSync  arrays.
          The  SHMEM  functions  guarantee  that  a  first  call is completely
          finished by all PEs by the time processing of a third call begins on
          any PE.

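       Because the SHMEM functions restore pSync to its original contents,
       multiple calls that use the same pSync array do not require that pSync
       be reinitialized after the first call.  Before that first call, every
       element of pSync must be set to SHMEM_SYNC_VALUE on every PE in the
       active set.  All of those PEs must also have completed the
       initialization (for example, by calling a barrier) before any of them
       enters the collective function.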

   Environment Variables
       This section describes environment variables that control the SHMEM
       programming environment.  Environment variables identified  as  toggles
       may  be  set  on  or  off.   A  setting of "ON" is indicated when these
       environment variables are set to any of the following  case-insensitive
       values:  "ON",  "YES", "Y", or "1".  A setting of "OFF" is indicated by
       any of the following case-insensitive values: "OFF", "NO", "N", or "0".
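       The toggle rules above can be expressed as a small C sketch.  The
       helpers toggle_value and env_toggle are hypothetical, not libsma
       functions.  Returning a caller-supplied default for unset or
       unrecognized values is an assumption; this page does not say how libsma
       treats such values.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality (avoids relying on strcasecmp). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Hypothetical helper: classify a toggle value.  Returns 1 for "ON", "YES",
 * "Y", or "1"; 0 for "OFF", "NO", "N", or "0" (case-insensitive); and dflt
 * for NULL (variable unset) or any other value (an assumed policy). */
int toggle_value(const char *v, int dflt)
{
    static const char *const on[]  = { "ON",  "YES", "Y", "1" };
    static const char *const off[] = { "OFF", "NO",  "N", "0" };
    size_t i;

    if (v == NULL)
        return dflt;
    for (i = 0; i < sizeof on / sizeof on[0]; i++)
        if (equal_nocase(v, on[i]))
            return 1;
    for (i = 0; i < sizeof off / sizeof off[0]; i++)
        if (equal_nocase(v, off[i]))
            return 0;
    return dflt;
}

/* Hypothetical helper: read a toggle from the environment,
 * e.g. env_toggle("SMA_DEBUG", 0). */
int env_toggle(const char *name, int dflt)
{
    return toggle_value(getenv(name), dflt);
}
```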

   Symmetric Heap Related Environment Variables
       The  default  behavior of the symmetric heap  can be modified using the
       following environment variable:

              Specifies the size, in bytes, of the symmetric heap memory per
              PE.

              Default: the total machine memory in bytes divided by the number
              of processors on the system.
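              The documented default (total machine memory divided by the
              number of processors) can be approximated in C with sysconf(3).
              This is a sketch for estimating the value on a given node, not
              libsma's own code.  The helper name default_heap_per_pe is
              hypothetical.  _SC_PHYS_PAGES is a widely supported extension
              (present on Linux) rather than a strict POSIX requirement, and
              counting configured rather than online processors is an
              assumption.

```c
#include <unistd.h>

/* Hypothetical helper: estimate the default symmetric heap size per PE as
 * (physical pages * page size) / configured processors.  Returns 0 if any
 * sysconf query fails. */
unsigned long long default_heap_per_pe(void)
{
    long pages     = sysconf(_SC_PHYS_PAGES);        /* total RAM pages */
    long page_size = sysconf(_SC_PAGE_SIZE);         /* bytes per page  */
    long ncpus     = sysconf(_SC_NPROCESSORS_CONF);  /* processors      */

    if (pages <= 0 || page_size <= 0 || ncpus <= 0)
        return 0;
    return (unsigned long long)pages * (unsigned long long)page_size
           / (unsigned long long)ncpus;
}
```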

   Debugging Related Environment Variables
       Several environment variables are  available  to  assist  in  debugging
       SHMEM applications:

       SMA_DEBUG (toggle)
              Prints  out copious data at job startup and during job execution
              about SHMEM internal operations.

              Default:  Not enabled

       SMA_MALLOC_DEBUG (toggle)
              Activates debug checking  of  the  symmetric  heap.   With  this
              variable set, the symmetric heap is checked for consistency upon
              each invocation of a symmetric heap related  function.   Setting
              this  variable  significantly  increases the overhead associated
              with symmetric heap management operations.

              Default:  Not enabled

              When enabled, the symmetric memory allocation functions check
              that equal addresses are returned on all PEs.  The program
              terminates with an error message if different addresses are
              returned.  See shmalloc(3) for more details about how to
              configure your system for equal symmetric heap addresses.
              Default:  Not enabled

       SMA_VERSION (toggle)
              Prints the libsma library release version.

              Default:  Not enabled

   Memory Placement Related Environment Variables
       Use the MPI memory placement environment variables to control memory
       placement.  See MPI(1) and omplace(1) for more details.

   Installing SGI SHMEM software
       The SGI SHMEM software is packaged with  the  Message  Passing  Toolkit
       (MPT)  software  product. You can find installation instructions in the
       MPT release notes.  See the README.relnotes file, the pathname of which
       can be found by typing rpm -ql sgi-mpt | grep README.relnotes.

   Compiling SHMEM Programs
       The SHMEM library functions reside in the libsma library, which is
       linked with the -lsma option.  SHMEM programs also require the MPI
       runtime library, linked with -lmpi or -lmpi_mt.  In the following
       examples, the -lmpi option is specified, but -lmpi_mt could be used in
       its place if the MPI/SHMEM application uses the MPI_THREAD_MULTIPLE
       multithreading level.

       The following sample command lines compile programs that include SHMEM
       functions:

            oshcc c_program.c

            cc c_program.c -lsma -lmpi

            oshfort fortran_program.f

            gfortran fortran_program.f -lsma -lmpi

            ifort fortran_program.f -lsma -lmpi

   Running SHMEM Programs
       The SHMEM programming model is layered on MPI infrastructure.  Programs
       are  started  with  an oshrun, mpirun or mpiexec_mpt command, as in the
       following examples:

            oshrun -np 32 ./a.out

            mpirun -np 32 ./a.out

            mpiexec_mpt -np 32 ./a.out

            mpirun hostA, hostB -np 16 ./a.out

   Supported Systems
       The SHMEM API is supported  for  clusters  connected  via  NUMAlink  or

       SHMEM  support  is  layered  on SGI MPT infrastructure.  The MPT memory
       mapping feature, which is enabled by default,  is  required  for  SHMEM
       support   within  a  host  or  between  NUMAlink-connected  hosts.   In
       addition, the xpmem kernel module must be installed.

       SHMEM programs are started with an oshrun, mpirun, or mpiexec_mpt
       command, which determines the number of processing elements (PEs) to
       launch.

   MPI Interoperability
       SHMEM  functions  can  be  used in conjunction with MPI message passing
       functions in the same application.  Programs that use both MPI and
       SHMEM functions should still call shmem_init and shmem_finalize.
       SHMEM PE numbers are equal to the MPI ranks within the MPI_COMM_WORLD
       communicator.  Note that this precludes use of SHMEM functions
       between processes in different MPI_COMM_WORLDs.  MPI processes  started
       using  the  MPI_Comm_spawn  function,  for  example,  cannot  use SHMEM
       functions to communicate with their parent MPI processes.

       In MPI jobs that use TCP/sockets for interhost communication,  you  can
       use  SHMEM  functions to communicate with processes running on the same
       host.  Use the shmem_pe_accessible function to determine if a remote PE
       is accessible via SHMEM communication from the local PE.

       When  running  an  MPI application involving multiple executable files,
       one can use SHMEM functions to communicate with processes running  from
       the same or different executable files, provided that the communication
       is limited to symmetric data objects.  It is  important  to  note  that
       static  memory, such as a Fortran common block or C global variable, is
       symmetric between processes running from the same executable file,  but
       is  not  symmetric  between processes running from different executable
       files.  Data allocated from the symmetric heap (shmalloc  or  shpalloc)
       is  symmetric  across  the same or different executable files.  Use the
       shmem_addr_accessible function to  determine  if  a  local  address  is
       accessible via SHMEM communication from a remote PE.

       Note  that  the  shmem_pe_accessible  function returns TRUE only if the
       remote PE is a process running from the same  executable  file  as  the
       local  PE,  indicating  that  full  SHMEM  support  (static  memory and
       symmetric heap) is available.

       When using SHMEM functions within an MPI program, one  should  use  the
       MPI  memory  placement  environment  variables  when  using non-default
       memory placement options.

   Thread Safety
       None of the SHMEM communication functions, including shmem_ptr, should
       be considered thread safe.  When SHMEM is used in a multithreaded
       environment, the programmer should take steps to ensure that multiple
       threads in a PE cannot simultaneously invoke SHMEM communication
       functions.
   Cache Coherency
       The SHMEM library was originally developed for systems that had limited
       cache coherency memory architectures.  On those architectures, it is at
       times necessary to handle cache coherency within the application.  This
       is  not  required  on SGI systems because cache coherency is handled by
       the hardware.

       The SHMEM cache management functions were retained for ease in  porting
       from these legacy platforms.  However, their use is no longer required.

       Note that cache coherency does not imply memory ordering, particularly
       with respect to put operations.  When the ordering of put operations is
       important, use either the memory ordering functions shmem_fence or
       shmem_quiet, or one of the barrier functions.

   Examples
       Example 1.  In the following Fortran SHMEM program, every PE computes,
       at the same time, the sum of the VALUES variable across all PEs:

             REAL VALUES, SUM
             COMMON /C/ VALUES                 ! Common block: symmetric
             REAL WORK
             INTEGER I
             INTEGER SHMEM_MY_PE, SHMEM_N_PES  ! Integer-valued functions
             CALL SHMEM_INIT()
             VALUES = SHMEM_MY_PE()
             CALL SHMEM_BARRIER_ALL       ! Synchronize all PEs
             SUM = 0.0
             DO I = 0,SHMEM_N_PES()-1
                CALL SHMEM_REAL_GET(WORK, VALUES, 1, I)   ! Get next value
                SUM = SUM + WORK                          ! Sum it
             END DO
             PRINT*,'PE ',SHMEM_MY_PE(),' COMPUTED SUM=',SUM
             CALL SHMEM_FINALIZE()
             END

       Example 2.  The following C SHMEM program transfers an array of 10
       longs from PE 0 to PE 1.  Run it with at least two PEs:

             #include <stdio.h>
             #include <shmem.h>

             int main(void)
             {
                 long source[10] = { 1, 2, 3, 4, 5,
                                     6, 7, 8, 9, 10 };
                 static long target[10];    /* static, hence symmetric */

                 shmem_init();
                 if (shmem_my_pe() == 0) {
                     /* put 10 elements into target on PE 1 */
                     shmem_long_put(target, source, 10, 1);
                 }
                 shmem_barrier_all();  /* sync sender and receiver */
                 if (shmem_my_pe() == 1) {
                     printf("target[0] on PE %d is %ld\n", shmem_my_pe(),
                            target[0]);
                 }
                 shmem_finalize();
                 return 0;
             }

   See Also
       dplace(1), omplace(1)

       The following man pages also contain information  on  SHMEM  functions.
       See the specific man pages for implementation information.

       shmem_add(3),   shmem_and(3),  shmem_barrier(3),  shmem_barrier_all(3),
       shmem_broadcast(3), shmem_cache(3),  shmem_collect(3),  shmem_cswap(3),
       shmem_fadd(3),     shmem_fence(3),     shmem_finc(3),     shmem_get(3),
       shmem_iget(3),     shmem_inc(3),     shmem_iput(3),      shmem_lock(3),
       shmem_max(3), shmem_min(3), shmem_my_pe(3), shmem_or(3), shmem_prod(3),
       shmem_put(3), shmem_quiet(3), shmem_short_g(3), shmem_short_p(3),
       shmem_sum(3),      shmem_swap(3),      shmem_wait(3),     shmem_xor(3),
       shmem_pe_accessible(3),    shmem_addr_accessible(3),     shmem_init(3),



       MY_PE(3I), NUM_PES(3I)

       For   information   on  using  SHMEM  functions  with  message  passing
       functions, see the Message Passing Toolkit (MPT) User's Guide.



© 2009 - 2015 Silicon Graphics International Corp. All Rights Reserved.