Latest DAS

Posted by Carlos Cruz Jun 29, 2010

A few months ago Jules published the steps required to set up and run the GEOS5-DAS. Additional edits (provided at the end of that post) to a handful of files were also needed for the DAS to run without problems. That setup also used 1/2-degree-resolution background files, which made portability experiments inefficient. Fortunately, the latest DAS tag, GEOSadas-5_5_2, is the easiest tag I have had to build and run in a long time:

 

cvs co -r GEOSadas-5_5_2 GEOSadas
cd GEOSadas
source g5_modules
make install
cd ../Linux/bin
fvsetup

 

The greatest advance over earlier tags is that fvsetup now contains defaults that actually work, and the background files used are at 2-degree resolution, which makes portability experiments feasible. Note that the default settings may not be optimal, but they will allow you to obtain the observational and background files. (Getting the latter is perhaps the biggest challenge in setting up the DAS.)


According to Dan Kokron, these are the settings that work (in the run.script for GEOS-5). Bumping DAPL_ACK_TIMER to 22 and enabling RNDV_WRITE were necessary.

 

setenv I_MPI_USE_DYNAMIC_CONNECTIONS 0
setenv I_MPI_JOB_STARTUP_TIMEOUT 10000
setenv DAT_OVERRIDE /usr/local/dapl/1.2.12/etc/dat.conf
setenv I_MPI_DEVICE rdssm:OpenIB-mlx4_0-1
setenv DAPL_ACK_RETRY 7
setenv DAPL_ACK_TIMER 22
setenv DAPL_RNR_RETRY 7
setenv DAPL_RNR_TIMER 28
setenv I_MPI_RDMA_RNDV_WRITE 1

 

UPDATE (7May10): These settings are now partially obsolete. Please see Using IntelMPI on Discover.


Parallel GEOS5 output

Posted by Carlos Cruz Dec 23, 2008

Recently we modified GEOS5 to produce parallel output of its HISTORY data streams (collections). We tested this implementation by running the GCM on DISCOVER at 1/2-degree resolution (540x361) and writing out the MERRA HISTORY data stream (using only Woodcrest chips). For a one-day simulation we find a significant increase in performance as the number of CPUs increases (see attached).


Intel MPI settings for GEOS5

Posted by Carlos Cruz Nov 22, 2008

These are potential edits (including Bill Putman's) to the GEOS5 makefiles for building with Intel MPI. I started with the Config files tagged under ams-GEOSdas-2_1_6-3 (revision 1.167). So far I have been able to build and run both gsi.x and GEOSgcm.x separately for 1-day simulations.

 

ESMA_arch.mk, under the Linux section:

 

old     MPFLAG  := -mp

---

new     MPFLAG  :=

 

old     FPE = -fpe0

---

new     FPE = -fp-model strict

 

old        FOPT = $(FOPT3)

---

new        FOPT = $(FOPT3) -w -vec-report0 -ftz -align all -fno-alias $(FPE)

 

old     LIB_SYS := -ldl -lc -lpthread -lrt

---

new     LIB_SYS := -ldl -lc -lpthread -lrt -lcprts -lunwind

 

old     LIB_SYS = -L$(GCC_DIR) -lstdc++

---

new     LIB_SYS = -L$(GCC_DIR) -lstdc++ -L$(MKLPATH) -lguide -lmkl_lapack -lmkl -lpthread

 

Under GMAO_shared:

 

GMAO_mpeu/mpi0/GNUmakefile

 

old FOPT = $(FOPT3)

---

new #FOPT = $(FOPT3)

 

GMAO_ods/GNUmakefile

 

old FOPT = $(FOPT3)

---

new #FOPT = $(FOPT3)

 

GMAO_pilgrim/GMAO_pilgrim_arch.mk

 

old FOPT = -O2

---

new #FOPT = -O2

 

Under GEOSgcs_GridComp:

 

GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSchem_GridComp/GOCART_GridComp/O3_GridComp/GNUmakefile

 

old FOPT = $(FOPT3)

---

new #FOPT = $(FOPT3)

 

GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSchem_GridComp/StratChem_GridComp/SC_GridComp/GNUmakefile

 

old FOPT = $(FOPT3)

---

new #FOPT = $(FOPT3)

 

GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSsuperdyn_GridComp/FVdycore_GridComp/FVdycore_arch.mk

 

old         USER_FFLAGS = -mp -stack_temps -fno-alias -ftz -auto

---

new #        USER_FFLAGS = -mp -stack_temps -fno-alias -ftz -auto

 

-


Note: these edits are experimental. We have found that both the GCM and GSI do run (i.e., no crashes) over short simulation times (1 day). For both the GCM and GSI I ran simulations on 8 and 16 nodes (32 and 64 CPUs) at MERRA resolution (1/2 degree), with executables compiled with gcc and icc.


Ticket Summary

Posted by swartzbr Oct 23, 2008

Here is a list of the current outstanding GEOSDAS tickets:

 

- 19827  Cannot run quarter degree GCM on 480 cores.

 

The error message is "OpenIB-cma: could not register DAPL memory region for receive buffer: DAT_INSUFFICIENT_RESOURCES()".

Testing on 10/20 showed that the PBS head node had different limit settings than the other (non-head) nodes. The sysadmins tested a fix and put it in place on 10/22. Dan indicates the same error still occurs.
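The ticket does not say which limit differed; if it was, for example, the locked-memory limit (a common cause of DAPL memory-registration failures), a quick csh sketch for comparing it across a job's nodes might look like this (assumes password-less ssh between compute nodes):

# Sketch only: print the locked-memory limit on every node of a PBS job
foreach node ( `sort -u $PBS_NODEFILE` )
  echo "${node}:"
  ssh $node "sh -c 'ulimit -l'"
end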

 

 

- 19784  Are PMI extensions available on discover?

 

The documentation is somewhat lacking, but the short answer is that we do not explicitly touch either of the two PMI related options listed in the reference manual. If you load the Intel MPI module and then do a:

printenv|grep I_MPI

you should be able to see all of the intel MPI specific variables that we have set.

 

 

- 19741  We see routine application (gsi.x) failures on discover when using Intel MPI in GEOSdas. The error message is:

"unexpected DAPL event 4008"

 

Dan supplied the binary and code to reproduce this problem. However, Applications ran gsi.x 8 times in a row on 10/23 without reproducing the error. It may have been fixed by the system changes made on 10/22.

 

 

- 19738  Would like to run GEOS-5 DAS without the limitation of "scali=true", especially gsi.

 

Dan is working on this.

 

 

- 19090  Does the mpirun command that comes with Intel MPI recognize/use the PBS_NODEFILE environment variable when it figures out which nodes to distribute processes to?

 

Intel MPI has two node files: one to start the daemons and the other for mpirun. Instead of altering PBS_NODEFILE, you could create a machine file (the same as the GCM_list file) and run the job using the -machinefile option of the mpirun command.
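For example, a minimal sketch (file and executable names are hypothetical) of building a machine file from the PBS node list and passing it to mpirun:

# Sketch only: derive a machine file from PBS and hand it to mpirun
sort -u $PBS_NODEFILE > machines.txt          # or reuse your GCM_list file
mpirun -machinefile machines.txt -np 64 ./GEOSgcm.x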

 

 

- 18718  mpitune app gave abnormal completion without indicating why.

 

It is not clear that mpitune should be executed by anyone other than the installer of Intel MPI.

 

 

- 18072  Intel MPI benchmark (IMB_2.3) on discover failed with SIGSEGV during a 600 process run.

 

This was an attempt to reproduce a GEOS error, which has since been fixed by rewriting the I/O to use binary I/O, so this is now a lower priority.


In compiling-geos5-with-intel-mpi I suggested a way to build GEOSgcm.x using the Intel compiler version 10.1.017 as well as Intel MPI. Unfortunately, such a build produces runtime errors such as:

 

libc.so.6          00002B814D08C154  Unknown  Unknown  Unknown
GEOSgcm.x          00000000004065E9  Unknown  Unknown  Unknown
forrtl: error (65): floating invalid

 

 

I believe this is due to linking with an incompatible libc.so (probably 32-bit vs. 64-bit), but I am not certain, and the issue was not pursued.

 

Instead, Bill Putman pointed out that such problems can be avoided by not using mpiifort (or mpiicc and mpiicpc) and using ifort instead (the difference between ifort and mpiifort is that the latter explicitly wraps ifort with MPI). However, Bill's changes excluded the flags (-fpe0 -fp-model strict) that allow users to detect invalid floating-point operations (overflows and underflows). These are important and should be included if at all possible. Furthermore, these edits to ESMA_arch.mk also excluded system libraries, such as LAPACK, that are needed to build executables used in the Data Assimilation System (DAS).

 

On a related note, Dan Kokron from GMAO has pointed out the following:

 

 

-


I am seeing routine application (gsi.x) failures on discover when using Intel MPI in GEOSdas. The error message that I get is:

 

"unexpected DAPL event 4008"

 

A quick search landed me at the following URLs, which might help resolve this issue:

 

http://www.mail-archive.com/general@lists.openfabrics.org/msg11812.html

 

http://lists.openfabrics.org/pipermail/general/2007-February/033607.html

 

I have attempted to resolve these failures by doing the following.

 

1) adding "-genv I_MPI_RDMA_RNDV_WRITE" to my mpirun command line

2) adding set DAPL_CM_ROUTE_TIMEOUT_MS=20000 to my run script

3) running on half populated harpertown nodes (perhost=4)

 

My compile and runtime environment is:

1) comp/intel-9.1.052   2) lib/mkl-9.1.023   3) mpi/impi-3.1.038

----- Dan Kokron

One important thing to note is that Dan has used the Intel compiler version 9.1.052.

 

We believe we have come up with a new build for the DAS (which includes GEOSgcm.x and gsi.x) that addresses some of the issues described above and also uses the newer Intel compiler. The build is simply an extension of Bill's edits with the additional "floating point exception" flags and the inclusion of the LAPACK libraries.

 

The modeling environment is:

 



echo $BASEDIR
/usr/local/other/baselibs/ESMF222rp3_NetCDF362b6_10.1.017_intelmpi
module list
Currently Loaded Modulefiles:
  1) comp/intel-10.1.017   3) mpi/impi-3.1.038
  2) lib/mkl-10.0.3.020    4) tool/tview-8.2.0.1

 

 

In ESMA_arch.mk, the additional modifications (under the Linux, ifort section) are:

 



    FPE = -fp-model strict -fpe0
    LIB_SYS = -L/usr/lib64 -ldl -lc -lpthread -lrt -lcprts -lunwind
    LIB_SYS += -lguide -lmkl_lapack -lmkl -lpthread

 

 

I have successfully run standalone 1-day simulations of GEOSgcm.x and gsi.x, and hopefully other users can adopt these settings for their GCM and DAS runs.

 

---

Carlos


One can build GEOS5 using various "modeling environments" (ME) 1. A common one is:

 

 BASEDIR: /usr/local/other/baselibs/ESMF220rp2_NetCDF362b6_9.1.052
 LD_LIBRARY_PATH: /usr/local/toolworks/totalview.8.2.0-1/lib:/opt/scali/lib64:/usr/local/intel/mkl/9.1.023/lib/em64t:/usr/local/intel/comp/9.1.052/lib:/usr/local/other/baselibs/ESMF220rp2_NetCDF362b6_9.1.052/Linux/lib
Currently Loaded Modulefiles:
  1) comp/intel-9.1.052   3) mpi/scali-5
  2) lib/mkl-9.1.023      4) tool/tview-8.2.0.1

 

This ME, however, uses a buggy HDF library that saves the UNLIMITED dimension (time) of HDF files as size+1 when size=4. This problem is being investigated, and the HDF software group will be contacted to report the bug.

 

Another ME that avoids the aforementioned bug uses an older build of the baselibs:

 

 BASEDIR: /usr/local/other_old/baselibs/v2_2r2_9.1.042.meta_new
 LD_LIBRARY_PATH: /usr/local/toolworks/totalview.8.2.0-1/lib:/opt/scali/lib64:/usr/local/intel/mkl/9.1.023/lib/em64t:/usr/local/intel/comp/9.1.042/lib:/usr/local/other_old/baselibs/v2_2r2_9.1.042.meta_new/Linux/lib
Currently Loaded Modulefiles:
  1) comp/intel-9.1.042   3) mpi/scali-5
  2) lib/mkl-9.1.023      4) tool/tview-8.2.0.1

 

Note that both MEs use Scali MPI and ESMF version 2.2.0rp2. There are, however, advantages in moving to other MEs, one of which uses Intel MPI (see the High End Computing FAQ for more information). For example:

 

 BASEDIR: /usr/local/other/baselibs/ESMF222rp3_NetCDF362b6_10.1.017_intelmpi
 LD_LIBRARY_PATH: /usr/local/toolworks/totalview.8.2.0-1/lib:/usr/local/intel/mpi/3.1.038/lib64:/usr/local/intel/mkl/10.0.3.020/lib/em64t:/usr/local/intel/comp/10.1.017/lib:/usr/local/other/baselibs/ESMF222rp3_NetCDF362b6_10.1.017_intelmpi/Linux/lib
Currently Loaded Modulefiles:
  1) comp/intel-10.1.017   3) mpi/impi-3.1.038
  2) lib/mkl-10.0.3.020    4) tool/tview-8.2.0.1

 

This ME also uses a more recent version of the Intel compiler but, more importantly for GEOS5 users, it uses ESMF version 2.2.2rp3. This last detail may cause some problems when compiling MAPL. At least for the MERRA tag, the only way to avoid compilation problems in MAPL is to use the following flags:

 

 gmake install XFLAGS="-DESMF_2_2_2_or_newer" ESMA_FC=mpiifort

 

The ESMA_FC=mpiifort is necessary because ESMA_arch.mk does not have an option to correctly set the Fortran compiler for comp/intel-10.1.017 (it assumes ESMA_FC=ifort).

 

-


1 To learn more about "useful" MEs, see Change modeling environment on DISCOVER.


FVdycore_wrapper timings

Posted by Carlos Cruz Oct 13, 2008

These are timings (in seconds on the root processor) for the FVdycore_wrapper 1 for a 1-day GEOS-5 simulation (0.5° resolution with the MERRA data stream):

 

            Other      Trans1     Trans2     After      Total
Initialize  41.45570   1.173900   5.444600   13.55030   61.63320
Run         657.0381   6.054000   9.834699   39.93340   713.3965

 

Notes:

 

Other: timings for calculations that occur before and within Trans1 and Trans2

Trans1: timing for the first transpose

Trans2: timing for the second transpose

After: timings for calculations after Trans2

 

Job Resource Usage:

 

 

CPU Time Used              02:08:37
Walltime Used              00:33:52
Number of CPUs Requested   128
Walltime Requested         01:00:00

 

-


1  Wrapper for NASA finite-volume dynamical core


Archive GEOS-5 GCM output

Posted by Carlos Cruz Oct 2, 2008

The GEOS-5 GCM run.script provides some functionality to archive the model output (restarts and history collections) to mass storage. Unfortunately, when a job is submitted on DISCOVER (via "qsub run.script"), the archiving will not take place because a simple "cp" from the login nodes to mass storage (on DIRAC) is not possible. Therefore, if archiving is desired, the run.script must be modified. To enable archiving, replace the code under "Archive History Output and Restarts" with the following snippet:

 


set edate = e`cat cap_restart | cut -c1-8`_`cat cap_restart | cut -c10-11`z

# Move HISTORY Files to Holding Directory
# ---------------------------------------
foreach    collection ( $collections )
  /bin/mv `ls -1 *.${collection}.*` $EXPDIR/holding/$collection
end

set      PARM = PBS
set      FILE = $EXPDIR/listings/archive.${EXPID}_${edate}.batch

if( -e $FILE) /bin/rm $FILE
cat > ${FILE} <<EOF
#!/bin/csh
#$PARM -l walltime=1:00:00
#$PARM -l ncpus=1
#$PARM -V
#$PARM -N archivejob
#$PARM -W group_list=k3002
#$PARM -q datamove
#

mkdir -p $MASDIR/restarts
mkdir -p $MASDIR/holding


# Copy HISTORY Files to Holding Directory
# ---------------------------------------
foreach    collection ( $collections )
  if (! -e $MASDIR/holding/\$collection ) mkdir $MASDIR/holding/\$collection
  cp $SCRDIR/*.\$collection.* $MASDIR/holding/\$collection
end


foreach   rst ( $rst_files )
 if( -e $SCRDIR/\$rst ) then
       cp $SCRDIR/\$rst $MASDIR/restarts/\$rst.$edate
 endif
end

exit
EOF

cd   $EXPDIR/listings
qsub $FILE
cd   $SCRDIR

 

Note: $MASDIR must be defined. This could be something like $ARCHIVE/GEOS-5/$EXPDIR.
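For instance, a minimal sketch of defining it near the top of run.script, following the note above (the path is hypothetical and site-specific; $ARCHIVE and $EXPDIR are assumed to be set already):

# Hypothetical example; adjust to your own mass-storage layout
set MASDIR = $ARCHIVE/GEOS-5/$EXPDIR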


gcm_setup - caveats

Posted by Carlos Cruz Sep 12, 2008

The GEOS-5 GCM batch scripts generated by gcm_setup (and makexp) were written to help the GMAO GCM developers test their software. However, they have gained widespread use, as they are the only scripts available for setting up and running the GCM. There are, however, some caveats, some of which are summarized at the end of the setup output:

 


You must now copy GCM Initial Conditions into: $WRKDIR/$EXPID
in addition to your GEOSgcm.x executable and a set of Chemistry RC files.
The Chemistry RC files are located under your build:  ../src/GMAO_Shared/Chem_Base
You should modify the Chemistry RC files to reflect the type of Aerosols you want.

 

 

This can be confusing for new users.

If you use a modified gcm_setup, the experiment defaults to a set of model run parameters and restarts, and there is no need to obtain any files: the resulting gcm_run.j script will fetch everything it needs and make other minor (and useful) changes.

 

To use them, update two files under Applications/GEOSgcm_App to the SIVO_DEV_BRANCH 1:

 


cd Applications/GEOSgcm_App
cvs upd -r SIVO_DEV_BRANCH gcm_setup gcm_run.j.tmpl

 

and run gcm_setup with up to three optional arguments:

 


gcm_setup expid exepath rstpath

for example:


gcm_setup exp001 /u/ccruz/GEOSgcm

 

 

and follow the instructions.

 

When done, gcm_setup will have created several files in a location of your choice:

 


geos5/exp001> ls -1
AGCM.tmpl
CAP.tmpl
gcm_post.j
gcm_regress.j
gcm_run.j
HISTORY.tmpl

 

Now you should be ready to submit your job via "qsub gcm_run.j", right? Not quite. You will need to manually edit AGCM.tmpl (used to create the AGCM.rc file) and comment out the lines (by prepending a #) that start with TURBULENCE_IMPORT_RESTART and TURBULENCE_INTERNAL_RESTART. This caveat is described in the GEOS-5 DAS revision history (and, unfortunately, not in the GEOS-5 GCM revision history):

 


~~~~~~
 NOTE
~~~~~~
The TURBULENCE_IMPORT restart is no longer commented out by default in the
AGCM.rc file. If no restart is present, then will need to BOOTSTRAP it (or
comment out in AGCM.rc).
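
If you prefer to script that edit, a GNU sed one-liner along these lines (a sketch; inspect AGCM.tmpl afterwards to confirm the result) prepends a # to the relevant lines:

# Sketch only: comment out the turbulence restart entries in AGCM.tmpl
sed -i -e 's/^ *TURBULENCE_IMPORT_RESTART/#&/' \
       -e 's/^ *TURBULENCE_INTERNAL_RESTART/#&/' AGCM.tmpl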

 

Now you should be ready to perform the experiment. For more details consult the GEOS-5 AGCM User's Guide.

 

-


1 The branch was taken off of the GEOSagcm-Eros_7_24 tag (MERRA).
