ACCRE Home

Enabling Researcher-Driven Innovation and Exploration


New Cluster: Frequently Asked Questions

I cannot log onto the cluster. I got this error:

This message appears because the host's SSH key changed during the OS upgrade. To fix it, go into your '.ssh' directory and open the file 'known_hosts' (or 'known_hosts2', whichever one the error message specifies) in your favorite editor. Find the line that starts with the name of the host you were connecting to and delete it. The next time you connect, you will be asked to accept the host's key, just as you were the first time you connected to the server.
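If you prefer not to edit the file by hand, the same deletion can be scripted. The sketch below is only an illustration: the host name and key strings are made up, and it works on a scratch copy rather than on your real '.ssh/known_hosts'. OpenSSH also provides "ssh-keygen -R HOSTNAME", which removes the entries for a host from your known_hosts file and also handles hashed host names, which this sketch does not.

```shell
# Hypothetical host name; substitute the host named in your error message.
host="cluster.example.edu"

# Work on a scratch file so nothing in ~/.ssh is touched by this demo.
known_hosts="$(mktemp)"
printf '%s\n' \
  "cluster.example.edu ssh-rsa AAAAexampleOLDkey" \
  "other.example.edu ssh-rsa AAAAexampleKEEPkey" > "$known_hosts"

# The first field of each line is a comma-separated list of host names.
# Keep only the lines whose list does not contain the stale host.
awk -v h="$host" '{
  n = split($1, names, ",")
  keep = 1
  for (i = 1; i <= n; i++) if (names[i] == h) keep = 0
  if (keep) print
}' "$known_hosts" > "$known_hosts.new"
mv "$known_hosts.new" "$known_hosts"

cat "$known_hosts"
```

To apply this to your own account, point known_hosts at "$HOME/.ssh/known_hosts" and keep a backup copy of the file first.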



My job script says matlab cannot be found. What happened?

We have changed how matlab is maintained. To set the matlab path automatically when you log in, add the following line to your .bashrc (for bash) or .cshrc (for csh or tcsh), depending on your shell.

setpkgs -a matlab
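For bash users, one way to add that line without creating duplicates is sketched below. The helper name add_line_once is made up for this example, and the demo writes to a scratch file; in practice you would pass "$HOME/.bashrc" as the second argument.

```shell
# Append a line to a startup file only if that exact line is not already present.
# add_line_once is a hypothetical helper, not a cluster-provided command.
add_line_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file; calling it twice still leaves a single entry.
rc_file="$(mktemp)"
add_line_once "setpkgs -a matlab" "$rc_file"
add_line_once "setpkgs -a matlab" "$rc_file"

cat "$rc_file"
```

The change takes effect at your next login, or right away if you source the startup file in your current shell.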



My job script ran fine on the old cluster. However, I see random errors when running the same script on the new cluster. Some runs give no output at all, while others report "segmentation faults". How do I resolve this?

We believe that csh/tcsh on the newly installed 64-bit OS may be causing the problem. To fix this, change your shell to bash and replace the first line of your pbs script with "#!/bin/bash".
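The first-line replacement can be done in an editor or scripted. The following is a minimal sketch on a made-up three-line job script, where the "#PBS -l" resource line is just an example directive. It changes only the interpreter line, so any csh-specific syntax in the body of your script (for example setenv or foreach) still has to be converted to bash by hand.

```shell
# Create a small stand-in for an existing csh job script (contents are illustrative).
job_script="$(mktemp)"
printf '%s\n' '#!/bin/csh' '#PBS -l nodes=1:ppn=1' 'echo hello' > "$job_script"

# Rewrite line 1 only, and only if it is an interpreter ("#!") line.
sed '1s|^#!.*$|#!/bin/bash|' "$job_script" > "$job_script.new" \
  && mv "$job_script.new" "$job_script"

# Confirm the result.
first_line="$(head -n 1 "$job_script")"
echo "$first_line"
```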

If you have set up your own environment variables or aliases in your .cshrc/.tcshrc, you will need to add the corresponding entries to your .bash_profile or .bashrc after you change your shell to bash. For example, if you have

alias rm '/bin/rm -i'

in your .cshrc, you should add:

alias rm='/bin/rm -i'

in your .bashrc or .bash_profile file. Or if you have:

set path = (/bin /usr/bin /sbin $HOME/bin)

in your .cshrc, you should add:

export PATH=$PATH:/bin:/usr/bin:/sbin:$HOME/bin

in your .bashrc or .bash_profile file.
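Environment variables set with setenv follow the same pattern. The lines below are a small sketch of bash equivalents. The variable MYDATA and its value are invented for illustration, and the case statement is an optional guard that keeps $HOME/bin from being added to PATH more than once when the file is sourced repeatedly.

```shell
# csh:   setenv MYDATA /scratch/data      (MYDATA is a hypothetical variable)
export MYDATA=/scratch/data

# csh:   set path = ($path $HOME/bin)
# bash:  append $HOME/bin to PATH, but only if it is not already listed
case ":$PATH:" in
  *":$HOME/bin:"*) ;;                      # already present, nothing to do
  *) export PATH="$PATH:$HOME/bin" ;;
esac

echo "$MYDATA"
```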



I am trying to run a parallel code and got the following error message: 

   Warning: no access to tty (Bad file descriptor).
   Thus no job control in this shell.
   Thu May 29 17:03:51 CDT 2008
   /gpfs0/home/myuserid
   Nodes=/usr/spool/PBS/aux//145750.vmpsched
   nodefile------------------------: /usr/spool/PBS/aux//145750.vmpsched
   vmp231
   vmp231
   --------------------------------
   mpiexec_vmp231.vampire: cannot connect to local mpd (/tmp/mpd2.console_xud); possible causes:
     1. no mpd is running on this host
     2. an mpd is running but was started without a "console" (-n option)
   In case 1, you can start an mpd on this host with:
       mpd &
   and you will be able to run jobs just on this host.
   For more details on starting mpds on a set of hosts, see the MPICH2 Installation Guide.
   rm: No match.

For MPICH2 jobs, you should use mpiexec (in the package "mpiexec") as the job launcher on the cluster.

In your .cshrc or .bashrc, add

setpkgs -a mpiexec

Then in your pbs script, use:

mpiexec -n NUMBER_OF_CPUS YOUR_EXECUTABLES [arg1] [arg2] ....

to start the application. 
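One common way to choose NUMBER_OF_CPUS is to count the lines in the node file that PBS provides to the job through $PBS_NODEFILE, since that file lists one line per allocated processor (as in the "vmp231 / vmp231" listing in the error output above). The sketch below assumes that convention. The helper count_cpus and the program name ./my_app are placeholders, and the demo uses a scratch node file so it can be tried outside a job.

```shell
# Count allocated processors: one line per processor in the PBS node file.
# count_cpus is a hypothetical helper for this example.
count_cpus() {
  wc -l < "$1" | tr -d ' '
}

# Stand-in for $PBS_NODEFILE so the sketch can be run outside a job.
nodefile="$(mktemp)"
printf '%s\n' vmp231 vmp231 > "$nodefile"

ncpus="$(count_cpus "$nodefile")"

# Inside a real job script you would use "$PBS_NODEFILE" and run the command
# instead of printing it; ./my_app is a placeholder executable name.
echo "mpiexec -n $ncpus ./my_app"
```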

For more information about mpiexec, please refer to the manpage:

man mpiexec 

or go to: 

http://www.osc.edu/~pw/mpiexec/index.php


Last modified: May 30 2008 09:14:34 CST.