Stars Cluster

The High Performance Computing cluster
of Howard University's College of Arts and Sciences

WebMaster: Roberto De Leo

What is it

Stars is the High Performance Computing (HPC) cluster of the College of Arts and Sciences at Howard University. It was created thanks to a grant awarded to the P.I.s Roberto De Leo (Dept. of Mathematics), J. Harkless (Dept. of Chemistry) and P. Bezandry (Dept. of Mathematics) within Howard University's 2014 Research Resources and Small Equipment Program. All CoAS students and researchers are eligible for access to Stars to perform their own numerical studies. This page contains instructions on how to get access to the cluster and how to use it.

Hardware

Stars currently consists of 6 nodes, each with 2 Xeon E5-2640 v3 CPUs @ 2.60GHz with 8 cores each (Hyper-Threading is enabled on all machines, so every node has 32 logical cores in total). Each machine has 16GB of RAM; the 3.6TB of HDD storage (in a RAID5 configuration) is located on one machine and shared with the remaining five.
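If you want to check a node's CPU layout yourself, the standard Linux utility lscpu reports the number of logical cores and the threads-per-core count; this is just an optional sanity check, not a required step:

lscpu | grep -E '^(CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\))'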

OS

The Operating System installed on Stars is Linux, kernel version 4.1.6 (distribution: Slackware 14.1).

Software installed

The following is a list of the major packages installed on the Stars cluster. You are always welcome to install other packages in your own home directory, but if you think that some package is general enough to be useful to many other people, just send an email to the current cluster administrator Roberto De Leo ⟨roberto.deleo AT howard.edu⟩ to have it installed on the cluster.
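As a rough sketch of what a home-directory installation can look like (assuming a typical GNU-style source package; the exact steps depend on the software you are installing), you can configure the build with a prefix inside your home directory and add the resulting bin directory to your PATH:

./configure --prefix=$HOME/local
make
make install
export PATH=$HOME/local/bin:$PATH    # add this line to your ~/.bashrc to make it permanent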

How to get an account on the cluster

Send an email to the current cluster administrator Roberto De Leo ⟨roberto.deleo AT howard.edu⟩ with your name, affiliation and your preferred login name (max 8 characters).

How to connect to the cluster

You can connect to the cluster through ssh to the cluster's server helios.physics.howard.edu. If you connect from outside the HU network, remember to use port 443 (due to the current firewall policy of HU, it is not possible to use the default port 22 from outside).
If you use the command line and your login name is "doe", you can connect with
ssh -p 443 doe@helios.physics.howard.edu
If, moreover, you want to be able to use graphical applications, connect with
ssh -p 443 -YC doe@helios.physics.howard.edu
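To avoid typing the port and user name every time, you can optionally add an entry to the ~/.ssh/config file on your own machine; the host alias "stars" below is just a name chosen for this example:

Host stars
    HostName helios.physics.howard.edu
    Port 443
    User doe
    ForwardX11 yes
    ForwardX11Trusted yes
    Compression yes

After that, "ssh stars" is enough to log in, and scp (see below) accepts the same alias.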

Look here for more ssh usage examples and how to use ssh/scp from Windows or Mac: ssh command Tutorial

What to do after you log in

Once you are logged into the system, all you get is a prompt waiting for your commands. In order to use the cluster, therefore, it is more or less mandatory that you know at least the basics of the Unix Command Line Interface (CLI), e.g. how to list your files, change directory, and create or erase files and directories. If you have no experience with the Unix CLI, here are a few good starting pointers:
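Independently of any tutorial, the following is a very small sketch of a typical CLI session; the file and directory names are invented for this example:

ls -l                 # list the files in the current directory
mkdir results         # create a directory named "results"
cd results            # move into it
cp ~/data.txt .       # copy a file into the current directory
rm data.txt           # erase the copy
cd ..                 # go back to the parent directory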

How to transfer files from/to the cluster

You can transfer files to and from the cluster with scp through port 443.
If you use the command line and your login name is "doe", then you can copy the file myfile to the cluster with
scp -P 443 myfile doe@helios.physics.howard.edu:
To copy the entire directory mydir to the cluster recursively, use
scp -P 443 -r mydir doe@helios.physics.howard.edu:
Vice versa, you can copy the remote file myremotefile from the cluster to your machine with
scp -P 443 doe@helios.physics.howard.edu:myremotefile .
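Similarly, a whole remote directory can be copied back recursively; this example just mirrors the ones above, and the name myremotedir is only a placeholder:
scp -P 443 -r doe@helios.physics.howard.edu:myremotedir .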

Look here for more scp usage examples: scp command Tutorial

How to submit jobs

All jobs on the cluster must be submitted through the Slurm scheduler. Jobs running on the cluster without passing through Slurm will be stopped by the System Administrator without notice.
There are 4 queues defined on the cluster:

  • debug -- priority 1000, 1 hour maximum running time
  • fast -- priority 100, 6 hours maximum running time
  • medium -- priority 10, 1 day maximum running time
  • slow -- priority 1, default, no running time restriction

E.g., to run your program myjob on the cluster's medium queue from the command line, use
srun -p medium myjob
The scheduler automatically chooses the node on which to run the job. When a higher priority job is launched and there are no free cores left, a lower priority job is suspended until the higher priority job ends. Very good tutorials on Slurm can be found here:
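For longer runs it is often more convenient to submit a batch script with sbatch rather than running srun interactively. The script below is only a minimal sketch: the job name, output file, queue and program name are placeholders that you should adapt to your own case.

#!/bin/bash
#SBATCH --job-name=myjob        # name shown by squeue
#SBATCH --partition=medium      # queue to use (debug, fast, medium or slow)
#SBATCH --output=myjob-%j.out   # file collecting the output (%j = job id)
#SBATCH --ntasks=1              # number of tasks (cores) requested

./myjob

Save it, for instance, as myjob.sh and submit it with
sbatch myjob.sh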

How to see which jobs are running

There are three commands that, after you log into the cluster, allow you to see which jobs are currently running and who owns them:

  • squeue -- view information about jobs located in the Slurm scheduling queue.
    Example:
    		deleo@helios:~$ squeue
    		JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    		2434      slow apolloni    deleo  R 18-21:57:51      1 rigel
    		2525      slow apolloni    deleo  R 6-02:46:02      1 aldebaran
    		2527      slow apolloni    deleo  R 6-02:46:02      1 aldebaran
    		2529      slow apolloni    deleo  R 6-02:46:02      1 aldebaran
    		
  • smap -- a colored version of the same information.
  • sview -- graphical user interface to view and modify Slurm state.
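squeue also accepts a few useful filters; for instance (assuming your login name is doe, as in the examples above), you can list only your own jobs or only the jobs in a given queue with:

squeue -u doe          # show only the jobs belonging to user "doe"
squeue -p medium       # show only the jobs in the "medium" queue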