Kamiak User's Guide

Storage

Overview

Kamiak provides several options for data storage. The table below lists what is currently available; each storage type is described in more detail in the sections that follow.

| Name | Mount on Kamiak | Snapshots for Data Recovery? | ACL Support? | Description |
|---|---|---|---|---|
| Home | /home | Yes | Yes | Network based storage provided to all users; limited to 10GB. |
| Project | /data | Varies (Default: Yes) | Yes | Network based storage available for lease/purchase with rotational disks and SSD cache. |
| Scratch | /scratch | No | No | Network based storage available at no cost to use with 10,000 RPM rotational disks and SSD cache. |
| Local Scratch | /local | No | No | A local SSD or other flash device in each compute node available at no cost to use. |
| SciDAS | /scidas | No | No | Network based storage specific to the SciDAS project. |

All storage spaces in Kamiak other than local scratch are available on every compute node. If you create a file in your home directory on a login node, that same file is also available within a job on compute nodes. As such, there’s usually no need to copy files between compute nodes.

Warning:
Storage on Kamiak is designed to be efficient, performant, and reliable. However, there are NO backups of any data residing on Kamiak. Data owners (you) are responsible for protecting their own data by copying it out of Kamiak onto another system.
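One way to prepare data for copying to another system is to bundle it into a single archive with a checksum, so the copy can be verified after the transfer. The following is a minimal sketch rather than an official Kamiak procedure; the helper name archive_with_checksum and the demo paths are hypothetical.

```shell
#!/bin/bash
# archive_with_checksum SRC_DIR OUT_DIR
# Hypothetical helper: writes OUT_DIR/<name>.tar.gz and a matching
# SHA-256 checksum file so the copy can be verified elsewhere.
archive_with_checksum() {
    local src_dir="$1" out_dir="$2"
    local name
    name="$(basename "$src_dir")"
    mkdir -p "$out_dir" || return 1
    # -C switches to the parent directory so the archive holds relative paths
    tar -czf "$out_dir/$name.tar.gz" -C "$(dirname "$src_dir")" "$name" || return 1
    # Run inside OUT_DIR so the checksum file refers to the bare file name
    (cd "$out_dir" && sha256sum "$name.tar.gz" > "$name.tar.gz.sha256")
}

# Demo on throwaway data; on Kamiak you would pass your own directory
backup_demo="$(mktemp -d)"
mkdir -p "$backup_demo/results"
echo 'important data' > "$backup_demo/results/output.txt"
archive_with_checksum "$backup_demo/results" "$backup_demo/outgoing"

# After transferring both files, verify the archive on the destination
(cd "$backup_demo/outgoing" && sha256sum -c results.tar.gz.sha256)
```

Copy both the archive and the .sha256 file to the other system, then run sha256sum -c there to confirm the archive arrived intact.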

Home Space

Kamiak users are provided with a 10GB home directory regardless of their affiliation with the university. Home directories are named using your WSU NID, and your usage can be viewed using the quota command:

$ echo $HOME
/home/my.NID

$ cd $HOME

$ quota -s -f /home
Disk quotas for user my.NID (uid 8003): 
     Filesystem   space   quota   limit   grace   files   quota   limit   grace
10.110.0.11:/home
                  8001M  10240M  10240M            139k   4295m   4295m

The size of your home directory cannot be increased. To acquire more space, you will need to use other storage, such as Project space or Scratch space.
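When a home directory approaches its quota, it helps to see which top-level entries use the most space. The sketch below uses only the standard find, du, and sort tools; the function name largest_entries is hypothetical, and the demo runs on throwaway data rather than a real home directory.

```shell
#!/bin/bash
# largest_entries DIR [COUNT]
# Print the COUNT largest immediate children of DIR (sizes in KiB),
# largest first. Hidden files and directories are included.
largest_entries() {
    local dir="$1" count="${2:-10}"
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + \
        | sort -rn \
        | head -n "$count"
}

# Demo on throwaway data
usage_demo="$(mktemp -d)"
mkdir "$usage_demo/datasets" "$usage_demo/.cache"
head -c 204800 /dev/urandom > "$usage_demo/datasets/big.bin"
echo 'small' > "$usage_demo/.cache/note.txt"
largest_entries "$usage_demo" 5
```

On Kamiak, running largest_entries "$HOME" would list the largest entries in your home directory, which are the natural candidates to move to Project or Scratch space.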

Project Space

“Project space” is a term we use to refer to bulk storage which is available for lease or purchase. This space is provided by large (4TB – 6TB) rotational disks in RAID arrays within an enterprise-class storage array. It includes an SSD-based cache and is served to compute nodes over a high speed network. It is mounted at /data on every compute node in Kamiak and each project space has its own mount (e.g. /data/cahnrs) inside of /data. This means every project space is separate, has its own capacity and file count limits, and is unaffected by other project spaces becoming full or otherwise unusable. Project space is purchased independently of compute resources in units as low as 100GB.

Kamiak’s sponsoring colleges (CAHNRS, CAS, VCEA) own project space which can be utilized by members of the respective college. Training events and courses which utilize Kamiak may also provide project space for their students. This means you may already have project space available to you.

Can I use an existing project space or do I need to purchase?

If you are a PI and are affiliated with a sponsoring college, request access to their space. Once that is approved you and all members of your lab on Kamiak will be able to use your college’s project space. If you are not a PI, ask your PI if they already have purchased space for their lab or if they have access to space provided by a sponsoring college. Otherwise, PI or not, you’ll need to purchase project space or utilize other storage, such as Scratch space.

Project space on Kamiak includes the following features, which can be enabled, disabled, or changed by request of the space’s owner:

  • Snapshot Data Recovery: Snapshots can be used to recover recently lost files. By default, project spaces receive a daily snapshot, of which 3 are retained and available at any time. The oldest is removed as new snapshots are automatically created.
  • Deduplication: Deduplication of data allows the system to store only a single copy of duplicate "blocks" of data. This can lead to large storage savings on some datasets with minimal performance impact. Disabled by default.
  • Compression: Compression of data allows the system to use less space to store data. This can lead to large storage savings on some datasets, sometimes with a significant performance impact. Disabled by default.
  • Quotas: Quotas can be used to limit the space or number of files of a particular user or group. A directory can also be given these limits regardless of who owns the data in it. Disabled by default.
  • Access Control Lists: ACLs provide a means of allowing or restricting access to files and directories. (Hint: Ever need to chmod 777 a file? Use ACLs instead.)

For more information about purchasing your own project space, see the Become an Investor page.

Scratch Space

Kamiak provides fast, temporary scratch storage to all users at no cost, subject to the following limitations:

  • To use scratch, you must create a Workspace (see below) to keep your data in.
  • Each workspace (and all data within it) has a maximum lifetime of 2 weeks, after which it is removed automatically.
  • This is a shared resource with finite space. Care must be taken to ensure appropriate use of it.

Scratch

This storage is provided by high speed (10,000 RPM) rotational disks in RAID arrays within an enterprise-class storage array. It includes an SSD-based cache and is served to compute nodes over a high speed network.

Local Scratch

This storage is provided by a local SSD or flash device within each compute node. As such, data stored here is only available on the compute node the workspace was created on. This is generally the fastest storage available in Kamiak. However, it is small due to it being a single disk (size varies by node; 400GB or more).

Intro to Workspace Maker

Simple answer: Run the command mkworkspace on Kamiak either on a login node, within an interactive job, or within a batch job to create a workspace.

Workspace Maker is a utility used to manage access to Scratch space on Kamiak. In order to use any Scratch space, you’ll need to create what we call a “workspace” to store your data within. A workspace is simply a directory that expires and is automatically removed. The maximum lifetime of a scratch workspace on Kamiak is 2 weeks. Clearly, scratch storage is not intended for permanent data and users should consider project space for that use. To use workspaces you’ll need to know three commands: mkworkspace, lsworkspace, and rmworkspace. Only the first is required since removal of a workspace is automatic if we let it expire. Run each command with the option --help to see what options are available.

Let’s look at an example of creating, using, and removing a workspace interactively:

$ mkworkspace 
Successfully created workspace.  Details:
    Workspace: /scratch/my.NID_616253
    User: my.NID
    Group: its_p_sys_ur_kam-its
    Expiration: 2017-08-20 16:21:30

$ cd /scratch/my.NID_616253

[my.NID_616253]$ touch file.txt

[my.NID_616253]$ cd

$ lsworkspace
Workspace: /scratch/my.NID_616253
    Creation host: login-p1n01
    Creation time: 2017-08-06 16:21:30
    User owner: my.NID
    Group owner: its_p_sys_ur_kam-its
    Expiration time: 2017-08-20 16:21:30

$ rmworkspace -n my.NID_616253 --force
Remove workspace '/scratch/my.NID_616253' (expired 2017-08-20 16:21:30)?  y or n: y
Removing workspace /scratch/my.NID_616253
rmworkspace completed, total removed workspaces: 1

Note the expiration date. Shortly after the workspace expires, it and all of the data within it will be deleted. Now let’s use a workspace in a batch job script:

#!/bin/bash
#SBATCH --time=0-00:10:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node

# --quiet makes mkworkspace print only the workspace path
my_workspace="$(mkworkspace --quiet)"

echo "My workspace is: $my_workspace"

cd "$my_workspace"

echo 'Hello!' > file.txt

This job simply creates a workspace, places the text “Hello!” into a file in the workspace, and ends. Let’s extend it slightly to use a Local Scratch workspace by passing the option --backend to mkworkspace (see mkworkspace --help for all options):

#!/bin/bash
#SBATCH --time=0-00:10:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node

# --backend=/local selects Local Scratch; --quiet prints only the path
my_workspace="$(mkworkspace --backend=/local --quiet)"

echo "My workspace is: $my_workspace"

cd "$my_workspace"

echo 'Hello!' > file.txt

See Workspace Maker’s full documentation for more information on how to utilize workspaces. In particular, it includes examples of removing workspaces automatically and managing them within jobs. We recommend that you remove workspaces as soon as they are no longer needed rather than allowing them to expire, so the system can promptly release the storage space for others to use.
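As an illustration of removing a workspace as soon as a job is done, the following sketch copies results to permanent storage and then removes the workspace when the script exits. It is not taken from the Workspace Maker documentation. The RESULTS_DIR variable and the mktemp fallback are assumptions added so the script can be tried outside Kamiak, and the rmworkspace flags are copied from the interactive example above, so check rmworkspace --help to confirm they run without prompting inside a batch job.

```shell
#!/bin/bash
#SBATCH --time=0-00:10:00     ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks-per-node=1   ### Number of tasks to be launched per Node

# Hypothetical destination; jobs start in the submission directory by default
results_dir="${RESULTS_DIR:-$PWD/results}"

if command -v mkworkspace >/dev/null 2>&1; then
    my_workspace="$(mkworkspace --quiet)"
else
    # Assumption: fallback so the sketch can be tried outside Kamiak
    my_workspace="$(mktemp -d)"
fi

# Remove the workspace when the script exits, whether it succeeded or not
cleanup() {
    cd /   # leave the workspace before deleting it
    if command -v rmworkspace >/dev/null 2>&1; then
        # Flags copied from the interactive example; see rmworkspace --help
        rmworkspace -n "$(basename "$my_workspace")" --force
    else
        rm -rf "$my_workspace"
    fi
}
trap cleanup EXIT

cd "$my_workspace" || exit 1
echo 'Hello!' > file.txt

# Copy results to permanent storage before the workspace is removed
mkdir -p "$results_dir"
cp file.txt "$results_dir/"
```

The trap on EXIT is a general Bash technique: the cleanup function runs when the script ends for any reason, so the workspace does not linger for the full two weeks if the job fails partway through.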

Snapshots for Data Recovery

Although data on Kamiak is not backed up, storage snapshots are created periodically on both Home and Project space, but not on other storage such as Scratch and Local Scratch. These snapshots can be used to recover data that has recently been lost or overwritten. It is imperative that users act quickly to recover data from a snapshot when data loss occurs. Otherwise, the data will be lost permanently when the snapshots expire and are automatically removed.

Listing snapshots

Snapshots are contained in a hidden folder called .snapshot. Note the leading dot and the fact that this directory is invisible (even with the -a option of ls) but its contents can be viewed directly:

$ ls -1 /home/.snapshot
daily.2017-01-03_0010
daily.2017-01-04_0010
daily.2017-01-05_0010
hourly.2017-01-05_0705
hourly.2017-01-05_0805
hourly.2017-01-05_0905
hourly.2017-01-05_1005
weekly.2016-12-25_0015
weekly.2017-01-01_0015
$ ls -1 /data/myLabName/.snapshot
daily.2017-01-03_0010
daily.2017-01-04_0010
daily.2017-01-05_0010

Recovering Data

To recover data from a snapshot, identify which snapshot contains the data, then copy it out of the snapshot and into your storage:

$ rm -f test.py # oops
$ ls test.py
ls: cannot access test.py: No such file or directory
$ cp /home/.snapshot/hourly.2017-01-05_1005/my.NID/test.py test.py
$ ls test.py
test.py

In the above example, “hourly.2017-01-05_1005” is the most recent snapshot, so that’s where we copy our data from. Any changes made to test.py after this snapshot was taken are lost and unrecoverable. It is important to act quickly and copy the data out of the snapshot before it expires and the data is lost forever.
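When it is not obvious which snapshot holds the version you want, a short loop can list every snapshot that contains the file. This is a sketch: the function name snapshots_with is hypothetical, and the demo builds a stand-in .snapshot directory under a temporary path rather than reading a real one.

```shell
#!/bin/bash
# snapshots_with SNAPSHOT_DIR REL_PATH
# List every snapshot under SNAPSHOT_DIR that contains REL_PATH.
snapshots_with() {
    local snapshot_dir="$1" rel_path="$2" snap
    for snap in "$snapshot_dir"/*/; do
        if [ -e "$snap$rel_path" ]; then
            # ls -ld shows size and modification time to help pick a version
            ls -ld "$snap$rel_path"
        fi
    done
}

# Demo with a stand-in for /home/.snapshot
snap_demo="$(mktemp -d)"
for s in daily.2017-01-04_0010 hourly.2017-01-05_1005 weekly.2017-01-01_0015; do
    mkdir -p "$snap_demo/.snapshot/$s/my.NID"
done
echo 'v1' > "$snap_demo/.snapshot/daily.2017-01-04_0010/my.NID/test.py"
echo 'v2' > "$snap_demo/.snapshot/hourly.2017-01-05_1005/my.NID/test.py"

snapshots_with "$snap_demo/.snapshot" my.NID/test.py
```

On Kamiak the equivalent call would be snapshots_with /home/.snapshot my.NID/test.py, after which the chosen version can be copied out with cp as shown above.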

Managing Snapshots

The time, frequency, and retention of snapshots vary between storage spaces on Kamiak. Owners of Project space can request a custom snapshot schedule or have snapshots disabled entirely. The latter may be desirable if the data has a high rate of change, which causes snapshots to become large and consume capacity excessively. Submit a Service Request to learn more or request a change.

Permissions and Data Security

Storage on Kamiak provides data security via standard Unix/Linux file ownership and permissions. An explanation of how ownership and permissions behave in Linux can be found at this site and in various other guides available online.

Data owners on Kamiak (you) are responsible for the security of their data and must allow or restrict access to their data as needed. Importantly, granting “other” the write permission (e.g. chmod 777) is considered dangerous. You are highly encouraged to utilize Access Control Lists to allow specific additional users or groups to access your data.

Permissions

Every file and directory has two owners: the user owner and group owner. The permissions of read (r), write (w), and execute (x) are effective for the user owner, group owner, and all other users separately. Let’s look at an example:

$ cd /data/myLabName/

$ mkdir example

$ ls -ld example/
drwxr-xr-x 2 my.NID its_p_sys_ur_kam-its 4096 Aug 10 08:14 example/

In that example we created a new directory in Project space. The directory is owned by my user and my lab group. However, the permissions rwxr-xr-x show that the group has only read and execute access, so group members can list and enter the directory but cannot create files in it. Let’s allow our group to write data into the directory:

$ chmod g+w example/

$ ls -ld example/
drwxrwxr-x 2 my.NID its_p_sys_ur_kam-its 4096 Aug 10 08:14 example/

We can also secure the directory by preventing anyone else (users not in our lab group) from accessing data in the directory:

$ chmod o-rwx example/

$ ls -ld example/
drwxrwx--- 2 my.NID its_p_sys_ur_kam-its 4096 Aug 10 08:14 example/

Default Permissions and umask

By default, files and directories you create are readable but not writable by any other user on the system. To give new files a desired permission automatically, you can configure your umask. To restrict access to members of your lab, run umask 0007, which removes “other” access and gives the group owner full read, write, and execute access. In order for your new umask to be permanent, add the umask command to a login script in your home directory. This is typically .bash_profile or .bashrc (note the leading dot in the file names) but differs if you use a shell other than Bash.
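To see what a given umask does before making it permanent, you can create a file and a directory under it and inspect their modes. The sketch below does this in a subshell so the change does not affect your current session; the function name show_default_modes is hypothetical, and stat -c assumes GNU stat as found on Linux.

```shell
#!/bin/bash
# show_default_modes MASK
# Create a file and a directory under the given umask and print their
# octal permission modes.
show_default_modes() {
    local mask="$1" scratch
    scratch="$(mktemp -d)" || return 1
    (
        # The subshell keeps the umask change from leaking to the caller
        umask "$mask"
        cd "$scratch" || exit 1
        touch new_file
        mkdir new_dir
        # %a prints the octal mode, %n the name
        stat -c '%a %n' new_file new_dir
    )
    rm -rf "$scratch"
}

show_default_modes 0022
show_default_modes 0007
```

With a umask of 0022 the file is created with mode 644 and the directory with 755; with 0007 they become 660 and 770, meaning full access for the owner and group and none for anyone else. New regular files do not receive the execute bit either way, because programs normally request it only for directories and executables.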

Access Control Lists (ACLs)

Kamiak has NFSv4 Access Control Lists (ACLs) enabled which can be managed with the commands nfs4_setfacl and nfs4_getfacl. Users who are interested in utilizing nfs4_acl should be aware that this type of ACL is distinct from POSIX ACLs. More information about ACLs can be found at this site. Complete documentation can be found by running man nfs4_acl. Let’s look at a very simple example of adding an ACL to an existing directory to give a user in another lab access to our data:

$ ls -ld example/
drwxrwx--- 2 my.NID its_p_sys_ur_kam-its 4096 Aug 10 08:14 example/

$ nfs4_getfacl example
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rwaDxtTnNcy
A::EVERYONE@:tcy

$ nfs4_setfacl -a A:df:other.person@ad.wsu.edu:RWX example/                                      

$ nfs4_getfacl example
A:fd:other.person@ad.wsu.edu:rwaDxtTnNcCy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rwaDxtTnNcy
A::EVERYONE@:tcy

If we provide standard permissions as RWX (must be capitalized), nfs4_setfacl will convert them into the ACL format automatically. Initially the directory was only accessible by the user owner and members of the lab group. We added another user who can now read and write data within the directory. We also included the inheritance flags ‘f’ and ‘d’ so this ACL entry will be added to any new files and directories that are created. To do the same with a group instead of a single user, we also include the ‘g’ flag:

$ nfs4_setfacl -a A:dfg:its_p_sys_us_kam-cahnrs@ad.wsu.edu:RWX example/                         

$ nfs4_getfacl example
A:fdg:its_p_sys_us_kam-cahnrs@ad.wsu.edu:rwaDxtTnNcCy
A:fd:other.person@ad.wsu.edu:rwaDxtTnNcCy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rwaDxtTnNcy
A::EVERYONE@:tcy

Run man nfs4_acl to see further documentation on ACLs. Note that you will need to specify users and groups as whateverName@ad.wsu.edu.
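Because the entry format is easy to mistype, a small helper can assemble it from its parts. This is only a sketch based on the two examples above: build_ace is a hypothetical function, not part of the NFSv4 ACL tools, and it prints the nfs4_setfacl command instead of running it.

```shell
#!/bin/bash
# build_ace KIND NAME PERMS
# Print an "allow" entry that new files and directories inherit.
# KIND is "user" or "group"; NAME is given without the @ad.wsu.edu suffix.
build_ace() {
    local kind="$1" name="$2" perms="$3" flags="fd"
    case "$kind" in
        user)  ;;
        group) flags="fdg" ;;   # 'g' marks the principal as a group
        *) echo "kind must be 'user' or 'group'" >&2; return 1 ;;
    esac
    printf 'A:%s:%s@ad.wsu.edu:%s\n' "$flags" "$name" "$perms"
}

ace="$(build_ace user other.person RWX)"
# Print the command for review instead of running it
echo "nfs4_setfacl -a $ace example/"
```

After reviewing the printed command, run it and confirm the result with nfs4_getfacl example.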