# Bash Scripts

## Introduction

One of the many strengths of Linux is the ability to combine multiple simple programs to perform a desired task. However, you will often find yourself running the same set of commands with different input data. For example, with the RNAstructure command line programs, many analysis steps are split between several programs. The partition function can be calculated with the program **_partition_**, but the output of **_partition_** is simply a binary file that contains the dynamic programming arrays that were filled by the program.

That binary file can be used to calculate base pair probabilities using the program **_ProbabilityPlot_**. Alternatively, it can be used to predict a secondary structure using the programs **_ProbablePair_**, **_MaxExpect_**, or **_ProbKnot_**, or to generate a sampling of the thermodynamic ensemble using the program **_stochastic_**.

Shell scripts are simply a way to chain together multiple commands that could otherwise be typed into the terminal one at a time. Shell scripting also provides program flow control (loops, if statements) to increase the sophistication of the script.

There are multiple shells available in Linux that have similar scripting syntax. Bash is one of the most commonly used shells.

## Components of a shell script

There are three main components to a shell script, illustrated in this simple example.

```bash
#!/bin/bash
# echo is a program that prints a string to the output
echo Hello World
```

The first line of a shell script begins with a **shebang** (`#!`). This line specifies the shell interpreter that should be used to run the script; in this case, the bash shell. Other lines that begin with a `#` are comments and have no impact on the output of the shell script. They are added to help you understand the logic of the script. The remaining lines are commands that determine the actual computations that will be performed.

## Executing a script

To run the above script, save it in a text file named _hello.sh_ and first make sure that the file is executable. You can add execution permissions to the file with the command:

```
chmod u+x hello.sh
```

This command gives execution privileges to the owner of the file. Afterwards, the script can be run with the command:

```
./hello.sh
```

In this case, the `./` in front of _hello.sh_ tells the shell to look for the file in the current directory. The shell then runs the commands in the file.

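
The whole sequence, from creating the file to running it, can be sketched in one go. This is only a minimal sketch: it writes _hello.sh_ with a here-document so that the example is self-contained, whereas you would normally create the file in a text editor.

```bash
#!/bin/bash

# Create hello.sh with a here-document (normally you would use a text editor)
cat > hello.sh << 'EOF'
#!/bin/bash
# echo is a program that prints a string to the output
echo Hello World
EOF

# Give the owner (user) of the file permission to execute it
chmod u+x hello.sh

# Run the script; ./ points the shell at the current directory
./hello.sh
```

Without the `chmod` step, the last command would typically fail with a "Permission denied" error.
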
## Variables

Often it is convenient to store information in variables. To store information in a variable, simply use the construction `variable=value` (with no spaces around the `=`). To use the variable, use `$variable`. For example:

```bash
#!/bin/bash
# Store a string in a variable
my_string="Hello World"

# echo is a program that prints a string to the output
echo "$my_string"
```

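
Two details of variable use are worth a short illustration: quoting and braces. The sketch below is not part of the original example set, and the variable name `seq_name` is made up for illustration.

```bash
#!/bin/bash

# Quoting "$variable" keeps a value that contains spaces as a single word
my_string="Hello   World"
echo "$my_string"

# Hypothetical variable holding the base name of a sequence
seq_name="RA4800"

# Braces mark where the variable name ends, so text can follow it directly
ct_file="${seq_name}.ct"
echo "$ct_file"
```

Writing `"$my_string"` in double quotes is a good habit, because an unquoted variable is split into separate words wherever it contains spaces.
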
A more practical example would be to string together multiple programs:

```bash
#!/bin/bash
# This script runs a partition function, followed by the prediction of
# a maximum expected accuracy secondary structure

# Define a path to a sequence file for the input
seq_file="RA4800.seq"

# Define a path to a structure file for the output
ct_file="RA4800.ct"

# Calculate the partition function; the output will be saved to foo.pfs
partition "$seq_file" foo.pfs

# Predict a secondary structure from the pfs file
MaxExpect foo.pfs "$ct_file"

# foo.pfs is no longer needed, so delete it
rm foo.pfs
```

## Arguments

The previous script is useful for executing multiple programs with a single command. However, the script must be modified every time the input and output files change. It would be more convenient to tell the script at execution time what paths to use. To do this, you can access arguments from within the shell script. An argument can be thought of as a word in the command used to run a program. For example, the command `partition $seq_file foo.pfs` consists of three arguments. The 0th argument is the program name, in this case `partition`. The 1st argument is `$seq_file` and the 2nd is `foo.pfs`. In bash, the arguments are stored in variables whose names are simply the number of the argument (`$0`, `$1`, `$2`, and so on). For example:

```bash
#!/bin/bash
# This script runs a partition function, followed by the prediction of
# a maximum expected accuracy secondary structure

# The 1st argument is the path to a sequence file for the input
seq_file=$1

# The 2nd argument is the path to a structure file for the output
ct_file=$2

# Calculate the partition function; the output will be saved to foo.pfs
partition "$seq_file" foo.pfs

# Predict a secondary structure from the pfs file
MaxExpect foo.pfs "$ct_file"

# foo.pfs is no longer needed, so delete it
rm foo.pfs
```

Now, the input sequence file and output structure file can be specified when you execute the script.

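
To see how the words of a command map onto the numbered variables, it can help to run a script that does nothing but print them. The sketch below assumes a throwaway script named _show_args.sh_ (a hypothetical name) and also uses `$#`, a special bash variable that holds the number of arguments given after the script name.

```bash
#!/bin/bash

# Write a small script (hypothetical name show_args.sh) that prints its arguments
cat > show_args.sh << 'EOF'
#!/bin/bash
echo "Argument 0 (the script itself): $0"
echo "Argument 1: $1"
echo "Argument 2: $2"
echo "Number of arguments after the script name: $#"
EOF

# Make the script executable for the owner of the file
chmod u+x show_args.sh

# Run it with two arguments, as you would run the structure prediction script
./show_args.sh RA4800.seq RA4800.ct
```

The structure prediction script above is called in the same way: the first word after the script name becomes `$1` (the sequence file) and the second becomes `$2` (the structure file).
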
## Program Flow
