... | @@ -5,7 +5,6 @@ |
... | @@ -5,7 +5,6 @@ |
|
The purpose of this workshop is to introduce SQL commands and to provide examples of how to leverage the power of databases to make storing and analyzing data easier.
|
|
The purpose of this workshop is to introduce SQL commands and to provide examples of how to leverage the power of databases to make storing and analyzing data easier.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Required Software
|
|
## Required Software
|
|
|
|
|
|
This workshop uses SQLite3 as an example implementation of SQL. This workshop will use SQLiteStudio as an easy-to-use graphical interface for SQLite3. SQLiteStudio can be found [here](https://sqlitestudio.pl/index.rvt). It is available as an installed package or as a portable application that does not require installation. Alternatively, command line only utilities for SQLite can be found [here](https://www.sqlite.org/index.html).
|
|
This workshop uses SQLite3 as an example implementation of SQL. This workshop will use SQLiteStudio as an easy-to-use graphical interface for SQLite3. SQLiteStudio can be found [here](https://sqlitestudio.pl/index.rvt). It is available as an installed package or as a portable application that does not require installation. Alternatively, command line only utilities for SQLite can be found [here](https://www.sqlite.org/index.html).
|
... | @@ -122,13 +121,13 @@ The WHERE clause can use several different operators: |
... | @@ -122,13 +121,13 @@ The WHERE clause can use several different operators: |
|
|
|
|
|
When using strings in the WHERE clause, enclose the strings in either single or double quotation marks.
|
|
When using strings in the WHERE clause, enclose the strings in either single or double quotation marks.
|
|
|
|
|
|
Write a SQL command to display sequences whose length is greater than 800? How many are there?
|
|
> **Write a SQL command to display sequences whose length is greater than 800? How many are there?**
|
|
|
|
|
|
##### Combining multiple clauses
|
|
##### Combining multiple clauses
|
|
|
|
|
|
The WHERE clause can also include Boolean logic with AND and OR operations. Parenthesis can also be used to combine different logic operations.
|
|
The WHERE clause can also include Boolean logic with AND and OR operations. Parenthesis can also be used to combine different logic operations.
|
|
|
|
|
|
Write a SQL command to display tmRNA sequences less than 200 nucleotides long.
|
|
>**Write a SQL command to display tmRNA sequences less than 200 nucleotides long.**
|
|
|
|
|
|
#### LIMIT clause
|
|
#### LIMIT clause
|
|
|
|
|
... | @@ -162,7 +161,7 @@ SELECT avg(Length) |
... | @@ -162,7 +161,7 @@ SELECT avg(Length) |
|
FROM Sequences;
|
|
FROM Sequences;
|
|
```
|
|
```
|
|
|
|
|
|
Can you write an SQL command that will return the average Sensitivity and PPV for all the scores in the Accuracy table?
|
|
> **Can you write an SQL command that will return the average Sensitivity and PPV for all the scores in the Accuracy table?**
|
|
|
|
|
|
#### Naming Columns
|
|
#### Naming Columns
|
|
|
|
|
... | @@ -185,7 +184,7 @@ FROM Sequences |
... | @@ -185,7 +184,7 @@ FROM Sequences |
|
GROUP BY Family;
|
|
GROUP BY Family;
|
|
```
|
|
```
|
|
|
|
|
|
Write an SQL command to calculate the average Sensitivity and PPV for each parameter set from the Accuracy table. Sort based on average Sensitivity and print out the 10 best performing parameter sets.
|
|
>**Write an SQL command to calculate the average Sensitivity and PPV for each parameter set from the Accuracy table. Sort based on average Sensitivity and print out the 10 best performing parameter sets.**
|
|
|
|
|
|
It might be obvious, but the 'GROUP BY' clause will only group data that is in the table. It does not check for completeness of the data. You should verify that all the data you expect is in the table. One way to do that is to use the count function to make sure that every average uses the same amount of data.
|
|
It might be obvious, but the 'GROUP BY' clause will only group data that is in the table. It does not check for completeness of the data. You should verify that all the data you expect is in the table. One way to do that is to use the count function to make sure that every average uses the same amount of data.
|
|
|
|
|
... | @@ -193,7 +192,7 @@ It might be obvious, but the 'GROUP BY' clause will only group data that is in t |
... | @@ -193,7 +192,7 @@ It might be obvious, but the 'GROUP BY' clause will only group data that is in t |
|
|
|
|
|
You can filter the groups shown by using the HAVING clause. This clause is used similarly to the WHERE clause to filter the groups.
|
|
You can filter the groups shown by using the HAVING clause. This clause is used similarly to the WHERE clause to filter the groups.
|
|
|
|
|
|
Use the HAVING clause to verify that no parameter set is missing any sequences.
|
|
>**Use the HAVING clause to verify that no parameter set is missing any sequences.**
|
|
|
|
|
|
|
|
|
|
|
|
|
... | @@ -213,9 +212,9 @@ FROM Accuracy INNER JOIN Sequences ON Accuracy.Sequence = Sequences.Sequence |
... | @@ -213,9 +212,9 @@ FROM Accuracy INNER JOIN Sequences ON Accuracy.Sequence = Sequences.Sequence |
|
ORDER BY Parameter_Set;
|
|
ORDER BY Parameter_Set;
|
|
```
|
|
```
|
|
|
|
|
|
Write a SQL command that performs an INNER JOIN between Accuracy and Sequence. Then calculate the average Sensitivity and PPV for each RNA family across all parameter sets, as well as the average length for each family.
|
|
>**Write a SQL command that performs an INNER JOIN between Accuracy and Sequence. Then calculate the average Sensitivity and PPV for each RNA family across all parameter sets, as well as the average length for each family.**
|
|
|
|
|
|
Write a SQL command that performs an INNER JOIN between Accuracy and Sequence. Then calculate the average Sensitivity and PPV for 5S rRNA for each parameter set.
|
|
>**Write a SQL command that performs an INNER JOIN between Accuracy and Sequence. Then calculate the average Sensitivity and PPV for 5S rRNA for each parameter set.**
|
|
|
|
|
|
|
|
|
|
|
|
|
... | | ... | |