Getting Started with SQL for Data Science: 4 Things you need to master
What is SQL?
Structured Query Language (SQL) is a standard database language used to create, maintain and retrieve relational databases. As a data scientist, you will find many instances where you will be required to work with relational databases (data stores in tables like excel). You will in a great way need SQL to be able to work with such databases. This article is a beginner’s guide and seeks to give a basic overview of how one can get started with SQL by explaining four features that the beginner needs to master i.e. creating a database, creating a table, inserting data into a table and using select statement.
A Little bit of History
For the lovers of history, this is an interesting section but if you are not a fun of history, feel free to skip this part. SQL was developed in 1972 by Raymond Boyce and Donald Chamberlin. This happened after Edgar F. Codd, an IBM researcher and Oxford Mathematician published a paper, “A Relational Model of Data for Large Shared Data Banks” in 1970. In his research paper, Codd proposed that all databases be represented in the form of relations. This was a significantly influential general theory of data management which became a foundation for Relational Databases Management Systems (RDBMS). It was upon this theory that Chamberlin and Boyce developed Structured English Query Language (SEQUEL) to interact with IBM System R database. Years later, in 1979, there was a trademark issue with the firm hence the name was changed from SEQUEL to SQL.
MySQL is one of the most common SQL database management systems available. It was developed by Oracle. It is a free open source version of SQL.
For download and guidelines on installation on Windows, Linux or Mac OS, click here.
Types of SQL Commands
SQL commands are categorized depending on their functionalities.
a. Data Definition Language (DDL)
These commands are used to define the structure of a database or its objects. They are used to create and make changes (modify) the physical structure of tables in a database. They include; CREATE, DROP, ALTER, RENAME and TRUNCATE.
b. Data Manipulation Language (DML)
These commands are used for data manipulation inside tables and databases. They include; SELECT, INSERT, UPDATE, DELETE, LOCK and MERGE.
c. Data Control Language (DCL)
These are used to give rights, permissions and other controls of the database system. They help in providing access to users and specifying their roles accordingly. These are; GRANT and REVOKE.
d. Data Query Language (DQL)
This command is used to retrieve and fetch data from databases/tables. It is often used in combination with other clauses in SQL. The only DQL command is SELECT.
1. Creating a Database
To get started with SQL, let us now dive into some basic programming using MySQL. I am using Windows OS and MySQL 8.0 Command Line Client to write my codes.
One of the things you need to master while getting started with SQL is creating a database. SQL stores data in databases. This is the beginning point while working with SQL. You can use a database to store as many tables as you need.
Let us create a database called employees as shown below. Note that at the end of every SQL query there must be semi-colon (;) before you run it.
The statement below the query (Query OK, …) shows that we successfully created the database. We can check the database by using the show databases query as shown below. This query shows all the databases in the MySQL server.
From the figure above, you can see that employees is one of the databases listed.
2. Creating a table
The second feature you need to master is creating tables. Creating a table is very important since all data in relational databases exist in tables. Tables contain columns and rows. Before creating a table, you must select the database into which you want to create the table.
You can then proceed and create a table as follows.
We have created a table called JobDetails. We can use show tables to view a list of all tables created in our database employees.
From the figure above, there is only one table in the database employees. When creating a table, you must specify the columns and their respective data types. For example, Name has a string data types varchar(255) while Salary has a numeric data type int (16). For more details about MySQL data types, you can check this documentation.
3. Inserting data into a table
After that, the next feature you need to master as a data scientist in working with SQL is inserting data into tables. As a data scientist, in some cases you will need to insert data into tables while on other instances, you will load formatted data stored in excel sheets into an existing table.
Now that we have successfully created a table, let us insert some hypothetical data into the table.
4. Using SELECT Statement
Finally, there is the SELECT statement. After creating a table with data, we can use SELECT statement to read/show the data in the table as follows;
select * – means select all. You can also specify selection by stating the column you want to show. For example, select Department will return data in the Department column only. The select statement in combination with other clauses can be used to query and manipulate tables/databases. A good mastery of this feature provides a firm foundation in using SQL.
In conclusion, SQL is very simple and easy to learn. If you are getting started with SQL, I suggest that you take time to play around with it by creating databases, creating tables and inserting data. You can also use select statement to query the data. These three features are very important starting points for a beginner in SQL.