According to Google, there are currently more than 7,100 different languages spoken around the world. Surprising, isn't it? If we count all the computer programming languages that exist, there are more than 7,700 languages. Structured Query Language or SQL is one of the most widely spoken programming languages in the world. Do you speak SQL? Take a look at the 10 SQL tricks you should know!
As far as we know, the first language spoken by humans dates back to 3500 BC. The first programming language was conceived by Ada Lovelace in 1883. Structured Query Language or SQL was created in 1970 by two IBM researchers, Raymond Boyce and Donald Chamberlin.
With more than 50 years of history, SQL is currently in the top 10 of the most used programming languages around the world and probably the first in the world of data analytics.
It is a language that any data analyst or data scientist should know, since it is the basis for working with relational databases.
Structured Query Language or SQL is the programming language we use to communicate with databases. As its name suggests, the language allows data analysts to manipulate and extract data from a relational database, among many other things, and to perform queries on data sets.
In short, it is the language used to perform queries on a database.
SQL assumes that all data in a database are related and must be represented according to these relationships. For this reason, SQL is mainly used to work with relational databases. The most popular databases that use SQL are SQL Server, MySQL, Oracle and PostgreSQL.
The advantage of SQL, besides being one of the basic languages of data science, is that it is easy to read and understand. In fact, it is a language whose syntax is quite similar to that of English.
Another advantage of SQL is that it is a flexible and versatile language that offers many possibilities and ways of visualizing data to meet the specific demands of each user. SQL also helps us to ensure data integrity by making sure that the data is valid, consistent and error-free.
Example of a SQL command
One of the most basic functions we can perform in SQL is to join two independent tables using the 'Left Join' statement, which integrates all the records of the left table with the records of the right table:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName
As we can see in this example, the SQL language is very similar to English. 'SELECT' means select, 'FROM' is from and 'ORDER BY' means sort by. Basically, it is a programming language that uses the same syntax as English, so learning it is not complicated.
Of all, 'SELECT' is the most basic SQL command and the one that anyone who wants to get started in this computer language should learn first.
In SQL, 'SELECT' is used to describe the set of data we want to get from a database. That is, it is the main query and the one we will probably use the most for everything. To make the query, we must specify which tables contain the data, the relationships between the data, the fields or calculations applied to the data, the criteria that the data must meet to be included in the selection and how we want to sort the data.
Beyond 'SELECT', some of the other most basic SQL commands are:
SQL Command | What does it do? |
SELECT | It shows us a list of those fields that contain data of interest according to our query. |
FROM | It shows us the tables that contain the fields of the 'SELECT' clause. |
WHERE | It specifies the field criteria the data must meet to be included in the selection. |
ORDER BY | It specifies how to sort the data. |
GROUP BY | In commands that include aggregated functions, 'Group by' shows us the fields that are not summarized in the SELECT clause. |
HAVING | In commands that include aggregated functions, it specifies the conditions that apply to the fields summarized in the 'SELECT' clause. |
A continuación listamos los que, según nuestra experiencia, son los 10 mejores consejos para trabajar con este lenguaje de programación.
Here are what, in our experience, are the 10 best tips for working with this programming language.
When naming our data tables and table columns, it is advisable to use names as simple, easy to understand and descriptive as possible. This will make our work much easier when performing queries and executing our commands.
It is more advisable to use descriptive names than to use short names that are not understood.
Although it is not mandatory because SQL does not distinguish between uppercase and lowercase letters, most data analysts use lowercase for tables and columns and uppercase for the rest.
This distinction between lowercase and uppercase makes queries easier to read and understand.
It is essential to follow the SQL execution order when writing our queries and commands. The established order of SQL provides better performance and avoids problems.
SQL execution order is:
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY
The organization of databases in tables facilitates the understanding of the data and their relationships. However, it is not advisable to use a large number of one- or two-column tables, since this data can probably be integrated into other tables.
Having fewer tables will simplify the work and avoid overloading the database.
Try to find a middle ground. Normalizing data is efficient, but in the right measure. However, creating extremely long tables with lots and lots of columns is not recommended either.
It is preferable that our tables do not reach 20 columns in order not to overload the database and to facilitate data retrieval and analysis.
In SQL, relationships between tables are created through keys. Keys are fields entered into the relational table and there are many ways to name them. However, it is important to be consistent in naming the keys. Otherwise, it will be more complex to create the relationships.
It is advisable to name the secondary key after the primary key. For example, if our primary key is labeled "id", we can call the secondary key «tablename_id».
Do not use dots in table, column and field names. In SQL dots are used to separate tables from fields. Continuing with the previous example, when you want to refer to another element, use the underscore: «tablename_id».
The data maintenance process and the data validation and verification process will be easier and faster if we get rid of the legacy columns and tables.
The inclusion of dates is one of the most complex SQL clauses. Therefore, good SQL practices advise to store days, months and years in the format of the 'datetime' field. It is easiest to store a single time zone to avoid complications when analyzing time series.
The data contained in a relational database comes from an external source. In this sense, it is important not to lose sight of the source of origin and to label the views accordingly, referencing the origin of the data.