Ever heard of SQL? You may have heard about it in the context of data analysis but never thought it would apply to you as a marketer. Or, you may have thought, “That’s for the advanced data users. I could never do that.”
Well, you couldn’t be more wrong. The most successful marketers are data-driven, and one of the most important parts of being data-driven is collecting data from databases quickly. SQL is the most popular tool out there for doing just that.
If your company already stores data in a database, you may need to learn SQL to access the data. But don’t worry: you’re in the right place to start. Let’s jump right in.
How to Query a SQL Database
- Ensure you have a database management application (e.g., MySQL Workbench or Sequel Pro).
- If not, download a database management application and work with your company to connect your database.
- Understand your database and its hierarchy.
- Find out which fields are in your tables.
- Begin writing an SQL query to pull your desired data.
What is SQL?
SQL is a programming language that allows you to manage and manipulate relational databases. Typically pronounced “sequel,” SQL is an essential tool for companies that need to regularly access and analyze large data sets. SQL allows you to retrieve specific data with a query, update existing data, insert new data, delete data, and much more.
With SQL, you don’t need to download and open a huge Excel spreadsheet to get the answers you seek.
You can ask questions like “Which customers purchased a red jumpsuit in the past six months?” and SQL fetches the data from your database and returns it to you without you needing to manually sift through a CSV.
Why use SQL?
SQL is a useful tool for companies that utilize data (hint, most of them do). Here are some examples and reasons why you might want to hop on the SQL train.
- Your data is safer in SQL since it is more difficult for users to accidentally delete it or corrupt it compared to an Excel sheet
- SQL allows you to manage datasets exceeding thousands of records
- SQL allows multiple users to access the same database seamlessly
- Role-based authorizations allow you to control the visibility of sensitive data
- SQL facilitates powerful data visualization
- SQL databases can enforce data integrity rules, which helps keep your data accurate and consistent
The SQL Database Hierarchy
An SQL database is a relational database, which means the data is structured in tables that are related to one another based on predefined relationships.
Information in an SQL database is structured hierarchically, similar to a family tree, meaning that items at the top level have a broader scope and branch downward into multiple, more specific sub-entities.
In the context of SQL, the top level is the database server, also called the instance. Your instance is where all of your data is stored. Within an instance, there can be multiple databases, each containing data organized based on some broad categorization.
A database is broken down into tables. The table is where the actual data lives. Once you’re at the table level, data is organized by columns and rows and housed within fields, almost exactly like an Excel spreadsheet.
Let’s pretend we’re working with multiple databases about people in the United States. Entering the query “SHOW DATABASES;” reveals each database in your system, including one titled NewEngland.
A database contains tables, and within those tables is your data.
If we use the query “SHOW TABLES in NewEngland;”, the result is tables for each state in New England:
people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Finally, you need to find out which fields are in the tables. Fields are the specific pieces of data that you can pull from your database.
For example, if you want to pull someone’s address, the field name may not just be “address”; it may be split into address_city, address_state, and address_zip. To figure this out, use the query “DESCRIBE people_massachusetts;”.
This provides a list of all the data you can pull using SQL.
Let’s do a quick review of the hierarchy using our New England example:
- Our database is NewEngland.
- Our tables within that database are people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
- Our fields within the people_massachusetts table include: address_city, address_state, address_zip, hair_color, age, first_name, and last_name.
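If you’d like to poke at this hierarchy without touching your company’s data, here is a minimal sketch using Python’s built-in sqlite3 module. It is an assumption-heavy stand-in: the two tables are hypothetical, and SQLite has no SHOW TABLES or DESCRIBE, so the sketch swaps in SQLite’s rough equivalents (the sqlite_master catalog and PRAGMA table_info).

```python
import sqlite3

# In-memory SQLite database standing in for NewEngland (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people_massachusetts (
        first_name TEXT, last_name TEXT, hair_color TEXT, age INTEGER,
        address_city TEXT, address_state TEXT, address_zip TEXT
    )
    """
)
conn.execute("CREATE TABLE people_maine (first_name TEXT, last_name TEXT)")

# SQLite's rough counterpart to SHOW TABLES: list tables from the catalog.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# SQLite's rough counterpart to DESCRIBE: one row per column, name at index 1.
fields = [row[1] for row in conn.execute("PRAGMA table_info(people_massachusetts)")]

print(tables)
print(fields)
```

In MySQL you would run the SHOW and DESCRIBE queries from this section directly; the point here is only that tables live inside a database and fields live inside tables.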
Now, let’s write some simple SQL queries to pull data from our NewEngland database.
How to Write SQL Queries
Before we begin, ensure you have a database management application that allows you to pull data from your database. Some options include MySQL Workbench and Sequel Pro.
Start by downloading one of these options, then talk to your company’s IT department about how to connect to your database. Your option will depend on your product’s back end, so check with your product team to ensure you select the correct one.
To learn how to write an SQL query, let’s use the following question:
Who are the people with red hair in Massachusetts who were born in 2003?
Using the SELECT command
SELECT chooses the fields that you want displayed in your results. This is the specific piece of information that you want to pull from your database. In the example above, we want the names of the people who fit the criteria, so we select first_name and last_name.
Query 1:
SELECT
first_name,
last_name
;
Using the FROM command
FROM pinpoints the table that you want to pull the data from.
In the earlier section, we learned that there were six tables for each of the six states in New England: people_connecticut, people_maine, people_massachusetts, people_newhampshire, people_rhodeisland, and people_vermont.
Because we’re looking for people in Massachusetts specifically, we’ll pull data from that specific table.
Here is our SQL query:
SELECT
first_name,
last_name
FROM
people_massachusetts
;
Using the WHERE command
WHERE allows you to filter a query to be more specific. In our example, we want to filter our query to include only people with red hair who were born in 2003. Let’s start with the red hair filter.
Query 2:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
;
hair_color could have been part of your initial SELECT statement if you wanted to look at all of the people in Massachusetts and their hair color. But if you want to filter to see only people with red hair, you can do so with a WHERE statement.
Using the BETWEEN command
Besides equals (=), BETWEEN is another operator you can use for conditional queries. A BETWEEN statement is true for values that fall between the specified minimum and maximum values, including the minimum and maximum themselves.
In our case, we can use BETWEEN to pull records from a specific year, like 2003.
Query 3:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
;
Using the AND command
AND allows you to add additional criteria to your WHERE statement. Remember, we want to filter by people who had red hair in addition to people who were born in 2003. Since our WHERE statement is taken up by the red hair criteria, how can we filter by a specific birth year as well?
That’s where the AND statement comes in. In this case, the condition after AND filters on a date, but it doesn’t have to. (Note: Check the format of your dates with your product team to ensure they are correct.)
Query 4:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
;
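To see SELECT, FROM, WHERE, BETWEEN, and AND working together, here is a small sketch that runs the same kind of query against an in-memory SQLite database from Python. The four people are made up for illustration, and the sketch assumes birth_date is stored as 'YYYY-MM-DD' text, which may not match how your own database stores dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts "
    "(first_name TEXT, last_name TEXT, hair_color TEXT, birth_date TEXT)"
)
# Hypothetical sample rows; dates are ISO-formatted text.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?, ?)",
    [
        ("Ana", "Silva", "red", "2003-05-14"),
        ("Ben", "Ortiz", "red", "1998-02-01"),      # red hair, wrong year
        ("Cara", "Nguyen", "brown", "2003-12-31"),  # right year, wrong hair color
        ("Dev", "Patel", "red", "2003-01-01"),      # born on the first day of the range
    ],
)

query = """
SELECT first_name, last_name
FROM people_massachusetts
WHERE hair_color = 'red'
  AND birth_date BETWEEN '2003-01-01' AND '2003-12-31'
"""
rows = conn.execute(query).fetchall()
print(sorted(rows))
```

Only Ana and Dev satisfy both conditions. Dev is included because BETWEEN counts the boundary dates themselves.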
Using the OR command
OR can also be used with a WHERE statement. With AND, both conditions must be true for a record to appear in results (e.g., hair color must be red and the person must be born in 2003). With OR, at least one condition must be true (e.g., hair color must be red or the person must be born in 2003).
Here’s what an OR statement looks like in action.
Query 5:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
OR
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
;
Using the NOT command
NOT is used in a WHERE statement to display records for which the specified condition is untrue. If we want to pull up all Massachusetts residents without red hair, we can use the following query.
Query 6:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE NOT
hair_color = 'red'
;
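The difference between AND, OR, and NOT is easiest to see side by side. The sketch below is only an illustration: it uses Python’s sqlite3 module, four invented people, and a hypothetical helper called names() that plugs a WHERE condition into the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts (first_name TEXT, hair_color TEXT, birth_date TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?)",
    [
        ("Ana", "red", "2003-05-14"),
        ("Ben", "red", "1998-02-01"),
        ("Cara", "brown", "2003-12-31"),
        ("Eli", "black", "1990-07-04"),
    ],
)


def names(where_clause):
    """Return first names matching a hard-coded WHERE condition, sorted A to Z."""
    sql = (
        "SELECT first_name FROM people_massachusetts "
        f"WHERE {where_clause} ORDER BY first_name"
    )
    return [row[0] for row in conn.execute(sql)]


born_2003 = "birth_date BETWEEN '2003-01-01' AND '2003-12-31'"

and_names = names(f"hair_color = 'red' AND {born_2003}")  # both must be true
or_names = names(f"hair_color = 'red' OR {born_2003}")    # at least one must be true
not_names = names("NOT hair_color = 'red'")               # condition must be untrue

print(and_names, or_names, not_names)
```

AND returns only Ana, OR returns Ana, Ben, and Cara, and NOT returns Cara and Eli. The helper builds SQL from fixed strings purely for demonstration; it is not a pattern to use with text typed in by other people.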
Using the ORDER BY command
Calculations and organization also can be done within a query. That’s where the ORDER BY and GROUP BY clauses come in. First, we’ll look at our SQL queries with ORDER BY and then GROUP BY. Then, we’ll briefly examine the difference between the two.
An ORDER BY clause allows you to sort by any of the fields that you have specified in the SELECT statement. In this case, let’s order by last name.
Query 7:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
Using the GROUP BY command
GROUP BY is similar to ORDER BY but aggregates similar data: rows that share the same value are collapsed into one row per group, usually so you can run a calculation such as COUNT on each group. For example, if several people in your results share a last name, you can use GROUP BY to count how many times each last name appears.
Query 8:
SELECT
last_name,
COUNT(*)
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
last_name
;
ORDER BY VS. GROUP BY
To show the difference between an ORDER BY statement and a GROUP BY statement, let’s briefly step outside our Massachusetts example to look at a very simple dataset: a list of four employees’ ID numbers and names, in which the name Peter appears twice.
If we were to use an ORDER BY statement on this list, the names of the employees would get sorted in alphabetical order. All four rows would still appear, just in a different order.
If we used a GROUP BY statement with COUNT instead, the employees would be counted based on the number of times each name appeared in the initial table. Because Peter appeared twice in the initial table, the result would have one row per distinct name, with a count of 2 for Peter.
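Here is a small runnable sketch of that comparison using Python’s sqlite3 module. The employees table and the names John and Greg are invented for illustration; the only detail taken from the example above is that there are four employees and Peter appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, name TEXT)")
# Hypothetical data: four employees, with the name Peter appearing twice.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Peter"), (2, "John"), (3, "Greg"), (4, "Peter")],
)

# ORDER BY keeps every row and only changes the order.
ordered = conn.execute(
    "SELECT employee_id, name FROM employees ORDER BY name, employee_id"
).fetchall()

# GROUP BY collapses rows with the same name; COUNT(*) tallies each group.
grouped = conn.execute(
    "SELECT name, COUNT(*) FROM employees GROUP BY name ORDER BY name"
).fetchall()

print(ordered)  # four rows, sorted by name
print(grouped)  # three rows, one per distinct name
```

ORDER BY returns all four rows, while GROUP BY returns three rows and reports a count of 2 for Peter.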
With me so far? Okay, let’s return to the SQL query we’ve been creating about red-haired Massachusetts people born in 2003.
Using the LIMIT command
It may take a long time to run your queries, depending on the amount of data you have in your database. This can be frustrating, especially if you’ve made an error in your query and now need to wait before continuing. If you want to test a query, LIMIT lets you cap the number of results you get.
For example, if we suspect thousands of people have red hair in Massachusetts, we may want to test out our query using LIMIT before we run it in full to ensure we’re getting the information we want. Let’s say, for instance, we only want to see the first 100 people in our result.
Query 8:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
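As a quick sanity check of how LIMIT behaves, the sketch below loads 250 made-up red-haired people into an in-memory SQLite table and asks for the first 100 by last name. The generated names (Name000 through Name249) are placeholders, not real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_massachusetts (last_name TEXT, hair_color TEXT)")
# 250 hypothetical rows, all with red hair.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?)",
    [(f"Name{i:03d}", "red") for i in range(250)],
)

rows = conn.execute(
    """
    SELECT last_name
    FROM people_massachusetts
    WHERE hair_color = 'red'
    ORDER BY last_name
    LIMIT 100
    """
).fetchall()

print(len(rows), rows[0][0], rows[-1][0])
```

Because ORDER BY runs before LIMIT, you get the first 100 last names in alphabetical order rather than an arbitrary 100 rows.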
Using the INSERT INTO command
In addition to retrieving information from a relational database, SQL can also be used to modify the contents of a database.
Of course, you’ll need permission to change your company’s data. But, in case you’re ever in charge of managing the contents of a database, we’ll share some queries you should know.
First is the INSERT INTO statement for putting new values into your database.
If we want to add a new person to the Massachusetts table, we can do so by first providing the name of the table we want to modify and the fields within the table we want to add to.
Next, we write VALUES with each respective value we want to add. Text values go inside single quotes.
Query 9:
INSERT INTO
people_massachusetts (address_city, address_state, address_zip, hair_color, age, first_name, last_name)
VALUES
('Cambridge', 'Massachusetts', '02139', 'blonde', 32, 'Jane', 'Doe')
;
Alternatively, if you are adding a value to every field in the table, you don’t need to specify fields. The values will be added to columns in the order they are listed in the query.
Query 10:
INSERT INTO
people_massachusetts
VALUES
('Cambridge', 'Massachusetts', '02139', 'blonde', 32, 'Jane', 'Doe')
;
If you only want to add values to specific fields, you must specify these fields. Say we only want to insert a record with first_name, last_name, and address_state — we can use the following query.
Query 11:
INSERT INTO
people_massachusetts (first_name, last_name, address_state)
VALUES
('Jane', 'Doe', 'Massachusetts')
;
Using the UPDATE Command
You can use UPDATE if you want to replace existing values in your database with different ones. What if, for example, someone is recorded in the database as having red hair when they actually have brown hair? We can update this record with UPDATE and WHERE statements.
Query 12:
UPDATE
people_massachusetts
SET
hair_color = 'brown'
WHERE
first_name = 'Jane'
AND
last_name = 'Doe'
;
Or, say there’s a problem in your table where some values for “address_state” appear as “Massachusetts” and others appear as “MA.” To change all instances of “MA” to “Massachusetts,” we can use a simple query and update multiple records simultaneously.
Query 13:
UPDATE
people_massachusetts
SET
address_state = 'Massachusetts'
WHERE
address_state = 'MA'
;
Be careful when using UPDATE. If you don’t specify which records to change with a WHERE statement, you’ll change that field in every record in the table.
Using the DELETE command
DELETE removes records from your table. Like with UPDATE, be sure to include a WHERE statement so you don’t accidentally delete every record in your table.
For example, if we find several records in our people_massachusetts table for people who actually live in Maine, we can delete these entries quickly by targeting the address_state field.
Query 13:
DELETE FROM
people_massachusetts
WHERE
address_state = 'Maine'
;
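If you want to practice INSERT INTO, UPDATE, and DELETE somewhere harmless, a throwaway in-memory database is a safe place. The sketch below uses Python’s sqlite3 module with a trimmed-down, hypothetical version of the table (four fields, three invented people) and checks how many records each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts "
    "(first_name TEXT, last_name TEXT, hair_color TEXT, address_state TEXT)"
)
# Hypothetical starting rows.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?, ?)",
    [
        ("Sam", "Lee", "black", "MA"),
        ("Ana", "Silva", "red", "Massachusetts"),
        ("Kim", "Roy", "brown", "Maine"),
    ],
)

# INSERT INTO: add one new person, naming the fields explicitly.
conn.execute(
    "INSERT INTO people_massachusetts (first_name, last_name, hair_color, address_state) "
    "VALUES ('Jane', 'Doe', 'red', 'Massachusetts')"
)

# UPDATE with WHERE: fix a single person's hair color.
fixed = conn.execute(
    "UPDATE people_massachusetts SET hair_color = 'brown' "
    "WHERE first_name = 'Jane' AND last_name = 'Doe'"
).rowcount

# UPDATE with WHERE: normalize every 'MA' to 'Massachusetts'.
normalized = conn.execute(
    "UPDATE people_massachusetts SET address_state = 'Massachusetts' "
    "WHERE address_state = 'MA'"
).rowcount

# DELETE with WHERE: remove only the Maine records.
deleted = conn.execute(
    "DELETE FROM people_massachusetts WHERE address_state = 'Maine'"
).rowcount

remaining = conn.execute(
    "SELECT first_name, hair_color, address_state "
    "FROM people_massachusetts ORDER BY first_name"
).fetchall()
print(fixed, normalized, deleted, remaining)
```

Each statement here changes exactly one record because of its WHERE condition. Dropping the WHERE from either UPDATE or the DELETE would apply it to every record instead.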
Bonus: Advanced SQL Tips
Now that you’ve learned how to create a simple SQL query, let’s discuss some other tricks that you can use to take your queries up a notch, starting with the asterisk.
* (asterisk)
When you add an asterisk character to your SQL query, it tells the query that you want to include all the columns of data in your results.
In the Massachusetts example we’ve been using, we’ve only had two column names: first_name and last_name. But let’s say we had 15 columns of data that we want to see in our results. It would be a pain to type all 15 column names in the SELECT statement. Instead, if you replace the names of those columns with an asterisk, the query will know to pull all of the columns into the results.
Here’s what the SQL query would look like.
Query 13:
SELECT
*
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
LIMIT
100
;
% (percent symbol)
The percent symbol is a wildcard character, meaning it can stand in for any number of characters (including none) in a database value. Wildcard characters are helpful for locating records that share common characters. They are typically used with the LIKE operator to find a pattern in the data.
For instance, if we want to get the names of every person in our table whose zip code begins with “02”, we can write the following query.
Query 14:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
address_zip LIKE '02%'
;
Here, “%” stands in for whatever characters follow “02”, so this query turns up any record with a value for address_zip that begins with “02”.
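To check how the wildcard behaves, here is a brief sketch using Python’s sqlite3 module and four invented zip codes. It assumes address_zip is stored as text, which is what keeps the leading zero intact; if your database stores zip codes as numbers, the pattern would need to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_massachusetts (first_name TEXT, address_zip TEXT)")
# Hypothetical rows; zip codes are text so leading zeros survive.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?)",
    [
        ("Ana", "02139"),
        ("Ben", "01002"),
        ("Cara", "02210"),
        ("Dev", "10201"),  # contains "02" but does not start with it
    ],
)

matches = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM people_massachusetts "
        "WHERE address_zip LIKE '02%' ORDER BY first_name"
    )
]
print(matches)
```

Only Ana and Cara match. Dev’s zip code contains “02” in the middle, but the pattern '02%' only matches values that start with it.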
LAST 30 DAYS
Once I started using SQL regularly, I found that one of my go-to queries involved finding which people took an action or fulfilled a certain set of criteria within the last 30 days.
Let’s pretend today is December 1, 2021. You could create these parameters by making the birth_date span between November 1, 2021, and November 30, 2021. That SQL query would look like this:
Query 15:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date BETWEEN '2021-11-01' AND '2021-11-30'
ORDER BY
last_name
LIMIT
100
;
But that would require considering which dates cover the last 30 days, and you’d have to constantly update this query.
Instead, to make the dates automatically span the last 30 days no matter which day it is, you can type this under AND: birth_date >= (DATE_SUB(CURDATE(), INTERVAL 30 DAY))
(Note: You’ll want to double-check this syntax with your product team because it may differ based on the software you use to pull your SQL queries.)
Your full SQL query would, therefore, look as follows.
Query 16:
SELECT
first_name,
last_name
FROM
people_massachusetts
WHERE
hair_color = 'red'
AND
birth_date >= (DATE_SUB(CURDATE(), INTERVAL 30 DAY))
ORDER BY
last_name
LIMIT
100
;
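The same rolling 30-day idea can be sketched outside MySQL. DATE_SUB and CURDATE are MySQL functions and aren’t available in SQLite, so this illustration takes a different route: it works out the cutoff date in Python and passes it into the query as a parameter. The people and dates are generated relative to today purely for demonstration.

```python
import sqlite3
from datetime import date, timedelta

today = date.today()
cutoff = today - timedelta(days=30)  # the oldest date that still counts


def days_ago(n):
    """Return the date n days before today as 'YYYY-MM-DD' text."""
    return (today - timedelta(days=n)).isoformat()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts (first_name TEXT, hair_color TEXT, birth_date TEXT)"
)
# Hypothetical rows dated relative to today.
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?)",
    [
        ("Ana", "red", days_ago(5)),
        ("Ben", "red", days_ago(30)),   # exactly on the cutoff
        ("Cara", "red", days_ago(31)),  # one day too old
        ("Dev", "brown", days_ago(2)),  # recent, but not red-haired
    ],
)

recent = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM people_massachusetts "
        "WHERE hair_color = 'red' AND birth_date >= ? ORDER BY first_name",
        (cutoff.isoformat(),),
    )
]
print(recent)
```

Ana and Ben are returned no matter what day you run it, because the cutoff moves with today’s date.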
COUNT
In some cases, you may want to count the number of times each value of a field appears. For example, let’s say you want to count the number of times the different hair colors appear for the people you are tallying up from Massachusetts.
In this case, COUNT will come in handy, so you don’t have to manually add up the number of people with different hair colors or export that information to Excel.
Here’s what that SQL query would look like:
Query 17:
SELECT
hair_color,
COUNT(hair_color)
FROM
people_massachusetts
WHERE
birth_date BETWEEN '2003-01-01' AND '2003-12-31'
GROUP BY
hair_color
;
AVG
AVG calculates the average of an attribute in the results of your query, excluding NULL values (empty). In our example, we could use AVG to calculate the average age of Massachusetts residents in our query.
Here’s what our SQL query could look like:
Query 18:
SELECT
AVG(age)
FROM
people_massachusetts
;
SUM
SUM is another simple calculation you can do in SQL. It adds up all the values of a field in your query results. So, if we want to add up all the ages of Massachusetts residents, we can use the following query.
Query 19:
SELECT
SUM(age)
FROM
people_massachusetts
;
Using MIN and MAX
MIN and MAX are two SQL functions that give you the smallest and largest values of a given field. We can use them to find the ages of the youngest and oldest members of our Massachusetts table.
The following query will give us the age of the youngest person.
Query 20:
SELECT
MIN(age)
FROM
people_massachusetts
;
And this query gives us the age of the oldest:
Query 21:
SELECT
MAX(age)
FROM
people_massachusetts
;
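To see COUNT, AVG, SUM, MIN, and MAX together, here is a short sketch run against an in-memory SQLite database from Python. The four people are invented, and one of them deliberately has no age recorded so you can see how empty (NULL) values are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_massachusetts (first_name TEXT, hair_color TEXT, age INTEGER)"
)
# Hypothetical rows; Dev's age is unknown (NULL).
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?, ?)",
    [("Ana", "red", 20), ("Ben", "red", 30), ("Cara", "brown", 40), ("Dev", "brown", None)],
)

# COUNT with GROUP BY: how many people have each hair color.
by_color = conn.execute(
    "SELECT hair_color, COUNT(hair_color) FROM people_massachusetts "
    "GROUP BY hair_color ORDER BY hair_color"
).fetchall()

# Aggregates over the age field; the NULL age is ignored by all of them.
stats = conn.execute(
    "SELECT COUNT(age), AVG(age), SUM(age), MIN(age), MAX(age) FROM people_massachusetts"
).fetchone()

# MIN(age) is only a number; a subquery finds whose age that is.
youngest = conn.execute(
    "SELECT first_name, age FROM people_massachusetts "
    "WHERE age = (SELECT MIN(age) FROM people_massachusetts)"
).fetchall()

print(by_color, stats, youngest)
```

The averages and totals come from the three known ages only. The last query is one possible way to get the youngest person’s full record rather than just the age.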
Using the JOIN command
There may be a time when you need to access information from two different tables in one SQL query. In SQL, you can use a JOIN clause to do this.
(For those familiar with Excel formulas, this is similar to using the VLOOKUP formula when you need to combine information from two different sheets in Excel.)
Let’s say we have one table that has data on all Massachusetts residents’ user IDs and birthdates. In addition, we have an entirely separate table containing all Massachusetts residents’ user IDs and their hair color.
If we want to determine the hair color of Massachusetts residents born in 2003, we’d need to access information from both tables and combine them. This works because both tables share a matching column: user IDs.
Our SELECT statement will also change slightly because we’re calling out fields from two different tables. Instead of just listing out the fields we want to include in our results, we’ll need to specify which table they’re coming from.
(Note: The asterisk may be useful here so your results include all the columns from both tables.)
To specify a field from a specific table, all we have to do is combine the table’s name with the field’s name. For example, our SELECT statement would say “table.field”, with the period separating the table and field names.
We’re also assuming a few things in this case:
- The Massachusetts birthdate table includes the following fields: first_name, last_name, user_id, birthdate
- The Massachusetts hair color table includes the following fields: user_id, hair_color
Your SQL query would look as follows.
Query 21:
SELECT
birthdate_massachusetts.first_name,
birthdate_massachusetts.last_name
FROM
birthdate_massachusetts JOIN haircolor_massachusetts USING (user_id)
WHERE
hair_color = 'red'
AND
birthdate BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY
last_name
;
This query would join the two tables using the field “user_id” which appears in both the birthdate_massachusetts table and the haircolor_massachusetts table. You can then see a table of people born in 2003 with red hair.
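Here is a small runnable sketch of that join using Python’s sqlite3 module. The two tables follow the fields assumed above (with the date field named birthdate), and the three people in them are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE birthdate_massachusetts "
    "(user_id INTEGER, first_name TEXT, last_name TEXT, birthdate TEXT)"
)
conn.execute("CREATE TABLE haircolor_massachusetts (user_id INTEGER, hair_color TEXT)")

# Hypothetical rows; user_id is the column the two tables share.
conn.executemany(
    "INSERT INTO birthdate_massachusetts VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Silva", "2003-05-14"),
        (2, "Ben", "Ortiz", "2003-08-09"),
        (3, "Cara", "Nguyen", "1999-03-03"),
    ],
)
conn.executemany(
    "INSERT INTO haircolor_massachusetts VALUES (?, ?)",
    [(1, "red"), (2, "brown"), (3, "red")],
)

query = """
SELECT
    birthdate_massachusetts.first_name,
    birthdate_massachusetts.last_name
FROM birthdate_massachusetts
JOIN haircolor_massachusetts USING (user_id)
WHERE hair_color = 'red'
  AND birthdate BETWEEN '2003-01-01' AND '2003-12-31'
ORDER BY last_name
"""
joined = conn.execute(query).fetchall()
print(joined)
```

Only Ana comes back: Ben was born in 2003 but has brown hair, and Cara has red hair but was born in 1999. Neither table could answer the question alone.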
Using a CASE statement
Use a CASE statement when you want to return different results to your query based on which condition is met. Conditions are evaluated in order. The corresponding result is returned once a condition is met, and all following conditions are skipped over.
You can include an ELSE condition at the end to specify the result returned when no conditions are met. Without an ELSE, the CASE statement returns NULL if no conditions are met.
Here’s an example of using CASE to return a string based on the query.
Query 22:
SELECT
first_name,
last_name,
CASE
WHEN hair_color = 'brown' THEN 'This person has brown hair.'
WHEN hair_color = 'blonde' THEN 'This person has blonde hair.'
WHEN hair_color = 'red' THEN 'This person has red hair.'
ELSE 'Hair color not known.'
END
FROM
people_massachusetts
;
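To watch CASE pick a result row by row, here is a brief sketch using Python’s sqlite3 module. The three people are made up, and the column label hair_note is a hypothetical alias added only so the computed column has a readable name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_massachusetts (first_name TEXT, hair_color TEXT)")
# Hypothetical rows; Cara's hair color was never recorded (NULL).
conn.executemany(
    "INSERT INTO people_massachusetts VALUES (?, ?)",
    [("Ana", "brown"), ("Ben", "red"), ("Cara", None)],
)

query = """
SELECT
    first_name,
    CASE
        WHEN hair_color = 'brown' THEN 'This person has brown hair.'
        WHEN hair_color = 'red' THEN 'This person has red hair.'
        ELSE 'Hair color not known.'
    END AS hair_note
FROM people_massachusetts
ORDER BY first_name
"""
labeled = conn.execute(query).fetchall()
print(labeled)
```

Cara falls through to the ELSE branch because a missing hair color matches none of the WHEN conditions.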
Basic SQL Queries Marketers Should Know
Congratulations! You’re ready to run your own SQL queries.
While there’s a lot more you can do with SQL, I hope you found this overview of the basics helpful so you can get your hands dirty.
With a strong foundation of the basics, you can navigate SQL better and work toward some of the more complex examples.
Editor’s note: This post was originally published in March 2015 and has been updated for comprehensiveness.