Data demands are coming from all sides of the modern business organization. And data professionals are under enormous pressure to manage multiple, different sources of data — quickly.
To effectively query and manage a wide variety of datasets requires a baseline knowledge of structured query language (SQL), which is based upon two foundational functionalities – scalar functions and aggregate functions. But even the pros can use a primer once in a while, so we’ve put together a list of the most commonly used scalar and aggregate SQL functions and examples to help wrangle and manage the wild world of data.
Scalar Functions in SQL
Data professionals often need to manipulate individual data values to standardize the format of data in a dataset or to extract specific pieces of data from a larger string. Scalar functions can help with this manipulation.
For example, a data professional might use the UCASE function to display customer names in uppercase or the SUBSTRING function to extract a customer’s phone number from a string containing their contact information.
The CONCAT function can combine multiple strings together into a single string. Say you wanted to pull a customer’s full name, so you could put in an email addressed to them. You would use the CONCAT function to combine a customer’s first and last name like this:
SELECT CONCAT(first_name, ‘ ‘, last_name) FROM customers;
The UCASE function is used to convert a string to uppercase. This standardizes the format of data in a dataset, so it’s possible to compare values and search for data. A data professional might use the UCASE function so all customer names are always displayed in uppercase.
For the UCASE function, the example query would look like this:
SELECT UCASE(first_name) FROM customers;
The LCASE is the equal and opposite function of UCASE. LCASE converts a string to lowercase. For example, a data professional might use the LCASE function to make sure that customer emails are always displayed in lowercase. To do that, you’d only need to enter:
SELECT LCASE(email) FROM customers;
The SUBSTRING function is used to extract a portion of a string, making it possible to quickly extract specific pieces of data from a larger string. A data professional might use the SUBSTRING function to extract just a customer’s phone number from a string containing their phone number, address, and email. That would look like the following:
SELECT SUBSTRING(phone_number, 1, 10) FROM customers;
Like SUBSTRING, the MID function is used to pull just part of a string and does it by specifying the start and length.
MID was created for VB6 (Visual Basic 6.0), but it’s still commonly used in SQL.
Both functions can be used to quickly extract specific pieces of data from a larger string. To use MID to pull customer addresses, you would enter the following:
SELECT MID(address, 10, 15) FROM customers;
The REPLACE function is used to replace all occurrences of a substring with another string — which is useful when you need to clean up data that has been poorly formatted or entered in a non-standard way.
Say a customer address includes an abbreviation that you need to replace with the full name. You can use the REPLACE function to quickly replace all occurrences of the abbreviation with the full name. It would look like this:
SELECT REPLACE(address, ‘St.’, ‘Street’) FROM customers;
Aggregate Functions in SQL
Aggregate functions can quickly calculate the aggregate value of a set of values — like the sum of a set of numbers or the average of a set of numbers.
Aggregate functions are useful for determining the total value of a dataset, such as the total revenue of a company or the total cost of goods sold.
The COUNT function is often used to determine the number of records in a table, as well as the number of records that meet certain criteria. With COUNT, it’s possible to determine the size and scope of a dataset and check whether it’s complete and accurate.
A data professional might use the COUNT function to determine the number of customers in a customer table or the number of products in a product table. To count the number of customers, you would enter the following:
SELECT COUNT(*) FROM customers;
The SUM function is used to calculate the sum of a set of values. A data professional might use the SUM function to calculate the total revenue of a company or the total cost of goods sold. You would enter:
SELECT SUM(revenue) FROM orders;
The AVG function is used to calculate the average in a set of values. It’s a simple but powerful function that comes in handy. A company might need to look up their average revenue or the average cost of goods sold with the following:
SELECT AVG(cost_of_goods_sold) FROM orders;
The MIN function is used to find the minimum value in a set of values, like the lowest annual revenue in the history of an organization or the lowest cost of goods sold with this query:
SELECT MIN(revenue) FROM orders;
The MAX function is used to find the maximum value in a set of values, such as the highest revenue of a company or the highest cost of goods sold. That query would look like this:
SELECT MAX(cost_of_goods_sold) FROM orders;
The GROUP_CONCAT function is used to combine the values of a field or expression across a group of records (rows) into a single string.
Unlike the CONCAT function, which combines two or more strings into a single string, the GROUP_CONCAT function combines the values of a field or expression across a group of records (rows) into a single string with the values comma separated. This enables a data professional to quickly combine multiple pieces of data from a large dataset into a single value.
A data professional might use the GROUP_CONCAT function to combine the names of all customers in a customer table into a single string or combine the names of all products in a product table into a single string. To combine all products into a single string, enter:
SELECT GROUP_CONCAT(name) FROM products;
Data Management with SQL Functions
To truly master SQL database management, you’ll need a database integration platform. Without an integration platform, managing data from multiple sources can be time-consuming and error-prone and lead to inconsistent data across systems.
Data integration is critical for efficiently integrating data from various sources, automating the ETL processes, and ensuring that the data is up to date and consistent across all systems.
If you are looking for more guidance on data integration, check out The Ultimate Guide to Data Integration. This comprehensive e-book covers the basics of data integration and provides practical advice for data professionals.