Using substring() to get parts of a string in PostgreSQL

The substring function in PostgreSQL is a versatile tool for extracting specific portions of a string based on character positions. This function is especially useful when you know the exact position of the substring you want to extract from a larger string.

Here's an example to illustrate the use of substring with both from and for parameters. Imagine you have a table named products with a column product_code that contains codes in the format "Category-XXXX", where "XXXX" represents a 4-digit numerical code and "Category" is a text identifier for the product category. If you want to extract just the numerical part of these codes, you could write the following SQL query:


SELECT substring(product_code from 10 for 4) AS numerical_code

FROM products;

In this query, substring(product_code from 10 for 4) extracts a substring starting from the 10th character of the product_code string (assuming the "Category" identifier is consistently 9 characters long followed by a hyphen) and extracts 4 characters from that starting point, which corresponds to the "XXXX" numerical code part of the product_code.

The substring function can be utilized with various parameters, but most commonly, it is used with the from and for keywords to specify the starting position and the length of the substring to be extracted, respectively.

  • The from parameter indicates the position at which the extraction should begin, with the first character in the string being position 1 (not 0).
  • The for parameter specifies the number of characters to extract from the starting point. If the for parameter is omitted, the substring from the from position to the end of the string is returned.

You can use the substring function without the for parameter if you want to extract all characters starting from a certain position to the end of the string. For example, to get everything after the hyphen in the product_code, assuming the hyphen is always at the 9th position, you could adjust the query like so:


SELECT substring(product_code from 10) AS code_suffix

FROM products;

Negative values - counting from the end of the string

Using negative values in the from parameter of the substring function in PostgreSQL allows you to specify the starting position for extraction from the end of the string, rather than the beginning. This feature is particularly useful when you want to extract a substring starting a certain number of characters back from the end of the string, but the length of the string is not consistent across records or is unknown.

When a negative value is used with from, PostgreSQL counts backward from the end of the string to determine where the substring extraction should start. For example, a from value of -4 means the extraction starts at the fourth character from the end of the string.

Here is an example that demonstrates the use of a negative from value in the substring function. Suppose you have a table named messages with a column msg_text that contains various text messages, and you want to extract the last three characters from each message (perhaps to analyze message endings or to find common sign-offs). You could write the following SQL query:


SELECT substring(msg_text from -3) AS ending

FROM messages;

In this query, substring(msg_text from -3) extracts the last three characters of each string in the msg_text column. This operation is performed by starting the substring extraction three characters from the end of the string and continuing to the end of the string since the for parameter is not specified.

This ability to use negative values for from enhances the flexibility of the substring function, making it a valuable tool for dealing with variable-length strings or when the part of interest is closer to the end of the string. It simplifies the process of extracting substrings from the end without needing to calculate the exact starting position based on the total length of the string, thereby streamlining data manipulation tasks in PostgreSQL.