Introduction to full-
text search
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Brian Piccolo
Sr. Director, Digital Strategy
Topics
Full Text search
Extending PostgreSQL
Improving full text search with extensions
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
The LIKE operator
_ wildcard: Used to match exactly one character.
% wildcard: Used to match zero or more characters.
SELECT title
FROM film
WHERE title LIKE 'ELF%';
+----------------------+
| title |
+----------------------+
| ELF PARTY |
+----------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
The LIKE operator
SELECT title
FROM film
WHERE title LIKE '%ELF';
+----------------------+
| title |
+----------------------+
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+----------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
The LIKE operator
SELECT title
FROM film
WHERE title LIKE '%elf%'; it0s case sensitivd
+----------------------+
| title |
+----------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
LIKE versus full-text search
While the LIKE operator is a simple pattern matching tool in your SQL toolbox, it's an
expensive operation in terms of performance
SELECT title, description
FROM film
match operator
WHERE to_tsvector(title) @@ to_tsquery('elf'); this statement? accounts for variation and is case insensitive
+----------------------+
| title |
+----------------------+
| ELF PARTY |
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+----------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
What is full-text search?
Full text search provides a means for performing natural language queries of text data in your
database.
Stemming
Spelling mistakes
Ranking
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Full-text search syntax explained
SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('elf');
SELECT to_tsvector(description)
FROM film;
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Let's practice!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Extending
PostgreSQL
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Brian Piccolo
Sr. Director, Digital Strategy
User-de ned data types
Enumerated data types
CREATE TYPE dayofweek AS ENUM (
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
);
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Getting information about user-de ned data
types
SELECT typname, typcategory
FROM pg_type table to get info about all data types availabe in your databse
WHERE typname='dayofweek';
+-----------+-------------+
| typname | typcategory |
|-----------|-------------|
| dayofweek | E |
+-----------+-------------+
E= enum type
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Getting information about user-de ned data
types
SELECT column_name, data_type, udt_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name ='film';
+-----------------------------------------------+
| column_name | data_type | udt_name |
|-------------|-------------------|-------------|
| title | character varying | varchar |
| rating | USER-DEFINED | mpaa_rating |
+-----------------------------------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
User-de ned functions
CREATE FUNCTION squared(i integer) RETURNS integer AS $$
BEGIN
RETURN i * i;
END;
$$ LANGUAGE plpgsql; $$ specifies that the function will be suing sql as the language
SELECT squared(10);
+---------+
| squared |
|---------|
| 100 |
+---------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
User-de ned functions in the Sakila database
get_customer_balance(customer_id, effective_data): calculates the current outstanding balance for
a given customer.
inventory_held_by_customer(inventory_id): returns the customer_id that is currently renting an
inventory item or null if it's currently available.
inventory_in_stock(inventory_id): returns a boolean value of whether an inventory item is
currently in stock.
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Let's practice!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Intro to PostreSQL
extensions
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Brian Piccolo
Sr. Director, Digital Strategy
Intro to PostgreSQL extensions
Commonly used extensions
PostGIS adds support for allowing location queries to be run in sql
PostPic allows for image processing within the databse
fuzzystrmatch
extend full text search capabilities by finding similarities between strings
pg_trgm
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Querying extension meta data
Available Extensions Installed Extensions
SELECT name SELECT extname
FROM pg_available_extensions; FROM pg_extension;
+--------------------+ +---------+
| name | | name |
|--------------------| |---------|
| dblink | | plpgsql |
| pg_stat_statements | +---------+
+--------------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
--Enable the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
--Confirm that fuzzstrmatch has been enabled
SELECT extname FROM pg_extension;
+---------------+
| name |
|---------------|
| plpgsql |
| fuzzystrmatch |
+---------------+
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Using fuzzystrmatch or fuzzy searching
SELECT levenshtein('GUMBO', 'GAMBOL');
+-------------+
| levenshtein |
|-------------|
| 2 |
+-------------+
levenshtein distance represents the number of edits
required to convert one string to another string being
compared.
In a search application or when performing data
analysis on any data that contains manual user input,
you will always want to account for typos or incorrect
spellings. The levenshtein function provides a great
method for performing this task
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Compare two strings with pg_trgm
SELECT similarity('GUMBO', 'GAMBOL');
+------------+
| similarity |
|------------|
| 0.18181818 |
+------------+
determie the similarity of two strings using trigram 8groups of 3 consecutive chars in a str) matchings
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
-- Select the title and description columns
In this exercise, we are going to use many of the techniques
SELECT
and concepts we learned throughout the course to generate
title,
a data set that we could use to predict whether the words
description
and phrases used to describe a film have an impact on the
FROM
number of rentals.
film
WHERE
First, you need to create a tsvector from the description
-- Match "Astounding Drama" in the description
column in the film table. You will match against a tsquery to
to_tsvector(description) @@
determine if the phrase "Astounding Drama" leads to more
to_tsquery('Astounding & Drama');
rentals per month.
SELECT
title,
Next, create a new column using the description,
similarity function to rank the film -- Calculate the similarity
Let's practice!
descriptions based on this phrase. similarity(description, 'Astounding Drama')
FROM
film
WHERE
to_tsvector(description) @@
to_tsquery('Astounding & Drama')
ORDER BY
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
similarity(description, 'Astounding Drama') DESC;
Putting it All
Together
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Brian Piccolo
Sr. Director, Digital Strategy
Functions for manipulating data recap and review
Common data types in PostgreSQL
Date/time functions and operators
Parsing and manipulating text
PostgreSQL Extensions and full-text search
FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL
Thank you!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L