mysql: “thai food near me”, or: doing geo distance calculations in your database.
we're all familiar with the whole "thai food near me" thing. you type that phrase into your phone and it responds with a list of thai restaurants that are, well, near you. and we have a kind-of understanding of how that works under the hood: google or whoever has a database of thai restaurants with their latitudes and longitudes and knows our location from our phone and then does 'some process' to figure out which thai places are nearby.
in this post,we'll be going over that 'some process' part, looking at how to use mysql to do some standard location stuff. we'll cover mysql's POINT
and POLYGON
types, finding the distance between two points on a sphere (which the earth, contrary to what you may have read on the internet, is), determining if a point is inside of a polygon defined by points, and look at things like 'spatial reference systems' which define how coordinates are plotted on the surface of the earth.
a restaurant attempts an sql injection attack.
making a POINT
in mysql
mysql has a whole suite of functions and data types devoted to spatial data. the number of them is dizzying and the official documentation is almost criminally dense. fortunately, we can accomplish what we want to do using only a small subset. we'll start with POINT
.
POINT
is both a datatype and a function that returns that data type. if we wanted to define a point on a good, old-fashioned x/y graph, we can do it like so:
SELECT POINT(3, 7);
the result of that query is our x/y point in a value of type POINT
. mysql stores POINT
in a binary format, so the result of our select is not particularly useful:
SELECT POINT(3, 7);
+------------------------------------------------------+
| POINT(3, 7) |
+------------------------------------------------------+
| 0x00000000010100000000000000000008400000000000001C40 |
+------------------------------------------------------+
mysql addresses this by providing two convenience functions to extract the x
and y
values from a point:
ST_X()
ST_Y()
they both accept a POINT
value as an argument. for instance:
SELECT ST_X(POINT(3,7)) AS x, ST_Y(POINT(3,7)) AS y;
+------+------+
| x | y |
+------+------+
| 3 | 7 |
+------+------+
because POINT
is a data type, we can use it in table definitions, just like we would INT
or VARCHAR
.
CREATE TABLE `some_coords` (
`coords` POINT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
if we have a column of type POINT
, only POINT
data can go in there. we'll cover this more later.
a short digression on x, y, maps and a lack of standards
we all learned in school how to plot points on blue-lined graph paper using the x-axis, which runs horizontally, and the y-axis, which is vertical. points were defined as x/y
; horizontal first, vertical second. this is the way it has been forever, and everyone agrees on it.
except the people who make maps.
the people who make maps define points as latitude/longitude. latitude, of course, runs north-south, which is vertical on a map. longitude, the east-west axis, is horizontal. the map people, in essence, decided to use y/x
.
obviously, this creates problems. let's look at what happens when we create a POINT
representing the location of the ship & anchor pub in central calgary, alberta (where i have been known, on occasion, to blog from)
SELECT ST_X(POINT(51.037913, -114.073277)) as longitude, ST_Y(POINT(51.037913, -114.073277)) as latitude;
+-----------+-------------+
| longitude | latitude |
+-----------+-------------+
| 51.037913 | -114.073277 |
+-----------+-------------+
latitude and longitude are mixed up; our pub is in the wrong place. what's worse, since the maximum value for latitude is 90, we've put the ship & anchor somewhere out in space. not good.
mysql addresses this issue by providing two functions to replace ST_X()
and ST_Y()
when using points on a map or globe:
ST_Latitude()
ST_Longitude()
this is good stuff, except, if we try to use them in our above query, we get this error message:
ERROR 3726 (22S00): Function st_latitude is only defined for geographic spatial reference systems, but one of its arguments is in SRID 0, which is not geographic.
this error looks daunting (what the hell is SRID 0
?), but all mysql is telling us here is that the POINT
s we're using haven't been defined as being map points. they're just regular, old bags of x's and y's.
we'll go over SRID
s and SRS
s later on.
latitude and longitude has always been y/x.
a better way to make a POINT
: well-known text
so far, we've selected a value of type POINT
by using the function POINT()
. this works fine for now, but there is a better, more-flexible way to do this that will make working with POINT
s and POLYGON
s easier when things start getting more complicated.
the ST_GeomFromText()
function takes as an argument a text expression (a string) of the geometric object we want to create (a POINT
in this case), and returns a value of the correct type.
these text expressions are formatted using a syntax called "well-known text". the format is, basically, the name of the geometric object you want to create (ie. POINT
) and the coordinates that define it. let's look:
SELECT ST_GeomFromText('POINT(51.037913 -114.073277)');
this looks very straightforward, but there's a glaring question: where is the comma separating the arguments in our POINT
call?
the answer is that the well-known text here isn't a call to the function POINT()
, it's a definition of the data type POINT
.
back at the beginning of this discussion, we went over how POINT
is both a function and a datatype. when we use POINT()
as a function, the coordinates are arguments that are separated by a comma. when we define a value using POINT
as a type, the coordinates do not take a comma.
we can use ST_GeomFromText()
to create any sort of geometric object that's defined in the well-known text. there aren't many of these, and we'll be sticking in this post to POINT
s and POLYGON
s (which include things like squares and triangles).
spatial reference systems: not all points are the same
on my desk i have a small chess board where i occasionally work through annotated games. it's my idea of "fun". that chess board is a coordinates system. i also have a large, widescreen computer monitor on my desk. it's a coordinate system as well.
however, just because my chess board and monitor are both coordinate systems doesn't mean that the coordinates from one can be transferred to the other. the x/y position of my white bishop is meaningless on my monitor; that x/y point only has meaning in the context of the chess board.
a context defines things like the origin points, axes, units of measurement and the like. useful stuff that helps us make sense of what a coordinate actually means.
when it comes to plotting points and lines and polygons on the surface of the earth, that context is called a 'spatial reference system', or SRS.
there are a lot of different SRSs. a lot. some of them treat the earth as a sphere, others as a projected flat map. some cover the entire planet, many more only deal with a sub region, like a country. some include a z axis from the center of the earth, most don't.
if we want to peruse all the different SRSs that mysql has, we can run this select:
SELECT * FROM INFORMATION_SCHEMA.ST_SPATIAL_REFERENCE_SYSTEMS;
there are about five thousand of them.
fortunately, we don't need to read through all of these to choose one. we're just going to use 4326, a global, unprojected coordinate system that (just about) everybody uses.
that 4326 number is the id of the SRS. it's called, unsurprisingly, an SRID
. if we remember back to when we tried to call the ST_Latitude()
function on the POINT
we made, we got the error:
ERROR 3726 (22S00): Function st_latitude is only defined for geographic spatial reference systems, but one of its arguments is in SRID 0, which is not geographic.
now that we have more of an understanding about SRSs, we can see that here mysql is complaining that we are asking for the latitude, but the SRS of our POINT
isn't one that uses latitude and longitude. the SRS we are using, according to the error message, is SRID
0.
SRID
0 is just a 'flat, cartesian plane' with no units. think of it as a sheet of that blue-lined graph paper from math class stretching off into infinity in all directions. this is a great SRS for some applications, but is not very meaningful for using latitude and longitude to map places on a spherical earth. SRID
0 is the default SRS that mysql assigns to POINT
s (and other shapes) when one is not specified.
by comparison, the 4326 SRS is specifically designed for global mapping. it treats the surface of the earth as an ellipsoid, uses degrees for measurement and defines the axes as the equator and prime meridian. exactly what we want. 4326 is, in turn, based on a big set of data about the earth called the world geodetic system 1984, or WSG84, that was compiled in that year in an effort to unify and standardize the mishmash of national mapping data. if you're one of those 'further reading' types, you can read over a detailed explainer on SRID
4326 here or peruse the surprisingly-entertaining wikipedia entry on WSG84.
a developer accidentally uses SRID 0
for their geolocation select.
actually using SRID
4326
using SRID
4326 as our SRS when creating a POINT
is pretty straightforward; we just add the SRID
as a second argument to ST_GeomFromText()
.
SELECT ST_GeomFromText('POINT(51.037913 -114.073277)', 4326);
and, just like that, our x/y values are now treated as longitude and latitude coordinates on earth. let's try ST_Latitude()
again:
SELECT ST_Latitude(ST_GeomFromText('POINT(51.037913 -114.073277)', 4326)) AS latitude;
+-----------+
| latitude |
+-----------+
| 51.037913 |
+-----------+
exactly what we wanted.
creating a table for our POINT
s
selecting geometric data like POINT
s (or POLYGON
s or LINESTRING
s) created using literal data is fine, but what we probably want to do is persist that data in a table so we can use it later. let's do that. we'll start with creating our table.
CREATE TABLE `calgary` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`coords` POINT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
here, we've defined a pretty standard-looking table of notable locations in the city of calgary, alberta. the interesting column here is coords
, which is defined as a POINT
.
that POINT
doesn't have an SRS associated with it. this means that on every insert, we will have to define the SRID
we are using for our point. this is very flexible, but if we want to we can add the SRS to the column definition.
CREATE TABLE `calgary` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`coords` POINT SRID 4326 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
by defining our coords
column as POINT SRID 4326
we are enforcing that any POINT
in that column must be of SRID
4326. if we try to insert a point that has a different SRID
, mysql will complain with an error like:
ERROR 3643 (HY000): The SRID of the geometry does not match the SRID of the column 'coords'. The SRID of the geometry is 0, but the SRID of the column is 4326. Consider changing the SRID of the geometry or the SRID property of the column.
for all the examples going forward, we will be using a table with a coords
column that does not define the SRID
.
now that we have a table, we can insert some rows. we'll add a list of calgary landmarks.
INSERT INTO calgary VALUES (null, 'calgary tower', ST_GeomFromText('POINT(51.044270 -114.062019)', 4326));
INSERT INTO calgary VALUES (null, 'peace bridge', ST_GeomFromText('POINT(51.0542 -114.0793)', 4326));
INSERT INTO calgary VALUES (null, 'saddledome', ST_GeomFromText('POINT(51.0374 -114.0519)', 4326));
INSERT INTO calgary VALUES (null, 'national music centre', ST_GeomFromText('POINT(51.04250 -114.06083)', 4326));
INSERT INTO calgary VALUES (null, 'baitun nur mosque', ST_GeomFromText('POINT(51.101743 -113.972039)', 4326));
INSERT INTO calgary VALUES (null, 'olympic oval', ST_GeomFromText('POINT(51.07694 -114.13556)', 4326));
INSERT INTO calgary VALUES (null, 'heritage park', ST_GeomFromText('POINT(50.98528 -114.10833)', 4326));
INSERT INTO calgary VALUES (null, 'international avenue', ST_GeomFromText('POINT(51.03778 -113.98167)', 4326));
INSERT INTO calgary VALUES (null, 'fort calgary', ST_GeomFromText('POINT(51.045139 -114.045778)', 4326));
there's a lot of things to see in calgary!
in these insert statements, we create our point using ST_GeomFromText()
and set the SRID
as 4326 like so:
ST_GeomFromText('POINT(51.0542 -114.0793)', 4326)
we can then select this data back, getting the latitude and longitude of each location with ST_latitude()
and ST_longitude()
.
SELECT id,
name,
ST_Latitude(coords) AS latitude,
ST_Longitude(coords) AS longitude
FROM calgary;
+----+-----------------------+-----------+-------------+
| id | name | latitude | longitude |
+----+-----------------------+-----------+-------------+
| 1 | calgary tower | 51.04427 | -114.062019 |
| 2 | peace bridge | 51.0542 | -114.0793 |
| 3 | saddledome | 51.0374 | -114.0519 |
| 4 | national music centre | 51.0425 | -114.06083 |
| 5 | baitun nur mosque | 51.101743 | -113.972039 |
| 6 | olympic oval | 51.07694 | -114.13556 |
| 7 | heritage park | 50.98528 | -114.10833 |
| 8 | international avenue | 51.03778 | -113.98167 |
| 9 | fort calgary | 51.045139 | -114.045778 |
+----+-----------------------+-----------+-------------+
at last, calculating distance
so far, we've made some spatial POINT
s and assigned them to SRID
4326 so we can actually make sense of them as latitude and longitude. it's finally time to focus on what we really want to do: getting the distance between two points.
to do this, we're going to use mysql's ST_Distance_Sphere()
function.
as one would expect, ST_Distance_Sphere()
calculates the distance between two points, provided as arguments to the function, on a sphere. the distance returned will always be the shortest one (since, on a sphere, we can always go the opposite direction and travel further to get to the same place). the unit of measurement is meters.
ST_Distance_Sphere()
takes an optional third argument: the radius of the sphere. if we do not set this argument, the value 6,370,986 meters is used. that's the radius of the earth, and is the value we almost certainly want to use.
knowing all that, an example select would look like:
SELECT name,
ST_Distance_Sphere(ST_GeomFromText('POINT(51.037913 -114.073277)', 4326), coords) AS distance_meters
FROM calgary;
+-----------------------+--------------------+
| name | distance_meters |
+-----------------------+--------------------+
| calgary tower | 1057.9217149476015 |
| peace bridge | 1859.336539883446 |
| saddledome | 1495.7790780297603 |
| national music centre | 1008.7085120625501 |
| baitun nur mosque | 10020.62038333001 |
| olympic oval | 6146.6116509785015 |
| heritage park | 6345.541637300453 |
| international avenue | 6405.199613693066 |
| fort calgary | 2083.730747912871 |
+-----------------------+--------------------+
here we can see that we passed two POINT
arguments to ST_Distance_Sphere()
. The first is one we constructed from literal values using ST_GeomFromText()
. it's the location of the ship & anchor pub in central calgary, where i promise i am not writing this post. the second argument is our coords
column.
the result is the distance from our starting POINT
, the ship & anchor, to all the POINT
s in our table, in meters.
from here, building 'near me' functionality is just a matter of applying a WHERE
or ORDER BY
clause.
going regional: finding points inside a square (or any shape)
perhaps, instead of a basic 'near me' feature, we want our users to be able to draw a square on a map and say "show me all the calgary landmarks in here."
to do this, the fist step we need to take is defining a square.
creating a square
a square is a type of polygon, and mysql provides a POLYGON
data type that we can use to describe a square (or any shape). POLYGON
s are defined by a set of coordinates that identify the corners of the shape. this means, to create a square, we provide POLYGON
with five coordinate sets.
wait, five? don't we mean four? a square has four corners, after all.
the important thing to note here is that a polygon must be closed. this means that the first coordinate set and the last coordinate set must be the same. it completes the shape by going back to the beginning. the result is that a square is defined has having five sets of coordinates. to illustrate, let's look at this glorious ascii diagram that shows the five coordinates that create a square.
1/5 ---- 4
| |
| |
2 ---- 3
with that in mind, we can create a square of latitude and longitude values. the example we'll be using is this square covering most of downtown calgary.
a square covering most of downtown calgary.
to select this as a POLYGON
in mysql, we would do:
SELECT ST_GeomFromText('POLYGON( ( 51.053913 -114.094391, 51.028008 -114.094391, 51.028008 -114.037743, 51.053913 -114.037743, 51.053913 -114.094391) )', 4326);
given our experience creating a POINT
, this should be fairly straightforward. the only difference is that instead of passing one coordinate set to POINT
, we pass five to POLYGON
. the result is a geometric shape, stored in a binary format, that we can use for comparisons against POINTS
or, even, other POLYGON
s.
finding POINT
s 'within' a square
we now have a POLYGON
defined from some literal values, and a table full of POINT
s, all that's left is to find out which POINT
s in our table are inside our POLYGON
. we can do this with the mysql function ST_Within()
. here's an example:
SELECT name,
ST_Latitude(coords) AS latitude,
ST_Longitude(coords) AS longitude
FROM calgary
WHERE ST_Within(
coords,
ST_GeomFromText('POLYGON( ( 51.053913 -114.094391, 51.028008 -114.094391, 51.028008 -114.037743, 51.053913 -114.037743, 51.053913 -114.094391) )', 4326)
)
we can see that ST_Within()
takes two arguments: a POINT
, and a POLYGON
. if the POINT
is 'within' the POLYGON
, ST_Within()
returns 1. if it isn't, we get a 0.
conclusion
once we have an understanding of how to create POINT
s and POLYGON
s and use ST_Distance_Sphere()
and ST_Within()
we can combine and extrapolate them to get more complex data, like "the closest daycare in a given school district" or "all the burrito busses on this side of the river" or, even, answer the question that has driven so many of the great minds in computer science: "where is a thai restaurant near me"?
🔎 this post originally appeared in the grant horwood technical blog