Eight brutally efficient steps to Geo-code & Reverse Geo-code with Tiger DB

Let us understand Geo-coding first!

Geo-coding is the process of finding the geographic coordinates associated with other geographic data, such as street addresses. To perform this wizardry, you will need access to a geocoding service. There are plenty out there to choose from, many of them free for personal or non-commercial projects. However, if you want to include geo-coding in a commercial app, or expect to exceed a free tier’s query limit, you may want to consider running your own geo-coding server.

This post will run through the steps necessary to get the postgis_tiger_geocoder extension running on a PostgreSQL database.

Pre-requisites

This tutorial assumes you have a PostgreSQL database with PostGIS support up and running. If you need help setting one up, you can refer to this post. I will be picking up exactly where I left off, using the same database we just finished setting up.

Step 1 – Preparation

In order for the PostGIS geocoder extension to work, we’ll need to download TIGER/Line files. TIGER/Line was originally a custom text-based format that the Census Bureau published for years before switching to shapefiles. The name stuck, though.

We will start by creating a directory where we have read/write access. I’m just going to use my home directory:

$ cd
$ mkdir tiger

Step 2 – Update Loader Variables

The postgis_tiger_geocoder extension is composed of four components:

1. The data loader functions,

2. The address normalizer,

3. The address geo-coder, and

4. The reverse geo-coder.

You saw the address normalizer at the end of the previous walk-through. Its sole purpose is to parse a user-entered address and attempt to transform it into the normalized format that the geocoder accepts.
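As a quick refresher, a normalizer call looks roughly like this (a sketch only; the columns of the `norm_addy` composite type shown here may vary slightly between PostGIS versions, and the output is illustrative rather than captured from a live run):

```sql
-- Parse a free-form address string into the norm_addy composite type.
SELECT na.address, na.streetname, na.streettypeabbrev,
       na.location, na.stateabbrev, na.zip
FROM normalize_address('2301 Kentmere Pkwy, Wilmington, DE 19806') AS na;
```

No TIGER data is needed for normalization itself, which is why it already worked at the end of the previous tutorial.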

The next component we are going to use is the data loader. It might seem intimidating at first sight, but once understood, the data loader functions will soon become your favorite thing in the world. Yes, really! Trying to do this manually would be absolutely horrific.

First, we need to update a few records in the database so that the data loader knows where to put everything and what tools to use. The first values we are going to edit are stored in the loader_variables table. We will start by just selecting everything to see what is in there:

$ psql gistestdb
gistestdb=# SELECT * FROM tiger.loader_variables;

 tiger_year |               website_root                | staging_fold | data_schema | staging_schema
------------+-------------------------------------------+--------------+-------------+----------------
       2013 | ftp://ftp2.census.gov/geo/tiger/TIGER2013 | /gisdata     | tiger_data  | tiger_staging
(1 row)

gistestdb=#

The only value we care about at this point is the staging_fold variable. We want to point it at the directory we made in step one; this is where all the datasets will be downloaded and processed. Keep in mind, you will need to change the path to wherever it is on your system. I will be using /home/peter/tiger:

gistestdb=# UPDATE tiger.loader_variables SET staging_fold = '/home/peter/tiger';

UPDATE 1

gistestdb=# SELECT staging_fold FROM tiger.loader_variables;

   staging_fold
-------------------
 /home/peter/tiger
(1 row)

Step 3 – Create a Loader Profile

Next, we will be creating a Data Loader profile that matches our system. You can read the PostGIS docs for the official instructions. Essentially, all we are doing in this step is editing the shell variables that the Data Loader will declare at the top of the shell scripts it generates to load the TIGER/Line datasets into our database. The shell scripts require the following variables:

  • TMPDIR = the directory you created in step 1
  • UNZIPTOOL = whatever tool you use to unzip files (we installed unzip in the previous tutorial)
  • WGETTOOL = whatever tool you use to download files. We will be using wget
  • PGBIN = the directory your postgres binaries live in. Chances are it is simply /usr/bin, but you can check where your psql binary lives:

$ which psql
/usr/bin/psql
$

  • PGPORT = the port that PostgreSQL listens on
  • PGHOST = the hostname of the PostgreSQL server
  • PGUSER = this needs to be a database superuser. We are using the account we set up for our OS-level user
  • PGPASSWORD = the password for the PGUSER account
  • PGDATABASE = the name of the database we will be loading all the data into. I am using gistestdb
  • PSQL = the location of our psql binary
  • SHP2PGSQL = the location of our shp2pgsql binary

First we will need to insert a new profile record for our user account. We will use the pre-existing sh template as a starting point. Remember, I am using the account name peter, but you should change that to whatever matches your setup.

$ psql gistestdb
gistestdb=# INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command)
gistestdb-# SELECT 'peter', declare_sect, pgbin, wget, unzip_command, psql, path_sep, loader, environ_set_command, county_process_command
gistestdb-# FROM tiger.loader_platform
gistestdb-# WHERE os = 'sh';
INSERT 0 1
gistestdb=#

Next, let’s take a look at the default values that were copied over from the sh profile. The row is quite wide, so psql’s expanded display (\x) makes it much easier to read:

$ psql gistestdb
gistestdb=# \x
Expanded display is on.
gistestdb=# SELECT * FROM tiger.loader_platform WHERE os = 'peter';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------
os                     | peter
declare_sect           | TMPDIR="${staging_fold}/temp/"
                       | UNZIPTOOL=unzip
                       | WGETTOOL="/usr/bin/wget"
                       | export PGBIN=/usr/pgsql-9.0/bin
                       | export PGPORT=5432
                       | export PGHOST=localhost
                       | export PGUSER=postgres
                       | export PGPASSWORD=yourpasswordhere
                       | export PGDATABASE=geocoder
                       | PSQL=${PGBIN}/psql
                       | SHP2PGSQL=${PGBIN}/shp2pgsql
                       | cd ${staging_fold}
pgbin                  |
wget                   | wget
unzip_command          | rm -f ${TMPDIR}/*.*
                       | ${PSQL} -c "DROP SCHEMA IF EXISTS ${staging_schema} CASCADE;"
                       | ${PSQL} -c "CREATE SCHEMA ${staging_schema};"
                       | for z in *.zip; do $UNZIPTOOL -o -d $TMPDIR $z; done
                       | for z in */*.zip; do $UNZIPTOOL -o -d $TMPDIR $z; done
                       | cd $TMPDIR;
psql                   | ${PSQL}
path_sep               | /
loader                 | ${SHP2PGSQL}
environ_set_command    | export
county_process_command | for z in *${table_name}.dbf; do
                       | ${loader} -D -s 4269 -g the_geom -W "latin1" $z ${staging_schema}.${state_abbrev}_${table_name} | ${psql}
                       | ${PSQL} -c "SELECT loader_load_staged_data(lower('${state_abbrev}_${table_name}'), lower('${state_abbrev}_${lookup_name}'));"
                       | done

gistestdb=#

As you can see, the values we need to edit are all contained within the declare_sect column. So again, you will need to update this to match your system, but it should be pretty similar:

$ psql gistestdb
gistestdb=# UPDATE tiger.loader_platform
gistestdb-# SET declare_sect =
gistestdb-# 'TMPDIR="${staging_fold}/temp/"
gistestdb'# UNZIPTOOL=unzip
gistestdb'# WGETTOOL="/usr/bin/wget"
gistestdb'# export PGBIN=/usr/bin
gistestdb'# export PGPORT=5432
gistestdb'# export PGHOST=localhost
gistestdb'# export PGUSER=peter
gistestdb'# export PGPASSWORD=MyAwesomePassword
gistestdb'# export PGDATABASE=gistestdb
gistestdb'# PSQL=${PGBIN}/psql
gistestdb'# SHP2PGSQL=${PGBIN}/shp2pgsql
gistestdb'# cd ${staging_fold}'
gistestdb-# WHERE os = 'peter';
UPDATE 1
gistestdb=#

Let us check our values to make sure everything looks good:

gistestdb=# SELECT declare_sect FROM tiger.loader_platform WHERE os='peter';
             declare_sect
-----------------------------------------
 TMPDIR="${staging_fold}/temp/"        +
 UNZIPTOOL=unzip                       +
 WGETTOOL="/usr/bin/wget"              +
 export PGBIN=/usr/bin                 +
 export PGPORT=5432                    +
 export PGHOST=localhost               +
 export PGUSER=peter                   +
 export PGPASSWORD=MyAwesomePassword   +
 export PGDATABASE=gistestdb           +
 PSQL=${PGBIN}/psql                    +
 SHP2PGSQL=${PGBIN}/shp2pgsql          +
 cd ${staging_fold}
(1 row)

gistestdb=#

If everything looks good and matches your system, the Data Loader should be all set to start generating scripts using your database profile.

Step 4 – Generate the Nation Loader Script

The first script we will be generating is the Loader_Generate_Nation_Script, which loads in the county and state lookup tables. We will store it in our staging folder and execute it there.

$ psql gistestdb
gistestdb=# copy (select loader_generate_nation_script('peter')) to '/home/peter/tiger/tiger_nation.sh' with binary;
gistestdb=# \q
$

Now, I highly suspect there is a better way to do this, but I have yet to take the time to figure it out. Basically, what we are going to do is manually delete the first two lines and edit the third to remove the extraneous garbage that our copy command added to the script. So, open the script in your favorite text editor and change the first three lines from:

PGCOPYÿ^M^@^@^@^@^@^@^@^@^@^@^A^@^@^K¸TMPDIR="/home/peter/tiger/temp/"

to simply:

TMPDIR="/home/peter/tiger/temp/"

Next, we just have to remove the two trailing bits of cruft the last command added to the end of our script. So again, open the script in a text editor and change:

${PSQL} -c "VACUUM ANALYZE tiger_data.county_all_lookup;" ÿÿ

to:

${PSQL} -c "VACUUM ANALYZE tiger_data.county_all_lookup;"
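If you would rather not do that cleanup by hand every time, a few lines of Python can strip the wrapper automatically. This is a sketch under one assumption from above: the real script body always starts at the TMPDIR= line that copy buried in the binary header.

```python
# Sketch: strip the COPY BINARY header and the trailing 0xFF 0xFF marker
# that "copy ... with binary" wraps around the generated shell script.
# Assumes the script body begins at the first "TMPDIR" occurrence, as it
# does for the loader scripts in this tutorial.

def strip_pgcopy_wrapper(raw: bytes) -> bytes:
    start = raw.find(b"TMPDIR")
    if start == -1:
        raise ValueError("script body not found")
    body = raw[start:]
    # COPY BINARY data ends with a two-byte trailer of 0xFF 0xFF
    if body.endswith(b"\xff\xff"):
        body = body[:-2]
    return body

if __name__ == "__main__":
    import sys
    path = sys.argv[1]
    with open(path, "rb") as f:
        cleaned = strip_pgcopy_wrapper(f.read())
    with open(path, "wb") as f:
        f.write(cleaned)
```

Saved as, say, strip_pgcopy.py, you would run it against the exported file before executing it.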

If anyone knows or figures out a better way to export that script and skip the manual editing, please let me know. But in the meantime, we are ready to run it:
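For what it’s worth, one approach that should sidestep the binary wrapper entirely is to have psql print the bare function result straight to a file, using tuples-only (-t) and unaligned (-A) output so no headers or padding are added. I haven’t verified this on the setup above, so treat it as a sketch:

```shell
# -t drops headers and row-count footers, -A drops alignment padding,
# so the generated script should be written out verbatim.
psql -d gistestdb -t -A \
  -c "SELECT loader_generate_nation_script('peter');" \
  > /home/peter/tiger/tiger_nation.sh
```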

$ sh ~/tiger/tiger_nation.sh
--2014-04-23 15:28:26--  ftp://ftp2.census.gov/geo/tiger/TIGER2013/STATE/
           => 'ftp2.census.gov/geo/tiger/TIGER2013/STATE/.listing'
Resolving ftp2.census.gov (ftp2.census.gov)... 148.129.75.35, 2610:20:2010:a09:1000:0:9481:4b23
Connecting to ftp2.census.gov (ftp2.census.gov)|148.129.75.35|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
blahblahblah
INSERT 0 1
INSERT 0 1
etcetcetc
INSERT 0 3234
VACUUM
$

Step 5 – Generate the State Loader Script

Next is the Loader_Generate_Script, which pulls in our state data. It can take a VERY long time to run depending on how many states you are loading up. Luckily, it is quite chatty, so you can be pretty sure it has not locked up on you. We will start by generating a script to download and create the tables for Delaware and Maryland, using the same data loader profile as before.

$ psql gistestdb
gistestdb=# copy (select loader_generate_script(ARRAY['DE', 'MD'], 'peter')) to '/home/peter/tiger/tiger_states.sh' with binary;
gistestdb=# \q
$

Clean out the junk

Open the tiger_states.sh script you just created in a text editor and clean up the file in exactly the same way as we did the nation script (delete the first two lines, clean up the third line, delete the last two characters).

Run the script

Once that’s been done, you should grab a coffee and a good book. The script does not require any input once it’s been executed so unless you hit some sort of error, you’ll have nothing left to do but wait. Go ahead and run it:

$ sh ~/tiger/tiger_states.sh
blahblahblah
$

Step 6 – Install the Missing Indexes

Once the script that installs the state TIGER/Line files has finished running, you’re ready to do a final bit of setup. After the data load process, some indexes will most likely be missing. The Missing_Indexes_Generate_Script function is a convenience method that checks to see if anything is missing; if you run it and it produces no output, then all your tables have the key indexes in place. The Install_Missing_Indexes function does exactly what you think it does:

$ psql gistestdb
gistestdb=# SELECT missing_indexes_generate_script();
gistestdb=# SELECT install_missing_indexes();
 install_missing_indexes
-------------------------
 t
(1 row)

gistestdb=#

Step 7 – Geocode Your First Address

If everything has worked properly up to this point, you can finally geocode your first address. Since we installed the Delaware and Maryland TIGER/Line data sets, I’m going to geocode the location of the Delaware Art Museum.

$ psql gistestdb
gistestdb=# SELECT g.rating,
gistestdb-# ST_X(g.geomout) As lon,
gistestdb-# ST_Y(g.geomout) As lat,
gistestdb-# (addy).address As stno,
gistestdb-# (addy).streetname As street,
gistestdb-# (addy).streettypeabbrev As styp,
gistestdb-# (addy).location As city,
gistestdb-# (addy).stateabbrev As st,
gistestdb-# (addy).zip
gistestdb-# FROM geocode('2301 Kentmere Pkwy, Wilmington, DE 19806') As g;
 rating |        lon        |       lat        | stno |  street  | styp |    city    | st |  zip
--------+-------------------+------------------+------+----------+------+------------+----+-------
      0 | -75.5648444934456 | 39.7649668311956 | 2301 | Kentmere | Pkwy | Wilmington | DE | 19806
(1 row)

gistestdb=#

We got a single hit, and it’s perfectly accurate! If you’d like to do a sanity check, just enter the lat/long coordinates into Google Maps and see where they land.

Step 8 – Reverse Geocode an Address

The other function we would like to test is reverse geo-coding: the ability to convert a set of coordinates into an address. I don’t want Maryland to feel left out, so we will reverse geocode the Baltimore Museum of Art this time.

$ psql gistestdb
gistestdb=# SELECT pprint_addy(r.addy[1]) As st1,
gistestdb-# pprint_addy(r.addy[2]) As st2,
gistestdb-# pprint_addy(r.addy[3]) As st3,
gistestdb-# array_to_string(r.street, ',') As cross_streets
gistestdb-# FROM reverse_geocode(ST_GeomFromText('POINT(-76.6177879 39.3262418)', 4269), true) As r;
                st1                 |               st2               |                  st3                  | cross_streets
------------------------------------+---------------------------------+---------------------------------------+---------------
 Art Museum Dr, Baltimore, MD 21218 | Charles St, Baltimore, MD 21218 | 16 Art Museum Dr, Baltimore, MD 21218 | N Charles St
(1 row)

gistestdb=#

Voila! Did you find this article interesting? Do you have something to share? Feel free to write to me at udayakumar_k@trigent.com.
