
Step-by-Step Instructions for Setting Up AWS RDS PostgreSQL and Populating with Data

by Interzoid Team


Set up an AWS PostgreSQL RDS database and load it with data

In this walkthrough, we will set up a PostgreSQL database on AWS RDS, install psql on Ubuntu to connect to it, populate it with data, and then use a third-party tool to analyze the data for data quality issues.


Part One: Set up PostgreSQL on AWS RDS

1. Log into your AWS console and select the RDS service.

2. Click "Create Database" and select "PostgreSQL" as the engine. Be sure to scroll down to the standard RDS "Create Database" option rather than choosing the more complex Aurora database.

3. Set your DB instance identifier (this name identifies the instance and becomes part of the endpoint hostname used in your connection string).

4. Set your username and password.

5. Select your DB instance size - choose an instance type and storage amount that fits your needs. The defaults are ok for this walkthrough. The burstable classes are a good choice for lower resource usage.

6. Configure the DB instance settings:

- Select the VPC you want to launch the database within (ok to just use the default).
- Set public accessibility to "yes".

7. For connectivity, under Security Group, create a new one. At minimum, allow access on port 5432. For easy testing, you can allow ingress (inbound access) on port 5432 from 0.0.0.0/0, which opens the PostgreSQL port to any IP address so that you can connect from anywhere. However, it is recommended that you restrict access to the specific IP address of the server you will be connecting from, especially in a production environment.

8. Click "Create Database". It will now launch and provision the RDS Postgres instance.
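For readers who prefer the command line, the console steps above can be approximated with the AWS CLI. The following is a minimal sketch rather than the procedure this walkthrough follows. It assumes the AWS CLI is installed and configured with credentials. The function names, security group ID, IP address, instance identifier, instance class (db.t3.micro), and storage size (20 GiB) are illustrative assumptions, not values taken from the console defaults.

```shell
# Sketch only: assumes the AWS CLI is installed and configured.
# All identifiers used in the examples are hypothetical placeholders.

# Allow inbound PostgreSQL traffic (TCP 5432) from a single IPv4 address.
# $1 = security group ID, $2 = client IPv4 address
allow_postgres_from_ip() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 5432 \
    --cidr "$2/32"
}

# Create a publicly accessible RDS PostgreSQL instance.
# $1 = DB instance identifier, $2 = master password, $3 = security group ID
create_rds_postgres() {
  aws rds create-db-instance \
    --db-instance-identifier "$1" \
    --engine postgres \
    --db-instance-class db.t3.micro \
    --allocated-storage 20 \
    --master-username postgres \
    --master-user-password "$2" \
    --vpc-security-group-ids "$3" \
    --publicly-accessible
}

# Example calls (commented out so that nothing is created by accident):
# allow_postgres_from_ip sg-0123456789abcdef0 203.0.113.10
# create_rds_postgres my-postgres-db 'choose-a-strong-password' sg-0123456789abcdef0
```

Using a /32 CIDR block limits access to one address, which matches the recommendation in step 7 to lock the port down rather than opening it to 0.0.0.0/0.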


Part Two: Preparing to Connect to the Database

To connect:

1. Once the status shows "Available", click on the instance name to view details.

2. Scroll down to "Connectivity & Security". Copy the Endpoint.

3. Using a SQL client like psql, pgAdmin, or DBeaver, connect to the endpoint using the master username and password as part of a connection string (see below). We will use psql to load the database with sample data.

4. You will then be able to use SQL statements to create tables, insert rows, run queries and more on your RDS Postgres database.
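The endpoint shown in the console can also be looked up from a script. This is a minimal sketch, assuming the AWS CLI is installed and configured. The function name and the instance identifier in the example are hypothetical.

```shell
# Print the endpoint hostname of an RDS instance.
# $1 = DB instance identifier
rds_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}

# Example (commented out; requires a real instance and credentials):
# aws rds wait db-instance-available --db-instance-identifier my-postgres-db
# rds_endpoint my-postgres-db
```

The `aws rds wait db-instance-available` call blocks until the instance reaches the "Available" status mentioned in step 1, so the endpoint lookup does not run too early.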


Part Three: Installing psql to Connect to the Database

In order to run SQL statements on this data, you can install psql, available on many platforms.

For example, here are the steps to install psql on Ubuntu:

1. Update the package index (this ensures you get the latest available version of psql):

$ sudo apt update

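2. Install the PostgreSQL client package, which provides psql:

$ sudo apt install postgresql-client

3. Optionally, install the libpq-dev package, which provides the headers and library needed to build software that communicates with PostgreSQL databases (psql itself does not need it):

$ sudo apt install libpq-dev

4. Optionally, install the full postgresql package. This also installs a local PostgreSQL server, which is not required for connecting to RDS:

$ sudo apt install postgresql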

5. Verify psql installed correctly:

$ psql --version

You should see the version of psql installed.


Part Four: Loading Sample Data into the PostgreSQL Database

The following command connects to your Postgres instance on AWS (you will be prompted for the password):

$ psql --host=your-specific-endpoint.rds.amazonaws.com --port=5432 --username=postgres --password --

You should then be at the psql command line.

A sample SQL script for populating the database is available here: https://dl.interzoid.com/csv/companiessql.txt
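One way to run that script without pasting it into the prompt is to download it and pass it to psql with the --file option. This is a minimal sketch, assuming curl is installed and that the master username is postgres, as in the connection command above. The function name is hypothetical.

```shell
# Download the sample script and run it against the default postgres database.
# $1 = RDS endpoint hostname. psql prompts for the password.
load_companies() {
  curl -fsSL -o companiessql.txt https://dl.interzoid.com/csv/companiessql.txt &&
  psql --host="$1" --port=5432 --username=postgres --password \
       --dbname=postgres --file=companiessql.txt
}

# Example (commented out; requires network access and a running instance):
# load_companies your-specific-endpoint.rds.amazonaws.com
```

The `&&` ensures psql only runs if the download succeeded, so a failed download does not leave you running an empty or partial file.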

Executing the statements in that script creates the companies table, with its data, in the default postgres database. You can check the result from the psql prompt:

> select * from companies;


Part Five: Running Redundant Data Match Reports Using a Third-Party Tool

Once you have data populated into the database, you can test it from a third-party product such as Interzoid's Cloud Data Connect Wizard, which will identify inconsistent and duplicate data in a SQL table.

The connection string for the database we have created and loaded with data takes this form:

postgres://user:password@your-specific-endpoint.rds.amazonaws.com/database-name?sslmode=require
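Because psql accepts connection URIs, a string in this form can be checked from the command line before it is handed to another tool. This is a minimal sketch with hypothetical function names. It assumes psql is installed as described in Part Three, and that any special characters in the password have already been percent-encoded.

```shell
# Assemble a connection URI in the form shown above.
# $1 = user, $2 = password, $3 = endpoint hostname, $4 = database name
# Note: special characters in the password must be percent-encoded first.
pg_uri() {
  printf 'postgres://%s:%s@%s/%s?sslmode=require\n' "$1" "$2" "$3" "$4"
}

# Run a trivial query through the URI to confirm that it connects.
# $1 = connection URI
check_pg_uri() {
  psql --command='SELECT 1;' "$1"
}

# Example (commented out; requires a running instance):
# check_pg_uri "$(pg_uri postgres 'your-password' your-specific-endpoint.rds.amazonaws.com postgres)"
```

If the check fails, the error from psql usually points to the cause (wrong password, unreachable host, or a security group that does not allow your IP address) before you involve a second tool.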

To run a match job with the Interzoid Cloud Data Connect Wizard, choose AWS RDS PostgreSQL as the data source, "companies" as the table name, and "company" as the column name to match on. Run the "Match Report" and view the results. It's that easy!

To register for an API key with Interzoid (free tier available), click here.

For more documentation on using PostgreSQL data sources for matching and cleansing database tables, click here.


All content (c) 2019-2024 Interzoid Incorporated. Questions or assistance? Contact support@interzoid.com

201 Spear Street, Suite 1100, San Francisco, CA 94105-6164
