Accessing Snowflake Data with Python using the REST API

Developers often need to access data stored in Snowflake for reporting, analytics, or application integration purposes. Snowflake provides a REST API that allows programmatic access to read and write data.

This guide shows how to use Python to authenticate with Snowflake and query data via the REST API.

Overview of Snowflake’s REST API

Snowflake’s REST API allows sending HTTP requests to execute SQL queries and statements. It provides a straightforward way to integrate Snowflake into Python applications and scripts.

Here are some key capabilities of the REST API:

  • CRUD Operations – Create, read, update and delete Snowflake data and metadata.
  • Query Execution – Run SELECT queries to obtain result sets in JSON.
  • Stored Procedures – Execute stored procedures and user-defined functions.
  • Security – Supports OAuth 2.0 authentication and access control.
  • Asynchronous – Run long-running queries in the background then fetch results later.
  • Low Overhead – Lightweight API reduces overhead compared to full client libraries.

The REST API runs over HTTPS and uses standard HTTP methods like GET, POST, PUT and DELETE. Responses are in JSON format.

Prerequisites

To follow this guide and use the Snowflake REST API in Python, you will need:

  • A Snowflake account with a username, password, and account identifier, plus the OAuth client ID and client secret used in the authentication step.
  • An existing database, schema, tables, and data in Snowflake.
  • Python 3 installed on your development machine.
  • The Python requests library installed (for example, via pip install requests).

This guide assumes you already have a Snowflake account set up with objects and data you want to access.

Authenticating with Snowflake

To make requests to the Snowflake REST API, you first need to authenticate and obtain an access token using OAuth 2.0.

Here is sample Python code to handle the OAuth flow:

import requests

# Snowflake OAuth 2.0 parameters
OAUTH_HOST = '<your_snowflake_account>.snowflakecomputing.com'
TOKEN_URL = 'https://{}/oauth/token'.format(OAUTH_HOST)
CLIENT_ID = '<your_client_id>'
CLIENT_SECRET = '<your_client_secret>'

# Authenticate and obtain an access token
data = {
    'grant_type': 'password',
    'username': '<your_username>',
    'password': '<your_password>'
}
response = requests.post(TOKEN_URL, data=data,
                         auth=(CLIENT_ID, CLIENT_SECRET))
response.raise_for_status()  # stop here if authentication failed

access_token = response.json()['access_token']

This code makes a POST request to the /oauth/token endpoint to authenticate and get back an access token.

The access token will be used in subsequent API requests by passing it in the Authorization header:

headers = {
   'Authorization': 'Bearer {}'.format(access_token)
}

With the access token, you can now make API calls. Tokens are initially valid for 45 minutes, after which you must obtain a new one.
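Because tokens expire, a long-running script needs to know when to request a new one. The following is a minimal sketch, not part of the API itself: TokenCache, fetch_token, and margin_seconds are hypothetical names, and the default lifetime simply reuses the 45 minutes stated above.

```python
import time


class TokenCache:
    """Holds an access token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=45 * 60,
                 margin_seconds=60, clock=time.monotonic):
        self._fetch_token = fetch_token    # callable returning a new token string
        self._lifetime = lifetime_seconds  # assumed lifetime, taken from this guide
        self._margin = margin_seconds      # refresh slightly before expiry
        self._clock = clock                # injectable so the class can be tested
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one if missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token

    def auth_headers(self):
        """Build the Authorization header from the current token."""
        return {'Authorization': 'Bearer {}'.format(self.get())}
```

In practice, fetch_token would wrap the POST to the token endpoint shown earlier, and each API call would use cache.auth_headers() instead of a fixed headers dictionary.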

Running a Query

To run a query, make a POST request to the /queries endpoint. Here is sample code:

import requests

# Example query
QUERY = "SELECT * FROM customers LIMIT 10"

# Set headers and body
headers = {'Authorization': 'Bearer {}'.format(access_token)}
data = {'query': QUERY}

# Execute query via API; json= serializes the body and sets Content-Type
response = requests.post(
    'https://<your_account>.snowflakecomputing.com/queries',
    headers=headers,
    json=data
)
response.raise_for_status()

results = response.json()
print(results)

The JSON request body contains the SQL query string. The resulting response will contain the rows of data in JSON format.

You can parse through the results to access individual rows/columns as needed.
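As an illustration of that parsing step, the sketch below pairs each row with its column names. The guide does not specify the response layout, so the 'columns' and 'rows' keys and the rows_to_dicts helper are assumptions; adjust them to match the payload you actually receive.

```python
def rows_to_dicts(payload):
    """Pair each row with the column names so values can be read by name."""
    columns = payload['columns']
    return [dict(zip(columns, row)) for row in payload['rows']]


# Hypothetical payload, shaped the way this sketch assumes
sample = {
    'columns': ['ID', 'FIRST_NAME'],
    'rows': [[1, 'John'], [2, 'Jane']],
}

for record in rows_to_dicts(sample):
    print(record['ID'], record['FIRST_NAME'])
```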

Using Query Parameters

To add bind parameters to your query, specify them in a params array:

params = [
    {
        'name': 'p_customer_id',
        'type': 'NUMBER',
        'value': 123
    }
]

data = {
   'query': "SELECT * FROM customers WHERE id = :p_customer_id",
   'params': params
} 

Any :named_params in the query will be replaced with the bound values.

You can also pass arrays and objects as bind values, which Snowflake parses and inserts into the query:

params = [
   {
     'name': 'p_user_ids',
     'type': 'ARRAY',
     'value': [123, 456, 789] 
   }
]
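Writing these entries by hand gets repetitive, so a small helper can build them from an ordinary dictionary. This is a sketch under assumptions: build_params and infer_type are hypothetical helpers, the 'NUMBER' and 'ARRAY' type names come from the examples above, and 'BOOLEAN', 'OBJECT', and 'TEXT' are guesses that may need to be changed.

```python
def infer_type(value):
    """Map a Python value to a bind type name (partly assumed, see above)."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return 'BOOLEAN'
    if isinstance(value, (int, float)):
        return 'NUMBER'
    if isinstance(value, list):
        return 'ARRAY'
    if isinstance(value, dict):
        return 'OBJECT'
    return 'TEXT'


def build_params(values):
    """Turn {'name': value} into the list-of-entries format used above."""
    return [
        {'name': name, 'type': infer_type(value), 'value': value}
        for name, value in values.items()
    ]


params = build_params({'p_customer_id': 123, 'p_user_ids': [123, 456, 789]})
```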

Handling Large Result Sets

If a query returns a very large result set, you may want to fetch it in chunks rather than all at once.

You can use the rowset parameter to control pagination:

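headers = {'Authorization': 'Bearer {}'.format(access_token)}
url = 'https://<your_account>.snowflakecomputing.com/queries'

data = {
    'query': "SELECT * FROM big_table",

    # Paginate into chunks of 2500 rows
    'rowset': 2500
}

# First 2500 rows
response1 = requests.post(url, headers=headers, json=data)

# Set the offset to get the next 2500 rows
data['rowOffset'] = 2500
response2 = requests.post(url, headers=headers, json=data)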

Each call will return up to rowset rows starting at the given offset. Stitch together the results to construct the full result set.
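The stitching step can be written as a loop that keeps requesting pages until one comes back with fewer rows than requested. The sketch below is one possible shape: fetch_all_rows and the fetch_page callback are hypothetical, and the callback is expected to return a plain list of rows for a given offset and page size.

```python
def fetch_all_rows(fetch_page, rowset=2500):
    """Collect rows page by page until a page comes back short."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, rowset)  # list of rows for this page
        rows.extend(page)
        if len(page) < rowset:
            break                          # last page reached
        offset += rowset
    return rows


# Stand-in for the API: serves slices of an in-memory table
table = [[i] for i in range(6)]
all_rows = fetch_all_rows(lambda offset, size: table[offset:offset + size],
                          rowset=4)
```

In real use, fetch_page would set data['rowOffset'] to the offset, send the POST request, and extract the rows from the JSON response.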

You can also have the query run asynchronously in the background by specifying asyncExec=true.

Inserting Data

To insert new data, make a POST to the /insert endpoint.

The body contains the table, columns, and rows to insert:

data = {
  'table': 'customers',
  'columns': ['FIRST_NAME', 'LAST_NAME', 'EMAIL'],
  'rows': [
    ['John', 'Doe', '[email protected]'],
    ['Jane', 'Doe', '[email protected]'] 
  ]
}

requests.post('https://<your_account>.snowflakecomputing.com/insert', headers=headers, json=data)

You can insert multiple rows in a single request, which avoids one round trip per row when bulk loading data into Snowflake.
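For very large loads it can help to split the rows into several requests rather than sending one huge body. The sketch below assumes nothing about server limits: insert_payloads is a hypothetical helper and the default batch size of 1000 is an arbitrary choice, not a documented value.

```python
def insert_payloads(table, columns, rows, batch_size=1000):
    """Yield one request body per batch of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    for start in range(0, len(rows), batch_size):
        yield {
            'table': table,
            'columns': columns,
            'rows': rows[start:start + batch_size],
        }


rows = [['a', 1], ['b', 2], ['c', 3]]
batches = list(insert_payloads('customers', ['NAME', 'N'], rows, batch_size=2))
```

Each yielded dictionary has the same shape as the insert body shown above and can be posted in turn.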

Updating Data

Use the /update endpoint to modify existing rows:

data = {
  'table': 'customers',
  
  'column': 'EMAIL',
  'value': '[email protected]', 
  
  'predicate': "FIRST_NAME = 'John'"
}

requests.post('https://<your_account>.snowflakecomputing.com/update', headers=headers, json=data)

The update statement gets generated under the hood based on the table, column, value and predicate specified.

You can update multiple columns at once by passing array values.

Deleting Data

To delete rows, similarly make a POST call to /delete:

data = {
  'table': 'customers',
  
  'predicate': "LAST_NAME = 'Doe'" 
}

requests.post('https://<your_account>.snowflakecomputing.com/delete', headers=headers, json=data)

The predicate defines which rows matching the condition will get deleted.

Calling Stored Procedures

Snowflake allows creating reusable stored procedures. Call them via the /exec-proc endpoint:

data = {
  'procedure': 'ADD_CUSTOMER',
  
  'parameters': [
    { 'name': 'p_first_name', 'value': 'John' },
    { 'name': 'p_last_name', 'value': 'Doe' }, 
    { 'name': 'p_email', 'value': '[email protected]' }
  ]
} 

response = requests.post('https://<your_account>.snowflakecomputing.com/exec-proc', headers=headers, json=data)

Pass the procedure name and its input parameters. The response will include any OUT parameters and result sets.

You can execute multi-statement procedures, UDFs, and other Snowflake programmatic constructs.

Handling Errors

In case of any errors, the REST API will return standard HTTP status codes like 400, 500, etc.

You can catch errors in Python like:

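try:
    response = requests.post(
        'https://<your_account>.snowflakecomputing.com/exec-proc',
        headers=headers, json=data)

    # Raise requests.exceptions.HTTPError for 4xx and 5xx responses
    response.raise_for_status()

    # Handle successful response
    result = response.json()

except requests.exceptions.HTTPError as e:
    # The response body carries the error details
    print("Error: {} {}".format(e.response.status_code, e.response.text))
except requests.exceptions.RequestException as e:
    # Connection problems, timeouts, and similar failures
    print("Error: " + str(e))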

This prints the error details. Common causes include invalid SQL, authentication failures, and missing objects.

For authorization failures, you may need to re-authenticate and obtain a fresh access token.
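One way to automate that re-authentication is to retry a failed call once with a fresh token. The sketch below assumes that authorization failures surface as HTTP 401, which the guide does not state explicitly; call_with_reauth, send, and get_token are hypothetical names.

```python
def call_with_reauth(send, get_token):
    """Call send(token); on a 401, fetch a fresh token and try once more.

    send(token) must return a (status_code, body) tuple.
    get_token(force_refresh=...) must return a token string.
    """
    token = get_token(force_refresh=False)
    status, body = send(token)
    if status == 401:  # assumed status code for an expired or invalid token
        token = get_token(force_refresh=True)
        status, body = send(token)
    return status, body
```

Retrying only once keeps the script from looping forever when the credentials themselves are wrong.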

Summary

This guide covered the essential techniques for accessing Snowflake from Python using the REST API:

  • Authenticating with OAuth 2.0
  • Executing queries and fetching results
  • Running parameterized queries
  • Inserting, updating, and deleting data
  • Calling stored procedures
  • Handling errors and exceptions

The REST API provides a quick way to get Snowflake data into Python for analytics, reporting, ML and more. With proper error handling, it can be used to build robust ETL and data integration pipelines.

Example Code Summary

Here is a summary of the example Python code covered in this guide:

Authentication

import requests 

# Snowflake OAuth 2.0 Settings
OAUTH_HOST = '<your_account>.snowflakecomputing.com'
TOKEN_URL = 'https://{}/oauth/token'.format(OAUTH_HOST)
CLIENT_ID = '<your_client_id>'
CLIENT_SECRET = '<your_client_secret>'  

# Authenticate and get access token
data = {
  'grant_type': 'password', 
  'username': '<your_username>',
  'password': '<your_password>' 
}

response = requests.post(TOKEN_URL, 
                         data=data,
                         auth=(CLIENT_ID, CLIENT_SECRET))
                         
access_token = response.json()['access_token']

Executing Queries

QUERY = "SELECT * FROM customers" 

headers = {'Authorization': 'Bearer {}'.format(access_token)}
data = {'query': QUERY}

response = requests.post('https://<your_account>.snowflakecomputing.com/queries', headers=headers, json=data)
results = response.json()

Parameterized Queries

params = [
  {
    'name': 'p_customer_id',
    'type': 'NUMBER',
    'value': 123
  }
]

data = {
  'query': "SELECT * FROM customers WHERE id = :p_customer_id",
  'params': params 
}

Inserting Data

data = {
  'table': 'customers',
  'columns': ['FIRST_NAME', 'LAST_NAME', 'EMAIL'],
  'rows': [
    ['John', 'Doe', '[email protected]'],
    ['Jane', 'Doe', '[email protected]']
  ]
}

requests.post('https://<your_account>.snowflakecomputing.com/insert', headers=headers, json=data)

This summarizes the common patterns like executing statements, using parameters, inserting data, etc. Refer to the full examples for additional context.

The Snowflake REST API enables seamless integration with Python for both analysis and ETL workloads. With proper error handling, it can provide reliable and scalable data access.
