# Tableau VizQL Data Service (VDS) API

The Tableau VDS API lets you query published Tableau data sources programmatically and retrieve data as JSON for analysis in Python. This guide covers authentication, discovery, querying, filtering, and optimization strategies.

## When to Use VDS API

**Use VDS when:**

- You need to query data from a **published Tableau data source**
- You want to use that data outside of Tableau (e.g., in DataFrames, custom logic, visualizations)
- Your query must respect the data source's logic (joins, calculations, row-level security, filters, extracts)

**VDS vs. REST API:**

- The REST API is typically used for metadata operations (listing users, workbooks, etc.), but it can also download data from visualizations (either summary or underlying data) and export PDFs and PNGs.
- The VDS API queries actual data from published data sources, just as Tableau visualizations do.

---

## Authentication

### Prerequisites

You have a Tableau Personal Access Token (PAT) in `SECRETS` with the following information:

- `server`: Your Tableau Server URL (e.g., "https://tableau.example.com")
- `tokenName`: PAT name
- `tokenSecret`: PAT secret

### Authentication Code

```python
import requests
import xml.etree.ElementTree as ET
import json

tableau_creds = json.loads(SECRETS['TABLEAU_PAT'])

# Sign in via the Tableau REST API
signin_url = f"{tableau_creds['server']}/api/3.21/auth/signin"
signin_payload = {
    "credentials": {
        "personalAccessTokenName": tableau_creds["tokenName"],
        "personalAccessTokenSecret": tableau_creds["tokenSecret"],
        "site": {"contentUrl": ""},  # default site, or specify a site content URL if needed
    }
}
resp = requests.post(signin_url, json=signin_payload)
resp.raise_for_status()

# Parse the XML response for the auth token and site ID
ns = {"t": "http://tableau.com/api"}
root = ET.fromstring(resp.text)
credentials = root.find(".//t:credentials", ns)
auth_token = credentials.get("token")
site = root.find(".//t:site", ns)
site_id = site.get("id")

# Headers for VDS and REST calls
headers = {
    "X-Tableau-Auth": auth_token,
    "Content-Type": "application/json",
}

print(f"✓ Authenticated to Tableau Server: {tableau_creds['server']}")
print(f"✓ Site ID: {site_id}")
```

**Returns:**

- `server`: Tableau Server URL
- `auth_token`: Authentication token for API calls
- `site_id`: Site ID for REST API endpoints
- `headers`: Headers dict for subsequent requests
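Auth tokens expire (see Error Handling below), so it helps to wrap the sign-in logic above in a function you can re-run after a 401. A minimal sketch; `authenticate` is our own helper name, not a Tableau API, and it assumes the same `SECRETS['TABLEAU_PAT']` structure described in Prerequisites:

```python
import requests
import xml.etree.ElementTree as ET
import json

def authenticate():
    """Sign in with a PAT and return (auth_token, site_id, headers).

    Re-run this whenever a request comes back 401 (token expired).
    """
    creds = json.loads(SECRETS['TABLEAU_PAT'])  # Same SECRETS structure as above
    signin_url = f"{creds['server']}/api/3.21/auth/signin"
    payload = {
        "credentials": {
            "personalAccessTokenName": creds["tokenName"],
            "personalAccessTokenSecret": creds["tokenSecret"],
            "site": {"contentUrl": ""},
        }
    }
    resp = requests.post(signin_url, json=payload)
    resp.raise_for_status()

    # Same XML parsing as the inline example above
    ns = {"t": "http://tableau.com/api"}
    root = ET.fromstring(resp.text)
    auth_token = root.find(".//t:credentials", ns).get("token")
    site_id = root.find(".//t:site", ns).get("id")
    headers = {"X-Tableau-Auth": auth_token, "Content-Type": "application/json"}
    return auth_token, site_id, headers
```

The Error Handling section later in this guide shows where re-running this wrapper fits.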
---

## Standard Workflow

The typical VDS workflow follows these steps:

1. **Authenticate** → Get the auth token and site ID
2. **List data sources** → Find the right data source by name
3. **Read metadata** → Discover available fields and types
4. **Query data** → Retrieve rows with filters
5. **Analyze in Python** → Use pandas for analysis and visualization

---

## API Operations

### List Data Sources

Find all published data sources on your Tableau Server.

```python
import requests
import xml.etree.ElementTree as ET

# List all data sources via the REST API (returns XML)
list_url = f"{tableau_creds['server']}/api/3.21/sites/{site_id}/datasources"
resp = requests.get(list_url, headers=headers)
resp.raise_for_status()

# Parse the XML response
ns = {"t": "http://tableau.com/api"}
root = ET.fromstring(resp.text)
datasources = root.findall(".//t:datasource", ns)

# Extract data source info
ds_list = []
for ds in datasources:
    ds_list.append({
        'name': ds.get('name'),
        'id': ds.get('id'),
        'type': ds.get('type'),
        'contentUrl': ds.get('contentUrl')
    })

print(f"✓ Found {len(ds_list)} data sources")

# Example: Search for a specific data source
search_term = "Superstore"
matches = [ds for ds in ds_list if search_term.lower() in ds['name'].lower()]
```

**Returns:**

- `ds_list`: List of dicts with data source metadata
  - `name`: Data source name
  - `id`: Data source LUID (use for VDS calls)
  - `type`: Data source type (e.g., 'excel-direct')
  - `contentUrl`: URL-safe name

**Note:** The REST API paginates this list (100 items per page by default). On sites with many data sources, pass the `pageSize` and `pageNumber` query parameters and iterate.

---

### Read Metadata and Data Model

Discover what fields/columns are available and understand the data model structure before querying.

```python
import requests
from collections import defaultdict

# Read metadata from a data source
datasource_luid = "your-datasource-id-here"  # Get from List Data Sources

# Get field metadata
metadata_url = f"{tableau_creds['server']}/api/v1/vizql-data-service/read-metadata"
metadata_payload = {
    "datasource": {
        "datasourceLuid": datasource_luid,
    }
}
resp = requests.post(metadata_url, headers=headers, json=metadata_payload)
resp.raise_for_status()

meta = resp.json()
fields = meta.get("data", [])
print(f"✓ Found {len(fields)} fields in data source")

# Create a lookup dict by field caption
field_by_caption = {f["fieldCaption"]: f for f in fields}

# Show field details
for field in fields[:10]:  # First 10 fields
    print(f"  - {field['fieldCaption']} ({field['dataType']}) [{field.get('columnClass', 'N/A')}]")

# Get the data model (relationships between tables)
model_url = f"{tableau_creds['server']}/api/v1/vizql-data-service/get-datasource-model"
model_payload = {
    "datasource": {
        "datasourceLuid": datasource_luid,
    }
}
resp = requests.post(model_url, headers=headers, json=model_payload)
resp.raise_for_status()

model = resp.json()
tables = model.get("logicalTables", [])
relationships = model.get("logicalTableRelationships", [])
print(f"\n✓ Found {len(tables)} logical tables")
print(f"✓ Found {len(relationships)} relationships")

# Create a table lookup
table_by_id = {t["logicalTableId"]: t["caption"] for t in tables}

# Show table relationships
print("\nTable Relationships:")
for rel in relationships[:10]:  # First 10 relationships
    from_table = table_by_id.get(rel["fromLogicalTable"]["logicalTableId"], "Unknown")
    to_table = table_by_id.get(rel["toLogicalTable"]["logicalTableId"], "Unknown")
    print(f"  {from_table} → {to_table}")

# Build a relationship graph for validation
relationship_graph = defaultdict(set)
for rel in relationships:
    from_id = rel["fromLogicalTable"]["logicalTableId"]
    to_id = rel["toLogicalTable"]["logicalTableId"]
    relationship_graph[from_id].add(to_id)
    relationship_graph[to_id].add(from_id)  # Bidirectional

print(f"\n✓ Built relationship graph with {len(relationship_graph)} connected tables")
```

**Returns from read-metadata:**

- `fields`: List of field metadata dicts
  - `fieldCaption`: Display name (use this in queries)
  - `fieldName`: Internal name
  - `dataType`: STRING, INTEGER, REAL, DATE, etc.
  - `columnClass`: COLUMN, CALCULATION, BIN, etc.
  - `formula`: For calculated fields (if applicable)
  - `logicalTableId`: Which table this field belongs to
- `field_by_caption`: Dict for easy field lookup by caption

**Returns from get-datasource-model:**

- `logicalTables`: List of table metadata dicts
  - `logicalTableId`: Unique identifier for the table
  - `caption`: Display name of the table
- `logicalTableRelationships`: List of relationship dicts
  - `fromLogicalTable`: Source table in the relationship
  - `toLogicalTable`: Target table in the relationship
- `relationship_graph`: Dict mapping table IDs to connected table IDs

**Important Notes:**

- Always use `fieldCaption` (display name) in queries, not `fieldName`
- Use the relationship graph to validate that fields from different tables can be joined (see the sketch below)
- The model shows the actual Tableau relationships, not inferred connections
- Checking that tables are connected before querying prevents invalid queries
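The validation suggested above is a connectivity check: two fields are queryable together if their logical tables are linked, directly or transitively, in the relationship graph. A minimal sketch using a breadth-first search over the `field_by_caption` and `relationship_graph` structures built in the code above; the helper name is our own:

```python
from collections import deque

def fields_joinable(caption_a, caption_b):
    """Return True if the two fields' logical tables are connected,
    directly or transitively, in the data source's relationship graph."""
    table_a = field_by_caption[caption_a].get("logicalTableId")
    table_b = field_by_caption[caption_b].get("logicalTableId")
    if table_a is None or table_b is None or table_a == table_b:
        return True  # Same table, or no table info: nothing to validate
    # Breadth-first search over the bidirectional graph built above
    seen, queue = {table_a}, deque([table_a])
    while queue:
        current = queue.popleft()
        if current == table_b:
            return True
        for neighbor in relationship_graph[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

# Example: validate a field pair before building a query payload
# assert fields_joinable("Category", "Sales")
```

Calculated fields may not carry a `logicalTableId`, which is why the helper treats a missing table ID as nothing to validate.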
---

### Query Data

Retrieve data from a published data source.

```python
import requests
import pandas as pd

# Query data from a data source
datasource_luid = "your-datasource-id-here"  # Get from List Data Sources

query_url = f"{tableau_creds['server']}/api/v1/vizql-data-service/query-datasource"
query_payload = {
    "datasource": {
        "datasourceLuid": datasource_luid,
    },
    "query": {
        "fields": [
            {"fieldCaption": "Field Name 1"},
            {"fieldCaption": "Field Name 2"},
            # Add more fields as needed
        ],
        # Optional: Add filters here (see Filtering Strategies section)
        # "filters": [...]
    },
    "options": {
        "returnFormat": "OBJECTS",
    },
}
resp = requests.post(query_url, headers=headers, json=query_payload)
resp.raise_for_status()

raw = resp.json()
rows = raw.get("data", [])

# Convert to DataFrame
df = pd.DataFrame(rows)
print(f"✓ Retrieved {len(df)} rows")
print(f"✓ Columns: {list(df.columns)}")
```

**Returns:**

- `df`: pandas DataFrame with query results

**Important Notes:**

- Use `fieldCaption` values from metadata, not made-up names
- `limit` and `maxRows` are NOT supported in some Tableau versions
- Limit rows in pandas after retrieval: `df.head(n)`
- Sort in pandas after retrieval: `df.sort_values()`
- Filters are optional but recommended to reduce data volume
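The example above returns row-level data. The VDS query schema also accepts an aggregation function on a field, which pushes the aggregation to the source instead of doing it in pandas; see the Query a Data Source documentation linked under Additional Resources for the exact options in your version. A hedged sketch, assuming a `function` key with value `"SUM"` is supported and that the data source has `Category` and `Sales` fields:

```python
import requests
import pandas as pd

# Aggregate Sales by Category at the source instead of in pandas.
# NOTE: the "function" key and its SUM value come from the VDS query schema;
# verify support in your Tableau version before relying on it.
agg_payload = {
    "datasource": {"datasourceLuid": datasource_luid},
    "query": {
        "fields": [
            {"fieldCaption": "Category"},
            {"fieldCaption": "Sales", "function": "SUM"},
        ],
    },
    "options": {"returnFormat": "OBJECTS"},
}
resp = requests.post(query_url, headers=headers, json=agg_payload)
resp.raise_for_status()

df_agg = pd.DataFrame(resp.json().get("data", []))
print(df_agg.head())
```

If your Tableau version rejects the `function` key, fall back to querying raw rows and aggregating with `df.groupby()`.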
---

## Filtering Strategies

### Filter Decision Tree

```
Do you have exact values from metadata or a domain probe?
├─ YES → Use SET filter (fastest, case-insensitive)
│
└─ NO → What kind of matching do you need?
    ├─ Simple substring search → Use CONTAINS + LOWER
    ├─ Complex pattern matching → Use REGEXP_MATCH
    ├─ Numeric/date range → Use custom calculation
    └─ Not sure what values exist? → Do a domain probe first
```

### 1. SET Filter - Exact Match (Fastest)

Use when you have exact values from metadata or a domain probe.

```python
set_filter = {
    "field": {"fieldCaption": "Category"},
    "filterType": "SET",
    "values": ["Technology", "Furniture"]  # Case-insensitive; exact spacing required
}
```

**Characteristics:**

- Fastest performance
- Case-insensitive
- Requires exact spacing
- Best for known values

---

### 2. CONTAINS Filter - Substring Search

Use for user search terms and partial matching.

```python
search_term = "tech"
contains_filter = {
    "field": {
        "calculation": f"INT(CONTAINS(LOWER([Category]),'{search_term.lower()}'))"
    },
    "filterType": "QUANTITATIVE_NUMERICAL",
    "quantitativeFilterType": "MIN",
    "min": 1
}
```

**Characteristics:**

- Case-insensitive with LOWER()
- Good for user input
- Moderate performance

---

### 3. REGEXP Filter - Pattern Matching

Use for complex patterns: starts-with, ends-with, OR logic.

```python
pattern = "(?i)^tech"  # Starts with "tech" (case-insensitive)
regexp_filter = {
    "field": {
        "calculation": f"INT(REGEXP_MATCH([Category], '{pattern}'))"
    },
    "filterType": "QUANTITATIVE_NUMERICAL",
    "quantitativeFilterType": "MIN",
    "min": 1
}
```

**Characteristics:**

- Powerful pattern matching
- Use `(?i)` for case-insensitive matching
- Slower than SET or CONTAINS

---

### 4. Exact Match with Case Normalization

```python
exact_filter = {
    "field": {
        "calculation": "INT(UPPER([Category]) = 'TECHNOLOGY')"
    },
    "filterType": "QUANTITATIVE_NUMERICAL",
    "quantitativeFilterType": "MIN",
    "min": 1
}
```

---

### 5. Multiple Conditions (OR Logic)

```python
or_filter = {
    "field": {
        "calculation": "INT(CONTAINS(LOWER([Category]),'tech') OR CONTAINS(LOWER([Category]),'furniture'))"
    },
    "filterType": "QUANTITATIVE_NUMERICAL",
    "quantitativeFilterType": "MIN",
    "min": 1
}
```

---

### 6. Numeric Range Filter

```python
range_filter = {
    "field": {
        "calculation": "INT([Sales] > 100 AND [Sales] < 1000)"
    },
    "filterType": "QUANTITATIVE_NUMERICAL",
    "quantitativeFilterType": "MIN",
    "min": 1
}
```

---

### Filter Performance Ranking

**Performance (fastest to slowest):** SET > Simple calc > CONTAINS > REGEXP > Complex calc

A helper that applies this decision tree is sketched after the Advanced Patterns section below.

---

## Advanced Patterns

### Domain Probe Pattern (Fuzzy Matching with Optimal Performance)

Use when the user provides fuzzy/partial text and you want optimal query performance.

**Pattern:** Query all unique dimension values → fuzzy match in Python → use a SET filter with the exact values

```python
import requests
import pandas as pd
from difflib import get_close_matches

# Step 1: Domain probe - get all unique values from a dimension
query_url = f"{tableau_creds['server']}/api/v1/vizql-data-service/query-datasource"
probe_payload = {
    "datasource": {"datasourceLuid": datasource_luid},
    "query": {
        "fields": [{"fieldCaption": "Category"}],  # Dimension to probe
    },
    "options": {"returnFormat": "OBJECTS"},
}
resp = requests.post(query_url, headers=headers, json=probe_payload)
resp.raise_for_status()

rows = resp.json().get("data", [])
df_probe = pd.DataFrame(rows)
all_values = df_probe['Category'].unique()
print(f"✓ Found {len(all_values)} unique values")

# Step 2: Fuzzy match user input in Python
user_input = "tech"  # User's search term
matches = get_close_matches(
    user_input.lower(),
    [v.lower() for v in all_values],
    n=5,         # Top 5 matches
    cutoff=0.6   # Similarity threshold (0-1)
)

# Step 3: Map back to the original case-sensitive values
exact_values = [v for v in all_values if v.lower() in matches]
print(f"✓ Matched values: {exact_values}")

# Step 4: Use a SET filter with the exact values (fastest query)
final_query_payload = {
    "datasource": {"datasourceLuid": datasource_luid},
    "query": {
        "fields": [
            {"fieldCaption": "Category"},
            {"fieldCaption": "Sales"},
        ],
        "filters": [
            {
                "field": {"fieldCaption": "Category"},
                "filterType": "SET",
                "values": exact_values  # Exact matched values
            }
        ],
    },
    "options": {"returnFormat": "OBJECTS"},
}
resp = requests.post(query_url, headers=headers, json=final_query_payload)
resp.raise_for_status()

df = pd.DataFrame(resp.json().get("data", []))
print(f"✓ Retrieved {len(df)} rows with matched categories")
```

**When to use a domain probe:**

- The dimension has a manageable number of unique values (< 10,000)
- You want the fastest possible subsequent query performance
- You need to provide suggestions/autocomplete to the user
- You want full control over the fuzzy-matching logic

**When to skip it:**

- The dimension has millions of unique values
- You need immediate results without a two-step process
- A CONTAINS or REGEXP filter is sufficient
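As referenced under Filter Performance Ranking, the decision tree folds into a small helper: use a SET filter when exact values are known (for example, from a domain probe) and fall back to a case-insensitive CONTAINS calculation otherwise. A minimal sketch; the helper name and error handling are our own conventions, not part of the VDS API:

```python
def build_filter(field_caption, exact_values=None, search_term=None):
    """Return a VDS filter dict: SET when exact values are known,
    case-insensitive CONTAINS otherwise (per the decision tree above)."""
    if exact_values:
        return {
            "field": {"fieldCaption": field_caption},
            "filterType": "SET",
            "values": list(exact_values),
        }
    if search_term:
        term = search_term.lower()  # NOTE: quote characters would need escaping
        return {
            "field": {
                "calculation": f"INT(CONTAINS(LOWER([{field_caption}]),'{term}'))"
            },
            "filterType": "QUANTITATIVE_NUMERICAL",
            "quantitativeFilterType": "MIN",
            "min": 1,
        }
    raise ValueError("Provide exact_values or search_term")

# Usage:
# query_payload["query"]["filters"] = [build_filter("Category", search_term="tech")]
```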
---

## Performance Optimization

### Best Practices

1. **Request only needed fields** - Don't query all columns
2. **Use filters to reduce data volume** - Filter at the source, not in pandas
3. **Prefer SET filters** when you have exact values (fastest)
4. **Use domain probes** for fuzzy matching with optimal performance
5. **Limit results in pandas** after retrieval (the VDS `limit` option is not supported in some versions)
6. **Cache authentication tokens** - Don't re-authenticate for every query
7. **Batch queries** when possible to reduce API calls

### Performance Tips

- **SET filters** are case-insensitive and fastest
- **CONTAINS** is slower than SET but faster than REGEXP
- **REGEXP** is powerful but slowest
- **Custom calculations** vary in performance based on complexity
- **Domain probes** add an extra query but enable the fastest subsequent queries

---

## Error Handling

### Common Errors and Solutions

| Error | Cause | Solution |
|-------|-------|----------|
| **401 Unauthorized** | Auth token expired | Re-authenticate to get a new token |
| **404 Not Found** | Invalid data source LUID or VDS not enabled | Verify the LUID from List Data Sources |
| **404 "Unrecognized field"** | Feature not supported in this Tableau version | Use an alternative approach or upgrade |
| **Empty data array** | No rows matched the filters | Check filter values against metadata |
| **Field not found** | Invalid field name | Use `fieldCaption` values from metadata, not made-up names |
| **XML parse error** | Wrong API endpoint | The REST API returns XML; VDS returns JSON |

### Error Handling Pattern

```python
try:
    resp = requests.post(query_url, headers=headers, json=query_payload)
    resp.raise_for_status()
    data = resp.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 401:
        print("Authentication expired, re-authenticating...")
        # Re-run the authentication code
    elif e.response.status_code == 404:
        print("Data source not found or VDS not enabled")
    else:
        print(f"HTTP error: {e}")
        print(f"Response: {e.response.text}")
except Exception as e:
    print(f"Error: {e}")
```

---

## Important Notes

### API Behavior

- The **REST API returns XML**; VDS returns **JSON**
- Use `fieldCaption` (display name) in queries, not `fieldName`
- SET filters are **case-insensitive** but require exact spacing
- VDS respects user permissions and row-level security
- Some Tableau versions don't support `limit` or `maxRows` in queries

### Data Considerations

- **Aliases vs. raw values**: Use domain probes to discover the actual stored values
- **Calculated fields**: Available in metadata with a `formula` property
- **Aggregations**: Tableau handles aggregations based on data source logic
- **Extracts**: VDS queries the extract if the published data source uses one

### Security

- Store PAT credentials securely (environment variables, secrets manager)
- Auth tokens expire, so implement re-authentication logic (see the sketch below)
- VDS respects Tableau's row-level security and user permissions
- Never log or expose PAT secrets in code or logs
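Putting the re-authentication advice into practice: a minimal sketch of a POST wrapper that signs in again and retries once on a 401. It reuses the `authenticate()` helper sketched in the Authentication section; both are our own conventions, not part of the Tableau API:

```python
import requests

def vds_post(url, payload, headers):
    """POST to a VDS endpoint; re-authenticate and retry once on a 401."""
    resp = requests.post(url, headers=headers, json=payload)
    if resp.status_code == 401:
        # Token expired: sign in again and retry with fresh headers
        _, _, headers = authenticate()
        resp = requests.post(url, headers=headers, json=payload)
    resp.raise_for_status()
    return resp.json(), headers  # Return refreshed headers for reuse

# Usage:
# data, headers = vds_post(query_url, query_payload, headers)
```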
---

## Additional Resources

### VizQL Data Service (VDS) API Documentation

- **[VizQL Data Service Introduction](https://help.tableau.com/current/api/vizql-data-service/en-us/index.html)** - Official VDS API overview and getting-started guide
- **[Query a Data Source](https://help.tableau.com/current/api/vizql-data-service/en-us/docs/vds_create_queries.html)** - Detailed documentation on constructing VDS queries
- **[Get Data Source Information](https://help.tableau.com/current/api/vizql-data-service/en-us/docs/vds_get_data_source_info.html)** - Reading metadata from published data sources
- **[Configuration](https://help.tableau.com/current/api/vizql-data-service/en-us/docs/vds_configuration.html)** - VDS configuration and setup requirements

### Tableau REST API Documentation

- **[REST API Reference](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref.htm)** - Complete REST API reference
- **[Authentication Methods](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref_authentication.htm)** - Authentication endpoints, including Personal Access Tokens
- **[Authentication Concepts](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_concepts_auth.htm)** - Understanding Tableau authentication flows
- **[Data Sources Methods](https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_ref_data_sources.htm)** - REST API endpoints for listing and managing data sources

### Related Documentation

- **[Tableau Server VizQL Data Service](https://help.tableau.com/current/server/en-us/server_process_vizql-data-service.htm)** - Understanding the VizQL Data Service process