Ariel Pohoryles
NOV 12, 2024

Product companies collect and analyze valuable data that their clients would love to integrate directly into their own data ecosystems. While most products offer access to the data via integrated reporting interfaces built natively or using embedded BI tools, some clients need direct access to the underlying analytical data. This direct access is essential for integration scenarios where clients need to analyze the data alongside their own data in warehouses or data lakes.

In this post, we’ll explore different methods for sharing analytical data housed in modern data warehouses like Google BigQuery or Snowflake with your clients.

The Challenge

Product companies often store analytical data within their own data warehouses, yet sharing this data outside of the product’s interface brings both technical and security challenges. Clients are increasingly looking to:

  • Integrate product analytics directly into their data systems
  • Perform custom analyses with their preferred tools
  • Combine product data with other business metrics
  • Automate data-driven workflows

Four Approaches to Data Sharing

1. Custom API Development

The Traditional Approach

Building a custom API remains the most common solution for exposing analytical data. This approach involves:

  • Creating REST or GraphQL endpoints that expose specific datasets
  • Implementing authentication and rate limiting
  • Maintaining API documentation
  • Supporting client integration efforts

Pros:

  • Full control over data access
  • Familiar integration pattern for most developers
  • Granular access control

Cons:

  • Requires significant development effort on top of the core product offering
  • Ongoing maintenance overhead
  • Clients need to build their own data ingestion processes

2. Automated Data Pipeline Solution

The Modern Approach

Using tools like Rivery, companies can create direct data pipelines from their warehouse to their clients’ storage solutions.

Pros:

  • No custom development required
  • Flexible scheduling options or API-triggered data updates
  • Minimal client integration effort

Cons:

  • Requires access to the client's connection credentials. Note that Rivery can share an external link through which the client establishes the connection to their storage target themselves, so the client never has to share their credentials with the vendor's Rivery user.
  • May need separate accounts per client
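Rivery handles this without custom code. The sketch below only illustrates the incremental-load pattern that pipeline tools commonly support: copy one client's rows that changed since the last sync, then commit the data and the new watermark together.

This is a minimal sketch under stated assumptions:

  • In-memory `sqlite3` databases stand in for the vendor's warehouse and the client's storage.
  • The `events` and `sync_state` tables are hypothetical.
  • So are the `client_id` and `updated_at` columns and the `sync_client` function.

```python
import sqlite3


def sync_client(source: sqlite3.Connection, target: sqlite3.Connection, client_id: str) -> int:
    """Copy one client's rows changed since the last sync into the target; return rows copied."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS events (event_id INTEGER PRIMARY KEY, updated_at INTEGER, value REAL)"
    )
    target.execute(
        "CREATE TABLE IF NOT EXISTS sync_state (id INTEGER PRIMARY KEY CHECK (id = 1), watermark INTEGER)"
    )
    row = target.execute("SELECT watermark FROM sync_state WHERE id = 1").fetchone()
    watermark = row[0] if row else -1  # -1 means "never synced": take everything

    # Extract only this client's rows that are newer than the watermark.
    changed = source.execute(
        "SELECT event_id, updated_at, value FROM events "
        "WHERE client_id = ? AND updated_at > ? ORDER BY updated_at",
        (client_id, watermark),
    ).fetchall()
    if not changed:
        return 0

    with target:  # one transaction: data and watermark commit (or roll back) together
        target.executemany(
            "INSERT INTO events (event_id, updated_at, value) VALUES (?, ?, ?) "
            "ON CONFLICT(event_id) DO UPDATE SET updated_at = excluded.updated_at, value = excluded.value",
            changed,
        )
        target.execute(
            "INSERT INTO sync_state (id, watermark) VALUES (1, ?) "
            "ON CONFLICT(id) DO UPDATE SET watermark = excluded.watermark",
            (changed[-1][1],),
        )
    return len(changed)
```

The upsert makes re-delivered rows harmless. The strict `updated_at > watermark` filter, however, assumes timestamps only move forward. A row that arrives late with a timestamp equal to the watermark would be skipped. A common mitigation is to re-read a small overlap window and rely on the upsert to deduplicate.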

3. Client-Controlled Environment

The Self-Service Approach

Providing clients with their own dedicated data pipeline environment gives them more control over data replication. For example, in Rivery, the product vendor can add an environment for a client, pre-configured with a secured connection to Snowflake or BigQuery as the data source. The client can then build their own data pipelines in a no-code experience.

Pros:

  • Client maintains control over data sync
  • Flexible configuration options for the client
  • Reduced maintenance for the product company

Cons:

  • Requires client training
  • Additional overhead in environment management
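The key idea of this approach is that the vendor pre-provisions a connection whose scope is limited to the client's own data. The client then builds pipelines freely inside that scope.

The sketch below models that scope as an allow-list that is checked before a pipeline runs. It is not Rivery's API, where this is configured through the product. `ClientEnvironment`, `provision_environment` and the schema and view names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientEnvironment:
    """Hypothetical per-client environment with one pre-configured, scoped source connection."""
    client_id: str
    source: str                      # e.g. "snowflake" or "bigquery"
    allowed_objects: frozenset[str]  # lower-cased "schema.view" names the client may read

    def validate_pipeline(self, requested: list[str]) -> list[str]:
        """Return the requested objects outside the allow-list (empty means the pipeline may run)."""
        return [obj for obj in requested if obj.lower() not in self.allowed_objects]


def provision_environment(client_id: str, source: str, schema: str, views: list[str]) -> ClientEnvironment:
    """Vendor-side step: create a client's environment scoped to its dedicated views."""
    allowed = frozenset(f"{schema}.{view}".lower() for view in views)
    return ClientEnvironment(client_id, source, allowed)
```

The real enforcement boundary should still be the warehouse itself. The role behind the pre-configured connection should hold SELECT only on the client's views, so an application-level check like this is a second layer rather than the only one.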

4. Native Data Sharing Features

The Platform-Native Approach

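Modern data warehouses offer built-in data sharing capabilities that let third parties query shared data directly, provided they have an account on the same platform. For example, Snowflake offers Secure Data Sharing, and Google BigQuery offers Analytics Hub for publishing shared datasets.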

Pros:

  • Native platform integration
  • Robust security controls
  • Minimal setup required

Cons:

  • Clients must use the same platform
  • May increase client costs
  • Limited to platform-specific features
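On Snowflake, one of the warehouses mentioned above, a provider sets up native sharing with SQL statements:

  • `CREATE SHARE` creates the share.
  • `GRANT ... TO SHARE` adds the database, schema and views to it.
  • `ALTER SHARE ... ADD ACCOUNTS` adds the consumer account.

The Python sketch below only builds those statements. It assumes each shared object is a secure view that already filters rows to a single client. It also assumes the consumer is identified as `orgname.accountname`. The helper and all object names are hypothetical. The statements would be run by a role with the CREATE SHARE privilege, and should be checked against Snowflake's documentation for your account setup.

```python
import re

# Conservative unquoted-identifier check, to avoid injecting arbitrary SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check(*names: str) -> None:
    for name in names:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def snowflake_share_statements(
    share: str, database: str, schema: str, views: list[str], consumer_account: str
) -> list[str]:
    """Build the provider-side SQL that shares the given secure views with one consumer account."""
    org, _, account = consumer_account.partition(".")
    _check(share, database, schema, *views, org, account)
    statements = [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
    ]
    statements += [f"GRANT SELECT ON VIEW {database}.{schema}.{v} TO SHARE {share}" for v in views]
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {org}.{account}")
    return statements
```

Keeping one share per client, with client-filtered secure views, lets a client be offboarded with a single `ALTER SHARE ... REMOVE ACCOUNTS` (or by dropping the share). Their access then ends without affecting anyone else.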

Solution Comparison

| Aspect | Custom API | Automated Pipeline | Client-Controlled Environment | Native Data Sharing |
| --- | --- | --- | --- | --- |
| Implementation Effort | High | Low | Medium | Low |
| Maintenance Overhead | High | Low | Medium | Low |
| Client Technical Requirements | High | Low | Medium | Medium |
| Setup Time | Weeks/Months | Hours/Days | Days/Weeks | Hours/Days |
| Flexibility | High | Medium | High | Low |
| Security Control | Custom | Tool-dependent | Tool-dependent | Platform-native |
| Cost Structure | Development + Infrastructure | Per-pipeline data volume | Per-pipeline data volume | Platform-dependent |
| Client Independence | High | Medium | High | Low |
| Scalability | Custom | Built-in | Built-in | Built-in |
| Best For | Custom needs, high control | Quick implementation | Savvy clients | Same-platform clients |

Choosing the Right Approach

When selecting a data-sharing strategy, consider:

  1. Client Technical Capability
    • Do they have the resources to integrate an API?
    • Are they familiar with data pipeline tools?
    • Do they use compatible data platforms?
  2. Data Volume and Frequency
    • How much data needs to be shared?
    • How often does it need to be updated?
    • What are the performance requirements?
  3. Security Requirements
    • What data governance policies apply?
    • How sensitive is the data?
    • What audit trails are needed?
  4. Implementation Effort
    • What development resources are available?
    • How much maintenance capacity do you have?
    • What timeline constraints apply?
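The first and fourth criteria can be partly mechanized using the comparison table. The sketch below encodes three of the table's inputs:

  • the Client Technical Requirements row
  • the Implementation Effort row
  • the same-platform constraint behind Native Data Sharing

It returns a shortlist ordered by implementation effort. Treating "vendor capacity" as the highest implementation effort you can absorb is an assumption. The security and data-volume questions are deliberately left to human judgement. `Approach` and `shortlist` are hypothetical names.

```python
from dataclasses import dataclass

LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class Approach:
    name: str
    client_requirements: str    # "Client Technical Requirements" row of the table
    implementation_effort: str  # "Implementation Effort" row of the table
    same_platform_only: bool    # True only for native data sharing


APPROACHES = [
    Approach("Custom API", "high", "high", False),
    Approach("Automated Pipeline", "low", "low", False),
    Approach("Client-Controlled Environment", "medium", "medium", False),
    Approach("Native Data Sharing", "medium", "low", True),
]


def shortlist(client_capability: str, client_on_same_platform: bool, vendor_capacity: str) -> list[str]:
    """Keep approaches the client can operate and the vendor can build, lowest effort first."""
    fits = [
        a for a in APPROACHES
        if LEVELS[a.client_requirements] <= LEVELS[client_capability]
        and LEVELS[a.implementation_effort] <= LEVELS[vendor_capacity]
        and (client_on_same_platform or not a.same_platform_only)
    ]
    # sorted() is stable, so ties keep the table's column order.
    return [a.name for a in sorted(fits, key=lambda a: LEVELS[a.implementation_effort])]
```

For a client with limited technical resources on a different warehouse, only the automated pipeline survives the filter. This matches the table's "Best For" row.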

Monetization Considerations

Adding data sharing capabilities can create new revenue streams, but it also incurs additional costs. If you plan to monetize this capability, consider the following options:

  • Tiered pricing based on data volume
  • Premium feature up-charges
  • API call quotas
  • Data freshness options
  • Inclusion as a feature of a premium plan
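Three of these options can be combined into a single plan check: volume-based tiers, API call quotas and data freshness. The sketch below picks the cheapest tier whose allowances cover a client's usage. The tier names and every limit in it are hypothetical placeholders, not recommendations, and the list is assumed to be ordered from cheapest to most expensive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_rows_per_month: int     # data-volume allowance
    max_api_calls_per_day: int  # API call quota
    min_refresh_minutes: int    # freshness option: fastest sync interval allowed


# Hypothetical tiers for illustration only; real limits are a business decision.
TIERS = [
    Tier("basic", 1_000_000, 1_000, 1440),
    Tier("pro", 50_000_000, 20_000, 60),
    Tier("enterprise", 1_000_000_000, 200_000, 15),
]


def cheapest_tier(rows_per_month: int, api_calls_per_day: int, refresh_minutes: int) -> Optional[str]:
    """Return the first tier (cheapest first) that covers the usage, or None if none does."""
    for tier in TIERS:
        if (
            rows_per_month <= tier.max_rows_per_month
            and api_calls_per_day <= tier.max_api_calls_per_day
            and refresh_minutes >= tier.min_refresh_minutes
        ):
            return tier.name
    return None
```

A `None` result signals usage beyond every published tier, which is a natural trigger for a custom contract.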

Recommendation

For many product companies, the automated data pipeline approach (Option 2) can offer the best balance of:

  • Implementation effort and rapid delivery time
  • Client usability and satisfaction
  • Maintenance overhead
  • Flexibility

This approach allows quick deployment while giving clients the freedom to use data as they see fit, making it an excellent starting point for data sharing initiatives.

Conclusion

As the demand for direct data access grows, product companies must evolve their data sharing capabilities. While multiple solutions exist, automated data pipelines offer a compelling mix of flexibility and ease of implementation. Whatever approach you choose, remember to factor in both technical requirements and business considerations to create a sustainable data sharing strategy.
