Integrate any data with and for LLMs
Extract & load in minutes
Instantly connect to your data sources with 200+ managed integrations. Easily sync any data source to create effective AI apps based on your organization's data
Prep data for LLM usage
Transform the data and feed your LLM with the ideal structure for its RAG workflows. Run push-down SQL, Python scripts, or both in a single workflow, as in the sketch below
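As a rough illustration of the Python side of such a preparation step, here is a minimal chunking transform that splits free-form text into overlapping windows for retrieval; the chunk size and overlap values are illustrative assumptions, not product defaults:

```python
# Minimal sketch: split free-form text into overlapping chunks for a RAG workflow.
# Chunk size and overlap are illustrative assumptions, not product defaults.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Return overlapping character windows so retrieval keeps local context."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks

document = "some extracted source text " * 200  # stand-in for real content
print(len(chunk_text(document)))  # number of chunks produced
```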
Set up workflows with ease
Trigger generative AI transformations right after ingestion dependencies are met. Accelerate your workflow development via no-code orchestration
Seamless structured and unstructured data ingestion
Integrate directly with LLMs to rapidly build reliable AI apps
Build personalized AI apps with Amazon Q
- Sync data into Amazon Q to create retrieval-based LLM apps
- Organize data into logical documents to improve RAG processing and AI referencing
- Trigger Amazon Q data syncs using Rivery's kit to ensure data freshness (see the sketch after this list)
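For illustration only, the same kind of sync can be triggered programmatically with the AWS SDK; the boto3 `qbusiness` client and its `start_data_source_sync_job` call are real, but every ID below is a hypothetical placeholder:

```python
# Sketch: trigger an Amazon Q Business data source sync so the app sees fresh data.
# All IDs are hypothetical placeholders; in Rivery this step is handled by the kit.
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

response = qbusiness.start_data_source_sync_job(
    applicationId="app-1234",   # hypothetical Amazon Q application ID
    indexId="idx-5678",         # hypothetical index ID
    dataSourceId="ds-9012",     # hypothetical data source ID
)
print("Sync execution id:", response["executionId"])
```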
Run GenAI workflows within AI-enabled data warehouses
- Transform and analyze data using AI through simple SQL queries orchestrated by Rivery
- Leverage the native vector support offered by Snowflake Cortex and BigQuery's Vertex AI as part of your pipelines
- Process and incrementally store vector data in Snowflake (see the sketch after this list)
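As a sketch of what that incremental vector step can look like with Snowflake Cortex; the table and column names, model choice, and connection details are all assumptions for illustration:

```python
# Sketch: embed new rows with Snowflake Cortex and store vectors incrementally.
# Table and column names are illustrative; connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",  # placeholders
    warehouse="my_wh", database="my_db", schema="public",
)

# Push-down SQL: Cortex computes embeddings inside the warehouse, and only
# documents not yet embedded are processed (incremental load).
conn.cursor().execute("""
    INSERT INTO doc_vectors (doc_id, embedding)
    SELECT d.doc_id,
           SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', d.body)
    FROM documents d
    WHERE d.doc_id NOT IN (SELECT doc_id FROM doc_vectors)
""")
```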
Bring all your data to AI
- Use 200+ managed integrations to quickly ingest your data
- Configure your own custom connections without external solutions (see the sketch after this list)
- Use Rivery Copilot to generate new custom integrations
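A bare-bones example of what a custom REST connection boils down to; the endpoint, token, and field names here are hypothetical:

```python
# Sketch of a custom REST connection: pull records from a hypothetical API
# endpoint and keep only the free-form text fields for downstream LLM prep.
import requests

resp = requests.get(
    "https://api.example.com/v1/tickets",           # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},    # placeholder credential
    timeout=30,
)
resp.raise_for_status()

records = [
    {"id": t["id"], "text": t["description"]}       # assumed response fields
    for t in resp.json()["results"]
]
print(f"Ingested {len(records)} records")
```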
Arm yourself with AI pipeline resources
FAQs
What does an AI pipeline for LLMs involve?
For most generative AI applications based on LLMs, an AI pipeline involves extracting unstructured data from different sources, preparing that data so it can be used by an AI application (a chatbot, copilot, or similar), often via a RAG workflow, and finally orchestrating that process so the AI application uses the data (behind the scenes, that means storing it in a vector database).
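A compressed sketch of those three stages, using the sentence-transformers library as a stand-in embedding model and an in-memory matrix in place of a real vector database:

```python
# Sketch of the pipeline stages: extract, prepare (embed), and store vectors.
# sentence-transformers is a stand-in model; a production pipeline would
# typically write to a managed vector database instead of a NumPy array.
from sentence_transformers import SentenceTransformer
import numpy as np

# 1. Extract: unstructured text gathered from source systems (toy examples).
docs = ["Refund policy: customers may return items within 30 days.",
        "Shipping: standard delivery takes 3-5 business days."]

# 2. Prepare: embed each document into a vector.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(docs, normalize_embeddings=True)

# 3. Store: an in-memory matrix plays the role of the vector database here.
index = np.asarray(vectors)
print(index.shape)  # (2, 384)
```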
What kind of data do LLM-based applications use?
Generative AI applications based on LLMs typically use unstructured data such as free-form text. This text can live in files, in specific database or data warehouse columns, or in API responses (for example, the body returned by a GET REST API call). The following article lists some common examples.
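For concreteness, here is how each of those three locations is typically read in Python; the file name, table, and URL are hypothetical:

```python
# Sketch of the three common locations for free-form text: a file, a database
# column, and a REST response. Paths, table names, and the URL are hypothetical.
import sqlite3
import requests

# Text in a file
file_text = open("notes.txt", encoding="utf-8").read()

# Text in a database column
rows = sqlite3.connect("app.db").execute(
    "SELECT comment FROM support_tickets"   # a free-form text column
).fetchall()

# Text in an API response
api_text = requests.get("https://api.example.com/v1/articles", timeout=30).json()
```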
What is retrieval-augmented generation (RAG)?
Retrieval-augmented generation is an approach that combines traditional retrieval-based methods with generative models to enhance the quality and relevance of AI-generated content. In a RAG system, the model first retrieves relevant information from a large dataset or knowledge base and then uses it to generate more accurate and contextually appropriate responses. This technique leverages the strengths of both retrieval and generation, allowing the AI to produce high-quality output even when the initial input is sparse or ambiguous. Simply put, RAG workflows feed AI apps contextual data that may not be publicly available, so responses stay grounded in that data, resulting in fewer “AI hallucinations”.
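The retrieval half of that loop reduces to a similarity search plus prompt assembly; a minimal sketch, assuming document embeddings are already normalized so a dot product gives cosine similarity:

```python
# Sketch of the retrieval step in RAG: find the most relevant stored document
# for a question, then splice it into the prompt sent to the LLM.
import numpy as np

def retrieve(question_vec: np.ndarray, index: np.ndarray, docs: list[str]) -> str:
    scores = index @ question_vec          # cosine similarity per document
    return docs[int(np.argmax(scores))]    # best-matching document

def build_prompt(question: str, context: str) -> str:
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```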
Which cloud data warehouses offer built-in generative AI capabilities?
Some cloud data warehouses, such as Snowflake and Google BigQuery, have incorporated generative AI capabilities (Snowflake Cortex and Vertex AI, respectively) that can be invoked with simple SQL queries run on top of those warehouses. This greatly simplifies building AI applications, since the AI functions execute directly on data already stored in the warehouse. Common use cases include building RAG workflows and performing advanced analytics such as text clustering, semantic search, and text summarization.
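For example, summarization in Snowflake reduces to a single Cortex function call in SQL; the table and column names below are illustrative, and the connection details are placeholders:

```python
# Sketch: text summarization with one SQL call to Snowflake Cortex, run over
# rows already in the warehouse. Table and column names are illustrative.
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user",
                                   password="...")  # placeholders
cur = conn.cursor()
cur.execute("""
    SELECT review_id, SNOWFLAKE.CORTEX.SUMMARIZE(review_text) AS summary
    FROM product_reviews
    LIMIT 10
""")
for review_id, summary in cur:
    print(review_id, summary)
```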