ReductStore & Mosaico

Integration Analysis for the TwinSight Platform
ReductStore — Time-Series Blob Storage
Mosaico — Robotics Data Platform

ReductStore — What It Is

ReductStore is an open-source, time-series object/blob storage engine purpose-built for robotics and Industrial IoT. Traditional databases force you to choose: either you store structured metrics well (TimescaleDB, InfluxDB) or you store files/blobs well (MinIO, S3). ReductStore does both — it stores binary data (images, video frames, point clouds, MCAP recordings, sensor dumps) indexed by timestamp with metadata labels.

Simple analogy: Imagine a filing cabinet where every drawer is labeled with a timestamp. Inside each drawer you can put anything — a photo, a sensor reading, a video clip, a 3D point cloud. And you can instantly ask "give me everything from drawer 2:00 PM to 2:05 PM" or "give me all drawers labeled 'error'." That's ReductStore.

Time-Indexed Blob Storage

Every piece of data is stored with a timestamp and optional metadata labels (key-value pairs like robot_id=07, severity=high). Query by time range, filter by labels — without needing a separate database for the index.

Core feature

Native ROS 2 Agent

Ships with reductstore_agent — a ROS 2 node that subscribes to selected topics and records them directly into ReductStore. No custom code needed. It stores raw ROS messages with schema metadata attached.

ROS 2 ready

FIFO Retention & Quotas

Set a maximum storage size per "bucket." When the limit is reached, the oldest data is automatically deleted. Perfect for edge devices and on-premise deployments with fixed disk space — no manual cleanup needed.

Operations
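The quota behavior amounts to size-based FIFO eviction, which a few lines of Python can sketch (a conceptual model, not ReductStore's implementation):

```python
from collections import deque

class FifoBucket:
    """Sketch of a size-quota bucket: when total stored bytes exceed the
    quota, the oldest records are evicted first (FIFO retention)."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.records = deque()  # (timestamp, blob), oldest first
        self.used = 0

    def write(self, ts, blob):
        self.records.append((ts, blob))
        self.used += len(blob)
        while self.used > self.quota:  # evict until back under quota
            _, oldest = self.records.popleft()
            self.used -= len(oldest)

bucket = FifoBucket(quota_bytes=10)
bucket.write(1.0, b"aaaa")  # used = 4
bucket.write(2.0, b"bbbb")  # used = 8
bucket.write(3.0, b"cccc")  # used = 12 > 10, so the oldest record is evicted
print([b for _, b in bucket.records])  # → [b'bbbb', b'cccc']
```

On an edge device this is the property that matters: disk usage is bounded by configuration, not by an external cleanup job.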

Server-Side Filtering

Queries are filtered on the server before data is sent over the network. For example, extracting specific MCAP topics or filtering CSV columns. This dramatically reduces bandwidth for large datasets.

Performance

Replication & HA

Built-in data replication between instances (primary/secondary) for high availability. Read-only replicas can be added for scaling read throughput across multiple consumers.

Scalability

Multi-Language SDKs

Client libraries for Python, Rust, C++, JavaScript/Node.js, and Go. The Python and C++ SDKs are particularly relevant — Python for the FastAPI backend, C++ for the ROS 2 nodes.

Developer experience

Performance vs. TwinSight's Current Stack

ReductStore was benchmarked against the exact technologies TwinSight proposes:

ReductStore vs. TimescaleDB

  • Up to 1604% faster writes for 1MB objects
  • Up to 671% faster reads for 1MB objects
  • TimescaleDB wins for structured metrics and SQL queries
  • ReductStore wins for any blob >10KB (images, point clouds, recordings)

ReductStore vs. MinIO

  • Consistently faster on blob throughput (time-series access patterns)
  • MinIO wins for pure S3-compatible cloud storage
  • ReductStore adds time-based querying that MinIO lacks
  • Best combo: ReductStore for active streams, MinIO for long-term archival
Note: Benchmark figures are from ReductStore's own comparative tests. Real-world results may vary based on workload, hardware, and configuration.

Mosaico — What It Is

Mosaico is an open-source data platform for robotics and Physical AI, built by Mosaico Labs. It's not a fleet management UI or a dashboard — it's a data pipeline engine that sits between your raw robot sensor data and your ML/analytics workflows. Think of it as the "plumbing" that takes messy, unsynchronized, multi-modal sensor data and transforms it into clean, aligned, ML-ready datasets.

Simple analogy: Imagine 10 robots each recording video, LIDAR scans, and position data — but their clocks aren't perfectly synced, their formats differ, and some recordings have gaps. Mosaico is like a video editor that automatically aligns all the footage by timestamp, converts everything to a common format, and exports a clean, ready-to-use dataset for your AI team.

Rust-Powered Server (mosaicod)

The core daemon is written in Rust for maximum performance. It handles data conversion, compression, and storage operations. Rust's memory safety guarantees prevent crashes in long-running production workloads.

Architecture

Apache Arrow Columnar Storage

Uses Apache Arrow as its data interchange format — the same columnar format used by Pandas, Spark, and most modern data tools. This means zero-copy data access and seamless interop with Python ML workflows.

Data format

Sensor Data Synchronization

Automatically aligns multi-modal sensor streams that have different frequencies and clock sources. A camera at 30fps and a LIDAR at 10Hz get time-aligned into a unified dataset.

Core feature
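The core alignment step can be illustrated with a minimal nearest-timestamp pairing in plain Python. This only sketches the idea; Mosaico's actual alignment strategies, interpolation, and APIs may differ:

```python
from bisect import bisect_left

def nearest(timestamps, t):
    """Index of the entry in a sorted timestamp list closest to t."""
    i = bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    return min(candidates, key=lambda j: abs(timestamps[j] - t))

def align(camera_ts, lidar_ts):
    """Pair each camera frame with the LIDAR scan nearest in time."""
    return [(ct, lidar_ts[nearest(lidar_ts, ct)]) for ct in camera_ts]

camera = [round(i / 30, 3) for i in range(6)]  # ~30 fps stream
lidar = [round(i / 10, 3) for i in range(3)]   # 10 Hz stream
pairs = align(camera, lidar)
print(pairs[:3])  # → [(0.0, 0.0), (0.033, 0.0), (0.067, 0.1)]
```

A production pipeline also has to handle clock offsets, gaps, and interpolation between samples, which is precisely the tedious work the platform automates.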

Python SDK

A Python client library for ingesting data, querying the server, and exporting datasets. Fits naturally into Jupyter notebooks, training scripts, and FastAPI backends.

Developer experience

ML Pipeline Automation

Transforms raw sensor recordings into flattened, aligned datasets ready for model training. Eliminates the manual data wrangling that typically consumes 60–80% of ML engineering time.

ML integration

Coordinate Transforms (refx)

Companion library refx provides compile-time safe coordinate transformations in C++ — essential for robotics where converting between reference frames (map/odom/base_link) is error-prone.

Robotics-native
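The frame-chaining problem refx addresses can be shown with the underlying math. Note this is not the refx API (refx is C++ with compile-time frame checking); it is just a minimal SE(2) composition in Python to show what "converting between map/odom/base_link" means:

```python
import math

class Pose2D:
    """Minimal 2D pose (x, y, theta) for illustrating frame chaining."""

    def __init__(self, x, y, theta):
        self.x, self.y, self.theta = x, y, theta

    def compose(self, other):
        """Chain a child pose expressed in this pose's frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(self.x + c * other.x - s * other.y,
                      self.y + s * other.x + c * other.y,
                      self.theta + other.theta)

map_T_odom = Pose2D(10.0, 5.0, math.pi / 2)   # odom frame, expressed in map
odom_T_base = Pose2D(2.0, 0.0, 0.0)           # robot, expressed in odom
map_T_base = map_T_odom.compose(odom_T_base)  # robot, expressed in map
print(round(map_T_base.x, 6), round(map_T_base.y, 6))  # → 10.0 7.0
```

The classic bug is composing these in the wrong order or mixing frames silently; refx's contribution is making such a mix-up a compile error rather than a runtime surprise.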

How They Fit Into TwinSight

Here's exactly where each tool plugs into TwinSight's architecture and what development work it eliminates.

Updated Architecture — Integration Points

[Architecture diagram. Robots (ROS 2) feed two paths: the ROS Adapter (commands + structured telemetry) into the Kafka event backbone, and the new ReductStore Agent (a ROS 2 node recording raw topics) directly into ReductStore. Core backend services — FastAPI, Mission Orchestrator, Alert Service, Telemetry Service, WebSocket Gateway, and the ML module (anomaly + reports) — sit between Kafka and the data storage layer: Redis, PostgreSQL, TimescaleDB, ReductStore (new), Mosaico (new), and optional MinIO for archival. The web interface (SPA) provides Dashboard, Live Map, Missions, Alerts, Replay/Debug Viewer, and ML Data Explorer/Reports.]

ReductStore Integration Points

1. Replace MinIO for Active Telemetry Blobs

The current proposal uses MinIO for storing ML-generated reports. But there's a missing piece: where do camera frames, LIDAR scans, point clouds, and video streams go? TimescaleDB handles structured metrics (battery voltage, x/y position) well, but it's not designed for large binary blobs. ReductStore fills this gap natively — storing binary telemetry indexed by timestamp with metadata labels like robot_id, sensor_type, and mission_id.

Development time saved: Without ReductStore, you'd need to build a custom solution combining MinIO (blob storage) + a metadata index in PostgreSQL (to query by time/robot/label) + a cleanup job (FIFO retention). ReductStore provides all three out of the box.

Replaces: Custom blob pipeline

2. Deploy ReductStore Agent Alongside ROS Adapter

The reductstore_agent is a ready-made ROS 2 node. Deploy it as a sidecar container next to the ROS Adapter. Configure it to subscribe to high-bandwidth topics (camera images, LIDAR point clouds, costmaps) and record them directly to ReductStore. The ROS Adapter continues handling structured telemetry (position, battery, state) → Kafka. This splits the data flow by data type: structured metrics go through Kafka, raw blobs go directly to ReductStore.

Development time saved: You don't need to write any code for blob ingestion from ROS 2. The agent is configurable via YAML — select topics, set recording frequency, add labels.

Replaces: Custom ROS 2 recording node

3. Power the Replay / Debug Feature

The proposal mentions operational replay ("replay operațional" — reconstructing past sequences). With ReductStore, this becomes a query: "give me all data from robot_05, mission_42, between 14:00 and 14:15" — and you get timestamped camera frames, LIDAR scans, and diagnostic snapshots back in order. Combined with TimescaleDB's structured metrics for the same timeframe, you have a complete replay capability.

Development time saved: Building a replay system from scratch (syncing blobs with metrics, handling time alignment, managing storage) would be a significant effort. ReductStore's time-indexed queries and label filtering cover most of that work out of the box.

Enables: Incident replay & debugging
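The merge step of such a replay — interleaving blob records and metric rows into one ordered timeline — is simple once both sources are time-indexed. A minimal sketch (record shapes and values are invented for illustration):

```python
import heapq

def replay(blob_records, metric_rows, start, stop):
    """Merge two time-sorted streams (e.g. ReductStore blobs and
    TimescaleDB rows) into one ordered timeline over [start, stop)."""
    def window(stream):
        return ((ts, payload) for ts, payload in stream if start <= ts < stop)
    # heapq.merge lazily interleaves already-sorted streams by timestamp
    return list(heapq.merge(window(blob_records), window(metric_rows)))

blobs = [(14.00, "camera frame"), (14.10, "lidar scan")]
metrics = [(14.05, "battery=81%"), (14.20, "state=DOCKING")]
timeline = replay(blobs, metrics, start=14.0, stop=14.15)
print(timeline)
# → [(14.0, 'camera frame'), (14.05, 'battery=81%'), (14.1, 'lidar scan')]
```

The hard parts ReductStore already solves are upstream of this merge: returning each stream pre-sorted, pre-filtered by time range and labels.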

4. Grafana Integration for Blob Visualization

ReductStore ships with a Grafana plugin that can display stored images and extract ROS 2 messages as JSON for dashboard panels. This means operators could see camera snapshots alongside telemetry charts in a single Grafana dashboard — useful for post-incident analysis.

Bonus: Observability enhancement

Mosaico Integration Points

1. Data Pipeline for the ML Module

The current proposal says the external ML module accesses TimescaleDB for "historical telemetry and numerical data." But ML models need synchronized, multi-modal datasets — not raw database queries. Mosaico sits between the storage layer and the ML module, pulling data from TimescaleDB (structured metrics) and ReductStore (blobs), synchronizing them by timestamp, and outputting clean Apache Arrow datasets ready for model training.

Development time saved: Data preparation typically consumes 60–80% of ML engineering time. Mosaico automates the most tedious parts — time-alignment, format conversion, gap handling, and flattening.

Replaces: Custom ETL scripts
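The "flattening" end of this ETL can be sketched in plain Python: turning aligned, row-shaped records into the column-per-field layout that Arrow uses. This is the concept only, not Mosaico's API (a dict of lists like this is, for instance, directly accepted by pyarrow.table):

```python
def to_columnar(records):
    """Flatten aligned (timestamp, fields) records into a dict of
    columns — the columnar shape ML frameworks and Arrow work with."""
    columns = {"ts": []}
    for ts, fields in records:
        columns["ts"].append(ts)
        for key in fields:
            columns.setdefault(key, []).append(fields[key])
    return columns

aligned = [
    (0.0, {"battery": 0.91, "speed": 0.4}),
    (0.1, {"battery": 0.90, "speed": 0.5}),
]
cols = to_columnar(aligned)
print(cols["battery"])  # → [0.91, 0.9]
```

Real pipelines additionally deal with ragged columns, missing samples, and typed schemas, which is where a dedicated tool earns its keep over ad-hoc scripts.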

2. Anomaly Detection Data Preparation

The proposal's anomaly detection feature requires the ML module to analyze historical patterns. Mosaico can create curated datasets: "give me all sensor readings, camera frames, and navigation data for every time a robot entered an error state" — automatically aligned and formatted. This accelerates the anomaly detection model's training cycle significantly.

Enhances: Anomaly detection pipeline

3. Report Generation Enhancement

The current report flow is: UI → Backend → ML Module → MinIO → UI. With Mosaico, the ML module can query pre-processed, synchronized datasets instead of raw database tables. This simplifies report generation code and produces more accurate fleet performance analytics (because the data is properly time-aligned across sensors).

Enhances: Report quality & speed

4. Apache Arrow Interop

Mosaico outputs Apache Arrow format, which is directly consumable by Python's Pandas, PyArrow, Polars, and most ML frameworks (PyTorch DataLoaders, TensorFlow Datasets). The ML module's Python code can work with Mosaico outputs without any format conversion — zero-copy memory access to columnar data.

Developer experience

What Gets Replaced or Reduced

TwinSight Component | Current Proposal | With ReductStore + Mosaico | Dev Time Impact
Binary telemetry storage (images, LIDAR, video) | Not addressed (gap in proposal) | ReductStore — native time-series blob storage | Saves weeks — would need a custom pipeline otherwise
ROS 2 blob recording | Would need a custom ROS 2 node + storage logic | ReductStore Agent — drop-in ROS 2 node | Saves 1–2 weeks — zero custom code
Operational replay | Mentioned but no implementation detail | ReductStore time-range queries + label filtering | Saves 2–3 weeks — query API already exists
FIFO retention / data cleanup | Would need custom cron jobs | ReductStore built-in quota + FIFO | Saves days — configuration only
ML data preparation pipeline | ML module queries TimescaleDB directly | Mosaico — synchronized, Arrow-formatted datasets | Saves 3–4 weeks — replaces custom ETL
Multi-sensor time alignment | Not addressed | Mosaico — automatic synchronization | Saves 1–2 weeks — complex to build from scratch
MinIO (report file storage) | MinIO for ML report files | Keep MinIO for reports, or use ReductStore for everything | No change — MinIO still valid for cold archival
TimescaleDB (structured metrics) | TimescaleDB for telemetry history | Keep TimescaleDB for structured metrics; ReductStore handles blobs | No change — complementary roles

Estimated Total Dev Time Savings

Conservative estimates based on a mid-size team (3–5 developers):

ReductStore integrations: 4–6 weeks saved
Mosaico integrations: 4–6 weeks saved

Combined: Roughly 8–12 weeks of development time that shifts from "build from scratch" to "integrate and configure." This also reduces maintenance burden — bugs and performance improvements come from the open-source community rather than your team.

Updated Data Flow With Both Tools

Telemetry Flow — Structured vs. Blob Split

The key architectural change: split telemetry into two parallel paths based on data type.

Path A: Structured Metrics (unchanged)

1. Robot publishes position, battery, state
2. ROS Adapter → Kafka events
3. Redis (live) + TimescaleDB (history)
4. WebSocket → Dashboard

Path B: Binary Blobs (NEW)

1. Robot publishes camera, LIDAR, costmap
2. ReductStore Agent records directly
3. ReductStore (time-indexed + labeled)
4. Mosaico syncs with metrics → ML ready
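The two paths above reduce to a simple routing rule at ingestion time. A sketch (the topic names are illustrative, not TwinSight's actual topic list):

```python
# Route each ROS 2 topic by data type: structured metrics take the
# Kafka path, heavy binary sensor data goes straight to ReductStore.
STRUCTURED = {"pose", "battery", "state"}   # small, frequent, queryable
BLOB = {"camera", "lidar", "costmap"}       # large binary payloads

def route(topic):
    if topic in STRUCTURED:
        return "kafka"        # Path A: ROS Adapter → Kafka → Redis/TimescaleDB
    if topic in BLOB:
        return "reductstore"  # Path B: ReductStore Agent → ReductStore
    return "drop"             # topics not selected for recording

print([route(t) for t in ("pose", "camera", "imu")])
# → ['kafka', 'reductstore', 'drop']
```

In practice this rule lives in configuration, not code: the ROS Adapter's topic list defines Path A, and the ReductStore Agent's topic selection defines Path B.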

ML Pipeline Flow — With Mosaico

1. Mosaico pulls from both data stores — queries TimescaleDB for structured metrics (position, battery, mission states) and ReductStore for binary data (camera frames, LIDAR scans) over the same time range.
2. Time-aligns and synchronizes — camera at 30fps, LIDAR at 10Hz, position at 50Hz; Mosaico aligns them by timestamp, interpolating where needed and handling gaps and missing data.
3. Outputs Apache Arrow datasets — clean, columnar, ML-ready data, directly loadable by PyTorch, TensorFlow, or any Python ML framework with no additional data wrangling.
4. ML Module consumes clean datasets — anomaly detection models train on properly aligned multi-modal data; report generation uses pre-aggregated, synchronized fleet metrics. Both become faster and more accurate.

Final Verdict & Recommendations

Verdict — ReductStore

Strong recommendation to integrate. ReductStore fills a genuine gap in the current TwinSight proposal — binary telemetry storage is simply not addressed. The native ROS 2 agent is a significant accelerator, and the FIFO retention, time-indexed queries, and label-based filtering eliminate multiple components that would otherwise need custom development. Verify the current license terms fit TwinSight's commercial use before committing. The main caveat: it's a younger project than MinIO or TimescaleDB, so evaluate the community size and support level before committing to it for production.

Verdict — Mosaico

Promising but evaluate maturity carefully. Mosaico addresses a real pain point — ML data preparation is notoriously time-consuming. The Rust server + Apache Arrow approach is technically sound. However, Mosaico is an earlier-stage project. Before committing, verify: (1) How mature is the Python SDK? (2) Does it handle the specific data types TwinSight needs (ROS 2 messages, point clouds)? (3) How well does it integrate with ReductStore as a data source? If the maturity is insufficient, consider using just ReductStore + custom Python scripts (using PyArrow directly) for the initial release, and plan to integrate Mosaico in a later phase when it's more battle-tested.

Combined Value

Together, ReductStore and Mosaico form a "raw data → ML-ready pipeline" that TwinSight's proposal currently lacks entirely. ReductStore handles the "store everything the robot produces" problem, while Mosaico handles the "make that data useful for ML" problem. This combination is particularly powerful for the anomaly detection and predictive path features mentioned in the proposal — those ML models are only as good as the training data pipeline feeding them.

Recommended Integration Phases

Phase | What to Integrate | Effort | Risk
Phase 1 — with initial backend | Deploy ReductStore + ReductStore Agent as containers in docker-compose. Configure the agent for camera/LIDAR topics. Add the ReductStore Python SDK to FastAPI for query endpoints. | 1–2 weeks | Low — mature enough for production
Phase 2 — with replay feature | Build the replay API using ReductStore time-range queries + TimescaleDB metrics. Add the Grafana plugin for blob visualization in operational dashboards. | 1–2 weeks | Low
Phase 3 — with ML module | Evaluate Mosaico maturity. If ready: deploy the Mosaico server, connect it to ReductStore + TimescaleDB, and integrate the Python SDK with the ML module. If not ready: use PyArrow + custom sync scripts as an interim solution. | 2–3 weeks | Medium — depends on Mosaico maturity
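As a starting point, Phase 1's containers might look like the compose fragment below. The image name, API port, and data path reflect ReductStore's public Docker distribution as best understood here — verify them, along with how reductstore_agent is packaged for your ROS 2 distribution, against the current documentation:

```yaml
services:
  reductstore:
    image: reduct/store:latest   # assumed official image; HTTP API on 8383
    ports:
      - "8383:8383"
    volumes:
      - reduct-data:/data        # assumed default data directory

  # reductstore_agent would run here as a ROS 2 container alongside the
  # ROS Adapter; its image and configuration depend on your ROS 2 build.

volumes:
  reduct-data:
```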