ReductStore is an open-source, time-series object/blob storage engine purpose-built for robotics and Industrial IoT. Traditional databases force you to choose: either you store structured metrics well (TimescaleDB, InfluxDB) or you store files/blobs well (MinIO, S3). ReductStore does both — it stores binary data (images, video frames, point clouds, MCAP recordings, sensor dumps) indexed by timestamp with metadata labels.
Every piece of data is stored with a timestamp and optional metadata labels (key-value pairs like robot_id=07, severity=high). Query by time range, filter by labels — without needing a separate database for the index.
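To make the data model concrete, here is a minimal in-memory sketch of the timestamp-plus-labels query pattern. This is illustrative only — it mimics the concept, not ReductStore's actual client API:

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One stored blob: a timestamp (microseconds), raw bytes, and label metadata."""
    ts_us: int
    data: bytes
    labels: dict = field(default_factory=dict)

def query(records, start_us, stop_us, **labels):
    """Return records in [start_us, stop_us) whose labels match all given filters."""
    return [
        r for r in sorted(records, key=lambda r: r.ts_us)
        if start_us <= r.ts_us < stop_us
        and all(r.labels.get(k) == v for k, v in labels.items())
    ]

store = [
    Record(1_000, b"<jpeg>", {"robot_id": "07", "sensor": "camera"}),
    Record(2_000, b"<scan>", {"robot_id": "07", "sensor": "lidar"}),
    Record(3_000, b"<jpeg>", {"robot_id": "09", "sensor": "camera"}),
]

# Time range plus label filter in one call -- no separate index database.
hits = query(store, 0, 10_000, robot_id="07")
```

The point is that the time index and the label filter live in the same store, so one query replaces a blob lookup plus a metadata join.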
Ships with reductstore_agent — a ROS 2 node that subscribes to selected topics and records them directly into ReductStore. No custom code needed. It stores raw ROS messages with schema metadata attached.
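Configuration is declarative. The fragment below is only a hypothetical sketch of the idea (topic selection, rate limiting, labels) — the key names are assumptions, not the agent's real schema, so consult the reductstore_agent documentation for the actual format:

```yaml
# Hypothetical sketch only -- check the reductstore_agent docs for the real keys.
storage:
  url: http://reductstore:8383
  bucket: robot_blobs
pipelines:
  front_camera:
    topics: ["/camera/front/image_raw"]
    max_frequency_hz: 5          # e.g. downsample 30 fps to 5 Hz at the edge
    labels:
      robot_id: "07"
      sensor_type: camera
  lidar:
    topics: ["/scan"]
    labels:
      robot_id: "07"
      sensor_type: lidar
```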
**Operations.** Set a maximum storage size per "bucket." When the limit is reached, the oldest data is automatically deleted. Perfect for edge devices and on-premise deployments with fixed disk space — no manual cleanup needed.

**Performance.** Queries are filtered on the server before data is sent over the network: for example, extracting specific MCAP topics or filtering CSV columns before transfer. This dramatically reduces bandwidth for large datasets.

**Scalability.** Built-in data replication between instances (primary/secondary) for high availability. Read-only replicas can be added to scale read throughput across multiple consumers.

**Developer experience.** Client libraries for Python, Rust, C++, JavaScript/Node.js, and Go. The Python and C++ SDKs are particularly relevant — Python for the FastAPI backend, C++ for ROS 2 nodes.

ReductStore has also been benchmarked against the exact technologies TwinSight proposes, such as MinIO and TimescaleDB.
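The FIFO retention behavior is easy to reason about. A toy sketch of byte-quota eviction — not ReductStore's implementation, just the semantics:

```python
from collections import deque

class FifoBucket:
    """Sketch of a size-quota bucket: writing past the quota evicts oldest first."""
    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.records = deque()          # (timestamp, payload), oldest on the left
        self.used = 0

    def write(self, ts, payload):
        self.records.append((ts, payload))
        self.used += len(payload)
        while self.used > self.quota:   # automatic cleanup -- no cron job needed
            _, old = self.records.popleft()
            self.used -= len(old)

bucket = FifoBucket(quota_bytes=10)
for ts in range(5):
    bucket.write(ts, b"xxxx")           # 4 bytes each; quota keeps only the newest two
```

Because eviction happens on write, disk usage is bounded at all times, which is the property that matters on an edge device with a fixed-size disk.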
Mosaico is an open-source data platform for robotics and Physical AI, built by Mosaico Labs. It's not a fleet management UI or a dashboard — it's a data pipeline engine that sits between your raw robot sensor data and your ML/analytics workflows. Think of it as the "plumbing" that takes messy, unsynchronized, multi-modal sensor data and transforms it into clean, aligned, ML-ready datasets.
**Architecture.** The core daemon is written in Rust for maximum performance. It handles data conversion, compression, and storage operations, and Rust's memory-safety guarantees prevent crashes in long-running production workloads.

**Data format.** Uses Apache Arrow as its data interchange format — the same columnar format used by Pandas, Spark, and most modern data tools. This means zero-copy data access and seamless interop with Python ML workflows.

**Core feature.** Automatically aligns multi-modal sensor streams that have different frequencies and clock sources. A camera at 30 fps and a LIDAR at 10 Hz get time-aligned into a unified dataset.

**Developer experience.** A Python client library for ingesting data, querying the server, and exporting datasets. It fits naturally into Jupyter notebooks, training scripts, and FastAPI backends.

**ML integration.** Transforms raw sensor recordings into flattened, aligned datasets ready for model training, eliminating the manual data wrangling that typically consumes 60–80% of ML engineering time.

A companion library, refx, provides compile-time-safe coordinate transformations in C++ — essential for robotics, where converting between reference frames (map/odom/base_link) is error-prone.
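Conceptually, the alignment step is a nearest-timestamp join with a gap tolerance. A toy stdlib sketch (not Mosaico's API; integer millisecond timestamps are an assumption for the example):

```python
import bisect

def align(reference_ts, stream, tolerance):
    """For each reference timestamp, pick the nearest sample from `stream`
    (a time-sorted list of (ts, value)); emit None where the gap exceeds tolerance."""
    ts_list = [ts for ts, _ in stream]
    out = []
    for t in reference_ts:
        i = bisect.bisect_left(ts_list, t)
        # Candidates: the neighbor just before and just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(ts_list)]
        j = min(candidates, key=lambda k: abs(ts_list[k] - t))
        out.append(stream[j][1] if abs(ts_list[j] - t) <= tolerance else None)
    return out

camera_ts = [0, 100, 200, 300, 400]        # 10 Hz camera frames (ms)
lidar = [(20, "scan0"), (350, "scan1")]    # sparser LIDAR scans (ms)
aligned = align(camera_ts, lidar, tolerance=50)
```

A production aligner also interpolates and resamples; the sketch just shows why a per-frame tolerance turns two unsynchronized streams into one row-per-timestamp dataset.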
Here's exactly where each tool plugs into TwinSight's architecture and what development work it eliminates.
**Replaces: Custom blob pipeline**

The current proposal uses MinIO for storing ML-generated reports. But there's a missing piece: where do camera frames, LIDAR scans, point clouds, and video streams go? TimescaleDB handles structured metrics (battery voltage, x/y position) well, but it's not designed for large binary blobs. ReductStore fills this gap natively — storing binary telemetry indexed by timestamp with metadata labels like robot_id, sensor_type, and mission_id.

Development time saved: without ReductStore, you'd need to build a custom solution combining MinIO (blob storage), a metadata index in PostgreSQL (to query by time/robot/label), and a cleanup job (FIFO retention). ReductStore provides all three out of the box.

**Replaces: Custom ROS 2 recording node**

The reductstore_agent is a ready-made ROS 2 node. Deploy it as a sidecar container next to the ROS Adapter, and configure it to subscribe to high-bandwidth topics (camera images, LIDAR point clouds, costmaps) and record them directly to ReductStore. The ROS Adapter continues handling structured telemetry (position, battery, state) → Kafka. This splits the data flow by nature: structured metrics go through Kafka, raw blobs go directly to ReductStore.

Development time saved: no code needs to be written for blob ingestion from ROS 2. The agent is configurable via YAML — select topics, set recording frequency, add labels.

**Enables: Incident replay & debugging**

The proposal mentions "operational replay" (reconstructing past sequences). With ReductStore, this becomes a query: "give me all data from robot_05, mission_42, between 14:00 and 14:15" — and you get timestamped camera frames, LIDAR scans, and diagnostic snapshots back in order. Combined with TimescaleDB's structured metrics for the same timeframe, you have a complete replay capability.

Development time saved: building a replay system from scratch (syncing blobs with metrics, handling time alignment, managing storage) would be a significant effort. ReductStore's time-indexed queries plus label filtering provide roughly 80% of that work out of the box.

**Bonus: Observability enhancement**

ReductStore ships with a Grafana plugin that can display stored images and extract ROS 2 messages as JSON for dashboard panels. Operators could see camera snapshots alongside telemetry charts in a single Grafana dashboard — useful for post-incident analysis.

**Replaces: Custom ETL scripts**

The current proposal says the external ML module accesses TimescaleDB for "historical telemetry and numerical data." But ML models need synchronized, multi-modal datasets, not raw database queries. Mosaico sits between the storage layer and the ML module, pulling data from TimescaleDB (structured metrics) and ReductStore (blobs), synchronizing them by timestamp, and outputting clean Apache Arrow datasets ready for model training.

Development time saved: data preparation typically consumes 60–80% of ML engineering time. Mosaico automates the most tedious parts — time alignment, format conversion, gap handling, and flattening.

**Enhances: Anomaly detection pipeline**

The proposal's anomaly detection feature requires the ML module to analyze historical patterns. Mosaico can create curated datasets: "give me all sensor readings, camera frames, and navigation data for every time a robot entered an error state" — automatically aligned and formatted. This significantly accelerates the anomaly detection model's training cycle.

**Enhances: Report quality & speed**

The current report flow is: UI → Backend → ML Module → MinIO → UI. With Mosaico, the ML module can query pre-processed, synchronized datasets instead of raw database tables. This simplifies report-generation code and produces more accurate fleet performance analytics, because the data is properly time-aligned across sensors.

**Developer experience**

Mosaico outputs Apache Arrow format, which is directly consumable by Python's Pandas, PyArrow, and Polars, and by most ML frameworks (PyTorch DataLoaders, TensorFlow Datasets). The ML module's Python code can work with Mosaico outputs without any format conversion — zero-copy memory access to columnar data.

| TwinSight Component | Current Proposal | With ReductStore + Mosaico | Dev Time Impact |
|---|---|---|---|
| Binary telemetry storage (images, LIDAR, video) | Not addressed (gap in proposal) | ReductStore — native time-series blob storage | Saves weeks — would need custom pipeline otherwise |
| ROS 2 blob recording | Would need custom ROS 2 node + storage logic | ReductStore Agent — drop-in ROS 2 node | Saves 1–2 weeks — zero custom code |
| Operational replay | Mentioned but no implementation detail | ReductStore time-range queries + label filtering | Saves 2–3 weeks — query API already exists |
| FIFO retention / data cleanup | Would need custom cron jobs | ReductStore built-in quota + FIFO | Saves days — configuration only |
| ML data preparation pipeline | ML module queries TimescaleDB directly | Mosaico — synchronized, Arrow-formatted datasets | Saves 3–4 weeks — replaces custom ETL |
| Multi-sensor time alignment | Not addressed | Mosaico — automatic synchronization | Saves 1–2 weeks — complex to build from scratch |
| MinIO (report file storage) | MinIO for ML report files | Keep MinIO for reports, OR use ReductStore for everything | No change — MinIO still valid for cold archival |
| TimescaleDB (structured metrics) | TimescaleDB for telemetry history | Keep TimescaleDB for structured metrics; ReductStore handles blobs | No change — complementary roles |
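The operational-replay row rests on one operation: merging two time-sorted sources into a single ordered timeline. A stdlib sketch, with toy rows standing in for real TimescaleDB and ReductStore query results:

```python
import heapq

# Toy rows standing in for the two stores (not real client calls): both lists are
# assumed already filtered to one robot/mission/time window and sorted by timestamp.
metrics = [(ts, "metric", {"battery": 11.8 - ts * 0.01}) for ts in (0, 2, 4)]
blobs = [(1, "camera", b"<frame>"), (3, "lidar", b"<scan>")]

# Merge the two time-indexed sources into one ordered timeline for replay.
timeline = list(heapq.merge(metrics, blobs, key=lambda row: row[0]))
kinds = [kind for _, kind, _ in timeline]
```

Since both stores index by timestamp, the replay backend never has to re-sort anything; it just streams the merged timeline to the UI.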
Conservative estimates for a mid-size team (3–5 developers) put the combined saving at roughly 8–12 weeks of development time that shifts from "build from scratch" to "integrate and configure." This also reduces the maintenance burden: bug fixes and performance improvements come from the open-source community rather than your team.
The key architectural change: split telemetry into two parallel paths based on data type.
**Path 1: structured telemetry**

1. Robot publishes position, battery, and state topics.
2. ROS Adapter → Kafka events.
3. Redis (live) + TimescaleDB (history).
4. WebSocket → Dashboard.

**Path 2: binary telemetry (blobs)**

1. Robot publishes camera, LIDAR, and costmap topics.
2. ReductStore Agent records them directly.
3. ReductStore (time-indexed + labeled).
4. Mosaico syncs with metrics → ML-ready datasets.
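For deployment, the blob path is two extra containers next to the existing stack. The docker-compose fragment below is a hypothetical sketch — image names, ports, and mount paths are assumptions, not verified against the published images:

```yaml
# Hypothetical sketch -- verify image names and options against the official docs.
services:
  reductstore:
    image: reduct/store:latest        # assumed image name
    volumes:
      - reduct-data:/data
    ports:
      - "8383:8383"
  reduct-agent:                       # sidecar next to the ROS Adapter
    image: reduct/agent:latest        # placeholder name
    network_mode: host                # so it can see ROS 2 DDS traffic
    volumes:
      - ./agent.yaml:/etc/agent.yaml:ro
volumes:
  reduct-data:
```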
1. **Query both stores.** Mosaico queries TimescaleDB for structured metrics (position, battery, mission states) and ReductStore for binary data (camera frames, LIDAR scans) over the same time range.
2. **Align by timestamp.** Camera at 30 fps, LIDAR at 10 Hz, position at 50 Hz: Mosaico aligns them by timestamp, interpolating where needed and handling gaps and missing data.
3. **Output Arrow datasets.** The result is clean, columnar, ML-ready data, directly loadable by PyTorch, TensorFlow, or any Python ML framework with no additional data wrangling.
4. **Feed the ML module.** Anomaly detection models train on properly aligned multi-modal data, and report generation uses pre-aggregated, synchronized fleet metrics. Both become faster and more accurate.
Strong recommendation to integrate. ReductStore fills a genuine gap in the current TwinSight proposal — binary telemetry storage is simply not addressed. The native ROS 2 agent is a significant accelerator, and the FIFO retention, time-indexed queries, and label-based filtering eliminate multiple components that would otherwise need custom development. The Apache 2.0 license means no commercial restrictions. The main caveat: it's a younger project than MinIO or TimescaleDB, so evaluate the community size and support level before committing to it for production.
Promising but evaluate maturity carefully. Mosaico addresses a real pain point — ML data preparation is notoriously time-consuming. The Rust server + Apache Arrow approach is technically sound. However, Mosaico is an earlier-stage project. Before committing, verify: (1) How mature is the Python SDK? (2) Does it handle the specific data types TwinSight needs (ROS 2 messages, point clouds)? (3) How well does it integrate with ReductStore as a data source? If the maturity is insufficient, consider using just ReductStore + custom Python scripts (using PyArrow directly) for the initial release, and plan to integrate Mosaico in a later phase when it's more battle-tested.
Together, ReductStore and Mosaico form a "raw data → ML-ready pipeline" that TwinSight's proposal currently lacks entirely. ReductStore handles the "store everything the robot produces" problem, while Mosaico handles the "make that data useful for ML" problem. This combination is particularly powerful for the anomaly detection and predictive path features mentioned in the proposal — those ML models are only as good as the training data pipeline feeding them.
| Phase | What to Integrate | Effort | Risk |
|---|---|---|---|
| Phase 1 (with initial backend) | Deploy ReductStore + ReductStore Agent as containers in docker-compose. Configure the agent for camera/LIDAR topics. Add the ReductStore Python SDK to FastAPI for query endpoints. | 1–2 weeks | Low — mature enough for production |
| Phase 2 (with replay feature) | Build the replay API using ReductStore time-range queries + TimescaleDB metrics. Add the Grafana plugin for blob visualization in operational dashboards. | 1–2 weeks | Low |
| Phase 3 (with ML module) | Evaluate Mosaico maturity. If ready: deploy the Mosaico server, connect it to ReductStore + TimescaleDB, and integrate the Python SDK with the ML module. If not ready: use PyArrow + custom sync scripts as an interim solution. | 2–3 weeks | Medium — depends on Mosaico maturity |