canml.canmlio Module

This module provides the core APIs for decoding BLF files:

canmlio: Enhanced CAN BLF processing toolkit for production use.

This module provides end-to-end functionality for decoding CAN bus logs in BLF format into pandas DataFrames, handling DBC file loading and merging, streaming large logs, full-file loading with filtering, timing alignment, missing-signal injection, and exporting to CSV or Parquet with accompanying metadata. It also supports enums and custom signal attributes, all configurable via a single CanmlConfig object.

Dependencies:
  • numpy

  • pandas

  • cantools

  • python-can

  • tqdm

  • pyarrow (for Parquet export)

Example

from canml.canmlio import load_dbc_files, load_blf, to_csv, CanmlConfig

# 1️⃣ Load DBC
db = load_dbc_files("vehicle.dbc", prefix_signals=True)

# 2️⃣ Configure BLF load
cfg = CanmlConfig(
    chunk_size=5000,
    progress_bar=True,
    force_uniform_timing=True,
    interval_seconds=0.02,
    interpolate_missing=True,
    dtype_map={"Engine_RPM": "int32"},
)

# 3️⃣ Load BLF file
df = load_blf(
    blf_path="drive.blf",
    db=db,
    config=cfg,
    message_ids={0x100, 0x200},
    expected_signals=["Engine_RPM", "Brake_Active"],
)

# 4️⃣ Export
to_csv(df, "drive.csv", metadata_path="drive_meta.json")

class canml.canmlio.CanmlConfig(chunk_size: int = 10000, progress_bar: bool = True, dtype_map: Dict[str, Any] | None = None, sort_timestamps: bool = False, force_uniform_timing: bool = False, interval_seconds: float = 0.01, interpolate_missing: bool = False)

Bases: object

Configuration options for BLF processing.

Parameters:
  • chunk_size (int) – Number of messages per chunk when streaming. Defaults to 10000. Example: chunk_size=5000 for smaller, more frequent chunks.

  • progress_bar (bool) – Show a tqdm progress bar if True. Defaults to True.

  • dtype_map (Optional[Dict[str, Any]]) – Map signal names to pandas dtypes. Example: dtype_map={“Speed”: “float32”} ensures Speed column is float32.

  • sort_timestamps (bool) – Sort final DataFrame by timestamp. Defaults to False.

  • force_uniform_timing (bool) – Override timestamps with uniform spacing. Defaults to False.

  • interval_seconds (float) – Spacing interval in seconds for uniform timing. Defaults to 0.01.

  • interpolate_missing (bool) – Interpolate missing signal values if True. Defaults to False.

Raises:

ValueError – If chunk_size <= 0 or interval_seconds <= 0.
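The validation rule above can be illustrated with a small dataclass. This is a minimal sketch only: `SketchConfig` and `is_valid` are hypothetical names introduced here, the class covers just three of the documented fields, and it is not the library's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for CanmlConfig; not the library's source."""

    chunk_size: int = 10000
    interval_seconds: float = 0.01
    dtype_map: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject the same values the documentation lists under "Raises".
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")


def is_valid(**kwargs: Any) -> bool:
    """Return True if SketchConfig accepts the given arguments."""
    try:
        SketchConfig(**kwargs)
    except ValueError:
        return False
    return True
```

With this sketch, `is_valid(chunk_size=5000)` is accepted, while `is_valid(chunk_size=0)` and `is_valid(interval_seconds=-0.5)` are rejected.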

chunk_size: int = 10000
dtype_map: Dict[str, Any] | None = None
force_uniform_timing: bool = False
interpolate_missing: bool = False
interval_seconds: float = 0.01
progress_bar: bool = True
sort_timestamps: bool = False
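To make force_uniform_timing and interval_seconds more concrete, here is a rough sketch of what replacing recorded timestamps with an evenly spaced grid could look like. It assumes the grid starts at the first recorded timestamp, which the documentation does not state; `uniform_timestamps` is a hypothetical helper and not part of canml.

```python
from typing import List


def uniform_timestamps(first: float, count: int, interval_seconds: float) -> List[float]:
    """Return `count` timestamps spaced `interval_seconds` apart, starting at `first`."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Multiply by the index instead of accumulating, so rounding error
    # does not grow with the number of rows.
    return [first + i * interval_seconds for i in range(count)]


# Jittery capture times, as they might appear in a recorded log (made-up values).
recorded = [10.000, 10.013, 10.019, 10.032]

# Assumption: the uniform grid is anchored at the first recorded timestamp.
aligned = uniform_timestamps(recorded[0], len(recorded), 0.01)
```

The row count is unchanged; only the time axis is regularized.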
canml.canmlio.iter_blf_chunks(blf_path: str, db: Database, config: CanmlConfig, filter_ids: Set[int] | None = None, filter_signals: Iterable[Any] | None = None) -> Iterator[DataFrame]

Stream-decode a BLF file into pandas DataFrame chunks.

Parameters:
  • blf_path (str) – Path to BLF file.

  • db (CantoolsDatabase) – Database for message definitions.

  • config (CanmlConfig) – Chunk size, progress bar, etc.

  • filter_ids (set[int], optional) – Only decode these arbitration IDs.

  • filter_signals (iterable, optional) – Only include these signal names.

Yields:

pd.DataFrame – Decoded signals with a ‘timestamp’ column.

Example

for chunk in iter_blf_chunks("drive.blf", db, cfg, filter_ids={0x123}):
    print(chunk.head())
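The streaming pattern behind a chunked reader can be sketched with a plain generator. The example below is an illustration under stated assumptions, not the library's code: `iter_chunks` and the `Frame` tuple are hypothetical, the frames are already decoded, and each chunk is a list of row dictionaries rather than a DataFrame.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical pre-decoded frame: (timestamp, arbitration_id, decoded signals).
Frame = Tuple[float, int, Dict[str, float]]


def iter_chunks(
    frames: Iterable[Frame],
    chunk_size: int,
    filter_ids: Optional[Set[int]] = None,
) -> Iterator[List[Dict[str, float]]]:
    """Yield lists of at most `chunk_size` rows, skipping unwanted IDs."""
    buffer: List[Dict[str, float]] = []
    for timestamp, arb_id, signals in frames:
        # Drop frames early so filtered-out IDs never occupy memory.
        if filter_ids is not None and arb_id not in filter_ids:
            continue
        buffer.append({"timestamp": timestamp, **signals})
        if len(buffer) >= chunk_size:
            yield buffer
            buffer = []
    # Emit the final, possibly shorter, chunk.
    if buffer:
        yield buffer


# Ten made-up frames alternating between two arbitration IDs.
frames = [
    (0.01 * i, 0x123 if i % 2 == 0 else 0x200, {"Speed": float(i)})
    for i in range(10)
]
chunks = list(iter_chunks(frames, chunk_size=2, filter_ids={0x123}))
```

Because only one chunk is held at a time, peak memory is bounded by the chunk size rather than by the length of the log.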

canml.canmlio.load_blf(blf_path: str, db: Database | str | List[str], config: CanmlConfig | None = None, message_ids: Set[int] | None = None, expected_signals: Iterable[Any] | None = None) -> DataFrame

Load a BLF log into a pandas DataFrame, with full-featured options.

Parameters:
  • blf_path (str) – Path to BLF file.

  • db (CantoolsDatabase or str/list) – Database instance or DBC path(s).

  • config (CanmlConfig, optional) – Processing options.

  • message_ids (set[int], optional) – Filter by CAN IDs. Example: {0x123, 0x200}.

  • expected_signals (iterable, optional) – Signals to include. Example: [“Engine_RPM”].

Returns:

A DataFrame with a 'timestamp' column followed by the decoded signal columns; dtypes are applied safely and enum signals are returned as Categorical.

Return type:

pd.DataFrame

Example

df = load_blf(
    blf_path="drive.blf",
    db="vehicle.dbc",
    config=cfg,
    expected_signals=["Speed", NameSignalValue(…)],
)
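The module overview mentions missing-signal injection. One plausible reading, assumed here and not confirmed by the documentation, is that any name in expected_signals that never appears in the log still gets a column, filled with NaN. The sketch below shows that idea on plain row dictionaries; `inject_missing` is a hypothetical helper, not a canml function.

```python
import math
from typing import Dict, Iterable, List


def inject_missing(
    rows: Iterable[Dict[str, float]],
    expected_signals: Iterable[str],
) -> List[Dict[str, float]]:
    """Return copies of `rows` in which every expected signal has a value."""
    expected = list(expected_signals)
    filled = []
    for row in rows:
        complete = dict(row)  # copy, so the caller's rows are left untouched
        for name in expected:
            # Assumption: absent signals are represented as NaN.
            complete.setdefault(name, math.nan)
        filled.append(complete)
    return filled


# Made-up decoded rows that contain Engine_RPM but never Brake_Active.
rows = [
    {"timestamp": 0.00, "Engine_RPM": 900.0},
    {"timestamp": 0.02, "Engine_RPM": 920.0},
]
filled = inject_missing(rows, ["Engine_RPM", "Brake_Active"])
```

Downstream code can then rely on a fixed column set regardless of which messages were present in a given recording.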

canml.canmlio.load_dbc_files(dbc_paths: str | List[str], prefix_signals: bool = False) -> Database

Load and merge one or more DBC files, caching results.

Parameters:
  • dbc_paths (str or list) – Path or list of DBC file paths.

  • prefix_signals (bool) – If True, prefix signals with message names.

Returns:

Merged database.

Return type:

CantoolsDatabase

Example

db = load_dbc_files("vehicle.dbc", prefix_signals=True)

canml.canmlio.to_csv(df_or_iter: DataFrame | Iterable[DataFrame], output_path: str, mode: str = 'w', header: bool = True, pandas_kwargs: Dict[str, Any] | None = None, columns: List[str] | None = None, metadata_path: str | None = None) -> None

Write a DataFrame, or an iterable of DataFrame chunks, to CSV with an optional metadata JSON file.

Parameters:
  • df_or_iter (DataFrame or iterable) – Data to write.

  • output_path (str) – CSV path.

  • mode ("w"/"a") – Write or append mode.

  • header (bool) – Include header row.

  • pandas_kwargs (dict, optional) – Extra pandas.to_csv kwargs.

  • columns (list, optional) – Subset of columns to write.

  • metadata_path (str, optional) – JSON path for signal_attributes.

Example

to_csv(df, "out.csv", metadata_path="out_meta.json")
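Writing chunks to a single CSV with a metadata sidecar can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: `write_chunks` is a hypothetical helper, chunks are lists of row dictionaries instead of DataFrames, and the JSON layout (a top-level "signal_attributes" key, with a made-up "unit" attribute) is an assumption.

```python
import csv
import json
import os
import tempfile
from typing import Any, Dict, Iterable, List


def write_chunks(
    chunks: Iterable[List[Dict[str, Any]]],
    output_path: str,
    metadata_path: str,
    signal_attributes: Dict[str, Any],
) -> None:
    """Write all chunks to one CSV, with the header emitted only once."""
    with open(output_path, "w", newline="") as handle:
        writer = None
        for chunk in chunks:
            if not chunk:
                continue
            if writer is None:
                # Column order is taken from the first non-empty chunk.
                writer = csv.DictWriter(handle, fieldnames=list(chunk[0].keys()))
                writer.writeheader()
            writer.writerows(chunk)
    # Assumed sidecar layout: attributes keyed by signal name.
    with open(metadata_path, "w") as handle:
        json.dump({"signal_attributes": signal_attributes}, handle, indent=2)


workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "out.csv")
meta_path = os.path.join(workdir, "out_meta.json")
write_chunks(
    [[{"timestamp": 0.0, "Speed": 1.5}], [{"timestamp": 0.01, "Speed": 1.6}]],
    csv_path,
    meta_path,
    {"Speed": {"unit": "km/h"}},
)
```

Emitting the header only for the first chunk is the same concern that the mode and header parameters address when appending to an existing file.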

canml.canmlio.to_parquet(df: DataFrame, output_path: str, compression: str = 'snappy', pandas_kwargs: Dict[str, Any] | None = None, metadata_path: str | None = None) -> None

Write a DataFrame to Parquet with optional metadata JSON.

Parameters:
  • df (DataFrame) – Data to write.

  • output_path (str) – .parquet file path.

  • compression (str) – Compression codec, e.g. "snappy" or "gzip". Defaults to "snappy".

  • pandas_kwargs (dict, optional) – Extra pandas.to_parquet kwargs.

  • metadata_path (str, optional) – JSON path for signal_attributes.

Example

to_parquet(df, "data.parquet", metadata_path="data_meta.json")