Utilities
finamt.utils
Heuristic rule-based extraction utilities used as a fallback when the LLM is unavailable or returns incomplete data.
These functions are intentionally simple and conservative — they prefer
returning None over returning plausibly wrong values.
- class finamt.utils.DataExtractor
Bases: object
Heuristic text extraction for receipts.
All methods are static, so they can be called directly on the class; instantiating it first is optional.
- static extract_company_name(text: str) -> str | None
Return the first non-trivial line from the top of the receipt.
Skips blank lines, lines that start with a digit (dates, amounts), and lines containing common boilerplate words.
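The skipping rules above can be sketched as follows. This is a minimal illustration, not the library's code: the function name first_company_line and the BOILERPLATE word list are hypothetical, since the documentation does not list the actual boilerplate words or say how many lines from the top are scanned.

```python
from typing import Optional

# Hypothetical boilerplate words; the real list is not documented.
BOILERPLATE = ("rechnung", "quittung", "beleg", "invoice", "receipt")


def first_company_line(text: str) -> Optional[str]:
    """Return the first line that looks like a company name, else None."""
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # blank line
        if candidate[0].isdigit():
            continue  # starts with a digit: probably a date or an amount
        if any(word in candidate.lower() for word in BOILERPLATE):
            continue  # generic receipt wording, not a name
        return candidate
    return None  # prefer None over a doubtful guess


print(first_company_line("\n12.03.2024\nRechnung Nr. 5\nMuster GmbH\nBerlin"))
```

With the sample receipt, the blank line, the date line, and the "Rechnung" line are all skipped before "Muster GmbH" is returned.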
- static extract_date(text: str) -> datetime | None
Return the first parseable date found in the text.
Handles DD.MM.YYYY, YYYY-MM-DD, DD/MM/YYYY and German month names. Two-digit years below 50 are interpreted as 20YY; all others as 19YY.
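The two-digit year rule is easy to get wrong at the boundary, so here is a small sketch of it together with a scan for the first valid dotted date. The names expand_two_digit_year and first_dotted_date are hypothetical, and only the DD.MM.YY(YY) form is covered; the other documented formats are left out for brevity.

```python
import re
from datetime import datetime
from typing import Optional


def expand_two_digit_year(yy: int) -> int:
    """Map 0-49 to 2000-2049 and 50-99 to 1950-1999."""
    return 2000 + yy if yy < 50 else 1900 + yy


# Day.Month.Year with a two- or four-digit year.
_DMY = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{2}|\d{4})\b")


def first_dotted_date(text: str) -> Optional[datetime]:
    """Return the first DD.MM.YY(YY) match that is a real calendar date."""
    for match in _DMY.finditer(text):
        day, month, year = (int(g) for g in match.groups())
        if len(match.group(3)) == 2:
            year = expand_two_digit_year(year)
        try:
            return datetime(year, month, day)
        except ValueError:
            continue  # e.g. 31.02.2024: skip and keep looking
    return None


print(first_dotted_date("Datum: 05.06.24 Uhrzeit 14:30"))
```

Impossible dates such as 31.02.2024 are skipped rather than returned, in keeping with the module's preference for None over plausibly wrong values.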
- static extract_amounts(text: str) -> dict[str, Any]
Extract monetary amounts from text.
Strategy:
1. Scan lines that contain a total-indicating keyword; use the amount on that line as the grand total.
2. Fall back to the largest amount found in the document.
Returns {"total": Decimal | None, "all": [Decimal, ...]}.
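The keyword-then-largest strategy and the documented return shape can be sketched as below. The function name sketch_extract_amounts, the TOTAL_KEYWORDS list, and the amount regex are assumptions made for illustration; the sketch also assumes that when a keyword line holds several amounts, the last one is the total.

```python
import re
from decimal import Decimal
from typing import Any, Dict, List, Optional

# Hypothetical keywords; the real list is not documented.
TOTAL_KEYWORDS = ("gesamt", "summe", "total", "brutto")

# Amounts with exactly two decimals, e.g. 12,50 or 12.50 or 1.234,56.
_AMOUNT = re.compile(r"(?<![\d.,])\d+(?:[.,]\d{3})*[.,]\d{2}(?![\d.,])")


def _to_decimal(raw: str) -> Decimal:
    # The last separator is the decimal mark; earlier ones group thousands.
    integer_part = raw[:-3].replace(".", "").replace(",", "")
    return Decimal(f"{integer_part}.{raw[-2:]}")


def sketch_extract_amounts(text: str) -> Dict[str, Any]:
    all_amounts: List[Decimal] = []
    total: Optional[Decimal] = None
    for line in text.splitlines():
        found = [_to_decimal(raw) for raw in _AMOUNT.findall(line)]
        all_amounts.extend(found)
        # Step 1: the first keyword line that carries an amount wins.
        if total is None and found and any(k in line.lower() for k in TOTAL_KEYWORDS):
            total = found[-1]
    # Step 2: no keyword line, so fall back to the largest amount.
    if total is None and all_amounts:
        total = max(all_amounts)
    return {"total": total, "all": all_amounts}


print(sketch_extract_amounts("Kaffee 3,50\nKuchen 4,20\nSumme 7,70\nGegeben 10,00"))
```

In the sample, the "Summe" line supplies the total even though 10,00 (cash handed over) is the largest number on the receipt, which is why the keyword scan runs before the fallback.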
- finamt.utils.clean_json_response(response: str) -> str
Extract and sanitise a JSON object from an LLM response string.
Handles:
- Markdown code fences (for example, a triple-backtick block tagged json)
- Trailing commas in objects and arrays
- Unquoted keys: only attempted when the extracted candidate is not already valid JSON, to avoid corrupting URLs or colons inside strings
Returns an empty JSON object {} on total failure so callers can always call json.loads() on the result.
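A rough sketch of the repair order described above: strip fences, cut out the object, drop trailing commas, and quote bare keys only if parsing still fails. The function name sketch_clean_json and the regexes are assumptions, not the library's implementation; in particular, the trailing-comma regex here is naive and would also touch a ",}" sequence inside a string value.

```python
import json
import re

# Markdown fence marker, built indirectly so it can appear inside a fenced block.
FENCE = "`" * 3


def sketch_clean_json(response: str) -> str:
    # 1. Remove fence markers, with or without a "json" tag.
    text = re.sub(re.escape(FENCE) + r"(?:json)?", "", response)
    # 2. Keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return "{}"
    candidate = text[start:end + 1]
    # 3. Drop trailing commas before a closing brace or bracket.
    candidate = re.sub(r",\s*([}\]])", r"\1", candidate)
    # 4. If it already parses, stop here and leave strings untouched.
    try:
        json.loads(candidate)
        return candidate
    except json.JSONDecodeError:
        pass
    # 5. Last resort: quote bare keys that follow "{" or ",".
    repaired = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', candidate)
    try:
        json.loads(repaired)
        return repaired
    except json.JSONDecodeError:
        return "{}"  # total failure: still safe to pass to json.loads()


raw = FENCE + 'json\n{"total": "12.50", "items": [1, 2,],}\n' + FENCE
print(json.loads(sketch_clean_json(raw)))
```

Running the key-quoting step only after a failed parse means a valid payload such as {"url": "http://x.de/a,b:c"} is returned unchanged, which is the safeguard the docstring describes.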
- finamt.utils.parse_decimal(value: Any) -> Decimal | None
Safely coerce any value to Decimal, returning None on failure.
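One plausible shape for such a coercion is sketched below, assuming the value is converted through its string form. The name sketch_parse_decimal is hypothetical, and whether the real function also normalises comma decimals or currency symbols is not documented, so the sketch does neither.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


def sketch_parse_decimal(value: Any) -> Optional[Decimal]:
    """Coerce value to Decimal, or return None if that is not possible."""
    if isinstance(value, Decimal):
        return value
    try:
        # Going through str() avoids binary float noise: 1.1 becomes Decimal("1.1").
        return Decimal(str(value).strip())
    except (InvalidOperation, ValueError):
        return None


for sample in ("12.50", 3, 1.1, "abc", None):
    print(repr(sample), "->", sketch_parse_decimal(sample))
```

Converting floats via str() rather than passing them to Decimal directly keeps 1.1 from expanding into its long binary approximation.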
- finamt.utils.parse_date(date_str: str) -> datetime | None
Parse an ISO-format date string (YYYY-MM-DD) to datetime.
Also accepts common European formats as a fallback. Uses explicit format strings rather than %B/%b to avoid locale dependency. Handles English abbreviated months (JUL, AUG …) and German month names/abbreviations (OKT, MRZ, JANUAR, OKTOBER …).
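The locale-independent approach can be sketched with numeric strptime formats plus a hand-written month table, since %B and %b match month names of the current locale only. The name sketch_parse_date, the MONTHS table (deliberately partial), and the accepted "day month year" layout are assumptions for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Numeric formats only: these behave the same under every locale.
_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")

# Hypothetical, partial month table (English and German spellings).
MONTHS = {
    "jan": 1, "januar": 1, "mrz": 3, "märz": 3, "jul": 7, "juli": 7,
    "aug": 8, "august": 8, "okt": 10, "oktober": 10, "oct": 10,
}

# "3. OKT 2023", "12 JUL 2022", "1. Oktober 2021"
_NAMED = re.compile(r"^(\d{1,2})\.?\s*([A-Za-zÄÖÜäöü]+)\.?\s*(\d{4})$")


def sketch_parse_date(date_str: str) -> Optional[datetime]:
    text = date_str.strip()
    for fmt in _FORMATS:  # ISO first, then the European fallbacks
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    match = _NAMED.match(text)
    if match:
        month = MONTHS.get(match.group(2).lower())
        if month is not None:
            try:
                return datetime(int(match.group(3)), month, int(match.group(1)))
            except ValueError:
                return None  # e.g. day 31 in a 30-day month
    return None


print(sketch_parse_date("3. OKT 2023"))
```

Looking month names up in a dictionary keeps the result identical whether the process runs under a German, English, or C locale.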