ingest

One module per upstream source. Each pulls data from the upstream (HTTP, git clone, parsed catalog page) and writes per-app rows into app_source_details for the stitch pipeline to consume. Run from scripts/ingest.py on the refresh schedule.

Homebrew Cask

async fetch_homebrew_casks(client: AsyncClient | None = None) → list[dict]

Fetch the complete Homebrew Cask catalog as raw JSON.

Accepts an optional pre-configured httpx.AsyncClient so tests and callers that want custom timeouts/headers can inject one.

Parameters:

client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient. If None, a new client with a 60-second timeout is created and disposed of before returning.

Returns:

List of raw Cask records as dicts (the upstream JSON shape).

Return type:

list[dict]
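
The optional-client pattern described above can be sketched roughly as follows. This is a minimal illustration, not the module's code: FakeClient, get_json and fetch_catalog are hypothetical stand-ins (FakeClient plays the role of httpx.AsyncClient so the sketch runs without network access), and leaving an injected client open for the caller is an assumption about the intended behaviour.

```python
import asyncio
from typing import Any, Optional


class FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient; records whether it was closed."""

    def __init__(self, timeout: float = 60.0) -> None:
        self.timeout = timeout
        self.closed = False

    async def get_json(self, url: str) -> list[dict[str, Any]]:
        # Pretend network call that returns a canned payload.
        return [{"token": "firefox"}, {"token": "iterm2"}]

    async def aclose(self) -> None:
        self.closed = True


async def fetch_catalog(client: Optional[FakeClient] = None) -> list[dict[str, Any]]:
    # Only dispose of the client if this function created it (assumption:
    # an injected client belongs to the caller and stays open).
    owns_client = client is None
    if client is None:
        client = FakeClient(timeout=60.0)
    try:
        return await client.get_json("https://example.invalid/cask.json")
    finally:
        if owns_client:
            await client.aclose()
```

A test can pass its own FakeClient, call asyncio.run(fetch_catalog(client)), and then inspect the client afterwards, since the function did not close it.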

async ingest_homebrew_casks(session: AsyncSession, raw_records: list[dict]) → tuple[int, int]

Upsert Cask records into the homebrew_casks table.

Parameters:

session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
raw_records (list[dict]) – List of raw Cask record dicts (the upstream JSON shape).

Returns:

(ingested, skipped) — ingested is the count of records that parsed and were upserted; skipped is the count that failed Pydantic validation.

Return type:

tuple[int, int]
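
The (ingested, skipped) contract can be illustrated with a small validate-then-upsert loop. This is a sketch under stated assumptions: CaskRow, parse_cask and ingest are hypothetical names, a plain dataclass with a manual check stands in for the Pydantic model, an in-memory dict stands in for the homebrew_casks table, and keying on token is assumed for illustration only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CaskRow:
    """Hypothetical validated row; the real model is a Pydantic class."""

    token: str
    version: str


def parse_cask(raw: dict[str, Any]) -> CaskRow:
    # Stand-in for Pydantic validation: raise ValueError on a bad record.
    token = raw.get("token")
    version = raw.get("version")
    if not isinstance(token, str) or not isinstance(version, str):
        raise ValueError(f"invalid cask record: {raw!r}")
    return CaskRow(token=token, version=version)


def ingest(table: dict[str, CaskRow], raw_records: list[dict[str, Any]]) -> tuple[int, int]:
    ingested = skipped = 0
    for raw in raw_records:
        try:
            row = parse_cask(raw)
        except ValueError:
            skipped += 1  # record failed validation; keep going
            continue
        table[row.token] = row  # upsert: insert or overwrite by key
        ingested += 1
    return ingested, skipped
```

Note that in this sketch a record seen twice counts twice toward ingested even though it occupies a single row, because the count tracks upsert operations rather than distinct keys.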

AutoPkg

async fetch_autopkg_index(client: AsyncClient | None = None) → dict[str, Any]

Fetch the upstream AutoPkg recipe index as raw JSON.

Accepts an optional pre-configured httpx.AsyncClient so tests and callers that want custom timeouts/headers can inject one.

Parameters:

client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient. If None, a new client with a 60-second timeout is created and disposed of before returning.

Returns:

The raw decoded JSON payload. Top-level shape is {"identifiers": {<identifier>: <entry>}, "shortnames": {...}}.

Return type:

dict[str, Any]

async ingest_autopkg_index(session: AsyncSession, index_payload: dict[str, Any]) → tuple[int, int]

Upsert AutoPkg recipe entries into the autopkg_recipes table.

Walks the identifiers map. The shortnames map upstream is an inverted index (shortname → list of identifiers); we don’t store it separately since it can be reconstructed from shortname columns on the recipe rows.

Parameters:

session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
index_payload (dict[str, Any]) – Raw decoded index.json payload from fetch_autopkg_index().

Returns:

(ingested, skipped): ingested is the count of recipes upserted; skipped is the count that failed validation.

Return type:

tuple[int, int]
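
To show why the shortnames map need not be stored, here is a minimal sketch that rebuilds the inverted index (shortname → list of identifiers) from recipe rows. The function name rebuild_shortnames, the dict-shaped rows, the "identifier" key and the sample identifiers are assumptions for illustration; only the existence of a shortname column comes from the description above.

```python
from collections import defaultdict


def rebuild_shortnames(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group recipe identifiers by shortname (hypothetical row shape)."""
    index: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        index[row["shortname"]].append(row["identifier"])
    # Sort each list so the result does not depend on row order.
    return {name: sorted(ids) for name, ids in index.items()}


# Illustrative rows only; these identifiers are made up.
SAMPLE_ROWS = [
    {"identifier": "com.example.download.Firefox", "shortname": "Firefox.download"},
    {"identifier": "org.example.download.Firefox", "shortname": "Firefox.download"},
    {"identifier": "com.example.pkg.Firefox", "shortname": "Firefox.pkg"},
]
```

Calling rebuild_shortnames(SAMPLE_ROWS) yields one entry per shortname, with both download recipes grouped under "Firefox.download".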

Jamf App Installers

async fetch_jai_titles(base_url: str, client_id: str, client_secret: str, client: AsyncClient | None = None) → list[JaiTitle]

Fetch the App Installers title catalog (lean list records only).

Authenticates once, then pages through GET /api/v1/app-installers/titles. The list records carry bundle_id + version (enough for stitching) but not per-title download URLs or architecture — use fetch_jai_catalog() for those.

Parameters:

base_url (str) – Jamf Pro base URL (e.g. https://dummy.jamfcloud.com). Any instance works — the title endpoints serve the global catalog.
client_id (str) – OAuth API client ID.
client_secret (str) – OAuth API client secret.
client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient.

Returns:

Every catalog title as lean JaiTitle records.

Return type:

list[JaiTitle]

Raises:

httpx.HTTPError – On auth failure or a non-2xx page response.
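
The "authenticate once, then page through the list endpoint" flow might look roughly like the loop below. This is a sketch, not the module's implementation: collect_all_pages and fake_fetch_page are hypothetical, the fetch_page callable stands in for an authenticated GET of one page, and the "results" / "totalCount" response keys and the zero-based page index are assumptions about the payload shape.

```python
import asyncio
from typing import Any, Awaitable, Callable

Page = dict[str, Any]


async def collect_all_pages(
    fetch_page: Callable[[int, int], Awaitable[Page]],
    page_size: int = 100,
) -> list[dict[str, Any]]:
    """Request pages in order until every record has been collected."""
    items: list[dict[str, Any]] = []
    page = 0
    while True:
        payload = await fetch_page(page, page_size)
        results = payload["results"]
        items.extend(results)
        # Stop on an empty page or once the reported total is reached.
        if not results or len(items) >= payload["totalCount"]:
            return items
        page += 1


# Demo stand-in for the HTTP call: serves slices of an in-memory list.
FAKE_TITLES = [{"id": i} for i in range(5)]


async def fake_fetch_page(page: int, page_size: int) -> Page:
    start = page * page_size
    return {"totalCount": len(FAKE_TITLES), "results": FAKE_TITLES[start:start + page_size]}
```

Stopping on an empty page as well as on the reported total guards against looping forever if the total changes between requests.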

async fetch_jai_catalog(base_url: str, client_id: str, client_secret: str, *, concurrency: int = 10, client: AsyncClient | None = None) → list[JaiTitle]

Fetch the full catalog with per-title detail (download URLs, architecture).

One token for the whole sweep: list the titles, then fan out the per-title detail GETs under a bounded semaphore. The list + parallel detail finish in a few seconds — comfortably inside the token’s short life — so no re-auth is needed. A title whose detail fetch fails (e.g. a 429 that survives one retry) falls back to its lean list record, which still carries bundle_id + version; the run never aborts over one bad title.

Parameters:

base_url (str) – Jamf Pro base URL. Any instance works (catalog-global).
client_id (str) – OAuth API client ID.
client_secret (str) – OAuth API client secret.
concurrency (int) – Max in-flight detail requests.
client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient.

Returns:

Every catalog title, detail-enriched where the detail call succeeded.

Return type:

list[JaiTitle]
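
The bounded fan-out with per-title fallback can be sketched with asyncio.Semaphore and asyncio.gather. Everything named here is illustrative: enrich_all, DetailError and fake_fetch_detail are hypothetical, titles are plain dicts rather than JaiTitle objects, and the single retry is modelled as an immediate second attempt with no backoff, which may differ from the real retry policy.

```python
import asyncio
from typing import Awaitable, Callable

Title = dict[str, object]


class DetailError(Exception):
    """Hypothetical error raised when a detail request fails."""


async def enrich_all(
    lean_titles: list[Title],
    fetch_detail: Callable[[Title], Awaitable[Title]],
    concurrency: int = 10,
) -> list[Title]:
    semaphore = asyncio.Semaphore(concurrency)  # caps in-flight detail calls

    async def enrich_one(lean: Title) -> Title:
        async with semaphore:
            for _attempt in range(2):  # first try plus one retry
                try:
                    detail = await fetch_detail(lean)
                except DetailError:
                    continue
                return {**lean, **detail}
        return lean  # both attempts failed: keep the lean list record

    # gather preserves input order, so output lines up with lean_titles.
    return await asyncio.gather(*(enrich_one(t) for t in lean_titles))


async def fake_fetch_detail(lean: Title) -> Title:
    # Demo stand-in: the title named "bad" always fails.
    if lean["name"] == "bad":
        raise DetailError("HTTP 429")
    return {"download_url": f"https://example.invalid/{lean['name']}.pkg"}
```

Because failures are absorbed inside enrich_one, gather never sees an exception, so one bad title cannot cancel the rest of the sweep.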

async ingest_jai_titles(session: AsyncSession, titles: list[JaiTitle]) → tuple[int, int]

Upsert API-fetched titles into jamf_app_installers (keyed by title name).

Derives source/host from the title’s media source so API rows carry the same coverage fields the HTML scrape provides, plus the enrichment columns (bundle_id/version/jamf_id/download_url/architecture). The full title payload is preserved in raw.

Parameters:

session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
titles (list[JaiTitle]) – Title records from fetch_jai_catalog() (or fetch_jai_titles()).

Returns:

(ingested, skipped).

Return type:

tuple[int, int]