ingest
One module per upstream source. Each pulls data from the upstream (HTTP, git clone, parsed catalog page) and writes per-app rows into app_source_details for the stitch pipeline to consume. Run from scripts/ingest.py on the refresh schedule.
Homebrew Cask
- async fetch_homebrew_casks(client: AsyncClient | None = None) -> list[dict]
Fetch the complete Homebrew Cask catalog as raw JSON.
Accepts an optional pre-configured httpx.AsyncClient so tests and callers that want custom timeouts/headers can inject one.
- async ingest_homebrew_casks(session: AsyncSession, raw_records: list[dict]) -> tuple[int, int]
Upsert Cask records into the homebrew_casks table.
- Returns:
(ingested, skipped). Ingested is the count of records that parsed and were upserted; skipped is the count that failed Pydantic validation.
- Return type:
tuple[int, int]
AutoPkg
- async fetch_autopkg_index(client: AsyncClient | None = None) -> dict[str, Any]
Fetch the upstream AutoPkg recipe index as raw JSON.
Accepts an optional pre-configured httpx.AsyncClient so tests and callers that want custom timeouts/headers can inject one.
- Parameters:
client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient. If None, a new client with a 60-second timeout is created and disposed of before returning.
- Returns:
The raw decoded JSON payload. Top-level shape is {"identifiers": {<identifier>: <entry>}, "shortnames": {...}}.
- Return type:
dict[str, Any]
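The optional-client behaviour described above can be sketched as follows. This is a minimal illustration, not the module's code: StubClient, get_json, INDEX_URL, and fetch_index are hypothetical names, and StubClient stands in for httpx.AsyncClient so the sketch runs without network access.

```python
import asyncio
from typing import Any, Optional


class StubClient:
    """Hypothetical stand-in for httpx.AsyncClient, just enough for this sketch."""

    def __init__(self, timeout: float = 60.0) -> None:
        self.timeout = timeout
        self.closed = False

    async def get_json(self, url: str) -> dict[str, Any]:
        # A real client would perform an HTTP GET and decode the JSON body.
        return {"identifiers": {}, "shortnames": {}}

    async def aclose(self) -> None:
        self.closed = True


INDEX_URL = "https://example.invalid/index.json"  # placeholder, not the real URL


async def fetch_index(client: Optional[StubClient] = None) -> dict[str, Any]:
    owns_client = client is None
    if client is None:
        # No client injected: create one with the documented 60-second timeout.
        client = StubClient(timeout=60.0)
    try:
        return await client.get_json(INDEX_URL)
    finally:
        # Dispose only of a client this function created; an injected
        # client stays open because its caller owns its lifecycle.
        if owns_client:
            await client.aclose()


injected = StubClient(timeout=5.0)
payload = asyncio.run(fetch_index(injected))
```

Tracking ownership with a flag keeps an injected client usable after the call, which is what lets a test reuse one client across several fetches.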
- async ingest_autopkg_index(session: AsyncSession, index_payload: dict[str, Any]) -> tuple[int, int]
Upsert AutoPkg recipe entries into the autopkg_recipes table.
Walks the identifiers map. The shortnames map upstream is an inverted index (shortname → list of identifiers); we don’t store it separately since it can be reconstructed from shortname columns on the recipe rows.
- Parameters:
session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
index_payload (dict[str, Any]) – Raw decoded index.json payload from fetch_autopkg_index().
- Returns:
(ingested, skipped). Ingested is the count of recipes upserted; skipped is the count that failed validation.
- Return type:
tuple[int, int]
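To illustrate why the shortnames map need not be stored, here is a small sketch that rebuilds the inverted index from the identifiers map alone. It assumes each entry carries a "shortname" key; that field name, the helper rebuild_shortnames, and the sample identifiers are illustrative assumptions rather than details taken from the upstream index.

```python
from collections import defaultdict
from typing import Any


def rebuild_shortnames(identifiers: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Invert {identifier: entry} into {shortname: [identifier, ...]}."""
    inverted: dict[str, list[str]] = defaultdict(list)
    for identifier, entry in identifiers.items():
        shortname = entry.get("shortname")  # assumed field name
        if shortname:
            inverted[shortname].append(identifier)
    # Sort keys and values so the result does not depend on input order.
    return {name: sorted(ids) for name, ids in sorted(inverted.items())}


# Made-up sample payload in the documented top-level shape.
payload = {
    "identifiers": {
        "com.example.download.Firefox": {"shortname": "Firefox.download"},
        "com.example.pkg.Firefox": {"shortname": "Firefox.pkg"},
        "com.other.download.Firefox": {"shortname": "Firefox.download"},
    },
    "shortnames": {},  # ignored: it is derivable from the entries above
}

shortnames = rebuild_shortnames(payload["identifiers"])
```

The same grouping could be done in SQL over the recipe rows; the Python version only shows that no information is lost by skipping the upstream map.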
Jamf App Installers
- async fetch_jai_titles(base_url: str, client_id: str, client_secret: str, client: AsyncClient | None = None) -> list[JaiTitle]
Fetch the App Installers title catalog (lean list records only).
Authenticates once, then pages through GET /api/v1/app-installers/titles. The list records carry bundle_id + version (enough for stitching) but not per-title download URLs or architecture; use fetch_jai_catalog() for those.
- Parameters:
base_url (str) – Jamf Pro base URL (e.g. https://dummy.jamfcloud.com). Any instance works, since the title endpoints serve the global catalog.
client_id (str) – OAuth API client ID.
client_secret (str) – OAuth API client secret.
client (httpx.AsyncClient | None) – Optional pre-configured httpx.AsyncClient.
- Returns:
Every catalog title as lean JaiTitle records.
- Return type:
list[JaiTitle]
- Raises:
httpx.HTTPError – On auth failure or a non-2xx page response.
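The paging loop can be sketched roughly as below. The envelope keys ("totalCount", "results"), the page/page-size arguments, and the names fetch_all_pages and fake_fetch_page are assumptions made for illustration; the real request shape is not shown in this reference. The page fetcher is injected so the sketch runs against in-memory data.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical callable: (page, page_size) -> decoded response body.
PageFetcher = Callable[[int, int], Awaitable[dict[str, Any]]]


async def fetch_all_pages(fetch_page: PageFetcher, page_size: int = 100) -> list[dict[str, Any]]:
    """Request pages in order until the catalog is exhausted."""
    results: list[dict[str, Any]] = []
    page = 0
    while True:
        body = await fetch_page(page, page_size)
        batch = body.get("results", [])
        results.extend(batch)
        total = body.get("totalCount")
        # Stop on an empty page, or once the advertised total is reached.
        if not batch or (total is not None and len(results) >= total):
            return results
        page += 1


# Made-up in-memory catalog standing in for the remote endpoint.
CATALOG = [
    {"title": f"App {i}", "bundle_id": f"com.example.app{i}", "version": "1.0"}
    for i in range(5)
]


async def fake_fetch_page(page: int, page_size: int) -> dict[str, Any]:
    start = page * page_size
    return {"totalCount": len(CATALOG), "results": CATALOG[start:start + page_size]}


titles = asyncio.run(fetch_all_pages(fake_fetch_page, page_size=2))
```

Stopping on an empty page as well as on the total guards against an envelope that omits or misreports the count.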
- async fetch_jai_catalog(base_url: str, client_id: str, client_secret: str, *, concurrency: int = 10, client: AsyncClient | None = None) -> list[JaiTitle]
Fetch the full catalog with per-title detail (download URLs, architecture).
One token for the whole sweep: list the titles, then fan out the per-title detail GETs under a bounded semaphore. The list + parallel detail finish in a few seconds, comfortably inside the token’s short life, so no re-auth is needed. A title whose detail fetch fails (e.g. a 429 that survives one retry) falls back to its lean list record, which still carries bundle_id + version; the run never aborts over one bad title.
- Returns:
Every catalog title, detail-enriched where the detail call succeeded.
- Return type:
list[JaiTitle]
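The bounded fan-out with a per-title fallback might look something like this sketch. Plain dicts stand in for JaiTitle, fetch_detail is a hypothetical detail call that fails for one title on purpose, and the single retry mentioned above is left out for brevity.

```python
import asyncio
from typing import Any


async def fetch_detail(title: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical per-title detail call; fails for id 2 to show the fallback."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    if title["id"] == 2:
        raise RuntimeError("429 Too Many Requests")
    return {**title, "download_url": f"https://example.invalid/{title['id']}.pkg"}


async def enrich(titles: list[dict[str, Any]], concurrency: int = 10) -> list[dict[str, Any]]:
    semaphore = asyncio.Semaphore(concurrency)

    async def enrich_one(title: dict[str, Any]) -> dict[str, Any]:
        async with semaphore:  # at most `concurrency` detail calls in flight
            try:
                return await fetch_detail(title)
            except Exception:
                # Fall back to the lean record so one bad title
                # never aborts the whole sweep.
                return title

    # gather() returns results in the same order as the input titles.
    return await asyncio.gather(*(enrich_one(t) for t in titles))


lean = [
    {"id": i, "bundle_id": f"com.example.app{i}", "version": "1.0"}
    for i in (1, 2, 3)
]
enriched = asyncio.run(enrich(lean, concurrency=2))
```

Catching the failure inside each task, rather than around gather(), is what lets the successful titles keep their detail while the failed one degrades to its list record.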
- async ingest_jai_titles(session: AsyncSession, titles: list[JaiTitle]) -> tuple[int, int]
Upsert API-fetched titles into jamf_app_installers (keyed by title name).
Derives source/host from the title’s media source so API rows carry the same coverage fields the HTML scrape provides, plus the enrichment columns (bundle_id/version/jamf_id/download_url/architecture). The full title payload is preserved in raw.
- Parameters:
session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
titles (list[JaiTitle]) – Title records from fetch_jai_catalog() (or fetch_jai_titles()).
- Returns:
(ingested, skipped).
- Return type:
tuple[int, int]
Mac App Store
- async fetch_mas_lookup(bundle_ids: list[str], client: AsyncClient | None = None) -> list[dict[str, Any]]
Look up each bundle_id against Apple’s iTunes Lookup API.
One HTTP call per bundle_id (Apple’s lookup endpoint accepts comma-separated id values for iTunes IDs but not for bundleId). Serialized with a small inter-request delay to stay under Apple’s rate limit.
- Returns:
List of raw result dicts as returned by Apple. Bundle IDs with no match are silently omitted.
- Return type:
list[dict[str, Any]]
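A serialized lookup loop with an inter-request delay can be sketched as below. The lookup callable is injected so the example runs offline; fake_lookup, FAKE_STORE, lookup_serially, and the 0.01-second delay are illustrative assumptions, not the module's actual values.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical callable: bundle_id -> list of raw result dicts (empty if no match).
Lookup = Callable[[str], Awaitable[list[dict[str, Any]]]]


async def lookup_serially(
    bundle_ids: list[str], lookup: Lookup, delay_seconds: float = 0.01
) -> list[dict[str, Any]]:
    """One lookup per bundle ID, strictly in sequence, pausing between calls."""
    found: list[dict[str, Any]] = []
    for index, bundle_id in enumerate(bundle_ids):
        if index:
            # Pause between requests (not before the first) to stay
            # under the upstream rate limit.
            await asyncio.sleep(delay_seconds)
        # An empty result list adds nothing, so misses are silently omitted.
        found.extend(await lookup(bundle_id))
    return found


# Made-up responses keyed by bundle ID.
FAKE_STORE = {
    "com.example.notes": [{"bundleId": "com.example.notes", "kind": "mac-software"}],
    "com.example.paint": [{"bundleId": "com.example.paint", "kind": "mac-software"}],
}


async def fake_lookup(bundle_id: str) -> list[dict[str, Any]]:
    return FAKE_STORE.get(bundle_id, [])


results = asyncio.run(
    lookup_serially(["com.example.notes", "com.example.missing", "com.example.paint"], fake_lookup)
)
```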
- async ingest_mas_apps(session: AsyncSession, raw_records: list[dict[str, Any]]) -> tuple[int, int]
Upsert MAS lookup results into the mas_apps table.
Records that fail Pydantic validation, or whose kind field is not mac-software, are logged and skipped; ingestion continues for the rest of the batch. We never block the whole sweep over one weird upstream record.
- Parameters:
session (sqlalchemy.ext.asyncio.AsyncSession) – Async SQLAlchemy session bound to the target DB.
raw_records (list[dict[str, Any]]) – List of raw lookup result dicts from fetch_mas_lookup().
- Returns:
(ingested, skipped). Ingested is the count of records upserted; skipped is the count that failed validation or were non-Mac results.
- Return type:
tuple[int, int]
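The skip-and-continue accounting behind the (ingested, skipped) tuple can be sketched without a database. Here a hand-written field check stands in for the Pydantic model, and REQUIRED_FIELDS and partition_records are hypothetical; the actual model's required fields are not listed in this reference.

```python
from typing import Any

# Assumed required string fields; the real Pydantic model may differ.
REQUIRED_FIELDS = ("bundleId", "trackName", "version")


def partition_records(raw_records: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], int]:
    """Return (records to upsert, skipped count) without raising on bad input."""
    accepted: list[dict[str, Any]] = []
    skipped = 0
    for record in raw_records:
        if record.get("kind") != "mac-software":
            skipped += 1  # non-Mac result, e.g. an iOS app
            continue
        if any(not isinstance(record.get(field), str) for field in REQUIRED_FIELDS):
            skipped += 1  # stand-in for a validation failure
            continue
        accepted.append(record)
    return accepted, skipped


# Made-up sample batch.
raw = [
    {"kind": "mac-software", "bundleId": "com.example.notes", "trackName": "Notes", "version": "2.1"},
    {"kind": "software", "bundleId": "com.example.ios", "trackName": "Phone App", "version": "1.0"},
    {"kind": "mac-software", "bundleId": "com.example.broken", "trackName": None, "version": "3.0"},
]

to_upsert, skipped = partition_records(raw)
ingested = len(to_upsert)
```

Counting skips instead of raising is what keeps one malformed upstream record from blocking the rest of the batch.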