Article
The article argues that adding OpenAI/Anthropic-style API calls can turn simple app features into fragile, expensive distributed systems that inherit outages, billing failures, rate limits, and privacy obligations, especially when user data must leave the device. It promotes a local-first approach for tasks whose input is already on-device, such as summarization, classification, extraction, and normalization, because this preserves data ownership, reduces integration complexity, and removes dependence on vendor uptime and credit-card-driven limits. The author describes implementing this with Apple's Foundation Models framework and its LanguageModelSession API, including chunked summarization and typed outputs via a @Generable struct, so apps receive reliable structured data instead of scraping brittle JSON. The piece frames local models as imperfect but practical subsystems: not substitutes for general internet-scale reasoning, but better suited as lightweight, predictable components for task-specific transformations. The conclusion is pragmatic rather than ideological: use cloud models when genuinely necessary, and prefer local inference when possible, so that AI acts like software infrastructure rather than a generic chat layer.
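The article's own code isn't reproduced here, but a minimal sketch of the pattern it describes might look like the following. It assumes Apple's FoundationModels framework (LanguageModelSession, the @Generable and @Guide macros, and respond(to:generating:)); the ArticleSummary shape, the chunk size, and the prompts are illustrative assumptions, not the author's exact implementation.

```swift
import FoundationModels

// Illustrative typed output; @Generable makes the framework return this
// struct directly instead of free text that would need JSON scraping.
@Generable
struct ArticleSummary {
    @Guide(description: "A one-sentence summary of the whole text")
    var headline: String

    @Guide(description: "The most important points, one phrase each")
    var keyPoints: [String]
}

// Chunked summarization: summarize fixed-size chunks first, then combine
// the partial summaries into one typed result. The 4,000-character chunk
// size is a guess; production code would split on paragraph boundaries.
func summarize(_ text: String) async throws -> ArticleSummary {
    let chunkSize = 4_000
    let chunks = stride(from: 0, to: text.count, by: chunkSize).map { offset -> String in
        let start = text.index(text.startIndex, offsetBy: offset)
        let end = text.index(start, offsetBy: min(chunkSize, text.count - offset))
        return String(text[start..<end])
    }

    var partials: [String] = []
    for chunk in chunks {
        // A fresh session per chunk keeps the transcript, and thus the
        // on-device context window, from growing with every request.
        let session = LanguageModelSession(
            instructions: "You summarize text concisely and factually."
        )
        let response = try await session.respond(
            to: "Summarize this passage in two sentences:\n\n\(chunk)"
        )
        partials.append(response.content)
    }

    // Second pass: ask for the @Generable type so the result arrives as
    // structured data, with no string parsing required.
    let session = LanguageModelSession(
        instructions: "You summarize text concisely and factually."
    )
    let combined = try await session.respond(
        to: "Combine these partial summaries into one coherent summary:\n\n"
            + partials.joined(separator: "\n"),
        generating: ArticleSummary.self
    )
    return combined.content
}
```

Whether the author combines chunks in one or two passes isn't specified; the two-pass map-then-reduce shape above is simply a common way to fit long input into a small on-device context window.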
Comments
Commenters broadly share the author's concern about lock-in and the geopolitical and economic risk of relying on a small set of API providers, but they split over timing and feasibility. Many argue that open-weight progress, better tooling, and browser-level APIs could eventually make local models standard, especially for private or narrowly scoped workflows, while others stress that current hardware, memory bandwidth, power draw, and latency make running frontier-capable models locally costly and cumbersome for mainstream users. Several commenters predict a hybrid future in which privacy-sensitive preprocessing and constrained tasks run on-device, with frontier services reserved for complex or high-accuracy work. Others emphasize that software quality, consistency, and maintenance burden still favor managed cloud APIs, especially for richer coding and agentic tasks where speed and accuracy dominate. The thread also surfaces competing visions, from self-hosted inference to standardized cross-platform APIs beyond Apple's to abandoning AI-heavy UX in favor of simpler non-LLM approaches, reflecting a practical trade-off rather than a simple local-vs-cloud binary.
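None of the comments include code, but the hybrid arrangement several of them describe reduces to a small fallback pattern. A sketch, again assuming FoundationModels; callCloudModel is a hypothetical stand-in for whatever hosted API an app uses, not a real endpoint:

```swift
import FoundationModels

func classify(_ text: String) async throws -> String {
    // Prefer the on-device model whenever it is available...
    if case .available = SystemLanguageModel.default.availability {
        let session = LanguageModelSession(
            instructions: "Classify the text as positive, negative, or neutral. Reply with one word."
        )
        return try await session.respond(to: text).content
    }
    // ...and fall back to a hosted API only as an explicit, auditable
    // exception, since this is the one branch where data leaves the device.
    return try await callCloudModel(prompt: text)
}

// Hypothetical cloud fallback; a real app would wrap URLSession calls
// to whichever hosted provider it relies on.
func callCloudModel(prompt: String) async throws -> String {
    fatalError("cloud fallback not implemented in this sketch")
}
```

The routing policy is the interesting design choice here: commenters disagree on where the line sits, but the structure, local by default with an explicit cloud escape hatch, is common to most of the hybrid proposals in the thread.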