Current / v2

Inference bottlenecks inhibit capabilities. Current fixes this.

Our research has produced an approach to text generation that massively reduces latency relative to autoregressive models by replacing serial token rollout with fast, parallel generation.

Ultrafast inference

Serve useful intelligence with dramatically lower wall-clock delay and better hardware utilization.

Next-generation RL

Make long rollouts, dense rewards, and online adaptation far cheaper to train and evaluate.

Seamless multimodality

Flows are already state-of-the-art for audio, video, and images. Now they extend to text.

Continuous flows: a paradigm for truly parallel inference

Autoregressive: one token per function evaluation

Discrete diffusion: any order, but still sequential

Continuous flows: smooth, correlated paths

Our approach fundamentally changes the cost and speed of inference, creating entirely new application domains: streaming, collaborative intelligence.

Figure panels: autoregressive, discrete diffusion, continuous flow.

Flow Map Language Models (FMLMs) are experiencing a rapid take-off.

In only two months, FMLMs have moved from a research idea to a competitive text-generation technology.

They have already dramatically overtaken discrete diffusion in the few-step regime and are now rapidly approaching autoregressive quality while preserving parallel generation.

We believe that efficiency dictates quality, and that flow map language models will set a new frontier.

Adjusted GenPPL of flow-based language models on OpenWebText, dropping rapidly from FLM to ELF over ~35 days.

The inference bottleneck.

Autoregressive model capabilities have dramatically improved thanks to the scalability of transformers, data, and reinforcement learning. The hard part today is balancing intelligence, cost, and speed.

Current is a research lab building a different kind of generative model. By reimagining generative models from a dynamical point of view, we can repurpose and extend the entire training stack for fast, parallel generation.

Autoregressive limits.

At each autoregressive step, the model predicts one token. In causal attention, the query Q is a single vector — shape [1 × dₖ] — so Q·Kᵀ is a matrix-vector multiply: every column of the key cache must stream from HBM, but each byte feeds only one dot product.

The arithmetic intensity is low, meaning that serving becomes memory-bound. This is at odds with modern GPU and TPU hardware, which is maximally utilized by operations that are compute-bound.

Inefficient GPU utilization

High latency

Expensive, cumbersome RL

\[T_q = 1,\qquad \mathrm{AI}_{AR} = \frac{S T_q}{S + T_q} = \frac{S}{S+1} \approx 1\]
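One way to arrive at this expression, offered as a sketch rather than the exact accounting behind the formula: count multiply-accumulates per element loaded for the product of \(Q \in \mathbb{R}^{T_q \times d_k}\) with \(K^\top\), where \(K \in \mathbb{R}^{S \times d_k}\). The assumptions here are that the write of the \(T_q \times S\) score matrix is ignored and that constant factors (bytes per element, two floating-point operations per multiply-accumulate) are dropped.

\[\text{work} \propto T_q\, S\, d_k,\qquad \text{elements loaded} \propto (T_q + S)\, d_k,\qquad \mathrm{AI} = \frac{T_q\, S\, d_k}{(T_q + S)\, d_k} = \frac{S T_q}{S + T_q}\]

Setting \(T_q = 1\) recovers the autoregressive value \(S/(S+1)\): roughly one multiply-accumulate for every element streamed from memory.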

Flow matching as parallel evolution.

Our approach treats generation as a dynamical process specified by a differential equation. This is the basis for flow and diffusion generative models, which are state-of-the-art in every modality besides text.

A learned velocity field \(b_t(x)\) pushes every output position forward through time, each one drifting along its own smooth trajectory from a random start to its target.

All positions advance together through a correlated, entangled process, capable of capturing the complex relationships present in real data in a completely parallel fashion.

One forward pass evaluates the field at the current state and updates the entire generated sample simultaneously. This moves beyond single-token-at-a-time generation and removes the need to wait for sequential evaluation.
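The parallel update described above can be sketched with a toy velocity field. This is a minimal illustration under stated assumptions, not the lab's model: the field `velocity` is a hand-written stand-in for a learned \(b_t(x)\), it pulls each position toward a fixed, known target, and unlike a real model it does not couple positions to one another. The sizes are hypothetical.

```python
import numpy as np

def velocity(x, t, target):
    """Toy velocity field b_t(x) for a straight-line path to a fixed target.

    Assumption: b_t(x) = (target - x) / (1 - t). A real model would learn
    this field and would couple the positions to each other.
    """
    return (target - x) / (1.0 - t)

def euler_rollout(x0, target, num_steps):
    """Integrate dx/dt = b_t(x) from t=0 to t=1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        # One field evaluation updates every position at once.
        x = x + dt * velocity(x, t, target)
    return x

rng = np.random.default_rng(0)
seq_len, dim = 8, 4                           # hypothetical sizes
x0 = rng.standard_normal((seq_len, dim))      # random start for every position
target = rng.standard_normal((seq_len, dim))  # stand-in for the data sample
x1 = euler_rollout(x0, target, num_steps=16)
print(np.max(np.abs(x1 - target)))            # close to zero
```

Each loop iteration is one evaluation of the field over the whole `[seq_len, dim]` array, which is the sense in which all positions advance together.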

Flow matching closes the gap.

A flow-matching model processes all T generation tokens in one pass. Q has shape [T × dₖ], making Q·Kᵀ a full matrix multiply: each key column is reused across T query rows before the next column is needed. The arithmetic intensity rises by a factor of T — the operation becomes compute-bound.

\[T_q = T,\qquad \mathrm{AI}_{FM} = \frac{S T_q}{S + T_q} = \frac{ST}{S+T} \approx T \quad (S \gg T)\]
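To make the two regimes concrete, the expression can be evaluated numerically. The values of `S` and `T` below are hypothetical and chosen only for illustration.

```python
def arithmetic_intensity(s, t_q):
    """AI = S * T_q / (S + T_q), the expression used in the text."""
    return s * t_q / (s + t_q)

S = 4096  # hypothetical number of cached keys
T = 128   # hypothetical number of tokens generated in parallel

ai_ar = arithmetic_intensity(S, 1)  # autoregressive: one query row
ai_fm = arithmetic_intensity(S, T)  # flow matching: T query rows

print(f"autoregressive: {ai_ar:.3f}")
print(f"flow matching:  {ai_fm:.1f}")
print(f"ratio:          {ai_fm / ai_ar:.1f}")
```

The ratio comes out slightly below `T` because `S` is finite; it approaches `T` as `S` grows relative to `T`.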

Flow maps collapse the rollout.

Flow matching converts generation into a smooth dynamical process in which each step is one large matrix multiplication. But every sample still requires solving the differential equation, so each output incurs many forward passes, each with a fresh matrix multiply.

A flow map \(X_{s,t}\) is trained to output the solution of the differential equation directly, so one forward pass produces the entire sequence. The compute-bound arithmetic intensity stays, but the wall-clock time collapses to that of a single forward pass, and the large transformer is evaluated orders of magnitude fewer times.
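The difference between solving the equation and applying a flow map can be sketched on a toy problem where the map is known in closed form. The assumptions: the velocity is \(b_t(x) = (\text{target} - x)/(1 - t)\) for a fixed target, so \(X_{s,t}(x) = x + \frac{t-s}{1-s}(\text{target} - x)\). A real flow map would be a trained network; `flow_map` and `euler_solve` are illustrative names.

```python
import numpy as np

def flow_map(x, s, t, target):
    """Closed-form X_{s,t} for the toy ODE dx/dt = (target - x) / (1 - s)."""
    return x + (t - s) / (1.0 - s) * (target - x)

def euler_solve(x0, target, num_steps):
    """Reference solver: one velocity evaluation per Euler step."""
    x, dt = x0.copy(), 1.0 / num_steps
    evaluations = 0
    for k in range(num_steps):
        x = x + dt * (target - x) / (1.0 - k * dt)
        evaluations += 1
    return x, evaluations

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))
target = rng.standard_normal((8, 4))

x_euler, n_euler = euler_solve(x0, target, num_steps=64)
x_map = flow_map(x0, 0.0, 1.0, target)  # a single evaluation

print(n_euler, np.allclose(x_euler, x_map))
```

Both routes reach the same endpoint here; the solver needs `num_steps` evaluations while the map needs one.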

Bringing the current to language.

Language is discrete. One way to handle this with a continuous flow is to think about each element in a vocabulary as a vertex of a simplex; a probability over tokens is a point inside it. Plain flow matching, designed for continuous space, treats those probabilities like coordinates — and loses the geometry that made language models train so well in the first place.

We've discovered algorithms that allow us to exploit this geometry in learning flow maps on discrete spaces. The model outputs probability distributions on the simplex, and can be trained with the same KL / cross-entropy losses that scaled transformers in the first place.

This gives us the scalable training signal of language models with the parallel, transport-based generation of flows. Many tokens in a single, differentiable function evaluation. Cheap to serve. Easy to fine-tune with reinforcement learning.
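As a minimal sketch of the training signal only (not the algorithms referred to above, which are not described here), the snippet below places per-position outputs on the probability simplex with a softmax and scores them with cross-entropy against target tokens. The random `logits` stand in for a model output, and the sizes are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Map each row of logits to a point on the probability simplex."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(probs, token_ids):
    """Mean negative log-probability assigned to the target tokens."""
    picked = probs[np.arange(len(token_ids)), token_ids]
    return float(-np.log(picked).mean())

rng = np.random.default_rng(0)
seq_len, vocab_size = 6, 10                          # hypothetical sizes
logits = rng.standard_normal((seq_len, vocab_size))  # stand-in for model output
token_ids = rng.integers(0, vocab_size, size=seq_len)

probs = softmax(logits)
loss = cross_entropy(probs, token_ids)
print(probs.sum(axis=-1), loss)
```

Because each target token is a vertex of the simplex (a one-hot distribution), the KL divergence from the target to the prediction equals this cross-entropy.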

Flow maps reset the speed-quality tradeoff.

Standard generative models buy quality with serial work: more autoregressive steps, more denoising steps, more network evaluations. A learned flow map changes the unit of inference from an infinitesimal update to a jump across the trajectory.

Because \(X_{s,t}\) is trained to be accurate for arbitrary spans, serving can choose one large jump, a few corrective jumps, or a budgeted schedule without retraining the core model.
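The choice of jump schedule can be illustrated on a toy flow map with a closed form. The assumptions are the same toy used for illustration elsewhere: \(X_{s,t}(x) = x + \frac{t-s}{1-s}(\text{target} - x)\) for a fixed target, with `run_schedule` a hypothetical helper. Because this toy map is exact, every schedule gives the same answer; with a learned map, extra jumps would instead act as corrections.

```python
import numpy as np

def flow_map(x, s, t, target):
    """Exact toy flow map X_{s,t}; a trained network would replace this."""
    return x + (t - s) / (1.0 - s) * (target - x)

def run_schedule(x0, times, target):
    """Compose jumps X_{t0,t1}, X_{t1,t2}, ... along an increasing time grid."""
    x = x0
    for s, t in zip(times[:-1], times[1:]):
        x = flow_map(x, s, t, target)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))
target = rng.standard_normal((8, 4))

one_jump = run_schedule(x0, [0.0, 0.8], target)
two_jumps = run_schedule(x0, [0.0, 0.4, 0.8], target)
four_jumps = run_schedule(x0, [0.0, 0.2, 0.4, 0.6, 0.8], target)

print(np.allclose(one_jump, two_jumps), np.allclose(one_jump, four_jumps))
```

The schedule is just a list of times passed at serving time, so the number of model evaluations can be budgeted per request without touching the weights.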

Scaling closes the gap to the data distribution.

Generative perplexity on OpenWebText falls smoothly with model size, and is rapidly converging to the generative perplexity of the underlying corpus itself — the floor that any language model can hope to reach on this data.

Each point is a flow-map language model trained at a different parameter count. The dashed reference line is the generative perplexity of OpenWebText under the same GPT-2-Large scoring model.
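For readers unfamiliar with the metric, generative perplexity is commonly computed by scoring generated text with a fixed reference model and exponentiating the mean negative log-likelihood per token. The sketch below assumes token-level averaging across all samples, and the log-probabilities are made-up placeholders rather than outputs of a real scoring model.

```python
import math

def generative_perplexity(token_logprobs):
    """exp of the mean negative log-likelihood over all scored tokens.

    token_logprobs: list of samples, each a list of per-token natural-log
    probabilities assigned by the scoring model.
    """
    num_tokens = sum(len(seq) for seq in token_logprobs)
    total_logprob = sum(sum(seq) for seq in token_logprobs)
    return math.exp(-total_logprob / num_tokens)

# Placeholder scores: every token gets probability 0.25 from the scorer.
samples = [[math.log(0.25)] * 5, [math.log(0.25)] * 3]
ppl = generative_perplexity(samples)
print(ppl)  # approximately 4.0
```

Applying the same function to held-out corpus text instead of generated samples gives the kind of reference value the dashed line represents.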

A clean scaling curve at this early scale is what we would expect of a learning paradigm with room to grow — and is the foundation for extrapolating to frontier-scale flow-map language models.

Generative perplexity of flow-map language models on OpenWebText as a function of model size, decreasing monotonically from 1B to 4B parameters and approaching the corpus generative perplexity of 16.2.

Parallel generation changes the throughput scale.

Flow maps emit a whole block with a single model evaluation. When generating 128 tokens at a time, throughput on the same hardware moves from O(100) tokens per second to O(10,000). Standard LLM inference throughput calculations indicate that flow maps give orders of magnitude higher throughput, with even larger gains at lower batch sizes.
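The order-of-magnitude claim can be reproduced with back-of-envelope arithmetic. The per-pass latency below is an assumed, illustrative number, not a measurement. The sketch also assumes that a 128-token pass costs about the same as a single-token pass, which is only plausible while the forward pass is memory-bound.

```python
def tokens_per_second(tokens_per_pass, seconds_per_pass):
    """Throughput of one sequence when each forward pass emits a fixed block."""
    return tokens_per_pass / seconds_per_pass

SECONDS_PER_PASS = 0.010  # assumed: 10 ms per forward pass, illustrative only

autoregressive = tokens_per_second(1, SECONDS_PER_PASS)  # one token per pass
flow_map = tokens_per_second(128, SECONDS_PER_PASS)      # 128 tokens per pass

print(autoregressive, flow_map)
```

Under these assumptions the gap is exactly the block size; real gains depend on how close the hardware stays to the memory-bound regime as the block grows.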

Higher batch sizes are often intractable due to KV cache memory constraints, which further limits autoregressive throughput scalability.
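The KV cache constraint follows from simple accounting: each cached token stores one key and one value vector per layer and per key-value head. The model dimensions below are hypothetical and chosen only to show the scale.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_value=2):
    """Memory for cached keys and values (2 bytes per value for 16-bit floats)."""
    # Factor 2: one key vector and one value vector per layer, head, and token.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

GIB = 1024 ** 3
# Hypothetical configuration: 32 layers, 32 KV heads, head_dim 128, 4096 tokens.
per_sequence = kv_cache_bytes(32, 32, 128, 4096, batch_size=1)
batch_of_64 = kv_cache_bytes(32, 32, 128, 4096, batch_size=64)

print(per_sequence / GIB)  # 2.0
print(batch_of_64 / GIB)   # 128.0
```

The cache grows linearly with batch size, so a batch that would make autoregressive decoding compute-efficient can exceed accelerator memory well before that point.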

Long-term, our ability to handle smaller batch sizes at high GPU utilization enables the efficient use of local models, real-time computation, and on-device deployment.

The post-training dilemma.

Post-training — and RL in particular — endows modern LLMs with reasoning, long-horizon planning, and adaptability to downstream tasks. It is largely responsible for their compelling performance in real-world use cases.

But the sequential, iterative nature of autoregressive generation means this stage dominates training cost. Reward models, which score a generated sequence against some downstream goal, can only be evaluated after a full-length rollout, making each step expensive.

Backpropagating through an autoregressive rollout is not possible, because sampling discrete tokens is non-differentiable. Modern pipelines rely on relative-advantage estimators and actor-critic value approximations to keep the gradient signal alive: variance-reduction tricks stacked just to make training tractable.

Diagram: policy \(\pi_\theta\), reward \(r(x)\), gradient \(\nabla_\theta\, J(\pi_\theta)\).

Flow maps enable scalable post-training.

Flow maps amortize the rollout, meaning that the entire sequence and its reward can be computed in one step.

This is not only dramatically faster: the rollout is now differentiable end-to-end, giving us a dense reward signal that flows directly back through the model. High-variance policy-gradient estimators are replaced by a high-signal, low-variance gradient, enabling a fundamentally new post-training paradigm.

Our approach eliminates the need for relative-advantage tricks. Value functions become cheap to estimate without a separate critic network. The cleaner inner loop unlocks more advanced and more stable reinforcement-learning algorithms.
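The variance gap between policy-gradient and differentiable-rollout estimators can be seen in a one-dimensional toy. Here a Gaussian sample \(x = \theta + \sigma\epsilon\) stands in for a rollout and \(r(x) = -(x - g)^2\) for the reward. All values are hypothetical, and the example says nothing about the size of the gap for real language models.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, goal = 0.0, 1.0, 2.0  # hypothetical toy values
num_samples = 100_000

eps = rng.standard_normal(num_samples)
x = theta + sigma * eps             # sample x ~ N(theta, sigma^2)
reward = -(x - goal) ** 2

# Score-function (REINFORCE) estimator: r(x) * d/dtheta log p(x; theta).
score_grad = reward * (x - theta) / sigma ** 2

# Pathwise estimator: differentiate r through x = theta + sigma * eps.
path_grad = -2.0 * (x - goal)

true_grad = -2.0 * (theta - goal)   # exact gradient of E[r] for this toy

print(true_grad, score_grad.mean(), path_grad.mean())
print(score_grad.var(), path_grad.var())
```

Both estimators agree with the exact gradient on average, but the pathwise one, which requires a differentiable path from parameters to reward, has far lower variance in this toy.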

Diagram: flow map \(X_\theta\) with \(x_1 = X_{t,1}(x_t)\), reward \(r(x_1)\), gradient \(\nabla_\theta r(x_1)\) with respect to parameters.

Flow maps enable inference-time adaptability and scaling.

Modern flows are controllable. This can be used to adjust their output for higher precision, adaptability to specific tasks, or alignment with a certain style or aesthetic.

Recent advances from our team show that flow maps dramatically expand our ability to control these models and to scale compute against output quality (Meta Flow Maps [6], Diamond Maps [5]). This follows because the flow map can be used to efficiently compute dense rewards throughout the generative process.

By incorporating reward signals at inference time, we can adapt large pre-trained models to new tasks in a lightweight, scalable way and systematically trade compute for higher-quality outputs.

Because of the sequential compute cost and non-differentiability of discrete sampling, this is a capability that is fundamentally impossible with the modern autoregressive paradigm.
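A minimal sketch of reward guidance through a flow map, under loud assumptions: the map \(X_{t,1}\) is replaced by a fixed linear stand-in `A`, the reward is a quadratic distance to a hypothetical `goal`, and the gradient is written by hand with the chain rule rather than by automatic differentiation. It illustrates the mechanics only and is not the method of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
# Hypothetical linear stand-in for the flow map X_{t,1}.
A = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))
goal = rng.standard_normal(dim)

def flow_map(x_t):
    """x_1 = X_{t,1}(x_t), here simply a matrix-vector product."""
    return A @ x_t

def reward(x_1):
    """Toy reward: negative squared distance to the goal."""
    return -np.sum((x_1 - goal) ** 2)

def reward_grad_wrt_input(x_t):
    """Chain rule: dr/dx_t = A^T (dr/dx_1), with dr/dx_1 = -2 (x_1 - goal)."""
    return A.T @ (-2.0 * (flow_map(x_t) - goal))

x_t = rng.standard_normal(dim)
reward_before = reward(flow_map(x_t))
step_size = 0.1
for _ in range(50):
    # Gradient ascent on the intermediate state, not on the model weights.
    x_t = x_t + step_size * reward_grad_wrt_input(x_t)
reward_after = reward(flow_map(x_t))

print(reward_before, reward_after)
```

Only the intermediate state is updated, so the pre-trained map stays frozen, and the number of ascent steps is the knob that trades compute for reward.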

Diagram: flow map \(X_\theta\) with \(x_1 = X_{t,1}(x_t)\), reward \(r(x_1)\), gradient \(\nabla_{x_t} r(x_1)\) with respect to the input.

Real-time computer use.

Our ability to handle efficient long-horizon RL and inference-time adaptation is naturally suited to real-time computer use. Given an effective base model, fast post-training and online adaptation let us specialize to arbitrary RL environments — letting the model leverage available tools in their native form.

Our improved inference efficiency dramatically reduces latency, enabling a truly interactive user experience. Continuous flows are state-of-the-art across all continuous modalities, enabling us to design highly expressive multimodal state representations for complex agentic tasks.

By leveraging multimodal representations over continuous data (mouse coordinates), discrete actions (click, select, drag), and image representations of the screen state, our approach opens a new paradigm for next-generation agents.

Token economics will dominate the agent ecosystem.

Agents turn model calls into operating costs. Every observation, tool call, screen state, retrieved document, plan revision, and reward signal is paid for in tokens. The systems that win will not just be the systems that are smarter, but the systems that can convert token budget into useful work more efficiently.

Our focus is where those economics matter first: ultra-fast business intelligence, computer-use applications, and long-horizon RL.

Flow maps will enable more useful actions per second, more rollouts per dollar, and longer horizons before token cost becomes the bottleneck.

Product roadmap.

We propose a two-pronged one-year plan, pursued in parallel.

First, we will leverage recent flow map distillation techniques to convert premier open-weights autoregressive models into flow maps with dramatically accelerated inference. These can be deployed, post-trained, and steered on proprietary data via enterprise partnerships, serving sub-frontier intelligence for business tasks like customer service, text processing, and knowledge management at a fraction of the cost.

Second, we will fully map out the pre-training science of these models, including their scaling laws, data efficiency, and the training paradigm needed to build a truly frontier flow map language model that goes beyond the autoregressive paradigm.

Both prongs leverage the dramatic advances made in autoregressive technologies, including training approaches, network architectures, and serving and inference pipelines, letting us reach these goals at rapid speed.

The short version.

Current is building the engine to transform the future of real-time inference, computer use, and frontier intelligence.

Through a continuous representation, our approach enables dramatically accelerated post-training, reinforcement learning, hardware utilization, and a new era of truly multimodal foundation models.

For the math behind each section (interpolants, transport, flow maps, the simplex setup for language, reward fine-tuning, and inference-time guidance), see the technical appendix.

Team

Current — research studio

Michael Albergo (CEO) — Harvard assistant professor. Past: Anthropic. Co-inventor of flow maps and flow matching.
Nick Boffi — Carnegie Mellon assistant professor. Co-inventor of flow maps and flow matching.
Grant Rotskoff — Stanford assistant professor. Expert on diffusion and flow matching.
Jason Yim — Past: Xaira, MIT PhD, DeepMind. Co-inventor of RFdiffusion and discrete flow matching.
Brian Lee — Jane Street, engineer building machine learning infrastructure.
Peter Potaptchik — Oxford PhD candidate.
Woody Ahern — Xaira AI scientist, engineering lead. Co-inventor of RFdiffusion.

Angel investors

Sander Dieleman — Director for generative media at DeepMind.
Aaron van den Oord — Lead of generative media team at DeepMind.
Lasse Espeholt — CTO of Ineffable Intelligence.
Aaron Lou — Head of Strategic Explorations at OpenAI.