    How to Build Memory-Efficient Transformers with xFormers Using Packed Sequences, GQA, ALiBi, SwiGLU, and Causal Attention

    June 17, 2026


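    # Context assumed from earlier in the tutorial (not shown in this excerpt):
    # `device` (a CUDA device) and `vanilla_attention` (a plain reference
    # attention helper). The imports below are the usual xFormers ones.
    import torch
    import xformers.ops as xops
    from xformers.ops.fmha import attn_bias as ab

    print("\n" + "=" * 70 + "\n4. Variable-length packed batch - no padding waste\n" + "=" * 70)
    seqlens = [37, 120, 8, 200]
    total = sum(seqlens)  # 365 tokens packed into a single batch row
    H, K = 8, 64
    q = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
    k = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
    v = torch.randn(1, total, H, K, device=device, dtype=torch.float16)
    try:
        # Block-diagonal bias: each token attends only within its own sequence.
        bias = ab.BlockDiagonalMask.from_seqlens(seqlens)
        out_packed = xops.memory_efficient_attention(q, k, v, attn_bias=bias)
        # Check the first segment against the reference run on that segment alone.
        s0 = seqlens[0]
        ref0 = vanilla_attention(q[:, :s0], k[:, :s0], v[:, :s0]).half()
        print("packed shape :", tuple(out_packed.shape), "(all", total, "tokens, no pad)")
        print("segment-0 max diff : {:.2e}".format((out_packed[:, :s0] - ref0).abs().max().item()))
        # Same packing, with a causal mask applied inside each sequence.
        cbias = ab.BlockDiagonalCausalMask.from_seqlens(seqlens)
        _ = xops.memory_efficient_attention(q, k, v, attn_bias=cbias)
        print("-> also did a packed CAUSAL pass. This is how vLLM-style engines")
        print("   batch requests of different lengths with zero padding overhead.")
        # split() cuts the packed output back into one tensor per sequence.
        splits = bias.split(out_packed)
        print("recovered segments :", [tuple(t.shape) for t in splits])
    except Exception as e:
        print("BlockDiagonalMask path skipped on this version/backend:", repr(e))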
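To make the masking rule behind the packed batch concrete, here is a minimal pure-Python sketch of which query/key pairs a block-diagonal mask allows. The helper `block_diagonal_mask` is hypothetical and written only for illustration; it is not part of xFormers, and it builds a dense matrix purely so the pattern can be inspected on a toy example.

```python
from itertools import accumulate


def block_diagonal_mask(seqlens, causal=False):
    """Dense mask for a packed batch; True means query i may attend to key j.

    Hypothetical illustration helper, not an xFormers API.
    """
    # Start offset of every sequence in the packed token axis, plus the total.
    starts = [0] + list(accumulate(seqlens))
    total = starts[-1]
    # Segment id for each packed position, e.g. [0, 0, 1, 1, 1] for seqlens [2, 3].
    seg = [s for s, n in enumerate(seqlens) for _ in range(n)]
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            same_sequence = seg[i] == seg[j]
            # Within one sequence, causal masking reduces to j <= i.
            mask[i][j] = same_sequence and (j <= i if causal else True)
    return mask, starts


mask, starts = block_diagonal_mask([2, 3])
causal_mask, _ = block_diagonal_mask([2, 3], causal=True)

for row in causal_mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Tokens never see across sequence boundaries, so packing sequences into one row gives the same result as running each sequence separately, without spending compute on padding tokens.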
    print("\n" + "=" * 70 + "\n5. Grouped-query attention (5-D BMGHK layout)\n" + "=" * 70)
    B, M, K = 2, 256, 64
    n_q_heads, n_kv_heads = 8, 2
    G, Hq = n_kv_heads, n_q_heads // n_kv_heads  # G KV groups, Hq query heads per group
    try:
        qg = torch.randn(B, M, G, Hq, K, device=device, dtype=torch.float16)
        # One KV head per group, broadcast over the Hq query heads with expand()
        # (a stride-0 view, no copy) so q, k, v all have shape [B, M, G, Hq, K],
        # which is what the 5-D input path expects.
        kg = torch.randn(B, M, G, 1, K, device=device, dtype=torch.float16).expand(-1, -1, -1, Hq, -1)
        vg = torch.randn(B, M, G, 1, K, device=device, dtype=torch.float16).expand(-1, -1, -1, Hq, -1)
        out_gqa = xops.memory_efficient_attention(qg, kg, vg)
        print("GQA output shape :", tuple(out_gqa.shape), "= [B, M, G, Hq, K]")
        print(f"-> {n_q_heads} query heads, only {n_kv_heads} KV heads: smaller KV-cache,")
        print("   the scheme Llama- and Mistral-class models use at inference.")
    except Exception as e:
        print("GQA 5-D path skipped on this version/backend:", repr(e))
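The "smaller KV-cache" claim can be checked with simple arithmetic. The sketch below reuses the shapes from the GQA section (B=2, M=256, K=64, fp16) and compares the per-layer cache size with 8 KV heads against 2. The helper `kv_cache_bytes` is hypothetical, introduced here only for the calculation, and the figures are per attention layer.

```python
def kv_cache_bytes(batch, seq_len, n_kv_heads, head_dim, bytes_per_elem=2):
    """Per-layer KV-cache size in bytes (hypothetical helper; fp16 by default)."""
    # Factor 2: one cached tensor for keys and one for values.
    return 2 * batch * seq_len * n_kv_heads * head_dim * bytes_per_elem


B, M, K = 2, 256, 64
n_q_heads, n_kv_heads = 8, 2

mha_bytes = kv_cache_bytes(B, M, n_q_heads, K)   # every query head has its own KV head
gqa_bytes = kv_cache_bytes(B, M, n_kv_heads, K)  # query heads share KV heads in groups

# Which KV head each query head reads from when heads are grouped contiguously.
heads_per_group = n_q_heads // n_kv_heads
kv_head_for_query = [h // heads_per_group for h in range(n_q_heads)]

print("MHA cache per layer:", mha_bytes, "bytes")
print("GQA cache per layer:", gqa_bytes, "bytes")
print("reduction factor   :", mha_bytes // gqa_bytes)
print("query head -> KV head:", kv_head_for_query)
```

The reduction factor equals the number of query heads per KV head, so the saving grows with sequence length and batch size while the number of query heads, and therefore the attention output shape, stays unchanged.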


