    Building a Semantic Search Engine and Open-Status Classifier over the ResearchMath-14k Dataset

    June 4, 2026


    from sentence_transformers import util

    # NOTE: `model`, `emb`, `work`, and `TEXT_COL` are assumed to be defined in the
    # earlier setup portion of the tutorial, which this excerpt does not include.
    def search(query, k=5):
        """Print the k rows of `work` most similar to `query` by cosine similarity."""
        q = model.encode([query], normalize_embeddings=True)
        sims = util.cos_sim(q, emb)[0].cpu().numpy()
        idx = sims.argsort()[::-1][:k]  # indices of the top-k scores, best first
        print(f'\n=== Query: "{query}" ===')
        for rank, i in enumerate(idx, 1):
            row = work.iloc[i]
            print(f"\n[{rank}] sim={sims[i]:.3f} | {row['taxonomy_level_1']} "
                  f"| status={row['open_status']}")
            print(" ", row[TEXT_COL][:260].replace("\n", " "), "…")

    search("rational points on hyperelliptic curves")
    search("multiplicativity of maximal output p-norm of a quantum channel")
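
The ranking step inside `search` comes down to normalizing vectors, taking dot products, and sorting. The sketch below reproduces that idea with the standard library only, on a made-up two-dimensional corpus. The names `normalize`, `top_k`, and `toy_corpus` are hypothetical and are not part of the tutorial or of sentence-transformers.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query_vec, corpus_vecs, k=2):
    """Return (index, cosine similarity) for the k corpus vectors closest to the query."""
    q = normalize(query_vec)
    scores = [sum(a * b for a, b in zip(q, normalize(d))) for d in corpus_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]

# Hypothetical toy "embeddings"; real sentence embeddings have hundreds of dimensions.
toy_corpus = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
hits = top_k([2.0, 0.0], toy_corpus, k=2)
for idx, score in hits:
    print(f"doc {idx}: cos={score:.3f}")
```

The query `[2.0, 0.0]` is deliberately not unit length: cosine similarity ignores magnitude, so it still ranks the first toy document highest.
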
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report, ConfusionMatrixDisplay

    # Predict open_status from the sentence embeddings.
    # `RANDOM_STATE` is assumed to be set in the earlier setup portion (not shown here).
    y = work["open_status"].values
    Xtr, Xte, ytr, yte = train_test_split(
        emb, y, test_size=0.25, random_state=RANDOM_STATE, stratify=y)

    # class_weight="balanced" reweights classes inversely to their frequency.
    clf = LogisticRegression(max_iter=2000, class_weight="balanced", C=2.0)
    clf.fit(Xtr, ytr)
    pred = clf.predict(Xte)
    print("\n=== open_status classifier (embeddings + logistic regression) ===")
    print(classification_report(yte, pred))
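
For readers wondering what `class_weight="balanced"` does, scikit-learn documents the weight of each class as `n_samples / (n_classes * count_of_that_class)`. The following standard-library sketch applies that formula to a made-up label list. The helper name `balanced_class_weights` and the label values `"open"` and `"resolved"` are assumptions for illustration; the actual `open_status` values are not shown in this excerpt.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Hypothetical, imbalanced labels: 6 of one class, 2 of the other.
toy_labels = ["open"] * 6 + ["resolved"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

The rarer class receives the larger weight, so its misclassifications cost more during training. This is one common way to keep a majority class from dominating the fit.
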
    import matplotlib.pyplot as plt

    # Row-normalized confusion matrix: each row (true class) sums to 1.
    fig, ax = plt.subplots(figsize=(7, 6))
    ConfusionMatrixDisplay.from_predictions(
        yte, pred, ax=ax, cmap="Blues", xticks_rotation=45,
        normalize="true", values_format=".2f")
    ax.set_title("open_status confusion matrix (row-normalized)")
    plt.tight_layout()
    plt.show()
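
With `normalize="true"`, each cell is divided by the total of its row, so the diagonal reads as per-class recall. Below is a small standard-library sketch of that normalization on invented labels. The function name `row_normalized_confusion` and the labels `"a"` and `"b"` are hypothetical.

```python
def row_normalized_confusion(y_true, y_pred, labels):
    """Count (true, predicted) pairs, then divide each row by its total."""
    index = {label: k for k, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    normalized = []
    for row in counts:
        total = sum(row)
        # A class with no true samples gets an all-zero row instead of a division error.
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Hypothetical predictions: class "a" has 4 true samples, class "b" has 2.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
matrix = row_normalized_confusion(y_true, y_pred, labels=["a", "b"])
for label, row in zip(["a", "b"], matrix):
    print(label, [f"{v:.2f}" for v in row])
```

Row normalization makes classes of very different sizes comparable, which raw counts would obscure.
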
    import numpy as np

    # Find the single most similar pair of distinct rows in the corpus.
    sims = util.cos_sim(emb, emb).cpu().numpy()
    # Mask self-similarity with -inf so argmax can never select a diagonal entry.
    np.fill_diagonal(sims, -np.inf)
    i, j = np.unravel_index(sims.argmax(), sims.shape)
    print(f"\nMost similar pair (cos={sims[i, j]:.3f}):")
    for n in (i, j):
        print(f"\n paper_id={work.iloc[n]['paper_id']} | "
              f"{work.iloc[n]['taxonomy_level_1']}")
        print(" ", work.iloc[n][TEXT_COL][:240].replace("\n", " "), "…")

    print("\nDone. Set SAMPLE_SIZE=None at the top to run on the full 14.1k rows.")
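
The all-pairs similarity matrix grows quadratically: at roughly 14.1k rows it holds about 199 million entries, close to 800 MB in float32. If that is too large, one alternative is to scan only the pairs with `i < j`, which skips the diagonal and never stores the full matrix. The sketch below illustrates the idea in plain Python on made-up vectors. The name `most_similar_pair` is hypothetical, and a practical version would process the embeddings in NumPy chunks rather than looping pair by pair.

```python
import math
from itertools import combinations

def most_similar_pair(vectors):
    """Return ((i, j), cosine) for the most similar pair of distinct vectors."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    best_pair, best_sim = None, -math.inf
    # combinations() yields each unordered pair once with i < j, so no masking is needed.
    for i, j in combinations(range(len(unit)), 2):
        sim = sum(a * b for a, b in zip(unit[i], unit[j]))
        if sim > best_sim:
            best_pair, best_sim = (i, j), sim
    return best_pair, best_sim

# Hypothetical toy vectors; indices 0 and 2 point in nearly the same direction.
toy = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
pair, sim = most_similar_pair(toy)
print(pair, round(sim, 3))
```
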


