Yidong Wang

FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models

Apr 09, 2024

Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

Mar 09, 2024

KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

Feb 23, 2024

A General Framework for Learning from Weak Supervision

Feb 02, 2024

Supervised Knowledge Makes Large Language Models Better In-context Learners

Dec 26, 2023

Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity

Oct 18, 2023

A Survey on Evaluation of Large Language Models

Jul 18, 2023

PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts

Jun 13, 2023

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Jun 08, 2023

Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations

May 23, 2023