OT - Orientation
OECD AI Principles overview
Trustworthy Language Model

Jailbroken: How Does LLM Safety Training Fail? (NeurIPS '23 Oral Paper)

2024 Guide to Developing Trustworthy AI (2024 신뢰할 수 있는 인공지능 개발 안내서)
|
Week 1 - Fairness AI
Video : Building fair, ethical, and responsible AI with the Responsible AI Toolkit
Paper : Preventing Discriminatory Decision-making in Evolving Data Streams

Blog : AI Fairness, a Prerequisite for AI That Coexists with Humans
|
Week 2 - Sustainability AI
Blog : How to Make Generative AI Greener
Paper : The role of artificial intelligence in achieving the Sustainable Development Goals
|
Week 3 - Trustworthy AI
Paper : Trustworthy AI: From Principles to Practices
Agreement : OECD AI Principles overview

Video : MIT 6.S191: Robust and Trustworthy Deep Learning
|
Trustworthy Language Model |
|
Copyright Association's Guide to Generative AI Copyright
|
Week 4 - Project Team Building and Discussion
Video : Generative AI meets Responsible AI: Practical Challenges and Opportunities
Detection model : Hugging Face prompt injection dataset (a minimal loading sketch follows this list)
|
Prompt challenge : Gandalf by Lakera
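
For the detection-model item above, here is a minimal sketch of loading a prompt-injection dataset from the Hugging Face Hub and fitting a small baseline classifier. The dataset name `deepset/prompt-injections` and its `text`/`label` columns are illustrative assumptions; the list does not pin down a specific dataset.

```python
# Minimal sketch (assumptions: the "deepset/prompt-injections" dataset with
# "text"/"label" columns and train/test splits; substitute whichever dataset
# the project actually settles on).
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

ds = load_dataset("deepset/prompt-injections")

# TF-IDF + logistic regression as a cheap baseline injection detector.
vectorizer = TfidfVectorizer(max_features=20_000)
X_train = vectorizer.fit_transform(ds["train"]["text"])
X_test = vectorizer.transform(ds["test"]["text"])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, ds["train"]["label"])

print("baseline accuracy:", accuracy_score(ds["test"]["label"], clf.predict(X_test)))
```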
|
Week 5 - Jailbreaking w/ Prompts
AntiGPT |
Blog : Jailbreaking Large Language Models
|
GitHub : ChatGPT_DAN (related paper : Do Anything Now)
|
Paper : FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts
|
Week 6 - Fairness in Gen AI
Measuring Fairness in Generative Models |
On The Impact of Machine Learning Randomness on Group Fairness |
|
[Data quality and artificial intelligence – mitigating bias and error to protect fundamental rights](https://fra.europa.eu/sites/default/files/fra_uploads/fra-2019-data-quality-and-ai_en.pdf)
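
As a companion to the Week 6 fairness readings, the sketch below computes demographic parity difference, one common group-fairness metric; the prediction and group arrays are made up for illustration and are not taken from any listed paper.

```python
# Demographic parity difference: |P(y_hat=1 | group=0) - P(y_hat=1 | group=1)|.
# All values below are hypothetical, purely to show the computation.
import numpy as np

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # binary model decisions
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected-attribute group per example

rate_g0 = y_pred[group == 0].mean()  # positive-decision rate, group 0
rate_g1 = y_pred[group == 1].mean()  # positive-decision rate, group 1

dpd = abs(rate_g0 - rate_g1)  # 0.0 means exact parity between the two groups
print(f"rate(g=0)={rate_g0:.2f}  rate(g=1)={rate_g1:.2f}  DPD={dpd:.2f}")
```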
|
Week 7 - Sustainability AI
Generative AI in energy, natural resources, and chemical |

Week 8 - Trustworthy Gen AI
[On Evaluating Adversarial Robustness of Large Vision-Language Models](https://yunqing-me.github.io/AttackVLM/)
|
OpenReview : Jailbreak in pieces |
|
Week 9 - Midterm Review / Project Meeting
|
Weeks 10-11 - Gen AI project for avoiding toxicity
Video : Generative AI meets Responsible AI: Practical Challenges and Opportunities
Paper : Can LLM Recognize Toxicity? Structured Toxicity Investigation Framework and Semantic-Based Metric
|
Tutorial : Building a Dataset to Measure Toxicity and Social Bias within Language |
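
For the toxicity-avoidance project, a minimal sketch of scoring candidate texts with an off-the-shelf toxicity classifier; the `unitary/toxic-bert` checkpoint is an assumed example, not one mandated by the syllabus.

```python
# Minimal sketch: score texts with a multi-label toxicity classifier.
# Assumption: the "unitary/toxic-bert" checkpoint; any toxicity-labeled
# text-classification model on the Hub can be substituted.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert", top_k=None)

samples = [
    "Have a great day!",
    "You are completely useless.",
]

# top_k=None returns a score for every label; report the strongest one per text.
for text, scores in zip(samples, toxicity(samples)):
    worst = max(scores, key=lambda s: s["score"])
    print(f"{text!r} -> {worst['label']}: {worst['score']:.3f}")
```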
|
Weeks 12-13 - LLM trustworthiness project
Constitutional AI: Harmlessness from AI Feedback |
Jailbreak in pieces : Compositional Adversarial Attacks on Multi-Modal Language Models |
|
Langchain-Safety |
|
7 methods to secure LLM apps from prompt injections and jailbreaks (an input-guard sketch follows this list)
|
Label Errors in ML Test Sets |
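
For the "Label Errors in ML Test Sets" reading, a minimal sketch of flagging likely mislabeled examples with the cleanlab library; the library choice and the tiny toy arrays are assumptions for illustration, not something the list prescribes.

```python
# Minimal sketch (assumes the cleanlab library; the reading names the paper,
# not a tool). pred_probs should be out-of-sample probabilities from your own
# model; the tiny arrays below are made-up examples.
import numpy as np
from cleanlab.filter import find_label_issues

labels = np.array([0, 0, 0, 1, 1, 1])   # given (possibly noisy) labels
pred_probs = np.array([                  # model probability for class 0 and 1
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.6, 0.4],                          # labeled 1, but the model leans to 0
    [0.1, 0.9],
])

issue_idx = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    return_indices_ranked_by="self_confidence",  # most suspicious first
)
print("likely label errors at indices:", issue_idx)
```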
|
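As one concrete instance of the guard patterns the "7 methods" item refers to, a minimal sketch that screens user input with the OpenAI moderation endpoint before it reaches the main model; the `check_user_input`/`guarded_prompt` helpers and the refusal message are hypothetical, and an `OPENAI_API_KEY` is assumed to be set.

```python
# Minimal input-guard sketch (assumes openai-python >= 1.0 and OPENAI_API_KEY
# in the environment; check_user_input/guarded_prompt are hypothetical helpers,
# not part of any library named in this list).
from openai import OpenAI

client = OpenAI()

def check_user_input(text: str) -> bool:
    """Return True if the moderation endpoint flags the text."""
    resp = client.moderations.create(input=text)
    return resp.results[0].flagged

def guarded_prompt(user_text: str) -> str:
    # Refuse early instead of forwarding flagged input to the main model.
    if check_user_input(user_text):
        return "Sorry, I can't help with that request."
    # ...otherwise continue with the normal LLM call here.
    return f"(forwarding to the main model) {user_text}"

print(guarded_prompt("How is the weather today?"))
```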