| OT |
OECD AI Principles overview |
| Trustworthy Language Model |
|
| Jailbroken : How does llm safety training fail? (NeurlPS ‘23 Oral Paper) |
|
| 2024 신뢰할 수 있는 인공지능 개발 안내서 |
|
| 1주차 - Fariness AI |
영상 : Building fair, ethical, and responsible AI with the Responsible AI Toolkit |
| 논문 : Preventing Discriminatory Decision-making in Evolving Data Streams |
|
| 블로그 : 사람과 공존하는 AI의 필요조건, AI 공정성 |
|
| 2주차 - Sustainability AI |
블로그 : How to Make Generative AI Greener |
| 논문 : The role of artificial intelligence in achieving the Sustainable Development Goals |
|
| 3주차 - Trustworthy AI |
논문 : Trustworthy AI : From Principles to Practices |
| 협약 : OECD AI Principles overview |
|
| 영상 : MIT 6.S191 : Robus and Trustworthy Deep Learning |
|
| Trustworthy Language Model |
|
| 저작권협회 생성형 AI 저작권 안내 |
|
| 4주차 - 프로젝트 팀 빌딩 및 토의 |
영상 : Generative AI meets Responsible AI : Practical Challenges and Opportunities |
| detection model : Huggingface Prompt injection dataset |
|
| 프롬프트 Gandalf Lakera |
|
| 5주차 - Jailbreaking w/ Prompts |
AntiGPT |
| 블로그 : Jailbreaking Large Language Models |
|
| Github : ChatGPT_DAN 해당 논문 : Do Anything Now |
|
| 논문 : FigStep: Jailbreaking Large Vision-language Models via Typographic Visual Prompts |
|
| 6주차 - Fairness in Gen AI |
Measuring Fairness in Generative Models |
| On The Impact of Machine Learning Randomness on Group Fairness |
|
| [Data quality and artificial |
|
| intelligence – mitigating bias |
|
| and error to protect |
|
| fundamental rights](https://fra.europa.eu/sites/default/files/fra_uploads/fra-2019-data-quality-and-ai_en.pdf) |
|
|
|
| 7주차 - Sustainability AI |
Generative AI in energy, natural resources, and chemical |
| 8주차 - Trustworthy Gen AI |
[On Evaluating Adversarial Robustness of |
| Large Vision-Language Model](https://yunqing-me.github.io/AttackVLM/) |
|
| OpenReview : Jailbreak in pieces |
|
| 9주차 - 중간 리뷰/프로젝트 회의 |
|
| 10,11주차 - Gen AI project for avoiding toxicity |
영상 : Generative AI meets Responsible AI : Practical Challenges and Opportunities |
| 논문 : Can LLM Recognize Toxicity? Structured Toxicity Investigation Framework and Semantic-Based Metric |
|
| Tutorial : Building a Dataset to Measure Toxicity and Social Bias within Language |
|
| 12,13주차 - LLM trustworthy project |
Constitutional AI: Harmlessness from AI Feedback |
| Jailbreak in pieces : Compositional Adversarial Attacks on Multi-Modal Language Models |
|
| Langchain-Safety |
|
| 7 methods to secure LLM apps from prompt injections and jailbreaks |
|
| Label Errors in ML Test Sets |
|