Predicting individuals' health insurance costs using machine learning and ensemble learning (scientific article accredited by the Ministry of Science)

Journal rating: Scientific journal (Ministry of Science)

Authors: مهسا تجددی نودهی, سمانه حسینی خطیبانی, محسن یزدی نژاد, سمیه زلفی

Source: Insurance Research Journal (پژوهشنامه بیمه), Volume 13, Winter 1402 (Solar Hijri calendar), Issue 1

Keywords: data mining; risk; health insurance cost; ensemble learning; machine learning

Subject areas: Management; Business Management; Insurance Management

doi: 10.22056/ijir.2024.01.01

Pages: 1-14



Abstract

Background and objectives: The health insurance industry faces a major challenge in predicting individuals' insurance costs, which depend on complex parameters such as age and physical characteristics. To manage risk and avoid potential losses, insurance companies classify policyholders into two groups: high-risk and low-risk. Even so, estimating costs accurately for each individual can be difficult. To address this challenge, we propose an approach based on data science and machine learning that uses ensemble learning to predict which individuals are high-risk and which are low-risk.

Methodology: The proposed method comprises several stages, including data preprocessing, feature engineering, and cross-validation to assess model performance. In the first stage, we preprocess the data by cleaning it, handling missing values, and encoding categorical variables. In the second stage, we construct and transform features using feature-engineering techniques such as scaling, normalization, and dimensionality reduction; these techniques help extract meaningful information from the data and improve model performance. Next, we use ensemble learning to combine several models, such as logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. The aim of combining these methods is to exploit their strengths and minimize their weaknesses in order to achieve better predictive accuracy. Finally, we evaluate the model's performance with k-fold cross-validation, which gives a more reliable estimate of its accuracy and helps detect overfitting.

Findings: Our proposed approach achieves an AUC of 0.73, which indicates its effectiveness in predicting high-risk and low-risk individuals.

Conclusion: By applying data science and machine learning methods, insurance companies can improve the accuracy of their cost estimates and manage risk better. This approach can help insurers offer more accurate coverage and pricing to individuals, leading to greater customer satisfaction and lower financial losses.
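The first two stages of the methodology above (handling missing values, encoding categorical variables, scaling) can be illustrated with a minimal, dependency-free Python sketch. The field names `age`, `bmi`, `smoker` and `region`, the toy records, and the specific choices (median imputation, one-hot encoding, z-score standardization) are hypothetical assumptions for illustration only. The abstract does not state which variables or rules the authors used, and dimensionality reduction is omitted here.

```python
from statistics import mean, median, pstdev

# Hypothetical toy records (not the paper's data); None marks a missing value.
RECORDS = [
    {"age": 19, "bmi": 27.9, "smoker": "yes", "region": "south"},
    {"age": 33, "bmi": None, "smoker": "no", "region": "north"},
    {"age": None, "bmi": 33.0, "smoker": "no", "region": "south"},
    {"age": 45, "bmi": 22.7, "smoker": "yes", "region": "east"},
]
NUMERIC = ["age", "bmi"]
CATEGORICAL = ["smoker", "region"]


def preprocess(records, numeric, categorical):
    """Return (feature_names, rows) after imputation, z-scoring and one-hot encoding.

    Assumed choices for this sketch: median imputation, z-score scaling,
    one-hot encoding with categories in sorted order.
    """
    # 1. Impute missing numeric values with the median of the observed values.
    medians = {c: median(r[c] for r in records if r[c] is not None) for c in numeric}
    filled = [
        {**r, **{c: (r[c] if r[c] is not None else medians[c]) for c in numeric}}
        for r in records
    ]

    # 2. Z-score statistics per numeric column; a zero spread is replaced by 1.0.
    stats = {}
    for c in numeric:
        col = [r[c] for r in filled]
        sd = pstdev(col)
        stats[c] = (mean(col), sd if sd > 0 else 1.0)

    # 3. Category levels for one-hot encoding.
    levels = {c: sorted({r[c] for r in filled}) for c in categorical}

    names = list(numeric) + [f"{c}={v}" for c in categorical for v in levels[c]]
    rows = []
    for r in filled:
        row = [(r[c] - stats[c][0]) / stats[c][1] for c in numeric]
        for c in categorical:
            row.extend(1.0 if r[c] == v else 0.0 for v in levels[c])
        rows.append(row)
    return names, rows


names, rows = preprocess(RECORDS, NUMERIC, CATEGORICAL)
print(names)
# ['age', 'bmi', 'smoker=no', 'smoker=yes', 'region=east', 'region=north', 'region=south']
```

In a cross-validated pipeline, the medians, means and standard deviations would be computed on the training folds only and then applied to the held-out fold, so that no information from the evaluation data leaks into preprocessing.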

Predicting people's health insurance costs using machine learning and ensemble learning methods

BACKGROUND AND OBJECTIVES: The healthcare insurance industry faces a significant challenge in predicting individuals' insurance costs, which depend on complex parameters such as age and physical characteristics. Insurance companies categorize policyholders into high-risk and low-risk groups to manage risks and avoid potential losses. However, accurately estimating the costs of each individual can be a daunting task. By leveraging data science and machine learning techniques, insurance companies can improve their cost-estimation accuracy and manage risks better. This approach can help insurance companies provide more accurate insurance coverage and pricing for individuals, leading to higher customer satisfaction and lower financial losses.

METHODS: To address this challenge, an approach based on data science and machine learning is used that applies ensemble learning to predict high-risk and low-risk individuals. The method involves several steps, including data preprocessing, feature engineering, and cross-validation to evaluate the model's performance. The first step preprocesses the data by cleaning it, handling missing values, and encoding categorical variables. The second step constructs and transforms features using feature-engineering techniques such as scaling, normalization, and dimensionality reduction. Next, ensemble learning is used to combine multiple models, such as logistic regression, neural networks, support vector machines, random forests, LightGBM, and XGBoost. Combining these methods aims to leverage their strengths and minimize their weaknesses to achieve better prediction accuracy. Finally, the model's performance is evaluated using k-fold cross-validation, which helps validate the model's accuracy and detect overfitting.

FINDINGS: The proposed approach achieves an AUC of 0.73, demonstrating its effectiveness in predicting high-risk and low-risk individuals.

CONCLUSION: The healthcare insurance industry can benefit greatly from approaches based on data science and machine learning. By accurately predicting high-risk and low-risk individuals, insurance companies can better manage risks and provide more accurate coverage and pricing for their customers. This can improve customer satisfaction and reduce financial losses for insurance companies.
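The ensemble and k-fold evaluation steps can be illustrated with a minimal, dependency-free Python sketch. It is an illustration under stated assumptions, not the authors' implementation. Only two simple base learners are used: a gradient-descent logistic regression and a k-nearest-neighbour scorer, standing in for the neural networks, SVMs, random forests, LightGBM and XGBoost named in the abstract. They are combined by soft voting (averaging predicted probabilities), a rule the abstract does not specify, and the data are synthetic. All function names are hypothetical. Each stratified fold is held out once, the ensemble is refitted on the remaining folds, and the out-of-fold AUC is averaged.

```python
import math
import random

# Hypothetical sketch: base learners, soft-voting rule and data are assumptions,
# not the setup reported in the paper.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roc_auc(labels, scores):
    """AUC = P(random positive scores above random negative); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Logistic regression trained by batch gradient descent; returns x -> P(y=1)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_knn(X, y, k=7):
    """k-nearest-neighbour scorer: share of positives among the k closest training rows."""
    train = list(zip(X, y))

    def predict(x):
        nearest = sorted(train, key=lambda t: math.dist(t[0], x))[:k]
        return sum(label for _, label in nearest) / len(nearest)

    return predict


def fit_soft_voting(X, y, learners):
    """Soft voting: fit every base learner, then average their predicted probabilities."""
    models = [fit(X, y) for fit in learners]
    return lambda x: sum(m(x) for m in models) / len(models)


def stratified_kfold_auc(X, y, learners, k=5, seed=0):
    """Mean out-of-fold AUC; folds are stratified so each one holds both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    aucs = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_soft_voting([X[i] for i in train], [y[i] for i in train], learners)
        aucs.append(roc_auc([y[i] for i in held_out], [model(X[i]) for i in held_out]))
    return sum(aucs) / k


# Synthetic, hypothetical data: two standardized features and a noisy "high-risk" label.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if x0 + 0.5 * x1 + rng.gauss(0, 0.5) > 0 else 0 for x0, x1 in X]
mean_auc = stratified_kfold_auc(X, y, [fit_logistic, fit_knn])
print(f"mean 5-fold AUC: {mean_auc:.3f}")
```

Stratifying the folds keeps both classes in every held-out fold, so the AUC is always defined. Cross-validation of this kind estimates how well the ensemble generalizes and exposes overfitting (a large gap between training and out-of-fold scores); it does not by itself prevent it.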