Disease Risk Prediction Using a Text-Based Heterogeneous Dataset