[Academia] Chapter 7: Python Code - Module Integration and the Main Program


Author: @小杨
This article is published with @小杨's authorization.


This series of articles records some personal reflections and sums up the previous stage of my work. Since this is my first venture into the field and my experience is limited, I hope writing it up helps me reflect, and I welcome advice from more experienced readers.

Research topic:

[ Community Home Health Care Scheduling and Routing Problem ]

------Table of Contents------

Chapters 1-3: Topic Selection, Literature Review Methodology, Problem Analysis, and Model Assumptions

Chapter 4: Markov Decision Process, Chance-Constrained Model, and Multi-Nurse Stochastic Routing

Chapter 5: Python Code - Model and Q-Learning

Chapter 6: Python Code - Ant Colony Optimization

Chapter 7: Python Code - Module Integration and the Main Program

Chapter 8: Python Code - Data Export, Parameter Tuning, and Algorithm Summary

------7 Python Code - Module Integration and the Main Program------

This article summarizes the work from the perspective of implementing the algorithm in Python. As described in the previous chapters, the heuristic combines Q-learning with Ant Colony Optimization; at heart it is a reinforcement-learning idea: search for an acceptable near-optimal solution through repeated trial and error. The algorithm outline is shown below.

[Figure: Outline of the QL-ACO algorithm]

The whole program can be divided into four parts:

Data Representation (DR)
Best-Worst Ant Colony Optimization (BWACO)
Q-Learning (QL)
Results Exporting (RE)
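
Before wiring the modules together, it may help to see the reinforcement-learning core in miniature. The following is a toy, self-contained Q-learning loop: the 8x3 Q-matrix shape and the hyperparameter values (learning rate 0.9, discount 0.9, greedy 0.5) mirror the settings used later in this chapter, but the environment here is a dummy, not the scheduling model:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 8, 3            # mirrors the 8x3 Q-matrix used below
q = np.zeros((n_states, n_actions))
lr, gamma, eps = 0.9, 0.9, 0.5        # same values as the QL parameters below

def step(state, action):
    """Toy environment: random next state, reward 1 only for action 0."""
    return int(rng.integers(n_states)), float(action == 0)

for episode in range(200):
    state = 0
    for _ in range(20):
        # Epsilon-greedy action selection
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q[state]))
        nxt, reward = step(state, action)
        # Standard Q-learning update rule
        q[state, action] += lr * (reward + gamma * q[nxt].max() - q[state, action])
        state = nxt

print(np.argmax(q, axis=1))   # the learned greedy action for each state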

Chapters 5 and 6 already presented the Python code for Q-Learning and BWACO. Next, we gather all of those modules into a single .py file and call it from the main script. First, a listing of the packages, predefined classes, and functions used so far:

"""------QL_BWACO.py------"""

import numpy as np
import pandas as pd
import copy
import math
import random

# Basic classes and instance data input
class Nurses:...
class Jobs:...
def get_data(f_index, f_scale, jobs):...

# QL realization
def state_identification(target_set, skill_set):...
def action_taking(state, q_matrix, skill_set, greedy):...
def count_demand_num(target_set):...
def q_learning(q_nurses_list, target_set, skill_set,
               q_matrix, learning_rate_para, discount_para, greedy_para, waiting_para, absorbing_state,
               # ACO variables
               ccp_acquaintance_increment, ccp_alpha, ccp_beta, ccp_waiting_limit, ccp_workload_limit,
               aco_alpha, aco_beta, aco_rho, aco_ant_number, changeable_pheromone_matrix,
               e_walk_speed, changeable_preference_matrix, e_init_distance_matrix, e_service_time_mean,
               depot):...

# ACO realization
def collect_feasible_targets(visiting_list, distance_matrix, walk_speed, waiting_limit, current_job, current_time):...
def choose_target_randomly(pr):...
def calculate_transition_probability(feasible_targets, current_time, distance_matrix, current_job, walk_speed,
                                     sub_arrival_time, ant_path_table, visiting_list, pheromone_matrix,
                                     alpha_aco_p, beta_aco_p,
                                     depot):...
def update_pheromone(best_path, worst_path, pheromone_matrix, rho_aco_p, distance_matrix):...
def aco(acquaintance_increment, alpha_model_p, beta_model_p, waiting_limit, workload,
        alpha_aco_p, beta_aco_p, rho_aco_p, ant_num, pheromone_matrix,
        walk_speed, preference_matrix, distance_matrix, service_time_mean,
        nurse, sd_matched_targets, depot):...

In Chapter 5, q_learning() was defined so that each call runs one episode and updates the Q-matrix (aco() is called from inside q_learning()). The main program therefore needs to call this function many times, until the matrix converges. Its job is to:

  1. Set up all parameters and instance data
  2. Call q_learning() in a loop until solution quality stops improving
  3. Visualize and export the solution data

In addition, we need a new class, Solution, to represent each sub-solution as well as the global solution, with an internal method that computes the statistics we need:

class Solution:
    __slots__ = ['nl', 'on', 'ev', 'rd', 'fd_sum',
                 'a_wait_j', 'a_wait_n', 't_wait', 'a_work_j', 'a_work_n', 't_work',
                 'fig_t', 'info_to_excel']

    def __init__(self):
        # Nurse list
        self.nl = []
        # Occupied nurse number
        self.on = 0
        # Evaluation function value
        self.ev = 0
        # Remaining demand number
        self.rd = []
        # Fulfilled demand number
        self.fd_sum = 0
        # Average waiting time of fulfilled job
        self.a_wait_j = 0
        # Average waiting time of occupied nurse
        self.a_wait_n = 0
        # Total waiting time
        self.t_wait = 0
        # Average service time of job
        self.a_work_j = 0
        # Average workload of occupied nurse
        self.a_work_n = 0
        # Total workload of occupied nurse
        self.t_work = 0
        # Figure caption text
        self.fig_t = ''
        # Information to excel
        self.info_to_excel = []

    def calculate(self, q_learning_result, remaining_demand):
        # q_learning() returns [total workload, total waiting time]
        self.t_work = float('%.2f' % q_learning_result[0])
        self.t_wait = float('%.2f' % q_learning_result[1])
        self.rd = QL_BWACO.count_demand_num(copy.deepcopy(remaining_demand))
        # Fulfilled demands = instance size minus unmet demands minus the depot
        self.fd_sum = file_scale - sum(self.rd) - 1
        # A nurse counts as occupied if its route holds more than the two depot stops
        for o in range(len(self.nl)):
            if len(self.nl[o].r) > 2:
                self.on += 1
        if self.on != 0:
            self.a_wait_j = float('%.2f' % (self.t_wait / self.fd_sum))
            self.a_wait_n = float('%.2f' % (self.t_wait / self.on))
            self.a_work_j = float('%.2f' % (self.t_work / self.fd_sum))
            self.a_work_n = float('%.2f' % (self.t_work / self.on))
            self.ev = float('%.2f' % (self.fd_sum + 100 * self.fd_sum / self.t_wait))

*The attributes self.fig_t and self.info_to_excel hold the figure caption and the data that will be written to the Excel file; both are covered in the next chapter.

*When an object of this class is created, no arguments need to be passed in (def __init__(self):), but a number of internal attributes are assigned later by other means.
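
A side note on the __slots__ declaration above: it fixes the attribute set and avoids a per-instance __dict__, which saves memory when a fresh Solution is created every episode. A minimal illustration of the behavior:

class WithSlots:
    __slots__ = ['x']          # only attribute 'x' is allowed

s = WithSlots()
s.x = 1                        # fine
try:
    s.y = 2                    # not declared in __slots__
except AttributeError as e:
    print(e)                   # 'WithSlots' object has no attribute 'y'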

Reading the data and setting the parameters:

"""------Instance representations------"""
# Get elder data and distance matrix
file_index = 'A'
file_scale = 61
e_jobs = []
e_init_distance_matrix = QL_BWACO.get_data(file_index, file_scale, e_jobs)
# Set up nurse resource
e_nurses_skill = [1, 2, 2, 2, 3, 3, 3]
# Build initial preference matrix; row 0 corresponds to the depot
e_random_pre = np.loadtxt("./initial_preference_ABC.csv", delimiter=",", skiprows=0)
# np.vstack replaces np.row_stack, which was removed in NumPy 2.0
e_preference_matrix = np.vstack((np.zeros((1, len(e_nurses_skill))), e_random_pre))
# Skill-demand matching matrix: row = nurse skill level, column = demand type;
# -1 marks pairings that skill level cannot serve (see the example below)
e_service_time_mean = np.array([[0, 0, 0, 0],
                                [0, 25, -1, -1],
                                [0, 20, 30, -1],
                                [0, 18, 20, 20]])
e_walk_speed = 60

"""------Q Learning parameters------"""
ql_learning_rate_para = 0.9
ql_discount_para = 0.9
ql_greedy_para = 0.5
ql_waiting_para = -0.01
ql_q_matrix = np.zeros((8, 3))
ql_absorbing_state = [6, 7]

"""------ACO parameters------"""
aco_alpha = 5
aco_beta = 5
aco_rho = 0.3
aco_ant_number = 30
aco_pheromone = 20
aco_pheromone_matrix = np.ones((len(e_init_distance_matrix[0]), len(e_init_distance_matrix[0]))) * aco_pheromone

"""------CCP model parameters------"""
ccp_acquaintance_increment = 0.01
ccp_alpha = 1.29
ccp_beta = 1.29
ccp_waiting_limit = 40
ccp_workload_limit = 480

"""---Solution Recording Variables---"""
solution_final = Solution()

"""---Start iteration---"""
start = time.time()
iter = 0
iter_max = 200
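To make the service-time matrix convention concrete, here is a small lookup sketch. My reading (an assumption based on the model chapters) is that rows index nurse skill levels, columns index demand types, and -1 marks a pairing that skill level cannot serve; the helper mean_service_time() is purely illustrative and not part of QL_BWACO.py:

import numpy as np

e_service_time_mean = np.array([[0, 0, 0, 0],
                                [0, 25, -1, -1],
                                [0, 20, 30, -1],
                                [0, 18, 20, 20]])

def mean_service_time(skill, demand):
    """Return the mean service time, or None if the pairing is infeasible."""
    t = e_service_time_mean[skill][demand]
    return None if t == -1 else t

print(mean_service_time(1, 1))   # 25 -- a skill-1 nurse on a type-1 demand
print(mean_service_time(1, 3))   # None -- skill 1 cannot serve type 3
print(mean_service_time(3, 1))   # 18 -- a higher skill level works faster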

The main program:

import time
import copy
import numpy as np
import QL_BWACO

import os
print(os.getcwd())  # sanity check: should print the directory containing QL_BWACO.py

if __name__ == '__main__':
    """------Instance representations------"""
    ...
    """------Q Learning parameters------"""
    ...
    """------CCP model parameters------"""
    ...
    """------ACO parameters------"""
    ...
    """---Solution Recording Variables---"""
    solution_final = Solution()

    """---Start iteration---"""
    start = time.time()
    iter = 0
    iter_max = 200
    while iter < iter_max:
        iter += 1
        # Solution objective for current sub-solution
        sub_solution = Solution()
        # Changeable list of targets
        available_targets = copy.deepcopy(e_jobs)
        # Remove the depot (job 0) from the list of targets
        available_targets.pop(0)
        # Changeable preference, pheromone matrix and nurse's skill set
        changeable_preference_matrix = copy.deepcopy(e_preference_matrix)
        changeable_pheromone_matrix = copy.deepcopy(aco_pheromone_matrix)
        changeable_nurses_skill = copy.deepcopy(e_nurses_skill)
        # Start Q Learning process
        ql_result = QL_BWACO.q_learning(sub_solution.nl, available_targets, changeable_nurses_skill,
                                        ql_q_matrix, ql_learning_rate_para, ql_discount_para, ql_greedy_para,
                                        ql_waiting_para, ql_absorbing_state,
                                        # ACO variables
                                        ccp_acquaintance_increment, ccp_alpha, ccp_beta, ccp_waiting_limit,
                                        ccp_workload_limit,
                                        aco_alpha, aco_beta, aco_rho, aco_ant_number, changeable_pheromone_matrix,
                                        e_walk_speed, changeable_preference_matrix, e_init_distance_matrix,
                                        e_service_time_mean,
                                        e_jobs[0])
        # Calculate fulfilled and remaining demands
        sub_solution.calculate(ql_result, available_targets)
        # Update global solution according to evaluation value
        if solution_final.ev < sub_solution.ev:
            solution_final = copy.deepcopy(sub_solution)

Each pass of the while loop does four things:

  1. Create a Solution object
  2. Re-initialize the parameters that must be reset for every episode
  3. Call q_learning()
  4. Update the current global solution based on the evaluation-function value

*The import os / os.getcwd() lines are just a sanity check on the working directory. What actually makes import QL_BWACO work is that, when the main script is run directly, Python adds the script's own directory to the module search path, so QL_BWACO.py in the same directory can be found.
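
If the script might be launched from elsewhere (an IDE, a scheduler), a more defensive variant (my suggestion, not part of the original code) adds the script's directory to the search path explicitly:

import os
import sys

# Make the directory containing this script importable,
# regardless of the current working directory
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
import QL_BWACO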

*Immediately before the while loop starts, time.time() records the current system time.

*The list of available targets, available_targets, is modified during every episode, so it is duplicated with deepcopy(). The same goes for the preference matrix, the pheromone matrix, the available nurse skills, and the way the global solution is updated.
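
To see why a shallow copy would not be enough here, consider this minimal illustration (Job is a stand-in for the elements of e_jobs):

import copy

class Job:
    def __init__(self, jid):
        self.jid = jid
        self.visited = False

e_jobs = [Job(i) for i in range(3)]

shallow = copy.copy(e_jobs)    # new list, but the same Job objects
shallow[0].visited = True
print(e_jobs[0].visited)       # True -- the master list was mutated

deep = copy.deepcopy(e_jobs)   # fully independent Job objects
deep[1].visited = True
print(e_jobs[1].visited)       # False -- the master list is untouched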

*The value returned by q_learning() is a list: [total workload, total waiting time]

That completes the algorithm itself. Next, a simple block to display the results:

    end = time.time()
    running_time = (end - start) / 60
    r_text = "Running time: " + str(float('%.2f' % running_time)) + " mins \n" \
               "Results:\n" \
               "Solution avg workload of nurse = " + str(solution_final.a_work_n) + "\n" \
               "Solution avg waiting time of job = " + str(solution_final.a_wait_j) + "\n" \
               "Solution avg waiting time of nurse = " + str(solution_final.a_wait_n) + "\n" \
               "Solution total workload = " + str(solution_final.t_work) + "\n" \
               "Solution total waiting time = " + str(solution_final.t_wait) + "\n" \
               "Solution remaining demands = " + str(solution_final.rd)

    print(r_text)
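
As a side note, the same string can be built more readably with an f-string (equivalent apart from trailing zeros in the running time, since {running_time:.2f} prints "2.00" rather than "2.0"):

    # Drop-in alternative using an f-string
    r_text = (
        f"Running time: {running_time:.2f} mins \n"
        f"Results:\n"
        f"Solution avg workload of nurse = {solution_final.a_work_n}\n"
        f"Solution avg waiting time of job = {solution_final.a_wait_j}\n"
        f"Solution avg waiting time of nurse = {solution_final.a_wait_n}\n"
        f"Solution total workload = {solution_final.t_work}\n"
        f"Solution total waiting time = {solution_final.t_wait}\n"
        f"Solution remaining demands = {solution_final.rd}"
    )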

Then set it to 200 episodes and run:

Current Experimental Instance is A
Instance Scale: 61
Running time: 2.0 mins

Results:
Solution avg workload of nurse = 321.48
Solution avg waiting time of job = 4.21
Solution avg waiting time of nurse = 54.7
Solution total workload = 1285.93
Solution total waiting time = 218.81
Solution remaining demands = [3, 4, 1]
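
A quick consistency check on these numbers: 218.81 / 54.7 ≈ 4, so four nurses were actually occupied; 1285.93 / 4 = 321.48 matches the average workload; and with 61 - (3 + 4 + 1) - 1 = 52 fulfilled demands, 218.81 / 52 ≈ 4.21 matches the average waiting time per demand.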

That is quite fast: 200 iterations on a 61-node instance took 2 minutes. The solution is already fairly good: 52 demands are fulfilled, and each occupied nurse waits about 54 minutes per day on average. That may sound wasteful, nearly an hour of an 8-hour working day spent waiting, but measured per demand the average waiting time is only about 4 minutes, which seems acceptable.

However, the nurses' average working time falls roughly 160 minutes short of the 480-minute working day (480 - 321.48), and some nurses certainly do far more than others, which suggests that nurse resources are over-supplied. Moreover, the algorithm makes no attempt to balance workloads across nurses, and the averages above do not single out the relatively idle ones.
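
If one wanted to quantify that imbalance, a small post-hoc check could look like the sketch below. Note that the workload attribute name w is hypothetical: the real attribute on the nurse objects is defined in Chapter 5, and only the route list r (used above to count occupied nurses) is confirmed by this chapter:

import numpy as np

# Hypothetical imbalance check: collect the per-nurse workloads of the
# occupied nurses and report their spread. `n.w` is an assumed attribute
# name for a nurse's total workload.
workloads = [n.w for n in solution_final.nl if len(n.r) > 2]
print("per-nurse workload:", workloads)
print("min / max / std: %.2f / %.2f / %.2f"
      % (min(workloads), max(workloads), float(np.std(workloads))))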

In any case, the algorithm now produces a solution that matches the mathematical model's objective function, but how do we judge the quality of that solution? That turns out to be a surprisingly involved question. The next chapter gets academic: how to evaluate algorithms and solutions for multi-objective optimization problems. Brace yourself.



Published on 2018-10-07
