[Academia] Chapter 6: Python Code, the Ant Colony Optimization Part


Author: @小杨
This article is published with the authorization of @小杨. Original post: [Academia] Chapter 6: Python Code, the Ant Colony Optimization Part
Sharing the original link is welcome; for reprints, please message @留德华叫兽 for details. Unauthorized copying will be pursued.


About the author: The author is a 2016-intake graduate student in the Department of Industrial Engineering at Guangdong University of Technology, supervised by Prof. 陈庆新 (professor and doctoral supervisor at Guangdong University of Technology, editor-in-chief of the journal 《工业工程》) and Dr. 张婷 (lecturer at Guangdong University of Technology, PhD from the University of Hong Kong). During the first year and the start of the second year of graduate study, the author carried out academic research for the first time, publishing one paper in 《系统工程理论与实践》 (an EI-indexed, top domestic engineering journal) and two international conference papers (International Conference on Networking, Sensing and Control 2018; Healthcare Operations Management SIG 2018 Conference), giving one poster presentation at an international conference (Manufacturing and Service Operations Management Conference 2018), and submitting one paper to Journal of Cleaner Production, currently under initial review. The author's research interests focus on applications at the intersection of machine learning and operations research.


This series of articles records some personal reflections and summarizes the previous stage of my work. Since this is my first attempt at research and my experience is limited, I hope writing this summary helps me reflect, and I would welcome advice from more experienced readers.

Research topic:

[ Community Home Health Care Scheduling and Routing Problem ]

------Table of Contents------

Chapters 1, 2, and 3: Topic Selection, Literature Review Methodology, Problem Analysis, and Model Assumptions

Chapter 4: Markov Decision Process & Chance-Constrained Model & Multi-Worker Stochastic Routing

Chapter 5: Python Code, the Model and Q-Learning

Chapter 6: Python Code, the Ant Colony Optimization Part

Chapter 7: Python Code, Module Integration and the Main Program

Chapter 8: Python Code, Data Output, Parameter Tuning, and Algorithm Summary

------6 Python Code: Ant Colony Optimization and Program Integration------

(While writing this up, I found the ant colony part had grown too long, so it gets an article of its own.)

This article summarizes the work from the perspective of implementing the algorithm in Python. As the title suggests, the heuristic combines Q-learning and Ant Colony Optimization. Its underlying idea is that of reinforcement learning: searching for an acceptable near-optimal solution through repeated trial and error. The outline of the algorithm is shown below.

[Figure: Outline of the QL-ACO algorithm]

The whole program can therefore be divided into four parts:

  • Data Representation (DR)
  • Best-Worst Ant Colony Optimization (BWACO)
  • Q-Learning (QL)
  • Results Exporting (RE)

All the packages needed:

import numpy as np
import copy
import pandas as pd
import math
import random
import time
import matplotlib.pyplot as plt
import xlwt

------6.1 Ant Colony Optimization: Implementation and Initialization------

Ant colony optimization is a bio-inspired heuristic whose core idea is to mimic the biological mechanism by which real-world ants find optimal paths. It was first proposed in 1991 (Colorni et al., 1991) and has since led to many applications (Dorigo and Stützle, 2010), while the Best-Worst (BW) scheme for accelerating convergence first appeared in 2000 (Cordon et al., 2000). Interested readers can study the references in detail:

  • Colorni, A., Dorigo, M., Maniezzo, V., 1991. Distributed optimization by ant colonies, Proceedings of the first European conference on artificial life. Paris, France, pp. 134-142.
  • Cordon, O., de Viana, I.F., Herrera, F., Moreno, L., 2000. A new ACO model integrating evolutionary computation concepts: The best-worst Ant System, From Ant Colonies to Artificial Ants: Second International Workshop on Ant Algorithms, Brussels, Belgium.
  • Dorigo, M., Stützle, T., 2010. Ant colony optimization: overview and recent advances, Handbook of metaheuristics. Springer, pp. 227-263.

The ant colony algorithm used here has been modified to fit the specific problem. The flow is:

  1. Place n ants at the depot and initialize the pheromone level on every edge
  2. Each ant builds a route according to the probability transition function
  3. Pick out the best and the worst route according to total waiting time; add a fixed amount of pheromone to every edge on the best route, subtract the same amount from every edge on the worst route, and evaporate a fixed percentage of pheromone on all edges according to the evaporation coefficient
  4. Repeat steps 2-3 until the maximum iteration is reached, then output the best route

*The probability transition function used in step 2 of the flow:

[Figure: probability transition function and explanation of its terms]
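The equation image is not reproduced here. For reference, a sketch of the standard ACO transition rule consistent with the code in this chapter, where \tau is pheromone_matrix, the heuristic term \eta_{ij} is read from distance_matrix, and \alpha, \beta correspond to aco_alpha and aco_beta:

p_{ij} = \frac{[\tau_{ij}]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{k \in \mathrm{feasible}} [\tau_{ik}]^{\alpha}\,[\eta_{ik}]^{\beta}}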

*The pheromone updating equation used in step 3 of the flow:

[Figure: pheromone updating function]
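The update_pheromone function called in the main loop below is not listed in this chapter. As a minimal sketch of the Best-Worst rule described above (the fixed deposit amount delta is an assumption, and distance_matrix is accepted but unused here, whereas the actual implementation may use it to scale the deposit):

def update_pheromone(best_path, worst_path, pheromone_matrix, rho, distance_matrix, delta=1.0):
    # Evaporation: every edge loses a fixed percentage of its pheromone
    for i in range(len(pheromone_matrix)):
        for j in range(len(pheromone_matrix[i])):
            pheromone_matrix[i][j] *= (1 - rho)
    # Reinforce every edge on the best route by a fixed amount
    for k in range(len(best_path) - 1):
        pheromone_matrix[best_path[k].e][best_path[k + 1].e] += delta
    # Penalize every edge on the worst route by the same amount (kept non-negative)
    for k in range(len(worst_path) - 1):
        i, j = worst_path[k].e, worst_path[k + 1].e
        pheromone_matrix[i][j] = max(pheromone_matrix[i][j] - delta, 0.0)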

We can now list all the parameters of the ant colony algorithm that need tuning: the number of ants, the initial pheromone level, α, β, and the evaporation coefficient ρ. So how is this part of the algorithm implemented? (After revising the code I realized that explaining it clearly is not easy, orz...)
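As an illustration only (the values below are placeholders, not the tuned settings used in the study, and node_count is an assumed number of locations), these parameters might be initialized like this:

aco_alpha = 1.0          # weight of the pheromone term in the transition rule
aco_beta = 2.0           # weight of the heuristic (distance) term
aco_rho = 0.1            # evaporation coefficient
aco_ant_number = 20      # number of ants per call
initial_pheromone = 1.0  # initial pheromone level on every edge
node_count = 30          # assumed number of locations
aco_pheromone_matrix = [[initial_pheromone] * node_count for _ in range(node_count)]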

Required packages:

import numpy as np
import pandas as pd
import copy
import math
import random

Treat the whole algorithm as a function that builds a route, and first determine its inputs and outputs.

Inputs:
  • CCP parameters: ccp_acquaintance_increment, ccp_alpha, ccp_beta, ccp_waiting_limit, ccp_workload_limit
  • ACO parameters: aco_alpha, aco_beta, aco_rho, aco_ant_number, aco_pheromone_matrix
  • Experimental data constants: e_preference_matrix, e_distance_matrix, e_service_time_mean, e_walk_speed
  • The selected nurse object (the action given by the QL algorithm): ql_nurse
  • All candidate targets satisfying the skill-matching rule: sd_matched_targets

Output:
  • A nurse object containing the best route
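To make the mapping between these inputs and the function parameters concrete, a call could look like the following sketch (depot, the starting location object, also has to be passed; it is not in the list above):

aco(ccp_acquaintance_increment, ccp_alpha, ccp_beta, ccp_waiting_limit, ccp_workload_limit,
    aco_alpha, aco_beta, aco_rho, aco_ant_number, aco_pheromone_matrix,
    e_walk_speed, e_preference_matrix, e_distance_matrix, e_service_time_mean,
    ql_nurse, sd_matched_targets, depot)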

Let's first look at the algorithm's flow chart:

[Figure: Ant Colony Optimization flow chart]

def aco(acquaintance_increment, alpha_model_p, beta_model_p, waiting_limit, workload,
        alpha_aco_p, beta_aco_p, rho_aco_p, ant_num, pheromone_matrix,
        walk_speed, preference_matrix, distance_matrix, service_time_mean,
        nurse, sd_matched_targets, depot):
    # Output variables
    arrival_time_trace = []
    shortest_time = 0
    waiting_time = 0
    best_path = []
    # Temporary variables
    worst_path = []
    ccp_best_objective = 0
    ccp_worst_objective = 0

    for ant in range(ant_num):
        # Initialization: depot, time, waiting time, workload
        current_job = depot
        current_time = 0
        current_waiting = 0
        current_workload = 0
        # Lists for recording sub-arrival time and path
        sub_arrival_time = []
        ant_path_table = [depot]
        # Initialize visiting list and preference matrix
        visiting_list = copy.deepcopy(sd_matched_targets)
        current_preference = copy.deepcopy(preference_matrix)

        # Build routes
        while current_workload <= workload:
            # Read out service time mean value and preference value
            st_mean = service_time_mean[nurse.s][current_job.lv]
            preference_factor = copy.deepcopy(current_preference[current_job.e][nurse.l])

            # Inspect waiting and record sub-arrival time
            if current_time < current_job.twb:
                current_waiting += (current_job.twb - current_time)
                current_time = copy.deepcopy(current_job.twb)
                sub_arrival_time.append(copy.deepcopy(current_time))
            else:
                sub_arrival_time.append(copy.deepcopy(current_time))

            # Predict the total workload if the nurse served this job and then returned to the depot,
            # then check whether overwork would occur
            current_workload = current_time + (preference_factor * st_mean)\
                               + beta_model_p + (distance_matrix[current_job.e][depot.e]) / walk_speed
            if current_workload >= workload:
                # Overwork predicted, stop routing
                # Set depot as the next target and record arrival time
                ant_path_table.append(depot)
                current_time = copy.deepcopy(current_workload)
                sub_arrival_time.append(copy.deepcopy(current_time))
                break
            else:
                # Continue routing
                # Add up service time
                current_time += (preference_factor * st_mean) + alpha_model_p

            # Search for targets satisfying the time window constraint
            feasible_targets = collect_feasible_targets(visiting_list, distance_matrix, walk_speed, waiting_limit,
                                                        current_job, current_time)
            # Count feasible targets, calculate transition probabilities and choose target
            chosen_target = calculate_transition_probability(feasible_targets, current_time, distance_matrix,
                                                             current_job, walk_speed, sub_arrival_time, ant_path_table,
                                                             visiting_list, pheromone_matrix, alpha_aco_p, beta_aco_p,
                                                             depot)
            if chosen_target.l == 0:
                # No feasible target, stop routing
                break
            else:
                # Feasible target chosen, continue
                current_job = chosen_target
                # Revise preference
                current_preference[chosen_target.e][nurse.l] -= acquaintance_increment
                continue

        # Calculate fulfilled demands
        fulfilled_demand = copy.deepcopy(len(ant_path_table) - 2)
        # Record the best and worst solution according to the CCP objective
        if fulfilled_demand == 0:
            # no fulfilled demand
            if len(best_path) == 0:
                best_path = copy.deepcopy(ant_path_table)
                worst_path = copy.deepcopy(ant_path_table)
        else:
            # record the current CCP objective: total waiting time
            ccp_objective = copy.deepcopy(current_waiting)
            if ant == 0:
                # first ant: record the best and worst CCP objectives, working time,
                # waiting time, best route, and worst route
                ccp_best_objective = copy.deepcopy(ccp_objective)
                ccp_worst_objective = copy.deepcopy(ccp_objective)
                shortest_time = copy.deepcopy(current_time)
                waiting_time = copy.deepcopy(current_waiting)
                best_path = copy.deepcopy(ant_path_table)
                worst_path = copy.deepcopy(ant_path_table)
                arrival_time_trace = copy.deepcopy(sub_arrival_time)
            else:  # not the first ant
                if ccp_best_objective > ccp_objective:  # find the best one
                    ccp_best_objective = copy.deepcopy(ccp_objective)
                    shortest_time = copy.deepcopy(current_time)
                    waiting_time = copy.deepcopy(current_waiting)
                    best_path = copy.deepcopy(ant_path_table)
                    arrival_time_trace = copy.deepcopy(sub_arrival_time)
                elif ccp_worst_objective < ccp_objective:  # find the worst one
                    ccp_worst_objective = copy.deepcopy(ccp_objective)
                    worst_path = copy.deepcopy(ant_path_table)
                else:
                    continue

        # update pheromone according to Best-Worst rule
        update_pheromone(best_path, worst_path, pheromone_matrix, rho_aco_p, distance_matrix)

    # update route
    nurse.tt = copy.deepcopy(shortest_time)
    nurse.aT = copy.deepcopy(arrival_time_trace)
    nurse.twt = copy.deepcopy(waiting_time)
    for o in range(len(best_path)):
        nurse.r.append(best_path[o])
        if best_path[o].lv == 1:
            nurse.sd[0] += 1
        elif best_path[o].lv == 2:
            nurse.sd[1] += 1
        elif best_path[o].lv == 3:
            nurse.sd[2] += 1
    if sum(nurse.sd) != 0:
        nurse.avg_w = copy.deepcopy(nurse.twt / sum(nurse.sd))

If the code makes your head spin, don't panic; let's break the def aco(): function down. As mentioned above, there are n ants, so there are n routes to build, hence the first for loop:

    for ant in range(ant_num):

And each ant must end its trip in time under the workload constraint.

*For questions about the CCP (chance-constrained) part, see Chapter 4: Markov Decision Process & Chance-Constrained Model & Multi-Worker Stochastic Routing.

Hence the while loop:

# Build routes
        while current_workload <= workload:

The corresponding if statements can all be found in the flow chart. Note that in the statement checking for overwork (the CCP constraint), once the condition holds the ant must break out of the while loop and end the current route, hence the break. Below are the two functions used to choose the next target:

def collect_feasible_targets(visiting_list, distance_matrix, walk_speed, waiting_limit, current_job, current_time):
    ft = []
    for j in range(len(visiting_list)):
        distance = distance_matrix[current_job.e][visiting_list[j].e]
        travel = distance / walk_speed
        arrival = current_time + travel
        # The arrival time must be earlier than the time-window upper bound,
        # and no earlier than the lower bound minus the maximum waiting time
        # (so that the wait never exceeds waiting_limit)
        if arrival < visiting_list[j].twe:
            if (arrival + waiting_limit) >= visiting_list[j].twb:
                ft.append(visiting_list[j])
                continue
        else:
            continue
    return ft

This function picks out all targets that satisfy the basic time-window requirement, computing the arrival time for every candidate, and returns them as a list. For instance, with a time window [30, 60] and waiting_limit = 10, an arrival at time 25 is feasible (a 5-unit wait), while arrivals at 15 (too long a wait) or 65 (past the window) are not. The list is then passed to the function that chooses the next target:

def calculate_transition_probability(feasible_targets, current_time, distance_matrix, current_job, walk_speed,
                                     sub_arrival_time, ant_path_table, visiting_list, pheromone_matrix,
                                     alpha_aco_p, beta_aco_p,
                                     depot):
    # Count feasible targets
    # =0: return depot as the next target
    # =1: return it as the next target
    # >=2: return the target chosen according to the probability transition function
    if (len(feasible_targets)) == 0:
        # No feasible targets, end routing
        current_time += (distance_matrix[current_job.e][depot.e]) / walk_speed
        sub_arrival_time.append(copy.deepcopy(current_time))  # record arrival time back to depot
        ant_path_table.append(depot)
        return depot
    elif len(feasible_targets) == 1:
        # Only one feasible target, choose it and update route
        ant_path_table.append(feasible_targets[0])
        current_time += (distance_matrix[current_job.e][feasible_targets[0].e]) / walk_speed
        # Remove chosen target from visiting list
        for v in range(len(visiting_list)):
            if visiting_list[v].l == feasible_targets[0].l:
                visiting_list.remove(visiting_list[v])
                return feasible_targets[0]
    else:
        # More than one feasible target, calculate transition probabilities
        pr = []
        pD = 0
        for pdd in range(len(feasible_targets)):
            yitaD = distance_matrix[current_job.e][feasible_targets[pdd].e]
            pD += pow((pheromone_matrix[current_job.e][feasible_targets[pdd].e]), alpha_aco_p) \
                  * pow(yitaD, beta_aco_p)
        for pt in range(len(feasible_targets)):
            yitaU = distance_matrix[current_job.e][feasible_targets[pt].e]
            pU = pow((pheromone_matrix[current_job.e][feasible_targets[pt].e]), alpha_aco_p) \
                 * pow(yitaU, beta_aco_p)
            pT = pU / pD
            pr.append([current_job, feasible_targets[pt], pT])
            if math.isnan(pT):
                print(pT)
        # Choose target randomly and update route
        target_index = choose_target_randomly(pr)
        ant_path_table.append(pr[target_index][1])
        current_time += (distance_matrix[current_job.e][pr[target_index][1].e]) / walk_speed
        # Remove chosen target from visiting list
        for v in range(len(visiting_list)):
            if visiting_list[v].l == pr[target_index][1].l:
                visiting_list.remove(visiting_list[v])
                break
        return pr[target_index][1]

This function distinguishes three cases, namely whether the feasible set is empty, contains exactly one target, or contains more than one, and handles each case differently.

# Count feasible targets
# =0: return depot as the next target
# =1: return it as the next target
# >=2: return the target chosen according to the probability transition function

The return value is a Job object.

Inside calculate_transition_probability(), another small helper function is called:

def choose_target_randomly(pr):
    p_coor = 0
    p_axi = [0]

    for pta in range(len(pr)):
        p_coor += pr[pta][2]
        p_axi.append(p_coor)
    # generate a random value
    ran_var = random.uniform(0, p_coor)
    # start the search pointer near the middle of the cumulative axis (at least index 1)
    search = max((len(p_axi) // 2) - 1, 1)

    while True:
        if search == 0:
            return 0
        if ran_var <= p_axi[search]:
            if ran_var > p_axi[search - 1]:
                return (search - 1)
            else:
                search -= 1
        else:
            if ran_var <= p_axi[search + 1]:
                return search
            else:
                search += 1

This small routine selects the next target according to the transition probabilities of the candidates, similar to sampling (roulette-wheel selection). The idea is to add up the transition probabilities of all targets to form one interval, with each sub-interval representing one target. A random number is drawn, and the target whose sub-interval it falls into is chosen. In the implementation, what is returned is an index.
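The same roulette-wheel selection can also be written more compactly with the standard library. A minimal equivalent sketch (the function name is hypothetical, not the author's code):

import bisect
import random

def choose_target_randomly_bisect(pr):
    # cumulative sums of the transition probabilities form the sampling axis
    cumulative = []
    total = 0
    for entry in pr:
        total += entry[2]
        cumulative.append(total)
    # draw a point on [0, total] and return the index of the sub-interval it falls into
    ran_var = random.uniform(0, total)
    return bisect.bisect_left(cumulative, ran_var)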

At this point the program is essentially complete. The last step is to store the solution information in the Nurse object that was passed in as input:

# update route
    nurse.tt = copy.deepcopy(shortest_time)
    nurse.aT = copy.deepcopy(arrival_time_trace)
    nurse.twt = copy.deepcopy(waiting_time)
    for o in range(len(best_path)):
        nurse.r.append(best_path[o])
        if best_path[o].lv == 1:
            nurse.sd[0] += 1
        elif best_path[o].lv == 2:
            nurse.sd[1] += 1
        elif best_path[o].lv == 3:
            nurse.sd[2] += 1

The system then moves to the next state of the Markov chain, and the algorithm hands over to Q-Learning to compute the reward and enter the next decision stage.



Published on 2018-10-06
