Training Your Own Neural Network with tensorflow-gpu (Error Summary)

A summary of the errors I ran into while training my own neural network with tensorflow-gpu.

The errors are listed in the order of the steps in the following article:

品颜完月: Training Your Own Neural Network with tensorflow-gpu (Win10), zhuanlan.zhihu.com

Error 1:

Occurred during: Step 3, generating trainable data

2. .csv --> .record:
Run the following commands in the console:

python generate_tfrecord.py --csv_input=my_data\train_labels.csv  --output_path=my_data\train.record
python generate_tfrecord.py --csv_input=my_data\test_labels.csv  --output_path=my_data\test.record

The generate_tfrecord.py script can be downloaded from:

datitran/raccoon_dataset (github.com)

After downloading it, I placed generate_tfrecord.py in this directory: D:\Documents\GitHub\my_OpenCV\models-master\research
The following error then appeared:

  File "D:\Anaconda3\envs\my_openCV\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: D:\Documents\GitHub\my_OpenCV\models-master\research\my_data\28.jpg : \u03f5\u0373\udcd5\u04b2\udcbb\udcb5\udcbd\u05b8\udcb6\udca8\udcb5\udcc4\udcce\u013c\udcfe\udca1\udca3
; No such file or directory
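
The escaped characters after the file path are not random. They look like a Windows system message encoded as GBK that was then decoded as UTF-8, with the undecodable bytes kept as surrogate escapes. The following is a minimal sketch of reversing that, assuming this is how the text was produced; the helper name `recover_gbk_message` is made up for illustration.

```python
def recover_gbk_message(garbled: str) -> str:
    """Re-encode the mis-decoded text to its raw bytes, then decode as GBK."""
    # "surrogateescape" turns each lone surrogate (\udcXX) back into byte 0xXX.
    # Ordinary characters are encoded as normal UTF-8, which restores the byte
    # pairs that happened to form valid UTF-8 sequences.
    raw = garbled.encode("utf-8", errors="surrogateescape")
    return raw.decode("gbk")


# The escaped text copied from the error message above.
garbled = (
    "\u03f5\u0373\udcd5\u04b2\udcbb\udcb5\udcbd\u05b8\udcb6\udca8\udcb5\udcc4"
    "\udcce\u013c\udcfe\udca1\udca3"
)
message = recover_gbk_message(garbled)
```

Under that assumption, `message` is 系统找不到指定的文件。 ("The system cannot find the file specified."), which agrees with the trailing "No such file or directory": the script looked for 28.jpg in a folder where it did not exist.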

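Solution 1: edit the image path in generate_tfrecord.py so that it points to the folder containing all the original images.

Solution 2: leave the script unchanged and rename the folder that holds the images to images.

Solution 3: download generate_tfrecord.py from the repository below instead, since that version of the script has been modified to be more general.

eric-erki/How-To-Train-an-Object-Detection-Classifier-for-Multiple-Objects-Using-TensorFlow-GPU-on-Windows-1 (github.com)

[Screenshot: the modified image-path line in that version of generate_tfrecord.py]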
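
Whichever solution is used, the underlying requirement is that every image listed in the CSV exists in the folder the script reads from. A small pre-check along the following lines can catch missing files before the conversion runs. It assumes the CSV has a `filename` column (which, as far as I know, is the raccoon_dataset layout); the function name is hypothetical.

```python
import csv
import os


def find_missing_images(csv_path, image_dir, filename_column="filename"):
    """Return the file names listed in the CSV that are absent from image_dir."""
    missing = []
    seen = set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row[filename_column]
            if name in seen:
                continue  # one image can have several boxes, hence several rows
            seen.add(name)
            if not os.path.isfile(os.path.join(image_dir, name)):
                missing.append(name)
    return missing
```

For example, `find_missing_images(r"my_data\train_labels.csv", "images")` would return an empty list when the image folder matches the CSV, and a list containing names such as `28.jpg` when it does not.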

Error 2:

Occurred during: Step 3, generating trainable data, 2. .csv --> .record

The following error appeared:

  File "D:\Documents\GitHub\my_OpenCV\models-master\research\object_detection\utils\dataset_util.py", line 26, in int64_list_feature
    return tf.train.Feature(int64_list=tf.train.Int64List(value=value))
TypeError: None has type NoneType, but expected one of: int, long

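Solution:
While annotating with LabelImg, I had carelessly typed the label 'red light' as 'right light' in two images. Once I spotted and corrected this, I reran the script that converts the .xml files to .csv, after which the .csv --> .record commands ran and produced their output.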
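
The `None` in the error comes from the label-to-id lookup: in the versions of generate_tfrecord.py I have seen, the `class_text_to_int` function returns nothing for a label it does not recognize (treat this as an assumption about your copy of the script). The sketch below lists unexpected labels in the CSV before conversion. It assumes a `class` column, and the function name is hypothetical.

```python
import csv
from collections import Counter


def count_unknown_labels(csv_path, known_labels, class_column="class"):
    """Count the CSV rows whose label is not one of known_labels."""
    unknown = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            label = row[class_column]
            if label not in known_labels:
                unknown[label] += 1
    return unknown
```

Called with the set of labels the script actually handles (for instance a set containing 'red light'), a mistyped label such as 'right light' shows up in the result together with the number of rows affected.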

Error 3:
Occurred during Step 5, training

python train.py --logtostderr --train_dir=D:/Documents/GitHub/my_OpenCV/models-master/research/my_data/training/ --pipeline_config_path=D:/Documents/GitHub/my_OpenCV/models-master/research/my_data/training/faster_rcnn_inception_v2_pets.config

Error message:

[Screenshot: error message]

For the solution, see:

new object_detection error from today's commis: 'ManualStepLearningRate' object has no attribute 'warmup' · Issue #3705 · tensorflow/models (github.com)

Modify lines 167-169 of models/research/object_detection/utils/learning_schedules.py.

[Screenshots: the code before and after the change]
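
A commonly reported fix for this file under Python 3 is to wrap `range(num_boundaries)` in `list(...)` where it is passed to `tf.where`. I am assuming this is the change shown in the original screenshots, which cannot be confirmed from the text. The reason is that in Python 3 `range` is a lazy sequence rather than a list, and the TensorFlow 1.x code at that point expects a concrete list. The pure-Python sketch below only illustrates that difference; the value of `num_boundaries` is an example.

```python
num_boundaries = 3  # example value only

lazy = range(num_boundaries)             # Python 3: a lazy range object, not a list
concrete = list(range(num_boundaries))   # a real list holding the same indices

print(type(lazy).__name__, type(concrete).__name__)
```

If your copy of learning_schedules.py already contains `list(range(num_boundaries))`, this particular change has been applied upstream and the error has a different cause.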

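Error 4:

Occurred during Step 5, training

Error message:

[Screenshot: error message]

Solution: the path contained an extra space; deleting the space fixed the problem.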
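
A stray space in a path is easy to miss by eye. The sketch below scans text for double-quoted values that contain whitespace. It assumes the paths are written as quoted strings, as in a pipeline config file; the sample fragment and the function name are made up for illustration.

```python
import re


def quoted_values_with_whitespace(config_text):
    """Return (line_number, value) for quoted strings that contain whitespace."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for value in re.findall(r'"([^"]*)"', line):
            if re.search(r"\s", value):
                hits.append((lineno, value))
    return hits


# Hypothetical config fragment: the second path has a space after "D:/data/".
sample = '''
input_path: "D:/data/train.record"
label_map_path: "D:/data/ labelmap.pbtxt"
'''
print(quoted_values_with_whitespace(sample))
```

A legitimate folder name may of course contain spaces, so each hit is something to inspect rather than proof of a mistake.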

Error 5:

Occurred during Step 5, training

Error message:

[Screenshot: error message]

Full error output:

(base) D:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/rfcn_resnet101_coco.config
WARNING:tensorflow:From D:\tensorflow1\models\research\object_detection\trainer.py:260: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:depth of additional conv before box predictor: 0
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
WARNING:tensorflow:From D:\tensorflow1\models\research\object_detection\utils\ops.py:665: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From D:\tensorflow1\models\research\object_detection\core\losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

F:\my_install\Anaconda3\lib\site-packages\tensorflow\python\ops\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
WARNING:tensorflow:From D:\tensorflow1\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py:2017: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
Traceback (most recent call last):
  File "train.py", line 184, in <module>
    tf.app.run()
  File "F:\my_install\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "D:\tensorflow1\models\research\object_detection\trainer.py", line 391, in train
    include_global_step=False))
  File "D:\tensorflow1\models\research\object_detection\utils\variables_helper.py", line 126, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "F:\my_install\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 287, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "F:\my_install\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on D:    ensorflow1\fcn_resnet101_coco_2018_01_28 : \udcce\u013c\udcfe\udcc3\udcfb\udca1\udca2\u013f\xbc\udcc3\udcfb\udcbb\udcf2\udcbe\udced\udcb1\udcea\udcd3\ufde8\udcb2\udcbb\udcd5\udcfd\u0237\udca1\udca3
; Unknown error

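Cause: when I switched to a different model, I copied the folder path straight into the config file. The problem turned out to be forward slashes versus backslashes in that path (my own carelessness).

After correcting the path, training ran normally.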
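
The tab-like gap in `D:    ensorflow1` in the error output suggests that `\t` in the pasted Windows path was read as an escape sequence rather than as a backslash followed by `t`. The escaped text at the end of that message, if decoded as GBK, reads "The filename, directory name, or volume label syntax is incorrect", which points the same way. Writing the path with forward slashes avoids the problem. Here is a minimal sketch using the standard library; the example path is a guess at what was pasted and is used only for illustration.

```python
from pathlib import PureWindowsPath

# A raw string keeps the backslashes literal. Without the r prefix, Python
# itself would turn "\t" into a tab and "\r" into a carriage return.
pasted = r"D:\tensorflow1\rfcn_resnet101_coco_2018_01_28"

# as_posix() renders the same path with forward slashes.
safe = PureWindowsPath(pasted).as_posix()
print(safe)
```

The forward-slash form can then be pasted into the config file, matching the style of the `--train_dir` and `--pipeline_config_path` arguments used in the training commands.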

Edited on 2018-10-03
