workerlog_recompute_false.log
[2024-07-09 16:01:48,803] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2024-07-09 16:01:50,654] [ WARNING] - sharding_parallel_degree=1 means no sharding, please set sharding to empty!
[2024-07-09 16:01:50,654] [ INFO] distributed_strategy.py:214 - distributed strategy initialized
/home/llama-example/PaddleNLP/paddlenlp/trainer/training_args.py:1148: UserWarning: enable_mp_skip_c_identity only works with enable_mp_async_allreduce. It will not work.
warnings.warn(
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_embedding_deterministic', current_value=1, default_value=0)
FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='')
FLAGS(name='FLAGS_benchmark', current_value=True, default_value=False)
FLAGS(name='FLAGS_cudnn_deterministic', current_value=True, default_value=False)
=======================================================================
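The deterministic cuDNN/embedding flags above are what make loss curves comparable across runs (this log is the recompute=False half of such a pair). A minimal sketch of one way to reproduce that flag setup before launch; paddle.set_flags is standard Paddle API, and FLAGS_selected_gpus is normally set per worker by paddle.distributed.launch rather than by hand:

import paddle

# Equivalent environment variables (e.g. export FLAGS_cudnn_deterministic=True)
# also work and take effect before any kernel selection happens.
paddle.set_flags({
    "FLAGS_embedding_deterministic": 1,   # deterministic embedding kernels (assumption based on the flag name)
    "FLAGS_cudnn_deterministic": True,    # force deterministic cuDNN algorithms
    "FLAGS_benchmark": True,              # extra timing/benchmark output
})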
I0709 16:01:50.656653 29214 tcp_utils.cc:181] The server starts to listen on IP_ANY:48045
I0709 16:01:50.656841 29214 tcp_utils.cc:130] Successfully connected to 10.127.24.147:48045
I0709 16:01:53.631757 29214 process_group_nccl.cc:137] ProcessGroupNCCL pg_timeout_ 1800000
I0709 16:01:53.631801 29214 process_group_nccl.cc:138] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-07-09 16:01:53,632] [ INFO] topology.py:357 - Total 8 pipe comm group(s) create successfully!
W0709 16:01:53.635170 29214 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 11.8
W0709 16:01:53.671629 29214 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.
W0709 16:01:53.671660 29214 gpu_resources.cc:196] WARNING: device: 0. The installed Paddle is compiled with CUDA 11.8, but CUDA runtime version in your machine is 11.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
/usr/local/lib/python3.8/dist-packages/paddle/distributed/communication/group.py:114: UserWarning: Current global rank 0 is not in group _default_pg10
warnings.warn(
[2024-07-09 16:01:56,169] [ INFO] topology.py:357 - Total 8 data comm group(s) create successfully!
I0709 16:01:56.169374 29214 process_group_nccl.cc:137] ProcessGroupNCCL pg_timeout_ 1800000
I0709 16:01:56.169389 29214 process_group_nccl.cc:138] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-07-09 16:01:56,169] [ INFO] topology.py:357 - Total 1 model comm group(s) create successfully!
[2024-07-09 16:01:56,169] [ INFO] topology.py:357 - Total 8 sharding comm group(s) create successfully!
I0709 16:01:56.169605 29214 process_group_nccl.cc:137] ProcessGroupNCCL pg_timeout_ 1800000
I0709 16:01:56.169611 29214 process_group_nccl.cc:138] ProcessGroupNCCL nccl_comm_init_option_ 0
I0709 16:01:56.169638 29214 process_group_nccl.cc:137] ProcessGroupNCCL pg_timeout_ 1800000
I0709 16:01:56.169642 29214 process_group_nccl.cc:138] ProcessGroupNCCL nccl_comm_init_option_ 0
[2024-07-09 16:01:56,169] [ INFO] topology.py:279 - HybridParallelInfo: rank_id: 0, mp_degree: 8, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1, 2, 3, 4, 5, 6, 7], sharding_group: [0], pp_group: [0], dp_group: [0], sep_group: None, check/clip group: [0, 1, 2, 3, 4, 5, 6, 7]
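The topology above is pure tensor parallelism: all 8 ranks form one mp_group and every other degree is 1. A sketch of how such a hybrid group is typically initialized with Paddle's fleet API (not the trainer's exact code, which derives these degrees from the training arguments):

import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {
    "dp_degree": 1,        # data parallel
    "mp_degree": 8,        # tensor (model) parallel, as in the log
    "pp_degree": 1,        # pipeline parallel
    "sharding_degree": 1,  # sharding disabled
}
fleet.init(is_collective=True, strategy=strategy)
hcg = fleet.get_hybrid_communicate_group()
print(hcg.get_model_parallel_world_size())  # 8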
[2024-07-09 16:01:56,170] [ INFO] - +==============================================================================+
| |
| DistributedStrategy Overview |
| |
+==============================================================================+
| a_sync=True <-> a_sync_configs |
+------------------------------------------------------------------------------+
| k_steps -1 |
| max_merge_var_num 1 |
| send_queue_size 16 |
| independent_recv_thread False |
| min_send_grad_num_before_recv 1 |
| thread_pool_size 1 |
| send_wait_times 1 |
| runtime_split_send_recv False |
| launch_barrier True |
| heter_worker_device_guard cpu |
| lr_decay_steps 10 |
| use_ps_gpu 0 |
| use_gpu_graph 0 |
+==============================================================================+
| Environment Flags, Communication Flags |
+------------------------------------------------------------------------------+
| mode 1 |
| elastic False |
| auto False |
| sync_nccl_allreduce True |
| nccl_comm_num 1 |
| use_hierarchical_allreduce False |
| hierarchical_allreduce_inter_nranks 1 |
| sync_batch_norm False |
| fuse_all_reduce_ops True |
| fuse_grad_size_in_MB 32 |
| fuse_grad_size_in_TFLOPS 50.0 |
| cudnn_exhaustive_search False |
| conv_workspace_size_limit 512 |
| cudnn_batchnorm_spatial_persistent False |
| fp16_allreduce False |
| last_comm_group_size_MB 1.0 |
| find_unused_parameters False |
| without_graph_optimization True |
| fuse_grad_size_in_num 8 |
| calc_comm_same_stream False |
| asp False |
| fuse_grad_merge False |
| semi_auto False |
| adam_d2sum False |
| auto_search False |
| heter_ccl_mode False |
| is_fl_ps_mode False |
| with_coordinator False |
| split_data True |
| downpour_table_param [] |
| fs_client_param |
+==============================================================================+
| Build Strategy |
+------------------------------------------------------------------------------+
| enable_sequential_execution False |
| fuse_elewise_add_act_ops False |
| fuse_bn_act_ops False |
| fuse_relu_depthwise_conv False |
| fuse_broadcast_ops False |
| fuse_all_optimizer_ops False |
| enable_inplace False |
| enable_backward_optimizer_op_deps True |
| cache_runtime_context False |
| fuse_bn_add_act_ops True |
| enable_auto_fusion False |
| enable_addto False |
| fix_op_run_order False |
| allow_cuda_graph_capture False |
| reduce_strategy 0 |
| fuse_gemm_epilogue False |
| debug_graphviz_path |
| fused_attention False |
| fused_feedforward False |
| fuse_dot_product_attention False |
| fuse_resunit False |
+==============================================================================+
[2024-07-09 16:01:56,171] [ INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2024-07-09 16:01:56,173] [ INFO] - The global seed is set to 42, local seed is set to 50 and random seed is set to 42.
[2024-07-09 16:01:56,173] [ DEBUG] - ============================================================
[2024-07-09 16:01:56,173] [ DEBUG] - Model Configuration Arguments
[2024-07-09 16:01:56,173] [ DEBUG] - paddle commit id : d730da7fd595167a37db02e21eb16759acfc5995
[2024-07-09 16:01:56,174] [ DEBUG] - paddlenlp commit id : 3ebe938339860525b8818e8191421491fc889e24
[2024-07-09 16:01:56,174] [ DEBUG] - attention_probs_dropout_prob : 0.1
[2024-07-09 16:01:56,174] [ DEBUG] - continue_training : False
[2024-07-09 16:01:56,174] [ DEBUG] - fuse_attention_ffn : True
[2024-07-09 16:01:56,174] [ DEBUG] - fuse_attention_qkv : True
[2024-07-09 16:01:56,174] [ DEBUG] - hidden_dropout_prob : 0.1
[2024-07-09 16:01:56,174] [ DEBUG] - model_name_or_path : meta-llama/Llama-2-7b
[2024-07-09 16:01:56,174] [ DEBUG] - num_hidden_layers : None
[2024-07-09 16:01:56,174] [ DEBUG] - tokenizer_name_or_path : meta-llama/Llama-2-7b
[2024-07-09 16:01:56,174] [ DEBUG] - use_fast_layer_norm : False
[2024-07-09 16:01:56,174] [ DEBUG] -
[2024-07-09 16:01:56,174] [ DEBUG] - ============================================================
[2024-07-09 16:01:56,175] [ DEBUG] - Data Configuration Arguments
[2024-07-09 16:01:56,175] [ DEBUG] - paddle commit id : d730da7fd595167a37db02e21eb16759acfc5995
[2024-07-09 16:01:56,175] [ DEBUG] - paddlenlp commit id : 3ebe938339860525b8818e8191421491fc889e24
[2024-07-09 16:01:56,175] [ DEBUG] - data_cache : None
[2024-07-09 16:01:56,175] [ DEBUG] - data_impl : mmap
[2024-07-09 16:01:56,175] [ DEBUG] - input_dir : ./data
[2024-07-09 16:01:56,175] [ DEBUG] - max_seq_length : 1024
[2024-07-09 16:01:56,175] [ DEBUG] - share_folder : False
[2024-07-09 16:01:56,175] [ DEBUG] - skip_warmup : True
[2024-07-09 16:01:56,175] [ DEBUG] - split : 949,50,1
[2024-07-09 16:01:56,175] [ DEBUG] -
[2024-07-09 16:01:56,175] [ WARNING] - Process rank: 0, device: gpu, world_size: 8, distributed training: True, 16-bits training: True
[2024-07-09 16:01:56,176] [ INFO] - We are using to load 'meta-llama/Llama-2-7b'.
[2024-07-09 16:01:56,275] [ INFO] - We are using to load 'meta-llama/Llama-2-7b'.
[2024-07-09 16:01:56,276] [ INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-7b/config.json
[2024-07-09 16:01:56,276] [ INFO] - Reset vocab size to 32000 for better amp performance.
Final pre-training config: LlamaConfig {
"alibi": false,
"architectures": [
"LlamaForCausalLM"
],
"bos_token_id": 1,
"dtype": "float16",
"eos_token_id": 2,
"fuse_attention_ffn": true,
"fuse_attention_qkv": true,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"long_sequence_init_args": {},
"long_sequence_strategy_name": null,
"long_sequence_strategy_type": null,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 8,
"num_key_value_heads": 32,
"pad_token_id": 0,
"paddlenlp_version": "2.8.0.post",
"pretraining_tp": 1,
"recompute_granularity": "full_attn",
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_scaling_factor": 1.0,
"rope_scaling_type": null,
"rope_theta": 10000.0,
"seq_length": 1024,
"tensor_parallel_degree": 8,
"tie_word_embeddings": false,
"use_fast_layer_norm": false,
"use_fused_rms_norm": true,
"use_fused_rope": true,
"use_long_sequence_strategies": false,
"vocab_size": 32000
}
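For reference, a config equivalent to the printout above can be assembled from the stock Llama-2-7b config; a sketch only (the run builds it from the CLI arguments listed earlier; the field names below are the ones visible in the JSON):

from paddlenlp.transformers import LlamaConfig

config = LlamaConfig.from_pretrained("meta-llama/Llama-2-7b")
config.num_hidden_layers = 8        # truncated 8-layer variant used in this run
config.tensor_parallel_degree = 8
config.fuse_attention_qkv = True
config.fuse_attention_ffn = True
config.dtype = "float16"
print(config.vocab_size)            # 32000, after the "better amp performance" reset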
> datasets target sizes (minimum size):
train: 20
> building dataset index ...
Using old dataset (.npy & .npz)
> finished creating indexed dataset in 0.007425 seconds
number of documents: 100000
> dataset split:
train:
document indices in [0, 94900) total of 94900 documents
validation:
document indices in [94900, 99900) total of 5000 documents
test:
document indices in [99900, 100000) total of 100 documents
searching for causal dataset, build_indices=False, share_folder False, check_rank_flag False
build success
> loading doc-idx mapping from ./data/index-cache/1bea8719f54f99438baeed36e7ac872c_doc_idx.npy
> loading sample-idx mapping from ./data/index-cache/1bea8719f54f99438baeed36e7ac872c_sample_idx.npy
> loading shuffle-idx mapping from ./data/index-cache/1bea8719f54f99438baeed36e7ac872c_shuffle_idx.npy
loaded indexed file in 0.001 seconds
total number of samples: 115201
total number of epochs: 1
searching for causal dataset, build_indices=False, share_folder False, check_rank_flag False
build success
> loading doc-idx mapping from ./data/index-cache/99ec6fc470aba3021e18b725e5b6fd5a_doc_idx.npy
> loading sample-idx mapping from ./data/index-cache/99ec6fc470aba3021e18b725e5b6fd5a_sample_idx.npy
> loading shuffle-idx mapping from ./data/index-cache/99ec6fc470aba3021e18b725e5b6fd5a_shuffle_idx.npy
loaded indexed file in 0.001 seconds
total number of samples: 5192
total number of epochs: 1
searching for causal dataset, build_indices=False, share_folder False, check_rank_flag False
build success
> loading doc-idx mapping from ./data/index-cache/af257eb8ba49edf2b9f295b66b736861_doc_idx.npy
> loading sample-idx mapping from ./data/index-cache/af257eb8ba49edf2b9f295b66b736861_sample_idx.npy
> loading shuffle-idx mapping from ./data/index-cache/af257eb8ba49edf2b9f295b66b736861_shuffle_idx.npy
loaded indexed file in 0.001 seconds
total number of samples: 862
total number of epochs: 10
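Each split's cache consists of three plain NumPy arrays (document order, sample boundaries, shuffle permutation). A sketch for inspecting one split offline; the prefix is copied from the log, and the off-by-one comment assumes the usual Megatron-style convention that sample_idx carries one more boundary entry than there are usable samples (an assumption, not verified against PaddleNLP's loader):

import numpy as np

prefix = "./data/index-cache/1bea8719f54f99438baeed36e7ac872c"
doc_idx = np.load(prefix + "_doc_idx.npy", allow_pickle=True, mmap_mode="r")
sample_idx = np.load(prefix + "_sample_idx.npy", allow_pickle=True, mmap_mode="r")
shuffle_idx = np.load(prefix + "_shuffle_idx.npy", allow_pickle=True, mmap_mode="r")
print(len(sample_idx))  # 115201 as printed above; usable samples = len - 1 = 115200 ("Num examples")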
[2024-07-09 16:01:58,702] [ INFO] - Sample data for train mode.
[2024-07-09 16:01:58,713] [ INFO] - growth for Russia (+0.94%) and a decline for Saudi Arabia (-2.0%). Most forecasts are predicting a flat production for Saudi Arabia around 10 mbpd. Russia will probably have a weak growth for the next few years (see Dave's excellent post and Ray Leonard's presentation).
A strong growth in the production of synthetic crude oil from the Canadian Tar sands may slow down the decline in the group III and create a kind of plateau until 2012-2015.
Future production from Iraq could be a key element but unfortunately this country will probably remains in turmoil for years.
Fig. 10- Production growth and decline within group II and III. The two dotted blue lines represents the new supply from the type II group that is required to meet the world demand growth.
Fig. 11- Yearly supply fluctuations in mbpd. The two dotted blue lines represents the new supply from the type II group that is necessary to meet the world demand growth.
1999 2005 2006 2007 2010 2012 2015 2020 Group III (Observed) 39.78 39.11 -2.20% -1.49% Logistic Group III 40.32 39.12 38.70 38.23 36.48 35.10 32.76 28.43 + 0.08% -0.91% -1.07% -1.23% -1.70% -1.99% -2.41% -3.04% Logistic Group III+Canadian Tar Sands 40.32 39.17 38.97 38.73 38.31 37.76 35.93 32.07 + 0.08% -0.80% -0.52% -0.61% -0.11% -0.80% -1.84% -2.29% World CO + NGPL (Observed) 72.50 81.25 81.20* -1.68% +1.10% -0.06%* CO + NGPL (1.5% Growth from 1999) 72.50 79.28 80.47 81.68 85.41 87.99 92.01 99.12 +1.50% + 1.50% +1.50% +1.50% +1.50% +1.50% + 1.50% Group II (Observed) 32.72 42.13 -1.04% +3.63% Group II (Requirement1) 32.18 40.15 41.76 43.45 48.92 52.89 59.25 70.69 +3.97% +4.01% +4.03% +4.02% +3.96% +3.80% +3.45% Group II + Tar Sands (Requirement1) 32.18 40.11 41.50 42.95 47.10 50.23 56.08 67.05 + 3.85% +3.47% +3.48% + 2.85% +3.30% +3.76% +3.42% Russia (Observed2) 6.31 9.50 9.59* +2.59% +0.94% Saudi Arabia (Observed2) 8.84 11.01 10.79* +5.76% -2.02%
Table III - Observed and projected production values (in mbpd) for Crude Oil + NGL. The second row for each category gives the
[2024-07-09 16:01:58,714] [ INFO] - The global seed is set to 42, local seed is set to 50 and random seed is set to 42.
[2024-07-09 16:01:58,807] [ INFO] - max_steps is given, it will override any value given in num_train_epochs
[2024-07-09 16:01:58,808] [ INFO] - Using half precision
/home/llama-example/PaddleNLP/paddlenlp/trainer/trainer.py:489: VisibleDeprecationWarning:
Warning:
API "paddle.distributed.fleet.utils.mix_precision_utils.MixPrecisionScaler" is deprecated since 2.5.0, and will be removed in future versions. Please use "paddle.distributed_scaler" instead.
mix_precision_utils.MixPrecisionScaler(self.scaler) # retun value has no use
[2024-07-09 16:01:58,818] [ DEBUG] - ============================================================
[2024-07-09 16:01:58,818] [ DEBUG] - Training Configuration Arguments
[2024-07-09 16:01:58,818] [ DEBUG] - paddle commit id : d730da7fd595167a37db02e21eb16759acfc5995
[2024-07-09 16:01:58,819] [ DEBUG] - paddlenlp commit id : 3ebe938339860525b8818e8191421491fc889e24
[2024-07-09 16:01:58,819] [ DEBUG] - _no_sync_in_gradient_accumulation: True
[2024-07-09 16:01:58,819] [ DEBUG] - adam_beta1 : 0.9
[2024-07-09 16:01:58,819] [ DEBUG] - adam_beta2 : 0.999
[2024-07-09 16:01:58,819] [ DEBUG] - adam_epsilon : 1e-08
[2024-07-09 16:01:58,819] [ DEBUG] - amp_custom_black_list : None
[2024-07-09 16:01:58,819] [ DEBUG] - amp_custom_white_list : None
[2024-07-09 16:01:58,819] [ DEBUG] - amp_master_grad : True
[2024-07-09 16:01:58,819] [ DEBUG] - autotuner_benchmark : False
[2024-07-09 16:01:58,819] [ DEBUG] - bf16 : False
[2024-07-09 16:01:58,819] [ DEBUG] - bf16_full_eval : False
[2024-07-09 16:01:58,820] [ DEBUG] - context_parallel_degree : 1
[2024-07-09 16:01:58,820] [ DEBUG] - current_device : gpu:0
[2024-07-09 16:01:58,820] [ DEBUG] - data_parallel_config :
[2024-07-09 16:01:58,820] [ DEBUG] - data_parallel_degree : 1
[2024-07-09 16:01:58,820] [ DEBUG] - data_parallel_rank : 0
[2024-07-09 16:01:58,820] [ DEBUG] - dataloader_drop_last : False
[2024-07-09 16:01:58,820] [ DEBUG] - dataloader_num_workers : 1
[2024-07-09 16:01:58,820] [ DEBUG] - dataset_rank : 0
[2024-07-09 16:01:58,820] [ DEBUG] - dataset_world_size : 1
[2024-07-09 16:01:58,820] [ DEBUG] - decay_steps : 5
[2024-07-09 16:01:58,820] [ DEBUG] - device : gpu
[2024-07-09 16:01:58,820] [ DEBUG] - disable_tqdm : True
[2024-07-09 16:01:58,821] [ DEBUG] - distributed_dataloader : False
[2024-07-09 16:01:58,821] [ DEBUG] - do_eval : False
[2024-07-09 16:01:58,821] [ DEBUG] - do_export : False
[2024-07-09 16:01:58,821] [ DEBUG] - do_predict : False
[2024-07-09 16:01:58,821] [ DEBUG] - do_train : True
[2024-07-09 16:01:58,821] [ DEBUG] - enable_auto_parallel : False
[2024-07-09 16:01:58,821] [ DEBUG] - enable_linear_fused_grad_add : True
[2024-07-09 16:01:58,821] [ DEBUG] - eval_accumulation_steps : None
[2024-07-09 16:01:58,821] [ DEBUG] - eval_batch_size : 8
[2024-07-09 16:01:58,821] [ DEBUG] - eval_iters : 10
[2024-07-09 16:01:58,821] [ DEBUG] - eval_steps : 1000
[2024-07-09 16:01:58,821] [ DEBUG] - evaluation_strategy : IntervalStrategy.NO
[2024-07-09 16:01:58,822] [ DEBUG] - flatten_param_grads : False
[2024-07-09 16:01:58,822] [ DEBUG] - force_reshard_pp : False
[2024-07-09 16:01:58,822] [ DEBUG] - fp16 : True
[2024-07-09 16:01:58,822] [ DEBUG] - fp16_full_eval : False
[2024-07-09 16:01:58,822] [ DEBUG] - fp16_opt_level : O2
[2024-07-09 16:01:58,822] [ DEBUG] - fuse_sequence_parallel_allreduce: False
[2024-07-09 16:01:58,822] [ DEBUG] - gradient_accumulation_steps : 4
[2024-07-09 16:01:58,822] [ DEBUG] - greater_is_better : None
[2024-07-09 16:01:58,822] [ DEBUG] - hybrid_parallel_topo_order : pp_first
[2024-07-09 16:01:58,822] [ DEBUG] - ignore_data_skip : False
[2024-07-09 16:01:58,822] [ DEBUG] - ignore_load_lr_and_optim : False
[2024-07-09 16:01:58,822] [ DEBUG] - ignore_save_lr_and_optim : False
[2024-07-09 16:01:58,823] [ DEBUG] - label_names : None
[2024-07-09 16:01:58,823] [ DEBUG] - lazy_data_processing : True
[2024-07-09 16:01:58,823] [ DEBUG] - learning_rate : 1e-05
[2024-07-09 16:01:58,823] [ DEBUG] - load_best_model_at_end : False
[2024-07-09 16:01:58,823] [ DEBUG] - load_sharded_model : False
[2024-07-09 16:01:58,823] [ DEBUG] - local_process_index : 0
[2024-07-09 16:01:58,823] [ DEBUG] - local_rank : 0
[2024-07-09 16:01:58,823] [ DEBUG] - log_level : -1
[2024-07-09 16:01:58,823] [ DEBUG] - log_level_replica : -1
[2024-07-09 16:01:58,823] [ DEBUG] - log_on_each_node : True
[2024-07-09 16:01:58,823] [ DEBUG] - logging_dir : output/qwen_7b_test/runs/Jul09_16-01-50_yq01-sys-hic-k8s-v100-box-a225-0491.yq01.baidu.com
[2024-07-09 16:01:58,823] [ DEBUG] - logging_first_step : False
[2024-07-09 16:01:58,824] [ DEBUG] - logging_steps : 1
[2024-07-09 16:01:58,824] [ DEBUG] - logging_strategy : IntervalStrategy.STEPS
[2024-07-09 16:01:58,824] [ DEBUG] - logical_process_index : 0
[2024-07-09 16:01:58,824] [ DEBUG] - lr_end : 1e-07
[2024-07-09 16:01:58,824] [ DEBUG] - lr_scheduler_type : SchedulerType.LINEAR
[2024-07-09 16:01:58,824] [ DEBUG] - max_evaluate_steps : -1
[2024-07-09 16:01:58,824] [ DEBUG] - max_grad_norm : 0.0
[2024-07-09 16:01:58,824] [ DEBUG] - max_steps : 5
[2024-07-09 16:01:58,824] [ DEBUG] - metric_for_best_model : None
[2024-07-09 16:01:58,824] [ DEBUG] - min_learning_rate : 5e-06
[2024-07-09 16:01:58,824] [ DEBUG] - minimum_eval_times : None
[2024-07-09 16:01:58,824] [ DEBUG] - no_cuda : False
[2024-07-09 16:01:58,825] [ DEBUG] - no_recompute_layers : None
[2024-07-09 16:01:58,825] [ DEBUG] - num_cycles : 0.5
[2024-07-09 16:01:58,825] [ DEBUG] - num_train_epochs : 1.0
[2024-07-09 16:01:58,825] [ DEBUG] - optim : OptimizerNames.ADAMW
[2024-07-09 16:01:58,825] [ DEBUG] - optimizer_name_suffix : tp00
[2024-07-09 16:01:58,825] [ DEBUG] - output_dir : output/qwen_7b_test
[2024-07-09 16:01:58,825] [ DEBUG] - overwrite_output_dir : False
[2024-07-09 16:01:58,825] [ DEBUG] - past_index : -1
[2024-07-09 16:01:58,825] [ DEBUG] - per_device_eval_batch_size : 8
[2024-07-09 16:01:58,825] [ DEBUG] - per_device_train_batch_size : 1
[2024-07-09 16:01:58,825] [ DEBUG] - pipeline_parallel_config : enable_delay_scale_loss enable_sharding_comm_overlap enable_release_grads
[2024-07-09 16:01:58,825] [ DEBUG] - pipeline_parallel_degree : 1
[2024-07-09 16:01:58,826] [ DEBUG] - pipeline_parallel_rank : 0
[2024-07-09 16:01:58,826] [ DEBUG] - power : 1.0
[2024-07-09 16:01:58,826] [ DEBUG] - pp_recompute_interval : 1
[2024-07-09 16:01:58,826] [ DEBUG] - prediction_loss_only : False
[2024-07-09 16:01:58,826] [ DEBUG] - process_index : 0
[2024-07-09 16:01:58,826] [ DEBUG] - recompute : False
[2024-07-09 16:01:58,826] [ DEBUG] - recompute_granularity : full_attn
[2024-07-09 16:01:58,826] [ DEBUG] - recompute_use_reentrant : False
[2024-07-09 16:01:58,826] [ DEBUG] - remove_unused_columns : True
[2024-07-09 16:01:58,826] [ DEBUG] - report_to : ['visualdl']
[2024-07-09 16:01:58,826] [ DEBUG] - resume_from_checkpoint : None
[2024-07-09 16:01:58,826] [ DEBUG] - run_name : output/qwen_7b_test
[2024-07-09 16:01:58,827] [ DEBUG] - save_on_each_node : False
[2024-07-09 16:01:58,827] [ DEBUG] - save_sharded_model : False
[2024-07-09 16:01:58,827] [ DEBUG] - save_steps : 5000
[2024-07-09 16:01:58,827] [ DEBUG] - save_strategy : IntervalStrategy.STEPS
[2024-07-09 16:01:58,827] [ DEBUG] - save_total_limit : None
[2024-07-09 16:01:58,827] [ DEBUG] - scale_loss : 1024.0
[2024-07-09 16:01:58,827] [ DEBUG] - seed : 42
[2024-07-09 16:01:58,827] [ DEBUG] - sep_parallel_degree : 1
[2024-07-09 16:01:58,827] [ DEBUG] - sequence_parallel : False
[2024-07-09 16:01:58,827] [ DEBUG] - sequence_parallel_config :
[2024-07-09 16:01:58,827] [ DEBUG] - sharding : []
[2024-07-09 16:01:58,827] [ DEBUG] - sharding_degree : -1
[2024-07-09 16:01:58,827] [ DEBUG] - sharding_parallel_config : split_param,enable_stage1_overlap
[2024-07-09 16:01:58,828] [ DEBUG] - sharding_parallel_degree : 1
[2024-07-09 16:01:58,828] [ DEBUG] - sharding_parallel_rank : 0
[2024-07-09 16:01:58,828] [ DEBUG] - should_load_dataset : True
[2024-07-09 16:01:58,828] [ DEBUG] - should_load_sharding_stage1_model: False
[2024-07-09 16:01:58,828] [ DEBUG] - should_log : True
[2024-07-09 16:01:58,828] [ DEBUG] - should_save : True
[2024-07-09 16:01:58,828] [ DEBUG] - should_save_model_state : True
[2024-07-09 16:01:58,828] [ DEBUG] - should_save_sharding_stage1_model: False
[2024-07-09 16:01:58,828] [ DEBUG] - skip_memory_metrics : True
[2024-07-09 16:01:58,828] [ DEBUG] - skip_profile_timer : True
[2024-07-09 16:01:58,828] [ DEBUG] - tensor_parallel_config : enable_delay_scale_loss enable_mp_skip_c_identity
[2024-07-09 16:01:58,828] [ DEBUG] - tensor_parallel_degree : 8
[2024-07-09 16:01:58,829] [ DEBUG] - tensor_parallel_output : True
[2024-07-09 16:01:58,829] [ DEBUG] - tensor_parallel_rank : 0
[2024-07-09 16:01:58,829] [ DEBUG] - test_iters : 100
[2024-07-09 16:01:58,829] [ DEBUG] - to_static : False
[2024-07-09 16:01:58,829] [ DEBUG] - train_batch_size : 1
[2024-07-09 16:01:58,829] [ DEBUG] - unified_checkpoint : False
[2024-07-09 16:01:58,829] [ DEBUG] - unified_checkpoint_config :
[2024-07-09 16:01:58,829] [ DEBUG] - use_async_save : False
[2024-07-09 16:01:58,829] [ DEBUG] - use_expert_parallel : False
[2024-07-09 16:01:58,829] [ DEBUG] - use_flash_attention : False
[2024-07-09 16:01:58,829] [ DEBUG] - use_fused_dropout_add : False
[2024-07-09 16:01:58,829] [ DEBUG] - use_fused_linear : False
[2024-07-09 16:01:58,830] [ DEBUG] - use_fused_rms_norm : True
[2024-07-09 16:01:58,830] [ DEBUG] - use_fused_rope : True
[2024-07-09 16:01:58,830] [ DEBUG] - use_hybrid_parallel : True
[2024-07-09 16:01:58,830] [ DEBUG] - virtual_pp_degree : 1
[2024-07-09 16:01:58,830] [ DEBUG] - wandb_api_key : None
[2024-07-09 16:01:58,830] [ DEBUG] - warmup_ratio : 0.01
[2024-07-09 16:01:58,830] [ DEBUG] - warmup_steps : 0
[2024-07-09 16:01:58,830] [ DEBUG] - weight_decay : 0.01
[2024-07-09 16:01:58,830] [ DEBUG] - weight_name_suffix : tp00
[2024-07-09 16:01:58,830] [ DEBUG] - world_size : 8
[2024-07-09 16:01:58,830] [ DEBUG] -
[2024-07-09 16:01:58,830] [ INFO] - Starting training from resume_from_checkpoint : None
[2024-07-09 16:01:58,844] [ INFO] tensor_parallel.py:33 - start broadcast mp parameters
[2024-07-09 16:01:59,457] [ INFO] tensor_parallel.py:48 - mp's parameters is ready
[2024-07-09 16:01:59,457] [ INFO] - [timelog] checkpoint loading time: 0.00s (2024-07-09 16:01:59)
[2024-07-09 16:01:59,457] [ INFO] - ***** Running training *****
[2024-07-09 16:01:59,458] [ INFO] - Num examples = 115,200
[2024-07-09 16:01:59,458] [ INFO] - Num Epochs = 1
[2024-07-09 16:01:59,458] [ INFO] - Instantaneous batch size per device = 1
[2024-07-09 16:01:59,458] [ INFO] - Total train batch size (w. parallel, distributed & accumulation) = 4
[2024-07-09 16:01:59,458] [ INFO] - Gradient Accumulation steps = 4
[2024-07-09 16:01:59,458] [ INFO] - Total optimization steps = 5
[2024-07-09 16:01:59,458] [ INFO] - Total num train samples = 20
[2024-07-09 16:01:59,459] [ DEBUG] - Number of trainable parameters = 235,212,800 (per device)
[2024-07-09 16:01:59,460] [ DEBUG] - Number of trainable parameters = 1,881,702,400 (all devices, roughly)
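A quick consistency check of the header numbers (all values are taken from the lines above; dp_degree = 1 in this topology):

per_device_batch = 1          # Instantaneous batch size per device
grad_accum = 4                # Gradient Accumulation steps
dp_degree = 1                 # data-parallel degree from HybridParallelInfo
global_batch = per_device_batch * grad_accum * dp_degree  # = 4, the "Total train batch size"
max_steps = 5
print(global_batch * max_steps)  # 20, the "Total num train samples"
print(235_212_800 * 8)           # 1,881,702,400 parameters summed over the 8 tensor-parallel shards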
[2024-07-09 16:02:01,553] [ INFO] - loss: 11.1670723, learning_rate: 8.03e-06, global_step: 1, current_memory_allocated: 3.958691358566284, current_memory_reserved: 4.367553949356079, max_memory_allocated: 3.958693504333496, max_memory_reserved: 4.367553949356079, interval_runtime: 2.0923, interval_samples_per_second: 1.9117, interval_tokens_per_second_per_device: 244.7012, interval_steps_per_second: 0.4779, progress_or_epoch: 0.0
[2024-07-09 16:02:01,855] [ INFO] - loss: 10.97811699, learning_rate: 7.02e-06, global_step: 2, current_memory_allocated: 3.958691358566284, current_memory_reserved: 4.608276605606079, max_memory_allocated: 4.552124738693237, max_memory_reserved: 4.608276605606079, interval_runtime: 0.3016, interval_samples_per_second: 13.2632, interval_tokens_per_second_per_device: 1697.6853, interval_steps_per_second: 3.3158, progress_or_epoch: 0.0001
[2024-07-09 16:02:02,099] [ INFO] - loss: 10.78742027, learning_rate: 6.01e-06, global_step: 3, current_memory_allocated: 3.958691358566284, current_memory_reserved: 4.608276605606079, max_memory_allocated: 4.552124738693237, max_memory_reserved: 4.608276605606079, interval_runtime: 0.2444, interval_samples_per_second: 16.369, interval_tokens_per_second_per_device: 2095.2327, interval_steps_per_second: 4.0923, progress_or_epoch: 0.0001
[2024-07-09 16:02:02,343] [ INFO] - loss: 10.56037617, learning_rate: 5e-06, global_step: 4, current_memory_allocated: 3.958691358566284, current_memory_reserved: 4.608276605606079, max_memory_allocated: 4.552124738693237, max_memory_reserved: 4.608276605606079, interval_runtime: 0.2441, interval_samples_per_second: 16.388, interval_tokens_per_second_per_device: 2097.6641, interval_steps_per_second: 4.097, progress_or_epoch: 0.0001
[2024-07-09 16:02:02,586] [ INFO] - loss: 10.33168793, learning_rate: 5e-06, global_step: 5, current_memory_allocated: 3.958691358566284, current_memory_reserved: 4.608276605606079, max_memory_allocated: 4.552124738693237, max_memory_reserved: 4.608276605606079, interval_runtime: 0.2427, interval_samples_per_second: 16.4822, interval_tokens_per_second_per_device: 2109.7176, interval_steps_per_second: 4.1205, progress_or_epoch: 0.0002
[2024-07-09 16:02:02,586] [ INFO] -
Training completed.
[2024-07-09 16:02:02,587] [ INFO] - train_runtime: 3.1263, train_samples_per_second: 6.3973, train_steps_per_second: 1.5993, train_loss: 10.764934730529784, progress_or_epoch: 0.0002
[2024-07-09 16:02:02,587] [ INFO] - Saving model checkpoint to output/qwen_7b_test
[2024-07-09 16:02:02,588] [ INFO] - tokenizer config file saved in output/qwen_7b_test/tokenizer_config.json
[2024-07-09 16:02:02,588] [ INFO] - Special tokens file saved in output/qwen_7b_test/special_tokens_map.json
[2024-07-09 16:02:02,594] [ INFO] - Configuration saved in output/qwen_7b_test/config.json
[2024-07-09 16:02:02,595] [ INFO] - Configuration saved in output/qwen_7b_test/generation_config.json
[2024-07-09 16:02:04,579] [ INFO] - Model weights saved in output/qwen_7b_test/model_state.tp00.pdparams
[2024-07-09 16:02:04,580] [ INFO] - ***** train metrics *****
[2024-07-09 16:02:04,580] [ INFO] - progress_or_epoch = 0.0002
[2024-07-09 16:02:04,580] [ INFO] - train_loss = 10.7649
[2024-07-09 16:02:04,580] [ INFO] - train_runtime = 0:00:03.12
[2024-07-09 16:02:04,580] [ INFO] - train_samples_per_second = 6.3973
[2024-07-09 16:02:04,580] [ INFO] - train_steps_per_second = 1.5993
Effective Tokens per second: 6550.87
ips: 6550.87 tokens/s
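The closing throughput figure agrees with the run's own metrics; a quick check (seq_length = 1024 from the model config, runtime from the train metrics above):

total_tokens = 20 * 1024      # Total num train samples x seq_length = 20480
train_runtime = 3.1263        # seconds
print(total_tokens / train_runtime)  # ~6550.9 tokens/s, matching "ips: 6550.87 tokens/s"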