PhD Thesis Public Hearing: Scaling Telemetry Workloads in Cloud Applications: Techniques for Instrumentation, Storage, and Mining / PhD Defence - Speaker Deck
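Doctoral dissertation public hearing (final examination)
Graduate School of Informatics, Kyoto University, Department of Intelligence Science and Technology
Yuuki Tsubouchi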
Yuuki Tsubouchi (yuuk1)
February 25, 2025
Transcript
-
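Scaling Telemetry Workloads in Cloud Applications: Techniques for Instrumentation, Storage, and Mining
Graduate School of Informatics, Kyoto University, Department of Intelligence Science and Technology. February 18, 2025. Doctoral dissertation public hearing. Yuuki Tsubouchi -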
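4 Improving reliability for the users of online services. Diagram: users, the Internet, the application system, engineers, and a system for operational monitoring. Goal: better reliability so that users can use the service comfortably. Data collection is divided into a three-layer structure: instrumentation, storage, and analysis. -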
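5 Same diagram as slide 4, annotated with the problems: growing load on computing resources and growing operational load. Contributions 1, 2, and 3 are placed on instrumentation, storage, and analysis. Theme: technical proposals that address the growing load of data collection for operations. -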
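7 Outline
1. Introduction
2. Proposal of an in-kernel instrumentation method for the OS (Contribution 1)
3. Proposal of a storage architecture design method (Contribution 2)
4. Proposal of a preprocessing method for automated fault localization (Contribution 3)
5. Conclusion
Main revisions after the preliminary examination: (P14-15) added the definition of telemetry and its use cases; (P43) added a discussion of the overhead caused by concurrency control in the Linux kernel; (P26) added the contribution of this research to industry; (P93) added a conclusion that underlies the three individual contributions; (P89) cross-disciplinary applicability as a time-series analysis method. Thesis pages: p.1-2, 14; p.91-92; p.4; p.31; p.86-87. -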
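9 Background: The spread of cloud computing. Online service providers build applications in cloud environments and deliver services to users over the Internet: social networking, e-commerce, online games, media streaming, payments, IoT, and so on. Diagram labels: Cloud, Applications, Datacenters, users. -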
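12 Background: Basic architecture of cloud applications. Fig. 2.1: an example path of request processing (steps 1 to 4). Components interact in a request-response style; intermediate components terminate transport connections and relay them. -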
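13 Background: Reliability of cloud applications. High reliability is required so that users can use services comfortably, for example 24/7/365 availability and low-latency responses. Impact of failures: of 1,819 system failures, 47% took two hours or more to resolve. Triggers of failures: change-induced failures account for 49.5% of the total (changes to application code, configuration files, the underlying platform, and so on). Approaches that assume failures will occur and ask how to reduce their impact have become widespread, and fault tolerance that includes the operators' response is important. Cited: [58], [13], [14]. -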
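14 Background: Three layers of fault tolerance (Fig. 2.2, adapted from Figure 1-1 of [60]). The effect of a fault in an inner layer spreads to the outer layers; the fault-tolerance mechanism of each layer suppresses that spread. This work focuses on the outermost layer: failure detection, root-cause identification, and recovery; capacity expansion; and so on. -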
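15 System monitoring through telemetry (item raised in the preliminary examination). General definition of telemetry [62,63,64]: the process of recording and transmitting instrument readings (remote, instrument, transmission, analysis). Definition in this research: for monitoring and analysis, data about performance and usage is automatically collected from systems, applications, and services at a distance and transmitted. The information obtained by monitoring physical devices is limited, so the logical state of hardware, software, and network communication is monitored. -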
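16 Background: Major telemetry data.
Time-oriented, numeric values (metrics): quantitative measurements of system performance at a point in time, sampled at a fixed interval. Examples: CPU utilization, request response time.
Time-oriented, strings (logs): unstructured string records of events that occur in the system. Examples: error messages, user activity, system operations.
Path-oriented, traces: structured data that expresses the flow of a series of processing steps or communications passing through the system. Traces of network communication in particular: upper layer at request granularity, lower layer at flow granularity.
Metrics are the subject of Contributions 2 and 3; traces are the subject of Contribution 1. -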
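17 Background: Major telemetry data (metrics). Categories shown: Time-oriented (numeric metrics, string logs) and Topology-oriented Data (traces). A metric is a quantitative measurement of system performance at a point in time, sampled at a fixed interval (examples: CPU utilization, request response time). A metric such as cpu_seconds{instance=host1,…} is represented as a sequence of (timestamp, value) pairs, for example [(1709298600, 29851.26), …]. Logs are unstructured string records of system events (examples: error messages, user activity, system operations). Traces are collections of structured data fragments that express the flow of processing or communication through the system: request granularity (application) and flow or packet granularity (infrastructure). -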
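18 Background: Major telemetry data (traces). Path-oriented traces are structured data that express the flow of a series of processing steps or communications passing through the system. Traces of network communication in particular: upper layer at request granularity (outside the scope of this research), lower layer at flow granularity. Call graph example: nodes A, B, C, D with endpoints 10.0.10.1:80, 10.0.20.1:3306, 10.0.30.1:9200, 10.0.40.1:9092 (listen ports 80, 3306, 9200, 9092). -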
19 Background: Telemetry system. This research divides it into three layers: Instrumentation, Storage, and Mining. Fig. 2.3: Overview of one possible telemetry system. -
20 Background: Telemetry system, Instrumentation. Instruments are embedded in the application system, and data is transmitted to central storage. Fig. 2.3: Overview of one possible telemetry system. -
21 Background: Telemetry system, Storage. Transmitted data is ingested into the DB system, and the mining layer queries the DB for the data it needs. Fig. 2.3: Overview of one possible telemetry system. -
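22 Background: Telemetry system, Mining. Manual mining: provides visualized views and alerts that indicate anomalies. Automated mining: reduces the operator's burden through automatic, machine-learning-based data analyzers (the target of Contribution 3). Fig. 2.3: Overview of one possible telemetry system. -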
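23 Background: Growth of telemetry workloads.
Causes: growth in application workload and in the number of components; finer-grained telemetry data for a more precise understanding of the system.
Instrumentation (more measurement and transmission work): more resources consumed by transferring and aggregating measurements; higher application processing latency.
Storage (more data to ingest): more resources consumed by writes; more disk space; more resources consumed by reads and higher read latency.
Mining (more training work): lower accuracy of model output; longer execution time and higher resource consumption for training. -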
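24 Background: Operational complexity introduced by telemetry systems. Service providers must operate the telemetry system in addition to the application, so keeping operational complexity low is also important for practical use.
Instrumentation: ease of deployment is limited by manual instrumentation work; ease of maintenance by having to follow code changes in the instrumented software.
Storage: deployment requires building, configuring, tuning, and backing up the DB system; maintenance requires scale-out work, version upgrades, and re-tuning.
Mining: deployment requires manual labeling of datasets and tuning of model parameters; maintenance requires handling accuracy loss caused by changes in data characteristics (retraining, re-tuning, and so on). -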
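26 Research purpose (item raised in the preliminary examination). Actors: users (general consumers, staff at companies, and so on), online service providers, cloud service providers, applications, operators. Operators can grasp the system precisely through telemetry. The aim is to achieve both of the following: efficient use of the computing resources consumed by telemetry workloads, and efficient use of human resources through low operational complexity. Improved reliability lets users use the service comfortably. -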
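27 Research goal. For each layer of the telemetry system (instrumentation, storage, mining), propose techniques that scale efficiently with the growth of telemetry workloads, in terms of resource consumption and processing latency, under the condition that the increase in operational complexity is kept low. -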
28 Overview of this research (diagram).
Instrumentation (Chapter 3, path-oriented): problem: higher network connection rate, which increases instrumentation load; proposal: an instrumentation method based on efficient aggregation inside the OS kernel.
Storage (Chapter 4, time-oriented): problem: more metrics, which increases ingestion load; proposal: a method for tiering memory-based and disk-based DBs and migrating data between tiers.
Mining (Chapter 5, time-oriented): problem: more metrics, which increases execution time and lowers accuracy; proposal: a preprocessing method that automatically removes metrics unrelated to the failure. -
29 Same diagram, highlighting the growth of telemetry workloads. -
30 Same diagram, highlighting the proposed scaling techniques. -
31 Research purpose: Instrumentation (Chapter 3, path-oriented). Problem: higher network connection rate, which increases instrumentation load. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel. Operational complexity: no modification of application code is needed for instrumentation. Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, Mar 2022. -
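32 Research purpose: Storage (Chapter 4, time-oriented). Problem: more metrics, which increases ingestion load. Proposal: a method for tiering memory-based and disk-based DBs and migrating data between tiers. Operational complexity: solved within the scope of general-purpose DB systems, for which knowledge and implementations are highly reusable. Y. Tsubouchi et al., HeteroTSDB: a high-performance time-series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021. -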
33 Research purpose: Mining (Chapter 5, time-oriented). Problem: more metrics, which increases execution time and lowers accuracy. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure. Operational complexity: solved with unsupervised learning only, which needs neither labeling nor model training; the design is robust to parameter changes, which reduces the tuning burden. Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024. -
35 Part 2 opens with the overview diagram, highlighting Chapter 3 (instrumentation): higher network connection rate, which increases instrumentation load; an instrumentation method based on efficient aggregation inside the OS kernel. Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, Mar 2022. -
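36 Background: Network call graph. Example components: cloud load balancers, web app servers, database clusters, message queues. Operators want to know the call relationships among components, the blast radius of a change, and per-link metrics (L7: request count, error count, response time, …; L4: sent and received bytes/s, RTT, …). Drawing these graphs used to be manual work, but it is now being automated on the basis of path-oriented data. -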
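37 Existing methods: Instrumentation approaches for path-oriented data. A measurement point is placed somewhere along the network communication path (App, Proxy, Network Stack in user or kernel space, NIC, Switch).
Application-intrusive: instrument inside the application code. Advantage: the application's context is available. Drawback: adding the code takes considerable effort.
Application-non-intrusive: instrument somewhere other than the application. Its advantages and drawbacks are the reverse of the application-intrusive approach.
This work focuses on instrumentation at the upper layer inside the kernel (the socket layer). Compared with a proxy there is no relay overhead; compared with a switch the measurement load can be distributed across end hosts. -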
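38 Positioning of this research: Instrumentation methods at the socket layer. (A flow is a unit of communication in which the pair of address and port at both ends is identical. Arrows in the figure show the direction of data flow.)
Streaming method: each measurement is queued and transferred from the kernel to the agent in user space. ✗ Transfers of measurements to user space increase with the number of messages.
Flow aggregation method: ✔ only measurements aggregated per flow are stored, which reduces the amount of transferred data. ✗ When short-lived flows increase, the transferred data increases.
Flow bundling method (proposed): bundles flows that share the same destination. ✔ Reduces the amount of transferred data even when there are many short-lived flows.
Also marked in the figure: ✔ small measurement overhead. Cited: [96,97], [27,98]. -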
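39 Overview of Contribution 1.
Problem: an environment that is unfavorable to existing methods. Between web app servers and a DB server, PHP applications are sometimes advised not to use persistent DB connections in order to prevent resource abuse [101]. Flows are therefore not kept alive, and the number of short-lived flows grows.
Contributions: 1. Propose an in-kernel flow bundling method that reduces measurement overhead in environments with many short-lived flows. 2. Verify that the measurement overhead (CPU load) stays sufficiently small even as the number of flows grows. -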
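40 Proposed method: The concept of flow bundling. Clients connect from ephemeral ports (53421, 32346, 48901, …) to a server service on listen port 80, producing Flow 1, Flow 2, …, Flow N. All of these flows between the client service and the server service on port 80 are regarded as a single bundled flow. -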
41 Proposed method: Bundling different flows inside the kernel. Flows are merged by excluding the ephemeral port, and their data is processed statistically (summed in this example).
Flow 1: "src_ip": "192.168.1.101", "src_port": 53421, "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 2000, "send_bytes": 500
Flow 2: "src_ip": "192.168.1.101", "src_port": 61390, "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 1000, "send_bytes": 100
Bundle 1: "src_ip": "192.168.1.101", "dst_ip": "192.168.1.200", "dst_port": 80, "recv_bytes": 3000, "send_bytes": 600 -
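The merge on this slide can be sketched in a few lines of Python. This is only an illustration in user space, not the in-kernel eBPF implementation: the function name `bundle_flows`, the `flows` counter, and the normalized key `send_bytes` are assumptions made for the example.

```python
from collections import defaultdict


def bundle_flows(flows):
    """Merge flows that share (src_ip, dst_ip, dst_port) and sum their counters.

    The ephemeral src_port is deliberately left out of the key, so every
    short-lived connection to the same listening port lands in one bundle.
    """
    bundles = defaultdict(lambda: {"recv_bytes": 0, "send_bytes": 0, "flows": 0})
    for flow in flows:
        key = (flow["src_ip"], flow["dst_ip"], flow["dst_port"])
        bundle = bundles[key]
        bundle["recv_bytes"] += flow["recv_bytes"]
        bundle["send_bytes"] += flow["send_bytes"]
        bundle["flows"] += 1  # hypothetical extra counter, not on the slide
    return dict(bundles)


# The two flows from the slide, with the key name normalized to send_bytes.
flows = [
    {"src_ip": "192.168.1.101", "src_port": 53421,
     "dst_ip": "192.168.1.200", "dst_port": 80,
     "recv_bytes": 2000, "send_bytes": 500},
    {"src_ip": "192.168.1.101", "src_port": 61390,
     "dst_ip": "192.168.1.200", "dst_port": 80,
     "recv_bytes": 1000, "send_bytes": 100},
]
bundles = bundle_flows(flows)
```

With the slide's two flows, the single resulting bundle carries the summed counters (3000 received, 600 sent), which matches Bundle 1.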
42 Proposed method: Implementation overview. The kernel is extended with Linux extended Berkeley Packet Filter (eBPF). Measurement programs 1 to 4 are attached with Linux kprobes to kernel functions in the socket layer: tcp_v4_connect(), inet_csk_accept(), tcp_sendmsg(), tcp_cleanup_rbuf() (UDP omitted). Each program updates a hash map structure whose keys are {src_addr, dst_addr, listen_port, proto, pid} and whose values are {counts, recv_bytes, send_bytes, …}. The agent in user space periodically fetches and deletes multiple items with batch operations through system calls. -
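43 Proposed method: Implementation, concurrency control inside the kernel (item raised in the preliminary examination). To protect each memory region, synchronization mechanisms with small overhead are used.
Kernel-managed region (kernel functions): the measurement programs only read socket structures and similar data, without locking.
eBPF-managed region (hash map): updating the value of an entry uses atomic instructions (fetch and add); inserting a map entry takes a fine-grained (per-bucket) spinlock; the agent side involves a coarse-grained (whole-map) spinlock.
The overhead was confirmed to be sufficiently small in the experiments, but it may become non-negligible in environments with many CPU cores. -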
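44 Evaluation: Setup.
Benchmark (client and server hosts, each running an agent): an echo client and server generate TCP or UDP communication load. Each trial lasts 30 seconds, and the batch fetch interval is 1 second.
Baselines: existing instrumentation methods that target the socket layer in the kernel: the streaming method and the in-kernel aggregation method.
Evaluation items: 1. CPU load as the number of short-lived flows grows. 2. CPU load in a 1-to-N communication environment. 3. RTT overhead on the application. -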
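45 Evaluation 1: CPU load as short-lived TCP flows increase. Proposed method: CPU utilization stays at or below 2.2%. Streaming method: CPU utilization rises to as much as 21.3%. In-kernel aggregation method: CPU utilization rises to as much as 11.5%. Similar results were obtained in the experiment where the UDP message rate increases. -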
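46 Evaluation 2: CPU load as the number of destinations increases. When destinations with different listening ports increase, the bundling ratio drops, so the CPU load of the proposed method should increase…?
Bundling ratio: R = 1 − B/T, where B is the number of bundled flows (determined by the number of destinations) and T is the total number of flows (fixed). Plotted settings: R = 0.90, 0.94, 0.98; T = 10k and T = 100k.
Result: CPU utilization stayed at or below 2% as the number of services (destinations) increased. If B is increased until R = 0, the advantage over existing methods disappears. -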
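As a small worked example of the bundling ratio R = 1 − B/T, the helper below evaluates it for a few values. The function name and the specific values of B are illustrative assumptions; only the formula and T = 10k come from the slide.

```python
def bundling_ratio(bundles: int, total_flows: int) -> float:
    """Return R = 1 - B/T, the share of flows absorbed by bundling.

    R close to 1 means many flows collapse into few bundles; R = 0 means
    every flow is its own bundle and bundling brings no reduction.
    """
    if total_flows <= 0:
        raise ValueError("total_flows must be positive")
    if not 0 <= bundles <= total_flows:
        raise ValueError("bundles must be between 0 and total_flows")
    return 1 - bundles / total_flows


# With T = 10,000 total flows, these hypothetical bundle counts give the
# three ratios named on the slide (up to floating-point rounding).
for b in (1000, 600, 200):
    print(b, round(bundling_ratio(b, 10_000), 2))
```

The degenerate case B = T gives R = 0, which is the point at which the slide says the advantage over existing methods disappears.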
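47 Evaluation 3: Latency overhead added by instrumentation (short-lived TCP connections and UDP). Against an RTT of 300 μs, the overhead of the proposed method is at most 5.8 μs, an increase of at most 2% compared with no instrumentation. The streaming method showed the smallest RTT. -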
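48 Part 2, Contribution 1: Summary (Chapter 3, path-oriented, instrumentation). Problem: higher network connection rate, which increases instrumentation load. Proposal: an instrumentation method based on efficient aggregation inside the OS kernel. Evaluation: the proposed method kept CPU utilization at or below 2.2% as short-lived flows increased, and the RTT overhead was at most 2% higher than with no instrumentation. Use case: continuous, automatic construction of network call graphs. -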
50 Part 3 opens with the overview diagram, highlighting Chapter 4 (storage): more metrics, which increases ingestion load; a method for tiering memory-based and disk-based DBs and migrating data between tiers. Y. Tsubouchi et al., HeteroTSDB: a high-performance time-series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021. -
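52 Background: Workload of metrics storage. The ingestion workload for metrics is proportional to two dimensions.
(1) Resolution along the time axis (generally in the range of 1 to 60 seconds).
(2) The number of metrics, for example cpu_seconds{instance=host1,…}, memory_total_bytes{instance=host1,…}, http_requests_count{instance=host1,…}, …, http_requests_count{instance=host99,…}.
The number of metrics also grows when a metric is broken down more finely: cpu_seconds{instance=host1,…} decomposes into cpu_seconds{instance=host1,mode=user,core_no=1,…}, cpu_seconds{instance=host1,mode=system,core_no=1,…}, cpu_seconds{instance=host1,mode=user,core_no=2,…}, and so on. -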
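53 Background: Scalability requirements for metrics storage.
Ingestion throughput: Slack 12M datapoints/sec, Meta 700M datapoints/min, LYCorp 12.5M datapoints/min [19] [32] [112]. Common solutions: ingestion on multiple horizontally partitioned nodes; efficient writes to in-memory data structures.
Storage capacity: Slack 12 TB/day, ByteDance 10 TB/day, LYCorp 2.7 TB/day, Mackerel 460 days [19] [35] [69] [108]. Common solutions: data compression techniques; long-term retention on media with low storage cost (SSD/HDD). -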
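54 Related work: Classification of existing methods.
Time-series DB management system approach (TSDBMS): Client, DBMS. A DBMS optimized for time-series data processing (Prometheus, Gorilla, InfluxDB, and others). Compression: encoding that exploits the regular spacing of timestamps and the temporal locality of values. Structure: optimized for time series on the basis of the LSM tree used in disk-based KVSs; files are managed per fixed time window.
Time-series data-oriented application approach (TSDA): Client, App, DBMS. Built on top of a KVS, which is a general-purpose DB system (OpenTSDB, KairosDB). KVS: a DBMS that stores, retrieves, and manages data as a set of key-value pairs.
Cited: [31,33,35,79], [29,30]. -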
55 Related work: Classification of existing methods (same comparison as slide 54, with the following points added). KVSs are widely used, and KVS services are widely offered as "DB as a Service" to automate DB operations. Considering operational complexity, this work focuses on the TSDA approach. Because the TSDA approach is loosely coupled, it can offer users a choice of DBMS implementation. -
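56 Related work: Ingestion efficiency of KVSs. An increase in the number of metrics means an increase in the number of KVS keys, so the efficiency of index lookups when appending data becomes the issue.
Memory-based KVS: memory offers efficient random access, so a hash table is used (O(k)); no data is kept on disk, except for the commit log.
Disk-based KVS: writes go to a sorted structure in memory such as a balanced tree or skip list (O(log n)), which is then flushed to files on disk; because the data is sorted, disk access is efficient. -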
57 Related work: Ingestion efficiency of KVSs (same diagram as slide 56, with the drawbacks added). ✘ Memory-based KVS: memory is expensive per unit of capacity, so it is unsuited to long-term retention. ✘ Disk-based KVS: write efficiency drops when the number of keys is large. -
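58 Summary of Contribution 2: Achieving both ingestion efficiency and long-term retention.
TSDBMS: operational complexity: no loose coupling; ingestion efficiency: optimized for storing time-series data; storage capacity: time-series compression and similar techniques.
TSDA, disk-based: operational complexity: loosely coupled; ingestion efficiency: structure designed for efficient disk access; storage capacity: stored on SSD/HDD.
TSDA, memory-based: operational complexity: loosely coupled; ingestion efficiency: optimized for memory, which offers efficient random access; storage capacity: stored in memory.
TSDA, proposed method: operational complexity: loosely coupled; ingestion efficiency: optimized for memory, which offers efficient random access; storage capacity: only old data is stored on SSD/HDD.
Contributions: designed an architecture that combines the strengths of memory-based and disk-based stores within the TSDA approach, which has low operational complexity; achieved 3.98 times the ingestion performance of the disk-based approach. -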
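59 Proposed method: HeteroTSDB. Components: Client, App, memory-based KVS, Flusher, disk-based KVS.
Memory-based KVS: a memory buffer that holds data with recent timestamps; fast ingestion based on a hash table.
Disk-based KVS: disk storage that holds data with old timestamps; lower long-term retention cost by storing on SSD/HDD.
Data is migrated from the memory-based KVS to the disk-based KVS, which achieves both properties. -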
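60 Proposed method: Tiering a memory-based KVS and a disk-based KVS. Data points for series such as cpu_seconds{…}, memory_total_bytes{…}, http_requests_count{…} arrive at M ingestions/s. The memory-based KVS (hash table, O(k)) performs a lookup and insert per point and accumulates d points per series. The disk-based KVS (balanced tree or skip list, O(log n)) receives the d data points as one batched write, which reduces the number of lookups to M / d ingestions/s. -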
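61 Proposed method: Timer-based migration. Moving data with batch processing would skew the ingestion load on the disk-based KVS. Instead, a TTL (Time To Live) is set per key (for example 3600 seconds), and the key's data is moved from the memory-based KVS to the disk-based KVS when the TTL reaches 0. Jitter is added when the TTL is set so that the migration timing is spread out. Example remaining TTLs for cpu_seconds{…}, memory_total_bytes{…}, http_requests_count{…}: 3511, 934, 298. -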
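The TTL-with-jitter idea can be sketched as follows. This is a minimal in-process illustration, not HeteroTSDB's implementation (which, per the evaluation slide, uses Redis and Cassandra): the class `MemoryTier`, the uniform jitter distribution, the 300-second jitter bound, and the plain dict standing in for the disk-based KVS are all assumptions.

```python
import random


class MemoryTier:
    """Toy memory tier: buffers points per key and migrates a key on expiry."""

    def __init__(self, base_ttl, max_jitter, seed=0):
        self.base_ttl = base_ttl        # e.g. 3600 seconds, as on the slide
        self.max_jitter = max_jitter    # hypothetical upper bound on jitter
        self.rng = random.Random(seed)  # seeded so the sketch is repeatable
        self.points = {}                # key -> list of (timestamp, value)
        self.expires = {}               # key -> absolute expiry time

    def ingest(self, key, ts, value, now):
        if key not in self.points:
            # The TTL is set once, when the key first appears; the jitter
            # spreads expiries so keys do not all migrate at the same moment.
            jitter = self.rng.uniform(0, self.max_jitter)
            self.points[key] = []
            self.expires[key] = now + self.base_ttl + jitter
        self.points[key].append((ts, value))

    def migrate_expired(self, now, disk):
        """Move every expired key to `disk` as one batched write per key."""
        moved = [k for k, t in self.expires.items() if t <= now]
        for key in moved:
            disk.setdefault(key, []).extend(self.points.pop(key))
            del self.expires[key]
        return moved


tier = MemoryTier(base_ttl=3600, max_jitter=300, seed=1)
disk = {}  # stands in for the disk-based KVS
for name in ("cpu_seconds", "memory_total_bytes", "http_requests_count"):
    tier.ingest(name, ts=0, value=1.0, now=0)
```

Calling `tier.migrate_expired(now, disk)` periodically then moves keys one by one as their individual deadlines pass, instead of all keys at a shared boundary.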
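62 Evaluation: Setup (load generation client, DB servers).
Benchmark: an existing load generation tool [113] reproduces the load. Each trial lasts 30 minutes, and the TTL of the proposed method is set to 10 minutes.
Baseline: KairosDB, which follows the TSDA approach, with Cassandra as its disk-based KVS.
Proposed method: memory KVS: Redis; disk KVS: Cassandra.
Evaluation items: 1. Ingestion efficiency. 2. Ingestion efficiency as the number of metrics increases. 3. Migration performance between KVSs in the proposed method. -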
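63 Evaluation 1: Ingestion efficiency. Plot: ingestion throughput against the number of hosts (1 to 8), with the number of metrics fixed at 1M. Blue: KairosDB; orange: proposed method. The proposed method (HeteroTSDB) reaches 420k datapoints/s, 3.98 times the baseline. Translated to Slack's workload of 12M datapoints/s, the proposed method would need 229 hosts and KairosDB would need 915 hosts. -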
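The host count for the proposed method can be reproduced with simple arithmetic. The sketch below assumes that the 420k datapoints/s figure was measured with 8 hosts and that throughput scales linearly with the number of hosts; neither assumption is stated explicitly on the slide, and the function name is illustrative.

```python
import math


def hosts_needed(target_rate, measured_rate, measured_hosts):
    """Estimate hosts for target_rate, assuming linear scaling per host."""
    per_host = measured_rate / measured_hosts
    return math.ceil(target_rate / per_host)


# 12M datapoints/s target, 420k datapoints/s assumed to be measured on 8 hosts.
proposed_hosts = hosts_needed(12_000_000, 420_000, 8)
print(proposed_hosts)
```

Under these assumptions the result agrees with the 229 hosts quoted for the proposed method.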
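64 Evaluation 2: Ingestion efficiency as the number of metrics increases. Plot: ingestion throughput against the number of metrics (100 to 1,000,000), with the overall data transmission rate fixed. Blue: KairosDB; orange: proposed method. The advantage ranges from 2.32 times to 3.58 times, so scalability with respect to the number of metrics is higher than the baseline. This experiment did not clearly identify index lookups as the bottleneck; more detailed profiling is needed in future work. -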
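65 Evaluation 3: Migration performance between KVSs in the proposed method. Plot against elapsed time (0 to 1800 seconds), with the number of hosts fixed at 1 and the number of metrics fixed at 1M. Blue: migration throughput (/s); orange: memory usage of the memory-based KVS (MB). When the TTLs fire and migration starts, migration throughput rises immediately and the memory usage of the memory KVS falls. The average migration throughput (52k/s) exceeds the ingestion throughput into the memory KVS (51k/s), which shows that the disk KVS is not the bottleneck. -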
66 Part 3, Contribution 2: Summary (Chapter 4, time-oriented, storage). Purpose: achieve both ingestion efficiency and long-term data retention of one year or more; considering operational complexity, realize the proposed method on top of existing KVSs. Evaluation 1: 3.98 times the baseline performance when ingesting one million metrics. Evaluation 2: better scalability than the baseline over the range of 100 to one million metrics. -
68 Part 4 opens with the overview diagram, highlighting Chapter 5 (mining): more metrics, which increases execution time and lowers accuracy; a preprocessing method that automatically removes metrics unrelated to the failure. Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024. -
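69 Background: Automating fault localization with machine learning [94,96,124-136]. Flow: 1. failure detection triggers automated fault localization; 2. metrics from storage are the input; 3. the output to the operator is a ranking of metrics that indicate the cause, for example 1. memory_total_bytes{instance=host4,…} 2. disk_write_io{instance=host4,…} 3. net_transmit_bytes{instance=host1,…} 4. … The expected execution time is on the scale of minutes. -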
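70 Background: Automating fault localization with machine learning [94,96,124-136] (same flow as slide 69). Machine learning: few datasets contain large numbers of pairs of metrics and root causes, so unsupervised learning is mainly used. Typical methods compute an anomaly score per metric and capture anomaly propagation between metrics. -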
71 Background: The performance degradation problem in fault localization [94,96,124-136] (same flow as slide 69). As the number of metrics grows, accuracy and execution time both degrade [23,24]. -
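72 Background: The performance degradation problem in fault localization [94,96,124-136] (same flow as slide 69). As the number of metrics grows, accuracy and execution time both degrade [23,24]. Feature reduction, placed before fault localization, removes the metrics that act as noise [23,87]. -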
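73 Background: Problem definition of feature reduction (ours). Fig. 5.2: Three types of metrics on anomaly propagation for a failure. The figure models how anomalies propagate at metric granularity after a fault occurs, starting from the root cause. M_A: metrics on which the effect appears directly. M_B: metrics on which the effect appears indirectly. M_C: unaffected metrics. Problem: once a failure is detected, identify M_A ∪ M_B as quickly as possible. The input is the data from a fixed time range before the failure detection (typically 30 to 60 minutes). -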
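74 Background: Existing feature reduction and its issues.
Anomaly-based reduction [23,124,131]: removes time series that show no anomaly. Issue: it can pick up anomalies outside the failure period, so time series that should have been removed remain (false positives).
Redundancy-based reduction [87,129,133]: removes time series with high correlation or shape similarity. Issue: cause metrics (M_A) tend to resemble each other, so they can be removed by mistake (false negatives). -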
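75 Background: Existing feature reduction and its issues (continued). Anomaly-based reduction removes time series that show no anomaly; redundancy-based reduction removes duplicates among time series with high correlation or shape similarity. Both handle only the anomaly or redundancy that appears in some of the metrics (local). The goal is to capture the relevance to the failure of the system as a whole (global). -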
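76 Background: Observation and assumption (excerpt from Fig. 5.1: Change points in root fault metric). Observation: change points caused by a fault appear close to each other in time, around the time the fault occurs. Assumption: the range on the time axis where change points are most concentrated is the failure period. In this way a global failure is captured from local features. -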
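77 Overview of the contribution. This research proposed a feature reduction method that captures the global failure. The proposed method achieved the best accuracy and improved end-to-end accuracy and execution efficiency. It is the first study to compare heterogeneous feature reduction methods quantitatively.
Comparison table. Methods: FluxInfer-AD, BIRCH, K-S test, NSigma, PairCorr, k-Shape, HDBS+SBD, MetricSifter. Type: anomaly-based or redundancy-based. Learning type: supervised (a normal period must be specified) or unsupervised. Features captured: Euclidean distance between the normal and anomalous periods, changes in value and outliers, Pearson correlation, shape similarity, and (for MetricSifter) change points. Globality: ✘ for the existing groups, ✔ for MetricSifter. -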
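79 Proposed method: How does it work? Fig. 5.5: An example of feature reduction using the MetricSifter framework.
STEP 1: For each time series, detect candidate change points caused by the fault.
STEP 2: Split the time axis into segments on the basis of the distribution of change point times.
STEP 3: Select the segment with the highest density. -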
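80 Proposed method, STEP 1: Change point detection on univariate time series. From the existing framework for change point detection [152], the options suited to this domain are selected.
(1) Cost function (the kind of change point to detect): L2 model (mean shift).
(2) Search method (the algorithm that searches for change points): the Pelt method, which finds the exact solution and speeds up the search by pruning under certain conditions.
(3) Penalty term (limits the number of change points detected): determined heuristically on the basis of BIC, with an additional ad hoc coefficient ω of our own. -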
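81 Proposed method, STEP 2/3: Density estimation of change points and splitting of the time axis. Fig. 5.6: An example of segmentation.
(1) Density estimation: kernel density estimation (KDE) produces a discrete density distribution.
(2) Segmentation: boundaries are drawn at the local minima (the figure is split into 10 segments).
(3) The segment with the highest density is selected. -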
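STEP 2 and STEP 3 can be sketched in plain Python as below. This is a minimal illustration rather than the MetricSifter implementation: the Gaussian kernel, the fixed bandwidth, the integer time grid, and the choice to rank segments by the number of change points they contain are all assumptions made for the example.

```python
import math


def gaussian_kde(points, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `points` on `grid`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points) / norm
        for g in grid
    ]


def local_minima(grid, density):
    """Return grid positions where the density has a local minimum."""
    return [
        grid[i]
        for i in range(1, len(grid) - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]


def densest_segment(change_points, length, bandwidth=2.0):
    """Split [0, length) at density minima and pick the fullest segment."""
    grid = list(range(length))
    density = gaussian_kde(change_points, grid, bandwidth)
    bounds = [0] + local_minima(grid, density) + [length]
    segments = list(zip(bounds[:-1], bounds[1:]))
    # Assumption: "highest density" is approximated by the change point count.
    return max(
        segments,
        key=lambda seg: sum(seg[0] <= c < seg[1] for c in change_points),
    )


# Change point times pooled over all metrics: a cluster around t = 40..44
# plays the role of the failure period, with a few unrelated changes.
segment = densest_segment([5, 6, 40, 41, 42, 43, 44, 90], length=100)
```

Metrics whose change points fall inside the selected segment would then be kept as failure-related, and the rest removed.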
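82 Evaluation: Setup.
Datasets (132 in total): synthetic: simulated failures; empirical: failures reproduced by injecting faults into two standard benchmark applications.
Baselines: the group of anomaly-based reduction methods and the group of redundancy-based reduction methods.
Evaluation items: 1. Accuracy of feature reduction on its own. 2. End-to-end accuracy and execution time. 3. Parameter sensitivity and ablation study.
Metrics: for feature reduction, standard classification metrics (Recall / Specificity / Balanced Accuracy); for end-to-end, the proportion of ranking outputs that contain the correct answer (a standard metric). -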
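For reference, the three classification metrics named above can be computed as follows. The mapping of "positive" to a failure-related metric that should be kept is an assumption for this sketch, and the counts in the example are made up.

```python
def recall(tp: int, fn: int) -> float:
    """Share of failure-related metrics that were kept."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of unrelated metrics that were removed."""
    return tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of recall and specificity, robust to class imbalance."""
    return (recall(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical outcome: 10 failure-related metrics (8 kept, 2 lost) and
# 100 unrelated metrics (70 removed, 30 wrongly kept).
score = balanced_accuracy(tp=8, fn=2, tn=70, fp=30)
```

Balanced accuracy is a natural fit here because failure-related metrics are usually a small minority, so plain accuracy would reward removing nearly everything.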
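84 Evaluation: Combinations of feature reduction and fault localization methods. All combinations were tested.
Feature reduction: proposed method; group of anomaly-based reduction methods; group of redundancy-based reduction methods; None.
Automated fault localization: Random Selection; CallGraph + PageRank; PC + PageRank; PC + HT; LiNGAM + PageRank; LiNGAM + HT; RCD. -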
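85 Evaluation 2: End-to-end evaluation (synthetic), excerpt. Panels shown: PC+HT and random selection, each with a line marking the median accuracy.
Overall evaluation: average top-5 accuracy over the combinations with all fault localization methods.
Method | Accuracy | Note
Ideal | 0.344 | ideal
MetricSifter | 0.299 | best
NSigma | 0.241 | runner-up
None | 0.175 | without feature reduction
MetricSifter achieves accuracy close to the ideal method. -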
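86 Evaluation 2: End-to-end evaluation (empirical), SS-small, 64 metrics. Box plots: top-5 accuracy; lines: execution time; a line marks the median accuracy. Only some representative combinations are shown. Top-5 accuracy is best with MetricSifter, and its execution efficiency is higher than anomaly-based reduction. Execution time is best with redundancy-based reduction (HDBS-SBD / HDBS-R), but its accuracy is the lowest. -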
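87 Evaluation 2: Details on empirical data (large scale, more than 100 metrics). Datasets: SS-medium, SS-large, TT-small, TT-medium, with 184, 1312, 383, and 1349 metrics. Only one fault localization method (RCD) finished within a realistic time (3600 seconds or less). The others did not complete in a realistic time because their fault localization algorithms are not parallel. With more than 1000 metrics, accuracy was very low in every case. -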
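88 Evaluation 3: Parameter sensitivity and ablation study. Blue: full MetricSifter; orange: MetricSifter with STEP 1 only. When the parameters are appropriate, the difference in accuracy is small; this is thought to be because change point detection is too accurate on clean synthetic data. When the STEP 1 (change point detection) parameter ω is low, accuracy drops, but STEP 2/3 improves it. -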
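89 Discussion: Cross-disciplinary applicability (item raised in the preliminary examination).
Conditions for applying the proposed method:
1. The effect can be detected as change points in time-series data.
2. The effect of an anomaly propagates within the system (nervous systems, power grids, virus infection, climate, and so on).
3. The propagation time is reasonably short and its variance is small.
Problems of the same form in fields other than information and communication:
- Robotics: analysis of sensor data from machines (temperature, vibration, current, pressure).
- Space engineering: monitoring of satellite systems (high-dimensional data with hundreds to thousands of variables).
- Medicine: analysis of biosignals to detect sudden changes in a patient's condition.
Climatology may be out of reach because propagation takes from days to months. Cited: [173], [174], [175], [140]. -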
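90 Part 4, Contribution 3: Summary (Chapter 5, time-oriented, mining). Problem: more metrics, which increases execution time and lowers accuracy. Proposal: a preprocessing method that automatically removes metrics unrelated to the failure.
- The first study to evaluate feature reduction methods by quantitative comparison.
- Proposed a method that captures a global failure from a set of local change points.
- Synthetic data: best accuracy; end-to-end accuracy improved by 24%.
- Empirical data: improved end-to-end accuracy, execution efficiency, or both. -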
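92 Conclusion: Telemetry workload scaling. Diagram: users, the Internet, cloud applications, operators, and the telemetry system (instrumentation, storage, mining), where resource consumption rises as workloads grow.
Contribution 1: a low-overhead instrumentation method based on bundling network flows inside the kernel (CPU utilization at or below 2.2%, RTT overhead at most 6 μs).
Contribution 2: a tiered architecture of heterogeneous KVSs that achieves both ingestion efficiency and long-term retention (up to 3.98 times the throughput of the conventional approach).
Contribution 3: a feature reduction method that focuses on the concentration of change points in failure-related metrics (average accuracy +4.5% and average execution time improved by 45-52% over conventional methods). -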
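93 Conclusion: Design guidelines for telemetry systems (item raised in the preliminary examination). Basic principle: data reduction such as sampling, aggregation, and feature reduction should be applied where context is rich (instrumentation and mining).
Instrumentation: application context is available (processes, sockets, transactions, and so on). Contribution 1 aggregates on the basis of sockets.
Storage: no data reduction; aim to improve the efficiency with which computing resources are used.
Mining: operational context is available (failure alerts and so on). Contribution 3 reduces features on the basis of failure occurrence. -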
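94 Conclusion: Significance of this research.
Academic contribution:
- Defined an original problem called "telemetry workload scaling", with "keeping operational complexity low" as a constraint.
- Classified telemetry systems into three layers, organized the challenges of each layer, and presented technical proposals to solve them.
Social significance:
- As DX accelerates and online services scale up, the workload of telemetry systems will keep growing.
- With finite computing and human resources, the processing efficiency of telemetry workloads must be improved while operational complexity is reduced.
- This research is expected to contribute to reducing operators' effort and improving service reliability. -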
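95 Social implementation of this research.
Instrumentation, Contribution 1: published as a Go library (*2); adoption is under consideration at the author's current workplace.
Storage, Contribution 2: already applied as the architecture of a server monitoring SaaS (*1).
Mining, Contribution 3: published as a Python library (*3).
*2 and *3 have no production use yet, so outreach activities are planned.
*1 https://mackerel.io/ja/blog/entry/weekly/20180126
*2 https://github.com/yuuki/go-conntracer-bpf
*3 https://github.com/ai4sre/metricsifter -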
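96 Future prospects.
1. From Collect-First to Use-First (overall optimization of the three telemetry layers): research on closed-loop systems that feed back data usage patterns and adapt automatically so that only the necessary data is collected.
2. Failure management with LLMs (workload scaling of mining with new technology): for LLM-based automation of fault localization, research on generating "failure snapshots" through reduction and compression that respect the upper limit on prompt length.
3. Telemetry for distributed deep learning infrastructure (systems other than cloud applications): research on new telemetry systems for optimizing distributed training workloads and improving fault tolerance in large GPU clusters. -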
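97 Research achievements: Awards
- IPSJ Internet and Operation Technology Symposium 2020, Best Paper Award: Y. Tsubouchi, H. Tsuruta, M. Furukawa, TSifter: a dimensionality reduction method for time-series data suited to rapid diagnosis of performance anomalies in microservices (in Japanese), December 2020.
- IPSJ Internet and Operation Technology Symposium 2020, Best Presentation Award: Y. Tsubouchi, TSifter (same paper), December 2020.
- IPSJ Yamashita SIG Research Award, fiscal 2020: Y. Tsubouchi, Transtracer: automatic tracing of dependencies among processes by monitoring the endpoints of TCP/UDP communication in distributed systems (in Japanese), 2020.
- IPSJ Internet and Operation Technology Symposium 2019 (IOTS2019), Best Paper Award: Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer (same title), December 2019.
- IPSJ Internet and Operation Technology Symposium 2019 (IOTS2019), sponsor award (C.O.Conv Award): Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer (same title), December 2019. -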
98 Research achievements: Journals and international conferences
Journals:
- (Contribution 1) Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, March 2022.
- (Contribution 2) Y. Tsubouchi et al., HeteroTSDB: a high-performance time-series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.
- (Contribution 3) Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.
- Y. Tsubouchi et al., High-throughput SHA-1 computation with SSE instructions for deduplication storage (in Japanese), IEICE Transactions D, 96(10), pp.2101-2109, October 2013.
International conferences:
- (Contribution 1) Y. Tsubouchi, M. Furukawa, R. Matsumoto, Transtracer: Socket-Based Tracing of Network Dependencies among Processes in Distributed Applications, The 1st IEEE International COMPSAC Workshop on Advanced IoT Computing (AIOT 2020), July 2020.
- (Contribution 2) Y. Tsubouchi, A. Wakisaka, K. Hamada, M. Matsuki, H. Abe, R. Matsumoto, HeteroTSDB: An Extensible Time Series Database for Automatically Tiering on Heterogeneous Key-Value Stores, The 43rd Annual IEEE International Computers, Software & Applications Conference (COMPSAC), pp. 264-269, July 2019. -
99 ݚڀۀɹࠃγϯϙδϜʢࠪಡʣ ɾ ʢߩݙ̏ʣ௶༎थ, ాതจ, ݹխେ, TSifter: ϚΠΫϩαʔϏεʹ͓͚Δੑೳҟৗͷਝͳஅʹ͍ͨ࣌ ܥྻσʔλͷ࣍ݩݮख๏, ใॲཧֶձΠϯλʔωοτͱӡ༻ٕज़γϯϙδϜจू,
2020, 9-16 (2020- 11-26), 202012݄. ɾ ௶༎थ, ੨ࢁਅ, MeltriaɿϚΠΫϩαʔϏεʹ͓͚ΔҟৗݕɾݪҼੳͷͨΊͷσʔληοτͷಈతੜ γεςϜ, ใॲཧֶձΠϯλʔωοτͱӡ༻ٕज़γϯϙδϜจू, 2021, 63-70 (2021-11-18), 202111݄. ɾ ྛ༑Ղ, দݪࠀ, ݡ, ௶༎थ, Situation Awarenessͱೝ৺ཧֶʹͱ͍ͮͨϚΠΫϩαʔϏεܕγες Ϝ͚ࢹμογϡϘʔυͷઃܭ, ใॲཧֶձΠϯλʔωοτͱӡ༻ٕज़γϯϙδϜจू, 2021, 97-98 (2021-11-18), 202112݄. ɾ ాതจ, ௶༎थ, ࢄγεςϜͷੑೳҟৗʹର͢Δػցֶशͷղऍੑʹجͮ͘ݪҼஅख๏, ใॲཧֶձ Πϯλʔωοτͱӡ༻ٕज़γϯϙδϜจू, 2021, 24-31 (2021-11-18), 202111݄. ɾ ʢߩݙ̍ʣ௶༎थ, ݹխେ, দຊ྄հ, Transtracer: ࢄγεςϜʹ͓͚ΔTCP/UDP௨৴ͷऴͷࢹʹΑ Δϓϩηεؒґଘؔͷࣗಈ, Πϯλʔωοτͱӡ༻ٕज़γϯϙδϜจू, 2019, 64-71 (2019-11-28), 201912݄. ɾ ʢߩݙ̎ʣ௶༎थ, ࡔேਓ, ᖛా݈, দխ, Ѩ෦ത, দຊ྄հ, HeteroTSDB: ҟछࠞ߹Ωʔ バ ϦϡʔετΞ Λ༻͍ͨࣗಈ֊ԽͷͨΊͷ࣌ܥྻ デ ʔλ ベ ʔεΞʔΩςΫνϟ, ใॲཧֶձΠϯλʔωοτͱӡ༻ٕज़γϯ ϙδϜจू, 2018, 7-15 (2018-11-29), 201812݄. -
100 ݚڀۀɹࠃձٞʢࠪಡͳ͠ʣ ɾ ྛ༑Ղ, দݪࠀ, ݡ, ௶༎थ, ϚΠΫϩαʔϏεܕγεςϜͷࢹʹ͓͚ΔμογϡϘʔυUIઃܭʹىҼ ͢Δঢ়گೝࣝͷӨڹ, No.2022-IOT-56,
Vol.38, pp.1-8, 20223݄. ɾ দຊ྄հ, ௶༎थ, ΫϥΠΞϯτϓϩηεͷݖݶใʹجͮ͘TCPΛհͨ͠ಁաతͳݖݶํࣜͷઃܭ, ใॲཧֶձݚڀใࠂΠϯλʔωοτͱӡ༻ٕज़ʢIOTʣ, No.2020-IOT-49, Vol.11, pp.1-6, 20205݄. ɾ ྛ༑Ղ, ҏా࿇, দݪࠀ, ݡ, ௶༎थ, দຊ྄հ, ಈతదԠੑΛ࣋ͭࢄγεςϜΛରͱͨ͠γεςϜ ঢ়ଶՄࢹԽख๏ͷݕ౼, ใॲཧֶձݚڀใࠂΠϯλʔωοτͱӡ༻ٕज़ʢIOTʣ, No.2020-IOT-48, Vol.22, pp.1-8, 20203݄. ɾ ௶༎थ, ݹխେ, দຊ྄հ, ݸମܕσʔληϯλʔΛࢦͨ͠ωοτϫʔΫαʔϏεؒґଘؔͷࣗಈ ͷߏ, ϚϧνϝσΟΞɺࢄɺڠௐͱϞόΠϧʢDICOMO2019ʣγϯϙδϜ, 6A-2, pp. 1169-1174, 2019 7݄. ɾ ௶༎थ, দຊ྄հ, ݸମܕσʔληϯλʔʹ͓͚ΔࢄڠௐΫΤϦΩϟογϡߏ, ใॲཧֶձݚڀใࠂ Πϯλʔωοτͱӡ༻ٕज़ʢIOTʣ, No.2019-IOT-45, Vol.14, pp.1-7, 20195݄. ɾ দຊ྄հ, ௶༎थ, ٶԼ߶ี, ࢄܕσʔληϯλʔOSΛࢦͨ͠ϦΞΫςΟϒੑΛ࣋ͭίϯςφ࣮ߦج൫ٕ ज़, ใॲཧֶձݚڀใࠂΠϯλʔωοτͱӡ༻ٕज़ʢIOTʣ, No.2019-IOT-45, Vol.12, pp.1-8, 20193݄. -
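Research summary: Scaling Telemetry Workloads in Cloud Applications
Toward telemetry systems that scale efficiently.
Background and purpose
1. Telemetry for cloud applications: applications are growing more complex, and operational management through telemetry is essential.
2. Growth of telemetry workloads: in telemetry systems, workloads are growing in each of the instrumentation, storage, and mining layers.
3. Telemetry workload scaling: the purpose is to scale efficiently in the face of problems such as growing consumption of computing resources, while taking operational complexity into account.
Challenges of growing telemetry workloads
1. Instrumentation: growing measurement overhead. When short-lived network communication increases, conventional instrumentation incurs a high cost for transferring measurements out of the instrumented OS kernel.
2. Storage: growing ingestion volume and long-term retention. As the number of metrics grows, it is hard to achieve both higher ingestion efficiency and retention of one year or more.
3. Mining: lower accuracy and execution efficiency of fault localization. As the number of metrics grows, existing feature reduction fails to capture the failure of the system as a whole, and false positives and false negatives increase.
Contributions: solving the challenges of workload growth in each layer
1. More efficient instrumentation [1]: better transfer efficiency by bundling TCP/UDP communication events inside the OS kernel.
2. More efficient ingestion and long-term retention [2]: tiering heterogeneous KVSs to achieve efficient index lookups together with placement on inexpensive storage.
3. Removing variables unrelated to the failure as preprocessing for fault localization [3]: feature reduction that exploits the concentration of change points of each time series on the time axis when a failure occurs, improving fault localization accuracy and time.
[1] Y. Tsubouchi, M. Furukawa, R. Matsumoto, Low Overhead TCP/UDP Socket-based Tracing for Discovering Network Services Dependencies, Journal of Information Processing (JIP), Vol.30, pp.260-268, March 2022.
[2] Y. Tsubouchi et al., HeteroTSDB: a high-performance time-series database based on automatic tiering across heterogeneous distributed KVSs (in Japanese), IPSJ Journal, Vol.62, No.3, pp.818-828, March 2021.
[3] Y. Tsubouchi and H. Tsuruta, MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications, IEEE Access, Vol. 12, pp. 37398-37417, March 2024.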