工学としてのSRE再訪 / Revisiting SRE as Engineering
SRE NEXT 2024 IN TOKYO.
Yuuki Tsubouchi (yuuk1)
August 03, 2024
Transcript
-
Revisiting SRE as Engineering (工学としてのSRE再訪)
Yuuki TSUBOUCHI / @yuuk1t
2024/08/03, SRE NEXT 2024 IN TOKYO
-
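Yuuki TSUBOUCHI / yuuk1
Senior Researcher, SAKURA internet Research Center
Technology Advisor, Topotal
Ph.D. Candidate, Graduate School of Informatics, Kyoto University
https://yuuk.io/
SRE NEXT speaking history: 2020 keynote (an overview of SRE and the research I am working on); 2022 open-call talk (AIOps research); 2023 open-call talk (SRE papers)
An SRE researcher
-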
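What the talk title implies: 「工学としてのSRE再訪」 (Revisiting SRE as Engineering)
“In Japanese, the loanword 「エンジニアリング」 often focuses on practical and technical aspects, while 「工学」 tends to emphasize more academic aspects. The English word Engineering, by contrast, has a broad meaning that covers both.” (by ChatGPT 4o)
The Japanese translation of “Engineering” is 「工学」, but…
・エンジニアリング: practical and technical aspects; practice
・工学: academic aspects
・Engineering: covers both
The focus of this talk
-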
“Web operations is a craft (技芸), not a science.” ※1
※1 John Allspaw, Jesse Robbins (eds.), Masanori Kado (trans.), ウェブオペレーション: サイト運用管理の実践テクニック, O'Reilly Japan.
-
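My own view of SRE, based on 5 years of engineering experience
“Thoughts on SRE, 2019”: https://blog.yuuk.io/entry/2019/thinking-sre
From craft to engineering
・SRE quantifies reliability and controls it at an appropriate level
・SRE and Software Engineering
・Figure III-1 ※1 layers the structure of reliability control and gives a bird's-eye view of the whole
※1 Beyer, Betsy, et al., “Site reliability engineering: How Google runs production systems.”, O'Reilly Media, Inc., 2016.
-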
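What has changed? The engineering character of work in the SRE context
Desired state versus current state: identify the gap, then decide what should be done (action 1, action 2, …) ※1
・Craft-like: means-oriented, subjective, local view
・Engineering-like: goal-oriented, objective, bird's-eye view of the whole
※1 ”開発者のためのシステムズエンジニアリング導入の薦め”, version 1.1, IPA, 2017.
-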
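What is this talk about?
Revisiting SRE as engineering
I want to discuss unsolved engineering problems (open challenges).
However, in the Japanese community the background of how SRE became an engineering discipline is rarely discussed.
※1 Sheena Iyengar, Yuko Sakurai (trans.), “THINK BIGGER 「最高の発想」を生む方法：コロンビア大学ビジネススクール特別講義”, NewsPicks, 2023.
-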
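Route of the revisit
1. Historical background of the turn to engineering
2. Examples of unsolved problems
3. Fields connected to SRE, explored through SREcon
Familiar problems and fundamental questions; questions that lead to undiscovered problems
Searching outside the box ※1
※1 Sheena Iyengar, Yuko Sakurai (trans.), “THINK BIGGER 「最高の発想」を生む方法：コロンビア大学ビジネススクール特別講義”, NewsPicks, 2023.
-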
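Web Operations (original published 2010) ※1
“Web operations is a craft (技芸), not a science.”
Definition: “A specialty within IT systems management that includes developing, operating, maintaining, tuning, and repairing web applications.”
“There is no 'right way' anywhere. All there is is the fact that it works for now, and the resolve to make it better next time.”
“You must deeply understand networking, routing, switching, firewalls, load balancing, high availability, disaster recovery, TCP/UDP services, NOC management, hardware specifications, multiple Unix environments, multiple web server technologies, caching technologies, database technologies, storage infrastructure, cryptography, algorithms, trend analysis, capacity planning, and more.”
※1 John Allspaw, Jesse Robbins (eds.), Masanori Kado (trans.), ウェブオペレーション: サイト運用管理の実践テクニック, O'Reilly Japan.
-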
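SRE Book (original published 2016) ※1
“In the introduction to [Bur99], I claimed that system administration was a form of human-computer engineering. Some reviewers strongly rejected this, saying that it had 'not yet reached the stage where it can be called engineering.' At the time, I felt that the field had lost its way, trapped in its own wizard culture, and could not see the direction it should go.”
Mark Burgess: Ph.D. in theoretical physics; author of CFEngine, a predecessor of tools such as Chef and Puppet; proponent of computer immunology and promise theory.
“Network and system administration is a branch of *engineering* that concerns the operational management of human–computer systems.” Quoted from [Bur99]: “Principles of Network and System Administration”, 1999.
※1 Betsy Beyer et al. (eds.), Ryuji Tamagawa (trans.), “SRE サイトリライアビリティエンジニアリング: Googleの信頼性を支えるエンジニアリングチーム”, O'Reilly Japan, 2016.
-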
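Software Reliability Engineering (ソフトウェア信頼性工学)
・A field that can be seen as a predecessor of SRE is “software reliability engineering”
・Unlike SRE, it focuses on the process before software ships
“This is because 'software engineering' has not fully evolved into a true engineering discipline. Rather than physical laws or rigorous procedures, human judgment and subjective preferences govern many decision-making processes in software engineering. The situation is particularly serious in software reliability engineering. Because reliability allows quality to be measured quantitatively and that quantity to be designed appropriately, it is probably the most important element to assert in any engineering field.” (translated) ※1
※1 Michael R. Lyu, “Software Reliability Engineering: A Roadmap”, FOSE, 2007.
-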
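USENIX LISA
・An international conference on system administration that started in 1987
・Merged into USENIX SREcon in 2022
“Because system administration overlaps with many other areas of computing, it is generally overlooked as a field in academia.” (translated) ※2
USENIX board: “No, we are academics, and there is nothing scientific, nothing research-worthy, and nothing interesting about system administration.” (translated) ※1
※1 Thomas Limoncelli, “LISA made LISA obsolete (That's a compliment!)”, 2022. https://www.usenix.org/publications/loginonline/lisa-made-lisa-obsolete-thats-compliment
※2 Mark Burgess, Computer Immunology, USENIX LISA 1998.
-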
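The Morning Paper on Operability
“When I started thinking about this talk, I had the impression that most papers (or at least most of the papers I had read) said little about operational issues. But when I looked back over my collection, I was surprised by how many papers do touch on issues related to operations.” (translated)
https://blog.acolyer.org/2016/09/21/the-morning-paper-on-operability/
-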
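“Motivated originally by familiarity ('as a software engineer, this is how I would want to invest my time to accomplish a set of repetitive tasks'), Site Reliability Engineering has become much more: a set of principles, a set of practices, a set of incentives, and a field of endeavor within the larger software engineering discipline.”
SRE Book (original 2016), Chapter 1 Introduction, 1.4 “The End of the Beginning”, excerpted ※1
※1 Betsy Beyer et al. (eds.), Ryuji Tamagawa (trans.), “SRE サイトリライアビリティエンジニアリング: Googleの信頼性を支えるエンジニアリングチーム”, O'Reilly Japan, 2016.
-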
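Some representative contributions that are (presumably) based on engineering thinking
・Budgeting reliability: a decision-making method based on error budgets
・Development productivity metrics: methods for quantifying development productivity, such as DORA and SPACE ※1
・Designing development organizations: an adaptive organization design method for software products, Team Topologies ※2
・Observability: a debugging method that reasons deductively from telemetry
※1 N. Forsgren, J. Humble, and G. Kim, “Accelerate: The science of lean software and devops: Building and scaling high performing technology organizations”, IT Revolution, 2018.
※2 Skelton, Matthew, and Manuel Pais, “Team Topologies: Organizing Business and Technology Teams for Fast Flow”, IT Revolution, 2019.
-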
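Trace data is almost never referenced ※1
・Finer-grained signals increase cost and overhead
・Sampling reduces the cost but misses edge cases
Desired state: wouldn't it be enough to trace only around the time a failure occurs?
・Keep a fixed amount of traces in a local in-memory buffer pool and, after detection, go back and collect the data from all nodes ※2
※1 Paige Cruz, “99.99% of Your Traces Are (Probably) Trash", SREcon24 Americas, 2024.
※2 Zhang, Lei et al, “The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems.”, NSDI, 2022.
-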
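The retroactive-collection idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation from the cited NSDI paper: `NodeBuffer`, `Span`, and `collect_retroactively` are hypothetical names, the buffer is a plain bounded deque, and "all nodes" is modeled as a list of in-process objects.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class Span:
    trace_id: str
    node: str
    name: str
    duration_ms: float


class NodeBuffer:
    """Hypothetical per-node in-memory buffer that keeps only the newest spans."""

    def __init__(self, node: str, capacity: int) -> None:
        self.node = node
        # A bounded deque silently evicts the oldest span when it is full.
        self._spans: Deque[Span] = deque(maxlen=capacity)

    def record(self, trace_id: str, name: str, duration_ms: float) -> None:
        self._spans.append(Span(trace_id, self.node, name, duration_ms))

    def lookup(self, trace_id: str) -> List[Span]:
        return [s for s in self._spans if s.trace_id == trace_id]


def collect_retroactively(buffers: List[NodeBuffer], trace_id: str) -> List[Span]:
    """After a symptom is detected, pull the matching spans from every node."""
    collected: List[Span] = []
    for buf in buffers:
        collected.extend(buf.lookup(trace_id))
    return collected
```

Every request is recorded cheaply and locally, and only traces that turn out to matter are shipped anywhere. The trade-off is the buffer capacity: a trace that is evicted before detection fires is lost.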
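Quantitatively identifying the bottleneck in TTR (Time to Resolve)
Problem: incident response is hard to improve
1. Measure the time spent in each stage of the life cycle
2. Within each stage, identify where similar causes account for large amounts of time
3. Make fundamental improvements starting from the largest one
Figure reproduced from Fig. 2 of Xiaoyun Li, et al., “Going through the Life Cycle of Faults in Clouds: Guidelines on Fault Handling”, ISSRE’22.
-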
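The three steps above can be sketched as a small aggregation. This is a minimal sketch under assumptions: the stage names in `STAGES`, the `cause` label, and the record layout are hypothetical and are not the life-cycle stages defined in the cited ISSRE paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stage names; the cited paper defines its own life-cycle stages.
STAGES = ["detect", "triage", "mitigate", "resolve"]


def stage_durations(timestamps: Dict[str, float]) -> Dict[str, float]:
    """Step 1: minutes spent in each stage, given 'start' and each stage's end time."""
    durations = {}
    previous = timestamps["start"]
    for stage in STAGES:
        durations[stage] = timestamps[stage] - previous
        previous = timestamps[stage]
    return durations


def rank_bottlenecks(incidents: List[dict]) -> List[Tuple[Tuple[str, str], float]]:
    """Steps 2 and 3: total minutes per (stage, cause), largest first."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for incident in incidents:
        for stage, minutes in stage_durations(incident["timestamps"]).items():
            totals[(stage, incident["cause"])] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The first element of the ranking is the candidate for improvement: the stage and cause combination that consumed the most response time across incidents.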
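Couldn't SLOs be used for all sorts of decisions?
Error budget consumption can inform questions such as:
・Identify the cause first, or prioritize recovery?
・Carry out a fundamental fix after the incident, or not?
・Increase or decrease [...]?
・Add or remove alerts?
・. . .
Control adaptively so that the budget is not used up
-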
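As a rough illustration of using error budget consumption as a decision input, the sketch below computes the remaining budget for an event-based SLO and maps it to a coarse policy. The function names, the 25% caution threshold, and the three policy labels are assumptions made for this example, not something the slide prescribes.

```python
def error_budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget still unspent in the window (negative if overspent)."""
    allowed_bad = (1.0 - slo_target) * total_events
    if allowed_bad == 0:
        raise ValueError("a 100% SLO (or an empty window) leaves no error budget")
    return 1.0 - bad_events / allowed_bad


def release_policy(remaining: float, caution_threshold: float = 0.25) -> str:
    """Hypothetical policy: the threshold and labels are illustrative only."""
    if remaining <= 0:
        return "freeze"
    if remaining < caution_threshold:
        return "slow down"
    return "proceed"
```

For example, with a 99.9% SLO over 1,000,000 requests, 400 failed requests leave roughly 60% of the budget, so this policy would keep releasing; at 1,200 failures the budget is overspent and it would freeze.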
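Shouldn't it be possible to derive the system architecture from SLOs? ※1, ※2
SLOs and Workloads lead to the System Architecture: capacity, high availability, load balancing, caching, asynchronous processing, data structure, incident response organization
※1 Yoshifumi Yamaguchi, “信頼性目標とシステムアーキテクチャー”, SRE NEXT 2023. https://speakerdeck.com/ymotongpoo/reliability-objective-and-system-architecture
※2 r9y, https://r9y.dev
-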
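Looking for connected fields from SREcon
・Distributed systems
・Reliability engineering / resilience engineering / safety engineering
・Sociology
・Cognitive science
SREcon has a fair number of presentations with an academic background, and they are rich in variety.
It is not unusual for researchers, Ph.D. holders, and doctoral students to speak.
Search query: “site:https://www.usenix.org PhD SREcon”
-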
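Gray Failure @SREcon24 Americas (skipped in the talk)
A failure occurs in the application, but the failure detector does not notice it.
Ze Li, and Ryan Huang, “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems”, SREcon24 Americas
・A case study of Gray Failure, which was presented at academic international conferences such as HotOS’17
・Also covered at SREcon23 EMEA and SREcon22 EMEA
-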
Metastable Failure @SREcon23 Americas (skipped in the talk)
A trigger degrades the system, and the system stays degraded even after the trigger is removed.
Kyle Lexmond, “We're Still Down: A Metastable Failure Tale”, SREcon23 Americas
・A failure pattern originally presented at the international conference HotOS’21 ※1
※1 Bronson, Nathan Grasso et al., “Metastable failures in distributed systems.” HotOS’21.
-
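Cross-System Interaction Failures @SREcon23 Americas (skipped in the talk)
Systematizes failures that originate in interactions between systems and proposes testing and verification methods to prevent them.
・The original paper was presented at the international conference EuroSys’23 ※1. One of the authors presented at SREcon.
※1 Tang, Lilia et al., “Fail through the Cracks: Cross-System Interaction Failures in Modern Cloud Systems.”, EuroSys 2023.
-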
A Political Scientist’s Insights @SREcon21
Introduces the social systems theory of sociologist N. Luhmann (the concepts of complexity and autopoiesis) and relates it to SRE.
Michael Krax, A Political Scientist's View on Site Reliability, SREcon21, 2021.
-
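Measuring Reliability Culture @SREcon24 Americas
Interviews and surveys of Meta engineers were used to measure the culture of reliability both quantitatively and qualitatively.
Kathryn (Casey) Bouskill, “Measuring Reliability Culture to Optimize Tradeoffs: Perspectives from an Anthropologist”, SREcon24 Americas
・The presenter holds a Ph.D. in anthropology and a master's degree in epidemiology.
・54% of teams “Find it hard to identify reliability gaps”
・Issues: concrete actions for improving reliability are unclear, and prioritization is difficult
-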
Ironies of Automation @SREcon19 Asia
The irony that the more automation removes humans, the more advanced the skills it demands of them.
Tanner Lund, “Ironies of Automation: A Comedy in Three Parts”, SREcon19 Asia.
・A 1983 paper by the cognitive psychologist Bainbridge ※1
・Still an ongoing problem as of 2018 ※2
・Presents a new irony: automated systems end up concealing the human's lack of ability
※1 L. Bainbridge, “Ironies of Automation”, Automatica, Vol.19, No.6, pp.775–779 1983.
※2 B. Strauch, "Ironies of Automation: Still Unresolved After All These Years". IEEE Transactions on Human-Machine Systems, Vol.48, No.5, pp.419–433 2018.
-
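Process Feeling @SREcon21
David D. Woods, Laura Nolan, You've Lost That Process Feeling: Some Lessons from Resilience Engineering, SREcon21 2021.
・The presenter, David Woods, is a leading authority on resilience engineering
・Advocates the importance of Process Feel: understanding the state of a complex system intuitively
・Nuclear power plant operators understood normal operation by feel, from the steady sound of the control system's counters
・Even when the system is normal within the range of its SLOs, they can immediately notice internal anomalies
-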
Controlling the Costs of Coordination @SREcon20 Americas
Laura Maguire, The Secret Lives of SREs - Controlling the Costs of Coordination across Remote Teams, SREcon20 Americas, 2020.
・Because concentrating information on the incident commander causes cognitive overload, the talk advocates a more distributed coordination model
・Published as a doctoral dissertation in 2020 ※1
・In her doctoral research in Integrated Systems Engineering, the presenter studied 62 incident response cases at 4 organizations.
※1 Laura Maguire, Controlling the Costs of Coordination in Large-scale Distributed Software Systems, Dissertation, The Ohio State University, 2020.
-
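Insights gained from SREcon
1. Systems are becoming more complex, and new failure patterns are emerging ([Gray | Metastable | Cross-System Interactions] Failures)
2. The answer to complexity may not be simplification? (Only complexity can reduce complexity.)
3. Do not wave things away as “culture”; measure the culture (Measuring Reliability Culture)
4. Is there a limit to removing operators through automation? (Ironies of Automation)
5. How can the cognitive load on operators be lowered? (Process Feeling: an individual operator's cognitive load in understanding the system; Controlling the Costs of Coordination: the cognitive load of sharing information among operators)
Questions that are not specific to computers can draw insights from other academic fields
-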
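New questions gained from SREcon
・Is there no metric that indicates the complexity of a system?
・Is there a training method for acquiring Process Feel?
・Can cognitive load not be measured?
・…
I sense the possibility of SRE developing into Human-Computer Engineering / Sociotechnology
-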
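Summary of the revisit
1. Historical background of the turn to engineering: from the literature on USENIX LISA, software reliability engineering, Internet operations technology, and others
2. Examples of unsolved problems: familiar problems and fundamental questions
3. Fields connected to SRE: systems engineering, resilience engineering, cognitive science, anthropology, sociology
Possibility of developing into Human-Computer Engineering
※1 Sheena Iyengar, Yuko Sakurai (trans.), “THINK BIGGER 「最高の発想」を生む方法：コロンビア大学ビジネススクール特別講義”, NewsPicks, 2023.
-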
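Is it really engineering > craft?
Rather than being eliminated, shouldn't craft be made to coexist with engineering?
“Systems engineering is an art as well as a science.” ※1
“Engineering is the study of culture creation.” ※2
“…originally, technology and art were almost one and the same.” ※2
※1 ”開発者のためのシステムズエンジニアリング導入の薦め”, version 1.1, IPA, 2017.
※2 Hiroshi Harashima, “文化創造学としての工学”, Journal of the IEICE (電子情報通信学会誌), Vol.99, No.4, 2016.
-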
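Definition of SRE as engineering (2024, yuuk1 edition)
SRE is a branch of software engineering that targets systems premised on high-frequency change and aims to:
  1) reduce reliability from the user's point of view to measurable variables,
  2) make reliability controllable at an appropriate level, and thereby
  3) steer other variables (change velocity, cost, etc.) toward desirable values.
The definition is written in a form resembling an optimization problem.
-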
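One way to read "a form resembling an optimization problem" is sketched below. This formalization is an assumption added for illustration and is not taken from the slides: $x$ stands for the operational decisions under control, $v$ for change velocity, $c$ for cost, $r$ for user-facing reliability measured through an SLI, $r^{*}$ for the SLO target, and $c_{\max}$ for a cost ceiling.

```latex
\begin{aligned}
\max_{x} \quad & v(x) \\
\text{subject to} \quad & r(x) \ge r^{*}, \\
& c(x) \le c_{\max}
\end{aligned}
```

Under this reading, step 1 of the definition makes $r$ measurable, step 2 keeps the constraint $r(x) \ge r^{*}$ satisfiable, and step 3 is the objective itself.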
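Contrast structure presented in this talk: craft-like versus engineering-like
・Means-oriented / goal-oriented
・Tool-driven / theory-driven
・Subjective / objective
・Reactive / proactive
・Local view / bird's-eye view of the whole
・Magical
・Top-down / bottom-up
-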
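Technology (craft) and engineering ※1
“Technology may be defined as the totality of the 'skills' that humans devise and use, based on their own sense of purpose and aiming at the achievement of goals, in order to live out their lives.”
“The word engineering can be defined as such technology turned into a science.”
(1) It is organized in a form that can be widely communicated, through language and other means
(2) It takes the form of “knowledge” systematized by area of specialization
In general, technology is not necessarily “turned into knowledge.”
※1 Yoichiro Murakami, ”工学の歴史と技術の倫理”, Iwanami Shoten, 2006.
-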
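Systems Engineering (システム工学) ※1
Definition (by the Japan Council on Systems Engineering): an approach and means, spanning multiple specialized fields, for making a system successful
(1) Goal orientation and a bird's-eye view of the whole
(2) Integration of diverse specialties
(3) Abstraction and modeling
(4) Discovery and evolution through iteration
※1 ”開発者のためのシステムズエンジニアリング導入の薦め”, version 1.1, IPA, 2017.
-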
Software Engineering (ソフトウェア工学)
Software engineering is the application of a systematic, disciplined, quantifiable approach to the development, testing, deployment, operation, and maintenance of software systems. ※1
※1 Ivar Jacobson, et al., モダン・ソフトウェアエンジニアリング, Shoeisha, 2020.
-
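SRE gives system administration a purpose and a bird's-eye view of the whole
[Diagram labels: SRE; DevOps; functional requirements; non-functional requirements (reliability); product management; users; observability / monitoring; delivery; incident management; change management; fast hypothesis testing; appropriate control; SLI/SLO]
-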
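Frieren: Beyond Journey's End (葬送のフリーレン) ※1
Within its theme of “What is a human?”, 葬送のフリーレン contrasts humankind with demons and includes scenes in which shorter-lived humans raise magic to new heights through the power of society and history.
Demons approach magic as a craft; humankind approaches it as engineering.
※1 Kanehito Yamada (story), Tsukasa Abe, “葬送のフリーレン”, Shogakukan, 2020.
-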
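The Wiener interface (ウィーナー界面)
Wiener wrote something interesting in the preface to the Japanese edition of “Cybernetics”:
> Suppose there are two variables concerning our situation, one of which we cannot control and the other of which we can adjust. We then wish to set the value of the adjustable variable appropriately, based on the values of the uncontrollable variable from the past up to the present, so as to bring about the situation most convenient for us. The method of achieving this is nothing other than Cybernetics.
This way of thinking is very interesting, and it may in fact describe the purpose of engineering, or the purpose of technology. Wiener says that the world can be divided into “the world we can control” and “the world we cannot control”, and I think we could call that boundary the “Wiener interface”. (Masahiko Inami)
Source: Naotaka Fujii, 現実とは？ 脳と意識とテクノロジーの未来 (ハヤカワ新書), p. 14, Kindle Edition.
-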
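Computing is pop culture
“Computing is pop culture… Pop culture holds a disdain for history. Pop culture is all about identity and feeling like you're participating. It has nothing to do with cooperation, the past, or the future. I think the same is true of most people who write code for money. They have no idea where [their culture came from].” (translated)
Alan Kay, in an interview with Dr. Dobb's Journal
-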
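Internet technology
“There was a time when 'Internet technology' was said not to be an academic discipline, or not to be the stuff of papers.” ※1
※1 Yasuo Okabe, ”インターネット技術の次世代研究者育成”, IEICE Technical Report (信学技報), vol. 111, no. 321, IA2011-40, p. 31-32, 2011.
-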
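What is a theory?
According to the book 「世界標準の経営理論」 ※2, there are various views on what a theory is, but there is near consensus on the following.
The purpose of a theory is to answer How, When, and Why ※1
・how: a causal relationship such as “X -> Y”.
・when: the range within which the theory holds (boundary condition).
・why: an explanation of why the causal relationship is as it is.
※1 ”The primary goal of a theory is to answer the questions of how, when, and why, unlike the goal of description, which is to answer the question of what". (Bacharach, 1989, p. 498)
※2 Akie Iriyama, 世界標準の経営理論, Diamond, 2019.
-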
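What is not a theory ※1
・A list of references and citations (does not answer why)
・A mere description of data (does not answer why or how)
・An explanation of concepts (does not answer why or how)
・Figures and tables (do not answer why or how)
・Propositions and hypotheses (do not answer why)
Almost all frameworks are themselves not theories either (they answer what) ※2
※1 Sutton, Robert I. and Barry M. Staw. “What Theory is Not.” Administrative Science Quarterly 40, p.371, 1995.
※2 Akie Iriyama, 世界標準の経営理論, Diamond, 2019.
-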
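User Uptime @SREcon21
Anika Mukherji, User Uptime in Practice, SREcon, 2021.
・An availability metric used in Google's G Suite, proposed at USENIX NSDI, a top systems conference ※1
・Conventional availability metrics do not account for skew among active users or for partial outages
・Based on each user's actual uptime, availability is evaluated over multiple time windows at once
※1 Hauer, et al., “Meaningful Availability”, USENIX NSDI 2020.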
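A minimal sketch of the metric described above, assuming that per-user uptime and downtime have already been measured in minutes. The function names and input layout are hypothetical; the windowed variant reports, for each window size, the worst availability over all windows of that size, which is how this sketch interprets "multiple time windows at once".

```python
from typing import Dict, Sequence, Tuple


def user_uptime(per_user: Dict[str, Tuple[float, float]]) -> float:
    """Aggregate (uptime, downtime) minutes over all users into one availability figure."""
    total_up = sum(up for up, _ in per_user.values())
    total_down = sum(down for _, down in per_user.values())
    if total_up + total_down == 0:
        raise ValueError("no user activity recorded")
    return total_up / (total_up + total_down)


def windowed_user_uptime(
    up: Sequence[float], down: Sequence[float], window_sizes: Sequence[int]
) -> Dict[int, float]:
    """For each window size, the worst availability over all windows of that size.

    up[i] and down[i] are uptime and downtime minutes summed over users in bucket i.
    """
    result: Dict[int, float] = {}
    for size in window_sizes:
        worst = 1.0
        for start in range(len(up) - size + 1):
            u = sum(up[start:start + size])
            d = sum(down[start:start + size])
            if u + d > 0:  # skip windows with no user activity at all
                worst = min(worst, u / (u + d))
        result[size] = worst
    return result
```

Because the ratio is weighted by time each user actually spent active, a heavy user's outage counts for more than an idle user's, and the per-window minimum keeps a short but severe outage from being averaged away over a long period.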