Tag: software reliability

1 view collected around this technical thread.

DeWu Technology
Mar 17, 2025 · Operations

Stability and Its Significance: Challenges and Practices for Building System Reliability

Building system stability requires quantifying risk through formulas, confronting challenges like low short‑term value and resource competition, and implementing a consensus‑driven framework that sets clear goals, cultivates awareness, enforces safety standards, ensures emergency response, conducts routine inspections, and applies sound architecture governance to continuously reduce inherent and change‑related risks.

System Stability · operations · process improvement
0 likes · 25 min read
FunTester
Sep 19, 2024 · Fundamentals

Software Antifragility: Rethinking Error Handling and Reliability

This paper introduces the concept of software antifragility, drawing on Taleb’s theory to argue that embracing errors through fault tolerance, automatic runtime repair, and fault injection can transform software systems into self‑improving, more robust entities, and discusses implications for development processes and product reliability.

Chaos Engineering · antifragility · fault tolerance
0 likes · 13 min read
Ele.me Technology
May 28, 2024 · Operations

Automated Mock for E2E Testing: Design and Implementation of Unmanned MOCK

Unmanned MOCK automatically generates intelligent, context‑aware mock responses for downstream services in end‑to‑end tests by collecting sub‑call data, extracting knowledge, and applying dynamic rules. Failures in downstream systems are thereby isolated, raising test success rates to nearly 100% without manual mock configuration.

Continuous Integration · E2E · automated testing
0 likes · 12 min read
Efficient Ops
Mar 25, 2024 · Operations

How CAICT’s SRE Standards Strengthen System Reliability and Continuity

This article outlines the rising frequency of system outages, explains the key characteristics and challenges of modern large‑scale distributed systems, introduces China’s CAICT SRE framework and its two‑part reliability model, showcases a successful SRE case, and announces the 2024 SRE maturity assessment program.

Digital Governance · SRE · operations
0 likes · 12 min read
Tencent Cloud Developer
Jan 10, 2024 · Operations

The Challenges of Building Continuously Available Systems: Entropy, Murphy's Law, and the 'Divine Doctor Paradox'

Building continuously available systems in 2023 is hampered by entropy‑driven technical debt and Murphy’s Law failures. The “Divine Doctor Paradox” shows that successful availability work goes unnoticed while blame follows any outage, which makes cultural commitment, not just technology, the essential solution.

High Availability · Murphy's Law · SRE
0 likes · 14 min read
Bilibili Tech
Jan 5, 2024 · Cloud Native

ChangePilot: Bilibili’s Unified Change Management Platform and Practices

ChangePilot is Bilibili’s unified change‑management platform that standardizes change definition, lifecycle, and risk governance through a platform‑scenario model and five control levels (G0‑G4), offering built‑in checks, searchable records, subscription alerts, intelligent correlation, and emergency channels to boost production stability while maintaining operational efficiency.

Change Management · SRE · cloud-native
0 likes · 29 min read
JD Tech
Jun 7, 2023 · Operations

Practical Guide to Achieving High Availability in Software Delivery

This article explains the concept of high availability, outlines the challenges of collaborative delivery, architectural design, coding practices, secure release, and deployment operations, and provides concrete steps, process standards, emergency plans, and self‑check tools to ensure reliable, fault‑tolerant software systems.

Collaboration · Deployment · High Availability
0 likes · 13 min read
JD Retail Technology
Mar 16, 2023 · Operations

Ensuring High Availability in Software: Collaboration, Architecture, Implementation, and Operational Practices

This article explains the concept of high availability, outlines the challenges of achieving it in complex software delivery chains, and provides practical guidance on improving collaboration efficiency, establishing process standards, designing robust architecture, implementing disciplined coding, executing safe releases, and maintaining operational safeguards.

Collaboration · Deployment · High Availability
0 likes · 11 min read
DeWu Technology
Feb 28, 2022 · Operations

DeWu Tech Salon – Quality Assurance Sessions Summary

The DeWu Tech Salon, co‑hosted by DeWu App Quality Platform and TesterHome, brought senior engineers from Alibaba Cloud, ByteDance, Lagou and DeWu together to share practical QA insights on end‑side monitoring, traffic replay, full‑link stress testing, and industry‑scale chaos engineering, while announcing a PPT collection, a testing‑expert recruitment drive, and a preview of the next wireless‑technology salon.

Chaos Engineering · Performance Monitoring · quality assurance
0 likes · 6 min read
DevOps
May 10, 2021 · Backend Development

Automated Unit Test Generation for Exception Recall in C/C++ Services

This article presents a white‑box, unit‑test‑driven approach for automatically generating C/C++ test cases that detect and recall runtime stability issues, detailing problem analysis, solution design, code‑analysis, test‑data generation, code generation, failure analysis, and deployment results across large‑scale backend modules.

C++ · Static Analysis · fuzzing
0 likes · 19 min read
Baidu Intelligent Testing
Jan 6, 2021 · Backend Development

Automated Unit Test Generation for Exception Recall in C/C++ Backend Systems

This article presents a comprehensive approach to automatically generate unit tests for C/C++ backend services, leveraging static code analysis, white‑box techniques, and fuzzing to create high‑coverage test cases that proactively detect stability issues without manual effort.

C++ · Static Analysis · fuzzing
0 likes · 20 min read
DevOps Cloud Academy
Aug 27, 2020 · Cloud Native

Step-by-Step Guide to Building More Reliable Software with Kubernetes and DevOps

This article presents a practical, multi‑stage approach for improving software reliability in Kubernetes‑based microservice environments, covering static analysis, testing pyramids, CI/CD observability, performance testing, deployment strategies, and feedback loops to help engineering teams deliver faster, higher‑quality releases.

CI/CD · DevOps · Kubernetes
0 likes · 11 min read
360 Tech Engineering
Jul 11, 2018 · Fundamentals

Static Program Analysis, Gödel’s Incompleteness, and the Halting Problem: Foundations of Software Reliability

This article explains how redundancy and voting schemes improve system reliability, introduces Gödel’s incompleteness and consistency concepts, describes the undecidable halting problem, and outlines static program analysis techniques—including data‑flow, inter‑procedural, pointer analysis, and constraint solving—while discussing practical heuristic rules and tools.

Gödel · Static Analysis · decision problems
0 likes · 8 min read