Online 2. 3KG v2: Universal Electrocardiogram Representations for Label-Efficient Phenotype Discovery [2023]
- Gopal, Bryan (Author)
- November 20, 2023
- Description
- Book
- Summary
-
We propose 3KG v2, a new self-supervised learning method for universal representation learning of electrocardiograms. This method builds upon its predecessor by featuring a new contrastive objective and transformation space. We assess the quality of representations generated by this algorithm by transferring pretrained models from a large public dataset to various downstream tasks, including some without prior clinical association with ECGs. Performance evaluation is conducted in both few-shot and full-data settings to account for limited and complete training data availability, respectively. For each task, we perform a linear evaluation to assess the effectiveness of the pretrained representations. For tasks in the full-data setting, we additionally perform a full fine-tuning to determine the performance ceiling of each method in a practical deployment scenario. Our results demonstrate that 3KG v2 consistently outperforms a randomly initialized model trained solely on the target task across all downstream tasks and settings. Specifically, we achieve state-of-the-art performance in few-shot diagnosis of right ventricular function and aortic valve stenosis, two conditions that typically require large amounts of labeled ECGs for effective model training. Moreover, we find that fine-tuning 3KG v2 on our source task’s labels can lead to exceptional transfer capability across a variety of tasks. Notably, our model demonstrates a high level of accuracy in predicting left atrial volume index, achieving a 0.720 C-index even in a few-shot setting. This achievement appears to be unprecedented, as we are not aware of any other ECG model that performs well on this task. While 3KG v2 outperforms our baselines and shows promising results on the majority of phenotype discovery tasks, there is still room for improvement in the absolute performance of any ECG model on many of these complex tasks. Further research is warranted to continue developing and improving such methods for phenotype discovery tasks on ECGs.
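As a concrete illustration of the linear evaluation protocol described above, the sketch below fits a logistic-regression probe on embeddings from a frozen pretrained encoder in a few-shot setting. The arrays, embedding dimension, and shots-per-class value are invented stand-ins, not the thesis's data or code.

```python
# Minimal sketch of few-shot linear evaluation on frozen pretrained embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Stand-ins for embeddings produced by a frozen pretrained ECG encoder.
X_train, y_train = rng.normal(size=(256, 128)), rng.integers(0, 2, 256)
X_test,  y_test  = rng.normal(size=(100, 128)), rng.integers(0, 2, 100)

# Few-shot setting: subsample k labeled examples per class.
k = 8
idx = np.concatenate([rng.choice(np.where(y_train == c)[0], k, replace=False)
                      for c in (0, 1)])
probe = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
print("few-shot linear-probe AUC:",
      roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1]))
```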
- Digital collection
- Undergraduate Theses, School of Engineering
Online 3. A Dangerous Game: China's State Media Perceptions of Strategic Competition with the United States [2023]
- LaRocca, Andrew (Author)
- August 22, 2023; August 21, 2023
- Description
- Book
- Summary
-
The perception of a “rising China” and “declining US” among Western academic and policymaking circles has sparked fears of an emboldened China that views military conflict as beneficial to its interests. This research aims to provide clarity on this issue by analyzing how Chinese state media portrays strategic competition with the US. Drawing from online Chinese-language news articles from media institutions that reflect the Chinese Communist Party’s official party line (People’s Daily, PLA Daily, Global Times, and Xinhua), I examine how these outlets depict economic, technological, military, political, and international competition with the US. This research analyzes these depictions against the backdrop of pivotal moments in 21st century US-China relations: the 2008 Great Recession, the 2012 Power Transition, the 2018 US-China Trade War, and the 2020 COVID-19 Pandemic. First, taking a random sample of articles from each time period, I assign each article a score ranging from “-1” to “1” to measure tone and calculate the percentage of positive, negative, and neutral stories within each time period and category of strategic competition. Second, I conduct a qualitative analysis of the articles within each random sample to identify general themes and messaging. This research identified a drastic rise in negative coverage of the US across the four time periods and five categories. It also determined that Chinese state media has propagated a “peak-US” theory. According to this theory, the overall decline of the US has made it a more hostile power that aims to suppress China’s development and resurrect a “New Cold War.” Despite these views of a hostile US, state media narratives urged the US to return to cooperation and avoid war due to a recognition that China benefits more from overall cooperation with the US, especially in terms of economic and technological development.
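A minimal sketch of the tone-aggregation step described above: given article scores in {-1, 0, 1}, compute the share of negative, neutral, and positive coverage per time period and competition category. The sample records are invented for illustration.

```python
# Aggregate per-article tone scores into coverage shares per (period, category).
from collections import Counter, defaultdict

articles = [  # (period, category, tone score) -- invented examples
    ("2008 Recession", "economic", -1),
    ("2008 Recession", "economic", 0),
    ("2020 Pandemic",  "military", -1),
    ("2020 Pandemic",  "military", -1),
]

counts = defaultdict(Counter)
for period, category, score in articles:
    counts[(period, category)][score] += 1

for key, tally in counts.items():
    total = sum(tally.values())
    shares = {label: tally[s] / total
              for s, label in ((-1, "negative"), (0, "neutral"), (1, "positive"))}
    print(key, shares)
```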
- Digital collection
- Stanford Center for East Asian Studies Thesis Collection
Online 4. A Living, Controllable Device: The Political Police and Informant Network in Socialist Hungary, 1956-1989 [2023]
- Kisiday, Matyas (Author)
- May 23, 2023; [ca. September 2022 - May 2023]; May 8, 2023
- Description
- Book
- Summary
-
After coming to power in the wake of the infamous Hungarian Revolution of 1956, János Kádár built his regime upon a delicate balance between upholding socialist ideals and appeasing a disgruntled, scarred nation. The slogan that became Kádár’s central doctrine perfectly represented this balance: “those who are not against us, are with us.” Existing literature on Kádárist Hungary portrays the period’s stability as a product of the leader’s compromising personality and political cunning. While accurate, these accounts largely neglect the role of Kádár’s political police as a primary organ of two-way interaction between the party and populace. Internal informational documents from the Ministry of the Interior III. reveal that the mass informant network served as the spearhead of interaction between the party and population in the realm of national security, which was viewed as essential to the building of socialism. The informant network’s transformation alongside the political police from a crude, coercive, and inherently dishonest body into a more focused, sophisticated, and interactive machine reveals much about the implementation of János Kádár’s vision for socialist Hungary. Described as a “living, controllable device,” the political police’s informant network provided Hungarian citizens with an accessible and personally beneficial means of cooperation with the state. The network allowed Hungarians to prove themselves “not against” Kádár’s Party.
- Digital collection
- Undergraduate Honors Theses, Department of History, Stanford University
Online 5. A Mechatronic Solution for Time-Resolved Cryogenic Electron-Microscopy Sample Preparation [2023]
- Di Perna, Maximus (Author)
- May 16, 2023; May 14, 2023
- Description
- Book
- Summary
-
Cryogenic Electron Microscopy (Cryo-EM) is an established method of imaging biomolecules with an electron microscope. The current process of preparing samples for Cryo-EM involves loading the grid into a plunge-freezer and plunging it into liquid ethane to freeze the grid. As the grid falls into the liquid ethane, two samples are mixed and deposited on the grid using a microfluidic spraying device. In order to improve the practicality and repeatability of the plunge-freezer, a mechatronic device was built to simplify the process of replacing the microfluidic spraying device.
- Digital collection
- Undergraduate Theses, Program in Engineering Physics
Online 6. A Novel Application of Theoretical Models to Intimate Partner Violence Survivors’ Help-Seeking [2023]
- Nies, Ashley (Author)
- October 3, 2023; August 31, 2023
- Description
- Book
- Summary
-
Introduction: Intimate partner violence (IPV) survivors’ help-seeking is influenced by a complex interplay of barriers and facilitators. This includes both internal and external factors that affect a survivor’s decision to seek and ability to obtain resources. While understanding these factors is crucial to ensuring appropriate care, there is currently no agreed-upon, comprehensive framework for capturing IPV survivors’ help-seeking. Objectives: We seek to answer the question: How can the Three Delays Model (3DM) and the Behavioral Model of Health Services Use (BMHSU), as well as proposed adaptations of these models, provide a framework for understanding barriers and facilitators of IPV survivors’ help-seeking? Methods: This secondary qualitative analysis was performed on transcripts from nine focus groups obtained as part of a larger study on IPV survivors’ help-seeking as influenced by the COVID-19 pandemic. Qualitative codes representing the theoretical constructs of each model and its adaptation were identified and set a priori, and to maintain a deductive coding approach, no new codes were added throughout data analysis. Manual coding of the transcripts was done by the first author. Once all transcripts were coded, code frequency and code co-occurrence analyses were performed to determine how well the different models mapped onto IPV survivors’ experiences. Results: Codes from the BMHSU adaptation were applied most often, and codes from the 3DM adaptation were applied least often. The 3DM captured IPV survivors’ barriers to initially accessing care; however, it did not allow for a nuanced understanding of how the care system affected help-seeking. The BMHSU adaptation provided the richest understanding of IPV survivors’ barriers and facilitators to help-seeking with its expanded inclusion of psychosocial factors. Conclusion: These findings highlight the strengths and limitations of two theoretical models and their adaptations, allowing for a better framework for understanding barriers and facilitators of IPV survivors’ help-seeking. Future research is needed to integrate these models into one comprehensive model for capturing factors contributing to IPV survivors’ help-seeking.
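The code frequency and co-occurrence analyses described above can be illustrated with a minimal sketch: each transcript segment carries the set of a priori codes applied to it, and co-occurrence counts pairs of codes applied to the same segment. The code labels here are hypothetical.

```python
# Count how often each code is applied, and how often pairs co-occur per segment.
from collections import Counter
from itertools import combinations

segments = [  # codes applied to each focus-group segment (hypothetical labels)
    {"3DM:delay1", "BMHSU:enabling"},
    {"BMHSU:enabling", "BMHSU:psychosocial"},
    {"BMHSU:psychosocial"},
]

frequency = Counter(code for seg in segments for code in seg)
cooccurrence = Counter(pair for seg in segments
                       for pair in combinations(sorted(seg), 2))
print(frequency.most_common())
print(cooccurrence.most_common())
```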
- Digital collection
- Community Health and Prevention Research (CHPR) Master of Science Theses
Online 7. A Precinct-Level Analysis of Latino Voting Behavior During The 2016 And 2020 Presidential Elections [2023]
- Argueta, Allison (Author)
- June 12, 2023; June 2023
- Description
- Book
- Summary
-
The goal of this paper is to understand Latino voting behavior during the 2016 and 2020 presidential elections using precinct-level analysis. In both elections, Latinos voted for Republican candidates in surprising numbers. Analysis of data from precincts with high proportions of Latino citizen voting age population was used to provide a comprehensive description of the Latino electorate and to address the ecological inference problem that emerges when attempting to infer the voting behavior of electorate subgroups, like Latinos, using aggregate data. Using precinct-level Latino demographic data created for this analysis, Latinos were found to have voted more Republican in 2020 than in 2016. Additionally, in 2020, Latinos were found to lean more Republican than non-Latinos in precincts with high proportions of Latino citizen voting age population.
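A hedged sketch of the precinct-level comparison described above: restrict to precincts with a high Latino share of citizen voting age population (CVAP) and compare Republican vote share across the two elections. The column names, figures, and the 80% threshold are illustrative assumptions, not the thesis's data.

```python
# Compare GOP vote share in high-Latino-CVAP precincts across two elections.
import pandas as pd

precincts = pd.DataFrame({
    "latino_cvap_share": [0.92, 0.85, 0.30, 0.88],
    "gop_share_2016":    [0.18, 0.22, 0.45, 0.20],
    "gop_share_2020":    [0.27, 0.30, 0.47, 0.31],
})

high_latino = precincts[precincts["latino_cvap_share"] >= 0.80]
print(high_latino[["gop_share_2016", "gop_share_2020"]].mean())
```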
- Digital collection
- Stanford University, Department of Economics, Honors Theses
- Liongson, Ivan (Author)
- May 4, 2023
- Description
- Book
- Summary
-
The ability to predict which protein sequences can act as transcriptional activators or repressors is important for understanding the function of human and viral transcription factors (TFs) inside human cells and for building synthetic biology tools for gene control. Here, I integrate multiple high-throughput data sets acquired using a recently developed method (HT-Recruit) that tests hundreds of thousands of protein sequences for their effect on reporter genes in live human cells. I first created a data processing pipeline using ground truth validations to regularize results from multiple HT-Recruit screens, allowing cross-screen comparisons as well as proper model training. After processing these datasets, I built and trained convolutional neural network machine learning models that predict both activation and repression for protein sequences across the human transcription factors. These are the first models to be trained on human TF data, as well as the first to predict repressors. Some protein sequences are bifunctional in that they both activate and repress, so it is important to be able to predict both.
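As a rough illustration of the dual prediction task described above (not the thesis's actual models), the sketch below defines a small convolutional network that scores a one-hot-encoded protein sequence for both activation and repression; the architecture and dimensions are assumptions.

```python
# A two-headed 1D CNN: one score for activation, one for repression.
import torch
import torch.nn as nn

class DualHeadCNN(nn.Module):
    def __init__(self, n_amino_acids=20, seq_len=80):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_amino_acids, 64, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # pool over sequence positions
        )
        self.activation_head = nn.Linear(64, 1)  # activator score
        self.repression_head = nn.Linear(64, 1)  # repressor score

    def forward(self, x):            # x: (batch, 20, seq_len) one-hot
        h = self.conv(x).squeeze(-1)
        return self.activation_head(h), self.repression_head(h)

model = DualHeadCNN()
scores = model(torch.zeros(2, 20, 80))
print([s.shape for s in scores])
```

Separate heads let a single model flag bifunctional sequences that both activate and repress.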
- Digital collection
- Undergraduate Theses, Department of Biology, 2022-2023
- Wang, Kelsey (Author)
- June 24, 2023; June 8, 2023
- Description
- Book
- Summary
-
For much of recent history, the idea of machines coming anywhere within reach of human cognition was pure imagination. Now, technology has rapidly caught up to these aspirations, and society faces the daunting task of accepting, learning, and utilizing such tools. Specifically through the lens of design and architecture, what needs to be done at this point is to (1) reflect on how technology currently plays a role within it, and (2) determine how technology should continue to impact it. In this thesis, I will survey artificial intelligence applications dedicated to image generation and explore how they serve the purposes of architectural design. In particular, I will utilize recently popular text-to-image applications, such as Midjourney, alongside existing GANs. Despite the variety of outputs now available from machine learning tools, there is as yet no framework or methodology for architects to use these tools to manifest a 3D structure. I will propose a framework for how architects can use various generative architectural outputs, namely massing and programming, to realize an actual 3D structure that serves as inspiration in the early design stages. Besides being a guideline for individuals who wish to use these tools, this framework could ultimately also be referenced as a preliminary model for those working on the continuous improvement of 3D machine learning algorithms. After all, as the design process inevitably becomes more digitized, it is important to maintain a humanistic and authentically architectural approach within the algorithms that oversee our design outputs.
- Digital collection
- Undergraduate Theses, School of Engineering
Online 10. A Representative Role for the Alternative Splicing of Synaptic Genes [2023]
- Choeb, Reyan (Author)
- May 4, 2023
- Description
- Book
- Summary
-
Alternative splicing enables the differential expression of multiple mRNA transcripts and multiple functionally unique protein isoforms derived from the same gene. Interestingly, genes encoding synaptic regulators are both alternatively spliced and implicated in the development of several neuropsychiatric disorders including autism spectrum disorders, schizophrenia, and Tourette’s syndrome. Mechanisms by which synapses are formed and dynamically regulated remain unclear, but the alternative splicing of trans-synaptic regulators is thought to play a decisive role in mediating neuronal communication. To deduce a representative model for neuronal alternative splicing in the making and shaping of synapses (as reflected in changes to synaptic RNA and protein levels), Khdrbs (Sam68, Slm1, Slm2) and Nova splice factors (Nova1, Nova2) were virally overexpressed in primary neurons cultured from neonatal mice. In-vitro validation of RNA-seq-reported changes in synaptic splicing demonstrated that Khdrbs factors exclusively regulate the alternative splicing of multiple Neurexin (Nrxn) homologs, cell-adhesion molecules crucial for the development of functional synapses. Additionally, immunoblot analysis revealed a strikingly consistent loss of key synaptic proteins, coupled with decreased expression of astrocytic markers, in Slm1-overexpressed cultures, suggesting a splice factor-specific role in maintaining tripartite synapses by which glial contributions are likely paramount. In short, the experiments performed here capture the discrete effects of neuronal alternative splicing in the regulation of core synaptic components and offer insight into the molecular bases underpinning a broad range of animal behaviors.
- Digital collection
- Undergraduate Theses, Department of Biology, 2022-2023
Online 11. A Socio-hydrological Framework to Assess Rate Design for Urban Water Affordability through Drought [2023]
- Nayak, Adam (Author)
- May 23, 2023; May 22, 2023; May 22, 2023
- Description
- Book
- Summary
-
Unaffordable water threatens water access in the United States, particularly for low-income households that struggle to pay the rising cost of water. In water-scarce cities, water shortages necessitate either expensive infrastructure development or costly emergency measures to meet demand, which in turn increase household water costs. Rate design plays a key role in determining whether these costs threaten water affordability for low-income households, but water utilities are often constrained by local and state policy in their ability to set progressive rates. Therefore, new approaches to designing rates that optimize water affordability within the local legal and hydrological context are needed in drought-prone regions. To address this gap, we design a socio-hydrological modeling framework that fuses legal analysis, behavioral economics, and hydrologic modeling to assess the impacts of rate design on household water affordability. We demonstrate this framework in an illustrative application in Santa Cruz, California, where droughts threaten water supplies and California Proposition 218 deters public water utilities from setting progressive rates. Initial results demonstrate that flat drought surcharges reduce affordability, particularly under increasing block rate tariffs. This framework can both help utilities design rates to improve water affordability in their socio-hydrological context and illuminate the impacts of state policy on affordability outcomes.
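The interaction between increasing block rates and flat drought surcharges described above can be illustrated with a minimal sketch using invented prices: under a flat surcharge, low-usage (often low-income) households see a proportionally larger increase in their bill.

```python
# Bill under increasing block rates (IBR) plus a flat drought surcharge.
def ibr_bill(usage_ccf, blocks=((6, 4.00), (12, 6.50), (float("inf"), 9.00))):
    """Each block is (cumulative upper bound in CCF, $/CCF); prices are invented."""
    bill, lower = 0.0, 0.0
    for upper, price in blocks:
        if usage_ccf > lower:
            bill += (min(usage_ccf, upper) - lower) * price
        lower = upper
    return bill

flat_surcharge = 15.00  # hypothetical flat drought surcharge, $/month
for usage in (4, 10, 20):  # CCF per month
    base = ibr_bill(usage)
    print(f"{usage} CCF: base ${base:.2f}, "
          f"surcharge share {flat_surcharge / (base + flat_surcharge):.0%}")
```

The surcharge's share of the total bill falls as usage rises, which is the regressive pattern the abstract flags.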
- Digital collection
- Undergraduate Theses, School of Engineering
Online 12. A Syringe Tumbler for Ink Resuspension (STIR) [2023]
- Sanabria, Coco (Author)
- July 1, 2023; May 2023
- Description
- Book
- Summary
-
Embedded 3D bioprinting is advancing the field of regenerative medicine through the ability to create patient-specific tissues and organs that can address the organ donor shortage. However, the process of 3D bioprinting is currently limited by cell settling within bioinks, since printing of human-scale organs can take hours, or even days, to complete. To address the problem of cell settling, I designed and developed the Syringe Tumbler for Ink Resuspension (STIR), a mixing system with a Zero-Dead-Volume (ZDV) magnetic tumbler that can prevent settling and maintain high cell viability throughout long-duration prints. In this thesis, cell settling at various mixing speeds and patterns is characterized by mixing fluorescent beads that resemble embryoid bodies, a particularly challenging object for cellular mixing due to their large size. True cellular mixing is further characterized using fibroblast cells. The effects of mixing are analyzed through cell concentration analysis and staining to ascertain cell viability while using the device. From the system characterization, I identified back-and-forth mixing as the most consistent method, as it allows for improved ink homogeneity at lower mixing speeds, thus reducing the shear stress induced on cells. More specifically, by defining the degree of mixing through thresholding in image analysis, back-and-forth mixing improved the homogeneity of a 10 mg/mL fibrinogen ink embedded with fluorescent beads by 33.68% over a 15.6-second window, whereas settling reduced the ink homogeneity by 48.84% over the same time frame. Similarly, long-duration cellular prints with STIR showed reduced variance in cell concentration compared to prints without it. This suggests that STIR can yield more predictable tissue compositions for both short-duration and long-duration prints. To test the effects of tissue composition, I designed a tissue in the form of a flower, where each petal was to be printed with a cellular layer. The flower-shaped tissue was successfully printed without cells, and two leaflets were printed with fibroblast cells. Evidence of compaction was detected for the cellular constructs, highlighting the first successful tissue created with STIR. Future work will include repeating the tests developed in this thesis in order to confirm my initial findings.
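A hedged sketch of a thresholding-based degree-of-mixing metric like the one described above: threshold a fluorescence image, split it into tiles, and score homogeneity by how uniform the above-threshold fraction is across tiles. The specific metric and data here are assumptions, not the thesis's analysis.

```python
# Score mixing homogeneity from a thresholded image: uniform tile occupancy -> 1.
import numpy as np

def mixing_homogeneity(image, threshold, tiles=4):
    """1 - normalized std of the above-threshold pixel fraction per tile (clamped at 0)."""
    occupied = image > threshold
    rows = np.array_split(occupied, tiles, axis=0)
    fractions = np.array([tile.mean() for tile in rows])
    mean = fractions.mean()
    return max(0.0, 1.0 - fractions.std() / mean) if mean > 0 else 0.0

rng = np.random.default_rng(1)
well_mixed = rng.random((64, 64))                       # beads everywhere
settled = np.vstack([0.2 * rng.random((48, 64)),        # dim upper region
                     rng.random((16, 64))])             # beads settled at bottom
print(mixing_homogeneity(well_mixed, 0.5), mixing_homogeneity(settled, 0.5))
```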
- Digital collection
- Undergraduate Theses, School of Engineering
Online 13. A Systematic Analysis of Model Sensitivity: Investigating the Effect of Wildfire Smoke PM2.5 on Mortality [2023]
- Kaplan, Jordan (Author)
- June 2, 2023; June 1, 2023
- Description
- Book
- Summary
-
Wildfire smoke-related fine particulate matter (PM2.5) pollution is a known hazard to human health and has made up an increasingly large share of overall PM2.5 pollution in the US over the past two decades. Previous literature studying PM2.5 and mortality has primarily relied on cohort studies and daily time series, with a smaller subset utilizing annually aggregated time series data. We identified several papers that used coarser, ecological data and two-way fixed effects (TWFE) models to study the association between PM2.5 and mortality. Within that literature, there are a variety of different preprocessing and modeling choices. These “researcher degrees of freedom” can lead to a “garden of forking paths”, in which reasonable a priori decisions can produce qualitatively different research findings. To investigate the sensitivity of results to these researcher decisions, we utilized county-month level all-cause mortality data from the CDC’s Multiple Cause of Death files along with new smoke PM2.5 estimates to systematically vary parameters within the TWFE modeling approach used by previous studies. We tested 75 models, 56 of which had negative point estimates. 59 of the models reported statistically significant results when using IID-based standard errors (SEs), while 8 were statistically significant when using robust SEs. The results of our systematic analysis suggest that applying TWFE models to non-daily time series data produces estimates that are minimally robust to different specifications, with confidence intervals too wide to detect an effect when robust SEs are used. Based on our replication of Ma et al. 2023 and a comparison of the `gnm` and `fixest` packages in R, we believe it is likely that studies finding significant associations between PM2.5 and mortality using non-daily data rely on the IID assumption, and that their confidence intervals may be insufficiently conservative. We recommend that future research studies in this area clearly define the assumptions behind their modeling choices, carefully choose appropriately conservative standard errors, and publish their code to maximize replicability and transparency.
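The sensitivity described above, the same TWFE regression reported with IID versus cluster-robust standard errors, can be illustrated with a short sketch. The thesis worked in R with `gnm` and `fixest`; this Python version with simulated data is an analogous illustration, not a replication.

```python
# Two-way fixed effects (county + month), comparing IID vs. cluster-robust SEs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "county": np.repeat(np.arange(30), 24),
    "month": np.tile(np.arange(24), 30),
})
df["smoke_pm"] = rng.gamma(2.0, 1.0, len(df))
df["mortality"] = 50 + 0.1 * df["smoke_pm"] + rng.normal(0, 5, len(df))

model = smf.ols("mortality ~ smoke_pm + C(county) + C(month)", data=df)
iid = model.fit()                     # IID-based standard errors
robust = model.fit(cov_type="cluster", cov_kwds={"groups": df["county"]})
print("IID SE:    ", iid.bse["smoke_pm"])
print("cluster SE:", robust.bse["smoke_pm"])
```

With correlated errors within counties, the clustered SE is typically wider, which is the pattern behind the abstract's warning about IID-based inference.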
- Digital collection
- Epidemiology Clinical Research Masters Theses
Online 14. The absent flesh of law : legal bodies and juridical choreographies [2023]
- Kimmel, Anna Jayne, author.
- [Stanford, California] : [Stanford University], 2023.
- Description
- Book — 1 online resource.
- Summary
-
The Absent Flesh of Law: Legal Bodies and Juridical Choreographies sutures dance studies—kinesthetic awareness, somatic memory, and performative potential—to the field of the legal humanities. This represents an epistemic shift that aims to resist the logocentric norms of knowing that reproduce colonial hierarchies. The dissertation simultaneously recuperates the agency of constituent power, especially in moments of public assembly. Specifically, it frames constitutional text as corporeal by exposing the body's aporetic disappearance: from French legal codes of the 17th century that structured race, to the contemporary elision of the freedom of assembly, to human rights discourse on bodily integrity. Through this disciplinary crossing, it presents a decolonial orientation to the legal subject, animated with all of its humanisms. Separate but entangled images of protesting ballerinas serve as the motivator for each chapter. Their internationally prolific circulation across publics enforces a cross-cultural method of comparison as the events of the image shuttle between the United States, Algeria, and France—nations linked by racial legacies and postcolonial histories. As such, this project draws upon Francophone and Arab material and furthers research in comparative race studies. Attending to contemporary Black Lives Matter marches in the United States, demonstrations of the Hirak movement in Algeria, and pension protests in Paris—referenced in each photograph—the protesting ballerina grounds this framework to reveal themes of the disciplined body that is both compliant (to a technique) and resistant (to legal expectations). Each of these images begins well into the 21st century yet points backward in time to more complex histories in which the body was obscured under law: 1) of confederate legacies constructed in the United States that propagate a racialized history, 2) of imperial rule in France during which ballet served state opulence and secured whiteness as property, and 3) of decolonial hope and postcolonial violence in Algeria. The final chapter opens toward international human rights discourse to call for renewed attention to bodily integrity, beyond a priori concepts of dignity that are circumscribed by a Western aesthetic tradition. Theoretically grounded in histories of the archive and the ephemerality of performance, this research draws upon interdisciplinary methods to communicate to scholars in both the legal humanities and in performance and dance studies. I supplement archival material—of police reports, juridical documents, and constitutional revision—with embodied perspectives learned from more than twenty years as a disciplined body in a dance studio. The tension of this pairing allows me to articulate what is lost when the law presumes linguistic form entirely, while the project's geopolitical triangulation reflects a commitment to postcolonial theory, francophone culture, and comparative race studies. As such, it dares to bring together otherwise disparate interlocutors, forcing a reconsideration of entrenched socio-political hierarchies of discipline.
Online 15. Abstractions for efficient and reliable serverless computing [2023]
- Li, Qian (Researcher in computer science) author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
-
Serverless, also known as function-as-a-service (FaaS), is an increasingly important paradigm in cloud computing. Developers register functions to a managed FaaS platform to serve user requests without the need to maintain their own servers. FaaS abstracts away the complexity of managing infrastructure, offers high availability, and automatically scales. However, today's FaaS platforms are often inefficient and unreliable, leaving developers with several complex application management challenges. Specifically, there are three key challenges: (1) minimizing cost while maintaining performance under varying load, (2) providing strong fault-tolerance guarantees in the presence of failures, and (3) improving debuggability and observability for distributed ephemeral functions. In this dissertation, we describe three new abstractions and build three systems to enhance the cost-efficiency, reliability, and debuggability of FaaS applications. We focus on two important categories of FaaS applications: compute-intensive, such as image recognition services, and data-centric, such as e-commerce web services. First, we address the challenge of cost efficiency for ML inference serving, a growing category of compute-intensive tasks. In particular, we tackle the key question of how to automatically configure and manage resources and models to minimize cost while maintaining high performance under unpredictable loads. Existing platforms usually require developers to manually search through thousands of model-variants, incurring significant costs. Therefore, we propose INFaaS, an automated model-less system where developers can easily specify performance and accuracy requirements without the need to specify a specific model-variant for each query. INFaaS generates model-variants from already trained models and efficiently navigates the large trade-off space of model-variants on behalf of developers to achieve application-specific objectives. By leveraging heterogeneous compute resources and efficient resource sharing, INFaaS guarantees application requirements while minimizing costs. Second, we address the challenge of providing fault tolerance while achieving high performance for data-centric applications. Existing FaaS platforms support these applications poorly because they physically and logically separate application logic, executed in cloud functions, from data management, done in interactive transactions accessing remote databases. Physical separation harms performance, and logical separation complicates efficiently providing fault tolerance. To solve this issue, we propose Apiary, a high-performance database-integrated FaaS platform for deploying and composing fault-tolerant transactional functions. Apiary wraps a distributed database engine and uses it as a unified runtime for function execution, data management, and operational logging. By physically co-locating and logically integrating function execution and data management, Apiary delivers transactional guarantees similar to or stronger than those of comparable systems while significantly improving performance, cost, and observability. Finally, we delve into the challenge of debugging distributed data-centric applications. These applications are hard to debug because they share data across many concurrent requests. Currently, developers need to unravel the complex interactions of thousands of concurrent events to reproduce and fix bugs.
To make debugging easier, we extend the tight integration between compute and data in Apiary and explore the synergy between how people develop and how they debug their database-backed applications. We propose R^3, a "time travel" tool for data-centric FaaS applications that access shared data through transactions. R^3 allows for faithful replay of past executions in a controlled environment and retroactive execution of modified code on past events, making applications easier to maintain and debug. By recording concurrency information at transaction-level granularity, R^3 enables practical time travel with minimal overhead and supports most production DBMSs. We demonstrate how R^3 simplifies debugging for real, hard-to-reproduce concurrency bugs from popular open-source web applications.
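A conceptual sketch (not Apiary's or R^3's actual API) of the core idea behind a database-integrated transactional function: the function's data effects and its execution record commit in one transaction, so a retried request cannot apply its effects twice.

```python
# Toy transactional-function wrapper: effects + execution log commit atomically.
import sqlite3
from functools import wraps

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("CREATE TABLE func_log (request_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 0.0)")
conn.commit()

def transactional(func):
    @wraps(func)
    def wrapper(request_id, *args):
        with conn:  # one transaction: function effects + execution record
            done = conn.execute("SELECT 1 FROM func_log WHERE request_id=?",
                                (request_id,)).fetchone()
            if done:            # replayed request: skip re-execution
                return None
            result = func(conn, *args)
            conn.execute("INSERT INTO func_log VALUES (?)", (request_id,))
            return result
    return wrapper

@transactional
def transfer(db, src, dst, amount):
    db.execute("UPDATE accounts SET balance = balance - ? WHERE id=?", (amount, src))
    db.execute("UPDATE accounts SET balance = balance + ? WHERE id=?", (amount, dst))

transfer("req-1", 1, 2, 25.0)
transfer("req-1", 1, 2, 25.0)  # duplicate delivery: no double transfer
print(conn.execute("SELECT * FROM accounts").fetchall())  # [(1, 75.0), (2, 25.0)]
```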
Online 16. Abstractions for scaling stateful cloud applications [2023]
- Kraft, Peter (Researcher in computer science) author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
-
As the scale of both computing and data grows, developers are increasingly building distributed stateful systems in the cloud. However, these systems are challenging to build at scale because they must provide fault tolerance and consistency for stateful computations while managing both compute and data resources. Thus, we need new high-level abstractions that hide the complexity of distributed state management from developers. This dissertation proposes three such abstractions at multiple levels of the stack of a stateful cloud application. The first part of this dissertation targets cloud application developers, proposing Apiary, a database-oriented transactional function-as-a-service (FaaS) platform for stateful cloud applications. FaaS is an increasingly popular programming model because it abstracts away resource management concerns and reduces the complexity of cloud deployment, but existing FaaS platforms struggle to efficiently or reliably serve stateful applications. Apiary solves this problem by tightly integrating function execution with data management, improving FaaS performance on stateful applications by 2-68x while providing fault tolerance and strong transactional guarantees. The second part of this dissertation targets developers of the data management systems on which stateful cloud apps depend, proposing data-parallel actors (DPA), a framework for scaling data management systems. DPA targets an increasingly important class of data management systems called query serving systems, which are characterized by data-parallel, low-latency computations and frequent bulk data updates. DPA allows developers to construct query serving systems from purely single-node components while automatically providing critical properties such as data replication, fault tolerance, and update consistency. We use DPA to build a new query serving system, a simplified data warehouse based on MonetDB, and port existing ones, such as Druid, Solr, and MongoDB, enhancing them with new features such as a novel parallelism-optimizing data placement policy that improves query tail latency by 7-64%. The third part of this dissertation targets application developers utilizing multiple data management systems, proposing Epoxy, a protocol for providing ACID transactions across diverse data stores. Such applications are increasingly common because developers often use multiple data stores to manage heterogeneous data, for example, doing transaction processing in Postgres and text search in Elasticsearch while storing image data in a cloud object store like AWS S3. To provide transactional guarantees for these applications, Epoxy adapts multi-version concurrency control to a cross-data store setting. We implement Epoxy for five data stores: Postgres, Elasticsearch, MongoDB, Google Cloud Storage, and MySQL, finding it outperforms existing distributed transaction protocols like XA while providing stronger guarantees and supporting more systems.
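A toy, single-store sketch of the multi-version concurrency control (MVCC) idea that Epoxy adapts: each write creates a version stamped with the writing transaction's ID, and a reader sees only versions committed before its snapshot. The class and its interface are invented for illustration; the actual cross-data-store protocol is more involved.

```python
# Minimal snapshot-visibility MVCC: versions are filtered by the reader's snapshot.
class MVCCStore:
    def __init__(self):
        self.versions = {}      # key -> list of (txn_id, value), in write order
        self.committed = set()  # committed transaction IDs
        self.next_txn = 0

    def begin(self):
        self.next_txn += 1
        snapshot = set(self.committed)   # what this transaction may see
        return self.next_txn, snapshot

    def write(self, txn_id, key, value):
        self.versions.setdefault(key, []).append((txn_id, value))

    def read(self, snapshot, key):
        visible = [v for t, v in self.versions.get(key, []) if t in snapshot]
        return visible[-1] if visible else None

    def commit(self, txn_id):
        self.committed.add(txn_id)

store = MVCCStore()
t1, _ = store.begin()
store.write(t1, "x", "v1")
store.commit(t1)
t2, _ = store.begin()
store.write(t2, "x", "v2")
t3, snap3 = store.begin()          # t3 began before t2 committed
store.commit(t2)
print(store.read(snap3, "x"))              # "v1": t2 invisible to t3's snapshot
print(store.read(store.begin()[1], "x"))   # "v2": a fresh snapshot sees t2
```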
Online 17. Accelerating machine learning algorithms with adaptive sampling [2023]
- Tiwari, Mohit, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
-
The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes, existing techniques subsample the data or use other methods to improve computational efficiency, at the expense of incurring some approximation error. This thesis demonstrates that it is often sufficient, instead, to substitute computationally intensive subroutines with randomized counterparts that result in almost no degradation in quality. The results in this thesis are based on techniques from the adaptive sampling literature. Chapter 1 begins with an introduction to a specific adaptive sampling problem: that of best-arm identification in multi-armed bandits. We first provide a formal description of the setting and the best-arm identification problem. We then present a general algorithm, called successive elimination, for solving the best-arm identification problem. The techniques developed in Chapter 1 are applied to different problems in Chapters 2, 3, and 4. In Chapter 2, we discuss how the k-medoids clustering problem can be reduced to a sequence of best-arm identification problems. We use this observation to present a new algorithm, based on successive elimination, that matches the prior state of the art in clustering quality but reaches the same solutions much faster. Our algorithm achieves an n/log n reduction in sample complexity over the prior state of the art, where n is the size of the dataset, under general assumptions on the data-generating distribution. In Chapter 3, we analyze the problem of training tree-based models. The majority of the training time for such models is spent splitting each node of the tree, i.e., determining the feature and corresponding threshold at which to split each node. We show that the node-splitting subroutine can be reduced to a best-arm identification problem and present a state-of-the-art algorithm for training trees. Our algorithm depends only on the relative quality of each possible split, rather than explicitly depending on the size of the training dataset, and reduces the explicit dependence on dataset size n from O(n), for the most commonly used prior algorithm, to O(1). Our algorithm applies generally to many tree-based models, such as Random Forests and XGBoost. In Chapter 4, we study the Maximum Inner Product Search problem. We observe that, as with the k-medoids and node-splitting problems, the Maximum Inner Product Search problem can be reduced to a best-arm identification problem. Armed with this observation, we present a novel algorithm for the Maximum Inner Product Search problem in high dimensions. Our algorithm reduces the explicit scaling with d, the dimensionality of the dataset, from O(sqrt(d)) to O(1) under reasonable assumptions on the data. Our algorithm has several advantages: it requires no preprocessing of the data, naturally handles the addition or removal of new datapoints, and includes a hyperparameter to trade off accuracy and efficiency. Chapter 5 concludes this thesis with a summary of its contributions and possible directions for future work.
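The successive elimination algorithm introduced in Chapter 1 can be sketched in a few lines: pull every surviving arm once per round, then eliminate arms whose upper confidence bound falls below the best arm's lower bound. The confidence-radius constants below follow a common textbook form for sub-Gaussian rewards and are assumptions, not necessarily the thesis's exact choices.

```python
# Successive elimination for best-arm identification.
import numpy as np

def successive_elimination(pull, n_arms, delta=0.05, max_rounds=10_000):
    """Return the surviving arm after eliminating empirically inferior arms."""
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    pulls = np.zeros(n_arms, dtype=int)
    for t in range(1, max_rounds + 1):
        for arm in active:  # pull every surviving arm once per round
            reward = pull(arm)
            pulls[arm] += 1
            means[arm] += (reward - means[arm]) / pulls[arm]  # running mean
        # anytime confidence radius (union bound over arms and rounds)
        radius = np.sqrt(2 * np.log(4 * n_arms * t**2 / delta) / t)
        best = max(active, key=lambda a: means[a])
        # keep arms whose upper bound still overlaps the best arm's lower bound
        active = [a for a in active if means[a] + radius >= means[best] - radius]
        if len(active) == 1:
            return active[0]
    return max(active, key=lambda a: means[a])

true_means = [0.1, 0.6, 0.3]
rng = np.random.default_rng(0)
best = successive_elimination(lambda a: rng.normal(true_means[a], 1.0), 3)
print("identified best arm:", best)
```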
Online 18. Accountability or Appeasement? An Exploration of the World Bank's Inspection Panel [2023]
- Crooks, Gabrielle (Author)
- July 19, 2023; [ca. March 2022 - May 10, 2023]
- Description
- Book
- Summary
-
The World Bank’s Inspection Panel was the first accountability mechanism of its kind at an international finance institution. The Panel was created in response to civil society protests regarding the human rights violations that were increasing in frequency because of Bank-funded projects. Since its inception in 1993, the Inspection Panel has received over 160 complaints citing concerns about project implementation and design. The central question of this thesis surfaced when it became apparent that, despite the Panel’s promises to address the concerns of communities adversely affected by Bank-funded projects, the Panel has taken decisive action in a concerningly low number of cases. There is a clear gap between the Panel’s mandate and the actions it has taken. But why? As such, the question emerged: What determines how cases move through the World Bank’s Accountability Mechanism complaint process? Through a combination of case studies and Panel data, supplemented by interviews with Accountability Counsel staff members and a former Inspection Panel member, I argue that both time and civil society organization (CSO) involvement were significant factors that influenced a complaint’s mobility through the Panel’s process. The thesis found that, while these factors could play a role, ultimately, outcomes boil down to the whims of the World Bank’s Management, who are heavily involved in the Inspection Panel process despite the Panel’s design as an independent body. Management’s overt involvement in the Panel’s process severely undermines the ability of the Panel to practice its mandate and blurs the necessary lines of separation between the Panel and the Bank. This major finding raises concerns about the practice of accountability at these institutions and highlights the need for the Panel, and the Bank, to amend the ways in which communities can seek redress within its walls. Evidently, the Inspection Panel still has a long way to go in meeting its original mandate.
- Digital collection
- Stanford University, Fisher Family Honors Program in Democracy, Development, and the Rule of Law
- Liu, Jingxiao, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
-
The objective of this research is to achieve accurate and scalable bridge health monitoring (BHM) by learning, integrating, and generalizing the monitoring models derived from drive-by vehicle vibrations. Early diagnosis of bridge damage through BHM is crucial for preventing more severe damage and collapses that could lead to significant economic and human losses. Conventional BHM approaches require installing sensors directly on bridges, which are expensive, inefficient, and difficult to scale up. To address these limitations, this research uses vehicle vibration data collected as the vehicle passes over the bridge to infer bridge conditions. This drive-by BHM approach builds on the intuition that the recorded vehicle vibrations carry information about the vehicle-bridge interaction (VBI) and thus can indirectly inform us of the dynamic characteristics of the bridge. Advantages of this approach include the ability for each vehicle to monitor multiple bridges economically and the elimination of the need for on-site maintenance of sensors and equipment on bridges. Though the drive-by BHM approach has the above benefits, implementing it in practice presents challenges due to its indirect measurement nature. In particular, this research tackles three key challenges: 1) Complex vehicle-bridge interaction. The VBI system is a complex interaction system, making mathematical modeling difficult. The analysis of vehicle vibration data to extract the desired bridge information is challenging because the data have complex noise conditions and involve many uncertainties. 2) Limited temporal information. The drive-by vehicle vibration data contain limited temporal information at each coordinate on the bridge, which consequently restricts the drive-by BHM's capacity to deliver fine-grained spatiotemporal assessments of the bridge's condition. 3) Heterogeneous bridge properties. The damage diagnostic model learned from vehicle vibration data collected from one bridge is hard to generalize to other bridges because bridge properties are heterogeneous. Moreover, the multi-task nature of damage diagnosis, such as detection, localization, and quantification, exacerbates the system heterogeneity issue. To address the complex vehicle-bridge interaction challenge, this research learns the BHM model through non-linear dimensionality reduction based on the insights we gained by formulating the VBI system. Many existing physics-based formulations make assumptions (e.g., ignoring non-linear dynamic terms) to simplify the drive-by BHM problem, which is inaccurate for damage diagnosis in practice. Data-driven approaches have recently been introduced, but they use black-box models, which lack physical interpretation and require large amounts of labeled data for model training. To this end, I first characterize the non-linear relationship between bridge damage and vehicle vibrations through a new VBI formulation. This new formulation provides key insights for modeling the vehicle vibration features in a non-linear way and considering the high-frequency interactions between the bridge and vehicle dynamics. Moreover, analyzing the high-dimensional vehicle vibration response is difficult and computationally expensive because of the curse of dimensionality. Hence, I develop an algorithm to learn the low-dimensional feature embedding, also referred to as the manifold, of vehicle vibration data through a non-linear and non-convex dimensionality reduction technique called stacked autoencoders.
This approach provides informative features for achieving damage estimation with limited labeled data. To address the limited temporal information challenge, this research integrates multiple sensing modalities to provide complementary information about bridge health. The approach utilizes vibrations collected from both drive-by vehicles and pre-existing telecommunication (telecom) fiber-optic cables running through the bridge. In particular, my approach uses telecom fiber-optic cables as distributed acoustic sensors to continuously collect bridge dynamic strain responses at fixed locations. In addition, drive-by vehicle vibrations capture the input loading information to the bridge with a high spatial resolution. Because telecom fiber cables are already extensively installed on bridges, the cable-based approach likewise requires no on-site sensor installation or maintenance. A physics-informed system identification method is developed to estimate the bridge's natural frequencies and its strain and displacement mode shapes using telecom cable responses. This method models strain mode shapes based on parametric mode shape functions derived from theoretical bridge dynamics. Moreover, I am developing a sensor fusion approach that reconstructs the dynamic responses of the bridge by modeling the vehicle-bridge-fiber interaction system, which considers the drive-by vehicle and telecommunication fiber vibrations as the system input and output, respectively. To address the heterogeneous bridge properties challenge, this research generalizes the monitoring model for one bridge to monitor other bridges through a hierarchical model transfer approach. This approach learns a new manifold (or feature space) of vehicle vibration data collected from multiple bridges so that the features transferred to this manifold are sensitive to damage and invariant across multiple bridges. Specifically, the feature is modeled through domain adversarial learning that simultaneously maximizes the damage diagnosis performance for the bridge with available labeled data while minimizing the performance of classifying which bridge (including those with and without labeled data) the data came from. Moreover, to learn multiple diagnostic tasks (including damage detection, localization, and quantification) that have distinct learning difficulties, the framework formulates a feature hierarchy that allocates more learning resources to tasks that are hard to learn, in order to improve learning performance with limited data. A new generalization risk bound is derived to provide the theoretical foundation and insights for developing the learning algorithm and an efficient optimization strategy. This approach allows a multi-task damage diagnosis model developed using labeled data from one bridge to be used for other bridges without requiring training data labels from those bridges. Overall, this research offers a new approach that can achieve accurate and scalable BHM by learning, integrating, and generalizing monitoring models learned from drive-by vehicle vibrations. The approach enables low-cost and efficient diagnosis of bridge damage before it poses a threat to the public, which could avoid the enormous loss of human lives and property.
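A generic sketch of the stacked-autoencoder dimensionality reduction described above, reconstructing a high-dimensional vibration feature vector through a low-dimensional bottleneck whose embedding serves as the damage-sensitive feature. Layer sizes and the input dimension are illustrative assumptions, not the dissertation's configuration.

```python
# Stacked autoencoder: encode to a low-dimensional manifold, decode to reconstruct.
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    def __init__(self, in_dim=512, hidden=(128, 32)):
        super().__init__()
        h1, h2 = hidden
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, h1), nn.ReLU(),
            nn.Linear(h1, h2), nn.ReLU(),   # low-dimensional embedding
        )
        self.decoder = nn.Sequential(
            nn.Linear(h2, h1), nn.ReLU(),
            nn.Linear(h1, in_dim),
        )

    def forward(self, x):
        z = self.encoder(x)       # embedding used as the damage-sensitive feature
        return self.decoder(z), z

model = StackedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
vibrations = torch.randn(64, 512)   # stand-in high-dimensional vibration features
recon, embedding = model(vibrations)
loss = nn.functional.mse_loss(recon, vibrations)  # reconstruction objective
loss.backward()
opt.step()
print(embedding.shape)  # torch.Size([64, 32])
```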
Online 20. Achieving order with two-photon lithography : colloidal self-assembly and direct laser writing [2023]
- Doan, David, author.
- [Stanford, California] : [Stanford University], 2023
- Description
- Book — 1 online resource
- Summary
-
Structural or spatial order at the nanometer/micron regime is an avenue to improve material properties. The fields of photonics and metamaterials have shown that size effects at these regimes, in combination with purposefully designed architected structures, can enhance mechanical and optical performance. A common approach to achieving these types of ordered structures is through colloidal self-assembly or direct laser writing of 3D structures. In this work, I propose using direct laser writing both to fabricate colloidal particles and to fabricate complex 3D structures with enhanced mechanical properties. In the first part of my work, I focus on colloidal self-assembly as a method to achieve order. Due to the limited chemistries and shapes of colloids available to self-assemble, a large majority of self-assembled structures remain elusive. I propose using two-photon lithography to fabricate micron-sized particles and experimentally study the effect of shape (both concave and convex) on the final self-assembled structure. This method allows for highly monodisperse fabrication of colloidal particles, which can then be imaged using optical techniques due to their micron size. I fabricate colloidal conical shapes that self-assemble under entropic conditions (depletants) and tune the degree of assembly by changing the effective driving force through size. I then use a custom machine learning framework to identify these assembled structures (columnar grains) and recover self-assembly trends in which larger particles show a higher degree of self-assembly. Building upon this work, convex particles, specifically the Archimedean truncated tetrahedron, are also fabricated using the same framework and studied under another entropic condition (hard-particle interaction). These particles assemble in a six-fold symmetry upon interaction with an interface and transition to a three-fold symmetry upon application of a potential field. Analytical and computational results show that the six-fold symmetry state is a quasi-stable state and that, upon additional energy input, a transition occurs to the lower energy state. In the second part of my work, I use two-photon lithography in conjunction with nanoclusters to enhance the direct laser writing process and improve mechanical properties. I fabricate lattices with micron-sized features and test them mechanically. The resulting nanocomposite lattices show high stiffness and best-of-class energy absorbance by suppressing the layer-by-layer collapse that is commonly seen with these types of structures.