Shaswot Shresthamali

I am a Project Assistant Professor (特任助教) in the Kondo Laboratory at the Faculty of Science and Technology, Keio University. I am currently involved in projects related to Computer Architecture and Quantum Computing with Prof. Masaaki Kondo (近藤 正章).

I received my doctoral degree from the Graduate School of Information Science and Technology, The University of Tokyo, under the supervision of Prof. Hiroshi Nakamura (中村 宏).

You can find my CV here.

I can be reached at shaswot[AT]acsl.ics.keio.ac.jp

GitHub  /  Google Scholar  /  LinkedIn


Research

I am currently researching novel techniques and architectures to accelerate Deep Neural Networks (DNNs) by leveraging reduced and mixed precision. I am also part of the QC-HPC Group of the SQAI project, where I am researching the integration of High Performance Computing (HPC) systems with Quantum Computers (QC).

My doctoral thesis is about using Reinforcement Learning (RL) based methods for energy scheduling in Energy Harvesting Wireless Sensor Nodes. The focus is on applied RL and its relation to neural network function approximation, off-policy learning and distributed learning.



Publications


DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference (to be published)


Lorenzo Sonnino, Shaswot Shresthamali, Yuan He, and Masaaki Kondo
2024 Design, Automation and Test in Europe Conference (DATE 2024), 2024
link / paper

DNNs are widely used but face significant computational costs due to matrix multiplications, especially from data movement between the memory and processing units. One promising approach is therefore Processing-in-Memory (PIM), as it greatly reduces this overhead. However, most PIM solutions rely either on novel memory technologies that have yet to mature or on bit-serial computations that have significant performance overhead and scalability issues. Our work proposes an in-SRAM digital multiplier that uses conventional memory to perform bit-parallel computations by leveraging multiple-wordline activation. We then introduce DAISM, an architecture leveraging this multiplier, which achieves up to two orders of magnitude higher area efficiency compared to the SOTA counterparts, with competitive energy efficiency.


Enhancing Deep Reinforcement Learning with Compressed Sensing-based State Estimation


Shaswot Shresthamali and Masaaki Kondo
16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC (2023), 2023
link / slides / paper

In various real-world applications, sensor data collected for adaptive control using Reinforcement Learning (RL) often suffer from missing information due to sensor failures, data transmission errors, or other sources of noise. Such missing data can significantly hinder the agent’s ability to make informed decisions and degrade performance. In this paper, we propose a novel approach to address this challenge by leveraging Compressed Sensing (CS) techniques to recover missing information from the sensor data and reconstruct the state observation. The reconstructed state is then fed to the RL agents. As a result, they exhibit enhanced robustness and intelligence, surpassing the performance achievable when solely presented with noisy data as state input.


Quantum Circuit Fidelity Improvement with Long Short-Term Memory Networks


Yikai Mao, Shaswot Shresthamali, Masaaki Kondo
arXiv, 2023
link

Quantum computing has entered the Noisy Intermediate-Scale Quantum (NISQ) era. Currently, the quantum processors we have are sensitive to environmental variables like radiation and temperature, thus producing noisy outputs. Although many proposed algorithms and applications exist for NISQ processors, we still face uncertainties when interpreting their noisy results. Specifically, how much confidence do we have in the quantum states we are picking as the output? This confidence is important since a NISQ computer will output a probability distribution of its qubit measurements, and it is sometimes hard to distinguish whether the distribution represents meaningful computation or just random noise. This paper presents a novel approach to attack this problem by framing quantum circuit fidelity prediction as a Time Series Forecasting problem, therefore making it possible to utilize the power of Long Short-Term Memory (LSTM) neural networks. A complete workflow to build the training circuit dataset and LSTM architecture is introduced, including an intuitive method of calculating the quantum circuit fidelity. The trained LSTM system, Q-fid, can predict the output fidelity of a quantum circuit running on a specific processor, without the need for any separate input of hardware calibration data or gate error rates. Evaluated on the QASMbench NISQ benchmark suite, Q-fid’s prediction achieves an average RMSE of 0.0515, up to 24.7x more accurate than the default Qiskit transpile tool mapomatic. When used to find the high-fidelity circuit layouts from the available circuit transpilations, Q-fid predicts the fidelity for the top 10% layouts with an average RMSE of 0.0252, up to 32.8x more accurate than mapomatic.


局所グラフ情報を用いた強化学習によるAGVの経路スケジューリング手法の検討 (A study of reinforcement learning-based AGV route scheduling using local graph information)


杉本 寛直, Shaswot Shresthamali, Masaaki Kondo
研究報告組込みシステム(EMB), 2023-EMB-62(31), 1-6, 2023
link

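This paper examines a reinforcement learning approach to the route planning required when operating multiple automated guided vehicles (AGVs) in a real environment. In particular, we propose a reinforcement learning-based scheduling method that uses local graph information describing the actions and routes of the AGVs and, to limit the input dimension, uses only node information extracted from within each AGV's reachable range. We find that, in a problem setting close to a real environment, this method can be expected to improve task-processing throughput compared with using the information of all nodes.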


FAWS: Fault-Aware Weight Scheduler for DNN Computations in Heterogeneous and Faulty Hardware


Shaswot Shresthamali, Yuan He, Masaaki Kondo
2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications (ISPA), 2022
link / video / slides / paper

The idea of using inexact computation for overprovisioned DNNs (Deep Neural Networks) to decrease power and latency at the cost of minor accuracy degradation has become very popular. However, there is still no general method to schedule DNN computations on a given hardware platform to effectively implement this idea without loss in computational efficiency. Most contemporary methods require specialized hardware, extensive retraining and hardware-specific scheduling schemes. We present FAWS: Fault-Aware Weight Scheduler for scheduling DNN computations in heterogeneous and faulty hardware. Given a trained DNN model and a hardware fault profile, our scheduler is able to recover significant accuracy during inference even at high fault rates. FAWS schedules the computations such that the low priority ones are allocated to inexact hardware. This is achieved by shuffling (exchanging) the rows of the matrices. The best shuffling order for a given DNN model and hardware fault profile is determined using Genetic Algorithms (GA). We simulate bitwise errors on different model architectures and datasets with different types of fault profiles and observe that FAWS can recover up to 30% of classification accuracy even at high fault rates (which correspond to approximately 50% power savings).
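The row-shuffling idea can be illustrated with a minimal sketch. It is not the FAWS implementation: the per-row importance scores, the boolean fault profile and the function names are hypothetical, and a simple greedy sort stands in for the Genetic Algorithm search described in the abstract. The sketch only shows that rows of a weight matrix can be placed on different hardware slots and the outputs restored to their logical order without changing the exact result.

```python
def matvec(W, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def schedule_rows(importance, faulty):
    """Return placement[slot] = logical row assigned to that hardware slot.

    Greedy stand-in for the GA search: the least important rows
    are placed on the faulty (inexact) slots first.
    """
    rows_by_importance = sorted(range(len(importance)), key=lambda r: importance[r])
    faulty_slots = [s for s, f in enumerate(faulty) if f]
    exact_slots = [s for s, f in enumerate(faulty) if not f]
    placement = [None] * len(importance)
    for slot, row in zip(faulty_slots + exact_slots, rows_by_importance):
        placement[slot] = row
    return placement


def run_shuffled(W, x, placement):
    """Compute W x with rows in shuffled order, then undo the shuffle."""
    shuffled = [W[row] for row in placement]
    y_shuffled = matvec(shuffled, x)
    y = [0.0] * len(W)
    for slot, row in enumerate(placement):
        y[row] = y_shuffled[slot]
    return y
```

With `importance = [0.9, 0.1, 0.5]` and only slot 0 marked faulty, the least important row (row 1) lands on the faulty slot, while the restored output matches the unshuffled product when no fault is injected.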


Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
Journal of Low Power Electronics and Applications 12.4 (2022): 53, 2022
link / paper

IoT embedded systems have multiple objectives that need to be maximized simultaneously. These objectives conflict with each other due to limited resources and tradeoffs that need to be made. This requires multi-objective optimization (MOO), and multiple Pareto-optimal solutions are possible. In such a case, tradeoffs are made w.r.t. a user-defined preference. This work presents a general Multi-objective Reinforcement Learning (MORL) framework for MOO of IoT embedded systems. This framework comprises a general Multi-objective Markov Decision Process (MOMDP) formulation and two novel low-compute MORL algorithms. The algorithms learn policies to trade off between multiple objectives using a single preference parameter. We take the energy scheduling problem in general Energy Harvesting Wireless Sensor Nodes (EHWSNs) as a case example in which a sensor node is required to maximize its sensing rate and transmission performance as well as ensure long-term uninterrupted operation within a very tight energy budget. We simulate single-task and dual-task EHWSN systems to evaluate our framework. The results demonstrate that our MORL algorithms can learn better policies at lower learning costs and successfully trade off between multiple objectives at runtime.
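As a rough illustration of trading off two objectives with a single preference parameter, the sketch below uses plain linear scalarization. This is a common textbook technique swapped in for illustration and is not claimed to be either of the paper's MORL algorithms; the action names and value pairs are hypothetical.

```python
def scalarize(values, preference):
    """Combine two objective values with one preference in [0, 1].

    preference = 1.0 considers only the first objective,
    preference = 0.0 considers only the second.
    """
    if not 0.0 <= preference <= 1.0:
        raise ValueError("preference must lie in [0, 1]")
    first, second = values
    return preference * first + (1.0 - preference) * second


def greedy_action(q_vectors, preference):
    """Pick the action whose scalarized value is largest.

    q_vectors maps each action to a (objective_1, objective_2) value pair.
    """
    return max(q_vectors, key=lambda a: scalarize(q_vectors[a], preference))


# Hypothetical value estimates: (sensing performance, energy reserve).
q_vectors = {"sense_often": (0.9, 0.2), "save_energy": (0.3, 0.8)}
```

Changing only the preference changes the chosen action at runtime: a preference of 0.8 favours `sense_often`, while 0.2 favours `save_energy`.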


Multi-objective Reinforcement Learning for Energy Harvesting Wireless Sensor Nodes


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, MCSoC (2021), 2021
link / video / slides / paper

Modern Energy Harvesting Wireless Sensor Nodes (EHWSNs) need to intelligently allocate their limited and unreliable energy budget among multiple tasks to ensure long-term uninterrupted operation. Traditional solutions are ill-equipped to deal with multiple objectives and execute a posteriori tradeoffs. We propose a general Multi-objective Reinforcement Learning (MORL) framework for Energy Neutral Operation (ENO) of EHWSNs. Our proposed framework consists of a novel Multi-objective Markov Decision Process (MOMDP) formulation and two novel MORL algorithms. Using our framework, EHWSNs can learn policies to maximize multiple task-objectives and perform dynamic runtime tradeoffs. The high computation and learning costs usually associated with powerful MORL algorithms can be avoided by using our comparatively less resource-intensive MORL algorithms. We evaluate our framework on a general single-task and dual-task EHWSN system model through simulations and show that our MORL algorithms can successfully trade off between multiple objectives at runtime.


Power Management of Wireless Sensor Nodes with Coordinated Distributed Reinforcement Learning


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
37th IEEE International Conference on Computer Design, ICCD (2019), 2019
link / video / slides / paper

Energy Harvesting Wireless Sensor Nodes (EHWSNs) require adaptive energy management policies for uninterrupted perpetual operation in their physical environments. Contemporary online Reinforcement Learning (RL) solutions take an unrealistically long time exploring the environment to converge on working policies. Our work accelerates learning by partitioning the state-space for simultaneous exploration by multiple agents. We achieve this by using a novel coordinated ε-greedy method and implement it via Distributed RL (DiRL) in an EHWSN network. Our simulation results show a four-fold increase in state-space penetration and a reduction in the time to achieve optimal operation by an order of magnitude (50x). Moreover, we also propose methods to reduce instances of disastrous outcomes associated with learning and exploration. This translates to reducing the downtimes of the nodes by one third in simulations corresponding to a real-world scenario.


Adaptive power management in solar energy harvesting sensor node using reinforcement learning


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
International Conference on Embedded Software (EMSOFT 2017), 2017
link / poster / slides / paper

In this paper, we present an adaptive power manager for solar energy harvesting sensor nodes. We use a simplified model consisting of a solar panel, an ideal battery and a general sensor node with variable duty cycle. Our power manager uses Reinforcement Learning (RL), specifically SARSA(λ) learning, to train itself from historical data. Once trained, we show that our power manager is capable of adapting to changes in weather, climate, device parameters and battery degradation while ensuring near-optimal performance without depleting or overcharging its battery. Our approach uses a simple but novel general reward function and leverages weather forecast data to enhance performance. We show that our method achieves near-perfect energy neutral operation (ENO), with less than 6% root mean square deviation from ENO, compared to the more than 23% deviation that occurs when using other approaches.
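For readers unfamiliar with SARSA(λ), the following is a minimal sketch of the textbook tabular update with accumulating eligibility traces. It is not the paper's power manager: the state and action encoding, the reward, and the hyperparameter defaults (`alpha`, `gamma`, `lam`) are assumptions chosen only for illustration.

```python
from collections import defaultdict


def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one tabular SARSA(lambda) update in place and return the TD error.

    Q maps (state, action) to a value estimate.
    E maps (state, action) to an eligibility trace.
    """
    # Temporal-difference error for the observed transition.
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    # Accumulating trace: mark the visited pair as eligible.
    E[(s, a)] += 1.0
    # Credit every eligible pair, then decay its trace.
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam
    return delta


# Hypothetical usage: states are coarse battery levels, actions are duty-cycle indices.
Q = defaultdict(float)
E = defaultdict(float)
first_delta = sarsa_lambda_step(Q, E, "low", 0, 1.0, "mid", 1)
```

The traces are what let a single reward update earlier state-action pairs as well, which is the difference from one-step SARSA.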


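適応的電力制御を行う環境発電駆動センサノードの強化学習戦略の比較評価 (A comparative evaluation of reinforcement learning strategies for energy-harvesting sensor nodes with adaptive power control)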


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
研究報告システム・アーキテクチャ(ARC), 2017-ARC-227(28), 1-8, 2017
link / slides

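Sensor nodes powered by energy harvesting, such as solar power, must perform power control, for example by adjusting the sensing interval according to the harvested power, while preventing node downtime caused by battery depletion. In this paper, we assume a simple system model of an energy-harvesting sensor node consisting of a solar panel, a battery, and a general-purpose sensor node device, and we comparatively evaluate adaptive power management methods based on reinforcement learning, a machine learning technique. Reinforcement learning is expected to enable power management that adapts to environmental changes such as weather and battery degradation. For the comparison, we use the SARSA and Q-learning algorithms and evaluate the reinforcement learning methods with and without eligibility traces. The evaluation shows that the SARSA(λ) method achieves better performance than the other methods.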


Adaptive power management in solar energy harvesting sensor node using reinforcement learning


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
54th Design and Automation Conference (DAC 2017), 2017
poster

In the near future, Internet of Things (IoT) will consist of billions and trillions of nodes. Energy Harvesting Wireless Sensor Nodes (EHWSN) play a critical role in forming a sustainable, maintenance-free network of perpetually communicating autonomous devices for the IoT infrastructure. Energy autonomy (neutrality) of the sensor nodes needs to be ensured for perpetual operation. Here we consider a case of a solar energy harvesting sensor node.


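強化学習を用いた環境発電駆動センサノードの適応的電力制御手法の検討 (A study of an adaptive power control method for energy-harvesting sensor nodes using reinforcement learning)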


Shaswot Shresthamali, Masaaki Kondo, and Hiroshi Nakamura
研究報告組込みシステム(EMB), 2017-EMB-44(26), 1-6, 2017
link / slides

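Sensor nodes powered by energy harvesting, such as solar power, must perform power control, for example by adjusting the sensing interval according to the harvested power, while preventing node downtime caused by battery depletion. In this paper, we propose a method that uses reinforcement learning, a machine learning technique, to adaptively control the sensing interval of a sensor node based on the amount of harvested energy and weather forecast information, and we evaluate it using meteorological data. The evaluation shows that the proposed method automatically adapts to changes in the operating environment, such as a change of installation location, and achieves high performance.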


Implementation of Audio Effect Generator in FPGA


Sujit Chhetri, Bikash Poudel, Sandesh Ghimire, Shaswot Shresthamali, and Dinesh Sharma
Nepal Journal of Science and Technology 15, no. 1 (February 3, 2015): 89-98, 2015
link / video

This paper describes the theory and implementation of audio effects such as echo, distortion and pitch-shift in a Field Programmable Gate Array (FPGA). First, the mathematical formulation for generating such effects is explained, and then the algorithm is described for its implementation in an FPGA using Very High Speed Integrated Circuit Hardware Description Language (VHDL). The digital system being designed, which is synthesizable and reconfigurable, offers great flexibility and scalability in designing and prototyping in FPGAs. The system is divided into three HDL blocks, one each for echo, distortion, and pitch-shift effect generation, which are multiplexed in order to share the common ADC and DAC. The audio effect generator designed in this paper was successfully implemented in a Spartan-3E FPGA, making effective use of the available resources. Tremendous research is being carried out in the field of IP cores. Efficient IP cores designed to carry out digital signal processing are implemented in every modern device using configurable logic. This trend has not yet been realized in Nepal. Through the design and implementation of an audio effect generator, this paper also aims to bring the field of IP core development into the limelight among scholars in Nepal.


An Introduction to Parallel Processing Using FPGAs



KEC Journal of Science and Engineering (November 2014), 2014
link / paper

This paper discusses how FPGAs offer true hardware parallelism and deliberates on some areas which benefit from FPGAs’ parallel processing such as DSP (Digital Signal Processing), Data Acquisition and Processing, Text Parsing and Image/Video Processing. To demonstrate the possibility and the consequent advantages of parallel processing, a matrix multiplier was designed for a Spartan-3E FPGA with the help of High Level Synthesis (HLS) tools. Two possible solutions, with and without parallel processing, were obtained which are briefly discussed here.





Design and source code from Leonid Keselman's website