siddharth's space

Learn how to use Tmux

Posted on November 16, 2018 by Siddharth Jain

why use it?
to keep processes running after an SSH session disconnects: https://unix.stackexchange.com/questions/479/keep-ssh-sessions-running-after-disconnection
another good alternative is nohup (instead of tmux, esp. when you want to copy-paste), but remember to delete the log files it leaves behind. see
https://unix.stackexchange.com/questions/45913/is-there-a-way-to-redirect-nohup-output-to-a-log-file-other-than-nohup-out for how to direct the output to a file other than nohup.out

tmux list-sessions
tmux attach -t session_name

in case it's not clear, you run tmux on the server (not the client)

scrolling in tmux:
press Ctrl+b and then [
now you should be able to scroll up. Press q to quit scroll mode.
note you have to press Ctrl not Cmd
source: https://superuser.com/a/209608/84951

copy-paste in tmux, or rather how to copy from tmux to the system clipboard:
this requires iTerm2. Go to iTerm2 > Preferences > “General” tab and, in the “Selection” section, check the box that says “Applications in terminal may access clipboard”. source: http://www.rushiagr.com/blog/2016/06/16/everything-you-need-to-know-about-tmux-copy-pasting/
1. Press Ctrl+b, then [ to enter copy mode, as before.
2. Move to the start (or end) of the text to highlight.
3. Press Ctrl+Space to start the selection.
4. Move to the opposite end of the text to copy.
5. On Mac, press Esc and then w.
6. This should copy the text to the system clipboard. source: https://unix.stackexchange.com/a/58765/108601

ChatGPT:

tmux (Terminal Multiplexer) is a powerful tool that allows you to manage multiple terminal sessions within a single terminal window. It’s especially useful for working on remote servers, multitasking, and organizing your workflow.

Here are the basics you need to know to get started with tmux:

1. Starting a `tmux` Session

To start tmux, simply type:

  tmux

This creates a new tmux session with a single window.

2. Key Concepts

Sessions: A session is the overall tmux instance. You can have multiple sessions running in tmux.
Windows: Within a session, you can create multiple windows. Each window is like a full-screen terminal.
Panes: You can split windows into multiple panes (like tiling windows), allowing you to see and work on multiple terminals at once.

3. Basic Commands

tmux commands are usually invoked by pressing Ctrl+b (the prefix), followed by a key to execute the command. Here are some essential commands:

Session Management:

Start a new session:

  tmux new -s session_name

List active sessions:

  tmux ls

Attach to an existing session:

  tmux attach -t session_name

Detach from the current session: Press Ctrl+b, then d. This keeps the session running in the background.

Window Management:

Create a new window: Press Ctrl+b, then c.
Switch between windows: Press Ctrl+b, then a number (like 0, 1, etc.) to switch to a specific window. Or use Ctrl+b, then n to move to the next window, or p to move to the previous window.
Rename a window: Press Ctrl+b, then , to rename the current window.
Close a window: Press Ctrl+d (or type exit) in each of its panes. The window closes when the shell in its last pane exits.

Pane Management:

Split window horizontally (one pane above the other): Press Ctrl+b, then ".
Split window vertically (panes side by side): Press Ctrl+b, then %.
Navigate between panes: Press Ctrl+b, then the arrow keys (or o to cycle between panes).
Resize panes: Press Ctrl+b, then Ctrl+arrow key to resize the current pane one cell at a time (or Alt+arrow key for larger steps). Plain arrow keys only move between panes.
Close a pane: Press Ctrl+d (or type exit).

4. Saving and Restoring Sessions

One of the great features of tmux is that you can detach from sessions, leaving them running in the background, and reattach later:

Detach from the current session: Press Ctrl+b, then d.
Reattach to a detached session:

  tmux attach -t session_name

Kill a session:

  tmux kill-session -t session_name

5. Customizing `tmux`

You can customize tmux by editing its configuration file, typically located at ~/.tmux.conf. Common customizations include setting the prefix to something other than Ctrl+b or changing the appearance of the status bar.

Here’s an example of setting Ctrl+a as the new prefix:

# Change prefix to Ctrl+a
unbind C-b
set-option -g prefix C-a
bind C-a send-prefix

After making changes, you can reload the configuration by running:

tmux source-file ~/.tmux.conf

6. Useful Commands

List all tmux key bindings: Press Ctrl+b, then ? to bring up the help screen with all available commands.
Kill a pane or window: Press Ctrl+b, then x to close a pane. Press Ctrl+b, then & to kill a window.

7. Session Persistence

One of the most powerful features of tmux is session persistence. Even if you close your terminal or get disconnected from a remote server, your tmux sessions will continue running in the background. You can reattach later and pick up right where you left off.

Summary of Commands:

Start tmux: tmux
Detach from a session: Ctrl+b then d
Reattach to a session: tmux attach -t session_name
Create a new window: Ctrl+b then c
Split pane vertically (side by side): Ctrl+b then %
Split pane horizontally (top and bottom): Ctrl+b then "
Navigate between panes: Ctrl+b then arrow keys

By mastering these basics, you’ll be able to manage multiple terminal sessions efficiently, especially when working with remote servers or complex projects.

Posted in Software

ankits@uber

Posted on November 16, 2018 by Siddharth Jain

This gallery contains 8 photos.


Best of Blind

Posted on November 15, 2018 by Siddharth Jain

Posted in Career

What is a blockchain?

Posted on November 14, 2018 by Siddharth Jain

Best paper on blockchain: https://nvlpubs.nist.gov/nistpubs/ir/2018/NIST.IR.8202.pdf

Start by watching this video. Then read these two articles:

and the original Bitcoin article. Stop. Do not read more – you will learn more by reading and understanding a few high-quality papers than by reading all the million posts on Medium.

A blockchain is an append-only distributed database where each record has a link to the previous record just like Git.

What is the business case for blockchain?

Removes the middleman from transactions thereby reducing cost and time to transact
Very secure

There are 4 key components to a blockchain:

Distributed Ledger: every participant will have a copy of the ledger and there is no master copy. This topology is also known as P2P (peer-to-peer) to emphasize the fact there is no master node. Everyone is an equal.
Cryptographic Chain: This is very similar to Git. All commits are signed with the committer's private key (so anyone can verify them with the matching public key) and linked together to form a hash chain. This + consensus is what makes the blockchain secure. If someone tries to tamper with the ledger, the hashes will change and other nodes will reject the changes. The data structure which enables all of this is known as a Merkle tree.
Consensus: Changes to the ledger are audited by other nodes. Different consensus policies can apply. Consensus also covers serializing commits i.e., establishing a total order on the commits (which need not be chronological order). The purpose of consensus is to ensure all peers have the same copy of the ledger – they all see the same truth.
Smart Contract: Ability to embed business rules as executable code built into the blockchain itself without any additional layer of software. This paves the way for “programmable money”. One can embed conditional statements such as if condition X is satisfied money should be transferred e.g., automating real estate contracts without need for escrow.
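To make the "Cryptographic Chain" idea concrete, here is a minimal sketch of a hash chain in Python. It is only an illustration: the block layout, the helper names (block_hash, append_block, is_valid) and the sample records are made up for this example, and it leaves out signatures, Merkle trees and consensus entirely.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record; it is linked to the chain through prev_hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Recompute every hash and check every link, as a peer node would."""
    expected_prev = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != recomputed:
            return False
        expected_prev = block["hash"]
    return True


chain = []
for record in ["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"]:
    append_block(chain, record)

print(is_valid(chain))  # the untouched chain passes the check
```

Editing the data of an earlier block changes its recomputed hash, so is_valid rejects the chain. Re-hashing just that block does not help either, because the next block still points at the old hash.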

Blockchain vs. Git – a comparison

Posted on November 10, 2018 by Siddharth Jain

Functional comparison of Blockchain and Git

	Blockchain	Git
Single source of truth	✔	✔
Know who committed a change	✔	✔
Track history of an asset	✔	✔ (asset could be a file)
Consensus	✔	✔ (code review + unit and integration tests)
Distributed Ledger	✔	✔ (the ledger is called repository in this case)

What gives?

In blockchain there is no master ledger whereas in Git there is arguably a master repository even though Git is called a distributed version control system
Blockchain can execute a smart contract when a transaction is submitted whereas Git by itself does not execute anything when a pull request is submitted
Consensus is automated and built into the blockchain. If we think of executing tests in Git as the functional equivalent of consensus in blockchain, then Git by itself does not run any tests. These are run by another layer in CI – Jenkins etc.

Posted in Software

Modern Physics and Cosmology

Posted on October 7, 2018 by Siddharth Jain

Here is a collection of 10 books on modern physics and cosmology where I took care not to repeat any author – i.e., 10 books by 10 different authors:

Not Even Wrong by Peter Woit
Now: The Physics of Time by Richard A. Muller
The First Three Minutes by Steven Weinberg
Farewell to Reality by Jim Baggott
QED: The Strange Theory of Light and Matter by Richard Feynman
The Inflationary Universe by Alan H. Guth
Endless Universe by Paul Steinhardt and Neil Turok
The Black Hole War by Leonard Susskind
Lost In Math by Sabine Hossenfelder
A Brief History of Time by Stephen Hawking


I think my favorite book has to be Farewell to Reality. I also liked QED by Feynman and Now by Prof. Muller a lot. I have a confession to make. I don’t believe current physics theories on cosmology and particle physics. I am a proud denier of Big Bang Theory, Inflation and Multiverse. It's not possible that all the matter in the universe today could exist in an infinitesimally small point at a temperature of 10^22 degrees and that its size doubled every 10^(-37) seconds. I don’t believe in String Theory or the world as a Hologram or the Endless Universe or the Mathematical Universe. I think legitimate physics ended somewhere in the 70s. At one point I would get very excited when I read things like “Not only is the universe stranger than we think, it is stranger than what we can think” but now I am more disillusioned and really feel that maybe the universe is much simpler than what current scientists would like us to believe. Now I understand why theoretical physicists are also known as crackpots. Why am I obsessed with modern physics and why do I keep reading these books then? Because they give all the stimulation of taking LSD without any harmful side effects. And perhaps that explains why such theories are postulated in the first place.

Posted in Science

Notes on installing Hyperledger Fabric

Posted on September 25, 2018 by Siddharth Jain

I ran into several issues while trying to install Hyperledger Fabric on a fresh Mac following the instructions at https://hyperledger-fabric.readthedocs.io/en/release-1.2/install.html. Writing this post to remember how to fix the errors in future.


$ curl -sSL http://bit.ly/2ysbOFE | bash -s 1.2.0

bash: line 181: [: too many arguments

Installing hyperledger/fabric-samples repo

===> Cloning hyperledger/fabric-samples repo and checkout v1.2.0

xcode-select: note: no developer tools were found at '/Applications/Xcode.app', requesting install. Choose an option in the dialog to download the command line developer tools.

Installing Hyperledger Fabric binaries

===> Downloading version 1.2.0 platform specific fabric binaries

===> Downloading:  https://nexus.hyperledger.org/content/repositories/releases/org/hyperledger/fabric/hyperledger-fabric/darwin-amd64-1.2.0/hyperledger-fabric-darwin-amd64-1.2.0.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100 29.7M  100 29.7M    0     0  6695k      0  0:00:04  0:00:04 --:--:-- 7282k

==> Done.

===> Downloading version 1.2.0 platform specific fabric-ca-client binary

===> Downloading:  https://nexus.hyperledger.org/content/repositories/releases/org/hyperledger/fabric-ca/hyperledger-fabric-ca/darwin-amd64-1.2.0/hyperledger-fabric-ca-darwin-amd64-1.2.0.tar.gz

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current

                                 Dload  Upload   Total   Spent    Left  Speed

100 3864k  100 3864k    0     0  2309k      0  0:00:01  0:00:01 --:--:-- 2310k

==> Done.

Installing Hyperledger Fabric docker images

===> Pulling fabric Images

==> FABRIC IMAGE: peer

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-peer/manifests/1.2.0: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-peer:1.2.0

==> FABRIC IMAGE: orderer

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-orderer/manifests/1.2.0: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-orderer:1.2.0

==> FABRIC IMAGE: ccenv

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-ccenv/manifests/1.2.0: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-ccenv:1.2.0

==> FABRIC IMAGE: tools

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-tools/manifests/1.2.0: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-tools:1.2.0

===> Pulling fabric ca Image

==> FABRIC CA IMAGE

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-ca/manifests/1.2.0: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-ca:1.2.0

===> Pulling thirdparty docker images

==> THIRDPARTY DOCKER IMAGE: couchdb

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-couchdb/manifests/0.4.10: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-couchdb:0.4.10

==> THIRDPARTY DOCKER IMAGE: kafka

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-kafka/manifests/0.4.10: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-kafka:0.4.10

==> THIRDPARTY DOCKER IMAGE: zookeeper

Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-zookeeper/manifests/0.4.10: unauthorized: incorrect username or password

Error response from daemon: No such image: hyperledger/fabric-zookeeper:0.4.10

===> List out hyperledger docker images

There are two issues in the output above:

The first issue is that git is not installed on the machine, which is what the error below is about:
xcode-select: note: no developer tools were found at ‘/Applications/Xcode.app’, requesting install. Choose an option in the dialog to download the command line developer tools.

You don’t have to install Xcode to get git. You could try running
```
xcode-select --install
```
to install the Xcode CLI tools (git comes as part of them), but chances are you will get this error: https://apple.stackexchange.com/questions/309461/unable-to-run-xcode-select-install/309469#309469. If so, just follow the steps described in the answer: https://apple.stackexchange.com/a/309469/139331
The second issue is this error: Error response from daemon: Get https://registry-1.docker.io/v2/hyperledger/fabric-zookeeper/manifests/0.4.10: unauthorized: incorrect username or password. To fix this you need to log in with a Docker username before running the Hyperledger install script. Log in to Docker by running
```
docker login
```
The username is your Docker ID, i.e. the username without the email suffix.

Posted in Software

To GET or not to GET

Posted on August 29, 2018 by Siddharth Jain

This post is an analysis of the Washington State GET (Guaranteed Education Tuition) program. The main question we want to answer is whether to invest in GET or Vanguard 529 or both.

GET Resources:

https://www.get.wa.gov/sites/default/files/documents/2017-18-GET-Enrollment-Guide.pdf
https://www.get.wa.gov/sites/default/files/documents/Program%20Details.pdf
https://www.get.wa.gov/sites/default/files/documents/Tuition-Payout-Over-Time-Chart-2018.pdf

Refer to the docs above if you are wondering where some of the numbers in the analysis below come from:

600 units will cost 600*113 = $67,800 if you were to buy them lump sum today. Spread this cost over 18 years or 216 months and you get 67800/216 = $314 per month. But because of the finance charge and other fees you lock in at $570 per month effectively resulting in a “premium” of 82%. This does not sound like a good deal.

The payout of 1 GET unit today is $106.01 (source: https://www.get.wa.gov/pricepayoutfees). If we assume a 5.75% increase in tuition every year, then 18 years from now the payout of 600 units will be (1.0575^18)*106.01*600 = $173,998.

Compare that to the amount you put in: 570*18*12 = $123,120. So the bet is that you can save $50,878 if you start contributing to GET now @$570/month.

Or you could even purchase all 600 units today @$67,800. If you do that, then the money will effectively earn an ROI of about 5.4%, as shown below:

>>> import math
>>> round(math.exp(math.log(173998.0/67850)/18), 4)
1.0537

In the above I added the $50 enrollment fee to $67,800.

A 5.4% ROI is not that great, but it is guaranteed, assuming tuition rises 5.75% per year. If tuition increases by more than 5.75%, the ROI will be even higher. E.g., if we assume a 6% increase in tuition per annum, that gives an ROI of 5.6% and the value becomes $181,553 at the end of 18 years.
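The payout and ROI arithmetic above can be reproduced with a short script, which also makes it easy to try other tuition growth rates. This is just a sketch using the numbers quoted in this post ($106.01 payout per unit, $113 purchase price per unit, $50 enrollment fee, 18 years). The function and constant names are mine.

```python
UNIT_PAYOUT_TODAY = 106.01       # payout value of one GET unit today (from the post)
UNITS = 600
LUMP_SUM_COST = 600 * 113 + 50   # purchase price of 600 units plus the $50 enrollment fee
YEARS = 18


def payout_after(growth_rate, years=YEARS):
    """Payout of all units if tuition grows by growth_rate every year."""
    return UNIT_PAYOUT_TODAY * UNITS * (1 + growth_rate) ** years


def annualized_roi(final_value, cost=LUMP_SUM_COST, years=YEARS):
    """Constant yearly return that turns cost into final_value over the period."""
    return (final_value / cost) ** (1 / years) - 1


for growth in (0.0575, 0.06):
    value = payout_after(growth)
    print(f"tuition growth {growth:.2%}: payout ${value:,.0f}, "
          f"lump-sum ROI {annualized_roi(value):.1%}")
```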

It can be shown (see https://www.key.com/personal/calculators/annual-rate-of-return-calculator.jsp) that if you invest $475 per month in Vanguard and assume a 6% annual rate of return, then after 18 years that money will grow to about $181,000.
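For anyone who wants to check the $475 per month figure without the online calculator, here is a rough sketch of the future value of a monthly contribution. It rests on two assumptions of mine that the linked calculator may or may not share: contributions are made at the end of each month, and the 6% is an effective annual rate, so the monthly rate is (1.06)^(1/12) - 1. Under those assumptions the result lands close to $181,000.

```python
def future_value(monthly_payment, annual_return, years):
    """Future value of equal end-of-month contributions.

    Assumes annual_return is an effective annual rate, converted to the
    monthly rate that compounds to it over 12 months.
    """
    monthly_rate = (1 + annual_return) ** (1 / 12) - 1
    months = years * 12
    return monthly_payment * ((1 + monthly_rate) ** months - 1) / monthly_rate


print(f"${future_value(475, 0.06, 18):,.0f}")
```

If you instead treat 6% as a nominal rate compounded monthly (0.5% per month), the number comes out a few thousand dollars higher, so the exact figure depends on the compounding convention.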

Conclusion:

If one were to invest a fixed amount monthly, then Vanguard 529 is definitely better than investing in GET, since investing $475 per month in Vanguard has the same effect as investing $570 per month in GET per the above. The reason why GET is so expensive is its 7.5% finance charge. When you set up monthly payments to purchase 600 units over 18 years, what happens is that GET gives you a loan and you have to pay 7.5% interest on that loan – it's the same as buying a house. The calculator at https://www.bankrate.com/calculators/mortgages/loan-calculator.aspx confirms this: a $67,800 loan at 7.5% over 18 years comes to roughly $570 per month.

Unfortunately GET has no mechanism to run a credit check on the applicant and set the finance charge based on the applicant’s creditworthiness. This is the reason for taking one star out of its rating at savingforcollege.com. Otherwise it's a 5 star program equivalent to Vanguard 529.
But if one wants to make a single lump sum contribution, then both options (Vanguard vs GET) are comparable, since in both cases I expect the money to earn an ROI of approx. 6%. And so in this case the decision of choosing Vanguard vs GET should be based on customer service – things like how easy it will be to take out distributions from the two accounts etc. Note that if you are willing to bet that you can earn a higher ROI on the money you put in Vanguard than the expected rise in cost of tuition, then you should invest in Vanguard 529 instead (and vice-versa), i.e. let
$e = V - G$
where
$V =$ your expected rate of return if you put money in vanguard
$G =$ the rate at which you expect tuition cost to increase in state of Washington
Go with Vanguard if you think $e > 0$ and with GET if you think $e < 0$. For context, the annualized rate of return for VTMFX over the last 10 years is 7.5%.

However as with mutual funds, past performance is not an indicator of future success and actual returns will vary.
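The claim in the conclusion that the monthly GET plan behaves like a 7.5% loan can be sanity-checked with the standard loan amortization formula. This is only a sketch: I am assuming the finance charge works like an ordinary fixed-rate loan with monthly compounding on the $67,800 lump-sum price, which is how a mortgage calculator would treat it. GET's actual fee schedule may differ in the details.

```python
def monthly_loan_payment(principal, annual_rate, years):
    """Fixed monthly payment on a fully amortizing loan (monthly compounding)."""
    rate = annual_rate / 12          # nominal annual rate split into 12 periods
    periods = years * 12
    return principal * rate / (1 - (1 + rate) ** -periods)


payment = monthly_loan_payment(600 * 113, 0.075, 18)
print(f"${payment:,.2f} per month")
```

The result comes out within a few dollars of the $570 per month quoted for the GET monthly plan, which is consistent with the finance charge being the main source of the premium.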

Historical data shows tuition is rising @6% per annum in Washington state. source: https://www.get.wa.gov/sites/default/files/documents/Tuition-Payout-Over-Time-Chart-2018.pdf

>>> math.exp(math.log(10601.0/3641)/18)
1.0611695668730086

Historical data shows University of California tuition is increasing @7.3% annually
>>> math.exp(math.log(14419.0/4034)/18)
1.0733300576326157
source: http://budget.ucdavis.edu/studentfees/historical/archive.html

So ultimately GET and Vanguard 529 come at the problem from two different angles. With GET you want Washington state college tuition to rise in future to maximize your ROI, whereas with Vanguard it's the opposite. Vanguard 529 is a savings plan whereas GET is a prepaid tuition program – it essentially allows you to pay for future education at today’s price.

Further Reading:
https://www.bogleheads.org/forum/viewtopic.php?t=229180
https://www.savingforcollege.com/529-plans/washington/guaranteed-education-tuition-get?821=t

Should I rollover my GET units to Washington’s new 529 Dream Ahead Investment Plan?

Posted in Money

Tips on Decision Making

Posted on September 23, 2017 by Siddharth Jain

When faced with a difficult decision, making a detailed list of pros and cons between the two choices is a perfect recipe for disaster and prolonged indecisiveness. Scoring the individual attributes in hopes of making a decision based on a total score is even worse. What is the key to effective decision making then? Focus on the most important factor (the single thing that matters the most, i.e., Math.Max) and make the decision based on that.

I don’t remember exactly where, but I have seen the same thing echoed in one of the business books I read – maybe it was Ben Horowitz’s The Hard Thing About Hard Things.

Posted in Career

How to choose spark settings?

Posted on June 10, 2017 by Siddharth Jain

Spark comes with a plethora of settings. Good luck trying to make sense of them and setting their values. There is a ton of documentation, but not one good page describing step by step how to choose the values of important parameters like:

number of executors
number of cores per executor
executor memory
executor memory overhead
spark.default.parallelism
spark.sql.shuffle.partitions
spark.memory.storageFraction
spark.memory.fraction
spark dynamicAllocation

Spark processes a job in stages. Your goal as a developer is to minimize the # of stages, since data needs to be shuffled between stages and that is an expensive operation; also, a stage cannot start until the stages it depends on have completed. Within a stage, Spark processes data in tasks. Thus N tasks need to be completed to complete a stage. Tasks can be executed in parallel. A task is processed by a thread, and a task processes one partition of the data. An executor is a JVM instance (i.e., it's a process) that lives for the lifetime of the Spark job and executes tasks. Just as multiple threads live in a process and share the process memory, so multiple tasks can live in an executor and share the executor memory. A task has a 1:1 relationship with a thread (a task is executed in a thread) and a thread has a 1:1 relationship with a CPU core (a thread consumes one CPU core when running).

In general, it is recommended to use 2-3 tasks per CPU core in the cluster (this is the guidance in the Spark tuning guide). We can start with the total number of virtual cores available in the cluster. Let's say that number is 500 and we want to use 80% of the cluster’s capacity. This means we want to use a total of 0.8*500 = 400 VCores. That settles one number. For the total # of tasks, pick a multiplier between 2 and 3, say 2.5, which gives 2.5 * 400 = 1000. That settles another number. Since a task has a 1:1 relationship with a partition, that means spark.default.parallelism and spark.sql.shuffle.partitions should be set to 1000.

Sandy Ryza notes that

I’ve noticed that the HDFS client has trouble with tons of concurrent threads. A rough guess is that at most five tasks per executor can achieve full write throughput, so it’s good to keep the number of cores per executor below that number.
Running tiny executors (with a single core and just enough memory needed to run a single task, for example) throws away the benefits that come from running multiple tasks in a single JVM. For example, broadcast variables need to be replicated once on each executor, so many small executors will result in many more copies of the data.

So let's set number of cores per executor = 5 based on the notes above (a constant which does not need to be changed based on cluster properties etc.). Any number between 3 and 5 should be fine in general.

As we have calculated the total # of cores and the # of cores per executor, we can now calculate the # of executors as 400 / 5 = 80.

The executor memory can be set equal to the max. executor memory provided by the administrator minus the overhead. The default non-heap overhead in Spark is 384MB (more precisely, the larger of 384MB and 10% of the executor memory). One might want to bump this to 1GB if one encounters the “container killed by YARN for exceeding memory limits” error.

If you are using Spark SQL, you might sometimes have to forcefully repartition the dataframe into the # of partitions calculated above. Be careful: in our case Spark would not repartition the data even when we asked it to by calling repartition. To force it to repartition the data, we followed the call to repartition with persist. When Spark has to shuffle data between stages it will use the # of partitions set by spark.default.parallelism or spark.sql.shuffle.partitions, but during the first stage it does not automatically use that # of partitions. And if the # of partitions is only 20 whereas 400 VCores were allocated in spark-submit, only 20 tasks will be active at a time and 380 VCores will be idle. Don’t let this happen! The Executors tab in the Spark UI shows how many tasks are active at a given point in time. Make sure the # of active tasks there equals the # of VCores; otherwise there are CPUs sitting idle.

If data is not being cached or persisted, then the storage memory fraction can be set to 0. In our experience, Spark dynamic allocation is more of a hassle. Our jobs would sometimes get stuck when using dynamic allocation. Moreover, the time to process becomes more unpredictable. Static allocation translates to a predictable time to process.

So here we are again:

number of cores per executor = 4 (average of 3 and 5)
number of executors = # of cores in cluster * load factor / number of cores per executor
executor memory = can set it to max. available minus the overhead
executor memory overhead = default is 384MB (or 10% of executor memory, whichever is larger). bump it to 1GB if getting “container killed by YARN for exceeding memory limits”
spark.default.parallelism = 2.5 * # of cores in cluster * load factor
spark.sql.shuffle.partitions = 2.5 * # of cores in cluster * load factor
spark.memory.storageFraction = 0 if not doing any caching or persisting
spark.memory.fraction
spark dynamicAllocation = false (gave us more trouble than the promised benefit)
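The rules of thumb above are simple enough to put in a small helper. The sketch below is a hypothetical function of my own (suggest_settings is not part of Spark) that turns the cluster size, a load factor, the cores per executor and the tasks-per-core multiplier into the corresponding Spark configuration keys. It uses 5 cores per executor and a multiplier of 2.5 as defaults, to match the worked example with 500 VCores. Memory settings are left out because they depend on what the cluster administrator allows.

```python
def suggest_settings(cluster_vcores, load_factor=0.8,
                     cores_per_executor=5, tasks_per_core=2.5):
    """Derive core, executor and partition counts from the rules of thumb above."""
    usable_cores = round(cluster_vcores * load_factor)
    num_executors = usable_cores // cores_per_executor
    partitions = round(tasks_per_core * usable_cores)
    return {
        "spark.executor.cores": cores_per_executor,
        "spark.executor.instances": num_executors,
        "spark.default.parallelism": partitions,
        "spark.sql.shuffle.partitions": partitions,
        "spark.dynamicAllocation.enabled": "false",
    }


settings = suggest_settings(500)
# Print the values in the form spark-submit accepts them.
for key, value in settings.items():
    print(f"--conf {key}={value}")
```

For the 500 VCore example this gives 80 executors and 1000 partitions, the same numbers derived by hand earlier in the post.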

More Notes:

Do use kryo serialization when running spark jobs
