Wednesday, October 23, 2019

My Learning Plan for Getting Into Data Science from Scratch

My decision to get into data science goes back to early 2015, when I was still in college. I didn’t originally plan to become a data scientist but a quant: essentially a financial analyst who uses advanced math and coding in their work (e.g. risk management and algorithmic trading). However, a 9-month quant internship made me realize that I wanted to apply these skills in a wider context. After reading a few blog posts, I concluded that data science was the field for me.
Coming from a background in Applied Economics, I felt my econometrics-heavy curriculum had already given me a decent foundation for the math; however, I still had no background in the models used in machine learning (e.g. neural networks, random forests). In addition, I looked through the courses for the rest of my stay in university and found nothing that taught us how to code our own algorithms.
The gap was clearly my lack of knowledge in 1) coding and 2) machine learning models.
For the rest of this blog post, I will go through a short list of online resources I’ve used to fill this gap and attain my dream of being a data scientist.
Let’s get started!

Conceptual learning

Courses and books will give the foundation of your data science skills
Below are the books and courses that I recommend you study to understand how data science works. Take note that the learning resources below are shown in the exact order that I recommend you take them (based on both my experience and feedback from other people). An asterisk (*) indicates that I myself haven’t taken the course but it’s been strongly recommended by other data scientists to be taken at this stage.
  1. Python for Everybody Specialization: This series of courses is great for the absolute beginner who wants to get started. It is the best course to take to get over your fear of learning how to code.
  2. Machine Learning by Andrew Ng: This course gave me the core foundation of my understanding of different machine learning models. Andrew Ng literally inspired me to pursue a career in machine learning.
  3. Learn Python 3 the Hard Way: This book will create a solid foundation for your Python skills (and coding skills in general). I cannot stress enough how great this book is at teaching basic concepts with practical lessons and well-designed exercises.
  4. Applied Data Science with Python Specialization: This series of courses is a good way to glue your understanding of machine learning models to your coding skills. I personally know people who were able to get jobs in data science right after this specialization, since by then they already had a decent toolkit of data science skills that they could use to solve real-world problems.
  5. Introduction to Machine Learning for Coders (fast.ai): This course is taught by Jeremy Howard, who gives a very practical walkthrough of how to do machine learning properly with code. Get ready to learn how to code the random forest algorithm from scratch!
  6. Practical Deep Learning for Coders (fast.ai): This two-part course is the best resource out there for both 1) aspiring data scientists trying to get into deep learning and 2) more experienced data scientists trying to get deeper into what it takes to get state-of-the-art results in deep learning. In the first lesson, Jeremy Howard shows you right away how to get cutting-edge accuracy on the ImageNet dataset using the fastai library. In later lessons, you will get more and more used to implementing models directly in PyTorch. Highly recommended!

Practical learning

Real world projects will teach you how to turn your skills into actual value for people
Some would argue that true learning only happens when you are working on a concrete project and solving real world problems with your data science skills. Below are recommended ways to gain experience by applying your knowledge (i.e. learn by doing).
  1. CodeSignal: When I was new to coding, I had a difficult time understanding how my basic skills could be used to solve real-world problems. Thankfully, CodeSignal (formerly called CodeFights) had fun coding challenges that allowed me to compete against bots and real people. This made me comfortable with the process of solving problems with code. The website started out as a platform for competitive coding but now focuses on preparing developers for the coding exams given during interviews with tech companies.
  2. Kaggle: This is a platform where data scientists come together to 1) share data and code, and 2) compete on training ML algorithms that best reach a target objective (e.g. predict housing prices most accurately). Even if you don’t explicitly compete, I think the biggest value add from Kaggle is the availability of “code solutions” from competitions. Reading the code of other, more experienced data scientists is one of the fastest ways to get better, because it teaches you best practices while getting you comfortable with reading and writing ML code from scratch yourself.
  3. Passion projects: Even if you don’t have a data science job but want to get into the field, think of a cool project to execute! Identify a problem you want to solve or even something fun you want to do, then create a machine learning model for it. It’s even better if you decide to deploy it as an app accessible on the internet! (e.g. I recently made a joke generator bot since I like jokes, and I plan to deploy it publicly soon!)
  4. Internship / full-time job: This one should be obvious. The best way to learn by doing is to get yourself a job in data science. The catch is the cold start problem: companies want you to have data science skills, but how do you get these skills when you don’t have work experience? All the steps enumerated above should equip you with the necessary skills to be immediately useful to a data science team. So get to work!

Conclusion

That’s how I gained the skills that I have today, and I still have a lot to learn! It has been a long, arduous journey, but every bit of effort was worth it. Every day, I feel privileged to be working in a profession that is both interesting and impactful. I am so happy in this profession that I took the time to create this guide so that more people can get into the same field.
As a final note, I will leave you with the message below:
The biggest determinant of whether or not you will succeed in getting into data science is your willingness to 1) learn lessons, 2) persevere through challenges, and 3) take the opportunities that are available.

DevOps Series: Running Oracle Database 19C in a Docker Container

Introduction

In this article, we will cover the prerequisites for installing Docker, install Docker itself, and then use a Docker container to run the database along with its listener.
In our environment, we used Oracle Linux 7.5 with Docker 18.09.1 and Docker Compose 1.23.1. Only a very lightweight Linux machine is needed to run Docker.
First, we are going to install Docker Engine and Docker Compose.
i) Install Docker Engine:
sudo yum install docker-engine -y
Before installing Docker Engine, make the changes to your public-yum-ol7.repo file described in A Simple Guide to docker installation on Oracle Linux 7.5 by Puneeth Prakash (https://blogs.oracle.com/blogbypuneeth/a-simple-guide-to-docker-installation-on-oracle-linux-75).
ii) Start Docker:
sudo service docker start
docker --version
Docker version 18.09.1-ol, build c3ab8a8
iii) Install Docker-compose:
Docker Compose is a tool that allows you to define and run multi-container Docker applications. Start by downloading the Docker Compose binary into the /usr/local/bin directory using the following curl command:
# sudo curl -L "https://github.com/docker/compose/releases/download/1.23.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   617    0   617    0     0    791      0 --:--:-- --:--:-- --:--:--   792
100 11.1M  100 11.1M    0     0   842k      0  0:00:13  0:00:13 --:--:--  930k
Once the download is complete, make the binary executable by typing:
# sudo chmod +x /usr/local/bin/docker-compose
To verify the installation, type the following command to print the Compose version:
# docker-compose --version
docker-compose version 1.23.1, build b02f1306
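As a minimal sketch of the download step above, the snippet below builds the same release URL from a version variable, so the Compose version only has to be changed in one place. The names COMPOSE_VERSION and compose_url are illustrative assumptions and are not part of the original steps.

```shell
#!/bin/sh
# Sketch: build the Docker Compose download URL for a pinned version.
# COMPOSE_VERSION and compose_url are hypothetical names used for illustration.
COMPOSE_VERSION="${COMPOSE_VERSION:-1.23.1}"

compose_url() {
  # $1 = Compose version
  # $2 = kernel name (defaults to `uname -s`), $3 = machine type (defaults to `uname -m`)
  version="$1"
  os="${2:-$(uname -s)}"
  arch="${3:-$(uname -m)}"
  printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
    "$version" "$os" "$arch"
}

# Print the URL for this machine. To download, pass it to curl as in the step above:
#   sudo curl -L "$(compose_url "$COMPOSE_VERSION")" -o /usr/local/bin/docker-compose
compose_url "$COMPOSE_VERSION"
```

Printing the URL first makes it easy to confirm that the release asset name matches your platform before anything is written to /usr/local/bin.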
# service docker status
Redirecting to /bin/systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/docker.service.d
└─docker-sysconfig.conf
Active: active (running) since Tue 2019-09-10 18:49:30 AEST; 11min ago
Docs: https://docs.docker.com
Main PID: 3706 (dockerd)
Memory: 101.5M
CGroup: /system.slice/docker.service
└─3706 /usr/bin/dockerd --selinux-enabled --insecure-registry 172.30.0.0/16 --storage-driver devicemapper -…
Sep 10 18:49:01 opsdev.localdomain dockerd[3706]: time="2019-09-…
Sep 10 18:49:30 opsdev.localdomain systemd[1]: Started Docker Application Container Engine.
Hint: Some lines were ellipsized, use -l to show in full.
iv) A new Unix group, docker, was created during installation. Add the oracle user to the docker group:
# usermod -a -G docker oracle
# id oracle
uid=600(oracle) gid=54321(oinstall) groups=54321(oinstall),54322(dba),54323(oper),54324(backupdba),54325(dgdba),54326(kmdba),54330(racdba),54327(asmdba),982(vboxsf),976(docker)
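The group check above can also be scripted. The following is a minimal sketch, assuming a hypothetical helper named in_group, that tests whether a list of group names (as printed by id -nG) contains the docker group. Keep in mind that a new group membership only applies to sessions started after the usermod command, so the oracle user has to log in again before docker commands work without sudo.

```shell
#!/bin/sh
# Sketch: check whether a user belongs to the docker group.
# in_group is a hypothetical helper name, not part of the original article.
in_group() {
  # $1 = space-separated group names (e.g. the output of `id -nG`)
  # $2 = group name to look for
  for g in $1; do
    if [ "$g" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# `id -nG` prints the group names of the current user on one line.
if in_group "$(id -nG)" docker; then
  echo "current user is in the docker group"
else
  echo "current user is not in the docker group"
fi
```

To check another account, pass its group list instead, for example in_group "$(id -nG oracle)" docker.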
For this article, I am going to pull a Docker image that I had already built from Oracle Database 19.3.0 Enterprise Edition and that I use for my own testing. Please use it in line with your licence requirements.
V) Connect to the docker hub
# docker login -u banglamon
Password: xxxxx
Login Succeeded
If you don’t have an account, you can sign up for one at https://hub.docker.com/.
VI) Pull image from the docker hub
docker pull banglamon/oracle193db:19.3.0-ee
Trying to pull repository docker.io/banglamon/oracle193db …
19.3.0-ee: Pulling from docker.io/banglamon/oracle193db
ad18637d63f5: Pull complete
4aebb26cf7d5: Pull complete
30325de6313a: Pull complete
17163f94892b: Pull complete
ae28d7c83673: Pull complete
e925f4d276bf: Pull complete
Digest: sha256:2afff2ab1a5ecd7f40767a39398b3e7598bab92ff68793d81699c6ea0008f04f
Status: Downloaded newer image for banglamon/oracle193db:19.3.0-ee
VII) Validate docker image for the Oracle Database
# docker image ls|grep oracle
Figure: docker image
VIII) Setup Oracle database as a container
At this stage we have a Docker image ready but have not created a container yet. Let’s set up the first Oracle 19c Database container. Note that the command below names the container “19.3.0-ee”, while the later steps refer to it as “oracle19db”; use --name oracle19db if you want to run those steps as written.
docker run -d --name 19.3.0-ee -p 1521:1521 -p 8081:8081 -e ORACLE_SID=MORAL -e ORACLE_PDB=MORALPDB -e ORACLE_PWD=Oracle123 -v /u01/app/oracle/oradata:/opt/oracle/oradata banglamon/oracle193db:19.3.0-ee
where,
-d Starts the container in detached (background) mode; -d=true is equivalent
-p Maps ports 1521 and 8081 on my laptop to the corresponding ports inside the container
-e Sets environment variables inside the container (here the SID, the PDB name, and the password)
-v Maps my local directory (/u01/app/oracle/oradata) to the default location where the data files are stored (/opt/oracle/oradata), so that the files persist outside the container
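To make the options above easier to adapt, here is a minimal sketch that assembles the same docker run command from parameters and prints it as a dry run instead of executing it. The function name build_run_cmd and the idea of reading the password from an ORACLE_PWD shell variable are assumptions for illustration; the ports, SID, PDB name, image, and paths are the ones used in this article.

```shell
#!/bin/sh
# Sketch: assemble the docker run command from parameters (dry run only).
# build_run_cmd is a hypothetical helper; it prints the command, it does not run it.

# Take the password from the environment if set, else fall back to the article's value.
ORACLE_PWD="${ORACLE_PWD:-Oracle123}"

build_run_cmd() {
  # $1 = container name, $2 = host data directory, $3 = image
  echo "docker run -d --name $1" \
       "-p 1521:1521 -p 8081:8081" \
       "-e ORACLE_SID=MORAL -e ORACLE_PDB=MORALPDB -e ORACLE_PWD=$ORACLE_PWD" \
       "-v $2:/opt/oracle/oradata $3"
}

# Print the command for a container named oracle19db.
build_run_cmd oracle19db /u01/app/oracle/oradata banglamon/oracle193db:19.3.0-ee
```

Printing the command first lets you review the port and volume mappings before the container is created. The printed line contains the password, so avoid writing it to shared logs.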
# docker ps -a | grep oracle
4541cab89244 banglamon/oracle193db:19.3.0-ee "/bin/sh -c 'exec $O…" 2 minutes ago Up 2 minutes (health: starting) 0.0.0.0:1521->1521/tcp, 0.0.0.0:8081->8081/tcp, 5500/tcp 19.3.0-ee
Figure: Container process id
Although the container is shown as up, its health status is still “starting”, so the database is not ready yet.
IX) Checking log information
Because the container is running in the background, check its progress in the container log:
Figure: container creation log
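The exact command behind the figure is not reproduced here. As a hedged sketch, docker logs -f followed by the container name follows the log of a background container, and the health status that docker ps reports can be polled with docker inspect. The helper name wait_healthy and its default retry values are assumptions, and the approach relies on the image defining a health check, as the “health: starting” status above suggests.

```shell
#!/bin/sh
# Sketch: wait until a container reports a "healthy" status.
# wait_healthy is a hypothetical helper; it assumes the image defines a HEALTHCHECK.
wait_healthy() {
  # $1 = container name
  # $2 = maximum number of attempts (default 60)
  # $3 = seconds to sleep between attempts (default 10)
  name="$1"
  tries="${2:-60}"
  pause="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$(docker inspect -f '{{.State.Health.Status}}' "$name" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "$name is healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "$name did not become healthy after $tries attempts" >&2
  return 1
}

# Usage (not executed here):
#   docker logs -f oracle19db    # follow the creation log; press Ctrl+C to stop following
#   wait_healthy oracle19db      # block until the health check passes or attempts run out
```

Polling the health status is handy in scripts, while following the log is better for watching the database creation interactively.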
X) Database Connection:
Connect to the database container “oracle19db”:
docker exec -it oracle19db bash -c "source /home/oracle/.bashrc; sqlplus /nolog"
SQL*Plus: Release 19.0.0.0.0 - Production on Tue Sep 10 11:05:42 2019
Version 19.3.0.0.0
Copyright © 1982, 2019, Oracle. All rights reserved.
Some tests at the database level:
SQL> conn / as sysdba
Connected.
SQL> select name from v$database;
NAME
---------
MORAL
SQL> show con_name
CON_NAME
------------------------------
CDB$ROOT
SQL> show pdbs
    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 MORALPDB                       READ WRITE NO
SQL> alter session set container=MORALPDB;
Session altered.
SQL> create table DockerTst (id int,name varchar2(20));
Table created.
SQL> select tablespace_name from dba_tablespaces;
TABLESPACE_NAME
------------------------------
SYSTEM
SYSAUX
UNDOTBS1
TEMP
USERS
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
/u01/app/oracle/oradata/MORAL/system01.dbf
/u01/app/oracle/oradata/MORAL/sysaux01.dbf
/u01/app/oracle/oradata/MORAL/undotbs01.dbf
/u01/app/oracle/oradata/MORAL/pdbseed/system01.dbf
/u01/app/oracle/oradata/MORAL/pdbseed/sysaux01.dbf
/u01/app/oracle/oradata/MORAL/users01.dbf
/u01/app/oracle/oradata/MORAL/pdbseed/undotbs01.dbf
/u01/app/oracle/oradata/MORAL/MORALPDB/system01.dbf
/u01/app/oracle/oradata/MORAL/MORALPDB/sysaux01.dbf
/u01/app/oracle/oradata/MORAL/MORALPDB/undotbs01.dbf
/u01/app/oracle/oradata/MORAL/MORALPDB/users01.dbf
11 rows selected.
SQL> create tablespace testdocker datafile '/u01/app/oracle/oradata/MORAL/testdocker01.dbf' size 100M;
Tablespace created.
Note: Using the same image we can deploy multiple containers with quick provisioning.
XI) To stop the container, we can execute the commands below:
docker ps -a | grep oracle
54c750349cbd oracle/database:19.3.0-ee "/bin/sh -c 'exec $O…" About an hour ago Up About an hour (healthy) 0.0.0.0:1521->1521/tcp, 0.0.0.0:8081->8081/tcp, 5500/tcp oracle19db
By default, Docker waits 10 seconds for a container to stop and then kills it. We can give the database more time to shut down by passing a timeout in seconds to docker stop with the -t option:
docker stop -t 50 oracle19db
oracle19db
# docker ps -a | grep oracle
54c750349cbd oracle/database:19.3.0-ee "/bin/sh -c 'exec $O…" About an hour ago Exited (143) 6 seconds ago oracle19db
XII) Play with Docker commands
To start the container again:
docker start oracle19db
oracle19db
To get the container size and consumed OS resources, run the dockercontainer.sh script listed in the Appendix.
To rename a container:
docker rename oracle19db oracle19test
To remove a container, use the command below:
# docker rm -f 54c750349cbd
Figure: docker remove
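For the container size and resource usage step above, a quick check is also possible with built-in Docker commands. The sketch below wraps two of them in a hypothetical helper named container_report: docker ps with the -s option reports the container size, and docker stats with --no-stream prints a single snapshot of CPU, memory, network, and block I/O usage. This is one possible approach, not necessarily the command the original step used.

```shell
#!/bin/sh
# Sketch: report size and a one-off resource snapshot for a single container.
# container_report is a hypothetical helper name used for illustration.
container_report() {
  # $1 = container name
  # Size of the container's writable layer (and virtual size), including stopped containers.
  docker ps -a -s --filter "name=$1" --format '{{.Names}}: {{.Size}}'
  # One snapshot of resource usage instead of a continuously refreshing stream.
  docker stats --no-stream "$1"
}

# Usage (not executed here):
#   container_report oracle19db
```

Note that the name filter matches substrings, so a filter on oracle19db would also list a container called oracle19db2.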

Summary

In this article, we have seen how to install Docker, used a Docker container to create an Oracle 19c Database, and explored various Docker commands.

Appendix

An official Docker image (Oracle Database Server 12.2.0.1 Docker Image) is currently available from the Oracle Container Registry.
To download it, you need to create an account and accept the license [ORACLE]
========================================
docker login -u asiandevs@gmail.com -p XXXXXX container-registry.oracle.com
[oracle@oel75 ~]$ docker login -u monowar.mukul@gmail.com -p XXXX container-registry.oracle.com
For this article, the tree structure of the Docker build files before building the image was as shown below.
I downloaded the Oracle Database software and staged it in the path below; there is no need to unzip it:
[root@opsdev 19.3.0]# cd /u01/app/odbdocker/docker-images-master/OracleDatabase/SingleInstance/dockerfiles/19.3.0
[root@opsdev 19.3.0]# cp /media/sf_software/Software/LINUX.X64_193000_db_home.zip .
[root@opsdev dockerfiles]# ./buildDockerImage.sh -v 19.3.0 -e
dockercontainer.sh
#!/bin/bash
# Print the volumes and their sizes for every running container,
# followed by a one-off snapshot of OS resource usage.
for d in $(docker ps | awk '{print $1}' | tail -n +2); do
  d_name=$(docker inspect -f '{{.Name}}' "$d")
  echo "========================================================="
  echo "$d_name ($d) container size:"
  # sudo du -d 2 -h /var/lib/docker/devicemapper | grep "$(docker inspect -f '{{.Id}}' "$d")"
  echo "$d_name ($d) volumes:"
  # Each mount is emitted as source:destination, separated by spaces
  for mount in $(docker inspect -f '{{range .Mounts}}{{.Source}}:{{.Destination}} {{end}}' "$d"); do
    size=$(echo "$mount" | cut -d':' -f1 | sudo xargs du -d 0 -h)
    mnt=$(echo "$mount" | cut -d':' -f2)
    echo "$size mounted on $mnt"
  done
done
echo "========================================================="
echo "Container OS statistics:"
docker stats --no-stream $(docker ps --format '{{.Names}}')

A Simple Guide to docker installation on Oracle Linux 7.5





Below are the steps to install docker using Oracle YUM repository:

Step 1. Set your proxy :
Command : export http_proxy=xxx.xxx.xxx.xxx:80

Command : export https_proxy=xxx.xxx.xxx.xxx:80
Step 2. Take a backup of existing public-yum-ol7.repo :
Command : cd /etc/yum.repos.d/
Optional Command : mv /etc/yum.repos.d/public-yum-ol7.repo /etc/yum.repos.d/public-yum-ol7.repo_org1
Step 3. Download the latest public-yum-ol7.repo from Oracle YUM repository:
Command : wget http://yum.oracle.com/public-yum-ol7.repo
Step 4. Make the following changes in your public-yum-ol7.repo file:
Command : vi public-yum-ol7.repo
[ol7_latest]
name=Oracle Linux $releasever Latest ($basearch)
baseurl=https://yum.oracle.com/repo/OracleLinux/OL7/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
[ol7_UEKR4]
name=Latest Unbreakable Enterprise Kernel Release 4 for Oracle Linux $releasever ($basearch)
baseurl=https://yum.oracle.com/repo/OracleLinux/OL7/UEKR4/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
[ol7_optional_latest]
name=Oracle Linux $releasever Optional Latest ($basearch)
baseurl=https://yum.oracle.com/repo/OracleLinux/OL7/optional/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
[ol7_addons]
name=Oracle Linux $releasever Add ons ($basearch)
baseurl=https://yum.oracle.com/repo/OracleLinux/OL7/addons/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
Step 5. Reboot your machine:
Command : systemctl reboot
Step 6. Install docker using Oracle yum repository:
Command :  yum install docker-engine
NOTE:
Reference: https://blogs.oracle.com/linux/modularizing-the-oracle-linux-yum-server-repository-configurations-breaking-up-is-hard-to-do
"
Beginning on 18 January 2019:
- the existing repo files (public-yum-ol7.repo and public-yum-ol6.repo) for yum.oracle.com will no longer be updated, in favor of smaller repo files that are more targeted in scope
- running a yum update on an Oracle Linux 6 or Oracle Linux 7 system will automatically install .repo files relevant to your system
- it will be easier to enable specific repositories for Oracle Linux yum server and to keep your yum repository definitions up to date
- to complete the transition from the legacy .repo file you must run the script: /usr/bin/ol_yum_configure.sh after it is installed.
"
Some users have reported that they see the error "no package docker-engine available" at this stage :
Follow the below commands to resolve this issue :
a) Check if the addons repository is enabled.
Command : yum repolist
b) If the addons repository is not enabled then execute the below command :
Command : yum-config-manager --enable '*addons'
c) Now check if the addons repository is enabled using the command "yum repolist".
d) Try to download/install docker from yum repository again, using the command "yum install docker-engine".
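Steps a) to d) above can be combined into a small check. The following is a minimal sketch, assuming a hypothetical helper named addons_enabled, that reads the output of yum repolist enabled and reports whether a repository whose id contains "addons" is listed. The sample repolist line in the comment is illustrative only.

```shell
#!/bin/sh
# Sketch: detect whether an addons repository appears in `yum repolist enabled` output.
# addons_enabled is a hypothetical helper; it reads the repolist text on stdin.
addons_enabled() {
  # Match a repo id (first column) that contains "addons", e.g. "ol7_addons/x86_64".
  grep -Eqi '^[^[:space:]]*addons'
}

# Usage (not executed here, requires yum and root privileges):
#   if yum repolist enabled | addons_enabled; then
#     yum install docker-engine
#   else
#     yum-config-manager --enable '*addons' && yum install docker-engine
#   fi
```

Matching only the first column avoids false positives from repository descriptions that merely mention add-ons.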
Step 7. Enable docker service:
Command : systemctl enable docker
Step 8. Start docker service:
Command : systemctl start docker
Step 9. Check the status:
Command : systemctl status docker.service
Step 10. Check if docker client and server were installed successfully:
Command : docker version
Step 11. Try to login to Docker hub:
Command : docker login
Run the following commands if you are unable to connect to the Docker hub:
  • Command : mkdir -p /etc/systemd/system/docker.service.d
  • Command : vi /etc/systemd/system/docker.service.d/http-proxy.conf and append the following:
[Service]
Environment="HTTP_PROXY=xxx.xxx.xxx.xxx:80" "HTTPS_PROXY=xxx.xxx.xxx.xxx:80" "NO_PROXY=localhost,127.0.0.1"
  • Command : systemctl daemon-reload
  • Command : systemctl restart docker
  • Check if the proxy was set properly:
Command : systemctl show --property=Environment docker
  • Try to login again using :
Command : docker login
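The proxy steps above can be sketched as a small script instead of editing the file by hand in vi. The helper name write_docker_proxy_conf and the example proxy address are assumptions for illustration. Unlike the manual step, which appends to the file, this sketch overwrites http-proxy.conf, so review any existing content first.

```shell
#!/bin/sh
# Sketch: write the systemd drop-in that gives the Docker daemon its proxy settings.
# write_docker_proxy_conf is a hypothetical helper; it OVERWRITES http-proxy.conf.
write_docker_proxy_conf() {
  # $1 = proxy as host:port
  # $2 = drop-in directory (defaults to the path used in the steps above)
  proxy="$1"
  dir="${2:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/http-proxy.conf" <<EOF
[Service]
Environment="HTTP_PROXY=$proxy" "HTTPS_PROXY=$proxy" "NO_PROXY=localhost,127.0.0.1"
EOF
}

# Usage (as root, not executed here; proxy.example.com:80 is a placeholder):
#   write_docker_proxy_conf proxy.example.com:80
#   systemctl daemon-reload
#   systemctl restart docker
#   systemctl show --property=Environment docker
```

Passing a different directory as the second argument lets you generate and inspect the file somewhere harmless before installing it under /etc.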





Must Watch YouTube Videos for Databricks Platform Administrators

  While written word is clearly the medium of choice for this platform, sometimes a picture or a video can be worth 1,000 words. Below are  ...